IPP-Bulgarian Academy of Sciences (IPP-BAS), Bulgaria
Linguistic Modelling Laboratory, Institute for Parallel Processing, Bulgarian Academy of Sciences, BulTreeBank Group.

The BulTreeBank team is part of the Linguistic Modelling Laboratory. Its members have worked on several national and international projects in computational corpus linguistics and ontology management.

The BulTreeBank team is currently involved in two European projects: LT4eL (http://www.let.uu.nl/lt4el/) and AsIsKnown (http://www.asisknown.org/). In both projects our main responsibility is to create and maintain a domain ontology. The LT4eL project aims to improve the retrieval of learning material by using multilingual language technology tools and semantic web techniques. The team leads WP3 “Enhancing eLearning with semantic knowledge”. We have created an ontology in the domain of Information Technology for non-specialists. The AsIsKnown project aims to create a common knowledge base and thus reinforce the European textiles industry, which is characterized by small and medium-sized enterprises. Our team leads two work packages: WP3 “Common sense ontology engineering” and WP5 “Multimedia ontology”.

The research team is also working on extending the HPSG-based treebank of Bulgarian with semantic information and on parser development. The project is funded by the German Volkswagen Foundation (2005-2007).

One of the main goals of the team is to build, exploit and disseminate language resources for Bulgarian, as well as software that supports the creation and use of these resources. The team's current set of language resources includes:
(1) Bulgarian morphological lexicon of about 100 000 lexemes;
(2) Partial syntactic grammar of Bulgarian;
(3) Named Entity recognition module;
(4) Text Archive. It consists of about 72 million running words (15% fiction, 78% newspapers and 7% legal texts, government bulletins and other genres);
(5) Linguistically interpreted corpus. This is a balanced corpus annotated at the morphosyntactic level;
(6) HPSG-based treebank of Bulgarian. At the moment we have about 15 000 sentences with constituent structures, dependency information and co-reference relations;
(7) Semantic lexicon for Bulgarian (under creation). At the moment we have all the meanings with their definitions. We have also started adding synonymy relations and other relations, such as is-a and part-of.

The team is further developing an XML-based system called CLaRK. The current version of the system is available on the web: http://www.bultreebank.org/clark/index.html. The main aim behind the design of the system is to minimize human intervention during the creation of language resources. It incorporates several technologies: XML technology; Unicode; Regular Cascaded Grammars; Constraints over XML Documents. Up to now the system has been downloaded by nearly 600 people and, to our knowledge, it is used mainly for the creation of corpora for less-processed languages and for named entity recognition.

The Linguistic Modelling Laboratory (LML), a department of the Institute for Parallel Processing (IPP) of the Bulgarian Academy of Sciences (BAS), provides a research environment for interdisciplinary investigation in the area of computational linguistics and knowledge representation. Since its establishment in 1987, the LML has hosted a number of projects dealing with the application of knowledge representation to natural language processing from two different perspectives: using represented knowledge for tasks of semantic analysis, and exploring methods for the representation and acquisition of linguistic knowledge itself.
The researchers at the LML have experience with knowledge representation languages and the systems that support them, such as Conceptual Graphs, Description Logics and Typed Feature Logics. The IPP was recognised as a Centre of Excellence in Information Technology (CEIT), financially supported by the EC under the Fifth and Sixth Framework Programmes. The BulTreeBank team is involved in a work package within the Centre.

The main role of the IPP-BAS team in the project is to contribute to WP4 “Positioning the learner” and WP6 “Supporting social and informal learning”. Moreover, IPP-BAS will contribute to the scenario activities (WP3) and play an important role in WP7 “Validation”.
NEWS
2008-12-19
TENCompetence Winterschool 2009
Feb 1-6, Innsbruck, Austria
Start: 1 Feb 2009 - 20:00
End: 6 Feb 2009 - 13:00