

IPP-Bulgarian Academy of Sciences (IPP-BAS), Bulgaria

Linguistic Modelling Laboratory, Institute for Parallel Processing, Bulgarian Academy of Sciences:
BulTreeBank Group. The BulTreeBank team is part of the Linguistic Modelling Laboratory. Its members
have worked on several national and international projects in computational corpus linguistics and
ontology management.
The BulTreeBank team is currently involved in two European projects: LT4eL
(http://www.let.uu.nl/lt4el/) and AsIsKnown (http://www.asisknown.org/). In both projects our main
responsibility is to create and maintain a domain ontology. The LT4eL project uses multilingual
language technology tools and semantic web techniques to improve the retrieval of learning material.
The team leads WP3 “Enhancing eLearning with semantic knowledge” and has created an ontology of
Information Technology for non-specialists. The AsIsKnown project aims to create a common knowledge
base that reinforces the European textile industry, which consists largely of small and medium-sized
enterprises. Here our team leads two work packages: WP3 “Common sense ontology engineering” and WP5
“Multimedia ontology”.
The research team is also currently working on extending the HPSG-based treebank of Bulgarian with
semantic information and on parser development. This work is funded by the German Volkswagen
Foundation (2005-2007). One of the team's main goals is to build, exploit and disseminate language
resources for Bulgarian, as well as software that supports the creation and use of such resources.
The team's current language resources include:
(1) a Bulgarian morphological lexicon of about 100 000 lexemes;
(2) a partial syntactic grammar of Bulgarian;
(3) a Named Entity recognition module;
(4) a Text Archive of about 72 million running words (15% fiction, 78% newspapers, and 7% legal
texts, government bulletins and other genres);
(5) a Linguistically interpreted corpus: a balanced corpus annotated at the morphosyntactic level;
(6) an HPSG-based treebank of Bulgarian, currently about 15 000 sentences with constituent
structures, dependency information and co-reference relations;
(7) a Semantic lexicon of Bulgarian (in progress). It currently contains all the word senses with
their definitions, and we have started adding synonymy and other relations, such as is-a and part-of.
The team is further developing an XML-based system called CLaRK. The current version of the system
is available on the web: http://www.bultreebank.org/clark/index.html. The main design aim of the
system is to minimize human intervention during the creation of language resources. It incorporates
several technologies: XML technology; Unicode; Cascaded Regular Grammars; Constraints over XML
Documents. So far the system has been downloaded by nearly 600 people and, to our knowledge, it is
used mainly for creating corpora for less-processed languages and for named entity recognition.
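CLaRK's own grammar formalism and interface are not described here. As a rough illustration of the general idea behind a cascade of regular grammars, the following Python sketch runs regular expressions over a sequence of category labels; each level wraps the spans it matches in a new constituent, which the next level can then match in turn. All names (`Node`, `apply_grammar`, `cascade`, `LEVELS`), the tag set and the example sentence are hypothetical and are not CLaRK's API or syntax.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Node:
    """A token (leaf) or a constituent built by a grammar level."""
    label: str                      # category, e.g. a hypothetical tag "N" or "NP"
    text: str = ""                  # word form, only used for leaves
    children: list = field(default_factory=list)

    def __str__(self) -> str:
        if not self.children:
            return f"{self.text}/{self.label}"
        return f"[{self.label} " + " ".join(str(c) for c in self.children) + "]"


def apply_grammar(seq, pattern, label):
    """Wrap each non-overlapping regex match over the category sequence in a new node."""
    # Encode the categories as "<Adj><N><V>..." so the regex runs over labels, not words.
    encoded = "".join(f"<{n.label}>" for n in seq)
    # Map string offsets at node boundaries back to node indices.
    index_of, pos = {}, 0
    for i, n in enumerate(seq):
        index_of[pos] = i
        pos += len(n.label) + 2
    index_of[pos] = len(seq)

    out, i = [], 0
    for m in re.finditer(pattern, encoded):
        a, b = index_of.get(m.start()), index_of.get(m.end())
        if a is None or b is None or a == b:
            continue  # skip empty matches and matches that cut through a label
        out.extend(seq[i:a])                        # copy unmatched nodes unchanged
        out.append(Node(label, children=seq[a:b]))  # group the matched span
        i = b
    out.extend(seq[i:])
    return out


def cascade(seq, levels):
    """Run the grammars in order; each level sees the output of the previous one."""
    for pattern, label in levels:
        seq = apply_grammar(seq, pattern, label)
    return seq


# Hypothetical three-level cascade: noun phrases, then verb phrases, then clauses.
LEVELS = [
    (r"(?:<Adj>)*<N>", "NP"),
    (r"<V><NP>", "VP"),
    (r"<NP><VP>", "S"),
]

if __name__ == "__main__":
    words = [("Старият", "Adj"), ("човек", "N"), ("видя", "V"), ("куче", "N")]
    tokens = [Node(tag, text=w) for w, tag in words]
    for node in cascade(tokens, LEVELS):
        print(node)  # [S [NP Старият/Adj човек/N] [VP видя/V [NP куче/N]]]
```

Because each level only sees the categories produced by the level before it, simple regular rules can build nested structure such as the clause above. The real system works over XML documents, which this sketch does not model.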
The Linguistic Modelling Laboratory (LML), a department of the Institute for Parallel Processing
(IPP) of the Bulgarian Academy of Sciences (BAS), provides a research environment for
interdisciplinary investigation in computational linguistics and knowledge representation. Since its
establishment in 1987, the LML has hosted a number of projects that apply knowledge representation to
natural language processing from two perspectives: using represented knowledge for semantic analysis,
and exploring methods for representing and acquiring linguistic knowledge itself. LML researchers have
experience with knowledge representation formalisms such as Conceptual Graphs, Description Logics and
Typed Feature Logics, and with the systems that support them.
The IPP was recognised as a Centre of Excellence in Information Technology (CEIT), with financial
support from the EC under the Fifth and Sixth Framework Programmes. The BulTreeBank team is involved
in a work package within the Centre.
The main role of the IPP-BAS team in the project is to contribute to WP4 “Positioning the learner”
and WP6 “Supporting social and informal learning”. Moreover, IPP-BAS will contribute to the
scenario activities (WP3) and play an important role in WP7 “Validation”.


NEWS

2008-12-19

TENCompetence Winterschool 2009

TENCompetence Winterschool 2009, Feb 1-6, Innsbruck, Austria
Start: 1 Feb 2009 - 20:00
End: 6 Feb 2009 - 13:00


TOPICS

Personal Competence Development
<...