50 research outputs found
Using WordNet for Building WordNets
This paper summarises a set of methodologies and techniques for the fast
construction of multilingual WordNets. The English WordNet is used in this
approach as a backbone for Catalan and Spanish WordNets and as a lexical
knowledge resource for several subtasks.Comment: 8 pages, postscript file. In workshop on Usage of WordNet in NL
Combining Multiple Methods for the Automatic Construction of Multilingual WordNets
This paper explores the automatic construction of a multilingual Lexical
Knowledge Base from preexisting lexical resources. First, a set of automatic
and complementary techniques for linking Spanish words collected from
monolingual and bilingual MRDs to English WordNet synsets are described.
Second, we show how resulting data provided by each method is then combined to
produce a preliminary version of a Spanish WordNet with an accuracy over 85%.
The application of these combinations results on an increment of the extracted
connexions of a 40% without losing accuracy. Both coarse-grained (class level)
and fine-grained (synset assignment level) confidence ratios are used and
evaluated. Finally, the results for the whole process are presented.Comment: 7 pages, 4 postscript figure
Lexical and Grammar Resource Engineering for Runyankore & Rukiga: A Symbolic Approach
Current research in computational linguistics and natural language processing (NLP) requires the existence of language resources. Whereas these resources are available for a few well-resourced languages, there are many languages that have been neglected. Among the neglected and / or under-resourced languages are Runyankore and Rukiga (henceforth referred to as Ry/Rk). Recently, the NLP community has started to acknowledge that resources for under-resourced languages should also be given priority. Why? One reason being that as far as language typology is concerned, the few well-resourced languages do not represent the structural diversity of the remaining languages. The central focus of this thesis is about enabling the computational analysis and generation of utterances in Ry/Rk. Ry/Rk are two closely related languages spoken by about 3.4 and 2.4 million people respectively. They belong to the Nyoro-Ganda (JE10) language zone of the Great Lakes, Narrow Bantu of the Niger-Congo language family.The computational processing of these languages is achieved by formalising the grammars of these two languages using Grammatical Framework (GF) and its Resource Grammar Library (RGL). In addition to the grammar, a general-purpose computational lexicon for the two languages is developed. Although we utilise the lexicon to tremendously increase the lexical coverage of the grammars, the lexicon can be used for other NLP tasks.In this thesis a symbolic / rule-based approach is taken because the lack of adequate languages resources makes the use of data-driven NLP approaches unsuitable for these languages
Cornetto: A Combinatorial Lexical Semantic Database for Dutch
One of the goals of the STEVIN programme is the realisation of a digital infrastructure that will enforce the position of the Dutch language in the modern information and communication technology.A semantic database makes it possible to go from words to concepts and consequently, to develop technologies that access and use knowledge rather than textual representations