387 research outputs found
Using WordNet for Building WordNets
This paper summarises a set of methodologies and techniques for the fast
construction of multilingual WordNets. The English WordNet is used in this
approach as a backbone for Catalan and Spanish WordNets and as a lexical
knowledge resource for several subtasks. Comment: 8 pages, postscript file. In workshop on Usage of WordNet in NL
Towards a Universal Wordnet by Learning from Combined Evidence
Lexical databases are invaluable sources of knowledge about words and their meanings, with numerous applications in areas like NLP, IR, and AI. We propose a methodology for the automatic construction of a large-scale multilingual lexical database where words of many languages are hierarchically organized in terms of their meanings and their semantic relations to other words. This resource is bootstrapped from WordNet, a well-known English-language resource. Our approach extends WordNet with around 1.5 million meaning links for 800,000 words in over 200 languages, drawing on evidence extracted from a variety of resources including existing (monolingual) wordnets, (mostly bilingual) translation dictionaries, and parallel corpora. Graph-based scoring functions and statistical learning techniques are used to iteratively integrate this information and build an output graph. Experiments show that this wordnet has a high level of precision and coverage, and that it can be useful in applied tasks such as cross-lingual text classification.
Combining Multiple Methods for the Automatic Construction of Multilingual WordNets
This paper explores the automatic construction of a multilingual Lexical
Knowledge Base from preexisting lexical resources. First, a set of automatic
and complementary techniques for linking Spanish words collected from
monolingual and bilingual MRDs to English WordNet synsets is described.
Second, we show how the data produced by each method are combined to
produce a preliminary version of a Spanish WordNet with an accuracy over 85%.
Combining the methods increases the number of extracted connections by 40%
without losing accuracy. Both coarse-grained (class level)
and fine-grained (synset assignment level) confidence ratios are used and
evaluated. Finally, the results for the whole process are presented. Comment: 7 pages, 4 postscript figures
Linking a domain thesaurus to WordNet and conversion to WordNet-LMF
We present a methodology to link domain
thesauri to general-domain lexica. This is
applied in the framework of the KYOTO
project to link the Species2000 thesaurus
to the synsets of the English WordNet.
Moreover, we study the formalisation of
this thesaurus according to the ISO LMF
standard and its dialect WordNet-LMF.
This conversion will allow Species2000
to communicate with the other resources
available in the KYOTO architecture. Peer Reviewed. Postprint (published version)
Towards a universal index of meaning
The Inter-Lingual-Index (ILI) in the EuroWordNet architecture is an initially unstructured fund of concepts which functions as the link between the various language wordnets. The ILI concepts originate from WordNet 1.5, and have been restructured on the basis of aspects of the internal structure of WordNet, links between WordNet and other resources, and multilingual mapping between the wordnets.
This leads to a differentiation of the status of ILI concepts, a reduction of the WordNet polysemy, and a greater connectivity between the wordnets. The restructured ILI represents the first step towards a standardized set of word meanings, is a working platform for further development and testing, and can be put to use in NLP tasks such as (multilingual) information retrieval.