
257 research outputs found

Towards a Universal Wordnet by Learning from Combined Evidence

Author: de Melo G.
Weikum G.
Publication venue: Max-Planck-Institut für Informatik
Publication date: 01/01/2009

Lexical databases are invaluable sources of knowledge about words and their meanings, with numerous applications in areas like NLP, IR, and AI. We propose a methodology for the automatic construction of a large-scale multilingual lexical database where words of many languages are hierarchically organized in terms of their meanings and their semantic relations to other words. This resource is bootstrapped from WordNet, a well-known English-language resource. Our approach extends WordNet with around 1.5 million meaning links for 800,000 words in over 200 languages, drawing on evidence extracted from a variety of resources including existing (monolingual) wordnets, (mostly bilingual) translation dictionaries, and parallel corpora. Graph-based scoring functions and statistical learning techniques are used to iteratively integrate this information and build an output graph. Experiments show that this wordnet has a high level of precision and coverage, and that it can be useful in applied tasks such as cross-lingual text classification.

MPG.PuRe

Examining the validity of cross-lingual word sense disambiguation

Author: Hoste Veronique
Lefever Els
Publication date: 01/01/2011

Ghent University Academic Bibliography

Data-driven Synset Induction and Disambiguation for Wordnet Development

Author: Apidianaki Marianna
Sagot Benoît
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/11/2014

Automatic methods for wordnet development in languages other than English generally exploit information found in Princeton WordNet (PWN) and translations extracted from parallel corpora. A common approach consists in preserving the structure of PWN and transferring its content in new languages using alignments, possibly combined with information extracted from multilingual semantic resources. Even if the role of PWN remains central in this process, these automatic methods offer an alternative to the manual elaboration of new wordnets. However, their limited coverage has a strong impact on that of the resulting resources. Following this line of research, we apply a cross-lingual word sense disambiguation method to wordnet development. Our approach exploits the output of a data-driven sense induction method that generates sense clusters in new languages, similar to wordnet synsets, by identifying word senses and relations in parallel corpora. We apply our cross-lingual word sense disambiguation method to the task of enriching a French wordnet resource, the WOLF, and show how it can be efficiently used for increasing its coverage. Although our experiments involve the English-French language pair, the proposed methodology is general enough to be applied to the development of wordnet resources in other languages for which parallel corpora are available. Finally, we show how the disambiguation output can serve to reduce the granularity of new wordnets and the degree of polysemy present in PWN.

Crossref

INRIA a CCSD electronic archive server

Hal-Diderot

Extending the Galician Wordnet Using a Multilingual Bible Through Lexical Alignment and Semantic Annotation

Publication venue: OASIcs - OpenAccess Series in Informatics. 7th Symposium on Languages, Applications and Technologies (SLATE 2018)
Publication date: 01/01/2018

In this paper we describe the methodology and evaluation of the expansion of Galnet - the Galician wordnet - using a multilingual Bible through lexical alignment and semantic annotation. For this experiment we used the Galician, Portuguese, Spanish, Catalan and English versions of the Bible. They were annotated with part-of-speech and WordNet sense using FreeLing. The resulting synsets were aligned, and new variants for the Galician language were extracted. After manual evaluation the approach presented a 96.8% accuracy.

Dagstuhl Research Online Publication Server

An overview of Portuguese wordnets

Author: Freitas Cláudia
Oliveira Hugo Gonçalo
Paiva Valeria de
Rademaker Alexandre
Real Livy
Simões Alberto
Publication venue: Global Wordnet Association
Publication date: 01/01/2016

Semantic relations between words are key to building systems that aim to understand and manipulate language. For English, the “de facto” standard for representing this kind of knowledge is Princeton’s WordNet. Here, we describe the wordnet-like resources currently available for Portuguese: their origins, methods of creation, sizes, and usage restrictions. We start tackling the problem of comparing them, but only in quantitative terms. Finally, we sketch ideas for potential collaboration between some of the projects.

Universidade do Minho: RepositoriUM

EuroWordNet as a multilingual database

Author: Vossen P.J.T.M.
Publication venue: Amsterdam: Vrije Universiteit
Publication date: 01/01/1999

VU Research Portal

Chinese WordNet Domains: Bootstrapping Chinese WordNet with Semantic Domain Labels

Author: Huang Chu-Ren
Lee Lung-Hao
Yu Yu-Ting
Publication venue: City University of Hong Kong
Publication date: 01/01/2009

PACLIC 23 / City University of Hong Kong / 3-5 December 2009

The Hong Kong Polytechnic University Pao Yue-kong Library

Waseda University Repository

Representing the Translation Relation in a Bilingual Wordnet

Author: Linden Krister
Niemi Jyrki
Publication venue: European Language Resources Association (ELRA)
Publication date: 23/05/2012

Proceeding volume: 8

This paper describes representing translations in the Finnish wordnet, FinnWordNet (FiWN), and constructing the FiWN database. FiWN was created by translating all the word senses of the Princeton WordNet (PWN) into Finnish and by joining the translations with the semantic and lexical relations of PWN extracted into a relational (database) format. The approach naturally resulted in a translation relation between PWN and FiWN. Unlike many other multilingual wordnets, the translation relation in FiWN is primarily not on the level of synsets, but on the level of an individual word sense, which allows more precise translation correspondences. This can easily be projected into a synset-level translation relation, used for linking with other wordnets via Core WordNet. Synset-level translations are also used as a default in the absence of word sense translations. The FiWN data in the relational database can be converted to other formats. In the PWN database format, translations are attached to source-language words, allowing the implementation of a Web search interface also working as a bilingual dictionary. Another representation encodes the translation relation as a finite-state transducer.

Peer reviewed

Helsingin yliopiston digitaalinen arkisto

Introducing the Arabic WordNet project

Author: Alkhalifa M.
Black W.
Fellbaum C.
Vossen P.J.T.M.
Publication venue: Amsterdam: Vrije Universiteit
Publication date: 01/01/2006

VU Research Portal