EuroWordNet: final report

Author: Vossen P.J.T.M.
Publication venue: Amsterdam: Vrije Universiteit
Publication date: 01/01/1999

VU Research Portal

EuroWordNet: final report

Author: Vossen P.J.T.M.
Publication venue: 'Vrije Universiteit Amsterdam Faculty of Law'
Publication date: 01/01/1999

VU Research Portal

Grouping Synonyms by Definitions

Author: Falk Ingrid
Gardent Claire
Jacquey Evelyne
Venant Fabienne
Publication date: 14/09/2009

We present a method for grouping the synonyms of a lemma according to its dictionary senses. The senses are defined by a large machine-readable dictionary for French, the TLFi (Trésor de la langue française informatisé), and the synonyms are given by 5 synonym dictionaries (also for French). To evaluate the proposed method, we manually constructed a gold standard where, for each (word, definition) pair and given the set of synonyms defined for that word by the 5 synonym dictionaries, 4 lexicographers specified the set of synonyms they judge adequate. While inter-annotator agreement on that task ranges from 67% to at best 88% depending on the annotator pair and on the synonym dictionary being considered, the automatic procedure we propose scores a precision of 67% and a recall of 71%. The proposed method is compared with related work, namely word sense disambiguation, synonym lexicon acquisition and WordNet construction.

arXiv.org e-Print Archive

INRIA a CCSD electronic archive server

Extending, trimming and fusing WordNet for technical documents

Author: Vossen P.
Publication venue: The Association for Computational Linguistics
Publication date: 01/01/2001

This paper describes a tool for the automatic extension and trimming of a multilingual WordNet database for cross-lingual retrieval and multilingual ontology building in intranets and domain-specific document collections. Hierarchies, built from automatically extracted terms and combined with the WordNet relations, are trimmed with a disambiguation method based on the document salience of the words in the glosses. The disambiguation is tested in a cross-lingual retrieval task, showing considerable improvement (7%-11%). The condensed hierarchies can be used as browse-interfaces to the documents complementary to retrieval

CiteSeerX

VU Research Portal

EuroWordNet as a multilingual database

Author: Vossen P.J.T.M.
Publication venue: Amsterdam: Vrije Universiteit
Publication date: 01/01/1999

VU Research Portal

Web 2.0, language resources and standards to automatically build a multilingual named entity lexicon

Author: Ferrández Sergio
Monachini Monica
Muñoz Rafael
Toral Antonio
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 17/06/2011

This paper proposes to advance the current state of the art of automatic Language Resource (LR) building by taking into consideration three elements: (i) the knowledge available in existing LRs, (ii) the vast amount of information available from the collaborative paradigm that has emerged from the Web 2.0 and (iii) the use of standards to improve interoperability. We present a case study in which a set of LRs for different languages (WordNet for English and Spanish and Parole-Simple-Clips for Italian) are extended with Named Entities (NE) by exploiting Wikipedia and the aforementioned LRs. The practical result is a multilingual NE lexicon connected to these LRs and to two ontologies: SUMO and SIMPLE. Furthermore, the paper addresses an important problem that currently affects the Computational Linguistics area, interoperability, by making use of the ISO LMF standard to encode this lexicon. The different steps of the procedure (mapping, disambiguation, extraction, NE identification and postprocessing) are comprehensively explained and evaluated. The resulting resource contains 974,567, 137,583 and 125,806 NEs for English, Spanish and Italian respectively. Finally, in order to check the usefulness of the constructed resource, we apply it to a state-of-the-art Question Answering system and evaluate its impact; the NE lexicon improves the system’s accuracy by 28.1%. Compared to previous approaches to build NE repositories, the current proposal represents a step forward in terms of automation, language independence, amount of NEs acquired and richness of the information represented.

DCU Online Research Access Service

Building a free French wordnet from multilingual resources

Author: Fišer Darja
Sagot Benoît
Publication venue: HAL CCSD
Publication date: 31/05/2008

This paper describes the automatic construction of a freely available wordnet for French (WOLF) based on Princeton WordNet (PWN) by using various multilingual resources. Polysemous words were handled with an approach in which a parallel corpus for five languages was word-aligned and the extracted multilingual lexicon was disambiguated with the existing wordnets for these languages. On the other hand, a bilingual approach sufficed to acquire equivalents for monosemous words. Bilingual lexicons were extracted from Wikipedia and thesauri. The results obtained from each resource were merged and ranked according to the number of resources yielding the same literal. Automatic evaluation of the merged wordnet was performed with the French WordNet (FREWN). Manual evaluation was also carried out on a sample of the generated synsets. The precision obtained shows that the presented approach is very promising, and applications using the created wordnet are already planned.

CiteSeerX

INRIA a CCSD electronic archive server

Hal-Diderot

Semantic knowledge in Question-Answering systems

Author: Barbier Vincent
Grau Brigitte
Ligozat Anne-Laure
Robba Isabelle
Vilnat Anne
Publication venue: HAL CCSD
Publication date: 01/08/2005

QA systems need semantic knowledge to find variations of the question terms in documents. They benefit from the use of knowledge resources such as synonym dictionaries or ontologies like WordNet. Our goal here is to study to what extent variations are needed and to determine what kinds of variations are useful or necessary for these systems. This study is based on different corpora in which we analyze semantic term variations, based on reference sets of possible variations.

Towards an environment for the production and the validation of lexical semantic resources

Author: Morardo Mikaël
Villemonte de La Clergerie Éric
Publication venue: HAL CCSD
Publication date: 26/05/2014

We present the components of a processing chain for the creation, visualization, and validation of lexical resources (formed of terms and relations between terms). The core of the chain is a component for building lexical networks relying on Harris' distributional hypothesis applied to the syntactic dependencies produced by the French parser FRMG on large corpora. Another important aspect concerns the use of an online interface for the visualization and collaborative validation of the resulting resources.

INRIA a CCSD electronic archive server

Hal-Diderot