205 research outputs found

Extending, trimming and fusing WordNet for technical documents

Author: Vossen P.
Publication venue: The Association for Computational Linguistics
Publication date: 01/01/2001

This paper describes a tool for the automatic extension and trimming of a multilingual WordNet database for cross-lingual retrieval and multilingual ontology building in intranets and domain-specific document collections. Hierarchies built from automatically extracted terms and combined with the WordNet relations are trimmed with a disambiguation method based on the document salience of the words in the glosses. The disambiguation is tested in a cross-lingual retrieval task and yields a considerable improvement (7% to 11%). The condensed hierarchies can be used as browsing interfaces to the documents, complementary to retrieval.
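The idea of trimming by "document salience of the words in the glosses" can be illustrated with a minimal sketch. It assumes salience is simply log(1 + document frequency) of a word in the domain collection and keeps the sense whose gloss words are most salient; the function names, the salience formula and the sense labels are all hypothetical and are not taken from the paper.

```python
import math
import re
from collections import Counter


def tokens(text):
    # Lowercased alphabetic tokens; a deliberately crude tokenizer.
    return re.findall(r"[a-z]+", text.lower())


def salience(docs):
    # Assumed salience measure: log(1 + number of documents containing the word),
    # so words that recur across the domain collection score higher.
    df = Counter()
    for doc in docs:
        df.update(set(tokens(doc)))
    return {word: math.log(1 + n) for word, n in df.items()}


def pick_sense(glosses, sal):
    # glosses maps a sense label to its gloss; the sense whose distinct gloss
    # words carry the most salience in the collection is kept, the rest trimmed.
    def score(gloss):
        return sum(sal.get(w, 0.0) for w in set(tokens(gloss)))
    return max(glosses, key=lambda sense: score(glosses[sense]))


docs = [
    "The bank approved the loan.",
    "Interest rates at the bank rose.",
    "The loan interest was low.",
]
glosses = {  # hypothetical sense labels and glosses
    "bank_river": "sloping land beside a body of water",
    "bank_finance": "financial institution that accepts deposits and lends money as a loan",
}
print(pick_sense(glosses, salience(docs)))  # bank_finance
```

Summing over distinct gloss words favours longer glosses; averaging instead would be an equally plausible choice for a sketch like this.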

EuroWordNet: final report

Author: Vossen P.J.T.M.
Publication venue: Vrije Universiteit Amsterdam Faculty of Law
Publication date: 01/01/1999

Meaningful results for Information Retrieval in the MEANING project

Authors: Agirre E., Alegria I., Farwell D., Fuentes M., Rigau G., Vossen P.T.J.M.
Publication date: 01/01/2006

The goal of the MEANING project (IST-2001-34460) is to develop tools for the automatic acquisition of lexical knowledge that will help Word Sense Disambiguation (WSD). The lexical knowledge acquired from various sources and languages is stored in the Multilingual Central Repository (MCR) (Atserias et al., 2004), which is based on the design of the EuroWordNet database. The MCR holds wordnets in several languages (English, Spanish, Italian, Catalan and Basque), which are interconnected via an Inter-Lingual-Index (ILI). In addition, the MCR holds a number of ontologies and domain labels related to al…
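How wordnets in different languages can be "interconnected via an Inter-Lingual-Index" is easiest to see as a small data structure. The sketch below is a toy model, not the MCR's actual schema: it assumes each language-specific synset links to exactly one ILI record (real equivalence links can be richer), and every class name, synset identifier and ILI identifier is hypothetical.

```python
from collections import defaultdict


class InterLingualIndex:
    # Toy model: every language-specific synset is linked to exactly one
    # ILI record (a simplification; real equivalence links can be richer).
    def __init__(self):
        self.synset_to_ili = {}                   # (lang, synset_id) -> ili_id
        self.ili_to_synsets = defaultdict(set)    # ili_id -> {(lang, synset_id)}
        self.lemma_to_synsets = defaultdict(set)  # (lang, lemma) -> {synset_id}
        self.synset_lemmas = defaultdict(set)     # (lang, synset_id) -> {lemma}

    def add(self, lang, synset_id, ili_id, lemmas):
        key = (lang, synset_id)
        self.synset_to_ili[key] = ili_id
        self.ili_to_synsets[ili_id].add(key)
        for lemma in lemmas:
            self.lemma_to_synsets[(lang, lemma)].add(synset_id)
            self.synset_lemmas[key].add(lemma)

    def translate(self, lemma, src, tgt):
        # lemma -> source synsets -> shared ILI records -> target synsets -> lemmas
        result = set()
        for synset_id in self.lemma_to_synsets.get((src, lemma), ()):
            ili_id = self.synset_to_ili[(src, synset_id)]
            for lang, other_id in self.ili_to_synsets[ili_id]:
                if lang == tgt:
                    result |= self.synset_lemmas[(lang, other_id)]
        return result


ili = InterLingualIndex()
ili.add("en", "en-dog-1", "ili-001", ["dog", "domestic dog"])
ili.add("es", "es-perro-1", "ili-001", ["perro"])
ili.add("it", "it-cane-1", "ili-001", ["cane"])
print(sorted(ili.translate("dog", "en", "es")))  # ['perro']
```

Routing every language through the shared index means adding a new wordnet only requires linking its synsets to ILI records, rather than to each of the other wordnets.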

Automatic sense clustering in EuroWordNet

Authors: Peters I., Peters W., Vossen P.
Publication venue: Paris : ELRA
Publication date: 01/01/1998

This paper addresses ways in which we intend to reduce the fine-grainedness of WordNet and to express more systematically the relations between its numerous sense distinctions. In the EuroWordNet project, we have distinguished various automatic methods for grouping senses into coarser-grained sense groups. The resulting clusters reflect aspects of lexical organization, displaying a variety of semantic regularities or generalizations. In this way, the compatibility of the language-specific wordnets in the EuroWordNet multilingual knowledge base is increased.
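As one illustrative criterion for "grouping senses into coarser-grained sense groups", the sketch below merges the senses of a single word whenever they share a direct hypernym, transitively, using union-find. This criterion is an assumption chosen for illustration and is not claimed to be one of the paper's methods; the sense and hypernym labels are hypothetical.

```python
def cluster_senses(hypernyms):
    # hypernyms maps each sense of one word to the set of its direct hypernyms.
    # Senses that share any hypernym are merged, transitively (union-find).
    parent = {sense: sense for sense in hypernyms}

    def find(sense):
        while parent[sense] != sense:
            parent[sense] = parent[parent[sense]]  # path halving
            sense = parent[sense]
        return sense

    owner = {}  # hypernym -> first sense seen under it
    for sense, hyps in hypernyms.items():
        for hyp in hyps:
            if hyp in owner:
                parent[find(sense)] = find(owner[hyp])
            else:
                owner[hyp] = sense

    clusters = {}
    for sense in hypernyms:
        clusters.setdefault(find(sense), []).append(sense)
    return sorted(sorted(group) for group in clusters.values())


senses = {  # hypothetical sense and hypernym labels
    "s1": {"publisher"},
    "s2": {"publisher", "organization"},
    "s3": {"paper"},
    "s4": {"paper", "material"},
    "s5": {"organization"},
}
print(cluster_senses(senses))  # [['s1', 's2', 's5'], ['s3', 's4']]
```

Union-find makes the merging transitive: s5 joins s1 only through s2, with which it shares "organization".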

Enriching Ontologies with Multilingual Information

Authors: Espinoza M., Gómez-Pérez A., Mena E.
Publication venue: Facultad de Informática (UPM)
Publication date: 01/06/2008

Organizations working in a multilingual environment demand multilingual ontologies. To meet this need, we propose LabelTranslator, a system that automatically localizes ontologies. Ontology localization consists of adapting an ontology to a particular language and cultural community. LabelTranslator takes as input an ontology whose labels are described in a source natural language and obtains the most probable translation of each ontology label into a target natural language. Our main contribution is the automation of this process, which reduces the human effort of localizing an ontology manually. First, our system uses a translation service that obtains automatic translations of each ontology label (the name of an ontology term) from or into English, German, or Spanish by consulting different linguistic resources such as lexical databases, bilingual dictionaries, and terminologies. Second, a ranking method sorts the translations of each ontology label according to their similarity with the label's lexical and semantic context. The experiments performed to evaluate the quality of the translations show that our approach is a good approximation to automatically enriching an ontology with multilingual information.
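The ranking step, which orders candidate translations by their similarity to a label's context, can be sketched as follows. This assumes Jaccard overlap between word sets as the similarity measure, swapped in for illustration; LabelTranslator's actual measure is not described here. The candidate words and the context are hypothetical.

```python
def jaccard(a, b):
    # Overlap of two word sets; 0.0 when both are empty.
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_translations(candidates, context):
    # candidates: translation -> words describing that sense (e.g. from a gloss).
    # context: target-language words from the label's neighbourhood in the ontology.
    # Returns translations best-first; ties keep their original order.
    return sorted(candidates, key=lambda t: -jaccard(candidates[t], context))


candidates = {  # hypothetical Spanish candidates for the English label "chair"
    "silla": {"mueble", "asiento", "respaldo"},
    "cátedra": {"profesor", "universidad", "puesto"},
    "presidente": {"reunión", "comité", "dirigir"},
}
context = {"profesor", "universidad", "departamento"}
print(rank_translations(candidates, context))  # ['cátedra', 'silla', 'presidente']
```

In a university ontology, the context pulls "cátedra" (an academic chair) above the furniture and chairperson readings. Python's stable sort leaves tied candidates in their input order.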

Web 2.0, language resources and standards to automatically build a multilingual named entity lexicon

Authors: Ferrández Sergio, Monachini Monica, Muñoz Rafael, Toral Antonio
Publication venue: Springer Science and Business Media LLC
Publication date: 17/06/2011

This paper aims to advance the current state of the art in automatic Language Resource (LR) building by taking into consideration three elements: (i) the knowledge available in existing LRs, (ii) the vast amount of information available from the collaborative paradigm that has emerged from Web 2.0, and (iii) the use of standards to improve interoperability. We present a case study in which a set of LRs for different languages (WordNet for English and Spanish and Parole-Simple-Clips for Italian) is extended with Named Entities (NE) by exploiting Wikipedia and the aforementioned LRs. The practical result is a multilingual NE lexicon connected to these LRs and to two ontologies: SUMO and SIMPLE. Furthermore, the paper addresses an important problem that currently affects the Computational Linguistics area, interoperability, by using the ISO LMF standard to encode this lexicon. The different steps of the procedure (mapping, disambiguation, extraction, NE identification and postprocessing) are comprehensively explained and evaluated. The resulting resource contains 974,567, 137,583 and 125,806 NEs for English, Spanish and Italian, respectively. Finally, to check the usefulness of the constructed resource, we apply it to a state-of-the-art Question Answering system and evaluate its impact: the NE lexicon improves the system's accuracy by 28.1%. Compared to previous approaches to building NE repositories, the current proposal represents a step forward in terms of automation, language independence, number of NEs acquired and richness of the information represented.

MEANING-full effects in information retrieval

Authors: Glaser E., Gradinaru M., Steenwijk R. van, Vossen P., Zutphen H. van
Publication venue: Delft : Irion Technologies BV
Publication date: 01/01/2005

This deliverable reports on testing the use and effect of integrating the MEANING technology into Irion's TwentyOne search engine.