152 research outputs found

    Knowledge-rich Word Sense Disambiguation rivaling supervised systems

    One of the main obstacles to high-performance Word Sense Disambiguation (WSD) is the knowledge acquisition bottleneck. In this paper, we present a methodology to automatically extend WordNet with large amounts of semantic relations from an encyclopedic resource, namely Wikipedia. We show that, when provided with a vast amount of high-quality semantic relations, simple knowledge-lean disambiguation algorithms compete with state-of-the-art supervised WSD systems in a coarse-grained all-words setting and outperform them on gold-standard domain-specific datasets. © 2010 Association for Computational Linguistics
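
    A minimal sketch of the kind of knowledge-lean, relation-counting disambiguation that such an enriched WordNet enables: score each candidate sense by how many mined semantic relations connect it to senses of the context words. The relation inventory and sense-id scheme below are invented for illustration, not taken from the paper.

```python
from collections import defaultdict

# Hypothetical relation inventory: maps a sense id to the set of sense ids
# it is semantically related to (as would be mined from Wikipedia).
RELATED = defaultdict(set)
RELATED["bank#1"] = {"money#1", "deposit#2", "loan#1"}
RELATED["bank#2"] = {"river#1", "shore#1"}

def senses(word):
    """Hypothetical sense-inventory lookup (WordNet-style sense ids)."""
    return [s for s in RELATED if s.startswith(word + "#")] or [word + "#1"]

def disambiguate(target, context_words):
    """Pick the sense of `target` with the most relations into the context.

    A degree-style heuristic: each candidate sense scores one point per
    semantic relation that lands on any sense of a context word.
    """
    context_senses = {s for w in context_words for s in senses(w)}
    return max(senses(target), key=lambda s: len(RELATED[s] & context_senses))

print(disambiguate("bank", ["river", "shore"]))  # -> bank#2
```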

    WikiSense: Supersense Tagging of Wikipedia Named Entities Based on WordNet

    PACLIC 23 / City University of Hong Kong / 3-5 December 2009

    Extraction of disambiguated paraphrases from a corpus of automatically aligned encyclopedic articles

    We describe how to automatically import encyclopedic articles into WordNet. This process makes it possible to create new entries, attached to their appropriate hypernym; in addition, the pre-existing entries of WordNet can be enriched with complementary descriptions. Repeating this process on several encyclopedias makes it possible to build a corpus of comparable articles, from which paraphrases can be automatically extracted using the pairs of articles thus created. Finally, the paraphrase components can be disambiguated by means of a similarity measure that uses the WordNet verb hierarchy
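
    One way to realize such a verb-hierarchy similarity measure is with NLTK's WordNet interface; the sketch below is an illustration of the idea, not the authors' implementation.

```python
from nltk.corpus import wordnet as wn  # requires: nltk.download('wordnet')

def best_verb_senses(verb_a, verb_b):
    """Pick the sense pair of two paraphrased verbs that maximizes
    path similarity in the WordNet verb hierarchy.

    Intuition: if "compose" paraphrases "write", the senses chosen for
    each verb should be the ones closest in the hypernym graph.
    """
    best, best_pair = 0.0, None
    for sa in wn.synsets(verb_a, pos=wn.VERB):
        for sb in wn.synsets(verb_b, pos=wn.VERB):
            sim = sa.path_similarity(sb) or 0.0  # None if no path exists
            if sim > best:
                best, best_pair = sim, (sa, sb)
    return best_pair, best

print(best_verb_senses("compose", "write"))
```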

    Using a Bilingual Resource to Add Synonyms to a Wordnet : FinnWordNet and Wikipedia as an Example

    This paper presents a simple method for finding new synonym candidates for a bilingual wordnet by using another bilingual resource. Our goal is to add new synonyms to the existing synsets of the Finnish WordNet, which has direct word-sense translation correspondences to the Princeton WordNet. For this task, we use Wikipedia and its links between articles on the same topic in Finnish and English
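
    A toy sketch of the candidate-finding step, with hypothetical stand-ins for the Wikipedia interlanguage links and for the FinnWordNet-to-Princeton correspondence (here reduced to an English-lemma-to-synset index):

```python
# Hypothetical inputs: interlanguage links (Finnish title -> English title)
# as they could be dumped from Wikipedia, and an English-lemma -> synset-id
# index standing in for the FinnWordNet/Princeton WordNet correspondence.
FI_EN_LANGLINKS = {
    "tietokone": "Computer",
    "kannettava tietokone": "Laptop",
}
EN_LEMMA_TO_SYNSETS = {
    "computer": ["computer.n.01"],
    "laptop": ["laptop.n.01"],
}

def synonym_candidates(langlinks, lemma_index):
    """Propose Finnish Wikipedia titles as synonym candidates for the
    synsets that contain the linked English title as a lemma."""
    candidates = []
    for fi_title, en_title in langlinks.items():
        for synset_id in lemma_index.get(en_title.lower(), []):
            candidates.append((fi_title, synset_id))
    return candidates

print(synonym_candidates(FI_EN_LANGLINKS, EN_LEMMA_TO_SYNSETS))
# [('tietokone', 'computer.n.01'), ('kannettava tietokone', 'laptop.n.01')]
```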

    An effective, low-cost measure of semantic relatedness obtained from Wikipedia links

    This paper describes a new technique for obtaining measures of semantic relatedness. Like other recent approaches, it uses Wikipedia to provide structured world knowledge about the terms of interest. Our approach is unique in that it does so using the hyperlink structure of Wikipedia rather than its category hierarchy or textual content. Evaluation against manually defined measures of semantic relatedness reveals this to be an effective compromise between the ease of computation of the former approach and the accuracy of the latter.
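
    The published measure adapts Normalized Google Distance to Wikipedia's inlink structure. The sketch below follows that formulation; the set-valued inputs and precomputed article count are assumptions of this illustration.

```python
from math import log

def link_distance(inlinks_a, inlinks_b, total_articles):
    """Normalized-Google-Distance-style relatedness over Wikipedia inlinks.

    `inlinks_a` / `inlinks_b` are the sets of article ids that link to each
    term's article; `total_articles` is the size of the Wikipedia snapshot.
    Returns a distance: 0 for identical link profiles, larger = less related.
    """
    common = inlinks_a & inlinks_b
    if not common:
        return float("inf")  # no shared linking context at all
    big = max(len(inlinks_a), len(inlinks_b))
    small = min(len(inlinks_a), len(inlinks_b))
    return (log(big) - log(len(common))) / (log(total_articles) - log(small))

# Tiny worked example with made-up inlink sets over a 1000-article wiki.
print(link_distance({1, 2, 3, 4}, {3, 4, 5}, 1000))
```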

    Onto.PT: Automatic Construction of a Lexical Ontology for Portuguese

    This ongoing research presents an alternative to the manual creation of lexical resources and proposes an approach towards the automatic construction of a lexical ontology for Portuguese. Textual sources are exploited in order to obtain a lexical network based on terms and, after clustering and mapping, a wordnet-like lexical ontology is created. At the end of the paper, current results are shown.

    Mining Domain-Specific Thesauri from Wikipedia: A case study

    Domain-specific thesauri are high-cost, high-maintenance, high-value knowledge structures. We show how the classic thesaurus structure of terms and links can be mined automatically from Wikipedia. In a comparison with a professional thesaurus for agriculture, we find that Wikipedia contains a substantial proportion of its concepts and semantic relations; furthermore, it has impressive coverage of contemporary documents in the domain. Thesauri derived using our techniques capitalize on existing public efforts and tend to reflect contemporary language usage better than their costly, painstakingly constructed manual counterparts.

    Automatic Extension of WOLF

    In this paper we present the extension of WOLF, a freely available, automatically created wordnet for French. Its biggest drawback has until now been the lack of general concepts, which are typically expressed with highly polysemous vocabulary that is, on the one hand, the most valuable for applications in human language technologies and, on the other, the most difficult to add to a wordnet accurately with automatic methods. Using a set of features, we train a Maximum Entropy classifier on the existing core wordnet so that it can assign appropriate synset ids to new words extracted from multiple multilingual sources of lexical knowledge, such as Wiktionaries, Wikipedias and corpora. Automatic and manual evaluation shows high coverage as well as high quality of the resulting lexico-semantic repository. Another important advantage of the approach is that it is fully automatic and language-independent, and could therefore be applied to any other language still lacking a wordnet
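
    A schematic sketch of the classification step, using scikit-learn's logistic regression as a standard maximum-entropy model; the feature names and synset ids below are toy stand-ins, not the paper's actual feature set.

```python
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression

# Toy training data standing in for the core wordnet: each candidate word is
# a feature dict (e.g. translations, part of speech) labeled with a synset id.
train_features = [
    {"translation:en=dog": 1, "pos=NOUN": 1},
    {"translation:en=hound": 1, "pos=NOUN": 1},
    {"translation:en=run": 1, "pos=VERB": 1},
]
train_labels = ["dog.n.01", "dog.n.01", "run.v.01"]

vec = DictVectorizer()
X = vec.fit_transform(train_features)

# Multinomial logistic regression == maximum-entropy classification.
clf = LogisticRegression(max_iter=1000).fit(X, train_labels)

# Unseen feature keys are silently dropped by DictVectorizer.transform,
# so prediction falls back on the shared pos=NOUN evidence here.
new_word = {"translation:en=puppy": 1, "pos=NOUN": 1}
print(clf.predict(vec.transform([new_word]))[0])  # likely 'dog.n.01'
```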

    Automatising the learning of lexical patterns: An application to the enrichment of WordNet by extracting semantic relationships from Wikipedia

    This is the author's version of a work that was accepted for publication in the journal Data & Knowledge Engineering. Changes resulting from the publishing process, such as peer review, editing, corrections, structural formatting, and other quality control mechanisms, may not be reflected in this document. Changes may have been made to this work since it was submitted for publication. A definitive version was subsequently published in Data & Knowledge Engineering, 61(3), 2007, DOI: 10.1016/j.datak.2006.06.011.

    This paper describes an automatic approach to identify lexical patterns that represent semantic relationships between concepts in an on-line encyclopedia. Next, these patterns can be applied to extend existing ontologies or semantic networks with new relations. The experiments have been performed with the Simple English Wikipedia and WordNet 1.7. A new algorithm has been devised for automatically generalising the lexical patterns found in the encyclopedia entries. We have found general patterns for the hyperonymy, hyponymy, holonymy and meronymy relations and, using them, we have extracted more than 2600 new relationships that did not appear in WordNet originally. The precision of these relationships depends on the degree of generality chosen for the patterns and the type of relation, being around 60-70% for the best combinations proposed.

    This work has been sponsored by MEC, project number TIN-2005-0688.
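
    A toy sketch of the pattern-extraction step that precedes generalisation: find sentences mentioning a known related pair, replace the two terms with slots, and keep the span between them as a candidate pattern. The sentences, relation pairs and slot names are invented for illustration.

```python
import re

# Toy encyclopedia sentences paired with known (hyponym, hypernym)
# relations, as in a WordNet-seeded training setup.
EXAMPLES = [
    ("A dog is a domesticated mammal.", "dog", "mammal"),
    ("A violin is a bowed string instrument.", "violin", "instrument"),
]

def extract_pattern(sentence, term, related):
    """Replace the two related terms by slots and keep the context
    between them as a candidate lexical pattern."""
    pat = re.sub(rf"\b{re.escape(term)}\b", "<TARGET>", sentence)
    pat = re.sub(rf"\b{re.escape(related)}\b", "<RELATED>", pat)
    start = pat.find("<TARGET>")
    end = pat.find("<RELATED>") + len("<RELATED>")
    return pat[start:end]

for sent, hypo, hyper in EXAMPLES:
    print(extract_pattern(sent, hypo, hyper))
# <TARGET> is a domesticated <RELATED>
# <TARGET> is a bowed string <RELATED>
```

    The generalisation stage described in the abstract would then merge near-identical candidate patterns (for instance, by aligning them and abstracting the positions where they differ), trading precision against coverage, which is why the reported precision varies with the degree of generality chosen.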