4,439 research outputs found
Highlighting relevant concepts from Topic Signatures
This paper presents deepKnowNet, a new fully automatic method for building highly dense and accurate knowledge bases from existing
semantic resources. Basically, the method applies a knowledge-based Word Sense Disambiguation algorithm to assign the most appropriate WordNet sense to large sets of topically related words acquired from the web, named TSWEB. This Word Sense Disambiguation algorithm is the personalized PageRank algorithm implemented in UKB. This new method improves by automatic means the current content of WordNet by creating large volumes of new and accurate semantic relations between synsets. KnowNet was our first attempt towards the acquisition of large volumes of semantic relations. However, KnowNet had some limitations that have been overcomed with deepKnowNet. deepKnowNet disambiguates the first hundred words of all Topic Signatures from the web (TSWEB). In this case, the method highlights the most relevant word senses of each Topic Signature and filter out the ones that are not so related to the topic. In fact,
the knowledge it contains outperforms any other resource when is empirically evaluated in a common framework based on a similarity
task annotated with human judgementsPostprint (published version
Recommended from our members
Lexical Chaining and Word-Sense-Disambiguation
Lexical chains algorithms attempt to find sequences of words in a document that are closely related semantically. Such chains have been argued to provide a good indication of the topics covered by the document without requiring a deeper analysis of the text, and have been proposed for many NLP tasks. Different underlying lexical semantic relations based on WordNet have been used for this task. Since links in WordNet connect synsets rather than words, open word-sense disambiguation becomes a necessary part of any chaining algorithm, even if the intended application is not disambiguation. Previous chaining algorithms have combined the tasks of disambiguation and chaining by choosing those word senses that maximize chain connectivity, a strategy which yields poor disambiguation accuracy in practice.
We present a novel probabilistic algorithm for finding lexical chains. Our algorithm explicitly balances the requirements of maximizing chain connectivity with the choice of probable word-senses. The algorithm achieves better disambiguation results than all previous ones, but under its optimal settings shifts this balance totally in favor of probable senses, essentially ignoring the chains. This model points to an inherent conflict between chaining and word-sensedisambiguation. By establishing an upper bound on the disambiguation potential of lexical chains, we show that chaining is theoretically highly unlikely to achieve accurate disambiguation.
Moreover, by defining a novel intrinsic evaluation criterion for lexical chains, we show that poor disambiguation accuracy also implies poor chain accuracy. Our results have crucial implications for chaining algorithms. At the very least, they show that disentangling disambiguation from chaining significantly improves chaining accuracy. The hardness of all-words disambiguation, however, implies that finding accurate lexical chains is harder than suggested by the literature.Engineering and Applied Science
Word Sense Disambiguation Based on Large Scale Polish CLARIN Heterogeneous Lexical Resources
Word Sense Disambiguation Based on Large Scale Polish CLARIN Heterogeneous Lexical Resources
Lexical resources can be applied in many different Natural Language Engineering tasks, but the most fundamental task is the recognition of word senses used in text contexts. The problem is difficult, not yet fully solved and different lexical resources provided varied support for it. Polish CLARIN lexical semantic resources are based on the plWordNet — a very large wordnet for Polish — as a central structure which is a basis for linking together several resources of different types. In this paper, several Word Sense Disambiguation (henceforth WSD) methods developed for Polish that utilise plWordNet are discussed. Textual sense descriptions in the traditional lexicon can be compared with text contexts using Lesk’s algorithm in order to find best matching senses. In the case of a wordnet, lexico-semantic relations provide the main description of word senses. Thus, first, we adapted and applied to Polish a WSD method based on the Page Rank. According to it, text words are mapped on their senses in the plWordNet graph and Page Rank algorithm is run to find senses with the highest scores. The method presents results lower but comparable to those reported for English. The error analysis showed that the main problems are: fine grained sense distinctions in plWordNet and limited number of connections between words of different parts of speech. In the second approach plWordNet expanded with the mapping onto the SUMO ontology concepts was used. Two scenarios for WSD were investigated: two step disambiguation and disambiguation based on combined networks of plWordNet and SUMO. In the former scenario, words are first assigned SUMO concepts and next plWordNet senses are disambiguated. In latter, plWordNet and SUMO are combined in one large network used next for the disambiguation of senses. The additional knowledge sources used in WSD improved the performance. The obtained results and potential further lines of developments were discussed
Recommended from our members
Lexical Chaining and Word-Sense-Disambiguation
Lexical chains algorithms attempt to find sequences of words in a document that are closely related semantically. Such chains have been argued to provide a good indication of the topics covered by the document without requiring a deeper analysis of the text, and have been proposed for many NLP tasks. Different underlying lexical semantic relations based on WordNet have been used for this task. Since links in WordNet connect synsets rather than words, open word-sense disambiguation becomes a necessary part of any chaining algorithm, even if the intended application is not disambiguation. Previous chaining algorithms have combined the tasks of disambiguation and chaining by choosing those word senses that maximize chain connectivity, a strategy which yields poor disambiguation accuracy in practice. We present a novel probabilistic algorithm for finding lexical chains. Our algorithm explicitly balances the requirements of maximizing chain connectivity with the choice of probable word-senses. The algorithm achieves better disambiguation results than all previous ones, but under its optimal settings shifts this balance totally in favor of probable senses, essentially ignoring the chains. This model points to an inherent conflict between chaining and word-sense-disambiguation. By establishing an upper bound on the disambiguation potential of lexical chains, we show that chaining is theoretically highly unlikely to achieve accurate disambiguation. Moreover, by defining a novel intrinsic evaluation criterion for lexical chains, we show that poor disambiguation accuracy also implies poor chain accuracy. Our results have crucial implications for chaining algorithms. At the very least, they show that disentangling disambiguation from chaining significantly improves chaining accuracy. The hardness of all-words disambiguation, however, implies that finding accurate lexical chains is harder than suggested by the literature.Engineering and Applied Science
Enriching very large ontologies using the WWW
This paper explores the possibility to exploit text on the world wide web in
order to enrich the concepts in existing ontologies. First, a method to
retrieve documents from the WWW related to a concept is described. These
document collections are used 1) to construct topic signatures (lists of
topically related words) for each concept in WordNet, and 2) to build
hierarchical clusters of the concepts (the word senses) that lexicalize a given
word. The overall goal is to overcome two shortcomings of WordNet: the lack of
topical links among concepts, and the proliferation of senses. Topic signatures
are validated on a word sense disambiguation task with good results, which are
improved when the hierarchical clusters are used.Comment: 6 page
- …