60 research outputs found
ShotgunWSD: An unsupervised algorithm for global word sense disambiguation inspired by DNA sequencing
In this paper, we present a novel unsupervised algorithm for word sense
disambiguation (WSD) at the document level. Our algorithm is inspired by a
widely-used approach in the field of genetics for whole genome sequencing,
known as the Shotgun sequencing technique. The proposed WSD algorithm is based
on three main steps. First, a brute-force WSD algorithm is applied to short
context windows (up to 10 words) selected from the document in order to
generate a short list of likely sense configurations for each window. In the
second step, these local sense configurations are assembled into longer
composite configurations based on suffix and prefix matching. The resulted
configurations are ranked by their length, and the sense of each word is chosen
based on a voting scheme that considers only the top k configurations in which
the word appears. We compare our algorithm with other state-of-the-art
unsupervised WSD algorithms and demonstrate better performance, sometimes by a
very large margin. We also show that our algorithm can yield better performance
than the Most Common Sense (MCS) baseline on one data set. Moreover, our
algorithm has a very small number of parameters, is robust to parameter tuning,
and, unlike other bio-inspired methods, it gives a deterministic solution (it
does not involve random choices).Comment: In Proceedings of EACL 201
D-Bees: A Novel Method Inspired by Bee Colony Optimization for Solving Word Sense Disambiguation
Word sense disambiguation (WSD) is a problem in the field of computational
linguistics given as finding the intended sense of a word (or a set of words)
when it is activated within a certain context. WSD was recently addressed as a
combinatorial optimization problem in which the goal is to find a sequence of
senses that maximize the semantic relatedness among the target words. In this
article, a novel algorithm for solving the WSD problem called D-Bees is
proposed which is inspired by bee colony optimization (BCO)where artificial bee
agents collaborate to solve the problem. The D-Bees algorithm is evaluated on a
standard dataset (SemEval 2007 coarse-grained English all-words task corpus)and
is compared to simulated annealing, genetic algorithms, and two ant colony
optimization techniques (ACO). It will be observed that the BCO and ACO
approaches are on par
Entity Linking meets Word Sense Disambiguation: A Unified Approach
Entity Linking (EL) and Word Sense Disambiguation (WSD) both address the lexical ambiguity
of language. But while the two tasks are pretty similar, they differ in a fundamental respect: in EL the textual mention can be linked to a named entity which may or may not contain the exact mention, while in WSD there is a perfect match between the word form (better, its lemma) and a suitable word sense. In this paper we present Babelfy, a unified graph-based approach to EL and WSD based on a loose identification of candidate meanings coupled with a densest subgraph heuristic which selects high-coherence semantic interpretations. Our experiments show state-ofthe-art performances on both tasks on 6 different datasets, including a multilingual setting. Babelfy is online at http://babelfy.orgEntity Linking (EL) and Word Sense Disambiguation (WSD) both address the lexical ambiguity
of language. But while the two tasks are pretty similar, they differ in a fundamental respect: in EL the textual mention can be linked to a named entity which may or may not contain the exact mention, while in WSD there is a perfect match between the word form (better, its lemma) and a suitable word sense. In this paper we present Babelfy, a unified graph-based approach to EL and WSD based on a loose identification of candidate meanings coupled with a densest subgraph heuristic which selects high-coherence semantic interpretations. Our experiments show state-ofthe-art performances on both tasks on 6 different datasets, including a multilingual setting. Babelfy is online at http://babelfy.or
Gujarati Word Sense Disambiguation using Genetic Algorithm
Genetic algorithms (GAs) have widely been investigated to solve hard optimization problems, including the word sense disambiguation (WSD). This problem asks to determine which sense of a polysemous word is used in a given context. Several approaches have been investigated for WSD in English, French, German and some Indo-Aryan languages like Hindi, Marathi, Malayalam, etc. however, research on WSD in Guajarati Language is relatively limited. In this paper, an approach for Guajarati WSD using Genetic algorithm has been proposed which uses Knowledge based approach where Indo-Aryan WordNet for Guajarati is used as lexical database for WSD
Extension lexicale de définitions grâce à des corpus annotés en sens
International audienceLexical Expansion of definitions based on sense-annotated corpus For many natural language processing tasks and applications, it is necessary to determine the semantic relatedness between senses, words or text segments. In this article, we focus on a knowledge-based measure, the Lesk measure, which is certainly among the most commonly used. The similarity between two senses is computed as the number of overlapping words in the definitions of the senses from a dictionary. In this article, we study the expansion of definitions through the use of sense-annotated corpora. The idea is to take into account words that are most frequently used around a particular sense and to use the top of the frequency distribution to extend the corresponding definition. We show better performances on a Word Sense Disambiguation task surpassing state-of-the-artPour un certain nombre de tâches ou d'applications du TALN, il est nécessaire de déterminer la proximité sémantique entre des sens, des mots ou des segments textuels. Dans cet article, nous nous intéressons à une mesure basée sur des savoirs, la mesure de Lesk. La proximité sémantique de deux définitions est évaluée en comptant le nombre de mots communs dans les définitions correspondantes dans un dictionnaire. Dans cet article, nous étudions plus particulièrement l'extension de définitions grâce à des corpus annotés en sens. Il s'agit de prendre en compte les mots qui sont utilisés dans le voisinage d'un certain sens et d'étendre lexicalement la définition correspondante. Nous montrons une amélioration certaine des performances obtenues en désambiguïsation lexicale qui dépassent l'état de l'art
A Game-Theoretic Approach to Word Sense Disambiguation
This article presents a new model for word sense disambiguation formulated in terms of evolutionary game theory, where each word to be disambiguated is represented as a node on a graph whose edges represent word relations and senses are represented as classes. The words simultaneously update their class membership preferences according to the senses that neighboring words are likely to choose. We use distributional information to weigh the influence that each word has on the decisions of the others and semantic similarity information to measure the strength of compatibility among the choices. With this information we can formulate the word sense disambiguation problem as a constraint satisfaction problem and solve it using tools derived from game theory, maintaining the textual coherence. The model is based on two ideas: Similar words should be assigned to similar classes and the meaning of a word does not depend on all the words in a text but just on some of them. The article provides an in-depth motivation of the idea of modeling the word sense disambiguation problem in terms of game theory, which is illustrated by an example. The conclusion presents an extensive analysis on the combination of similarity measures to use in the framework and a comparison with state-of-the-art systems. The results show that our model outperforms state-of-the-art algorithms and can be applied to different tasks and in different scenarios
Spreading semantic information by Word Sense Disambiguation
This paper presents an unsupervised approach to solve semantic ambiguity based on the integration of the Personalized PageRank algorithm with word-sense frequency information. Natural Language tasks such as Machine Translation or Recommender Systems are likely to be enriched by our approach, which includes semantic information that obtains the appropriate word-sense via support from two sources: a multidimensional network that includes a set of different resources (i.e. WordNet, WordNet Domains, WordNet Affect, SUMO and Semantic Classes); and the information provided by word-sense frequencies and word-sense collocation from the SemCor Corpus. Our series of results were analyzed and compared against the results of several renowned studies using SensEval-2, SensEval-3 and SemEval-2013 datasets. After conducting several experiments, our procedure produced the best results in the unsupervised procedure category taking SensEval campaigns rankings as reference.This research work has been partially funded by the University of Alicante, Generalitat Valenciana , Spanish Government, Ministerio de Educación, Cultura y Deporte and ASAP - Ayudas Fundación BBVA a equipos de investigación científica 2016(FUNDACIONBBVA2-16PREMIO) through the projects, TIN2015- 65100-R, TIN2015-65136-C2-2-R, PROMETEOII/2014/001, GRE16- 01: “Plataforma inteligente para recuperación, análisis y representación de la información generada por usuarios en Internet” and PR16_SOC_0013
Recommended from our members
Automatic message annotation and semantic interface for context aware mobile computing
This thesis was submitted for the degree of Docter of Philosophy and awarded by Brunel University.In this thesis, the concept of mobile messaging awareness has been investigated by designing and implementing a framework which is able to annotate the short text messages with context ontology for semantic reasoning inference and classification purposes. The annotated metadata of text message keywords are identified and annotated with concepts, entities and knowledge that drawn from ontology without the need of learning process and the proposed framework supports semantic reasoning based messages awareness for categorization purposes. The first stage of the research is developing the framework of facilitating mobile communication with short text annotated messages (SAMS), which facilitates annotating short text message with part of speech tags augmented with an internal and external metadata. In the SAMS framework the annotation process is carried out automatically at the time of composing a message. The obtained metadata is collected from the device’s file system and the message header information which is then accumulated with the message’s tagged keywords to form an XML file, simultaneously. The significance of annotation process is to assist the proposed framework during the search and retrieval processes to identify the tagged keywords and The Semantic Web Technologies are utilised to improve the reasoning mechanism. Later, the proposed framework is further improved “Contextual Ontology based Short Text Messages reasoning (SOIM)”. SOIM further enhances the search capabilities of SAMS by adopting short text message annotation and semantic reasoning capabilities with domain ontology as Domain ontology is modeled into set of ontological knowledge modules that capture features of contextual entities and features of particular event or situation. Fundamentally, the framework SOIM relies on the hierarchical semantic distance to compute an approximated match degree of new set of relevant keywords to their corresponding abstract class in the domain ontology. Adopting contextual ontology leverages the framework performance to enhance the text comprehension and message categorization. Fuzzy Sets and Rough Sets theory have been integrated with SOIM to improve the inference capabilities and system efficiency. Since SOIM is based on the degree of similarity to choose the matched pattern to the message, the issue of choosing the best-retrieved pattern has arisen during the stage of decision-making. Fuzzy reasoning classifier based rules that adopt the Fuzzy Set theory for decision making have been applied on top of SOIM framework in order to increase the accuracy of the classification process with clearer decision. The issue of uncertainty in the system has been addressed by utilising the Rough Sets theory, in which the irrelevant and indecisive properties which affect the framework efficiency negatively have been ignored during the matching process.The Ministry of Higher Education and Scientific Research (IRAQ
- …