4 research outputs found
Gujarati Word Sense Disambiguation using Genetic Algorithm
Genetic algorithms (GAs) have been widely investigated for solving hard optimization problems, including word sense disambiguation (WSD), the task of determining which sense of a polysemous word is used in a given context. Several approaches have been investigated for WSD in English, French, German, and some Indian languages such as Hindi, Marathi, and Malayalam; however, research on WSD in Gujarati is relatively limited. This paper proposes a knowledge-based approach to Gujarati WSD using a genetic algorithm, with the Indo-Aryan WordNet for Gujarati serving as the lexical database.
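To make the idea concrete, here is a minimal sketch of a genetic algorithm for knowledge-based WSD. It is not the paper's implementation: the abstract does not give the fitness function or operators, so the gloss-overlap fitness, truncation selection, one-point crossover, and the names `INVENTORY`, `fitness`, and `evolve` are all assumptions, and a tiny hypothetical English inventory stands in for the Gujarati WordNet.

```python
import random
from itertools import combinations

# Hypothetical toy inventory standing in for a WordNet: word -> {sense: gloss words}.
INVENTORY = {
    "bank": {"finance": {"money", "loan", "account"},
             "river": {"river", "water", "shore"}},
    "deposit": {"payment": {"money", "account", "payment"},
                "sediment": {"river", "mud", "water"}},
    "interest": {"finance": {"money", "loan", "payment"},
                 "curiosity": {"attention", "hobby"}},
}

def fitness(words, chromosome):
    """Assumed fitness: total gloss overlap between every pair of chosen senses."""
    glosses = [INVENTORY[w][s] for w, s in zip(words, chromosome)]
    return sum(len(a & b) for a, b in combinations(glosses, 2))

def random_chromosome(words, rng):
    """One gene per word: a sense picked at random from that word's inventory."""
    return [rng.choice(sorted(INVENTORY[w])) for w in words]

def evolve(words, generations=30, pop_size=12, mutation_rate=0.2, seed=0):
    """Return (best sense assignment, its fitness) after a fixed number of generations."""
    rng = random.Random(seed)  # seeded so that runs are repeatable
    population = [random_chromosome(words, rng) for _ in range(pop_size)]
    key = lambda c: fitness(words, c)
    for _ in range(generations):
        population.sort(key=key, reverse=True)
        next_gen = [population[0][:]]          # elitism: keep the best unchanged
        parents = population[: pop_size // 2]  # truncation selection
        while len(next_gen) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, len(words)) if len(words) > 1 else 0
            child = a[:cut] + b[cut:]          # one-point crossover
            for i, w in enumerate(words):      # mutation: resample a gene
                if rng.random() < mutation_rate:
                    child[i] = rng.choice(sorted(INVENTORY[w]))
            next_gen.append(child)
        population = next_gen
    best = max(population, key=key)
    return best, key(best)

if __name__ == "__main__":
    print(evolve(["bank", "deposit", "interest"]))
```

Each chromosome assigns one sense to every word in the context, so the search is over whole sense configurations rather than over words one at a time. Because the best individual is copied into each new generation, the best fitness never decreases from one generation to the next.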
ShotgunWSD: An unsupervised algorithm for global word sense disambiguation inspired by DNA sequencing
In this paper, we present a novel unsupervised algorithm for word sense
disambiguation (WSD) at the document level. Our algorithm is inspired by a
widely-used approach in the field of genetics for whole genome sequencing,
known as the Shotgun sequencing technique. The proposed WSD algorithm is based
on three main steps. First, a brute-force WSD algorithm is applied to short
context windows (up to 10 words) selected from the document in order to
generate a short list of likely sense configurations for each window. In the
second step, these local sense configurations are assembled into longer
composite configurations based on suffix and prefix matching. The resulting
configurations are ranked by their length, and the sense of each word is chosen
based on a voting scheme that considers only the top k configurations in which
the word appears. We compare our algorithm with other state-of-the-art
unsupervised WSD algorithms and demonstrate better performance, sometimes by a
very large margin. We also show that our algorithm can yield better performance
than the Most Common Sense (MCS) baseline on one data set. Moreover, our
algorithm has a very small number of parameters, is robust to parameter tuning,
and, unlike other bio-inspired methods, it gives a deterministic solution (it
does not involve random choices). Comment: In Proceedings of EACL 201
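The three steps described in this abstract can be sketched as follows. This is a toy illustration under stated assumptions, not the authors' implementation: the gloss-overlap score, the tie-breaking rules, the repeated merging loop, and the names `GLOSS`, `SENSES`, `local_configs`, `assemble`, and `disambiguate` are all hypothetical, and the windows here are 3 words long rather than up to 10.

```python
from collections import Counter
from itertools import combinations, product

# Hypothetical toy glosses: sense id ("word#label") -> gloss words.
GLOSS = {
    "bank#finance": {"money", "loan", "account"},
    "bank#river": {"river", "water", "shore"},
    "deposit#money": {"money", "account", "payment"},
    "deposit#sediment": {"river", "mud", "water"},
    "loan#money": {"money", "payment", "interest"},
    "interest#finance": {"money", "loan", "payment"},
    "interest#curiosity": {"attention", "hobby", "curiosity"},
}
SENSES = {}
for sense in GLOSS:
    SENSES.setdefault(sense.split("#")[0], []).append(sense)

def score(config):
    """Assumed relatedness: sum of pairwise gloss overlaps within a configuration."""
    return sum(len(GLOSS[a] & GLOSS[b]) for a, b in combinations(config, 2))

def local_configs(doc, n, top):
    """Step 1: brute-force every window of n words, keep the `top` best configurations."""
    found = []
    for start in range(len(doc) - n + 1):
        candidates = product(*(SENSES[w] for w in doc[start:start + n]))
        best = sorted(candidates, key=score, reverse=True)[:top]
        found.extend((start, cfg) for cfg in best)
    return found

def assemble(configs, min_overlap=1):
    """Step 2: merge two configurations when a suffix of one matches a prefix of the other."""
    merged = set(configs)
    changed = True
    while changed:
        changed = False
        for (s1, c1), (s2, c2) in list(product(merged, repeat=2)):
            overlap = s1 + len(c1) - s2  # number of positions both cover
            if (s1 < s2 and min_overlap <= overlap < len(c2)
                    and c1[-overlap:] == c2[:overlap]):
                new = (s1, c1 + c2[overlap:])
                if new not in merged:
                    merged.add(new)
                    changed = True
    return merged

def disambiguate(doc, n=3, top=2, k=3):
    """Step 3: rank configurations by length, then vote among the top k covering each word."""
    ranked = sorted(assemble(local_configs(doc, n, top)),
                    key=lambda sc: (-len(sc[1]), -score(sc[1]), sc))
    result = []
    for i in range(len(doc)):
        covering = [(s, c) for s, c in ranked if s <= i < s + len(c)][:k]
        votes = Counter(c[i - s] for s, c in covering)
        result.append(max(sorted(votes), key=votes.get))  # ties: alphabetical
    return result

if __name__ == "__main__":
    print(disambiguate(["bank", "deposit", "loan", "interest"]))
```

No step draws a random number, so repeated runs on the same input return the same senses, which mirrors the deterministic behaviour the abstract claims.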
The multifaceted measurement of the individual through language
Historically, research within the psychological sciences has adopted a classical approach to understanding the individual. This approach regularly involves the observation and measurement of specific, isolated psychological phenomena in an attempt to better understand psychological features, tendencies, and processes at varying levels of interest. While the scope of the traditional approach can vary depending on the construct under investigation, the core methodology and analytic strategy typically adhere to the “isolate and/or manipulate” doctrine for seeking knowledge. In recent years, however, technology has revolutionized researchers’ access to computational power, analytic techniques, and even the quality and quantity of data that can be used in scientific pursuits. This dissertation consists of 3 sets of studies that are either a) already published in peer-reviewed journals or b) currently under review in peer-reviewed journals. The primary theme of the included studies is a transition from classical methods of assessment to one where the individual is simultaneously quantified in high-dimensional space using language analysis techniques. This approach essentially constitutes the quantification of the individual as a cluster of traits/processes by means of psychological traces that are embedded in (and can be measured indirectly via) a person’s language. It entails measuring psychological phenomena at both greater depth and breadth than commonly seen in the psychological sciences and, additionally, serves as a convenient and powerful replacement for traditional approaches to studying psychology in the real world. The studies included in this dissertation demonstrate the usefulness of a high-dimensional psychometric approach via language in the realms of authorship attribution and value measurement.
In 2 of the 3 studies, language analytic techniques are used to measure consistencies within the individual that can be capitalized upon in order to determine authorial identities. In the third study, the high-dimensional approach is applied to the realm of values, where it vastly outperforms the traditional self-report method in a classic research paradigm. Psycholog