Knowledge-based Word Sense Disambiguation using Topic Models
Word Sense Disambiguation is an open problem in Natural Language Processing
which is particularly challenging and useful in the unsupervised setting where
all the words in any given text need to be disambiguated without using any
labeled data. Typically, WSD systems use the sentence or a small window of
words around the target word as the context for disambiguation, because their
computational complexity scales exponentially with the size of the context. In
this paper, we leverage the formalism of topic models to design a WSD system
that scales linearly with the number of words in the context. As a result, our
system is able to utilize the whole document as the context for a word to be
disambiguated. The proposed method is a variant of Latent Dirichlet Allocation
in which the topic proportions for a document are replaced by synset
proportions. We further utilize the information in WordNet by assigning a
non-uniform prior to the synset distribution over words and a logistic-normal
prior to the document distribution over synsets. We evaluate the proposed method on
Senseval-2, Senseval-3, SemEval-2007, SemEval-2013 and SemEval-2015 English
All-Word WSD datasets and show that it outperforms the state-of-the-art
unsupervised knowledge-based WSD system by a significant margin.
Comment: To appear in AAAI-1
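The generative story the abstract describes, LDA with synset proportions in place of topic proportions, can be sketched as a toy collapsed Gibbs sampler. The data, symmetric priors, and all variable names below are illustrative simplifications; the paper additionally uses WordNet-derived non-uniform priors over words and a logistic-normal prior over synsets.

```python
import numpy as np

# Toy "documents" as lists of word ids; in the paper each latent label is a
# WordNet synset rather than an abstract topic (hypothetical toy data).
docs = [[0, 1, 2, 0, 1], [2, 3, 4, 3, 4], [0, 2, 4, 1, 3]]
V = 5                     # vocabulary size
K = 3                     # number of synsets (topics in plain LDA)
alpha, beta = 0.1, 0.01   # symmetric Dirichlet priors (the paper replaces
                          # these with WordNet-informed, non-uniform priors)

rng = np.random.default_rng(0)
z = [[rng.integers(K) for _ in doc] for doc in docs]   # synset assignments
ndk = np.zeros((len(docs), K))                         # doc-synset counts
nkw = np.zeros((K, V))                                 # synset-word counts
nk = np.zeros(K)                                       # synset totals
for d, doc in enumerate(docs):
    for i, w in enumerate(doc):
        ndk[d, z[d][i]] += 1; nkw[z[d][i], w] += 1; nk[z[d][i]] += 1

for _ in range(200):                                   # collapsed Gibbs sweeps
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            k = z[d][i]
            ndk[d, k] -= 1; nkw[k, w] -= 1; nk[k] -= 1
            # Per-word sampling is O(K), so the cost grows linearly with the
            # number of context words -- which is why the whole document can
            # serve as the disambiguation context.
            p = (ndk[d] + alpha) * (nkw[:, w] + beta) / (nk + V * beta)
            k = rng.choice(K, p=p / p.sum())
            z[d][i] = k
            ndk[d, k] += 1; nkw[k, w] += 1; nk[k] += 1

# Posterior synset proportions per document (rows sum to 1).
theta = (ndk + alpha) / (ndk + alpha).sum(axis=1, keepdims=True)
```

Each document's row of `theta` plays the role the topic proportions play in plain LDA; under the synset interpretation, the sampled assignment `z[d][i]` is the disambiguated sense of word `i` in document `d`.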
Semantics-based information extraction for detecting economic events
As today's financial markets are sensitive to breaking news on economic events, accurate and timely automatic identification of events in news items is crucial. Unstructured news items originating from many heterogeneous sources have to be mined in order to extract knowledge useful for guiding decision making processes. Hence, we propose the Semantics-Based Pipeline for Economic Event Detection (SPEED), focusing on extracting financial events from news articles and annotating these with meta-data at a speed that enables real-time use. In our implementation, we use some components of an existing framework as well as new components, e.g., a high-performance Ontology Gazetteer, a Word Group Look-Up component, a Word Sense Disambiguator, and components for detecting economic events. Through their interaction with a domain-specific ontology, our novel, semantically enabled components constitute a feedback loop that fosters future reuse of acquired knowledge in the event detection process.
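The component names above come from the abstract, but the ontology, lexicon, and matching logic in the following sketch are illustrative toys, not the actual SPEED implementation. It shows the shape of the pipeline: an ontology-driven gazetteer tags tokens, an event detector emits events, and a feedback step writes detected triggers back into the ontology for future reuse.

```python
# Hypothetical domain ontology: concept -> lexicon of trigger words.
ontology = {"acquisition": {"buy", "acquire", "takeover"}}

def ontology_gazetteer(tokens):
    """Tag each token with the ontology concept it matches, if any."""
    return [(t, next((c for c, lex in ontology.items() if t in lex), None))
            for t in tokens]

def detect_events(tagged):
    """Emit one event per concept-bearing token."""
    return [{"trigger": t, "event": c} for t, c in tagged if c]

def feedback(events):
    """Feed detected triggers back into the ontology (the reuse loop)."""
    for e in events:
        ontology[e["event"]].add(e["trigger"])

news = "Acme will acquire Beta in a takeover".lower().split()
events = detect_events(ontology_gazetteer(news))
feedback(events)
```

A real pipeline would insert the Word Group Look-Up and Word Sense Disambiguator stages between tokenization and event detection; the feedback loop here only illustrates how acquired knowledge can flow back into the shared ontology.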
A Proposal for word sense disambiguation using conceptual distance
This paper presents a method for the resolution of lexical ambiguity and its
automatic evaluation over the Brown Corpus. The method relies on the use of
the wide-coverage noun taxonomy of WordNet and the notion of conceptual
distance among concepts, captured by a Conceptual Density formula developed
for this purpose. This fully automatic method requires no hand coding of
lexical entries, hand tagging of text, or any kind of training process. The
results of the experiment have been automatically evaluated against SemCor,
the sense-tagged version of the Brown Corpus.
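The idea of picking the sense whose taxonomy subtree most densely covers the context can be illustrated with a toy hierarchy. The taxonomy and the simplified density below (context senses covered divided by subtree size) are illustrative only; the paper's Conceptual Density formula additionally normalizes by the expected number of hyponyms per node.

```python
# Toy noun hierarchy: child -> parent (hypothetical, not WordNet).
taxonomy = {
    "bank#1": "institution", "bank#2": "slope",
    "money#1": "institution", "river#1": "slope",
    "institution": "entity", "slope": "entity",
}

def ancestors(node):
    """Walk child->parent links up to the root."""
    chain = []
    while node in taxonomy:
        node = taxonomy[node]
        chain.append(node)
    return chain

def density(root, context_senses):
    """Fraction of the subtree under `root` occupied by context senses."""
    nodes = [n for n in list(taxonomy) + ["entity"]
             if n == root or root in ancestors(n)]
    covered = sum(1 for s in context_senses if s in nodes)
    return covered / len(nodes)

# Disambiguate "bank" given the context word sense "river#1": the sense
# whose immediate hypernym's subtree is densest in context senses wins.
context = ["river#1"]
best = max(["bank#1", "bank#2"],
           key=lambda s: density(ancestors(s)[0], context))
```

Here the `slope` subtree contains `river#1` while the `institution` subtree contains no context sense, so the river-bank sense is selected.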
Word Sense Disambiguation: A Structured Learning Perspective
This paper explores the application of structured learning methods (SLMs) to word sense disambiguation (WSD). On the one hand, the semantic dependencies between polysemous words in a sentence can be encoded in SLMs. On the other hand, SLMs have achieved significant success in natural language processing, so applying them to WSD is a natural idea. However, many theoretical and practical problems arise when SLMs are applied to WSD, owing to the characteristics of the task. Beginning with a method based on the hidden Markov model, this paper proposes for the first time a comprehensive and unified solution for WSD based on the maximum entropy Markov model, conditional random fields, and tree-structured conditional random fields, and reduces the time complexity and running time of the proposed methods to a reasonable level through beam search, approximate training, and parallel training. Each model update brings a performance improvement: introducing one-step dependencies improves performance by 1--5 percent, adopting non-independent features improves performance by 2--3 percent, and extending the underlying structure to the dependency parse tree improves performance by about 1 percent. On the English all-words WSD dataset of Senseval-2004, the method based on the tree-structured conditional random field significantly outperforms the best participating system. Nevertheless, almost all machine learning methods suffer from data sparseness due to the scarcity of sense-tagged data, and SLMs are no exception. Besides improving structured learning methods according to the characteristics of WSD, another way to improve disambiguation performance is to mine disambiguation knowledge from sources such as Wikipedia and parallel corpora, thereby alleviating the knowledge acquisition bottleneck of WSD.
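Treating WSD as sequence labeling with beam-search decoding, as the abstract describes, can be sketched on toy data. The sense inventories and the emission/transition scores below are made up for illustration; in the paper these weights would come from a trained MEMM or CRF.

```python
import math

# Hypothetical candidate senses for a two-word sequence and toy scores.
senses = [["bank/finance", "bank/river"],
          ["deposit/money", "deposit/sediment"]]
emission = {"bank/finance": 0.6, "bank/river": 0.4,
            "deposit/money": 0.7, "deposit/sediment": 0.3}
transition = {("bank/finance", "deposit/money"): 0.9,
              ("bank/finance", "deposit/sediment"): 0.1,
              ("bank/river", "deposit/money"): 0.2,
              ("bank/river", "deposit/sediment"): 0.8}

def beam_decode(senses, beam_width=2):
    """Left-to-right beam search over sense sequences."""
    beams = [([], 0.0)]                       # (sense sequence, log score)
    for cands in senses:
        expanded = []
        for seq, score in beams:
            for s in cands:
                step = math.log(emission[s])
                if seq:                       # the "one-step dependency"
                    step += math.log(transition[(seq[-1], s)])
                expanded.append((seq + [s], score + step))
        # Pruning to the top-B hypotheses is what keeps running time bounded.
        beams = sorted(expanded, key=lambda x: x[1], reverse=True)[:beam_width]
    return beams[0][0]

decoded = beam_decode(senses)
```

With these toy scores the financial reading of "bank" pulls "deposit" toward its monetary sense through the transition weight, which is exactly the inter-word dependency a per-word classifier cannot capture.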