Breaking Sticks and Ambiguities with Adaptive Skip-gram
The recently proposed Skip-gram model is a powerful method for learning
high-dimensional word representations that capture rich semantic relationships
between words. However, Skip-gram, like most prior work on learning word
representations, does not take word ambiguity into account and maintains only a
single representation per word. Although a number of Skip-gram modifications
have been proposed to overcome this limitation and learn multi-prototype word
representations, they either require the number of word meanings to be known in
advance or learn the meanings using greedy heuristic approaches. In this paper we
propose the Adaptive Skip-gram model, a nonparametric Bayesian extension of
Skip-gram capable of automatically learning the required number of representations
for all words at the desired semantic resolution. We derive an efficient online
variational learning algorithm for the model and empirically demonstrate its
efficiency on the word-sense induction task.
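The "breaking sticks" of the title refers to the stick-breaking construction of a nonparametric prior over the senses of a word. The sketch below illustrates only that prior, not the paper's variational learning algorithm. The names `alpha` (concentration) and `truncation` (a finite cap on the number of sticks) are our own assumptions for illustration: each step breaks off a Beta(1, alpha) fraction of the remaining stick, so a small `alpha` puts most mass on the first few senses and a large `alpha` spreads it over many.

```python
import random


def stick_breaking_weights(alpha, truncation, rng):
    """Draw sense probabilities from a truncated stick-breaking prior.

    Each of the first truncation - 1 steps breaks a Beta(1, alpha) fraction
    off the remaining stick; the last sense takes whatever is left, so the
    weights always sum to 1. (Truncation is an assumption of this sketch.)
    """
    weights = []
    remaining = 1.0
    for _ in range(truncation - 1):
        fraction = rng.betavariate(1.0, alpha)
        weights.append(remaining * fraction)
        remaining *= 1.0 - fraction
    weights.append(remaining)
    return weights


def expected_stick_weights(alpha, truncation):
    """Prior mean of each weight: E[beta] = 1/(1+alpha), E[1-beta] = alpha/(1+alpha)."""
    take = 1.0 / (1.0 + alpha)
    keep = alpha / (1.0 + alpha)
    weights = [take * keep ** k for k in range(truncation - 1)]
    weights.append(keep ** (truncation - 1))
    return weights


if __name__ == "__main__":
    rng = random.Random(0)
    for alpha in (0.1, 1.0, 10.0):
        sample = stick_breaking_weights(alpha, truncation=10, rng=rng)
        print(alpha, [round(w, 3) for w in sample])
```

Comparing the printed samples for different `alpha` values gives a feel for how a single hyperparameter could act as the "semantic resolution" knob the abstract mentions; how the paper actually sets it is not stated here.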
Memory-Based Lexical Acquisition and Processing
Current approaches to computational lexicology in language technology are
knowledge-based (competence-oriented) and try to abstract away from specific
formalisms, domains, and applications. This results in severe complexity,
acquisition and reusability bottlenecks. As an alternative, we propose a
particular performance-oriented approach to Natural Language Processing based
on automatic memory-based learning of linguistic (lexical) tasks. The
consequences of the approach for computational lexicology are discussed, and
the application of the approach on a number of lexical acquisition and
disambiguation tasks in phonology, morphology and syntax is described. Comment: 18 pages
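Memory-based learning keeps training instances verbatim and classifies new ones by analogy to the most similar stored cases, rather than abstracting rules in advance. A minimal sketch, assuming the simplest form of the idea (a k-nearest-neighbour classifier with an unweighted overlap distance on symbolic features); the paper may well use weighted metrics. The class name, the plural-suffix task and its training data are all hypothetical.

```python
from collections import Counter


def overlap_distance(a, b):
    """Count the feature positions on which two symbolic instances differ."""
    return sum(x != y for x, y in zip(a, b))


class MemoryBasedClassifier:
    """Hypothetical k-NN learner: store every example, vote among the nearest."""

    def __init__(self, k=1):
        self.k = k
        self.memory = []  # list of (feature tuple, label) pairs

    def learn(self, features, label):
        # "Learning" is just storage; no abstraction step takes place.
        self.memory.append((tuple(features), label))

    def classify(self, features):
        query = tuple(features)
        # sorted() is stable, so equally distant instances keep training order.
        ranked = sorted(self.memory, key=lambda item: overlap_distance(item[0], query))
        votes = Counter(label for _, label in ranked[: self.k])
        return votes.most_common(1)[0][0]


# Hypothetical toy task: pick an English plural suffix from a noun's last two letters.
TRAINING = [
    (("c", "h"), "es"),
    (("s", "h"), "es"),
    (("o", "x"), "es"),
    (("a", "t"), "s"),
    (("o", "g"), "s"),
]

clf = MemoryBasedClassifier(k=1)
for feats, label in TRAINING:
    clf.learn(feats, label)
```

Here `clf.classify(("e", "t"))` is closest to the stored `("a", "t")` case and returns `"s"`. Adding a new exception only requires storing one more instance, which is the kind of reusability argument the abstract makes.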
Text Segmentation Using Exponential Models
This paper introduces a new statistical approach to partitioning text
automatically into coherent segments. Our approach enlists both short-range and
long-range language models to help it sniff out likely sites of topic changes
in text. To aid its search, the system consults a set of simple lexical hints
it has learned to associate with the presence of boundaries through inspection
of a large corpus of annotated data. We also propose a new probabilistically
motivated error metric for use by the natural language processing and
information retrieval communities, intended to supersede precision and recall
for appraising segmentation algorithms. Qualitative assessment of our algorithm
as well as evaluation using this new metric demonstrate the effectiveness of
our approach in two very different domains, Wall Street Journal articles and
the TDT Corpus, a collection of newswire articles and broadcast news
transcripts. Comment: 12 pages, LaTeX source and postscript figures for EMNLP-2 paper
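A probabilistically motivated segmentation metric can be built by probing pairs of positions and asking whether the reference and the hypothesis agree that the two lie in the same segment. The sketch below assumes the common fixed-distance form (pairs exactly `k` units apart, often called P_k); the paper's own definition may weight probe distances differently. The function names and the boundary encoding are assumptions of this sketch.

```python
def segment_ids(boundaries, n):
    """Label each of n units with its segment index.

    `boundaries` lists positions i meaning "a new segment starts at unit i";
    position 0 is always the start of the first segment.
    """
    starts = set(boundaries)
    ids, current = [], 0
    for i in range(n):
        if i in starts and i != 0:
            current += 1
        ids.append(current)
    return ids


def p_k(reference, hypothesis, k):
    """Fraction of probe pairs (i, i + k) on which reference and hypothesis
    disagree about whether both units fall in the same segment."""
    if len(reference) != len(hypothesis):
        raise ValueError("segmentations must cover the same units")
    n = len(reference)
    if not 0 < k < n:
        raise ValueError("need 0 < k < number of units")
    errors = 0
    for i in range(n - k):
        same_ref = reference[i] == reference[i + k]
        same_hyp = hypothesis[i] == hypothesis[i + k]
        errors += same_ref != same_hyp
    return errors / (n - k)


reference = segment_ids([0, 5], 10)   # true topic change at unit 5
hypothesis = segment_ids([0, 6], 10)  # boundary placed one unit too late
```

Unlike precision and recall on exact boundary positions, this score penalizes a near miss (`p_k(reference, hypothesis, 2)` is 0.25 here) far less than a boundary placed in the wrong region entirely.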
Inducing Features of Random Fields
We present a technique for constructing random fields from a set of training
samples. The learning paradigm builds increasingly complex fields by allowing
potential functions, or features, that are supported by increasingly large
subgraphs. Each feature has a weight that is trained by minimizing the
Kullback-Leibler divergence between the model and the empirical distribution of
the training data. A greedy algorithm determines how features are incrementally
added to the field and an iterative scaling algorithm is used to estimate the
optimal values of the weights.
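The weight-estimation step can be illustrated on a tiny finite outcome space. For brevity we swap in Generalized Iterative Scaling (GIS), a simpler relative of the iterative scaling the abstract mentions, not necessarily the paper's exact algorithm. GIS repeatedly nudges each weight until the model's expectation of every feature matches its empirical expectation, which is the condition for minimizing the Kullback-Leibler divergence from the empirical distribution to the exponential-family model. The outcomes, features and empirical probabilities are hypothetical, and the greedy feature-selection step is not shown.

```python
import math


def model_distribution(outcomes, feats, weights):
    """p(x) proportional to exp(sum_i w_i * f_i(x)) over a finite outcome set."""
    scores = {x: math.exp(sum(w * f(x) for w, f in zip(weights, feats))) for x in outcomes}
    z = sum(scores.values())
    return {x: s / z for x, s in scores.items()}


def fit_gis(outcomes, features, empirical, iterations=1000):
    """Fit feature weights by Generalized Iterative Scaling."""
    # GIS needs every outcome to carry the same total feature mass C,
    # so an extra "slack" feature absorbs the difference.
    C = max(sum(f(x) for f in features) for x in outcomes)
    if C <= 0:
        raise ValueError("at least one feature must fire on some outcome")

    def slack(x):
        return C - sum(f(x) for f in features)

    feats = list(features) + [slack]
    empirical_exp = [sum(empirical.get(x, 0.0) * f(x) for x in outcomes) for f in feats]
    weights = [0.0] * len(feats)
    for _ in range(iterations):
        probs = model_distribution(outcomes, feats, weights)
        # All weights are updated from the same current model (a GIS step).
        for i, f in enumerate(feats):
            model_exp = sum(probs[x] * f(x) for x in outcomes)
            if empirical_exp[i] > 0 and model_exp > 0:
                weights[i] += math.log(empirical_exp[i] / model_exp) / C
    return feats, weights


# Hypothetical toy data: two overlapping binary features over four outcomes.
outcomes = ["a", "b", "c", "d"]
features = [
    lambda x: 1.0 if x in ("a", "b") else 0.0,
    lambda x: 1.0 if x in ("a", "c") else 0.0,
]
empirical = {"a": 0.4, "b": 0.2, "c": 0.3, "d": 0.1}
```

With these two features the fitted model can only match the marginals (0.6 and 0.7), so it converges to the independent distribution with p("a") close to 0.42 rather than reproducing the empirical 0.4. This shows that the features chosen determine which structure the field can capture.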
The statistical modeling techniques introduced in this paper differ from
those common to much of the natural language processing literature since there
is no probabilistic finite state or push-down automaton on which the model is
built. Our approach also differs from the techniques common to the computer
vision literature in that the underlying random fields are non-Markovian and
have a large number of parameters that must be estimated. Relations to other
learning approaches including decision trees and Boltzmann machines are given.
As a demonstration of the method, we describe its application to the problem of
automatic word classification in natural language processing.
Key words: random field, Kullback-Leibler divergence, iterative scaling,
divergence geometry, maximum entropy, EM algorithm, statistical learning,
clustering, word morphology, natural language processing. Comment: 34 pages, compressed postscript
Computational Approaches to Measuring the Similarity of Short Contexts: A Review of Applications and Methods
Measuring the similarity of short written contexts is a fundamental problem
in Natural Language Processing. This article provides a unifying framework by
which short context problems can be categorized both by their intended
application and proposed solution. The goal is to show that various problems
and methodologies that appear quite different on the surface are in fact very
closely related. The axes by which these categorizations are made include the
format of the contexts (headed versus headless), the way in which the contexts
are to be measured (first-order versus second-order similarity), and the
information used to represent the features in the contexts (micro versus macro
views). The unifying thread that binds together many short context applications
and methods is the fact that similarity decisions must be made between contexts
that share few (if any) words in common. Comment: 23 pages
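The difference between first-order and second-order similarity is easiest to see on two contexts with no words in common. A minimal sketch, assuming cosine similarity over bag-of-words counts for the first-order case, and for the second-order case representing each context by summing hypothetical word co-occurrence vectors (for cosine, summing gives the same direction as the centroid). The co-occurrence table is invented for illustration and is not drawn from the review.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(key, 0.0) for key, weight in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def first_order(ctx_a, ctx_b):
    """Compare the contexts' own words directly."""
    return cosine(Counter(ctx_a), Counter(ctx_b))


def second_order(ctx_a, ctx_b, cooc):
    """Compare the words each context's words tend to co-occur with."""
    def context_vector(ctx):
        total = Counter()
        for word in ctx:
            total.update(cooc.get(word, {}))  # unknown words contribute nothing
        return total
    return cosine(context_vector(ctx_a), context_vector(ctx_b))


# Hypothetical co-occurrence counts gathered from some background corpus.
COOC = {
    "physician": {"hospital": 3, "patient": 2},
    "doctor": {"hospital": 4, "patient": 1},
    "treated": {"patient": 3, "illness": 1},
    "cured": {"patient": 2, "illness": 2},
}

ctx_a = ["physician", "treated"]
ctx_b = ["doctor", "cured"]
```

`first_order(ctx_a, ctx_b)` is 0.0 because the contexts share no words, while `second_order(ctx_a, ctx_b, COOC)` is about 0.91 because their words keep similar company. This is the situation the abstract identifies as the common thread of short-context problems.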