Learning to Resolve Natural Language Ambiguities: A Unified Approach
We analyze several commonly used statistics-based and machine learning
algorithms for natural language disambiguation tasks and observe that they can
be recast as learning linear separators in the feature space. Each method
makes a priori assumptions that it employs, given the data, when searching for
its hypothesis; nevertheless, as we show, each searches a space that is as
rich as the space of all linear separators. We use this observation to argue
for a data-driven approach that simply searches for a good linear separator in
the feature space, without further assumptions about the domain or the
specific problem.
We present such an approach - a sparse network of linear separators,
utilizing the Winnow learning algorithm - and show how to use it in a variety
of ambiguity resolution problems. The learning approach presented is
attribute-efficient and therefore appropriate for domains with a very large
number of attributes.
In particular, we present an extensive experimental comparison of our
approach with other methods on several well-studied lexical disambiguation
tasks such as context-sensitive spelling correction, prepositional phrase
attachment, and part-of-speech tagging. In all cases we show that our approach
either outperforms the other methods tried for these tasks or performs
comparably to the best of them.
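As a rough illustration of the Winnow-based linear separator this abstract builds on, here is a minimal sketch in Python; the promotion/demotion factors, the threshold, and the index-based feature encoding are illustrative assumptions, not the paper's configuration.

```python
# Minimal Winnow sketch: a linear separator over sparse binary features,
# trained with multiplicative weight updates. All parameter values here
# (promotion, demotion, threshold) are illustrative.

class Winnow:
    def __init__(self, n_features, promotion=1.5, demotion=0.5, threshold=None):
        self.w = [1.0] * n_features                    # weights start at 1
        self.theta = threshold if threshold is not None else n_features / 2
        self.promotion = promotion                     # applied on false negatives
        self.demotion = demotion                       # applied on false positives

    def predict(self, active):
        # `active` lists the indices of features that fire for this example.
        return sum(self.w[i] for i in active) >= self.theta

    def update(self, active, label):
        # Mistake-driven: only the weights of active features ever change,
        # which is what makes the algorithm attribute-efficient.
        if self.predict(active) == label:
            return
        factor = self.promotion if label else self.demotion
        for i in active:
            self.w[i] *= factor

# Usage: update on one positive example until it is classified correctly.
clf = Winnow(n_features=100)
example = [2, 7, 41]
while not clf.predict(example):
    clf.update(example, True)
print(clf.predict(example))                            # True
```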
Efficient Decomposed Learning for Structured Prediction
Structured prediction is the cornerstone of several machine learning
applications. Unfortunately, in structured prediction settings with expressive
inter-variable interactions, exact inference-based learning algorithms, e.g.
Structural SVM, are often intractable. We present a new approach, Decomposed
Learning (DecL), which performs efficient learning by restricting the
inference step to a limited part of the structured output space. We provide
characterizations, based on the structure, target parameters, and gold labels,
under which DecL is equivalent to exact learning. We then show that in
real-world settings, where our theoretical assumptions may not completely
hold, DecL-based algorithms are significantly more efficient than exact
learning while being just as accurate.
Comment: ICML 201
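To make the restricted-inference idea concrete, here is a toy sketch, assuming a simple per-position linear scoring function and a neighborhood of labelings that differ from the gold labeling in at most k positions; it illustrates the general idea only, not DecL's actual formulation or guarantees.

```python
from itertools import combinations, product

def score(weights, x, y):
    # Toy per-position linear score: weight of each (input token, label) pair.
    return sum(weights.get((xi, yi), 0.0) for xi, yi in zip(x, y))

def neighborhood(gold, labels, k):
    # All labelings differing from the gold labeling in at most k positions.
    yield tuple(gold)
    for positions in combinations(range(len(gold)), k):
        for replacement in product(labels, repeat=k):
            y = list(gold)
            for p, r in zip(positions, replacement):
                y[p] = r
            if y != list(gold):
                yield tuple(y)

def decomposed_learning_step(weights, x, gold, labels, k=1, lr=1.0):
    # Inference is restricted to the neighborhood of the gold labeling
    # instead of the full (exponentially large) output space.
    best = max(neighborhood(gold, labels, k), key=lambda y: score(weights, x, y))
    if best != tuple(gold):                            # perceptron-style update
        for xi, yg, yb in zip(x, gold, best):
            weights[(xi, yg)] = weights.get((xi, yg), 0.0) + lr
            weights[(xi, yb)] = weights.get((xi, yb), 0.0) - lr
    return weights

# Usage: a 4-variable toy structure with binary labels and one bad initial weight.
w = decomposed_learning_step({("b", 1): 2.0},
                             x=["a", "b", "a", "c"],
                             gold=[1, 0, 1, 1], labels=[0, 1], k=1)
print(w)                                               # ("b", 0) promoted, ("b", 1) demoted
```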
An Inventory of Preposition Relations
We describe an inventory of semantic relations that are expressed by
prepositions. We define these relations by building on the word sense
disambiguation task for prepositions and propose a mapping from preposition
senses to relation labels by collapsing semantically related senses across
prepositions.
Comment: Supplementary material for Srikumar and Roth, 2013. Modeling Semantic
Relations Expressed by Prepositions, TAC
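As a toy illustration of what collapsing preposition senses into shared relation labels might look like in code: the sense identifiers below are hypothetical and the relation names only suggest the flavor of such an inventory; the actual labels are defined in the paper and its supplementary material.

```python
# Hypothetical sense identifiers mapped to shared relation labels; the real
# inventory and label names come from the paper's supplementary material.
sense_to_relation = {
    ("in", "2"): "Location",      # "keys in the drawer"
    ("at", "1"): "Location",      # "waiting at the station"
    ("in", "5"): "Temporal",      # "in the morning"
    ("at", "3"): "Temporal",      # "at noon"
    ("with", "4"): "Instrument",  # "cut with a knife"
}

def relation_for(preposition, sense):
    # Semantically related senses of different prepositions collapse
    # onto the same relation label.
    return sense_to_relation.get((preposition, sense), "Other")

print(relation_for("at", "1"))    # Location
```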
Solving General Arithmetic Word Problems
This paper presents a novel approach to automatically solving arithmetic word
problems. This is the first algorithmic approach that can handle arithmetic
problems with multiple steps and operations, without depending on additional
annotations or predefined templates. We develop a theory for expression trees
that can be used to represent and evaluate the target arithmetic expressions;
we use it to uniquely decompose the target arithmetic problem into multiple
classification problems; we then compose an expression tree, combining these
classifications with world knowledge through a constrained inference
framework. Our classifiers gain from the use of {\em quantity schemas} that
support better extraction of features. Experimental results show that our
method outperforms existing systems, achieving state-of-the-art performance
on benchmark datasets of arithmetic word problems.
Comment: EMNLP 201
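A minimal sketch of the expression-tree representation the abstract refers to: quantities extracted from the problem text sit at the leaves, internal nodes carry arithmetic operations, and the tree is evaluated recursively. The classes and the worked example are illustrative assumptions, not the paper's implementation.

```python
from dataclasses import dataclass
from typing import Union

@dataclass
class Quantity:
    value: float                       # a number extracted from the problem text

@dataclass
class Op:
    op: str                            # one of "+", "-", "*", "/"
    left: "Expr"
    right: "Expr"

Expr = Union[Quantity, Op]

def evaluate(node: Expr) -> float:
    # Evaluate an expression tree bottom-up.
    if isinstance(node, Quantity):
        return node.value
    left, right = evaluate(node.left), evaluate(node.right)
    return {"+": left + right, "-": left - right,
            "*": left * right, "/": left / right}[node.op]

# Hypothetical problem: "Tom bought 3 boxes of 4 pencils each and gave 5 away.
# How many pencils does he have left?"  One candidate tree: (3 * 4) - 5.
tree = Op("-", Op("*", Quantity(3.0), Quantity(4.0)), Quantity(5.0))
print(evaluate(tree))                  # 7.0
```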
A Winnow-Based Approach to Context-Sensitive Spelling Correction
A large class of machine-learning problems in natural language requires the
characterization of linguistic context. Two characteristic properties of such
problems are that their feature space is of very high dimensionality, and their
target concepts refer to only a small subset of the features in the space.
Under such conditions, multiplicative weight-update algorithms such as Winnow
have been shown to have exceptionally good theoretical properties. We present
an algorithm combining variants of Winnow and weighted-majority voting, and
apply it to a problem in the aforementioned class: context-sensitive spelling
correction. This is the task of fixing spelling errors that happen to result in
valid words, such as substituting "to" for "too", "casual" for "causal", etc.
We evaluate our algorithm, WinSpell, by comparing it against BaySpell, a
statistics-based method representing the state of the art for this task. We
find: (1) When run with a full (unpruned) set of features, WinSpell achieves
accuracies significantly higher than BaySpell was able to achieve in either the
pruned or unpruned condition; (2) When compared with other systems in the
literature, WinSpell exhibits the highest performance; (3) The primary reason
that WinSpell outperforms BaySpell is that WinSpell learns a better linear
separator; (4) When run on a test set drawn from a different corpus than the
training set was drawn from, WinSpell is better able than BaySpell to adapt,
using a strategy we will present that combines supervised learning on the
training set with unsupervised learning on the (noisy) test set.
Comment: To appear in Machine Learning, Special Issue on Natural Language
Learning, 1999. 25 pages
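As a rough sketch of the weighted-majority layer described here, assuming a pool of already-trained classifiers (for example, Winnow variants like the sketch above) voting on each occurrence of a confusion-set word: the voter definitions, feature names, and demotion factor beta are illustrative assumptions, not WinSpell's actual components.

```python
# Weighted-majority voting over a pool of classifiers. Voter weights shrink
# multiplicatively whenever a voter is wrong; beta is an illustrative choice.

def weighted_majority_predict(voters, weights, x):
    # Each voter maps a set of active context features to True/False
    # (say, "too" vs. "to").
    yes = sum(w for v, w in zip(voters, weights) if v(x))
    no = sum(w for v, w in zip(voters, weights) if not v(x))
    return yes >= no

def weighted_majority_update(voters, weights, x, label, beta=0.5):
    # Demote every voter that disagreed with the observed label.
    return [w * beta if v(x) != label else w
            for v, w in zip(voters, weights)]

# Usage with three toy voters over a set of hypothetical context features.
voters = [
    lambda x: "word_two_before_going" in x,
    lambda x: "tag_after_DET" in x,
    lambda x: "word_before_way" in x,
]
weights = [1.0, 1.0, 1.0]
x, label = {"word_two_before_going", "tag_after_DET"}, True
weights = weighted_majority_update(voters, weights, x, label)
print(weighted_majority_predict(voters, weights, x))   # True
```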
Applying Winnow to Context-Sensitive Spelling Correction
Multiplicative weight-updating algorithms such as Winnow have been studied
extensively in the COLT literature, but only recently have people started to
use them in applications. In this paper, we apply a Winnow-based algorithm to a
task in natural language: context-sensitive spelling correction. This is the
task of fixing spelling errors that happen to result in valid words, such as
substituting {\it to\/} for {\it too}, {\it casual\/} for {\it causal}, and so
on. Previous approaches to this problem have been statistics-based; we compare
Winnow to one of the more successful such approaches, which uses Bayesian
classifiers. We find that: (1)~When the standard (heavily-pruned) set of
features is used to describe problem instances, Winnow performs comparably to
the Bayesian method; (2)~When the full (unpruned) set of features is used,
Winnow is able to exploit the new features and convincingly outperform Bayes;
and (3)~When a test set is encountered that is dissimilar to the training set,
Winnow is better than Bayes at adapting to the unfamiliar test set, using a
strategy we will present for combining learning on the training set with
unsupervised learning on the (noisy) test set.
Comment: 9 pages
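The abstracts only state that supervised learning on the training set is combined with unsupervised learning on the (noisy) test set; the sketch below is purely an assumption about what such adaptation could look like (treating each occurrence of a confusion-set word in the new corpus as a noisy label and continuing mistake-driven updates), reusing the update interface of the Winnow sketch above. It is not presented as the papers' actual strategy.

```python
# Assumption only: treat each occurrence of a confusion-set word in the new,
# unlabeled corpus as a noisy label (i.e., assume the writer usually chose the
# right word) and keep applying mistake-driven updates to a trained classifier.
# `clf` is any object exposing update(active_features, label), such as the
# Winnow sketch above; the confusion set {"to", "too"} is hypothetical here.

def adapt_to_test_corpus(clf, occurrences, extract_features, target_word="too"):
    # occurrences: (context, word_used) pairs drawn from the test corpus.
    for context, word_used in occurrences:
        noisy_label = (word_used == target_word)
        clf.update(extract_features(context), noisy_label)
    return clf
```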