1,178 research outputs found
Learning to Resolve Natural Language Ambiguities: A Unified Approach
We analyze a few of the commonly used statistics based and machine learning
algorithms for natural language disambiguation tasks and observe that they can
be re-cast as learning linear separators in the feature space. Each of the
methods makes a priori assumptions, which it employs, given the data, when
searching for its hypothesis. Nevertheless, as we show, it searches a space
that is as rich as the space of all linear separators. We use this to build an
argument for a data driven approach which merely searches for a good linear
separator in the feature space, without further assumptions on the domain or a
specific problem.
We present such an approach - a sparse network of linear separators,
utilizing the Winnow learning algorithm - and show how to use it in a variety
of ambiguity resolution problems. The learning approach presented is
attribute-efficient and, therefore, appropriate for domains having very large
number of attributes.
In particular, we present an extensive experimental comparison of our
approach with other methods on several well studied lexical disambiguation
tasks such as context-sensitive spelling correction, prepositional phrase
attachment and part of speech tagging. In all cases we show that our approach
either outperforms other methods tried for these tasks or performs comparably
to the best
Forgetting Exceptions is Harmful in Language Learning
We show that in language learning, contrary to received wisdom, keeping
exceptional training instances in memory can be beneficial for generalization
accuracy. We investigate this phenomenon empirically on a selection of
benchmark natural language processing tasks: grapheme-to-phoneme conversion,
part-of-speech tagging, prepositional-phrase attachment, and base noun phrase
chunking. In a first series of experiments we combine memory-based learning
with training set editing techniques, in which instances are edited based on
their typicality and class prediction strength. Results show that editing
exceptional instances (with low typicality or low class prediction strength)
tends to harm generalization accuracy. In a second series of experiments we
compare memory-based learning and decision-tree learning methods on the same
selection of tasks, and find that decision-tree learning often performs worse
than memory-based learning. Moreover, the decrease in performance can be linked
to the degree of abstraction from exceptions (i.e., pruning or eagerness). We
provide explanations for both results in terms of the properties of the natural
language processing tasks and the learning algorithms.Comment: 31 pages, 7 figures, 10 tables. uses 11pt, fullname, a4wide tex
styles. Pre-print version of article to appear in Machine Learning 11:1-3,
Special Issue on Natural Language Learning. Figures on page 22 slightly
compressed to avoid page overloa
Ontology-Aware Token Embeddings for Prepositional Phrase Attachment
Type-level word embeddings use the same set of parameters to represent all
instances of a word regardless of its context, ignoring the inherent lexical
ambiguity in language. Instead, we embed semantic concepts (or synsets) as
defined in WordNet and represent a word token in a particular context by
estimating a distribution over relevant semantic concepts. We use the new,
context-sensitive embeddings in a model for predicting prepositional phrase(PP)
attachments and jointly learn the concept embeddings and model parameters. We
show that using context-sensitive embeddings improves the accuracy of the PP
attachment model by 5.4% absolute points, which amounts to a 34.4% relative
reduction in errors.Comment: ACL 201
ETRANS: A English-Thai translator
ETRANS is an experimental English-Thai machine translation (MT) system that translates a simple English sentence into a grammatically correct Thai sentence. The entire system is written in C-Prolog, and runs on UNIX systems. The MT strategy taken by ETRANS is an interlingual strategy with a parser for English and a generator for Thai. The parser creates a semantic representation equivalent to the meaning of the English sentence. A generator then interprets the semantic representation into Thai. ETRANS employs frames as a means for representing knowledge, and an augmented transition network (ATN) as the linguistic framework for analyzing and generating sentences
- …