POS Tagging Using Relaxation Labelling
Relaxation labelling is an optimization technique used in many fields to
solve constraint satisfaction problems. The algorithm finds a combination of
values for a set of variables that satisfies, to the maximum possible
degree, a set of given constraints. This paper describes experiments
applying it to POS tagging and reports the results obtained. It also considers
the possibility of applying it to word sense disambiguation.
Comment: compressed & uuencoded postscript file. Paper length: 39 page
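The abstract above describes relaxation labelling only in general terms. As an illustration, the following is a minimal sketch of one common formulation of the update, in which each label weight is multiplied by (1 + support) and then renormalised per variable. It is not the paper's implementation: the function name `relax`, the constraint tuple layout, the tags, and all compatibility values are assumptions made for this example.

```python
from math import prod


def relax(weights, constraints, iterations=50, eps=1e-4):
    """Iteratively re-weight the candidate labels of each variable.

    weights:     one dict {label: weight} per variable (weights sum to 1).
    constraints: tuples (var, label, compat, context); `context` is a list
                 of (other_var, other_label) pairs that must hold, and
                 `compat` in [-1, 1] says how strongly the constraint
                 supports (or penalises) `label` for variable `var`.
    All of these structures are hypothetical, chosen for illustration.
    """
    for _ in range(iterations):
        new_weights = []
        for i, dist in enumerate(weights):
            # Support = sum of compatibilities, each scaled by how strongly
            # the constraint's context currently holds.
            support = {label: 0.0 for label in dist}
            for var, label, compat, context in constraints:
                if var == i and label in support:
                    support[label] += compat * prod(
                        weights[v].get(l, 0.0) for v, l in context
                    )
            # Clip support to [-1, 1], scale each weight, then renormalise.
            scaled = {
                label: w * (1.0 + max(-1.0, min(1.0, support[label])))
                for label, w in dist.items()
            }
            total = sum(scaled.values())
            new_weights.append(
                {label: s / total for label, s in scaled.items()}
                if total > 0 else dict(dist)
            )
        change = max(
            abs(new_weights[i][label] - weights[i][label])
            for i in range(len(weights))
            for label in weights[i]
        )
        weights = new_weights
        if change < eps:  # stop once the weights have stabilised
            break
    return weights


# Toy input: "the can", where "can" is ambiguous between noun and modal.
initial = [{"DT": 1.0}, {"NN": 0.5, "MD": 0.5}]
toy_constraints = [
    (1, "NN", 0.8, [(0, "DT")]),   # a noun is compatible after a determiner
    (1, "MD", -0.8, [(0, "DT")]),  # a modal is incompatible after a determiner
]
result = relax(initial, toy_constraints)
best_tag = max(result[1], key=result[1].get)
```

In this toy run the weight of `NN` for the second word grows at each iteration while `MD` shrinks, so the highest-weighted label is the one best supported by the constraints.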
A Machine learning approach to POS tagging
We have applied inductive learning of statistical decision trees
and relaxation labelling to the Natural Language Processing (NLP)
task of morphosyntactic disambiguation (Part Of Speech Tagging).
The learning process is supervised and yields a language
model designed to resolve POS ambiguities. This model consists
of a set of statistical decision trees expressing the distribution of
tags and words in some relevant contexts.
The acquired language models are complete enough to be directly
used as sets of POS disambiguation rules, and include more complex
contextual information than the simple collections of n-grams usually
used in statistical taggers.
We have implemented a simple and fast tagger that has been
tested and evaluated on the Wall Street Journal (WSJ) corpus with
remarkable accuracy.
However, better results can be obtained by translating the trees
into rules to feed a flexible relaxation-labelling-based tagger.
To this end, we describe a tagger that is able to use
information of any kind (n-grams, automatically acquired constraints,
linguistically motivated manually written constraints, etc.), and in
particular to incorporate the machine learned decision trees.
We also address the problem of tagging when only a small
amount of training material is available, which is crucial when
constructing an annotated corpus from scratch. We show that quite
high accuracy can be achieved with our system in this situation.
Postprint (published version
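The abstract above mentions statistical decision trees that express tag distributions in context, and translating those trees into rules. The sketch below shows one way such a tree could be represented and flattened into rules. It is an illustration only, not the authors' learned models: the `Node` class, the `prev_tag` attribute, the NN/VB ambiguity class, and every probability are invented for this example.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Node:
    # Context attribute tested at this node; None marks a leaf.
    attribute: Optional[str] = None
    # One child per observed value of the attribute.
    children: Dict[str, "Node"] = field(default_factory=dict)
    # Tag probability distribution estimated at this node.
    distribution: Dict[str, float] = field(default_factory=dict)


def classify(node, context):
    """Walk down the tree and return the tag distribution that applies."""
    while node.attribute is not None:
        child = node.children.get(context.get(node.attribute))
        if child is None:
            break  # unseen value: fall back to the current node's distribution
        node = child
    return node.distribution


def tree_to_rules(node, path=()):
    """Yield one (conditions, distribution) rule per root-to-leaf path."""
    if not node.children:
        yield list(path), node.distribution
        return
    for value, child in node.children.items():
        yield from tree_to_rules(child, path + ((node.attribute, value),))


# Hypothetical tree for words ambiguous between noun (NN) and verb (VB).
tree = Node(
    attribute="prev_tag",
    distribution={"NN": 0.6, "VB": 0.4},
    children={
        "DT": Node(distribution={"NN": 0.95, "VB": 0.05}),
        "TO": Node(distribution={"NN": 0.10, "VB": 0.90}),
    },
)

after_determiner = classify(tree, {"prev_tag": "DT"})
after_unseen = classify(tree, {"prev_tag": "JJ"})
rules = list(tree_to_rules(tree))
```

Each rule produced by `tree_to_rules` pairs a list of context conditions with a tag distribution, which is the kind of contextual information a constraint-based tagger could take as input.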