1,315 research outputs found
Methods for Amharic part-of-speech tagging
The paper describes a set of experiments
involving the application of three state-of-
the-art part-of-speech taggers to Ethiopian
Amharic, using three different tagsets.
The taggers showed worse performance
than previously reported results for Eng-
lish, in particular having problems with
unknown words. The best results were
obtained using a Maximum Entropy ap-
proach, while HMM-based and SVM-
based taggers got comparable results
Token and Type Constraints for Cross-Lingual Part-of-Speech Tagging
We consider the construction of part-of-speech taggers for resource-poor languages. Recently, manually constructed tag dictionaries from Wiktionary and dictionaries projected via bitext have been used as type constraints to overcome the scarcity of annotated data in this setting. In this paper, we show that additional token constraints can be projected from a resource-rich source language to a resource-poor target language via word-aligned bitext. We present several models to this end; in particular a partially observed conditional random field model, where coupled token and type constraints provide a partial signal for training. Averaged across eight previously studied Indo-European languages, our model achieves a 25% relative error reduction over the prior state of the art. We further present successful results on seven additional languages from different families, empirically demonstrating the applicability of coupled token and type constraints across a diverse set of languages
Robust Multilingual Part-of-Speech Tagging via Adversarial Training
Adversarial training (AT) is a powerful regularization method for neural
networks, aiming to achieve robustness to input perturbations. Yet, the
specific effects of the robustness obtained from AT are still unclear in the
context of natural language processing. In this paper, we propose and analyze a
neural POS tagging model that exploits AT. In our experiments on the Penn
Treebank WSJ corpus and the Universal Dependencies (UD) dataset (27 languages),
we find that AT not only improves the overall tagging accuracy, but also 1)
prevents over-fitting well in low resource languages and 2) boosts tagging
accuracy for rare / unseen words. We also demonstrate that 3) the improved
tagging performance by AT contributes to the downstream task of dependency
parsing, and that 4) AT helps the model to learn cleaner word representations.
5) The proposed AT model is generally effective in different sequence labeling
tasks. These positive results motivate further use of AT for natural language
tasks.Comment: NAACL 201
A Machine learning approach to POS tagging
We have applied inductive learning of statistical decision trees
and relaxation labelling to the Natural Language Processing (NLP)
task of morphosyntactic disambiguation (Part Of Speech Tagging).
The learning process is supervised and obtains a language
model oriented to resolve POS ambiguities. This model consists
of a set of statistical decision trees expressing distribution of
tags and words in some relevant contexts.
The acquired language models are complete enough to be directly
used as sets of POS disambiguation rules, and include more complex
contextual information than simple collections of n-grams usually
used in statistical taggers.
We have implemented a quite simple and fast tagger that has been
tested and evaluated on the Wall Street Journal (WSJ) corpus with
a remarkable accuracy.
However, better results can be obtained by translating the trees
into rules to feed a flexible relaxation labelling based tagger.
In this direction we describe a tagger which is able to use
information of any kind (n-grams, automatically acquired constraints,
linguistically motivated manually written constraints, etc.), and in
particular to incorporate the machine learned decision trees.
Simultaneously, we address the problem of tagging when only
small training material is available, which is crucial in any process
of constructing, from scratch, an annotated corpus. We show that quite
high accuracy can be achieved with our system in this situation.Postprint (published version
- …