A Robust Transformation-Based Learning Approach Using Ripple Down Rules for Part-of-Speech Tagging
In this paper, we propose a new approach to construct a system of
transformation rules for the Part-of-Speech (POS) tagging task. Our approach is
based on an incremental knowledge acquisition method where rules are stored in
an exception structure and new rules are only added to correct the errors of
existing rules; thus allowing systematic control of the interaction between the
rules. Experimental results on 13 languages show that our approach is fast in
terms of training time and tagging speed. Furthermore, our approach obtains
very competitive accuracy in comparison to state-of-the-art POS and
morphological taggers.
Comment: Version 1: 13 pages. Version 2: Submitted to AI Communications - the
European Journal on Artificial Intelligence. Version 3: Resubmitted after
major revisions. Version 4: Resubmitted after minor revisions. Version 5: to
appear in AI Communications (accepted for publication on 3/12/2015).
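The exception structure mentioned in the abstract above can be pictured with a small sketch. This is only an illustration of the general idea that a rule's conclusion holds unless one of its exception rules fires; the `Rule` class, its fields, and the toy tags are hypothetical and are not taken from the paper or its software.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Rule:
    """Hypothetical rule node: a condition, a tag, and rules that override it."""
    condition: Callable[[dict], bool]
    tag: str
    exceptions: List["Rule"] = field(default_factory=list)

    def apply(self, context: dict) -> Optional[str]:
        # A rule that does not fire says nothing about this context.
        if not self.condition(context):
            return None
        # An exception that fires corrects this rule's conclusion.
        for exc in self.exceptions:
            result = exc.apply(context)
            if result is not None:
                return result
        return self.tag


# Toy rule (assumed): tag the word "can" as a modal verb ...
root = Rule(lambda c: c["word"] == "can", "MD")
# ... and later add an exception only to correct an observed error:
# after a determiner, "can" is a noun.
root.exceptions.append(Rule(lambda c: c["prev_tag"] == "DT", "NN"))

tag_after_pronoun = root.apply({"word": "can", "prev_tag": "PRP"})
tag_after_determiner = root.apply({"word": "can", "prev_tag": "DT"})
```

Because the exception is only consulted when its parent rule has fired, adding it cannot change how any other rule behaves, which is one way to read the abstract's "systematic control of the interaction between the rules".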
Structured Prediction of Sequences and Trees using Infinite Contexts
Linguistic structures exhibit a rich array of global phenomena; however,
commonly used Markov models are unable to adequately describe these phenomena
due to their strong locality assumptions. We propose a novel hierarchical model
for structured prediction over sequences and trees which exploits global
context by conditioning each generation decision on an unbounded context of
prior decisions. This builds on the success of Markov models but without
imposing a fixed bound in order to better represent global phenomena. To
facilitate learning of this large and unbounded model, we use a hierarchical
Pitman-Yor process prior which provides a recursive form of smoothing. We
propose prediction algorithms based on A* and Markov Chain Monte Carlo
sampling. Empirical results demonstrate the potential of our model compared to
baseline finite-context Markov models on part-of-speech tagging and syntactic
parsing.
Part of Speech Tagging of Marathi Text Using Trigram Method
In this paper we present a part-of-speech tagger for Marathi, a morphologically rich language spoken by the native people of Maharashtra. The tagger follows a statistical approach based on the trigram method. The main idea of the trigram method is to find the most likely POS tag for a token given the previous two tags, calculating probabilities to determine the best tag sequence. We describe the development of the tagger and also present its evaluation.
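As a rough illustration of the trigram idea described above, the following sketch estimates the probability of a tag given the two preceding tags from relative frequencies. It is a minimal example under stated assumptions: the toy English corpus, the tag names, the `<s>` padding symbol, and the function names are all hypothetical, and it covers only the tag-transition estimate, not the search for the best tag sequence or the authors' actual tagger.

```python
from collections import Counter


def train_trigram(tagged_sentences):
    """Count tag trigrams and their two-tag contexts, padding with start symbols."""
    trigrams, contexts = Counter(), Counter()
    for sentence in tagged_sentences:
        tags = ["<s>", "<s>"] + [tag for _, tag in sentence]
        for i in range(2, len(tags)):
            trigrams[(tags[i - 2], tags[i - 1], tags[i])] += 1
            contexts[(tags[i - 2], tags[i - 1])] += 1
    return trigrams, contexts


def trigram_prob(trigrams, contexts, t2, t1, t):
    """Relative-frequency estimate of P(t | t2, t1); 0.0 for an unseen context."""
    seen = contexts[(t2, t1)]
    return trigrams[(t2, t1, t)] / seen if seen else 0.0


# Toy corpus (assumed data, for illustration only).
corpus = [
    [("the", "DET"), ("dog", "NOUN"), ("barks", "VERB")],
    [("the", "DET"), ("cat", "NOUN"), ("sleeps", "VERB")],
    [("the", "DET"), ("old", "ADJ"), ("dog", "NOUN")],
]
trigrams, contexts = train_trigram(corpus)

# Probability that a sentence-initial determiner is followed by a noun.
p_noun = trigram_prob(trigrams, contexts, "<s>", "DET", "NOUN")
```

A practical tagger would also smooth these estimates and combine them with word emission probabilities; both are omitted here to keep the sketch short.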
Improving Data Driven Part-of-Speech Tagging by Morphologic Knowledge Induction
We present a Markov part-of-speech tagger for which the P(w|t) emission probabilities of word w given tag t are replaced by a linear interpolation of tag emission probabilities given a list of representations of w. As word representations, string suffixes of w are cut off at the local maxima of the Normalized Backward Successor Variety. This procedure allows for the derivation of linguistically meaningful string suffixes that may relate to certain POS labels. Since no linguistic knowledge is needed, the procedure is language independent. Basic Markov model part-of-speech taggers are significantly outperformed by our model.
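The interpolation described above can be sketched as a weighted sum of per-representation emission estimates. This is a hedged illustration only: `toy_representations` is a hypothetical stand-in that simply takes the word and its last one to three characters (it does not compute the Normalized Backward Successor Variety), the uniform weights are an assumption, and the toy data and function names are invented for the example.

```python
from collections import Counter, defaultdict


def toy_representations(word):
    """Hypothetical stand-in: the full word plus its last 3, 2 and 1 characters."""
    return [word] + [word[-k:] for k in (3, 2, 1) if len(word) > k]


def train_emissions(tagged_words, representations):
    """Count, per tag, how often each representation occurs among its words."""
    counts = defaultdict(Counter)  # tag -> representation -> count
    tag_totals = Counter()         # tag -> number of training words
    for word, tag in tagged_words:
        tag_totals[tag] += 1
        for rep in representations(word):
            counts[tag][rep] += 1
    return counts, tag_totals


def interpolated_emission(word, tag, counts, tag_totals, representations, weights=None):
    """Linear interpolation of P(rep | tag) over the representations of word."""
    reps = representations(word)
    if weights is None:
        # Assumption: uniform interpolation weights.
        weights = [1.0 / len(reps)] * len(reps)
    total = tag_totals[tag]
    if total == 0:
        return 0.0
    return sum(lam * counts[tag][rep] / total for lam, rep in zip(weights, reps))


# Toy training data (assumed, for illustration only).
data = [("walking", "VERB"), ("talking", "VERB"), ("table", "NOUN"), ("king", "NOUN")]
counts, tag_totals = train_emissions(data, toy_representations)

# "jumping" was never seen, but its suffixes still carry evidence.
p_verb = interpolated_emission("jumping", "VERB", counts, tag_totals, toy_representations)
p_noun = interpolated_emission("jumping", "NOUN", counts, tag_totals, toy_representations)
```

In this toy setting the unseen word scores higher under VERB than under NOUN because the suffix "ing" is more strongly associated with verbs in the training data, which is the kind of effect the abstract attributes to suffix-based representations.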