    Methods for Amharic part-of-speech tagging

    The paper describes a set of experiments in which three state-of-the-art part-of-speech taggers are applied to Ethiopian Amharic, using three different tagsets. The taggers performed worse than previously reported results for English, and had particular problems with unknown words. The best results were obtained with a Maximum Entropy approach, while the HMM-based and SVM-based taggers achieved comparable results.

    A Robust Transformation-Based Learning Approach Using Ripple Down Rules for Part-of-Speech Tagging

    In this paper, we propose a new approach to constructing a system of transformation rules for the part-of-speech (POS) tagging task. Our approach is based on an incremental knowledge acquisition method in which rules are stored in an exception structure and new rules are added only to correct the errors of existing rules, which allows systematic control of the interaction between the rules. Experimental results on 13 languages show that our approach is fast in terms of both training time and tagging speed. Furthermore, it obtains very competitive accuracy in comparison to state-of-the-art POS and morphological taggers. Comment: Version 1: 13 pages. Version 2: Submitted to AI Communications - the European Journal on Artificial Intelligence. Version 3: Resubmitted after major revisions. Version 4: Resubmitted after minor revisions. Version 5: to appear in AI Communications (accepted for publication on 3/12/2015).
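The exception structure described above can be sketched as a ripple-down-rule tree in which every node carries an "except" branch (tried when the rule fires) and an "if-not" branch (tried when it does not), and a new rule is attached only where the evaluation path for a misclassified token ended. This is a minimal illustration, not the authors' implementation: the context keys (`word`, `initial_tag`, `prev_tag`), the helper names, and the rule conditions are hypothetical, and the rules are written by hand here rather than learned from a corpus.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Optional

Context = dict  # hypothetical per-token features, e.g. word, initial_tag, prev_tag


@dataclass
class Node:
    """One rule: if `condition` holds, conclude `tag` (None = keep the initial tag)."""
    condition: Callable[[Context], bool]
    tag: Optional[str]
    except_child: Optional[Node] = None  # tried only when this rule fires
    if_not_child: Optional[Node] = None  # tried only when this rule does not fire


def evaluate(root: Node, ctx: Context) -> tuple[Optional[str], Node, bool]:
    """Return the last fired conclusion, the last node visited, and whether it fired."""
    conclusion, last, fired = root.tag, root, True  # the root is a default rule that always fires
    node = root.except_child
    while node is not None:
        last = node
        if node.condition(ctx):
            conclusion, fired = node.tag, True
            node = node.except_child
        else:
            fired = False
            node = node.if_not_child
    return conclusion, last, fired


def tag(root: Node, ctx: Context) -> str:
    conclusion, _, _ = evaluate(root, ctx)
    return ctx["initial_tag"] if conclusion is None else conclusion


def add_rule(root: Node, ctx: Context, condition: Callable[[Context], bool], correct: str) -> Node:
    """Call only when the tree mis-tags `ctx`: attach the fix where the path ended."""
    _, last, fired = evaluate(root, ctx)
    new = Node(condition, correct)
    if fired:
        last.except_child = new  # exception to the rule that produced the wrong tag
    else:
        last.if_not_child = new  # alternative to the rule that did not fire
    return new


root = Node(lambda c: True, None)  # default rule: keep the initial tag
sent1 = {"word": "run", "initial_tag": "NN", "prev_tag": "PRP"}
add_rule(root, sent1, lambda c: c["initial_tag"] == "NN" and c["prev_tag"] == "PRP", "VB")
sent2 = {"word": "run", "initial_tag": "NN", "prev_tag": "TO"}
add_rule(root, sent2, lambda c: c["initial_tag"] == "NN" and c["prev_tag"] == "TO", "VB")
sent3 = {"word": "run", "initial_tag": "NN", "prev_tag": "DT"}
print(tag(root, sent1), tag(root, sent2), tag(root, sent3))  # VB VB NN
```

Because each new rule only sees the tokens that reach its position in the tree, adding it cannot change the output of rules elsewhere, which is one way to read the "systematic control of the interaction between the rules" claimed above.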

    Automatic correction of part-of-speech corpora

    In this study, a simple method for the automatic correction of part-of-speech corpora is presented. It works as follows: first, two or more existing part-of-speech taggers are applied to the data. Then a sample of the positions where their outputs differ is used to train a classifier that predicts, for each difference, which of the taggers (if any) delivered the correct output. As classifiers we employed instance-based learning, a C4.5 decision tree, and a Bayesian classifier; their performance ranged from 59.1% to 67.3%. Finally, training on the automatically corrected data led to significant improvements in tagger performance.
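The correction loop above can be sketched for two taggers as follows. This is an illustration under stated assumptions: the classifier here is a simple majority vote per `(tag_a, tag_b)` pair, standing in for the instance-based, C4.5, and Bayesian learners used in the paper; the features, the `"A"`/`"B"`/`"NONE"` labels, the helper names, and the choice to keep tagger A's tag when neither tagger is predicted correct are all hypothetical.

```python
from collections import Counter, defaultdict


def differing_positions(tags_a, tags_b):
    """Token indices where the two taggers disagree."""
    return [i for i, (a, b) in enumerate(zip(tags_a, tags_b)) if a != b]


def label_difference(gold, a, b):
    """Training label: which tagger (if any) matches the gold tag."""
    if gold == a:
        return "A"
    if gold == b:
        return "B"
    return "NONE"


class DifferenceClassifier:
    """Majority label per (tag_a, tag_b) pair, backing off to the overall majority label."""

    def __init__(self):
        self.by_pair = defaultdict(Counter)
        self.prior = Counter()

    def fit(self, samples):
        for a, b, label in samples:  # samples: (tag_a, tag_b, label)
            self.by_pair[(a, b)][label] += 1
            self.prior[label] += 1
        return self

    def predict(self, a, b):
        dist = self.by_pair.get((a, b)) or self.prior
        return dist.most_common(1)[0][0]


def correct(tags_a, tags_b, clf):
    """Keep tagger A's tag unless the classifier says tagger B was right."""
    out = list(tags_a)
    for i in differing_positions(tags_a, tags_b):
        if clf.predict(tags_a[i], tags_b[i]) == "B":
            out[i] = tags_b[i]
    return out


# Tiny hand-made training sample: gold tags are only needed at the differing positions.
gold = ["DT", "NN", "VBZ", "RB"]
a = ["DT", "NN", "NNS", "RB"]
b = ["DT", "JJ", "VBZ", "RB"]
samples = [(a[i], b[i], label_difference(gold[i], a[i], b[i])) for i in differing_positions(a, b)]
clf = DifferenceClassifier().fit(samples)
print(correct(["PRP", "NNS"], ["PRP", "VBZ"], clf))  # ['PRP', 'VBZ']
```

Only the disagreements need manual annotation, which is what keeps the training sample small; the corrected corpus can then be used to retrain a tagger.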

    Bootstrapping a Tagged Corpus through Combination of Existing Heterogeneous Taggers

    This paper describes a new method, Combi-bootstrap, that exploits existing taggers and lexical resources to annotate corpora with new tagsets. Combi-bootstrap uses the existing resources as features for a second-level machine learning module, which is trained on a very small sample of annotated corpus material to map to the new tagset. Experiments show that Combi-bootstrap (i) can integrate a wide variety of existing resources, and (ii) achieves much higher accuracy (up to 44.7% error reduction) than both the best single tagger and an ensemble tagger constructed from the same small training sample. Comment: 4 pages
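The second-level mapping described above can be sketched as a classifier whose features are the outputs of existing taggers (each in its own tagset) and whose labels come from the new tagset. The abstract does not name the learner, so a 1-nearest-neighbour learner with an overlap distance is used here purely for brevity; the feature layout, both source tagsets, and the target tags (`NOUN-SG`, `VERB-3SG`, and so on) are hypothetical.

```python
def overlap_distance(x, y):
    """Number of feature positions on which two instances differ."""
    return sum(a != b for a, b in zip(x, y))


class OverlapNearestNeighbour:
    """1-nearest-neighbour classifier with the overlap metric (ties go to the earliest instance)."""

    def fit(self, X, y):
        self.X, self.y = list(X), list(y)
        return self

    def predict(self, x):
        best = min(range(len(self.X)), key=lambda i: overlap_distance(self.X[i], x))
        return self.y[best]


# Features per token: (word, output of existing tagger 1, output of existing tagger 2).
# Both source tagsets and the target tagset are hypothetical.
X_train = [
    ("the", "DT", "ART"),
    ("dog", "NN", "N"),
    ("runs", "VBZ", "V"),
    ("dogs", "NNS", "N"),
]
y_train = ["DET", "NOUN-SG", "VERB-3SG", "NOUN-PL"]

mapper = OverlapNearestNeighbour().fit(X_train, y_train)
print([mapper.predict(x) for x in [("cat", "NN", "N"), ("cats", "NNS", "N"), ("walks", "VBZ", "V")]])
# ['NOUN-SG', 'NOUN-PL', 'VERB-3SG']
```

Unseen words such as "cat" are still mapped correctly because the existing taggers' outputs carry most of the evidence; this is the intuition behind using them as features rather than training a new tagger from the small sample alone.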

    Improving Data Driven Part-of-Speech Tagging by Morphologic Knowledge Induction

    We present a Markov part-of-speech tagger in which the emission probabilities P(w|t) of word w given tag t are replaced by a linear interpolation of tag emission probabilities given a list of representations of w. As word representations, string suffixes of w are cut off at the local maxima of the Normalized Backward Successor Variety. This procedure derives linguistically meaningful string suffixes that may relate to certain POS labels. Since no linguistic knowledge is needed, the procedure is language-independent. Our model significantly outperforms basic Markov-model part-of-speech taggers.
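The emission model above amounts to P(w|t) ≈ Σ_i λ_i · P(r_i|t), with Σ_i λ_i = 1, where r_1 … r_n are the representations of w. The sketch below is a rough illustration, not the paper's model. It uses the raw backward successor variety (the number of distinct characters that precede a suffix across a lexicon), because the abstract does not give the normalization. It also assumes uniform weights λ_i, includes the full word form as the first representation, and estimates P(r|t) as the fraction of tokens tagged t that have representation r. All class and function names are hypothetical.

```python
from collections import Counter, defaultdict


def backward_successor_variety(suffix, lexicon):
    """Count distinct characters that immediately precede `suffix` in lexicon words."""
    n = len(suffix)
    return len({v[-n - 1] for v in lexicon if len(v) > n and v.endswith(suffix)})


def suffix_representations(word, lexicon):
    """Suffixes of `word` cut at strict local maxima of the (unnormalized) variety."""
    bsv = {i: backward_successor_variety(word[i:], lexicon) for i in range(1, len(word))}
    return [
        word[i:]
        for i in bsv
        if bsv[i] > bsv.get(i - 1, -1) and bsv[i] > bsv.get(i + 1, -1)
    ]


class InterpolatedEmission:
    """P(w|t) ~ sum_i lambda_i * P(r_i|t) over representations r_i of w, uniform lambda_i."""

    def __init__(self, lexicon):
        self.lexicon = list(lexicon)
        self.rep_counts = defaultdict(Counter)  # tag -> representation -> token count
        self.tag_counts = Counter()

    def representations(self, word):
        return [word] + suffix_representations(word, self.lexicon)

    def fit(self, tagged_words):
        for word, tag in tagged_words:
            self.tag_counts[tag] += 1
            for r in self.representations(word):
                self.rep_counts[tag][r] += 1
        return self

    def prob(self, word, tag):
        if self.tag_counts[tag] == 0:
            return 0.0
        reps = self.representations(word)
        weight = 1.0 / len(reps)
        return sum(weight * self.rep_counts[tag][r] / self.tag_counts[tag] for r in reps)


lexicon = ["walked", "talked", "jumped", "played", "walking", "talking", "jumping", "walks", "talks"]
model = InterpolatedEmission(lexicon).fit(
    [("walked", "VBD"), ("talked", "VBD"), ("jumped", "VBD"), ("walks", "VBZ"), ("talks", "VBZ")]
)
print(suffix_representations("played", lexicon))  # ['ed']
print(model.prob("played", "VBD"), model.prob("played", "VBZ"))  # 0.5 0.0
```

Even though "played" never occurs in the tagged data, its "-ed" suffix (where the variety peaks at 3) still gives it probability mass under VBD, which is the kind of unknown-word behaviour the interpolation is meant to provide. Note that these scores are not a normalized distribution over words.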