Methods for Amharic part-of-speech tagging

Authors: Argaw Atelach Alemu; Asker Lars; Gambäck Björn; Olsson Fredrik
Publication date: 01/01/2009

The paper describes a set of experiments involving the application of three state-of-the-art part-of-speech taggers to Ethiopian Amharic, using three different tagsets. The taggers showed worse performance than previously reported results for English, in particular having problems with unknown words. The best results were obtained using a Maximum Entropy approach, while HMM-based and SVM-based taggers got comparable results.

Crossref

Publikationer från Stockholms universitet

RISE – Research Institutes of Sweden

Digitala Vetenskapliga Arkivet - Academic Archive On-line

Swedish Institute of Computer Science Publications Database

A Robust Transformation-Based Learning Approach Using Ripple Down Rules for Part-of-Speech Tagging

Authors: Nguyen Dai Quoc; Nguyen Dat Quoc; Pham Dang Duc; Pham Son Bao
Publication venue: IOS Press
Publication date: 19/12/2015

In this paper, we propose a new approach to construct a system of transformation rules for the Part-of-Speech (POS) tagging task. Our approach is based on an incremental knowledge acquisition method where rules are stored in an exception structure and new rules are only added to correct the errors of existing rules, thus allowing systematic control of the interaction between the rules. Experimental results on 13 languages show that our approach is fast in terms of training time and tagging speed. Furthermore, our approach obtains very competitive accuracy in comparison to state-of-the-art POS and morphological taggers.

Comment: Version 1: 13 pages. Version 2: Submitted to AI Communications - the European Journal on Artificial Intelligence. Version 3: Resubmitted after major revisions. Version 4: Resubmitted after minor revisions. Version 5: to appear in AI Communications (accepted for publication on 3/12/2015).
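The exception structure described in this abstract can be illustrated with a minimal sketch. It assumes a single-classification ripple-down-rules tree in which each rule has an "except" child (consulted when the rule fires) and an "else" child (consulted when it does not). The `Rule` class, the context keys and the three example rules are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional

Context = Dict[str, str]  # hypothetical keys: "word", "tag" (initial tag), "prev_tag"


@dataclass
class Rule:
    condition: Callable[[Context], bool]
    conclusion: Optional[str]             # None means "keep the initial tag"
    except_rule: Optional["Rule"] = None  # refines this rule when it fires
    else_rule: Optional["Rule"] = None    # tried when this rule does not fire


def classify(root: Rule, ctx: Context) -> str:
    """Return the conclusion of the last rule that fired on the path through the tree."""
    result = ctx["tag"]
    node = root
    while node is not None:
        if node.condition(ctx):
            if node.conclusion is not None:
                result = node.conclusion
            node = node.except_rule
        else:
            node = node.else_rule
    return result


# Illustrative rules only. Each new rule is attached as an exception to the
# rule whose error it corrects, so it cannot affect cases outside that rule.
ing_rule = Rule(lambda c: c["word"].endswith("ing"), "VBG")
noun_rule = Rule(lambda c: c["tag"] == "VB" and c["prev_tag"] == "DT", "NN",
                 except_rule=ing_rule)
root = Rule(lambda c: True, None, except_rule=noun_rule)

print(classify(root, {"word": "run", "tag": "VB", "prev_tag": "DT"}))      # NN
print(classify(root, {"word": "run", "tag": "VB", "prev_tag": "TO"}))      # VB
print(classify(root, {"word": "running", "tag": "VB", "prev_tag": "DT"}))  # VBG
```

Because `ing_rule` is only reachable through `noun_rule`, it can only change cases that `noun_rule` already covers, which is one way to read the abstract's "systematic control of the interaction between the rules".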

arXiv.org e-Print Archive

Macquarie University ResearchOnline

Data-driven part-of-speech tagging of Kiswahili

Authors: De Pauw G; de Schryver Gilles-Maurice; Wagacha PW
Publication date: 01/01/2006

Ghent University Academic Bibliography

Archivsystem Ask23

Automatic correction of part-of-speech corpora

Authors: Bucar Shigemori Lia Saki; Reichel Uwe D.
Publication date: 01/01/2008

In this study a simple method for automatic correction of part-of-speech corpora is presented, which works as follows: initially, two or more already available part-of-speech taggers are applied to the data. Then a sample of differing outputs is taken to train a classifier to predict, for each difference, which of the taggers (if any) delivered the correct output. As classifiers we employed instance-based learning, a C4.5 decision tree and a Bayesian classifier. Their performances ranged from 59.1 % to 67.3 %. Training on the automatically corrected data finally led to significant improvements in tagger performance.
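The procedure in this abstract can be sketched as follows. As a loudly labelled simplification, the classifier is a plain frequency table keyed on the pair of disagreeing tags. It stands in for the instance-based, C4.5 and Bayesian learners named in the abstract and is not the authors' method. The function names and toy tag sequences are hypothetical.

```python
from collections import Counter, defaultdict


def train_arbiter(tags_a, tags_b, gold):
    """Learn, for each pair of disagreeing tags, which tagger is usually right."""
    votes = defaultdict(Counter)
    for a, b, g in zip(tags_a, tags_b, gold):
        if a == b:
            continue  # only disagreements are used as training samples
        winner = "A" if g == a else "B" if g == b else "none"
        votes[(a, b)][winner] += 1
    return {pair: counts.most_common(1)[0][0] for pair, counts in votes.items()}


def correct(tags_a, tags_b, arbiter):
    """Merge two tag sequences, letting the arbiter settle disagreements."""
    merged = []
    for a, b in zip(tags_a, tags_b):
        if a == b:
            merged.append(a)
        else:
            # Assumption: fall back to tagger A for unseen pairs and for "none".
            merged.append(b if arbiter.get((a, b), "A") == "B" else a)
    return merged


# Toy data: a small hand-checked sample on which the two taggers were run.
arbiter = train_arbiter(
    tags_a=["NN", "VB", "NN", "JJ"],
    tags_b=["NN", "NN", "VB", "JJ"],
    gold=["NN", "NN", "VB", "JJ"],
)
print(correct(["DT", "VB", "JJ"], ["DT", "NN", "NN"], arbiter))  # ['DT', 'NN', 'JJ']
```

A real classifier would use context features around each disagreement in addition to the tag pair itself.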

Open Access LMU

Bootstrapping a Tagged Corpus through Combination of Existing Heterogeneous Taggers

Authors: Daelemans Walter; Zavrel Jakub
Publication date: 01/01/2000

This paper describes a new method, Combi-bootstrap, to exploit existing taggers and lexical resources for the annotation of corpora with new tagsets. Combi-bootstrap uses existing resources as features for a second-level machine learning module that is trained to make the mapping to the new tagset on a very small sample of annotated corpus material. Experiments show that Combi-bootstrap: i) can integrate a wide variety of existing resources, and ii) achieves much higher accuracy (up to 44.7 % error reduction) than both the best single tagger and an ensemble tagger constructed out of the same small training sample.

Comment: 4 pages
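The second-level idea in this abstract, where outputs of existing resources become features for a learner that predicts the new tagset, can be sketched under stated assumptions. The learner here is a 1-nearest-neighbour lookup with an overlap distance, chosen only for brevity. The abstract does not name the learner, and the feature values and new-tagset labels below are invented for illustration.

```python
def overlap_distance(x, y):
    """Count the feature positions at which two examples differ."""
    return sum(1 for a, b in zip(x, y) if a != b)


def predict_new_tag(features, training):
    """Return the new-tagset label of the closest training example.

    `training` is a list of (features, new_tag) pairs; each feature tuple holds
    the outputs of the existing taggers and lexicons for one token.
    """
    _, best_tag = min(training, key=lambda ex: overlap_distance(features, ex[0]))
    return best_tag


# Hypothetical small annotated sample: (tagger 1 tag, tagger 2 tag, lexicon class).
training = [
    (("NN", "NOUN", "n"), "N-sg"),
    (("NNS", "NOUN", "n"), "N-pl"),
    (("VB", "VERB", "v"), "V-inf"),
]

print(predict_new_tag(("NNS", "NOUN", "v"), training))  # N-pl
```

The existing taggers may each use a different tagset, since their outputs are treated as opaque symbolic features and never have to be mapped onto one another.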

arXiv.org e-Print Archive

CiteSeerX

Institutional Repository Universiteit Antwerpen

Tilburg University Repository

Improving Data Driven Part-of-Speech Tagging by Morphologic Knowledge Induction

Author: Reichel Uwe D.
Publication date: 01/01/2005

We present a Markov part-of-speech tagger for which the P(w|t) emission probabilities of word w given tag t are replaced by a linear interpolation of tag emission probabilities given a list of representations of w. As word representations, string suffixes of w are cut off at the local maxima of the Normalized Backward Successor Variety. This procedure allows for the derivation of linguistically meaningful string suffixes that may relate to certain POS labels. Since no linguistic knowledge is needed, the procedure is language independent. Basic Markov model part-of-speech taggers are significantly outperformed by our model.
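The interpolation step in this abstract can be sketched as follows, with two labelled simplifications. Fixed-length suffixes stand in for the suffixes cut at local maxima of the Normalized Backward Successor Variety. The score interpolates relative frequencies of the tag given each suffix, because the abstract does not give the exact estimator. The weights and toy lexicon are likewise assumptions.

```python
from collections import Counter, defaultdict


def suffixes(word, max_len=3):
    """Fixed-length suffixes, longest first (a stand-in for the paper's segmentation)."""
    return [word[-n:] for n in range(min(max_len, len(word)), 0, -1)]


def train(tagged_words, max_len=3):
    """Count how often each tag occurs with each suffix."""
    counts = defaultdict(Counter)
    for word, tag in tagged_words:
        for s in suffixes(word, max_len):
            counts[s][tag] += 1
    return counts


def interpolated_score(word, tag, counts, weights=(0.5, 0.3, 0.2), max_len=3):
    """Linearly interpolate the per-suffix tag frequencies (weights are assumed)."""
    score = 0.0
    for lam, s in zip(weights, suffixes(word, max_len)):
        tag_counts = counts.get(s)
        if tag_counts:  # unseen suffixes contribute nothing
            score += lam * tag_counts[tag] / sum(tag_counts.values())
    return score


counts = train([("walking", "VBG"), ("talking", "VBG"), ("king", "NN"), ("song", "NN")])
# "singing" is not in the toy lexicon, but its suffixes are.
print(round(interpolated_score("singing", "VBG", counts), 4))  # 0.5833
print(round(interpolated_score("singing", "NN", counts), 4))   # 0.4167
```

Because the score depends only on suffix statistics, the sketch can score words it has never seen, which is the situation where plain word-level emission estimates are weakest.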

CiteSeerX

Open Access LMU