Training Set Properties and Decision-Tree Taggers: A Closer Look
This paper examines three ways to improve part-of-speech tagging accuracy: increasing the number of training examples presented to the tree learner, increasing the number of word-specific subtrees grown, and increasing the number of n-grams (preceding parts of speech) per training example. Experimental results indicate that additional training data generally yields the greatest improvement in accuracy, but they also demonstrate that including word-specific subtrees can be useful and that trees considering two or more previous parts of speech in their classification decision are superior to those examining just one.
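
The abstract does not specify how training examples are encoded. As a rough illustration only, the sketch below shows one way a training example could carry a variable number of preceding part-of-speech tags as context for a tree learner. The function name `make_examples`, the `"<s>"` padding symbol, the lowercasing of words, and the toy sentence are all assumptions made for this example, not details taken from the paper.

```python
from typing import List, Tuple

# Hypothetical representation: (features, gold tag), where features are
# the current word followed by the preceding tags, nearest first.
Example = Tuple[Tuple[str, ...], str]


def make_examples(sentence: List[Tuple[str, str]], n_prev: int) -> List[Example]:
    """Turn one tagged sentence into (features, tag) training examples.

    Positions before the start of the sentence are padded with "<s>"
    (an assumed convention, not one stated in the abstract).
    """
    tags = [tag for _, tag in sentence]
    examples: List[Example] = []
    for i, (word, tag) in enumerate(sentence):
        # Collect the n_prev tags that precede position i, nearest first.
        prev = [tags[i - k] if i - k >= 0 else "<s>" for k in range(1, n_prev + 1)]
        examples.append(((word.lower(), *prev), tag))
    return examples


# Toy tagged sentence, invented for illustration.
sentence = [("The", "DT"), ("dog", "NN"), ("barks", "VBZ")]
one_prev = make_examples(sentence, 1)  # one preceding tag per example
two_prev = make_examples(sentence, 2)  # two preceding tags per example
```

With `n_prev=1` the example for "barks" sees only the tag `NN`; with `n_prev=2` it also sees `DT`. That extra context is the kind the abstract reports as beneficial when trees consider two or more previous parts of speech.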