A comparative evaluation of statistical part-of-speech taggers for Russian

Gareev R.; Ivanov V.

Authors: Gareev R., Ivanov V.
Publication date: 1 January 2015

Abstract

© Springer International Publishing Switzerland 2015. Part-of-speech (POS) tagging is an essential step in many text processing applications. Quite a few works address this task for Russian, but their results are not directly comparable because shared datasets and tools are lacking. We propose a POS tagging evaluation framework for Russian built from existing third-party resources that are available to researchers. We applied the framework to compare several implementations of statistical classifiers: HunPos, the Stanford POS tagger, the OpenNLP implementation of the MaxEnt Markov Model, and our own reimplementation of Tiered Conditional Random Fields. The best tagger, trained on a corpus of fewer than one million words, achieved an accuracy above 93%. We expect that the evaluation framework will facilitate future studies and improvements in POS tagging for Russian.
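
The accuracy figure quoted in the abstract is presumably the usual token-level measure: the share of tokens whose predicted tag matches the gold-standard tag. The record does not say how the framework computes it, so the following is only a minimal sketch under that assumption. The function name `token_accuracy` and the tag labels in the toy data are hypothetical and are not taken from the paper.

```python
def token_accuracy(gold, predicted):
    """Return the fraction of tokens whose predicted tag equals the gold tag.

    Both arguments are lists of sentences; each sentence is a list of tags.
    """
    if len(gold) != len(predicted):
        raise ValueError("sentence counts differ")
    correct = 0
    total = 0
    for gold_tags, pred_tags in zip(gold, predicted):
        if len(gold_tags) != len(pred_tags):
            raise ValueError("token counts differ within a sentence")
        # Count positions where the tagger agrees with the gold annotation.
        correct += sum(g == p for g, p in zip(gold_tags, pred_tags))
        total += len(gold_tags)
    if total == 0:
        raise ValueError("no tokens to score")
    return correct / total


# Toy data with hypothetical tag labels (not the tagset used in the paper).
gold = [["NOUN", "VERB", "ADJ"], ["PRON", "VERB"]]
pred = [["NOUN", "VERB", "NOUN"], ["PRON", "VERB"]]
accuracy = token_accuracy(gold, pred)  # 4 of 5 tokens match
```

Scoring every tagger with the same function on the same held-out split is what would make the reported numbers comparable across implementations.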

Available Versions

Kazan Federal University Digital Repository

oai:dspace.kpfu.ru:net/140105

Last time updated on 07/05/2019