
    A POS-Tagger generator for unknown languages

    It is a common belief that POS-taggers need huge amounts of hand-tagged text for training (on the order of 10^5 pretagged words). In this paper we show how to generate POS-taggers trained on no more than 10^4 hand-tagged words. These taggers achieve precision as good as that of the best-performing state-of-the-art POS-taggers. We overcome the need for a huge training corpus by carefully combining a large lexicon with an efficient neural tagger. Experimental results are presented and discussed for the Susanne Corpus and three different Portuguese corpora. Precision of 96% is obtained when unknown words occur in the test set.
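The abstract does not say how the lexicon and the neural tagger are combined. Below is a minimal sketch of one common scheme, assuming that the lexicon restricts each known word to the tags it lists and that unknown words may take any tag. A hand-set transition table (hypothetical) stands in for the trained neural network, and greedy left-to-right decoding picks the best allowed tag. Every name here (`TAGS`, `LEXICON`, `TRANSITION`, `candidate_tags`, `tag_sentence`, `precision`) is illustrative and is not the authors' code. Precision is computed as the fraction of correctly tagged tokens, which assumes exactly one tag per token.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical toy tag set; the paper's tag sets are not given in the abstract.
TAGS: List[str] = ["DET", "NOUN", "VERB", "ADJ"]

# Hypothetical lexicon: each known word lists the only tags it may take.
LEXICON: Dict[str, Set[str]] = {
    "the": {"DET"},
    "dog": {"NOUN", "VERB"},
    "runs": {"NOUN", "VERB"},
    "fast": {"ADJ"},
}

# Toy stand-in for a trained tagger's scores: preference for a tag given the previous tag.
TRANSITION: Dict[Tuple[str, str], float] = {
    ("<s>", "DET"): 1.0,
    ("DET", "NOUN"): 1.0,
    ("DET", "ADJ"): 0.5,
    ("NOUN", "VERB"): 1.0,
    ("VERB", "ADJ"): 0.8,
    ("ADJ", "NOUN"): 0.9,
}


def candidate_tags(word: str) -> List[str]:
    """Return the tags the lexicon allows for `word`; unknown words may take any tag."""
    allowed: Optional[Set[str]] = LEXICON.get(word.lower())
    if allowed is None:
        return list(TAGS)
    return [t for t in TAGS if t in allowed]


def tag_sentence(words: List[str]) -> List[str]:
    """Greedy left-to-right tagging that chooses only among lexicon-allowed tags."""
    prev = "<s>"
    tags: List[str] = []
    for word in words:
        candidates = candidate_tags(word)
        # max() keeps the first candidate on ties, so the order of TAGS breaks ties.
        best = max(candidates, key=lambda t: TRANSITION.get((prev, t), 0.0))
        tags.append(best)
        prev = best
    return tags


def precision(predicted: List[str], gold: List[str]) -> float:
    """Fraction of tokens whose predicted tag matches the gold tag."""
    if len(predicted) != len(gold) or not gold:
        raise ValueError("need equal-length, non-empty tag sequences")
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)


if __name__ == "__main__":
    print(tag_sentence(["the", "dog", "runs", "fast"]))  # ['DET', 'NOUN', 'VERB', 'ADJ']
```

In this sketch the lexicon narrows the choice the learned scorer has to make for every known word, leaving only the genuinely ambiguous cases and unknown words to the model. This is one plausible reading of why a large lexicon could reduce the amount of hand-tagged text needed, not a description of the paper's method.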