1 research outputs found

    Application of a POS Tagger to a Novel Chronological Division of Early Modern German Text

    Get PDF
    This paper describes the application of a part-of-speech tagger to a particular configuration of historical German documents. Most natural language processing (NLP) is done on contemporary documents, and historical documents can present difficulties for these tools. I compared the performance of a single high-quality tagger on two stages of historical German (Early Modern German) materials. I used the TnT (Trigrams 'n' Tags) tagger, a probabilistic tagger developed by Thorsten Brants in a 2000 paper. I applied this tagger to two subcorpora which I derived from the University of Manchester's GerManC corpus, divided by date of creation of the original document, with each one used for both training and testing. I found that the earlier half, from a period with greater variability in the language, was significantly more difficult to tag correctly. The broader tag categories of punctuation and "other" were overrepresented in the errors.Master of Science in Information Scienc
    corecore