
    Rule-Based Normalisation of Historical Text – a Diachronic Study

    Language technology tools can be very useful for making information concealed in historical documents more easily accessible to historians, linguists and other researchers in humanities. For many languages, there is however a lack of linguistically annotated historical data that could be used for training NLP tools adapted to historical text. One way of avoiding the data sparseness problem in this context is to normalise the input text to a more modern spelling, before applying NLP tools trained on contemporary corpora. In this paper, we explore the impact of a set of hand-crafted normalisation rules on Swedish texts ranging from 1527 to 1812. Normalisation accuracy as well as tagging and parsing performance are evaluated. We show that, even though the rules were generated on the basis of one 17th century text sample, the rules are applicable to all texts, regardless of time period and text genre. This clearly indicates that spelling correction is a useful strategy for applying contemporary NLP tools to historical text.
