Rule-Based Normalisation of Historical Text – a Diachronic Study

Authors: Beata Megyesi, Joakim Nivre, Eva Pettersson
Publication venue: Wien : Österreichische Gesellschaft für Artificial Intelligence (ÖGAI)
Publication date: 01/01/2012

Language technology tools can be very useful for making information concealed in historical documents more easily accessible to historians, linguists and other researchers in humanities. For many languages, there is however a lack of linguistically annotated historical data that could be used for training NLP tools adapted to historical text. One way of avoiding the data sparseness problem in this context is to normalise the input text to a more modern spelling, before applying NLP tools trained on contemporary corpora. In this paper, we explore the impact of a set of hand-crafted normalisation rules on Swedish texts ranging from 1527 to 1812. Normalisation accuracy as well as tagging and parsing performance are evaluated. We show that, even though the rules were generated on the basis of one 17th century text sample, the rules are applicable to all texts, regardless of time period and text genre. This clearly indicates that spelling correction is a useful strategy for applying contemporary NLP tools to historical text.
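
The normalisation step described in the abstract can be pictured as an ordered list of rewrite rules applied to each token before a contemporary tagger or parser sees it. The following is a minimal sketch only: the three rules, the names `RULES`, `normalise` and `normalise_text`, and the lowercasing step are illustrative assumptions based on well-known older Swedish spelling conventions (hv, fv and w for modern v), not the rule set used in the paper.

```python
import re

# Illustrative rewrite rules (an assumption, NOT the paper's rule set).
# Each entry is (compiled pattern, replacement); rules are applied in order.
RULES = [
    (re.compile(r"hv"), "v"),  # hvad  -> vad
    (re.compile(r"w"), "v"),   # wara  -> vara
    (re.compile(r"fv"), "v"),  # öfver -> över
]


def normalise(token: str) -> str:
    """Lowercase a token and apply every rewrite rule in sequence."""
    out = token.lower()
    for pattern, replacement in RULES:
        out = pattern.sub(replacement, out)
    return out


def normalise_text(text: str) -> str:
    """Normalise a whitespace-tokenised text, token by token."""
    return " ".join(normalise(tok) for tok in text.split())


if __name__ == "__main__":
    # The normalised string is what a tagger trained on modern text would receive.
    print(normalise_text("hvad wara öfver"))
```

Rule order matters in a sketch like this: because the w rule runs before the fv rule, a spelling such as "hafwa" is first rewritten to "hafva" and then to "hava".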

Publikationer från Uppsala Universitet

Digitala Vetenskapliga Arkivet - Academic Archive On-line
