4 research outputs found
A Large-Scale Comparison of Historical Text Normalization Systems
There is no consensus on the state-of-the-art approach to historical text
normalization. Many techniques have been proposed, including rule-based
methods, distance metrics, character-based statistical machine translation, and
neural encoder--decoder models, but studies have used different datasets,
different evaluation methods, and have come to different conclusions. This
paper presents the largest study of historical text normalization done so far.
We critically survey the existing literature and report experiments on eight
languages, comparing systems spanning all categories of proposed normalization
techniques, analysing the effect of training data quantity, and using different
evaluation methods. The datasets and scripts are made publicly available.Comment: Accepted at NAACL 201
LL(O)D and NLP perspectives on semantic change for humanities research
CC BY 4.0This paper presents an overview of the LL(O)D and NLP methods, tools and data for detecting and representing semantic change, with its main application in humanities research. The paper’s aim is to provide the starting point for the construction of a workflow and set of multilingual diachronic ontologies within the humanities use case of the COST Action Nexus Linguarum, European network for Web-centred linguistic data science, CA18209. The survey focuses on the essential aspects needed to understand the current trends and to build applications in this area of study