Datasets for toponym recognition and disambiguation for nineteenth-century English newspapers

Abstract

We present two datasets: one for toponym recognition and one for toponym disambiguation. Both are derived from the "Dataset for Toponym Resolution in Nineteenth-Century English Newspapers" (DOI: https://doi.org/10.23636/r7d4-kw08). The toponym recognition dataset consists of two JSON files (ner_fine_train.json and ner_fine_dev.json), whereas the toponym disambiguation dataset is provided as a single TSV file (linking_df_split.tsv).
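
As a starting point for working with the three files, the following minimal sketch reads them using only the Python standard library. The abstract does not describe the internal layout of the files, so several things here are assumptions: that the JSON files hold either a single JSON document or one JSON object per line (JSON Lines), that the TSV file starts with a header row, and that it follows default CSV quoting rules. The helper names load_json_records and load_tsv_rows are hypothetical and not part of the dataset.

```python
import csv
import json
from pathlib import Path


def load_json_records(path):
    """Return the records in a JSON file as a list.

    Assumption: the file is either one JSON document or JSON Lines
    (one object per line); the abstract does not say which.
    """
    text = Path(path).read_text(encoding="utf-8")
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        # Fall back to JSON Lines: parse each non-empty line separately.
        return [json.loads(line) for line in text.splitlines() if line.strip()]
    # Wrap a single top-level object so callers always get a list.
    return data if isinstance(data, list) else [data]


def load_tsv_rows(path):
    """Return the rows of a tab-separated file as dictionaries.

    Assumption: the first line is a header row naming the columns.
    """
    with open(path, encoding="utf-8", newline="") as handle:
        return list(csv.DictReader(handle, delimiter="\t"))
```

With the files in the working directory, load_json_records("ner_fine_train.json"), load_json_records("ner_fine_dev.json"), and load_tsv_rows("linking_df_split.tsv") would return the records of each file; inspecting the keys of the first record is a quick way to check the assumptions above against the actual data.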
