3 research outputs found

    Explainable Publication Year Prediction of Eighteenth Century Texts with the BERT Model

    In this paper, we describe a BERT model trained on the Eighteenth Century Collections Online (ECCO) dataset of digitized documents. The ECCO dataset poses unique modelling challenges due to the presence of Optical Character Recognition (OCR) artifacts. We establish the performance of the BERT model on a publication year prediction task against linear baseline models and human judgement, finding the BERT model to be superior to both and able to date the works with an average absolute error of less than 7 years. We also explore how language change over time affects the model by analyzing the features the model uses for publication year predictions, as given by the Integrated Gradients model explanation method. Peer reviewed

    Probabilistic Analysis of Early Modern British Book Prices

    Books are a valuable exception to the general rule that quantitative information about early modern history is scarce, as their survival rate during the period has varied from the low to the high tens of percent, and descriptive information summarizing their properties has been collected in library catalogues. However, one critical element that is essential for the numeric characterisation of a print product is most often missing: its price. In this paper, we use an exceptionally large data set of price information extracted from the English Short Title Catalogue (ESTC) for the early modern period to train a probabilistic model that predicts the price of a print product based on its physical properties. Our results suggest that the simple physical properties of print products alone can explain a significant proportion of the variation in prices. We use the model to quantitatively address the debated question about the development of print product prices in eighteenth-century Britain. We interpret the predictions of the model as a data-driven narrative, and many of the developments it brings up can be readily linked with the relevant historical literature. Peer reviewed