10 research outputs found

    Detecting Sequential Genre Change in Eighteenth-Century Texts

    Machine classification of historical books into genres is a common task for NLP-based classifiers and has a number of applications, from literary analysis to information retrieval. However, it is not a straightforward task, as genre labels can be ambiguous and subject to temporal change, and moreover many books consist of mixed or miscellaneous genres. In this paper we describe a work-in-progress method by which genre predictions can be used to determine longer sequences of genre change within books, which we test with visualisations of some hand-picked texts. We apply state-of-the-art methods to the task, including a BERT-based transformer and a character-level Perceiver model, both pre-trained on a large collection of eighteenth-century works (ECCO), using a new set of hand-annotated documents created to reflect historical divisions. Results show that both models perform significantly better than a linear baseline, particularly when ECCO-BERT is combined with TF-IDF features, though for this task the character-level model provides no obvious advantage. Initial evaluation of the genre sequence method shows it may in the future be useful in determining and dividing the multiple genres of miscellaneous and hybrid historical texts. Peer reviewed.

    Explainable Publication Year Prediction of Eighteenth Century Texts with the BERT Model

    In this paper, we describe a BERT model trained on the Eighteenth Century Collections Online (ECCO) dataset of digitized documents. The ECCO dataset poses unique modelling challenges due to the presence of Optical Character Recognition (OCR) artifacts. We establish the performance of the BERT model on a publication year prediction task against linear baseline models and human judgement, finding the BERT model to be superior to both and able to date the works, on average, with less than 7 years absolute error. We also explore how language change over time affects the model by analyzing the features the model uses for publication year predictions as given by the Integrated Gradients model explanation method. Peer reviewed.

    George Thomason’s Newsbooks

    This is a review of George Thomason's Newsbooks

    A Comparative text similarity analysis of the works of Bernard Mandeville

    Text similarity analysis entails studying identical and closely similar text passages across large corpora, with a particular focus on intentional and unintentional borrowing patterns. At a larger scale, detecting repeated passages takes on added importance, as the same text can convey different meanings in different contexts. This approach offers numerous benefits, enhancing intellectual and literary scholarship by simplifying the identification of textual overlaps. Consequently, scholars can focus on the theoretical aspects of reception with an expanded corpus of evidence at their disposal. This article adds to the expanding field of historical text reuse, applying it to intellectual history and showcasing its utility in examining reception, influence, popularity, authorship attribution, and the development of tools for critical editions. Focused on the works and various editions of Bernard Mandeville (1670–1733), the research applies comparative text similarity analysis to explore his borrowing habits and the reception of his works. Systematically examining text reuses across several editions of Mandeville’s works, it provides insights into the evolution of his output and influences over time. The article adopts a forward-looking perspective in historical research, advocating for the integration of archival and statistical evidence. This is illustrated through a detailed examination of the attribution of Publick Stews to Mandeville. Analysing cumulative negative evidence of borrowing patterns suggests that Mandeville might not have been the author of the piece. However, the article aims not to conclude the debate but rather to open it up, underscoring the importance of taking such evidence into consideration. Additionally, it encourages scholars to incorporate text reuse evidence when exploring other cases in early modern scholarship. 
    This highlights the adaptability and scalability of text similarity analysis as a valuable tool for advancing literary studies and intellectual history.

    Proceedings of the Computational Humanities Research Conference 2022


    Reception Reader : Exploring Text Reuse in Early Modern British Publications

    The Reception Reader is a web tool for studying text reuse in the Early English Books Online (EEBO-TCP) and Eighteenth Century Collections Online (ECCO) data. Users can: 1) explore a visual overview of the reception of a work, or its incoming connections, across time based on shared text segments, 2) interactively survey the details of connected documents, and 3) examine the context of reused text for “close reading”. We show examples of how the tool streamlines research and exploration tasks, and discuss the utility and limitations of the user interface along with its current data sources. Peer reviewed.
