
    The structure and evolution of story networks

    With this study, we advance the understanding of the processes through which stories are retold. A collection of story retellings can be considered a network of stories, in which links between stories represent pre-textual (or ancestral) relationships. This study provides a mechanistic understanding of the structure and evolution of such story networks: we construct a story network for a large diachronic collection of Dutch literary retellings of Red Riding Hood, and compare this network to one derived from a corpus of paper chain letters. In the analysis, we first provide empirical evidence that the formation of these story networks is subject to age-dependent selection processes with a strong bias towards shorter time-spans between stories and their pre-texts (i.e. ‘young’ story versions are preferred in producing new versions). Subsequently, we systematically compare these findings with the predictions of various formal models of network growth, and compare those predictions with one another, to determine more precisely which kinds of attractiveness are also at play or might even be preferred as explanatory models. By carefully studying the structure and evolution of the two story networks, we show that existing stories are differentially preferred as a new version's pre-text according to three types of attractiveness: (i) frequency-based and (ii) model-based attractiveness, which (iii) decays in time.
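To make the idea of attractiveness that decays in time concrete, here is a minimal toy sketch of a network growth process. It is not the model from the paper: the function name `grow_story_network`, the exponential decay, and the `decay` parameter are all assumptions chosen for illustration. Each new story picks one pre-text among the existing stories, with a weight that grows with how often a story has already been used (a frequency-based attractiveness) and shrinks with the story's age.

```python
import math
import random


def grow_story_network(n_stories, decay=5.0, seed=0):
    """Grow a toy story network; return a dict mapping story -> pre-text.

    Hypothetical illustration only: the weight of an existing story is
    (times used as pre-text + 1) * exp(-age / decay).
    """
    rng = random.Random(seed)
    pretext = {0: None}  # the first story has no pre-text
    times_used = [0]     # how often each story served as a pre-text
    for new in range(1, n_stories):
        weights = [
            (times_used[old] + 1) * math.exp(-(new - old) / decay)
            for old in range(new)
        ]
        chosen = rng.choices(range(new), weights=weights, k=1)[0]
        pretext[new] = chosen
        times_used[chosen] += 1
        times_used.append(0)
    return pretext


network = grow_story_network(50)
gaps = [story - source for story, source in network.items() if source is not None]
mean_gap = sum(gaps) / len(gaps)  # average age gap between a story and its pre-text
```

With a small `decay`, most stories attach to recent versions, so `mean_gap` stays small. Raising `decay` weakens the age effect and lets frequently reused stories dominate.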

    Check Your Privilege: The Digital Privilege Game

    This paper describes the background and development of Check Your Privilege (https://privilege.huc.knaw.nl/), a digital privilege game designed to create awareness in the context of diversity and inclusion workshops.

    Classifying Latin Inscriptions of the Roman Empire: A Machine-Learning Approach

    Large-scale synthetic research in ancient history is often hindered by the incompatibility of taxonomies used by different digital datasets. Using the example of enriching the Latin Inscriptions from the Roman Empire dataset (LIRE), we demonstrate that machine-learning classification models can bridge the gap between two distinct classification systems and make comparative study possible. We report on training, testing and application of a machine-learning classification model using inscription categories from the Epigraphic Database Heidelberg (EDH) to label inscriptions from the Epigraphic Database Claus-Slaby (EDCS). The model is trained on a labeled set of records included in both sources (N=46,171). Several different classification algorithms and parametrizations are explored. The final model is based on the Extremely Randomized Trees (ET) algorithm and employs 10,055 features, based on several attributes. The final model classifies two thirds of a test dataset with 98% accuracy and 85% of it with 95% accuracy. After model selection and evaluation, we apply the model to inscriptions covered exclusively by EDCS (N=83,482) in an attempt to adopt one consistent system of classification for all records within the LIRE dataset.

    Authenticating the writings of Julius Caesar

    In this paper, we shed new light on the authenticity of the Corpus Caesarianum, a group of five commentaries describing the campaigns of Julius Caesar (100–44 BC), the founder of the Roman empire. While Caesar himself authored at least part of these commentaries, the authorship of the rest of the texts remains a puzzle that has persisted for nineteen centuries. In particular, the role of Caesar’s general Aulus Hirtius, who claimed a role in shaping the corpus, has remained in contention. Determining the authorship of documents is an increasingly important authentication problem in information and computer science, with valuable applications ranging from the domain of art history to counter-terrorism research. We describe two state-of-the-art authorship verification systems and benchmark them on 6 present-day evaluation corpora, as well as a Latin benchmark dataset. Regarding Caesar’s writings, our analyses allow us to establish that Hirtius’s claims to part of the corpus must be considered legitimate. We thus demonstrate how computational methods constitute a valuable methodological complement to traditional, expert-based approaches to document authentication.

    Demystifying Chao1 with Good-Turing

    To estimate the “biodiversity” in a particular area – or, in other words, the number of unique species living in a given environment – ecologists usually have no other option than to rely on incomplete samples. For all sorts of practical reasons, it is virtually impossible to spot all species that are actually present in an area, and hence certain species will be missing from the counts. Consequently, an important research question in ecology is how to reliably estimate the resulting “bias” between the number of species that were observed and the true number of unique species in an area. In this brief note, I try to demystify a famous unseen species model, Chao1, in terms of Turing’s work. I hope that this will help people unfamiliar with the method to understand exactly how it works, and, perhaps more importantly, how (not) to interpret the calculated value.
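As a small illustration of the estimator discussed in this note, the sketch below computes Chao1 in its standard textbook form: the observed number of species plus f1² / (2·f2), where f1 and f2 are the numbers of species seen exactly once and exactly twice. It also computes the Good-Turing estimate of the probability mass of unseen species, f1 / N. The function names and the example counts are made up for illustration, and the code is not taken from the note itself.

```python
from collections import Counter


def chao1(counts):
    """Chao1 lower-bound estimate of the total number of species.

    counts: per-species abundance counts from one sample.
    """
    counts = [c for c in counts if c > 0]
    s_obs = len(counts)       # species actually observed
    freq = Counter(counts)
    f1, f2 = freq[1], freq[2]  # singletons and doubletons
    if f2 > 0:
        unseen = f1 * f1 / (2 * f2)
    else:
        # bias-corrected form, used when there are no doubletons
        unseen = f1 * (f1 - 1) / 2
    return s_obs + unseen


def good_turing_unseen_mass(counts):
    """Good-Turing estimate of the probability that the next
    observation belongs to a species not yet seen: f1 / N."""
    counts = [c for c in counts if c > 0]
    f1 = sum(1 for c in counts if c == 1)
    return f1 / sum(counts)


sample = [10, 5, 2, 2, 1, 1, 1, 1]  # hypothetical abundances for 8 species
estimate = chao1(sample)            # 8 observed + 4**2 / (2 * 2) = 12.0
```

The result is best read as a lower bound on species richness rather than as a point estimate of the true number, which is the interpretive caution the abstract alludes to.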

    Proceedings of the Computational Humanities Research Conference 2022


    Supplemental Materials for "Humanities Data Analysis"

    Data discussed in the manuscript "Humanities Data Analysis: Case Studies with Python". Each folder in this dataset contains data used or discussed in one chapter. The data itself resides in a ``data`` directory inside each folder. Each ``data`` directory contains a ``README`` file which describes the files found in the directory. Most of the data are texts published before 1900. These texts are in the public domain.