DHBeNeLux : incubator for digital humanities in Belgium, the Netherlands and Luxembourg
Digital Humanities BeNeLux is a grass-roots initiative to foster knowledge networking and dissemination in digital humanities in Belgium, the Netherlands, and Luxembourg. This special issue highlights a selection of the work presented at the DHBenelux 2015 Conference, by way of an anthology of the digital humanities work currently being done in the Benelux area and beyond. The introduction describes why this grass-roots initiative came about, how DHBenelux currently supports community building and knowledge exchange for digital humanities in the Benelux area, and how this integrates regional digital humanities into the larger international digital humanities environment.
Vector space explorations of literary language
Literary novels are said to distinguish themselves from other novels through conventions associated with literariness. We investigate the task of predicting the literariness of novels as perceived by readers, based on a large reader survey of contemporary Dutch novels. Previous research showed that ratings of literariness are predictable from texts to a substantial extent using machine learning, suggesting that it may be possible to explain the consensus among readers on which novels are literary as a consensus on the kind of writing style that characterizes literature. Although we have not yet collected human judgments to establish the influence of writing style directly (we use a survey with judgments based on the titles of novels), we can analyze the behavior of machine learning models on particular text fragments as a proxy for human judgments. In order to explore aspects of the texts associated with literariness, we divide the texts of the novels into chunks of 2-3 pages and create vector space representations using topic models (Latent Dirichlet Allocation) and neural document embeddings (Distributed Bag-of-Words Paragraph Vectors). We analyze the semantic complexity of the novels using distance measures, supporting the notion that literariness can be partly explained as a deviation from the norm. Furthermore, we build predictive models and identify specific keywords and stylistic markers related to literariness. While genre plays a role, we find that the majority of the factors affecting judgments of literariness are explicable in bag-of-words terms, even in short text fragments and among novels with higher literary ratings. The code and notebook used to produce the results in this paper are available at https://github.com/andreasvc/litvecspace
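A rough sketch of the chunk-and-embed pipeline described above, using gensim. The toy corpus, chunk size, and hyperparameters below are illustrative assumptions, not the paper's settings; the authors' actual code is in the linked repository.

from gensim.corpora import Dictionary
from gensim.models import LdaModel
from gensim.models.doc2vec import Doc2Vec, TaggedDocument

def chunks(tokens, size=1000):
    # Split a tokenized novel into fixed-size chunks (roughly 2-3 pages).
    for i in range(0, len(tokens), size):
        yield tokens[i:i + size]

# Placeholder corpus standing in for the tokenized novels.
novels = {
    "novel_a": ("a quiet sentence about love and loss and memory " * 120).split(),
    "novel_b": ("a fast chapter full of chases and threats and guns " * 120).split(),
}
docs = [(nid, c) for nid, toks in novels.items() for c in chunks(toks, size=200)]

# Topic model (Latent Dirichlet Allocation) over the chunks.
dictionary = Dictionary(c for _, c in docs)
bow = [dictionary.doc2bow(c) for _, c in docs]
lda = LdaModel(bow, id2word=dictionary, num_topics=10, passes=5)

# Distributed Bag-of-Words paragraph vectors (dm=0 selects DBOW).
tagged = [TaggedDocument(c, [f"{nid}_{i}"]) for i, (nid, c) in enumerate(docs)]
dbow = Doc2Vec(tagged, dm=0, vector_size=100, min_count=1, epochs=20)

Each chunk then has both a topic distribution and a dense paragraph vector, over which distance-based measures of semantic complexity can be computed.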
What Are You Trying to Say? The Interface as an Integral Element of Argument
Graphical interfaces to digital scholarly editions are usually regarded as disconnected from the content of the edition, enough so that an argument has developed against the use of interfaces at all. We argue in this paper that the indifference and even hostility to interfaces is caused by a widespread incomprehension of their argumentative utility. In a pair of case studies of published digital editions, we conduct a detailed examination of the argument their interface makes, and compare these interface rhetorics with the stated intentions of the editors, exposing a number of contradictions between "word" and "deed" in the interface designs. We end by advocating an explicit consideration of the semiotic significance of the elements of a user interface: that editors reflect on what aspect of their argument the interface expresses, and how it adds to, or perhaps subtracts from, the points they wish to make.
Teacher's corner : evaluating informative hypotheses using the Bayes factor in structural equation models
This Teacher's Corner paper introduces Bayesian evaluation of informative hypotheses for structural equation models, using the free open-source R packages bain, for Bayesian informative hypothesis testing, and lavaan, a widely used SEM package. The introduction provides a brief non-technical explanation of informative hypotheses, the statistical underpinnings of Bayesian hypothesis evaluation, and the bain algorithm. Three tutorial examples demonstrate informative hypothesis evaluation in the context of common types of structural equation models: 1) confirmatory factor analysis, 2) latent variable regression, and 3) multiple group analysis. We discuss hypothesis formulation, the interpretation of Bayes factors and posterior model probabilities, and sensitivity analysis.
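The paper itself works in R with bain and lavaan, but the core idea behind the Bayes factor it computes can be pictured in a short language-agnostic sketch: for an inequality-constrained hypothesis, BF_1u = fit / complexity, i.e. the posterior versus prior probability that the constraint holds, with both distributions approximated as normals. The Python sketch below is a simplification, not the bain algorithm itself, and all numbers are made up for demonstration.

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical posterior for two regression coefficients (beta1, beta2)
# from a fitted structural equation model.
post_mean = np.array([0.40, 0.25])
post_cov = np.array([[0.010, 0.002],
                     [0.002, 0.012]])

# Hypothetical prior: centered on the constraint boundary with an inflated
# covariance, loosely mimicking bain's fractional prior construction.
prior_mean = np.zeros(2)
prior_cov = post_cov * 20

post = rng.multivariate_normal(post_mean, post_cov, size=100_000)
prior = rng.multivariate_normal(prior_mean, prior_cov, size=100_000)

fit = np.mean(post[:, 0] > post[:, 1])           # P(beta1 > beta2 | data)
complexity = np.mean(prior[:, 0] > prior[:, 1])  # P(beta1 > beta2) under prior
print("BF_1u for H1: beta1 > beta2 =", fit / complexity)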
From Review to Genre to Novel and Back. An Attempt To Relate Reader Impact to Phenomena of Novel Text
We are interested in the textual features that correlate with reported impact by readers of novels. We operationalize impact measurement through a rule-based reading impact model and apply it to 634,614 reader reviews mined from seven review platforms. We compute the co-occurrence of impact-related terms and their keyness for the genres represented in the corpus. The corpus consists of the full text of 18,885 books, from which we derived topic models. The topics we find correlate strongly with genre, and we obtain strong indicators for which key impact terms are connected to which genre. These key impact terms give us a first evidence-based insight into genre-related readers' motivations.
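The keyness computation can be illustrated with a short sketch. Below is Dunning's log-likelihood (G2), a standard keyness measure, comparing an impact term's frequency in one genre's reviews against the rest of the corpus; the specific measure used in the paper is not given here, and the counts are invented.

import math

def g2_keyness(a, b, c, d):
    # a: occurrences of the term in the target genre's reviews
    # b: all other tokens in the target genre's reviews
    # c: occurrences of the term in the rest of the corpus
    # d: all other tokens in the rest of the corpus
    n = a + b + c + d
    e1 = (a + b) * (a + c) / n  # expected occurrences in the target genre
    e2 = (c + d) * (a + c) / n  # expected occurrences elsewhere
    ll = 0.0
    if a > 0:
        ll += a * math.log(a / e1)
    if c > 0:
        ll += c * math.log(c / e2)
    return 2 * ll

# e.g. a suspense-related impact term in thriller reviews (made-up counts)
print(g2_keyness(a=120, b=500_000, c=80, d=2_000_000))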
Classifying Latin Inscriptions of the Roman Empire: A Machine-Learning Approach
Large-scale synthetic research in ancient history is often hindered by the incompatibility of taxonomies used by different digital datasets. Using the example of enriching the Latin Inscriptions from the Roman Empire dataset (LIRE), we demonstrate that machine-learning classification models can bridge the gap between two distinct classification systems and make comparative study possible. We report on the training, testing, and application of a machine-learning classification model that uses inscription categories from the Epigraphic Database Heidelberg (EDH) to label inscriptions from the Epigraphic Database Clauss-Slaby (EDCS). The model is trained on a labeled set of records included in both sources (N=46,171). Several classification algorithms and parametrizations are explored. The final model is based on the Extremely Randomized Trees (ET) algorithm and employs 10,055 features derived from several attributes. It classifies two thirds of a test dataset with 98% accuracy and 85% of it with 95% accuracy. After model selection and evaluation, we apply the model to inscriptions covered exclusively by EDCS (N=83,482) in an attempt to adopt one consistent system of classification for all records in the LIRE dataset.
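The cross-dataset labeling setup lends itself to a compact sketch: train an Extremely Randomized Trees classifier on records carrying EDH category labels, then apply it to EDCS-only records. The toy inscriptions and character n-gram features below are assumptions; the LIRE model uses 10,055 features over several attributes.

from sklearn.ensemble import ExtraTreesClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline

# Toy stand-ins for the EDH-labeled overlap set (N=46,171 in the paper).
texts = [
    "dis manibus sacrum",
    "votum solvit libens merito",
    "imperatori caesari divi filio augusto",
    "hic situs est sit tibi terra levis",
]
labels = ["epitaph", "votive inscription", "honorific inscription", "epitaph"]

model = make_pipeline(
    TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4)),
    ExtraTreesClassifier(n_estimators=300, random_state=0),
)
model.fit(texts, labels)

# Records covered exclusively by EDCS receive predicted EDH-style categories.
print(model.predict(["ob merita eius posuit"]))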
Exploring Data Provenance in Handwritten Text Recognition Infrastructure: Sharing and Reusing Ground Truth Data, Referencing Models, and Acknowledging Contributions. Starting the Conversation on How We Could Get It Done
This paper discusses best practices for sharing and reusing Ground Truth in Handwritten Text Recognition infrastructures, as well as ways to reference and acknowledge contributions to the creation and enrichment of data within these systems. We discuss how one can place Ground Truth data in a repository and, subsequently, inform others through HTR-United. Furthermore, we want to suggest appropriate citation methods for ATR data, models, and contributions made by volunteers. Moreover, when using digitised sources (digital facsimiles), it becomes increasingly important to distinguish between the physical object and the digital collection. These topics all relate to the proper acknowledgement of labour put into digitising, transcribing, and sharing Ground Truth HTR data. This also points to broader issues surrounding the use of machine learning in archival and library contexts, and how the community should begin to acknowledge and record both contributions and data provenance.
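One concrete way to record such provenance is a small machine-readable record deposited alongside the Ground Truth. The Python sketch below writes one out as JSON; the field names are invented for illustration, and HTR-United maintains its own catalog schema, which actual submissions should follow.

import json

# Hypothetical provenance record for a published Ground Truth dataset.
record = {
    "title": "Ground Truth for eighteenth-century chancery hands (example)",
    "repository": "https://doi.org/10.5281/zenodo.0000000",  # placeholder DOI
    "physical_object": "archive and shelfmark of the original documents",
    "digital_facsimile": "IIIF manifest or image URLs of the scans",
    "transcription_guidelines": "link to the guidelines transcribers followed",
    "contributors": ["volunteer transcribers, credited with their consent"],
    "models_used": ["identifier of any HTR model used to pre-transcribe"],
    "license": "CC BY 4.0",
}

with open("provenance.json", "w", encoding="utf-8") as f:
    json.dump(record, f, indent=2)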
- …