
7 research outputs found

Piloting a workflow for extracting author citations from Samuel Johnson's Dictionary of the English Language

Author: Dubnicek Ryan
Wong Jasmine
Publication venue: 'iSchools'
Publication date: 23/03/2020

Since the 18th century, English-language dictionaries have used quotations from written works to illustrate a word's use in context. These quotations form a link between language authority and literary authority. In this paper we pilot a workflow for identifying, extracting, and counting author citations in Samuel Johnson's Dictionary of the English Language to investigate how authors in a defined corpus are represented. We consider how these authors are distributed across the text and compare our results to past studies that used different methodologies. We find a consistency that encourages the broader application of our workflow on other dictionary texts, enabling further study of author citations in dictionaries across time.

Illinois Digital Environment for Access to Learning and Scholarship Repository

Creating A Disability Corpus for Literary Analysis: Pilot Classification Experiments

Author: Downie J. Stephen
Dubnicek Ryan
Underwood Ted
Publication venue: 'iSchools'
Publication date: 01/01/2018

As literary text opens to researchers for distant reading (the computational analysis of large corpora of text for literary scholarship), problems beyond typical data science roadblocks, such as data scale and the statistical significance of findings, have emerged. For scholars studying character and social representation in literature, identifying characters within the given classes of study is crucial, painstaking, and often a manual process. For characters with disabilities, however, manual identification is prohibitively difficult to undertake at scale, and especially challenging given the coded textual markers that can be used to refer to disability. No corpus of fictional characters with disabilities currently exists, yet such a corpus is the first step toward at-scale computational study of this topic. This project seeks to pilot a classification process using manually assigned ground truth on a subset of volumes from the HathiTrust. Having successfully built and evaluated a Naïve Bayes classifier, we suggest full-scale deployment of a statistical classifier on a large corpus of literature in order to assemble a disability corpus. This project also covers preliminary exploratory textual analysis of characters with disabilities to yield potential research questions for further exploration.

Illinois Digital Environment for Access to Learning and Scholarship Repository

Evaluating a Machine Learning Approach to Identifying Expressive Content at Page Level in HathiTrust

Author: Kristina Hall
Nikolaus Parulian
Ryan Dubnicek
Stephen Downie
Yuerong Hu
Publication venue: 'Modern Language Association'
Publication date: 01/01/2020

HathiTrust currently provides metadata, scanned images, and full text for all public domain volumes. However, it’s likely that the front matter of most volumes, regardless of rights status, contains content that is of interest to scholars and free from restriction. For example, the title page or table of contents may contain information that is likely non-expressive and useful for understanding the content’s structure and subject matter. It’s also likely that some volumes include expressive/creative content in the first 20 pages, so front matter cannot be made open for all volumes without understanding the most frequent type of content within the first 20 pages. This task is time-prohibitive for entirely manual exploration, so we seek to evaluate a machine learning approach for it.

Humanities Commons

Extending the Utility of the HTRC Extracted Features Dataset Through Linked Data

Author: Boris Capitanu
Deren Kudeki
J Stephen Downie
Jacob Jett
Ryan Dubnicek
Timothy W Cole
Publication venue: 'Modern Language Association'
Publication date: 01/01/2020

Poster accompanying previously submitted poster abstract

Humanities Commons

Exploring the Benefits for Users of Linked Open Data for Digitized Special Collections, White paper #2: Analysis of Early User Feedback

Author: Cole Timothy
Dubnicek Ryan
Fenlon Katrina
Jett Jacob
Kinnaman Alex
Kudeki Deren
Szylowicz Caroline
Zavala Melina
Publication date: 01/06/2018

This paper reports on a research study conducted to evaluate experimental, LOD-based features of digital special collections, which investigated the question: how do these features affect the use of digital collections for research? Because humanities researchers are the primary user group for cultural collections, this study focused on what humanities researchers might gain from LOD-based enhancements to digital collections. Andrew W. Mellon Foundation Award No. 31500650.

Illinois Digital Environment for Access to Learning and Scholarship Repository

Bridging the information gap between structural and note-level musical datasets

Author: Downie J. Stephen
Dubnicek Ryan
Hu Yuerong
Page Kevin R.
Weigl David M.
Publication venue: 'iSchools'
Publication date: 15/03/2019

While there are an increasing number of datasets containing various features of musical information, the lack of connections between them remains a barrier to their use in research. For example, one dataset might encode the identification of structural segments by musicologists in audio recordings, while another dataset could contain a symbolic encoding of the music notation being played in that audio recording. Without explicit connections, significant extra work is needed to realize their potential for musicological study. In this paper we investigate how Linked Data can be used to implement such connections, specifically between the McGill Billboard corpus of structural annotations and the MIDI Linked Data Cloud (MIDI-LD). First, we republish structural information from Billboard as RDF. We then align this structural data with matching symbolic encodings in MIDI-LD, before finally linking individual structural annotations from Billboard to note-level sections in MIDI-LD. Our alignments enable cross-referencing and combined queries for musicological analysis across the enriched union dataset, and serve as a model for the creation of information resources comprising musical structures at varying granularity.

Illinois Digital Environment for Access to Learning and Scholarship Repository

Bridging the information gap between structural and note-level musical datasets

Author: Downie J. Stephen
Dubnicek Ryan
Hu Yuerong
Page Kevin R.
Weigl David M.
Publication venue: 'iSchools'