39 research outputs found

    Towards a first digital edition of the oldest surviving manuscript of St Augustine's De civitate Dei

    This thesis describes the creation of a pilot digital edition of MS XXVIII(26), the oldest surviving manuscript of Saint Augustine’s (354-430 AD) monumental De civitate Dei (The City of God). Also known as Manuscript V[eronensis], MS XXVIII(26) dates back to the early fifth century AD and is housed in the chapter library of Verona, Italy. As contemporary to Saint Augustine himself, it is a particularly treasured object of study. This thesis reassesses extant research about this manuscript, collecting information about its disputed provenance, historical context, materiality, tradition, and conservation. In doing so, it investigates how the manuscript can be best reproduced as a digital edition by way of two surveys designed to better understand how digital editions are respectively being created and used. The survey devoted to the study of how digital editions are being built has become a publicly available digital resource in collaboration with the Austrian Academy of Sciences. The resource, known as the Catalogue of Digital Editions, aggregates and catalogues a large number of digital editions in an effort to delineate the field’s status quo and spawn new quantitative and qualitative research. The community survey devoted to the study of how digital editions are being used is one of the very few as well as the largest in the field yet. The over 200 responses received give detailed information regarding the expectations of digital editions provided by the Digital Humanities community and point to many areas for further improvement. A comparative analysis of the results from the two surveys suggests that while creators are aware of and adhere to standards of creation, much work remains to be done to address the needs of a diverse range of users. With this information, digital editors in the Digital Humanities can better shape future projects and thus contribute to the production of ever-useful digital cultural resources. 
This information is also guiding the creation of a pilot digital edition of MS XXVIII(26), which remains to be user-tested but serves as the first digital reproduction of the oldest surviving manuscript of Saint Augustine’s De civitate Dei. The research described in this thesis has led to the formulation of recommendations for those embarking on the creation of a digital edition. Specifically, creators are advised to get access to the original documents and to high-resolution images, to provide transcriptions of the text in multiple formats so as to enable further research and data reuse in a variety of academic contexts, to provide detailed documentation of the editorial and technological components of the project, to make as much data as possible available under open licences and, finally, to conduct, and report on, user studies of the digital edition.

    Mining and Analysing One Billion Requests to Linguistic Services

    From 2004 to 2016 the Leipzig Linguistic Services (LLS) existed as a SOAP-based cyberinfrastructure of atomic micro-services for the Wortschatz project, which covered different-sized textual corpora in more than 230 languages. The LLS were developed in 2004 and went live in 2005 in order to provide a Web service-based API to these corpus databases. In 2006, the LLS infrastructure began to systematically log and store requests made to the text collection, and in August 2016 the LLS were shut down. This article summarises the experience of the past ten years of running such a cyberinfrastructure with a total of nearly one billion requests. It includes an explanation of the technical decisions and limitations but also provides an overview of how the services were used.

    Digital editions of text: Surveying user requirements in the Digital Humanities

    This article presents the findings of a web survey designed to better understand the expectations and use of digital editions of texts. The survey, modelled upon a detailed analysis of 242 projects, recorded 218 complete responses, shedding light on user requirements of digital editions. Specifically, the survey indicates that data reuse, licensing, image availability, and comprehensive documentation are the most requested features of digital editions, although they are seldom provided. This analysis feeds into previous studies on good practice in building Digital Humanities resources and puts forward practical recommendations for both creators and funders of digital editions in an effort to promote a stronger consideration of user needs. This survey will be of interest to those who produce digital editions of texts, including developers and engineers, and will also be of interest to those who commission and fund these projects, such as universities, libraries, and archives, whose documentary collections are often showcased in digital editions.

    Visual Text Analysis in Digital Humanities

    In 2005, Franco Moretti introduced Distant Reading to analyse entire literary text collections. This was a rather revolutionary idea compared to the traditional Close Reading, which focuses on the thorough interpretation of an individual work. Both reading techniques are the primary means of Visual Text Analysis. We present an overview of the research conducted since 2005 on supporting text analysis tasks with close and distant reading visualizations in the digital humanities. To this end, we classify the surveyed papers according to a taxonomy of text analysis tasks, categorize the close and distant reading techniques applied to support the investigation of these tasks, and illustrate approaches that combine both reading techniques in order to provide a multi-faceted view of the textual data. In addition, we take a look at the text sources used and at the typical data transformation steps required for the proposed visualizations. Finally, we summarize collaboration experiences when developing visualizations for close and distant reading, and we give an outlook on future challenges in that research area.

    The Linked Fragment: TEI and the encoding of text reuses of lost authors

    This paper presents a joint project of the Humboldt Chair of Digital Humanities at the University of Leipzig, the Perseus Digital Library at Tufts University, and the Harvard Center for Hellenic Studies to produce a new open series of Greek and Latin fragmentary authors. Such authors are lost and their works are preserved only thanks to quotations and text reuses in later texts. The project is undertaking two tasks: (1) the digitization of paper editions of fragmentary works with links to the source texts from which the fragments have been extracted; (2) the production of born-digital editions of fragmentary works. The ultimate goals are the creation of open, linked, machine-actionable texts for the study and advancement of the field of Classical textual fragmentary heritage and the development of a collaborative environment for crowdsourced annotations. These goals are being achieved by implementing the Perseids Platform and by encoding the Fragmenta Historicorum Graecorum, one of the most important and comprehensive collections of fragmentary authors.

    Open Philology at the University of Leipzig

    The Open Philology Project at the University of Leipzig aspires to re-assert the value of philology in its broadest sense. Philology signifies the widest possible use of the linguistic record to enable a deep understanding of the complete lived experience of humanity. Pragmatically, we focus on Greek and Latin because (1) substantial collections and services are already available within these languages, (2) substantial user communities exist (c. 35,000 unique users a month at the Perseus Digital Library), and (3) a European-based project is better positioned to process extensive cultural heritage materials in these languages rather than in Chinese or Sanskrit. The Open Philology Project has been designed with the hope that it can contribute to any historical language that survives within the human record. It includes three tasks: (1) the creation of an open, extensible, repurposable collection of machine-readable linguistic sources; (2) the development of dynamic textbooks that use annotated corpora to customize the vocabulary and grammar of texts that learners want to read, and at the same time engage students in collaboratively producing new annotated data; (3) the establishment of new workflows for, and forms of, publication, from individual annotations with argumentation to traditional publications with integrated machine-actionable data

    Using and evaluating TRACER for an Index fontium computatus of the Summa contra Gentiles of Thomas Aquinas

    This article describes a computational text reuse study on Latin texts designed to evaluate the performance of TRACER, a language-agnostic text reuse detection engine. As a case study, we use the Index Thomisticus as a gold standard to measure the performance of the tool in identifying text reuse between Thomas Aquinas’ Summa contra Gentiles and his sources.

    Attributing Authorship in the Noisy Digitized Correspondence of Jacob and Wilhelm Grimm

    This article presents the results of a multidisciplinary project aimed at better understanding the impact of different digitization strategies in computational text analysis. More specifically, it describes an effort to automatically discern the authorship of Jacob and Wilhelm Grimm in a body of uncorrected correspondence processed by HTR (Handwritten Text Recognition) and OCR (Optical Character Recognition), reporting on the effect this noise has on the analyses necessary to computationally identify the different writing styles of the two brothers. In summary, our findings show that OCR digitization serves as a reliable proxy for the more painstaking process of manual digitization, at least when it comes to authorship attribution. Our results suggest that attribution is viable even when using training and test sets from different digitization pipelines. With regard to HTR, this research demonstrates that even though automated transcription significantly increases the risk of text misclassification when compared to OCR, a cleanliness above ≈ 20% is already sufficient to achieve a higher-than-chance probability of correct binary attribution.

    Prefazione

    Preface to the volume.