A Legal Perspective on Training Models for Natural Language Processing
A significant concern in processing natural language data is the often unclear legal status of the input and output data and resources. In this paper, we investigate this problem by discussing a typical activity in Natural Language Processing: the training of a machine learning model from an annotated corpus. We examine which legal rules apply at the relevant steps and how they affect the legal status of the results, especially in terms of copyright and copyright-related rights.
From manuscript catalogues to a handbook of Syriac literature: Modeling an infrastructure for Syriaca.org
Despite increasing interest in Syriac studies and growing digital
availability of Syriac texts, there is currently no up-to-date infrastructure
for discovering, identifying, classifying, and referencing works of Syriac
literature. The standard reference work (Baumstark's Geschichte) is over ninety
years old, and the perhaps 20,000 Syriac manuscripts extant worldwide can be
accessed only through disparate catalogues and databases. The present article
proposes a tentative data model for Syriaca.org's New Handbook of Syriac
Literature, an open-access digital publication that will serve as both an
authority file for Syriac works and a guide to accessing their manuscript
representations, editions, and translations. The authors hope that by
publishing a draft data model they can receive feedback and incorporate
suggestions into the next stage of the project.
Comment: Part of special issue: Computer-Aided Processing of Intertextuality
in Ancient Languages. 15 pages, 4 figures
The use of corpora and other electronic tools in historical research on translation
Translation history and historiographical approaches to translation have traditionally relied on the knowledge provided by the historical context and by both contextual and paratextual features of the translated texts, together with their reception. Nonetheless, only by correlating historiographical insights with empirical evidence obtained from the translated texts will it be possible to produce a coherent and sound translation history. In this line of work, technology and the digital humanities offer the translation historian tools which can complement non-computational methods and more traditional approaches to the sources, and which can be very beneficial if implemented correctly. This chapter advocates the use of tools such as corpora, derived from linguistics, to complement research carried out from a historiographical point of view, while also indicating some of their possible drawbacks or limitations. In this increasingly technological world, the translation history researcher should be aware of both the opportunities and the challenges provided by these tools and embrace their use with the aim of facilitating interdisciplinary avenues and progress in the field.
Linguistics in the digital humanities: (computational) corpus linguistics
Corpus linguistics has been closely intertwined with digital technology since the introduction of university computer mainframes in the 1960s. Making use of both digitized data in the form of the language corpus and computational methods of analysis involving concordancers and statistics software, corpus linguistics arguably has a place in the digital humanities. Still, it remains obscure and figures only sporadically in the literature on the digital humanities. This article provides an overview of the main principles of corpus linguistics and the role of computer technology in relation to data and method, and also offers a bird's-eye view of the history of corpus linguistics with a focus on its intimate relationship with digital technology and how digital technology has impacted the very core of corpus linguistics and shaped the identity of the corpus linguist. Ultimately, the article is oriented towards an acknowledgment of corpus linguistics' alignment with the digital humanities.