    Ex Machina: Electronic Resources for the Classics

    A Legal Perspective on Training Models for Natural Language Processing

    A significant concern in processing natural language data is the often unclear legal status of the input and output data/resources. In this paper, we investigate this problem by discussing a typical activity in Natural Language Processing: the training of a machine learning model from an annotated corpus. We examine which legal rules apply at relevant steps and how they affect the legal status of the results, especially in terms of copyright and copyright-related rights.

    Proceedings of the Workshop on Challenges in the Management of Large Corpora (CMLC-10)

    From manuscript catalogues to a handbook of Syriac literature: Modeling an infrastructure for Syriaca.org

    Despite increasing interest in Syriac studies and growing digital availability of Syriac texts, there is currently no up-to-date infrastructure for discovering, identifying, classifying, and referencing works of Syriac literature. The standard reference work (Baumstark's Geschichte) is over ninety years old, and the perhaps 20,000 Syriac manuscripts extant worldwide can be accessed only through disparate catalogues and databases. The present article proposes a tentative data model for Syriaca.org's New Handbook of Syriac Literature, an open-access digital publication that will serve as both an authority file for Syriac works and a guide to accessing their manuscript representations, editions, and translations. The authors hope that by publishing a draft data model they can receive feedback and incorporate suggestions into the next stage of the project. Comment: Part of special issue: Computer-Aided Processing of Intertextuality in Ancient Languages. 15 pages, 4 figures.

    The use of corpora and other electronic tools in historical research on translation

    Translation history and historiographical approaches to translation have traditionally relied on the knowledge provided by the historical context and on both contextual and paratextual features of the translated texts, together with their reception. Nonetheless, only by correlating historiographical insights with empirical evidence obtained from the translated texts will it be possible to produce a coherent and sound translation history. In this line of work, technology and digital humanities offer the translation historian tools that can complement non-computational methods and more traditional approaches to the sources, and that can be very beneficial if implemented correctly. This chapter advocates the use of tools such as corpora derived from linguistics to complement the research carried out from a historiographical point of view, while also indicating some of their possible drawbacks or limitations. In this increasingly technological world, the translation history researcher should be aware of both the opportunities and challenges provided by these tools and embrace their use with the aim of facilitating interdisciplinary avenues and progress in the field.

    Manual to the LMEMT corpus

    Peer reviewed

    Linguistics in the digital humanities: (computational) corpus linguistics

    Corpus linguistics has been closely intertwined with digital technology since the introduction of university computer mainframes in the 1960s. Making use of both digitized data in the form of the language corpus and computational methods of analysis involving concordancers and statistics software, corpus linguistics arguably has a place in the digital humanities. Still, it remains obscure and figures only sporadically in the literature on the digital humanities. This article provides an overview of the main principles of corpus linguistics and the role of computer technology in relation to data and method and also offers a bird's-eye view of the history of corpus linguistics with a focus on its intimate relationship with digital technology and how digital technology has impacted the very core of corpus linguistics and shaped the identity of the corpus linguist. Ultimately, the article is oriented towards an acknowledgment of corpus linguistics' alignment with the digital humanities.