41 research outputs found

    Dependency Parsing using Prosody Markers from a Parallel Text

    Get PDF
    Proceedings of the Ninth International Workshop on Treebanks and Linguistic Theories. Editors: Markus Dickinson, Kaili Müürisep and Marco Passarotti. NEALT Proceedings Series, Vol. 9 (2010), 127-138. © 2010 The editors and contributors. Published by Northern European Association for Language Technology (NEALT) http://omilia.uio.no/nealt . Electronically published at Tartu University Library (Estonia) http://hdl.handle.net/10062/15891

    Digital Classical Philology

    Get PDF
    The buzzwords “Information Society” and “Age of Access” suggest that information is now universally accessible without any form of hindrance. Indeed, the German constitution calls for all citizens to have open access to information. Yet in reality, there are multifarious hurdles to information access – whether physical, economic, intellectual, linguistic, political, or technical. Thus, while new methods and practices for making information accessible arise on a daily basis, we are nevertheless confronted by limitations to information access in various domains. This new book series assembles academics and professionals in various fields in order to illuminate the various dimensions of information's inaccessability. While the series discusses principles and techniques for transcending the hurdles to information access, it also addresses necessary boundaries to accessability.This book describes the state of the art of digital philology with a focus on ancient Greek and Latin. It addresses problems such as accessibility of information about Greek and Latin sources, data entry, collection and analysis of Classical texts and describes the fundamental role of libraries in building digital catalogs and developing machine-readable citation systems

    The SIGMORPHON 2019 Shared Task: Morphological Analysis in Context and Cross-Lingual Transfer for Inflection

    Full text link
    The SIGMORPHON 2019 shared task on cross-lingual transfer and contextual analysis in morphology examined transfer learning of inflection between 100 language pairs, as well as contextual lemmatization and morphosyntactic description in 66 languages. The first task evolves past years' inflection tasks by examining transfer of morphological inflection knowledge from a high-resource language to a low-resource language. This year also presents a new second challenge on lemmatization and morphological feature analysis in context. All submissions featured a neural component and built on either this year's strong baselines or highly ranked systems from previous years' shared tasks. Every participating team improved in accuracy over the baselines for the inflection task (though not Levenshtein distance), and every team in the contextual analysis task improved on both state-of-the-art neural and non-neural baselines.Comment: Presented at SIGMORPHON 201

    The emerging digital infrastructure for research in the humanities

    Get PDF
    This article advances the thesis that three decades of investments by national and international funders, combined with those of scholars, technologists, librarians, archivists, and their institutions, have resulted in a digital infrastructure in the humanities that is now capable of supporting end-to-end research workflows. The article refers to key developments in the epigraphy and paleography of the premodern period. It draws primarily on work in classical studies but also highlights related work in the adjacent disciplines of Egyptology, ancient Near East studies, and medieval studies. The argument makes a case that much has been achieved but it does not declare “mission accomplished.” The capabilities of the infrastructure remain unevenly distributed within and across disciplines, institutions, and regions. Moreover, the components, including the links between steps in the workflow, are generally far from user-friendly and seamless in operation. Because further refinements and additional capacities are still much needed, the article concludes with a discussion of key priorities for future work

    Proceedings

    Get PDF
    Proceedings of the Ninth International Workshop on Treebanks and Linguistic Theories. Editors: Markus Dickinson, Kaili Müürisep and Marco Passarotti. NEALT Proceedings Series, Vol. 9 (2010), 268 pages. © 2010 The editors and contributors. Published by Northern European Association for Language Technology (NEALT) http://omilia.uio.no/nealt . Electronically published at Tartu University Library (Estonia) http://hdl.handle.net/10062/15891

    Proceedings of the Seventh Italian Conference on Computational Linguistics CLiC-it 2020

    Get PDF
    On behalf of the Program Committee, a very warm welcome to the Seventh Italian Conference on Computational Linguistics (CLiC-it 2020). This edition of the conference is held in Bologna and organised by the University of Bologna. The CLiC-it conference series is an initiative of the Italian Association for Computational Linguistics (AILC) which, after six years of activity, has clearly established itself as the premier national forum for research and development in the fields of Computational Linguistics and Natural Language Processing, where leading researchers and practitioners from academia and industry meet to share their research results, experiences, and challenges

    Statistical Parsing by Machine Learning from a Classical Arabic Treebank

    Get PDF
    Research into statistical parsing for English has enjoyed over a decade of successful results. However, adapting these models to other languages has met with difficulties. Previous comparative work has shown that Modern Arabic is one of the most difficult languages to parse due to rich morphology and free word order. Classical Arabic is the ancient form of Arabic, and is understudied in computational linguistics, relative to its worldwide reach as the language of the Quran. The thesis is based on seven publications that make significant contributions to knowledge relating to annotating and parsing Classical Arabic. Classical Arabic has been studied in depth by grammarians for over a thousand years using a traditional grammar known as i’rāb (إعغاة ). Using this grammar to develop a representation for parsing is challenging, as it describes syntax using a hybrid of phrase-structure and dependency relations. This work aims to advance the state-of-the-art for hybrid parsing by introducing a formal representation for annotation and a resource for machine learning. The main contributions are the first treebank for Classical Arabic and the first statistical dependency-based parser in any language for ellipsis, dropped pronouns and hybrid representations. A central argument of this thesis is that using a hybrid representation closely aligned to traditional grammar leads to improved parsing for Arabic. To test this hypothesis, two approaches are compared. As a reference, a pure dependency parser is adapted using graph transformations, resulting in an 87.47% F1-score. This is compared to an integrated parsing model with an F1-score of 89.03%, demonstrating that joint dependency-constituency parsing is better suited to Classical Arabic. The Quran was chosen for annotation as a large body of work exists providing detailed syntactic analysis. Volunteer crowdsourcing is used for annotation in combination with expert supervision. A practical result of the annotation effort is the corpus website: http://corpus.quran.com, an educational resource with over two million users per year

    Proceedings of the Eighth Italian Conference on Computational Linguistics CliC-it 2021

    Get PDF
    The eighth edition of the Italian Conference on Computational Linguistics (CLiC-it 2021) was held at Università degli Studi di Milano-Bicocca from 26th to 28th January 2022. After the edition of 2020, which was held in fully virtual mode due to the health emergency related to Covid-19, CLiC-it 2021 represented the first moment for the Italian research community of Computational Linguistics to meet in person after more than one year of full/partial lockdown
    corecore