6 research outputs found

    Late Latin Charter Treebank : contents and annotation

    This paper describes the construction and annotation of the Late Latin Charter Treebank, a set of three dependency treebanks (LLCT1, LLCT2 and LLCT3) which together contain 1,261 Early Medieval Latin documentary texts (i.e., original charters) written in Italy between AD 714 and 1000 (about 594,000 tokens). The paper focusses on matters which a linguistically or philologically inclined user of LLCT needs to know: the criteria on which the charters were selected, the special characteristics of the annotation types utilised, and the geographical and chronological distribution of the data. In addition to normal queries on forms, lemmas, morphology and syntax, complex philological research settings are enabled by the textual annotation layer of LLCT, which indicates abbreviated and damaged words, as well as the formulaic and non-formulaic passages of each charter. Peer reviewed.

    Annotation guidelines for morphological and morphosyntactic annotation of Merovingian Latin. Reference document for the Latin corpus PaLaFraLat. Version 1.2

    The document provides the morphological and morphosyntactic annotation guidelines of the Merovingian Latin sub-corpus PaLaFraLat. PaLaFraLat is part of the bilingual diachronic corpus PaLaFra (http://www.palafra.org, http://txm.ish-lyon.cnrs.fr/bfm/), funded by DFG/ANR (2015-2018)

    Machine Learning Algorithm for the Scansion of Old Saxon Poetry

    Several scholars have designed tools for the automatic scansion of poetry in many languages, but none of these tools deals with Old Saxon or Old English. This project is a first attempt to create a tool for these languages. We implemented a Bidirectional Long Short-Term Memory (BiLSTM) model to perform the automatic scansion of Old Saxon and Old English poems. Since this model uses supervised learning, we manually annotated the Heliand manuscript and used the resulting corpus as a labeled dataset to train the model. In the evaluation, the model reached 97% accuracy and a weighted average of 99% for precision, recall and F1 score. In addition, we tested the model on some verses from the Old Saxon Genesis and some from The Battle of Brunanburh, and we observed that the model predicted almost all Old Saxon metrical patterns correctly but misclassified the majority of the Old English input verses.
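
The abstract above reports a weighted average for precision, recall and F1 score. As a rough illustration of what such a figure means, the sketch below computes a support-weighted F1 score in plain Python. The label names ("A", "B") and the toy predictions are assumptions made up for this example; they are not the paper's metrical pattern labels or its data, and the paper does not say which tool it used to compute its scores.

```python
def weighted_f1(y_true, y_pred):
    """Support-weighted average of per-label F1 scores."""
    total = len(y_true)
    score = 0.0
    for label in sorted(set(y_true)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        if precision + recall:
            f1 = 2 * precision * recall / (precision + recall)
        else:
            f1 = 0.0
        # Weight each label's F1 by its share of the true labels (support).
        score += f1 * (tp + fn) / total
    return score


# Hypothetical toy data: three verses of pattern "A", one of pattern "B".
y_true = ["A", "A", "A", "B"]
y_pred = ["A", "A", "B", "B"]
print(round(weighted_f1(y_true, y_pred), 4))
```

Because each label's score is weighted by how often that label occurs, a frequent pattern that is predicted well can pull the weighted average above the plain accuracy, which is one way a 99% weighted figure can sit next to a 97% accuracy.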

    Challenges in Annotating Medieval Latin Charters

    No annotation guidelines concerning substandard Latin are presently available. This paper describes an annotation style for substandard Latin that supplements the method designed for standard Latin by the Perseus Latin Dependency Treebank and the Index Thomisticus Treebank. Each word of the corpus can be assigned only one morphological analysis. In our system, the analysis can be either functional or formal. Functional analysis is applied when a form can be derived, in terms of language evolution, from the corresponding standard Latin form used in the same (semantico-)syntactic function (e.g. solidus pro solidos ‘gold coins’ as a direct object: analysis “accusative”). Formal analysis applies when no connection to the functionally required classical form exists (e.g. heredibus pro heredes ‘heirs’ as a subject: analysis “ablative” or “dative”). When running queries on the corpus, the formally analysed forms can be isolated, and percentages of standard and substandard forms can be counted. In addition, further principles concerning syntax and specific morphological issues are introduced.
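
The functional/formal distinction described above lends itself to a simple query. The following is a minimal sketch, assuming each token carries exactly one morphological analysis plus a flag saying whether that analysis is functional or formal. The `Token` class, its field names and the four-token sample are hypothetical; they do not reflect the actual treebank format or its query tools.

```python
from dataclasses import dataclass


@dataclass
class Token:
    form: str      # attested form in the charter
    case: str      # the single morphological analysis assigned to the word
    analysis: str  # "functional" or "formal" (hypothetical field name)


def formal_share(tokens):
    """Percentage of tokens that received a formal analysis."""
    if not tokens:
        return 0.0
    formal = sum(1 for t in tokens if t.analysis == "formal")
    return 100.0 * formal / len(tokens)


# Toy sample built from the forms quoted in the abstract.
tokens = [
    Token("solidus", "accusative", "functional"),  # solidus pro solidos
    Token("heredibus", "ablative", "formal"),      # heredibus pro heredes
    Token("solidos", "accusative", "functional"),
    Token("heredes", "nominative", "functional"),
]

formal_only = [t.form for t in tokens if t.analysis == "formal"]
print(formal_only)
print(formal_share(tokens))
```

Storing the analysis type as a separate field, rather than encoding it in the case label, keeps the one-analysis-per-word constraint intact while still letting a query isolate the formally analysed forms.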