Late Latin Charter Treebank : contents and annotation
This paper describes the construction and annotation of the Late Latin Charter Treebank, a set of three dependency treebanks (LLCT1, LLCT2 and LLCT3) which together contain 1,261 Early Medieval Latin documentary texts (i.e., original charters) written in Italy between AD 714 and 1000 (about 594,000 tokens). The paper focusses on matters which a linguistically or philologically inclined user of LLCT needs to know: the criteria on which the charters were selected, the special characteristics of the annotation types utilised, and the geographical and chronological distribution of the data. In addition to normal queries on forms, lemmas, morphology and syntax, complex philological research settings are enabled by the textual annotation layer of LLCT, which indicates abbreviated and damaged words, as well as the formulaic and non-formulaic passages of each charter.
Peer reviewed
Morphosyntactic realignment and markedness change in Late Latin : Evidence from charter texts
Peer reviewed
Annotation guidelines for morphological and morphosyntactic annotation of Merovingian Latin. Reference document for the Latin corpus PaLaFraLat. Version 1.2
This document provides the morphological and morphosyntactic annotation guidelines for the Merovingian Latin sub-corpus PaLaFraLat. PaLaFraLat is part of the bilingual diachronic corpus PaLaFra (http://www.palafra.org, http://txm.ish-lyon.cnrs.fr/bfm/), funded by DFG/ANR (2015-2018).
Machine Learning Algorithm for the Scansion of Old Saxon Poetry
Several scholars have designed tools for the automatic scansion of poetry in many languages, but none of these tools
deals with Old Saxon or Old English. This project is a first attempt to create such a tool for these languages. We
implemented a Bidirectional Long Short-Term Memory (BiLSTM) model to perform the automatic scansion of Old Saxon
and Old English poems. Since this model uses supervised learning, we manually annotated the Heliand manuscript and
used the resulting corpus as the labeled dataset to train the model. In evaluation, the model
reached 97% accuracy and a weighted average of 99% for precision, recall and F1 score. In addition, we tested
the model on some verses from the Old Saxon Genesis and some from The Battle of Brunanburh, and we observed that
the model predicted almost all Old Saxon metrical patterns correctly but misclassified the majority of the Old English input
verses.
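As a rough illustration of the evaluation measures named in this abstract, the following sketch computes accuracy and support-weighted averages of precision, recall and F1 in plain Python. The labels "A" and "B" and the toy gold/predicted lists are invented for the example; they are not taken from the Heliand corpus, and the authors' actual evaluation code is not known.

```python
from collections import Counter


def accuracy(gold, pred):
    """Share of items whose predicted label equals the gold label."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def weighted_scores(gold, pred):
    """Per-class precision, recall and F1, averaged with class support as weights."""
    support = Counter(gold)  # number of gold items per class
    total = len(gold)
    sums = {"precision": 0.0, "recall": 0.0, "f1": 0.0}
    for label, n in support.items():
        tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
        predicted = sum(1 for p in pred if p == label)
        precision = tp / predicted if predicted else 0.0
        recall = tp / n
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        weight = n / total
        sums["precision"] += weight * precision
        sums["recall"] += weight * recall
        sums["f1"] += weight * f1
    return sums


# Toy data: hypothetical metrical-pattern labels, not real annotations.
gold = ["A", "A", "A", "B"]
pred = ["A", "A", "B", "B"]
scores = weighted_scores(gold, pred)
print(accuracy(gold, pred), scores)
```

Weighting by support means that frequent classes dominate the averages, so a model can score well overall while still handling rare patterns poorly.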
Challenges in Annotating Medieval Latin Charters
No annotation guidelines concerning substandard Latin are presently available.
This paper describes an annotation style of substandard Latin that supplements
the method designed for standard Latin by the Perseus Latin Dependency
Treebank and the Index Thomisticus Treebank. Each word of the corpus can be
assigned only one morphological analysis. In our system, the analysis can be
either functional or formal. Functional analysis is applied when a form is
language-evolutionarily deducible from the corresponding standard Latin form
used in the same (semantico-)syntactic function (e.g. solidus pro solidos ‘gold
coins’ as a direct object: analysis ‘accusative’). Formal analysis applies when
no connection to the functionally required classical form exists (e.g. heredibus
pro heredes ‘heirs’ as a subject: analysis ‘ablative’ or ‘dative’). When running
queries on the corpus, the formally analysed forms can be isolated, and
percentages of standard and substandard forms can be counted. In addition,
further principles concerning syntax and specific morphological issues are
introduced.
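The kind of query described in this abstract can be sketched as follows, assuming each token carries a single morphological analysis flagged as either functional or formal. The `Token` class, its field names and the three sample tokens are hypothetical and only mirror the examples given in the abstract; they do not reflect the treebank's actual data format.

```python
from dataclasses import dataclass


@dataclass
class Token:
    form: str      # word form as written in the charter
    case: str      # the single morphological analysis assigned to the token
    analysis: str  # "functional" or "formal" (hypothetical field name)


def formal_tokens(tokens):
    """Isolate the tokens that received a formal analysis."""
    return [t for t in tokens if t.analysis == "formal"]


def share_formal(tokens):
    """Percentage of tokens that were analysed formally."""
    if not tokens:
        return 0.0
    return 100 * len(formal_tokens(tokens)) / len(tokens)


# Toy sample built from the abstract's examples, not from real corpus data.
sample = [
    Token("solidus", "accusative", "functional"),   # solidus pro solidos, direct object
    Token("heredibus", "ablative", "formal"),       # heredibus pro heredes, subject
    Token("heredes", "nominative", "functional"),   # standard form used as a subject
]

print([t.form for t in formal_tokens(sample)], share_formal(sample))
```

Note that this counts only formally analysed forms. A functionally analysed token such as solidus is still substandard, so separating standard from substandard forms would need an additional flag.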