101 research outputs found

    The iDAI.publication: extracting and linking information in the publications of the German Archaeological Institute (DAI)

    We present the results of our attempt to use NLP tools in order to identify named entities in the publications of the Deutsches Archäologisches Institut (DAI) and link the identified locations to entries in the iDAI.gazetteer. Our case study focuses on articles written in German and published in the journal Chiron between 1971 and 2014. We describe the annotation pipeline that starts from the digitized texts published in the new portal of the DAI. We evaluate the performance of geoparsing and NER and test an approach to improve the accuracy of the latter.
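As an illustration of how NER output is commonly scored, here is a minimal sketch of exact-match precision, recall and F1 over entity spans. The abstract does not say which metrics or matching criteria the authors used, so the scoring scheme, the function name `precision_recall_f1` and the toy annotations are all assumptions made for this example.

```python
def precision_recall_f1(gold, predicted):
    """Score predicted entity spans against gold spans using exact match.

    Each span is a (start_offset, end_offset, label) tuple; a prediction
    counts as correct only if all three fields match a gold span.
    """
    gold_set, pred_set = set(gold), set(predicted)
    true_pos = len(gold_set & pred_set)
    precision = true_pos / len(pred_set) if pred_set else 0.0
    recall = true_pos / len(gold_set) if gold_set else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Invented toy annotations, not data from the paper.
gold = [(0, 4, "LOC"), (10, 16, "PER"), (20, 27, "LOC"), (30, 35, "ORG")]
predicted = [(0, 4, "LOC"), (10, 16, "LOC"), (20, 27, "LOC")]

p, r, f = precision_recall_f1(gold, predicted)
print(f"precision={p:.3f} recall={r:.3f} f1={f:.3f}")
```

Exact matching is the strictest choice: the second prediction has the right boundaries but the wrong label, so it counts as both a false positive and a missed entity.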

    Treebanking in the world of Thucydides. Linguistic annotation for the Hellespont Project

    The Hellespont project (DAI, Tufts University) aims to structure the text of a passage from the ancient Greek historian Thucydides (1.89-118), in order to highlight events, persons and peoples that populate the world of the author and connect the different digital sources available for their study. Event annotation in the text in particular requires an in-depth linguistic analysis of morphology, syntax and semantics. However, the available resources for Ancient Greek do not provide adequate standards to support the encoding of semantic and pragmatic phenomena in Ancient Greek texts. In this paper, we discuss the motivation of the project and how we adapted the so-called tectogrammatical annotation of the Prague Dependency Treebank to identify the events and describe their structure. The linguistic notion of valency, which is central to tectogrammatical sentence representation, proves very useful for this analysis of Ancient Greek

    The Ancient Greek Dependency Treebank: Linguistic Annotation in a Teaching Environment

    This chapter argues that manual linguistic annotation of Ancient Greek texts can be effectively employed to teach Greek literature and language. Under the supervision of a teacher, students can be engaged in the ongoing creation of the Ancient Greek Dependency Treebank. With the help of one example from Sophocles (Tr. 962–3), we will illustrate how the collective work of treebanking in a class environment provides an ideal occasion to discuss the methods of Classical Philology and the history of interpretation of a given passage; more importantly, while producing a treebank annotation, students can learn how to read a complex text in its literary and communicative context following the methods of textual criticism. New and old research questions emerge from the work; at the same time, through the final annotation the students will produce a tangible contribution to a crucial initiative that is likely to change the way Greek grammar will be studied in the future

    Nominal vs copular clauses in a diachronic corpus of Ancient Greek historians. A treebank-based analysis

    We study the distribution of the nominal and copular construction of predicate nominals in a subset of authors from the Ancient Greek Dependency Treebank (AGDT). We concentrate on the texts of the historians Herodotus, Thucydides (both 5th century BCE) and Polybius (2nd century BCE). The data comprise a sample of 440 sentences (Hdt = 175, Thuc = 91, Pol = 174). We analyze the impact of four features that have been discussed in the literature and can be observed in the annotation of AGDT: (1) order of constituents, (2) part of speech of the subjects, (3) type of clause and (4) length of the clause. Furthermore, we test how the predictive power of these factors varies in time from Herodotus and Thucydides to Polybius with the help of a logistic-regression model. The analysis shows that, contrary to a simplistic opinion, the nominal construction does not drop into irrelevance in Hellenistic Greek. Moreover, an analysis of the distributions in the authors highlights a remarkable continuity in the usage patterns. Further work is needed to improve the predictive power of our logistic-regression model and to integrate more data in view of a more comprehensive quantitative diachronic study

    Issues in Building the LiLa Knowledge Base of Interoperable Linguistic Resources for Latin

    Purpose: This abstract presents the architecture and the current state of the LiLa Knowledge Base (https://lila-erc.eu), i.e., a collection of multifarious linguistic resources for Latin described with the same vocabulary of knowledge description, by using common data categories and ontologies developed by the Linguistic Linked Open Data (LLOD) community according to the principles of the Linked Data paradigm

    The Syntax of the Heroes? A Treebank-Based Approach to the Language of the Sophoclean Characters

    This paper lays the foundation for a treebank-based study of the syntax of the characters and choruses in Sophocles. The complete morpho-syntactic annotation encoded in the Ancient Greek and Latin Dependency Treebank (AGLDT), published by the Perseus Project, is used to extract information and statistics on the syntactic constructions from five of the seven extant tragedies of Sophocles (with the exclusion of Philoctetes and Oedipus at Colonus, which are not yet published in the AGLDT). Following the seminal approach applied by J.F. Burrows to the novels of Jane Austen, we investigate the distributions of the 30 most frequent dependency relations between part-of-speech and part-of-speech (like, for instance, noun-adjective or preposition-noun). This program entails a series of crucial methodological questions, concerning both practical and theoretical aspects, that are here discussed in full. By examining some of the most basic statistics used by Burrows, such as the correlation between characters based on the distributions of the constructions, it is already possible to isolate interesting syntactic phenomena that appear to characterize the diction of specific figures, such as Creon in the Antigone, or Electra and the Pedagogue in the Electra
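The correlation between characters mentioned in the abstract can be sketched as a Pearson correlation over the relative frequencies of part-of-speech pairs. The abstract does not specify the exact statistic or preprocessing, so treat this as an assumption; the speaker names, relation labels and counts below are invented placeholders, not figures from the AGLDT.

```python
from math import sqrt


def relative_frequencies(counts, relations):
    """Turn raw counts into proportions over a fixed, ordered relation list."""
    total = sum(counts.get(rel, 0) for rel in relations)
    # Assumes the speaker has at least one counted relation (total > 0).
    return [counts.get(rel, 0) / total for rel in relations]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    # Assumes neither profile is constant (non-zero variance).
    return cov / sqrt(var_x * var_y)


# Hypothetical relation inventory and counts, for illustration only.
relations = ["noun-adjective", "preposition-noun", "verb-noun"]
speaker_a = {"noun-adjective": 10, "preposition-noun": 20, "verb-noun": 30}
speaker_b = {"noun-adjective": 20, "preposition-noun": 40, "verb-noun": 60}
speaker_c = {"noun-adjective": 30, "preposition-noun": 20, "verb-noun": 10}

profile_a = relative_frequencies(speaker_a, relations)
profile_b = relative_frequencies(speaker_b, relations)
profile_c = relative_frequencies(speaker_c, relations)

r_ab = pearson(profile_a, profile_b)  # same proportions, different totals
r_ac = pearson(profile_a, profile_c)  # reversed preferences
print(round(r_ab, 3), round(r_ac, 3))
```

Normalizing to relative frequencies before correlating keeps the comparison independent of how many lines each speaker has, which matters when parts differ greatly in length.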