12 research outputs found

    Netgraph-A Tool for Searching in the Prague Dependency Treebank 2.0

    This thesis brings together three existing parties. First, there was the Prague Dependency Treebank 2.0, one of the most advanced treebanks in the linguistic world. Second, there was Netgraph 1.0, a very limited but extremely intuitive search tool. Third, there were users longing for a simple and intuitive tool that would nevertheless be powerful enough to search the Prague Dependency Treebank. In the thesis, we study the annotation of the Prague Dependency Treebank 2.0, especially on the tectogrammatical layer, which is by far the most complex layer of the treebank, and assemble a list of requirements on a query language that would allow searching for and studying all linguistic phenomena annotated in the treebank. We propose an extension to the query language of the existing search tool Netgraph 1.0 and show that the extended query language satisfies the list of requirements. We also show how all principal linguistic phenomena annotated in the treebank can be searched for with the query language. The proposed query language has also been implemented; we present the search tool and describe its data format. The tool can be installed from the attached CD-ROM.
Institute of Formal and Applied Linguistics, Faculty of Mathematics and Physics
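
    The abstract does not describe the query language itself. As a rough, hypothetical illustration of the underlying idea, tree-shaped queries matched against dependency trees, the Python sketch below assumes that a query is a tree of nodes with attribute constraints, and that it matches wherever a tree node meets its constraints and each sub-query matches a different dependent of that node. `TreeNode`, `QueryNode`, and `find_matches` are our own names; this is not Netgraph's query syntax or implementation, and the attribute labels in the example are illustrative only.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class TreeNode:
    """A dependency-tree node: attribute-value pairs plus its dependents."""
    attrs: dict[str, str]
    children: list[TreeNode] = field(default_factory=list)


@dataclass
class QueryNode:
    """A query node: required attribute values plus sub-queries, each of which
    must match a different dependent of the matched tree node."""
    constraints: dict[str, str]
    children: list[QueryNode] = field(default_factory=list)


def node_matches(query: QueryNode, node: TreeNode) -> bool:
    """True if `node` satisfies the constraints and the query's children can be
    assigned to distinct children of `node`."""
    if any(node.attrs.get(k) != v for k, v in query.constraints.items()):
        return False
    return _match_children(query.children, node.children, frozenset())


def _match_children(queries: list[QueryNode], nodes: list[TreeNode],
                    used: frozenset[int]) -> bool:
    # Backtracking search: assign each sub-query to a not-yet-used dependent.
    if not queries:
        return True
    first, rest = queries[0], queries[1:]
    for i, candidate in enumerate(nodes):
        if i not in used and node_matches(first, candidate):
            if _match_children(rest, nodes, used | {i}):
                return True
    return False


def find_matches(query: QueryNode, root: TreeNode) -> list[TreeNode]:
    """Return every node of the tree (in preorder) at which the query matches."""
    hits, stack = [], [root]
    while stack:
        node = stack.pop()
        if node_matches(query, node):
            hits.append(node)
        stack.extend(reversed(node.children))
    return hits


# "Jan čte knihu" (John reads a book), with illustrative analytical labels.
tree = TreeNode({"lemma": "číst", "afun": "Pred"}, [
    TreeNode({"lemma": "Jan", "afun": "Sb"}),
    TreeNode({"lemma": "kniha", "afun": "Obj"}),
])
query = QueryNode({"afun": "Pred"}, [QueryNode({"afun": "Sb"}), QueryNode({"afun": "Obj"})])
print([n.attrs["lemma"] for n in find_matches(query, tree)])  # ['číst']
```

    Requiring sub-queries to match distinct dependents is one design choice among several; it keeps a query with two subject nodes from matching a clause that has only one subject.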

    Measuring Greekness: A novel computational methodology to analyze syntactical constructions and quantify the stylistic phenomenon of Attic oratory

    This study compiles and interprets data drawn from Classical studies, which are analyzed using computational linguistics, treebank annotation, and the development and post-processing of metrics. More specifically, the purpose of this work is to employ computational methods to analyze a particular form of the Ancient Greek language, namely Attic Greek, to “measure” its attributes, and to explore the socio-political connotations that its usage had in the era of the High Roman Empire. During the first centuries CE, the landscape of the Roman Empire is polyvalent. It consists of native Romans who may be fluent in Latin and Greek, Greeks who are Roman citizens, other easterners who are potentially trilingual and have also assumed Roman citizenship, and even Christians, who identify themselves as Roman citizens but with a different religious identity. It comes as no surprise that language is politicized and that identity, both individual and civic, is constantly reshaped through it. The question I attempt to answer is whether we can quantify the Greekness of native and bilingual speakers through an analytic computational study of the Attic dialect. Chapter 1 discusses the three scholarly fields pertinent to the study. I present the precepts of computational linguistics, corpus linguistics, and digital humanities to explain what prompts this work and how the confluence of the three methodologies significantly enhances our understanding of the issue at hand. In Chapter 2, I approach Greekness, Latinity, and Atticism through the writings of Greek and Roman grammarians and lexicographers and provide a complete list of the occurrences of these notions. Chapters 3 and 4 explain the reasoning behind the use of the Perseids framework and the Prague annotation system.
They then present the metrics developed, the computational methods, and the resulting visualizations, which are used to quantify and objectify inferences that were previously purely theoretical. The metric system was developed after careful consideration of the stylistic attributes of Ancient Greek, so each metric “measures” something pertinent to the formation of the language. The visualizations then present the numerical results in a more understandable and interpretable format. Philologists may find it interesting to see humanistic ideas presented graphically, and computer scientists to see their methods applied to a topic that is predominantly philological and social. Finally, Chapter 5 recontextualizes the numerical results and their interpretations, as obtained in Chapters 3 and 4, and thus sets the parameters necessary to discuss them in conjunction with readings of literary texts of the High Empire. My intention is to show how numbers are “translated” into a different “language,” the language of the humanist.
Contents:
Acknowledgments
Chapter 1: Introduction
  1.1 Focus of the Study
  1.2 Classical Studies and Digital Humanities
  1.3 Corpus Linguistics
  1.4 Humanities Corpus and Corpus Linguistics
  1.5 Synopsis of the Project
Chapter 2: Linguistic Purity as Ethnic and Educational Marker, or Greek and Roman Grammarians on Greek and Latin
  2.1 Introduction
  2.2 Grammatical and Lexicographic Definitions
    2.2.1 Greek and Latin Languages
    2.2.2 Grammatici Graeci
    2.2.3 Grammatici Latini
  2.3 Greek and Attic in Greek Lexicographers
  2.4 Conclusion
Chapter 3: Attic Oratory and its Imperial Revival: Quantifying Theory and Practice
  3.1 Introduction
  3.2 Atticism: Definition and Redefinitions
  3.3 Significance of Enhanced Linguistic and Computational Analysis of Atticism
    3.3.1 The Perseids Project, the Prague Mark-up Language, and Dependency Grammar
  3.4 Evaluating Atticism
    3.4.1 Dionysius of Halicarnassus’s Theoretical Framework
  3.5 Methods: Computational Quantification of Rhetorical Styles
    3.5.1 The Perseids 1.5 ALDT Schema
    3.5.2 Node-based Sentence Metrics
    3.5.3 Computer Implementation
  3.6 Conclusion
Chapter 4: Experimental Results, Analysis, and Topological Haar Wavelets
  4.1 Introduction
  4.2 Experimental Results
  4.3 Data Visualization
  4.4 Topological Metric Wavelets for Syntactical Quantification
    4.4.1 Wavelets
    4.4.2 Topological Metrics using Wavelets
    4.4.3 Experimental Results
  4.5 Conclusion
Chapter 5: «Γαλάτης ὢν ἑλληνίζειν»: Greekness, Latinity, and Otherness in the World of the High Empire
  5.1 Introduction
  5.2 The Multiethnical Constituents of an Imperial Citizen: Anacharsis, Favorinus, and Dionysius of Halicarnassus’s Ethnography
  5.3 Conclusion
Chapter 6: Conclusion
References
Appendix
Curriculum Vitae
Dissertation-related Publications
Declaration of Independent Authorship (Selbständigkeitserklärung)
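
    The outline lists topological Haar wavelets for syntactic quantification (Chapter 4) but does not say how they are applied. Assuming, purely for illustration, that a node-based metric is computed for each sentence and the resulting sequence is decomposed level by level, the standard orthonormal Haar transform can be sketched in Python as follows. The function names and the sample metric values are hypothetical; this is not the thesis's implementation.

```python
from __future__ import annotations

import math


def haar_dwt(signal: list[float]) -> tuple[float, list[list[float]]]:
    """Orthonormal Haar decomposition of a sequence whose length is a power of two.

    Returns the overall approximation coefficient and the detail coefficients,
    ordered from the finest level to the coarsest."""
    n = len(signal)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a positive power of two")
    approx = [float(x) for x in signal]
    details = []
    root2 = math.sqrt(2.0)
    while len(approx) > 1:
        pairs = range(0, len(approx), 2)
        # Differences of neighbouring pairs, computed before approx is replaced.
        details.append([(approx[i] - approx[i + 1]) / root2 for i in pairs])
        approx = [(approx[i] + approx[i + 1]) / root2 for i in pairs]
    return approx[0], details


def haar_idwt(approx: float, details: list[list[float]]) -> list[float]:
    """Invert haar_dwt: rebuild the original sequence from its coefficients."""
    root2 = math.sqrt(2.0)
    values = [approx]
    for level in reversed(details):  # coarsest level first
        values = [v for a, d in zip(values, level)
                  for v in ((a + d) / root2, (a - d) / root2)]
    return values


# Hypothetical per-sentence metric values (e.g. dependency-tree depth).
depths = [2, 4, 3, 3, 5, 1, 2, 2]
approx, details = haar_dwt(depths)
# The coarsest detail contrasts the first half of the text with the second.
print(round(details[-1][0], 3))  # 0.707
```

    Coarse detail levels summarize contrasts between long stretches of text, while the finest level records sentence-to-sentence variation; because the transform is orthonormal, the squared coefficients sum to the squared metric values.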

    Valency of Deverbal Nouns in Czech (Valence deverbativních substantiv v češtině)

    Institute of Formal and Applied Linguistics, Faculty of Mathematics and Physics

    A computational approach to Latin verbs: new resources and methods

    This thesis presents the application of computational methods to the study of Latin verbs. In particular, we describe the creation of a subcategorization lexicon automatically extracted from annotated corpora; we also present a probabilistic model for acquiring selectional preferences from annotated corpora and an ontology (Latin WordNet). Finally, we describe the results of a diachronic, quantitative study of Latin spatial preverbs.
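
    As a minimal sketch of the first step the abstract mentions, extracting a subcategorization lexicon from an annotated corpus, the Python code below counts argument frames per verb lemma in dependency-annotated sentences and turns the counts into relative frequencies. The `Token` record, the relation labels in `ARG_RELS`, and the `"VERB"` tag are assumptions made for illustration, not the thesis's data format or extraction procedure; the selectional-preference model and Latin WordNet are not covered.

```python
from __future__ import annotations

from collections import Counter, defaultdict
from typing import NamedTuple


class Token(NamedTuple):
    idx: int    # 1-based position in the sentence
    lemma: str
    pos: str    # coarse part of speech
    head: int   # idx of the governing token (0 = root)
    rel: str    # dependency relation label


# Hypothetical set of relation labels treated as verbal arguments.
ARG_RELS = frozenset({"Sb", "Obj", "OComp"})


def subcat_lexicon(sentences: list[list[Token]], verb_pos: str = "VERB",
                   arg_rels: frozenset[str] = ARG_RELS) -> dict[str, dict[tuple[str, ...], float]]:
    """Map each verb lemma to the relative frequency of its argument frames.

    A frame is the sorted tuple of argument relations found on the verb's dependents."""
    counts: dict[str, Counter] = defaultdict(Counter)
    for sentence in sentences:
        for verb in sentence:
            if verb.pos != verb_pos:
                continue
            frame = tuple(sorted(t.rel for t in sentence
                                 if t.head == verb.idx and t.rel in arg_rels))
            counts[verb.lemma][frame] += 1
    return {lemma: {frame: n / sum(frames.values()) for frame, n in frames.items()}
            for lemma, frames in counts.items()}


corpus = [
    [Token(1, "Caesar", "PROPN", 3, "Sb"), Token(2, "pons", "NOUN", 3, "Obj"),
     Token(3, "facio", "VERB", 0, "Pred")],                                       # Caesar pontem facit
    [Token(1, "miles", "NOUN", 3, "Sb"), Token(2, "castra", "NOUN", 3, "Obj"),
     Token(3, "facio", "VERB", 0, "Pred")],                                       # milites castra faciunt
    [Token(1, "pons", "NOUN", 2, "Obj"), Token(2, "facio", "VERB", 0, "Pred")],   # pontem facit
    [Token(1, "Caesar", "PROPN", 2, "Sb"), Token(2, "venio", "VERB", 0, "Pred")],  # Caesar venit
]
lexicon = subcat_lexicon(corpus)
print(lexicon["venio"])  # {('Sb',): 1.0}
```

    Treating a frame as an unordered set of relations (here, a sorted tuple) ignores word order, which suits a free-word-order language such as Latin.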

    Annotated text databases in the context of the Kaj Munk corpus: One database model, one query language, and several applications

    Netgraph

    Netgraph is a graphically oriented client-server application for searching in linguistically annotated treebanks. Its query language is simple and intuitive, yet powerful enough for treebanks with complex annotation schemes. The primary purpose of Netgraph is searching in the Prague Dependency Treebank 2.0, but it can be used for other treebanks as well.

    Computational Modeling of Dynamic Structures of Textual Segments for Corpus Analysis (Modélisation informatique de structures dynamiques de segments textuels pour l'analyse de corpus)

    The objective of the thesis is to propose a computational model to represent, build, and exploit textual structures. The proposed model relies on a type/token (lexicon/occurrences) representation of text, extended by systems of lexical and contextual annotations. The model has been implemented in the SATO software, whose functionality and internal organization are presented. A number of works are reviewed that document the development and use of the software in various contexts. The formal handling of textual and discursive structures finds an ally in the XML markup language and in the proposals of the Text Encoding Initiative (TEI). Formally, the structures built on textual segments correspond to graphs. In the context of a textual analysis that is still being elaborated, these graphs are multiple and partially deployed. Resolving them, in the sense of attaching their nodes to textual segments or to nodes of other graphs, is a dynamic process that can be supported by various computational mechanisms. Examples drawn from textual linguistics illustrate the principles of structural annotation. Prospective considerations on a computer implementation of a management system for structural annotation are also presented.
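
    To make the type/token idea concrete, here is a minimal Python sketch under a simple assumed design: each distinct form is stored once as a type, the running text is a sequence of type ids, lexical annotations attach to types, and contextual annotations attach to individual tokens. The class name, its methods, and the rule that a token-level value overrides a type-level one are our own assumptions, not SATO's internal organization.

```python
from __future__ import annotations


class TypeTokenText:
    """A text stored as a lexicon of types plus a sequence of tokens.

    Lexical annotations attach to a type (and thus to all its occurrences);
    contextual annotations attach to one token and, in this sketch, take
    precedence over the type-level value."""

    def __init__(self, words: list[str]) -> None:
        self.types: list[str] = []            # distinct forms, indexed by type id
        self._type_id: dict[str, int] = {}    # form -> type id
        self.tokens: list[int] = []           # running text as type ids
        self._lexical: dict[int, dict[str, str]] = {}     # type id -> properties
        self._contextual: dict[int, dict[str, str]] = {}  # token position -> properties
        for w in words:
            tid = self._type_id.setdefault(w, len(self.types))
            if tid == len(self.types):        # first occurrence of this form
                self.types.append(w)
            self.tokens.append(tid)

    def frequency(self, form: str) -> int:
        """Number of tokens of the given form (0 if the form never occurs)."""
        tid = self._type_id.get(form)
        return 0 if tid is None else self.tokens.count(tid)

    def annotate_type(self, form: str, prop: str, value: str) -> None:
        self._lexical.setdefault(self._type_id[form], {})[prop] = value

    def annotate_token(self, position: int, prop: str, value: str) -> None:
        self._contextual.setdefault(position, {})[prop] = value

    def value(self, position: int, prop: str) -> str | None:
        """Property of the token at `position`: contextual value if any, else lexical."""
        if prop in self._contextual.get(position, {}):
            return self._contextual[position][prop]
        return self._lexical.get(self.tokens[position], {}).get(prop)


text = TypeTokenText("le chat voit le chien".split())
text.annotate_type("le", "pos", "DET")
text.annotate_type("chat", "pos", "NOUN")
text.annotate_token(1, "role", "subject")          # contextual: this occurrence only
print(len(text.types), len(text.tokens))           # 4 5
print(text.value(3, "pos"), text.frequency("le"))  # DET 2
```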