
    Proceedings of the Eighth Italian Conference on Computational Linguistics CLiC-it 2021

    The eighth edition of the Italian Conference on Computational Linguistics (CLiC-it 2021) was held at Università degli Studi di Milano-Bicocca from 26 to 28 January 2022. After the 2020 edition, which was held entirely online because of the Covid-19 health emergency, CLiC-it 2021 was the first opportunity for the Italian Computational Linguistics research community to meet in person after more than a year of full or partial lockdown.

    Språkmöte och språkhistoria (Language Contact and Language History)

    The fifteenth volume in the series “Studies in the History of the Swedish Language” (Studier i svensk språkhistoria) consists of 21 articles selected from the presentations at the 15th conference on the history of the Swedish language, organized by the Department of Scandinavian Studies at the University of Tartu, 13–15 June 2018. Like the conference, the book takes language contact and language history as its main theme. Contact between Swedish and other languages, and what such contact entails, has been an active area of research in recent years, and the theme is also naturally linked to the historical situation of Swedish in Estonia. The articles in the volume cover a wide spectrum of research areas, including language variation and change, Swedish dialects, historical corpus linguistics, multilingualism, translation, onomastics, etc. With this volume, the editors hope to further stimulate the notable growth of interest in the history of Swedish, not least in the study of language contact of various kinds.

    Plague Dot Text: Text mining and annotation of outbreak reports of the Third Plague Pandemic (1894-1952)

    Models of disease in populations are commonly built on information and data gathered from past outbreaks. However, epidemic outbreaks are never captured in statistical data alone; they are also communicated through narratives supported by empirical observations. Outbreak reports discuss correlations between populations, locations, and the disease to draw insights into causes, vectors, and potential interventions. The problem with these narratives is that they usually lack a consistent structure or strong conventions, which hinders their formal analysis in larger corpora. Our interdisciplinary research investigates more than 100 reports from the third plague pandemic (1894-1952), evaluating ways of building a corpus to extract and structure this narrative information through text mining and manual annotation. In this paper we discuss the progress of our ongoing exploratory project: how we enhance optical character recognition (OCR) methods to improve text capture, and our approach to structuring the narratives and identifying relevant entities in the reports. The structured corpus is made available via Solr, enabling search and analysis across the whole collection for future research dedicated, for example, to the identification of concepts. We show preliminary visualisations of the characteristics of causation and of differences with respect to gender, derived from syntactic-category-dependent corpus statistics. Our goal is to develop structured accounts of some of the most significant concepts used to understand the epidemiology of the third plague pandemic around the globe. The corpus enables researchers to analyse the reports collectively, allowing deeper insight into how the epidemiology of plague was considered globally in the early twentieth century. Comment: Journal of Data Mining & Digital Humanities 202

    Proceedings of the Fifth Italian Conference on Computational Linguistics CLiC-it 2018

    On behalf of the Program Committee, a very warm welcome to the Fifth Italian Conference on Computational Linguistics (CLiC-it 2018). This edition of the conference is held in Torino. The conference is locally organised by the University of Torino and hosted in its prestigious main lecture hall, the “Cavallerizza Reale”. The CLiC-it conference series, an initiative of the Italian Association for Computational Linguistics (AILC), has, after five years of activity, clearly established itself as the premier national forum for research and development in Computational Linguistics and Natural Language Processing, where leading researchers and practitioners from academia and industry meet to share their research results, experiences, and challenges.

    Measuring Greekness: A novel computational methodology to analyze syntactical constructions and quantify the stylistic phenomenon of Attic oratory

    This study compiles and interprets data drawn from Classical studies, analyzing them with the tools of computational linguistics, treebank annotation, and the development and post-processing of metrics. More specifically, the purpose of this work is to employ computational methods to analyze a particular form of Ancient Greek, namely Attic Greek, to “measure” its attributes, and to explore the socio-political connotations that its usage had in the era of the High Roman Empire. During the first centuries CE, the landscape of the Roman Empire was polyvalent. It comprised native Romans who could be fluent in Latin and Greek, Greeks who were Roman citizens, other easterners who were potentially trilingual and had also acquired Roman citizenship, and even Christians, who identified themselves as Roman citizens but with a different religious identity. It is no surprise, then, that language was politicized and that identity, both individual and civic, was constantly reshaped through it. The question I attempt to answer is whether the Greekness of native and bilingual speakers can be quantified through an analytic computational study of the Attic dialect. Chapter 1 discusses the three scholarly fields pertinent to the study: computational linguistics, corpus linguistics, and digital humanities. I present their precepts in order to explain what prompts this work and how the confluence of these three methodologies significantly enhances our understanding of the issue at hand. In Chapter 2, I approach Greekness, Latinity, and Atticism through the writings of Greek and Roman grammarians and lexicographers and provide a complete list of all occurrences of these notions. Chapters 3 and 4 explain the reasoning behind the use of the Perseids framework and the Prague annotation system. They then present the metrics developed, the computational methods, and the resulting visualizations, which quantify and objectify inferences that were previously purely theoretical. The metrics were developed after careful consideration of the stylistic attributes of Ancient Greek, so each metric “measures” something pertinent to the formation of the language. The visualizations then present the numerical results in a more understandable and interpretable format. Philologists may find it interesting to see humanistic ideas presented graphically, and computer scientists to see their methods applied to a topic that is predominantly philological and social. Finally, Chapter 5 recontextualizes the numerical results and interpretations obtained in Chapters 3 and 4, and thus sets the parameters necessary to discuss them in conjunction with readings of literary texts of the High Empire. My intention is to show how numbers are “translated” into a different “language,” the language of the humanist.

    Contents:
    Acknowledgments
    Chapter 1: Introduction
        1.1 Focus of the Study
        1.2 Classical Studies and Digital Humanities
        1.3 Corpus Linguistics
        1.4 Humanities Corpus and Corpus Linguistics
        1.5 Synopsis of the Project
    Chapter 2: Linguistic Purity as Ethnic and Educational Marker, or Greek and Roman Grammarians on Greek and Latin
        2.1 Introduction
        2.2 Grammatical and Lexicographic Definitions
            2.2.1 Greek and Latin Languages
            2.2.2 Grammatici Graeci
            2.2.3 Grammatici Latini
        2.3 Greek and Attic in Greek Lexicographers
        2.4 Conclusion
    Chapter 3: Attic Oratory and its Imperial Revival: Quantifying Theory and Practice
        3.1 Introduction
        3.2 Atticism: Definition and Redefinitions
        3.3 Significance of Enhanced Linguistic and Computational Analysis of Atticism
            3.3.1 The Perseids Project, the Prague Mark-up Language, and Dependency Grammar
        3.4 Evaluating Atticism
            3.4.1 Dionysius of Halicarnassus’s Theoretical Framework
        3.5 Methods: Computational Quantification of Rhetorical Styles
            3.5.1 The Perseids 1.5 ALDT Schema
            3.5.2 Node-based Sentence Metrics
            3.5.3 Computer Implementation
        3.6 Conclusion
    Chapter 4: Experimental Results, Analysis, and Topological Haar Wavelets
        4.1 Introduction
        4.2 Experimental Results
        4.3 Data Visualization
        4.4 Topological Metric Wavelets for Syntactical Quantification
            4.4.1 Wavelets
            4.4.2 Topological Metrics using Wavelets
            4.4.3 Experimental Results
        4.5 Conclusion
    Chapter 5: «Γαλάτης ὢν ἑλληνίζειν» (“Being a Gaul, to speak Greek”): Greekness, Latinity, and Otherness in the World of the High Empire
        5.1 Introduction
        5.2 The Multiethnic Constituents of an Imperial Citizen: Anacharsis, Favorinus, and Dionysius of Halicarnassus’s Ethnography
        5.3 Conclusion
    Chapter 6: Conclusion
    References
    Appendix
    Curriculum Vitae
    Dissertation-related Publications
    Declaration of Independent Work (Selbständigkeitserklärung)

    Proceedings of the Fifth Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial 2018)

    Peer reviewed

    Proceedings of the Fifth Italian Conference on Computational Linguistics CLiC-it 2018 : 10-12 December 2018, Torino
