Proceedings of the Eighth Italian Conference on Computational Linguistics CliC-it 2021
The eighth edition of the Italian Conference on Computational Linguistics (CLiC-it 2021) was held at Università degli Studi di Milano-Bicocca from 26th to 28th January 2022. After the edition of 2020, which was held in fully virtual mode due to the health emergency related to Covid-19, CLiC-it 2021 represented the first moment for the Italian research community of Computational Linguistics to meet in person after more than one year of full/partial lockdown.
Språkmöte och språkhistoria
The fifteenth volume in the series “Studies in the History of the Swedish Language” consists of 21 articles chosen from the presentations at the 15th conference on Swedish language history, organized by the Department of Scandinavian Studies at the University of Tartu on 13–15 June 2018. The main theme of the book comprises language contact and language history. The contacts between Swedish and other languages and what these involve have constituted an active area of research during the past years. This topic is naturally linked to the historical situation of Swedish in Estonia. The articles in the volume cover a wide spectrum of research areas, including language variation and change, Swedish dialects, historical corpus linguistics, multilingualism, translation theory, onomastics, etc. With this volume the editors hope to further stimulate the notable increase of interest in the history of Swedish and not least in the study of various language contacts.
Plague Dot Text: Text mining and annotation of outbreak reports of the Third Plague Pandemic (1894-1952)
The design of models that govern diseases in populations is commonly built on
information and data gathered from past outbreaks. However, epidemic outbreaks
are never captured in statistical data alone but are communicated by
narratives, supported by empirical observations. Outbreak reports discuss
correlations between populations, locations and the disease to infer insights
into causes, vectors and potential interventions. The problem with these
narratives is usually the lack of consistent structure or strong conventions,
which prohibits their formal analysis in larger corpora. Our interdisciplinary
research investigates more than 100 reports from the third plague pandemic
(1894-1952) evaluating ways of building a corpus to extract and structure this
narrative information through text mining and manual annotation. In this paper
we discuss the progress of our ongoing exploratory project, how we enhance
optical character recognition (OCR) methods to improve text capture, and our
approach to structuring the narratives and identifying relevant entities in the
reports. The structured corpus is made available via Solr, enabling search and
analysis across the whole collection for future research dedicated, for
example, to the identification of concepts. We show preliminary visualisations
of the characteristics of causation and differences with respect to gender as a
result of syntactic-category-dependent corpus statistics. Our goal is to
develop structured accounts of some of the most significant concepts that were
used to understand the epidemiology of the third plague pandemic around the
globe. The corpus enables researchers to analyse the reports collectively
allowing for deep insights into the global epidemiological consideration of
plague in the early twentieth century.
Comment: Journal of Data Mining & Digital Humanities 202
Actes de la 6e conférence conjointe Journées d'Études sur la Parole (JEP, 33e édition), Traitement Automatique des Langues Naturelles (TALN, 27e édition), Rencontre des Étudiants Chercheurs en Informatique pour le Traitement Automatique des Langues (RÉCITAL, 22e édition). Volume 2 : Traitement Automatique des Langues Naturelles
6th joint conference: JEP-TALN-RECITAL 2020
Proceedings of the Fifth Italian Conference on Computational Linguistics CLiC-it 2018
On behalf of the Program Committee, a very warm welcome to the Fifth Italian Conference on Computational Linguistics (CLiC-it 2018). This edition of the conference is held in Torino. The conference is locally organised by the University of Torino and hosted in its prestigious main lecture hall “Cavallerizza Reale”. The CLiC-it conference series is an initiative of the Italian Association for Computational Linguistics (AILC) which, after five years of activity, has clearly established itself as the premier national forum for research and development in the fields of Computational Linguistics and Natural Language Processing, where leading researchers and practitioners from academia and industry meet to share their research results, experiences, and challenges.
Measuring Greekness: A novel computational methodology to analyze syntactical constructions and quantify the stylistic phenomenon of Attic oratory
This study is the result of a compilation and interpretation of data that derive from Classical studies, but are studied and analyzed using computational linguistics, Treebank annotation, and the development and post-processing of metrics. More specifically, the purpose of this work is to employ computational methods so as to analyze a particular form of the Ancient Greek language, namely Attic Greek, “measure” its attributes, and explore the socio-political connotations that its usage had in the era of the High Roman Empire.
During the first centuries CE, the landscape of the Roman Empire is polyvalent. It consists of native Romans who can be fluent in Latin and Greek, Greeks who are Roman citizens, other easterners who are potentially trilingual and have also assumed Roman citizenship, and even Christians, who identify themselves as Roman citizens but with a different religious identity. It comes as no surprise that language is politicized, and identity, both individual and civic, is constantly reshaped through it. The question I attempt to answer is whether we can quantify the Greekness of native and bilingual speakers based on an analytic computational study of the Attic dialect.
Chapter 1 provides a discussion of the three aforementioned scholarly fields, which were pertinent for the study. I present the precepts of computational linguistics, corpus linguistics, and digital humanities so as to further explicate what prompts this work and how the confluence of three methodologies significantly enhances our apprehension of the issue at hand.
In Chapter 2, I approach Greekness, Latinity, and Atticism through the writings of Greek and Roman grammarians and lexicographers and provide the complete list of all the occurrences of the aforementioned notions.
Chapters 3 and 4 explicate further the reasoning behind the usage of the Perseids framework and the Prague annotation system. They then proceed to relate the metrics developed, the computational methods, and their subsequent visualization to quantify and objectify the previously purely theoretical inferences. The metric system was developed after careful consideration of the stylistic attributes of Ancient Greek. Therefore, each metric “measures” something pertinent in the formation of the language. The visualizations then afford us a more understandable and interpretable format of the numerical results. For philologists, it is interesting to view the graphic presentation of humanistic ideas, and for computer scientists, the applicability of their methods to a topic that is predominantly philological and social.
Finally, chapter 5 recontextualizes the numerical results and their interpretations, as acquired in chapters 3 and 4, and thus sets the parameters necessary to discuss them in conjunction with readings of literary texts of the period of the High Empire. My intention is to show how numbers are “translated” into a different “language,” the language of the humanist.
Acknowledgments
Chapter 1: Introduction
1.1 Focus of the Study
1.2 Classical Studies and Digital Humanities
1.3 Corpus Linguistics
1.4 Humanities Corpus and Corpus Linguistics
1.5 Synopsis of the Project
Chapter 2: Linguistic Purity as Ethnic and Educational Marker, or Greek and Roman Grammarians on Greek and Latin
2.1 Introduction
2.2 Grammatical and Lexicographic Definitions
2.2.1 Greek and Latin languages
2.2.2 Grammatici Graeci
2.2.3 Grammatici Latini
2.3 Greek and Attic in Greek Lexicographers
2.4 Conclusion
Chapter 3: Attic Oratory and its Imperial Revival: Quantifying Theory and Practice
3.1 Introduction
3.2 Atticism: Definition and Redefinitions
3.3 Significance of Enhanced Linguistic and Computational Analysis of Atticism
3.3.1 The Perseids Project, the Prague Mark-up Language, and Dependency Grammar
3.4 Evaluating Atticism
3.4.1 Dionysius’s of Halicarnassus Theoretical Framework
3.5 Methods: Computational Quantification of Rhetorical Styles
3.5.1 The Perseids 1.5 ALDT Schema
3.5.2 Node-based Sentence Metrics
3.5.3 Computer Implementation
3.6 Conclusion
Chapter 4: Experimental results, Analysis, and Topological Haar Wavelets
4.1 Introduction
4.2 Experimental Results
4.3 Data Visualization
4.4 Topological Metric Wavelets for Syntactical Quantification
4.4.1 Wavelets
4.4.2 Topological Metrics using Wavelets
4.4.3 Experimental Results
4.5 Conclusion
Chapter 5: «Γαλάτης ὢν ἑλληνίζειν»: Greekness, Latinity, and Otherness in the World of the High Empire
5.1 Introduction
5.2 The Multiethnical Constituents of an Imperial Citizen: Anacharsis, Favorinus, and Dionysius’s of Halicarnassus Ethnography
5.3 Conclusion
Chapter 6: Conclusion
References
Appendix
Curriculum Vitae
Dissertation related Publications
Declaration of Independent Work (Selbständigkeitserklärung)
Proceedings of the Fifth Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial 2018)
Peer reviewed
Proceedings of the Fifth Italian Conference on Computational Linguistics CLiC-it 2018 : 10-12 December 2018, Torino
On behalf of the Program Committee, a very warm welcome to the Fifth Italian Conference on Computational Linguistics (CLiC-it 2018). This edition of the conference is held in Torino. The conference is locally organised by the University of Torino and hosted in its prestigious main lecture hall “Cavallerizza Reale”. The CLiC-it conference series is an initiative of the Italian Association for Computational Linguistics (AILC) which, after five years of activity, has clearly established itself as the premier national forum for research and development in the fields of Computational Linguistics and Natural Language Processing, where leading researchers and practitioners from academia and industry meet to share their research results, experiences, and challenges.