21 research outputs found

Recommended from our members

Phonologically Informed Edit Distance Algorithms for Word Alignment with Low-Resource Languages

Author: Frank Robert
McCoy Richard T
Publication venue: ScholarWorks@UMass Amherst
Publication date: 01/01/2018

We present three methods for weighting edit distance algorithms based on linguistic information. These methods base their penalties on (i) phonological features, (ii) distributional character embeddings, or (iii) differences between cognate words. We also introduce a novel method for evaluating edit distance through the task of low-resource word alignment by using edit-distance neighbors in a high-resource pivot language to inform alignments from the low-resource language. At this task, the cognate-based scheme outperforms our other methods and the Levenshtein edit distance baseline, showing that NLP applications can benefit from information about cross-linguistic phonological patterns
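
The weighting idea in this abstract can be illustrated with a minimal sketch: a standard dynamic-programming edit distance in which the substitution penalty is supplied as a function. The abstract does not give the actual cost functions, so the names `weighted_edit_distance`, `uniform_cost`, `feature_cost`, and the toy `VOICING_PAIRS` table are assumptions made for illustration only, not the authors' method.

```python
from typing import Callable

def weighted_edit_distance(
    a: str,
    b: str,
    sub_cost: Callable[[str, str], float],
    indel_cost: float = 1.0,
) -> float:
    """Edit distance with a pluggable substitution cost (row-by-row DP)."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = [j * indel_cost for j in range(len(b) + 1)]
    for i, ca in enumerate(a, start=1):
        curr = [i * indel_cost]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + indel_cost,            # delete ca
                curr[j - 1] + indel_cost,        # insert cb
                prev[j - 1] + sub_cost(ca, cb),  # substitute ca -> cb (or match)
            ))
        prev = curr
    return prev[-1]

def uniform_cost(x: str, y: str) -> float:
    """Plain Levenshtein baseline: every mismatch costs 1."""
    return 0.0 if x == y else 1.0

# Hypothetical toy table: pairs that differ only in voicing are "close".
VOICING_PAIRS = {frozenset("pb"), frozenset("td"), frozenset("kg")}

def feature_cost(x: str, y: str) -> float:
    """Assumed feature-based penalty: voicing-only mismatches cost 0.5."""
    if x == y:
        return 0.0
    return 0.5 if frozenset((x, y)) in VOICING_PAIRS else 1.0
```

Under these assumed costs, `weighted_edit_distance("pat", "bad", feature_cost)` is lower than the same call with `uniform_cost`, because both mismatches differ only in voicing. An embedding-based or cognate-based scheme would plug in a different `sub_cost` function in the same way.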

ScholarWorks@UMass Amherst

Cross-Family Similarity Learning for Cognate Identification in Low-Resource Languages

Author: Granroth-Wilding Mark
Soisalon-Soininen Eliel
Publication venue: INCOMA
Publication date: 04/09/2019

Peer reviewed

Crossref

Helsingin yliopiston digitaalinen arkisto

Sequence comparison in computational historical linguistics

Author: Forkel Robert
Greenhill Simon
List Johann Mattis
Tresoldi Tiago
Walworth Mary
Publication venue: Oxford University Press (OUP)
Publication date: 23/11/2020

With increasing amounts of digitally available data from all over the world, manual annotation of cognates in multi-lingual word lists becomes more and more time-consuming in historical linguistics. Using available software packages to pre-process the data prior to manual analysis can drastically speed up the process of cognate detection. Furthermore, it allows us to get a quick overview of data which have not yet been intensively studied by experts. LingPy is a Python library which provides a large arsenal of routines for sequence comparison in historical linguistics. With LingPy, linguists can not only automatically search for cognates in lexical data, but they can also align the automatically identified words, and output them in various forms, which aim at facilitating manual inspection. In this tutorial, we will briefly introduce the basic concepts behind the algorithms employed by LingPy and then illustrate in concrete workflows how automatic sequence comparison can be applied to multi-lingual word lists. The goal is to provide the readers with all information they need to (1) carry out cognate detection and alignment analyses in LingPy, (2) select the appropriate algorithms for the appropriate task, (3) evaluate how well automatic cognate detection algorithms perform compared to experts, and (4) export their data into various formats useful for additional analyses or data sharing. While basic knowledge of the Python language is useful for all analyses, our tutorial is structured in such a way that scholars with basic knowledge of computing can follow through all steps as well.

This research was supported by the European Research Council Starting Grant ‘Computer-Assisted Language Comparison’ (Grant CALC 715618, J.M.L., T.T.) and the Australian Research Council’s Centre of Excellence for the Dynamics of Language (Australian National University, Grant CE140100041, S.J.G.). As part of the GlottoBank project (http://glottobank.org), this work was further supported by the Department of Linguistic and Cultural Evolution of the Max Planck Institute for the Science of Human History (Jena) and the Royal Society of New Zealand (Marsden Fund, Grant 13-UOA-121)

The Australian National University

Sequence comparison in computational historical linguistics

Author: Johann-Mattis List
Mary Walworth
Robert Forkel
Simon J. Greenhill
Tiago Tresoldi
Publication venue: Modern Language Association
Publication date: 01/01/2018

Humanities Commons

MPG.PuRe

Computational Historical Linguistics

Author: Johann-Mattis List
Publication venue: Modern Language Association
Publication date: 01/01/2023

In the course, I give a basic introduction to some of the recent developments in the field of computational historical linguistics. While this field is predominantly represented by phylogenetic approaches with which scholars try to infer phylogenetic trees from different kinds of language data, the approach taken here is much broader, concentrating specifically on the prerequisites needed in order to get one’s data into shape for phylogenetic analyses. As a result, we will concentrate on topics such as automated phonetic alignments, automated cognate detection, the handling of semantic shift, and the modeling of word formation in comparative wordlists. A major goal of the course is to emphasize the importance of computer-assisted (as opposed to computer-based) approaches, which acknowledge the importance of qualitative work in historical language comparison. The course will be accompanied by code examples which participants can try to replicate on their computers

Humanities Commons

Are Automatic Methods for Cognate Detection Good Enough for Phylogenetic Reconstruction in Historical Linguistics?

Author: Gerhard Jäger
Johann-Mattis List
Johannes Wahle
Taraka Rama
Publication venue: Modern Language Association
Publication date: 01/01/2018

We evaluate the performance of state-of-the-art algorithms for automatic cognate detection by comparing how useful automatically inferred cognates are for the task of phylogenetic inference compared to classical manually annotated cognate sets. Our findings suggest that phylogenies inferred from automated cognate sets come close to phylogenies inferred from expert-annotated ones, although on average, the latter are still superior. We conclude that future work on phylogenetic reconstruction can profit much from automatic cognate detection. Especially where scholars are merely interested in exploring the bigger picture of a language family’s phylogeny, algorithms for automatic cognate detection are a useful complement for current research on language phylogenies

arXiv.org e-Print Archive

Crossref

Humanities Commons

MPG.PuRe

Computational Approaches to Historical Language Comparison

Author: Johann-Mattis List
Publication venue: Modern Language Association
Publication date: 01/01/2022

The chapter discusses recently developed computational techniques providing concrete help in addressing various tasks in historical language comparison, focusing specifically on those tasks which are typically subsumed under the framework of the comparative method. These include the proof of relationship, cognate and correspondence detection, phonological reconstruction and sound law induction, and the reconstruction of evolutionary scenarios

Humanities Commons

A cross-linguistic database of phonetic transcription systems

Author: Anderson C.
Chacon T.
Fehn A.
Forkel R.
List J.
Tresoldi T.
Walworth M.
Publication venue: Walter de Gruyter GmbH
Publication date: 01/01/2018

Contrary to what non-practitioners might expect, the systems of phonetic notation used by linguists are highly idiosyncratic. Not only do various linguistic subfields disagree on the specific symbols they use to denote the speech sounds of languages, but considerable variation can also be found in large databases of sound inventories. Inspired by recent efforts to link cross-linguistic data across different resources with the help of reference catalogues (Glottolog, Concepticon), we present initial efforts to link different phonetic notation systems to a catalogue of speech sounds. This is achieved with the help of a database accompanied by a software framework that uses a limited but easily extendable set of non-binary feature values to allow for quick and convenient registration of different transcription systems, while at the same time linking to additional datasets with restricted inventories. Linking different transcription systems enables us to conveniently translate between different phonetic transcription systems, while linking sounds to databases allows users quick access to various kinds of metadata, including feature values, statistics on phoneme inventories, and information on prosody and sound classes. In order to prove the feasibility of this enterprise, we supplement an initial version of our cross-linguistic database of phonetic transcription systems (CLTS), which currently registers five transcription systems and links to fifteen datasets, as well as a web application, which permits users to conveniently test the power of the automatic translation across transcription systems

Biblioteka Nauki - repozytorium artykułów

MPG.PuRe

Bangime: secret language, language isolate, or language island?

Author: Hantgan A.
List J.
Publication venue
Publication date: 01/01/2020

We report the results of a qualitative and quantitative lexical comparison between Bangime and neighboring languages. Our results indicate that the status of the language as an isolate remains viable, and that Bangime speakers have had different levels of language contact with other Malian populations at different time periods. Bangime speakers, the Bangande, claim Dogon ancestry, and the language has both recent borrowings from neighboring Dogon varieties and more rooted vocabulary from Dogon languages spoken to the east, from where the Bangande claim to have come. Evidence of multi-layered long-term contact is clear: lexical items have permeated even core vocabulary. However, strikingly, the Bangande are seemingly unaware that their language is not intelligible with any Dogon variety. We hope that our findings will influence future studies on the reconstruction of the Dogon languages and other neighboring language varieties to shed light on the mysterious history of Bangime and its speakers

Papers in Historical Phonology

Journal Hosting Service | The University of Edinburgh

MPG.PuRe

Pragmatics of Language Evolution

Author: Johann-Mattis List
Publication venue: Modern Language Association
Publication date: 01/01/2019

The fact that “all languages evolve, as long as they exist” (Schleicher 1863: 18f) has long been known to linguists and does not surprise us anymore. The reasons why all languages change constantly, however, are still not fully understood. What we do know is that language usage must be at the core of language evolution. It is the dynamics among speakers, who want to be understood and to understand what others say, while at the same time trying to be efficient, convincing, or poetic when communicating with others. If the dynamics of language use are indeed one of the driving forces of language evolution, it is evident that the phenomena of language change need to be studied from the perspective of pragmatics. In times of constantly increasing amounts of digital language data, in various forms, ranging from wordlists via results of laboratory experiments to large historical corpora, it is clear that every attempt to understand the specific dynamics of language evolution must be carried out in an empirical framework. In the course, I will try to give a rather broad (but nevertheless eclectic) introduction to topics in historical linguistics in which pragmatics plays a crucial role for the study of language change and its driving forces. In this context, we will look into empirical aspects of research on language evolution, empirical studies on sound change, and the pragmatics of language contact. In addition, we will also learn how language change can be modeled, and how we can study pragmatic phenomena themselves from an evolutionary perspective by investigating how speech acts and poetic traditions evolve

Humanities Commons

MPG.PuRe