
    A computer-assisted approach to the comparison of mainland southeast Asian languages

    This cumulative thesis comprises three separate projects built on a computer-assisted language comparison (CALC) framework. They address common obstacles to studying the history of Mainland Southeast Asian (MSEA) languages, such as sparse and non-standardized lexical data and inadequate methods for cognate judgments, and provide caveats for scholars who use Bayesian phylogenetic analysis. The first project provides a format that standardizes the sound inventories, regulates language labels, and clarifies lexical items. This standardized format allows us to merge various forms of raw data. The format also summarizes information to assist linguists in researching the relatedness among words and inferring relationships among languages. The second project focuses on increasing the transparency of lexical data and cognate judgments with regard to compound words. The method enables the annotation of each part of a word with semantic meanings and syntactic features. In addition, four different conversion methods were developed to convert morpheme cognates into word cognates for input into the Bayesian phylogenetic analysis. The third project applies the methods used in the first project to create a workflow by merging linguistic data sets and inferring a language tree using a Bayesian phylogenetic algorithm. Furthermore, the project addresses the importance of integrating cross-disciplinary studies into historical linguistic research. Finally, the methods we proposed for managing lexical data for MSEA languages are discussed and summarized from six perspectives. The work can be seen as a milestone in reconstructing human prehistory in an area that has high linguistic and cultural diversity.

    Formal and quantitative approaches to historical language comparison

    Lecture given at the Fifth Pavia International Summer School for Indo-European Linguistics (Università di Pavia, 2022-09-05/09).

    Computational Historical Linguistics

    In the course, I give a basic introduction to some of the recent developments in the field of computational historical linguistics. While this field is predominantly represented by phylogenetic approaches with which scholars try to infer phylogenetic trees from different kinds of language data, the approach taken here is much broader, concentrating specifically on the prerequisites needed to get one’s data into shape for phylogenetic analyses. As a result, we will concentrate on topics such as automated phonetic alignments, automated cognate detection, the handling of semantic shift, and the modeling of word formation in comparative wordlists. A major goal of the course is to emphasize the importance of computer-assisted, as opposed to computer-based, approaches, which acknowledge the importance of qualitative work in historical language comparison. The course will be accompanied by code examples which participants can try to replicate on their computers.

    Annual contributions to the Genealogical World of Phylogenetic Networks III

    This is a summary of 12 contributions I made to the blog "The Genealogical World of Phylogenetic Networks" in 2018. The contributions are shared in the form of a PDF document with a table of contents that allows for a quick search of the contributions and also offers direct links to the blog.

    A comparative phylogenetic approach to Austronesian cultural evolution


    concepts - methods - visualization

    While Darwin’s grand view of evolution has undergone many changes and shown up in many facets, there remains one outstanding common feature in its 150-year history: since the very beginning, branching trees have been the dominant scheme for representing evolutionary processes. Only recently have network models gained ground, reflecting contact-induced mixing or hybridization in evolutionary scenarios. In biology, research on prokaryote evolution indicates that lateral gene transfer is a major feature in the evolution of bacteria. In the field of linguistics, the mutual lexical and morphosyntactic borrowing between languages seems to be much more central for language evolution than the family tree model is likely to concede. In the humanities, networks are employed as an alternative to established phylogenetic models, to express the hybridization of cultural phenomena, concepts or the social structure of science. However, an interdisciplinary display of network analyses for evolutionary processes remains lacking. Therefore, this volume includes approaches studying the evolutionary dynamics of science, languages and genomes, all of which were based on methods incorporating network approaches.

    Data-driven Language Typology

    In this thesis we use statistical n-gram language models and the perplexity measure for language typology tasks. We interpret the perplexity of a language model as a distance measure when the model is applied to a phonetic transcript of a language the model wasn't originally trained on. We use these distance measures for detecting language families, detecting closely related languages, and for language family tree reproduction. We also study the sample sizes required to train the language models and estimate how large a corpus is needed for the successful use of these methods. We find that trigram language models trained on automatically transcribed phonetic transcripts, together with the perplexity measure, can be used both for detecting language families and for detecting closely related languages.

    Languoid, Doculect, and Glossonym: Formalizing the Notion 'Language'

    It is perfectly reasonable for laypeople and non-linguistic scholars to use names for languages without reflecting on the proper definition of the objects referred to by these names. Simply using a name like English or Witotoan suffices as an informal communicative designation for a particular language or a language group. However, for the linguistics community, which is by definition occupied with the details of languages and language variation, it is somewhat bizarre that there does not exist a proper technical apparatus to talk about intricate differences in opinion about the precise sense of a name like English or Witotoan when used in academic discussion. We propose three interrelated concepts (LANGUOID, DOCULECT, and GLOSSONYM) which provide a principled basis for discussion of different points of view about key issues, such as whether two varieties should be associated with the same language, and allow for a precise description of what exactly is being claimed by the use of a given genealogical or areal group name. The framework they provide should be especially useful to researchers who work on underdescribed languages where basic issues of classification remain unresolved.

    The Austronesian Diaspora: A Synthetic Total Evidence Model

    This is an evidence-based account of a remarkable, but perhaps somewhat underestimated, series of human population movements lasting continuously for around 5000 years. Information has been collected from a wide variety of studies across a range of disciplines and subjected to critical examination. The emergent picture is presented as a Synthetic Total Evidence Model which traces the Austronesian Diaspora from Taiwan, via a trail of genes, language and culture, to Island Southeast Asia. From there two distinct branches are shown to lead, one across the Pacific and the other through Malaysia and Indonesia and then on to Madagascar. Along the way there are many confounding episodes of admixture, language shifts and cultural assimilation. The Pacific branch is shown to contain two distinct groups known as Polynesians and Melanesians with similar, but still individually characteristic, gene pools. Despite all these complexities, the evidence does build to a single unified multi-dimensional picture.