554 research outputs found

    An Algorithm For Building Language Superfamilies Using Swadesh Lists

    The main contributions of this thesis are the following: i. Developing an algorithm to generate language families and superfamilies given, for each input language, a Swadesh list written in International Phonetic Alphabet (IPA) notation. ii. The algorithm is novel in using the Levenshtein distance metric on the IPA representation and in the way it measures the overall distance between pairs of Swadesh lists. iii. Building a Swadesh list for the author's native Kinyarwanda language, because no such list could be found even after an extensive search. Adviser: Peter Reves

    Measures of lexical distance between languages

    The idea of measuring distance between languages seems to have its roots in the work of the French explorer Dumont D'Urville \cite{Urv}. He collected comparative word lists of various languages during his voyages aboard the Astrolabe from 1826 to 1829 and, in his work on the geographical division of the Pacific, proposed a method to measure the degree of relation among languages. The method used by modern glottochronology, developed by Morris Swadesh in the 1950s, measures distances from the percentage of shared cognates, which are words with a common historical origin. Recently, we proposed a new automated method which uses the normalized Levenshtein distance between words with the same meaning and averages over the words contained in a list. Another group of scholars \cite{Bak, Hol} has since proposed a refinement of our definition that includes a second normalization. In this paper we compare the information content of our definition with that of the refined version in order to decide which of the two can be applied with greater success to resolve relationships among languages.
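
    The averaged, normalized Levenshtein distance mentioned in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the abstract does not say how the distance is normalized, so dividing by the length of the longer word is an assumption, and the function names and the dictionary layout (meaning mapped to word) are hypothetical. The second normalization of the refined version is not covered.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum number of insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def normalized_distance(a: str, b: str) -> float:
    """Edit distance divided by the longer word's length (assumed normalization)."""
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0


def list_distance(list_a: dict, list_b: dict) -> float:
    """Average the normalized distance over the meanings both lists contain."""
    shared = [meaning for meaning in list_a if meaning in list_b]
    if not shared:
        raise ValueError("the two lists share no meanings")
    total = sum(normalized_distance(list_a[m], list_b[m]) for m in shared)
    return total / len(shared)


# Toy two-item lists (illustrative spellings, not IPA transcriptions).
english = {"water": "water", "hand": "hand"}
german = {"water": "wasser", "hand": "hand"}
print(list_distance(english, german))
```

    Computing this value for every pair of languages yields the distance matrix from which a tree can then be reconstructed.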

    Lexical evolution rates by automated stability measure

    Phylogenetic trees can be reconstructed from the matrix which contains the distances between all pairs of languages in a family. Recently, we proposed a new method which uses normalized Levenshtein distances between words with the same meaning and averages over all the items of a given list. The number of items to include in the input lists for language comparison has been debated since the beginning of glottochronology. The point is that the words associated with some of the meanings undergo rapid lexical evolution. Therefore, a comparison based on a large vocabulary is only apparently more accurate than one based on a smaller vocabulary, since many of the words do not carry any useful information. In principle, one should find the optimal length of the input lists by studying the stability of the different items. In this paper we tackle the problem with an automated methodology based only on our normalized Levenshtein distance. With this approach, the program of an automated reconstruction of language relationships is completed.

    LEXICOSTATISTICS OF MALAY AND MALAGASY LANGUAGES: COMPARATIVE HISTORICAL LINGUISTIC STUDY

    This study examines the kinship of the Malay language and the Malagasy language. These two languages come from the same proto-language, namely Proto-Austronesian (PAN). The researchers start from the assumption that the two languages are closely related at both the phoneme and morpheme levels. Even though they are geographically and geopolitically separated, preliminary research on these two languages shows several shared features, one of which is that both are agglutinative. This study is therefore an attempt to find empirical evidence for the separation time of Malay and Malagasy by using language grouping methods and lexicostatistical techniques. First, the researchers collect the 300 basic vocabulary items compiled by Swadesh (1995). The method used in providing the data is the referential method, while the technique used is the note-taking technique. Second, the researchers determine which word pairs in the two languages are cognates. Third, the researchers calculate the age and separation time of the two languages. Fourth, the researchers calculate the error term to determine a more precise separation time. The results indicate that Malay and Malagasy were a single language 4223-3951 years ago and began to separate from their proto-language in 2201-1929 BC.
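
    The third and fourth steps of this abstract correspond to the classical glottochronological calculation, which can be sketched as below. The abstract gives neither the formula nor the constants, so everything here is an assumption: the standard formula t = ln(c) / (2 ln(r)), the retention rate r = 0.805 per millennium conventionally used with longer word lists, and a binomial standard error as the error term. The function names are hypothetical.

```python
import math


def separation_time(cognate_share: float, retention: float = 0.805) -> float:
    """Millennia since separation: t = ln(c) / (2 * ln(r)).

    cognate_share (c) is the fraction of compared word pairs judged cognate;
    retention (r) is the assumed share of basic vocabulary kept per millennium.
    """
    if not 0.0 < cognate_share <= 1.0:
        raise ValueError("cognate_share must be in (0, 1]")
    if not 0.0 < retention < 1.0:
        raise ValueError("retention must be in (0, 1)")
    return math.log(cognate_share) / (2.0 * math.log(retention))


def separation_range(cognates: int, compared: int, retention: float = 0.805):
    """Return (low, high) in millennia using a binomial standard error on c."""
    c = cognates / compared
    std_err = math.sqrt(c * (1.0 - c) / compared)
    t = separation_time(c, retention)
    # Recompute the time with c raised by one standard error (capped at 1).
    t_adjusted = separation_time(min(c + std_err, 1.0), retention)
    margin = t - t_adjusted
    return t - margin, t + margin


# Hypothetical counts, chosen only to show the call; not the study's data.
low, high = separation_range(cognates=60, compared=200)
print(f"{low * 1000:.0f} to {high * 1000:.0f} years")
```

    Subtracting the two bounds, expressed in years, from the year of the study gives a calendar-date range of the kind reported in the abstract.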

    Bayesian phylolinguistics infers the internal structure and the time-depth of the Turkic language family

    Despite more than 200 years of research, the internal structure of the Turkic language family remains subject to debate. Classifications of Turkic so far are based on both classical historical–comparative linguistic and distance-based quantitative approaches. Although these studies yield an internal structure of the Turkic family, they cannot give us an understanding of the statistical robustness of the proposed branches, nor are they capable of reliably inferring absolute divergence dates, without assuming constant rates of change. Here we use computational Bayesian phylogenetic methods to build a phylogeny of the Turkic languages, express the reliability of the proposed branches in terms of probability, and estimate the time-depth of the family within credibility intervals. To this end, we collect a new dataset of 254 basic vocabulary items for thirty-two Turkic language varieties based on the recently introduced Leipzig–Jakarta list. Our application of Bayesian phylogenetic inference on lexical data of the Turkic languages is unprecedented. The resulting phylogenetic tree supports a binary structure for Turkic and replicates most of the conventional sub-branches in the Common Turkic branch. We calculate the robustness of the inferences for subgroups and individual languages whose position in the tree seems to be debatable. We infer the time-depth of the Turkic family at around 2100 years before present, thus providing a reliable quantitative basis for previous estimates based on classical historical linguistics and lexicostatistics

    Lexicostatistics and Australian languages: problems and prospects


    Norm-referenced lexicostatistics and Chamic


    Internal classification of the Alor-Pantar language family using computational methods applied to the lexicon

    The non-Austronesian languages of Alor and Pantar in eastern Indonesia have been shown to be genetically related using the comparative method, but the identified phonological innovations are typologically common and do not delineate neat subgroups. We apply computational methods to recently collected lexical data and are able to identify subgroups based on the lexicon. Crucially, the lexical data are coded for cognacy based on identified phonological innovations. This methodology can succeed even where phonological innovations themselves fail to identify subgroups, showing that computational methods using lexical data can be a powerful tool supplementing the comparative method. Peer reviewed by journal Language Dynamics and Chang
