    Anatolian *meyu- ‘4, four’ and its cognates

    Kassian, Alexei. Anatolian *meyu- ‘4, four’ and its cognates [Electronic resource] / Alexei Kassian // Voprosy yazykovogo rodstva (Journal of Language Relationship). 2009. Issue 2. Pp. 65-78. (Vestnik RGGU. Series "Philological Sciences. Linguistics"; No. 16)

    Towards a formal genealogical classification of the Lezgian languages (North Caucasus): testing various phylogenetic methods on lexical data.

    A lexicostatistical classification is proposed for 20 languages and dialects of the Lezgian group of the North Caucasian family, based on meticulously compiled 110-item wordlists published as part of the Global Lexicostatistical Database project. The lexical data were then analyzed with the principal phylogenetic methods, both distance-based and character-based: Starling neighbor joining (StarlingNJ), neighbor joining (NJ), unweighted pair group method with arithmetic mean (UPGMA), Bayesian Markov chain Monte Carlo (MCMC), and unweighted maximum parsimony (UMP). Cognation indexes in the input matrix were assigned by two different procedures: the traditional etymological approach and phonetic similarity, i.e., the automatic method of consonant classes (Levenshtein distances). For several reasons (above all, the high lexicographic quality of the wordlists and the consensus among Caucasologists about the Lezgian phylogeny), the Lezgian database is an ideal testing ground for evaluating phylogenetic methods. For the etymology-based input matrix, all the phylogenetic methods, with the possible exception of UMP, yielded trees that are sufficiently compatible with each other to generate a consensus phylogenetic tree of the Lezgian lects. The resulting consensus tree agrees with the traditional expert classification as well as with some of the previously proposed formal classifications of this linguistic group. Contrary to theoretical expectations, the UMP method suggested the least plausible tree of all. For the phonetic similarity-based input matrix, the distance-based methods (StarlingNJ, NJ, UPGMA) produced trees that are rather close to the consensus etymology-based tree and to the traditional expert classification, whereas the character-based methods (Bayesian MCMC, UMP) yielded less likely topologies.
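    The phonetic similarity procedure mentioned in the abstract (consonant classes combined with Levenshtein distances) can be illustrated with a minimal sketch. The class table `CONSONANT_CLASSES`, the function names, and the toy word forms are all assumptions made for illustration; they are not the class inventory or the data used in the study.

```python
# Illustrative consonant classes (an assumption, NOT the table used in the study).
CONSONANT_CLASSES = {
    "P": "pbfv",
    "T": "td",
    "S": "sz",
    "K": "kgqx",
    "M": "m",
    "N": "n",
    "R": "rl",
    "W": "w",
    "J": "jy",
}
CHAR_TO_CLASS = {ch: cls for cls, chars in CONSONANT_CLASSES.items() for ch in chars}


def to_classes(word):
    """Reduce a transcribed word to its consonant-class skeleton.

    Vowels and symbols missing from the table are simply dropped.
    """
    return "".join(CHAR_TO_CLASS[ch] for ch in word.lower() if ch in CHAR_TO_CLASS)


def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def class_distance(word_a, word_b):
    """Edit distance between the consonant-class skeletons of two words."""
    return levenshtein(to_classes(word_a), to_classes(word_b))


# Hypothetical toy forms, not Lezgian data.
print(to_classes("pater"), to_classes("fadar"), class_distance("pater", "fadar"))
```

    In this sketch two forms whose skeletons coincide get distance 0 and would be treated as phonetically similar; where to place the similarity threshold is a separate design decision that the sketch leaves open.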

    Phonetic similarity-based phylogenetic tree of the Lezgian lects produced by the StarlingNJ method from the multistate matrix (binary nodes only).

    Bootstrap values are shown near the nodes (not shown for stable nodes with bootstrap value ≥ 95%). The tree is dated.
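    The idea behind bootstrap values can be sketched as follows: wordlist meanings are resampled with replacement, and the share of replicates in which a grouping reappears is taken as its support. The code is a much simplified stand-in that only checks whether two lects form the closest pair, not a full tree search; the function names and the toy cognate codes are hypothetical.

```python
import random


def shared_fraction(x, y, columns):
    """Fraction of the sampled meanings in which two lects have the same cognate code."""
    return sum(1 for c in columns if x[c] == y[c]) / len(columns)


def bootstrap_support(data, pair, replicates=1000, seed=0):
    """Share of bootstrap replicates in which `pair` is strictly the closest pair.

    data: dict mapping lect name -> list of cognate codes, one per meaning.
    pair: tuple of two lect names whose grouping is being assessed.
    """
    rng = random.Random(seed)
    names = list(data)
    n_meanings = len(data[names[0]])
    a, b = pair
    other_pairs = [(p, q) for i, p in enumerate(names) for q in names[i + 1:]
                   if {p, q} != {a, b}]
    hits = 0
    for _ in range(replicates):
        # Resample meanings (columns) with replacement.
        cols = [rng.randrange(n_meanings) for _ in range(n_meanings)]
        target = shared_fraction(data[a], data[b], cols)
        if all(target > shared_fraction(data[p], data[q], cols) for p, q in other_pairs):
            hits += 1
    return hits / replicates


# Hypothetical toy cognate codes for three lects and eight meanings.
toy = {
    "X": [1, 1, 1, 1, 1, 1, 1, 1],
    "Y": [1, 1, 1, 1, 1, 1, 1, 2],
    "Z": [2, 2, 2, 2, 2, 2, 2, 2],
}
print(bootstrap_support(toy, ("X", "Y")))
```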

    Reverse lexicostatistical distances for 3 Rutul dialects (higher percentage of the shared basic vocabulary meaning greater closeness): binary input matrix.
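    One way such percentages could be derived from a binary matrix is sketched below, assuming that each column stands for one cognate set and is labeled with the meaning it belongs to. The function name `shared_percentage`, the column layout, and the toy rows are assumptions for illustration and do not reproduce the Rutul data.

```python
def shared_percentage(row_a, row_b, column_meanings):
    """Percentage of comparable meanings for which two lects share a cognate set.

    row_a, row_b: lists of 0/1 values, one per cognate-set column.
    column_meanings: the meaning label of each column.
    A meaning is comparable when both lects attest at least one form for it.
    """
    attested_a, attested_b, shared = set(), set(), set()
    for a, b, meaning in zip(row_a, row_b, column_meanings):
        if a:
            attested_a.add(meaning)
        if b:
            attested_b.add(meaning)
        if a and b:
            shared.add(meaning)
    comparable = attested_a & attested_b
    if not comparable:
        raise ValueError("no meaning is attested in both lects")
    return 100.0 * len(shared) / len(comparable)


# Hypothetical toy matrix: five cognate-set columns covering three meanings.
meanings = ["eye", "eye", "fire", "fire", "water"]
lect_x = [1, 0, 1, 0, 1]
lect_y = [1, 0, 0, 1, 1]
lect_z = [0, 1, 0, 1, 0]

print(round(shared_percentage(lect_x, lect_y, meanings), 1))
print(shared_percentage(lect_y, lect_z, meanings))
```

    Meanings missing in either lect are left out of the denominator here; treating gaps differently would change the percentages.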


    Etymology-based consensus phylogenetic tree of the Lezgian lects produced by the UMP method from the binary matrix in the TNT software.

    Bootstrap values are shown near the nodes (not shown for stable nodes with bootstrap value ≥ 95%). Branch length reflects the relative rate of cognate replacement as suggested by TNT. The four optimal trees differ only in the Aghul node, as shown in the above panel. Nodes that proved problematic in comparison with the other phylogenetic methods are shaded.

    Manually constructed consensus etymology-based phylogenetic tree of the Lezgian lects based on the StarlingNJ, NJ, BioNJ, UPGMA, Bayesian MCMC, and UMP methods.

    The gray ellipses mark four joined nodes which cover binary branchings that differ depending on the method. Probability values are shown in the following sequence: NJ / MCMC / UMP (“+” means that P ≥ 0.95 in an individual method; not shown for nodes with P ≥ 0.95 in all methods). StarlingNJ dates are proposed.

    Map of the modern Lezgian lects (adapted from [1]).


    Etymology-based phylogenetic tree of the Lezgian lects produced by the UPGMA method from the binary matrix in the SplitsTree4 software.

    Bootstrap values are shown near the nodes (not shown for stable nodes with bootstrap value ≥ 95%). Branch length reflects the relative rate of cognate replacement as suggested by SplitsTree4.
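    For orientation, the UPGMA clustering step itself can be sketched in a few lines of plain Python. This is a generic textbook version that returns only the topology as nested tuples (no branch lengths, no bootstrap); it is not the SplitsTree4 implementation, and the function name `upgma` and the toy distance matrix are hypothetical.

```python
def upgma(names, matrix):
    """Cluster taxa by UPGMA and return the topology as nested tuples.

    names: list of taxon labels.
    matrix: symmetric list-of-lists of pairwise distances, same order as names.
    """
    # Active clusters: id -> (subtree, number of leaves in it).
    clusters = {i: (name, 1) for i, name in enumerate(names)}
    # Pairwise distances keyed by (smaller id, larger id).
    dist = {(i, j): matrix[i][j]
            for i in range(len(names)) for j in range(i + 1, len(names))}
    next_id = len(names)
    while len(clusters) > 1:
        i, j = min(dist, key=dist.get)  # closest pair of active clusters
        tree_i, size_i = clusters.pop(i)
        tree_j, size_j = clusters.pop(j)
        # Distance from the merged cluster to every remaining cluster is the
        # size-weighted mean, i.e. the average over all leaf pairs.
        new_dist = {}
        for k in clusters:
            d_ik = dist[(min(i, k), max(i, k))]
            d_jk = dist[(min(j, k), max(j, k))]
            new_dist[(k, next_id)] = (size_i * d_ik + size_j * d_jk) / (size_i + size_j)
        # Drop every entry that involves one of the two merged clusters.
        dist = {key: v for key, v in dist.items() if i not in key and j not in key}
        dist.update(new_dist)
        clusters[next_id] = ((tree_i, tree_j), size_i + size_j)
        next_id += 1
    tree, _ = next(iter(clusters.values()))
    return tree


# Hypothetical toy distances between four unnamed lects.
toy_names = ["A", "B", "C", "D"]
toy_matrix = [
    [0, 2, 6, 10],
    [2, 0, 6, 10],
    [6, 6, 0, 10],
    [10, 10, 10, 0],
]
print(upgma(toy_names, toy_matrix))
```

    Because UPGMA averages over all leaf pairs, it implicitly assumes a constant rate of change across branches, which is one reason its trees can differ from NJ trees built from the same matrix.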

    Phonetic similarity-based consensus phylogenetic tree of the Lezgian lects produced by the Bayesian MCMC method from the binary matrix in the MrBayes software.

    Bayesian posterior probabilities are shown above the branches (not shown for stable branches with P ≥ 0.95). Branch length reflects the relative rate of cognate replacement as suggested by MrBayes.