95 research outputs found

    Detecting non-tree-like signal using multiple tree topologies

    Recent applications of phylogenetic methods to historical linguistics have been criticized for assuming a tree structure in which ancestral languages differentiate and split up into daughter languages, while language evolution is inherently non-tree-like (François 2014; Blench 2015: 32–33). This article attempts to contribute to this debate by discussing the use of the multiple topologies method (Pagel & Meade 2006a) implemented in BayesPhylogenies (Pagel & Meade 2004). This method is applied to lexical datasets from four different language families: Austronesian (Gray, Drummond & Greenhill 2009), Sinitic (Ben Hamed & Wang 2006), Indo-European (Bouckaert et al. 2012), and Japonic (Lee & Hasegawa 2011). Evidence for multiple topologies is found in all families except, surprisingly, Austronesian. It is suggested that reticulation may arise from a number of processes, including dialect chain break-up, borrowing (both shortly after language splits and later on), incomplete lineage sorting, and characteristics of lexical datasets. It is shown that the multiple topologies method is a useful tool to study the dynamics of language evolution

    Word order evolves at similar rates in main and subordinate clauses

    In syntactic change, it remains an open issue whether word orders are more conservative or innovative in subordinate clauses compared with main clauses. Using 47 dependency-annotated corpora and Bayesian phylogenetic inference, we explore the evolution of S/V, V/O, and S/O orders across main and subordinate clauses in Indo-European. Our results reveal similar rates of change across clause types, with no evidence for any inherent conservatism of subordinate or main clauses. Our models also support evolutionary biases towards SV, VO, and SO orders, consistent with theories of dependency length minimization that favor verb-medial orders and with theories of a subject preference that favor SO orders. Finally, our results show that while the word order in the proto-language cannot be estimated with any reasonable degree of certainty, the early history of the family was dominated by a moderate preference for SVO orders, with substantial uncertainty between VO and OV orders in both main and subordinate clauses

    The Phylogeny of Little Red Riding Hood

    Researchers have long been fascinated by the strong continuities evident in the oral traditions associated with different cultures. According to the ‘historic-geographic’ school, it is possible to classify similar tales into “international types” and trace them back to their original archetypes. However, critics argue that folktale traditions are fundamentally fluid, and that most international types are artificial constructs. Here, these issues are addressed using phylogenetic methods that were originally developed to reconstruct evolutionary relationships among biological species, and which have been recently applied to a range of cultural phenomena. The study focuses on one of the most debated international types in the literature: ATU 333, ‘Little Red Riding Hood’. A number of variants of ATU 333 have been recorded in European oral traditions, and it has been suggested that the group may include tales from other regions, including Africa and East Asia. However, in many of these cases, it is difficult to differentiate ATU 333 from another widespread international folktale, ATU 123, ‘The Wolf and the Kids’. To shed more light on these relationships, data on 58 folktales were analysed using cladistic, Bayesian and phylogenetic network-based methods. The results demonstrate that, contrary to the claims made by critics of the historic-geographic approach, it is possible to identify ATU 333 and ATU 123 as distinct international types. They further suggest that most of the African tales can be classified as variants of ATU 123, while the East Asian tales probably evolved by blending together elements of both ATU 333 and ATU 123. These findings demonstrate that phylogenetic methods provide a powerful set of tools for testing hypotheses about cross-cultural relationships among folktales, and point towards exciting new directions for research into the transmission and evolution of oral narratives

    Corpus-based typology: Applications, challenges and some solutions

    Over the last few years, the number of corpora that can be used for language comparison has dramatically increased. The corpora are so diverse in their structure, size and annotation style that a novice might not know where to start. The present paper charts this new and changing territory, providing a few landmarks, warning signs and safe paths. Although no corpus at present can replace the traditional type of typological data based on language description in reference grammars, corpora can help with diverse tasks, being particularly well suited for investigating probabilistic and gradient properties of languages and for discovering and interpreting cross-linguistic generalizations based on processing and communicative mechanisms. At the same time, the use of corpora for typological purposes has not only advantages and opportunities, but also numerous challenges. This paper also contains an empirical case study addressing two pertinent problems: the role of text types in language comparison and the problem of the word as a comparative concept

    Information-theoretic causal inference of lexical flow

    This volume seeks to infer large phylogenetic networks from phonetically encoded lexical data and contribute in this way to the historical study of language varieties. The technical step that enables progress in this case is the use of causal inference algorithms. Sample sets of words from language varieties are preprocessed into automatically inferred cognate sets, and then modeled as information-theoretic variables based on an intuitive measure of cognate overlap. Causal inference is then applied to these variables in order to determine the existence and direction of influence among the varieties. The directed arcs in the resulting graph structures can be interpreted as reflecting the existence and directionality of lexical flow, a unified model which subsumes inheritance and borrowing as the two main ways of transmission that shape the basic lexicon of languages. A flow-based separation criterion and domain-specific directionality detection criteria are developed to make existing causal inference algorithms more robust against imperfect cognacy data, giving rise to two new algorithms. The Phylogenetic Lexical Flow Inference (PLFI) algorithm requires lexical features of proto-languages to be reconstructed in advance, but yields fully general phylogenetic networks, whereas the more complex Contact Lexical Flow Inference (CLFI) algorithm treats proto-languages as hidden common causes, and only returns hypotheses of historical contact situations between attested languages. The algorithms are evaluated both against a large lexical database of Northern Eurasia spanning many language families, and against simulated data generated by a new model of language contact that builds on the opening and closing of directional contact channels as primary evolutionary events. The algorithms are found to infer the existence of contacts very reliably, whereas the inference of directionality remains difficult. This currently limits the new algorithms to a role as exploratory tools for quickly detecting salient patterns in large lexical datasets, but it should soon be possible for the framework to be enhanced e.g. by confidence values for each directionality decision
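
The abstract above mentions "an intuitive measure of cognate overlap" without defining it. As a rough illustration only, the sketch below uses the Jaccard index over cognate-set identifiers; this choice, the function name `cognate_overlap`, and the toy data are assumptions and may differ from the measure actually used in the volume.

```python
def cognate_overlap(cognates_a, cognates_b):
    """Jaccard overlap between the cognate sets attested in two varieties.

    Each argument is an iterable of cognate-set identifiers (hypothetical
    integer IDs here). This is an assumed stand-in measure, not the one
    defined in the volume.
    """
    a, b = set(cognates_a), set(cognates_b)
    if not a and not b:
        return 0.0  # no data for either variety: treat as no overlap
    return len(a & b) / len(a | b)


# Toy example: two varieties sharing two of four cognate sets overall.
variety_x = [1, 2, 3]
variety_y = [2, 3, 4]
overlap_xy = cognate_overlap(variety_x, variety_y)
```

A symmetric score of this kind only says that two varieties share material; deciding the direction of lexical flow needs the additional criteria the abstract describes.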

    Using ancestral state reconstruction methods for onomasiological reconstruction in multilingual word lists

    Current efforts in computational historical linguistics are predominantly concerned with phylogenetic inference. Methods for ancestral state reconstruction have only been applied sporadically. In contrast to phylogenetic algorithms, automatic reconstruction methods presuppose phylogenetic information in order to explain what has evolved when and where. Here we report a pilot study exploring how well automatic methods for ancestral state reconstruction perform in the task of onomasiological reconstruction in multilingual word lists, where algorithms are used to infer how the words evolved along a given phylogeny, and reconstruct which cognate classes were used to express a given meaning in the ancestral languages. Comparing three different methods, Maximum Parsimony, Minimal Lateral Networks, and Maximum Likelihood on three different test sets (Indo-European, Austronesian, Chinese) using binary and multi-state coding of the data as well as single and sampled phylogenies, we find that Maximum Likelihood largely outperforms the other methods. At the same time, however, the general performance was disappointingly low, ranging between 0.66 (Chinese) and 0.79 (Austronesian) for the F-Scores. A closer linguistic evaluation of the reconstructions proposed by the best method and the reconstructions given in the gold standards revealed that the majority of the cases where the algorithms failed can be attributed to problems of independent semantic shift (homoplasy), to morphological processes in lexical change, and to wrong reconstructions in the independently created test sets that we employed
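
The abstract above reports F-scores but does not say how they were computed. The sketch below shows the standard precision/recall/F1 definition applied to reconstructed cognate classes. Representing a reconstruction as a set of `(meaning, cognate_class)` pairs, the function name `f_score`, and the toy data are all assumptions made for illustration, not the study's evaluation code.

```python
def f_score(predicted, gold):
    """Return (precision, recall, F1) for two collections of items.

    Items are assumed to be hashable (meaning, cognate_class) pairs for the
    ancestral language; this representation is hypothetical.
    """
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0, 0.0, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy example: the gold standard has one cognate class per meaning; the
# method proposes one wrong class for "foot" and one extra class for "eye".
gold = {("hand", 1), ("foot", 2), ("eye", 3)}
predicted = {("hand", 1), ("foot", 5), ("eye", 3), ("eye", 4)}
precision, recall, f1 = f_score(predicted, gold)
```

In this toy case two of four proposals are correct (precision 0.5) and two of three gold classes are recovered (recall 2/3), so F1 is 4/7.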

    The evolution of language: Proceedings of the Joint Conference on Language Evolution (JCoLE)
