40 research outputs found

    Lexical evolution rates by automated stability measure

    Full text link
    Phylogenetic trees can be reconstructed from the matrix which contains the distances between all pairs of languages in a family. Recently, we proposed a new method which uses normalized Levenshtein distances among words with same meaning and averages on all the items of a given list. Decisions about the number of items in the input lists for language comparison have been debated since the beginning of glottochronology. The point is that words associated to some of the meanings have a rapid lexical evolution. Therefore, a large vocabulary comparison is only apparently more accurate then a smaller one since many of the words do not carry any useful information. In principle, one should find the optimal length of the input lists studying the stability of the different items. In this paper we tackle the problem with an automated methodology only based on our normalized Levenshtein distance. With this approach, the program of an automated reconstruction of languages relationships is completed

    Assimilation in Multilingual Cities

    Get PDF
    We characterise how the assimilation patterns of minorities into the strong and the weak language differ in a situation of asymmetric bilingualism. Using large variations in language composition in Canadian cities from the 2001 and 2006 Censuses, we show that the differences in the knowledge of English by immigrant allophones (i.e. the immigrants with a mother tongue other than English and French) in English-majority cities are mainly due to sorting across cities. Instead, in French-majority cities, learning plays an important role in explaining differences in knowledge of French. In addition, the presence of large anglophone minorities deters much more the assimilation into French than the presence of francophone minorities deters the assimilation into English. Finally, we find that language distance plays a much more important role in explaining assimilation into French, and that assimilation into French is much more sensitive to individual characteristics than assimilation into English. Some of these asymmetric assimilation patterns extend to anglophone and francophone immigrants, but no evidence of learning is found in this case

    QAPgrid: A Two Level QAP-Based Approach for Large-Scale Data Analysis and Visualization

    Get PDF
    Background: The visualization of large volumes of data is a computationally challenging task that often promises rewarding new insights. There is great potential in the application of new algorithms and models from combinatorial optimisation. Datasets often contain “hidden regularities” and a combined identification and visualization method should reveal these structures and present them in a way that helps analysis. While several methodologies exist, including those that use non-linear optimization algorithms, severe limitations exist even when working with only a few hundred objects. Methodology/Principal Findings: We present a new data visualization approach (QAPgrid) that reveals patterns of similarities and differences in large datasets of objects for which a similarity measure can be computed. Objects are assigned to positions on an underlying square grid in a two-dimensional space. We use the Quadratic Assignment Problem (QAP) as a mathematical model to provide an objective function for assignment of objects to positions on the grid. We employ a Memetic Algorithm (a powerful metaheuristic) to tackle the large instances of this NP-hard combinatorial optimization problem, and we show its performance on the visualization of real data sets. Conclusions/Significance: Overall, the results show that QAPgrid algorithm is able to produce a layout that represents the relationships between objects in the data set. Furthermore, it also represents the relationships between clusters that are feed into the algorithm. We apply the QAPgrid on the 84 Indo-European languages instance, producing a near-optimal layout. Next, we produce a layout of 470 world universities with an observed high degree of correlation with the score used by the Academic Ranking of World Universities compiled in the The Shanghai Jiao Tong University Academic Ranking of World Universities without the need of an ad hoc weighting of attributes. Finally, our Gene Ontology-based study on Saccharomyces cerevisiae fully demonstrates the scalability and precision of our method as a novel alternative tool for functional genomics

    Dated ancestral trees from binary trait data and their application to the diversification of languages

    No full text
    Binary trait data record the presence or absence of distinguishing traits in individuals. We treat the problem of estimating ancestral trees with time depth from binary trait data. Simple analysis of such data is problematic. Each homology class of traits has a unique birth event on the tree, and the birth event of a trait that is visible at the leaves is biased towards the leaves. We propose a model-based analysis of such data and present a Markov chain Monte Carlo algorithm that can sample from the resulting posterior distribution. Our model is based on using a birth-death process for the evolution of the elements of sets of traits. Our analysis correctly accounts for the removal of singleton traits, which are commonly discarded in real data sets. We illustrate Bayesian inference for two binary trait data sets which arise in historical linguistics. The Bayesian approach allows for the incorporation of information from ancestral languages. The marginal prior distribution of the root time is uniform. We present a thorough analysis of the robustness of our results to model misspecification, through analysis of predictive distributions for external data, and fitting data that are simulated under alternative observation models. The reconstructed ages of tree nodes are relatively robust, whereas posterior probabilities for topology are not reliable. Copyright (c) 2008 Royal Statistical Society.
    corecore