Lexical evolution rates by automated stability measure
Phylogenetic trees can be reconstructed from the matrix which contains the
distances between all pairs of languages in a family. Recently, we proposed a
new method which uses normalized Levenshtein distances between words with the
same meaning and averages over all the items of a given list. Decisions about
the number of items in the input lists for language comparison have been
debated since the beginning of glottochronology. The point is that words
associated with some of the meanings have a rapid lexical evolution. Therefore,
a large vocabulary comparison is only apparently more accurate than a smaller
one, since many of the words do not carry any useful information. In principle,
one should find the optimal length of the input lists by studying the stability
of the different items. In this paper we tackle the problem with an automated
methodology based only on our normalized Levenshtein distance. With this
approach, the program of an automated reconstruction of language relationships
is completed.
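The distance described in this abstract can be illustrated with a minimal sketch. It assumes that normalization means dividing the edit distance by the length of the longer word, and that the two word lists are already aligned by meaning; both are assumptions made for illustration, and the function names are hypothetical rather than taken from the paper.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def normalized_distance(a: str, b: str) -> float:
    """Edit distance divided by the longer word's length (assumed normalization)."""
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0


def language_distance(words_a: list[str], words_b: list[str]) -> float:
    """Average normalized distance over meaning-aligned word lists."""
    if len(words_a) != len(words_b) or not words_a:
        raise ValueError("word lists must be non-empty and of equal length")
    pairs = zip(words_a, words_b)
    return sum(normalized_distance(x, y) for x, y in pairs) / len(words_a)
```

Computing `language_distance` for every pair of languages in a family would fill the distance matrix from which a tree is then reconstructed.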
Assimilation in Multilingual Cities
We characterise how the assimilation patterns of minorities into the strong and the weak language differ in a situation of asymmetric bilingualism. Using large variations in language composition in Canadian cities from the 2001 and 2006 Censuses, we show that the differences in the knowledge of English by immigrant allophones (i.e. the immigrants with a mother tongue other than English and French) in English-majority cities are mainly due to sorting across cities. In French-majority cities, by contrast, learning plays an important role in explaining differences in knowledge of French. In addition, the presence of large anglophone minorities deters assimilation into French much more than the presence of francophone minorities deters assimilation into English. Finally, we find that language distance plays a much more important role in explaining assimilation into French, and that assimilation into French is much more sensitive to individual characteristics than assimilation into English. Some of these asymmetric assimilation patterns extend to anglophone and francophone immigrants, but no evidence of learning is found in this case.
QAPgrid: A Two Level QAP-Based Approach for Large-Scale Data Analysis and Visualization
Background: The visualization of large volumes of data is a computationally challenging task that often promises rewarding new insights. There is great potential in the application of new algorithms and models from combinatorial optimisation. Datasets often contain "hidden regularities" and a combined identification and visualization method should reveal these structures and present them in a way that helps analysis. While several methodologies exist, including those that use non-linear optimization algorithms, severe limitations exist even when working with only a few hundred objects. Methodology/Principal Findings: We present a new data visualization approach (QAPgrid) that reveals patterns of similarities and differences in large datasets of objects for which a similarity measure can be computed. Objects are assigned to positions on an underlying square grid in a two-dimensional space. We use the Quadratic Assignment Problem (QAP) as a mathematical model to provide an objective function for assignment of objects to positions on the grid. We employ a Memetic Algorithm (a powerful metaheuristic) to tackle the large instances of this NP-hard combinatorial optimization problem, and we show its performance on the visualization of real data sets. Conclusions/Significance: Overall, the results show that the QAPgrid algorithm is able to produce a layout that represents the relationships between objects in the data set. Furthermore, it also represents the relationships between clusters that are fed into the algorithm. We apply QAPgrid to the 84 Indo-European languages instance, producing a near-optimal layout. Next, we produce a layout of 470 world universities with an observed high degree of correlation with the score used by the Academic Ranking of World Universities compiled by Shanghai Jiao Tong University, without the need of an ad hoc weighting of attributes. Finally, our Gene Ontology-based study on Saccharomyces cerevisiae fully demonstrates the scalability and precision of our method as a novel alternative tool for functional genomics.
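A toy sketch may help make concrete the idea of using a QAP objective to place objects on a grid. It assumes the cost is the sum, over all ordered pairs of objects, of their similarity multiplied by the Manhattan distance between their grid cells, so that similar objects are pulled together. The abstract does not state the exact objective or distance, so both are assumptions, and the function names are hypothetical. The exhaustive search below is a stand-in for the paper's Memetic Algorithm and is only feasible for a handful of objects.

```python
from itertools import permutations


def grid_positions(side: int) -> list[tuple[int, int]]:
    """All (row, column) cells of a side x side square grid."""
    return [(r, c) for r in range(side) for c in range(side)]


def qap_cost(similarity, assignment, positions) -> float:
    """Sum of similarity[i][j] * Manhattan distance between assigned cells.

    assignment[i] is the index into positions of the cell holding object i.
    """
    total = 0.0
    n = len(assignment)
    for i in range(n):
        r1, c1 = positions[assignment[i]]
        for j in range(n):
            r2, c2 = positions[assignment[j]]
            total += similarity[i][j] * (abs(r1 - r2) + abs(c1 - c2))
    return total


def brute_force_layout(similarity, side: int):
    """Try every placement of the objects on the grid (tiny instances only)."""
    positions = grid_positions(side)
    n = len(similarity)
    return min(permutations(range(len(positions)), n),
               key=lambda p: qap_cost(similarity, p, positions))


# Objects 0 and 1 are similar, as are 2 and 3; everything else is unrelated.
similarity = [
    [0, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 1, 0],
]
best = brute_force_layout(similarity, side=2)
```

On this 2 x 2 example the best layout places each similar pair in adjacent cells. Since the number of placements grows factorially with the number of objects, instances with hundreds of objects call for a metaheuristic such as the one the abstract describes.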
Dated ancestral trees from binary trait data and their application to the diversification of languages
Binary trait data record the presence or absence of distinguishing traits in individuals. We treat the problem of estimating ancestral trees with time depth from binary trait data. Simple analysis of such data is problematic. Each homology class of traits has a unique birth event on the tree, and the birth event of a trait that is visible at the leaves is biased towards the leaves. We propose a model-based analysis of such data and present a Markov chain Monte Carlo algorithm that can sample from the resulting posterior distribution. Our model is based on using a birth-death process for the evolution of the elements of sets of traits. Our analysis correctly accounts for the removal of singleton traits, which are commonly discarded in real data sets. We illustrate Bayesian inference for two binary trait data sets which arise in historical linguistics. The Bayesian approach allows for the incorporation of information from ancestral languages. The marginal prior distribution of the root time is uniform. We present a thorough analysis of the robustness of our results to model misspecification, through analysis of predictive distributions for external data, and fitting data that are simulated under alternative observation models. The reconstructed ages of tree nodes are relatively robust, whereas posterior probabilities for topology are not reliable. Copyright (c) 2008 Royal Statistical Society.
Frequency of word-use predicts rates of lexical evolution throughout Indo-European history
Greek speakers say 'ουρά', Germans 'schwanz', and the French 'queue' to describe what English speakers call a 'tail', but all of these languages use a related form of 'two' to describe the number after one. Among over one hundred Indo-European languages and dialects, the words for some meanings, such as 'tail', evolve rapidly, being expressed across languages by dozens of unrelated words, whilst others evolve much more slowly, such as the number 'two' for which all Indo-European language speakers use the same related word-form. No general linguistic mechanism has been advanced to explain this striking variation in rates of lexical replacement among meanings. Here we use four large and divergent language corpora (English, Spanish, Russian and Greek) and a comparative database of 200 fundamental vocabulary meanings in 87 Indo-European languages to show that the frequency with which these words are used in modern language predicts their rate of replacement over thousands of years of Indo-European language evolution. Across all 200 meanings, frequently used words evolve at slower rates and infrequently used words evolve more rapidly. This relationship holds separately and identically across parts of speech for each of the four language corpora and accounts for approximately 50% of the variation observed in historical rates of lexical replacement. We propose that the frequency with which specific words are used in everyday language exerts a general and law-like influence on their rates of evolution. Our findings are consistent with social models of word change that emphasise the role of selection, and suggest that owing to the ways that humans use language, some words will evolve slowly and others rapidly across all languages.
Citation: Pagel, M., Atkinson, Q. D. & Meade, A. (2007). 'Frequency of word use predicts rates of lexical evolution throughout Indo-European history', Nature, 449, 717-720. [Available at http://www.nature.com/nature/index.html].
N.B. Dr Atkinson is now based at the Institute of Cognitive and Evolutionary Anthropology, University of Oxford.