
392 research outputs found

Fit of Fossils and Mammalian Molecular Trees: Dating Inconsistencies Revisited

Author: Waddell Peter J
Publication date: 30/12/2008

Divergence time estimation requires the reconciliation of two major sources of data: fossil and/or biogeographic evidence, which gives estimates of the absolute ages of nodes (ancestors), and molecular data, which give estimates of the relative ages of nodes in a molecular evolutionary tree. Both forms of data are often best characterized as yielding continuous probability distributions on nodes. Here, the distributions modeling older fossil calibrations within the tree of placental (eutherian) mammals are reconsidered. In particular, the Horse/Rhino, Human/Tarsier, Whale/Hippo, Rabbit/Pika and Rodentia calibrations are reexamined and adjusted. Inferring the relative ages of nodes in a phylogeny also requires the assumption of a model of evolutionary rate change across the tree. Here nine models of evolutionary rate change are combined with various continuous distributions modeling fossil calibrations. Fit of model is measured both relative to a normalized fit, which assumes that all models fit well in the absence of multiple fossil calibrations, and by the linearity of their residuals. The normalized fit attempts to track twice the log likelihood difference from the best expected model. The results suggest a very large difference between the age of the root implied by calibrations in Supraprimates (informally Euarchontoglires) and that implied by calibrations in Laurasiatheria. Combining both sets of calibrations greatly increases the penalty function in all cases. These issues remain irrespective of the model used or of whether the newer calibrations are used.

arXiv.org e-Print Archive
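As a rough illustration of treating a fossil calibration as a continuous distribution on a node and scoring it on a twice-log-likelihood scale, the sketch below uses an offset (shifted) lognormal density, a common soft-bound choice in dating software. The distribution family, the function names and any parameter values are assumptions for illustration only; they are not the calibration distributions fitted in the article.

```python
import math

def offset_lognormal_logpdf(age, offset, mu, sigma):
    """Log density of a lognormal shifted to start at a hard minimum age (offset).

    Assumed calibration family for illustration; mu and sigma act on log(age - offset).
    """
    x = age - offset
    if x <= 0.0:
        return -math.inf  # node cannot be younger than the minimum (fossil) age
    return (-math.log(x * sigma * math.sqrt(2.0 * math.pi))
            - (math.log(x) - mu) ** 2 / (2.0 * sigma ** 2))

def calibration_penalty(age, offset, mu, sigma):
    """-2 log density, so penalties add on the same scale as 2 x log-likelihood differences."""
    return -2.0 * offset_lognormal_logpdf(age, offset, mu, sigma)

def total_penalty(node_ages, calibrations):
    """Sum calibration penalties; calibrations maps node -> (offset, mu, sigma) (hypothetical)."""
    return sum(calibration_penalty(node_ages[node], *params)
               for node, params in calibrations.items())
```

A penalty computed this way can be added to a molecular rate-change penalty measured on the same -2 log scale, which is one way a clash between calibrations shows up as a sharply increased total.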

New g%AIC, g%AICc, g%BIC, and Power Divergence Fit Statistics Expose Mating between Modern Humans, Neanderthals and other Archaics

Author: Tan Xi
Waddell Peter J.
Publication date: 30/12/2012

The purpose of this article is to examine how information criteria, such as AIC and BIC, relate to the g%SD fit criterion derived in Waddell et al. (2007, 2010a). The g%SD criterion measures the fit of data to model based on a normalized, weighted root mean square percentage deviation between the observed data and the model estimates of the data, with g%SD = 0 indicating a perfectly fitting model. However, this criterion may not adjust comprehensively for the number of parameters in the model. Thus, its relationship to more traditional measures for maximizing useful information in a model, including AIC and BIC, is examined. This results in an extended set of fit criteria, including g%AIC and g%BIC. Further, a broader range of asymptotically most powerful fit criteria from the power divergence family, which includes maximum likelihood (or minimum G^2) and minimum X^2 modeling as special cases, is used to replace the sum of squares fit criterion within the g%SD criterion. Results are illustrated with a set of genetic distances, looking particularly at a range of Jewish populations, plus a genomic data set that looks at how Neanderthals and Denisovans are related to each other and to modern humans. Evidence that Homo erectus may have left a significant fraction of its genome within the Denisovan is shown to persist with the new modeling criteria.

arXiv.org e-Print Archive
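The power divergence family mentioned above is usually written in the Cressie-Read form 2/(λ(λ+1)) Σ O[(O/E)^λ - 1]. The sketch below shows how λ = 1 recovers Pearson's X^2 and λ → 0 recovers the likelihood-ratio G^2. It is the generic textbook statistic, not the article's g%SD, g%AIC or g%BIC code, and it assumes the observed and expected totals are equal and all expected counts are positive.

```python
import math

def power_divergence(observed, expected, lam):
    """Cressie-Read power divergence: 2 / (lam (lam + 1)) * sum O[(O/E)^lam - 1].

    lam = 1 gives Pearson's X^2, lam -> 0 gives G^2, and lam = 2/3 is the
    compromise recommended by Cressie and Read. Assumes sum(observed) ==
    sum(expected); negative lam also requires every observed count > 0.
    """
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    if abs(lam) < 1e-12:
        # Limit lam -> 0: G^2 = 2 * sum O ln(O/E); cells with O = 0 contribute 0.
        return 2.0 * sum(o * math.log(o / e)
                         for o, e in zip(observed, expected) if o > 0)
    if abs(lam + 1.0) < 1e-12:
        raise ValueError("lam = -1 (the Kullback limit) is not handled in this sketch")
    scale = 2.0 / (lam * (lam + 1.0))
    return scale * sum(o * ((o / e) ** lam - 1.0)
                       for o, e in zip(observed, expected))
```

Swapping lam changes only how large deviations in individual cells are weighted, which is why a family like this can stand in for a plain sum of squares inside a fit criterion.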

Combined Sum of Squares Penalties for Molecular Divergence Time Estimation

Author: Kalakota Prasanth
Waddell Peter J.
Publication date: 23/07/2007

Estimates of molecular divergence times when rates of evolution vary require the assumption of a model of rate change. Brownian motion is one such model, and since rates cannot become negative, a log Brownian model seems appropriate. Divergence time estimates can then be made using weighted least squares penalties. As sequences become long, this approach effectively becomes equivalent to penalized likelihood or Bayesian approaches. Different forms of the least squares penalty are considered to take into account correlation due to shared ancestors. It is shown that a scale parameter is also needed, since the sum of squares changes with the scale of time. Errors or uncertainty in fossil calibrations may be folded in with errors due to the stochastic nature of Brownian motion and ancestral polymorphism, giving a total sum of squares to be minimized. Applying these methods to placental mammal data, the estimated age of the root decreases from 125 to about 94 mybp. However, multiple fossil calibration points and relative molecular divergence times inflate the sum of squares more than expected. If fossil data are also bootstrapped, then the confidence interval for the root of placental mammals becomes very wide, from ~70 to 130 mybp. Such a wide interval suggests that more and better fossil calibration data, better models of rate evolution, and/or better molecular data are needed. Until these issues are thoroughly investigated, it is premature to declare either the old molecular dates frequently obtained (e.g., > 110 mybp) or the lack of identified placental fossils in the Cretaceous more indicative of when crown-group placental mammals evolved.
Comment: 18 pages, 3 figures, 2 tables

arXiv.org e-Print Archive
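A minimal sketch of a log Brownian weighted least squares penalty, assuming each edge has a constant rate equal to its molecular length divided by its duration, and that the squared change in log rate from a parent edge to a child edge is weighted by the inverse of the child edge's duration. These choices, the hypothetical calibration, and the toy tree are illustrative assumptions rather than the article's exact formulation, which also considers correlation due to shared ancestors.

```python
import math

def log_brownian_penalty(edges, branch_lengths, node_ages):
    """Weighted sum of squared changes in log rate between adjacent edges.

    edges: (parent, child) pairs; branch_lengths[child] is the molecular length of
    the edge above child; node_ages[node] is its age before present.
    """
    durations = {child: node_ages[parent] - node_ages[child] for parent, child in edges}
    log_rates = {child: math.log(branch_lengths[child] / durations[child])
                 for child in durations}
    penalty = 0.0
    for parent, child in edges:
        if parent in log_rates:  # skip edges hanging directly off the root
            penalty += (log_rates[child] - log_rates[parent]) ** 2 / durations[child]
    return penalty

def fossil_sum_of_squares(node_ages, fossil_ages, fossil_sds):
    """Squared calibration errors, each scaled by its assumed variance."""
    return sum(((node_ages[n] - fossil_ages[n]) / fossil_sds[n]) ** 2 for n in fossil_ages)

# Hypothetical toy tree: root R, internal node A, tips x, y, z (ages in mybp).
edges = [("R", "A"), ("R", "z"), ("A", "x"), ("A", "y")]
lengths = {"A": 0.04, "z": 0.1, "x": 0.09, "y": 0.06}
ages = {"R": 100.0, "A": 60.0, "x": 0.0, "y": 0.0, "z": 0.0}
# Total sum of squares: rate-change penalty plus a hypothetical calibration on A.
total = log_brownian_penalty(edges, lengths, ages) + fossil_sum_of_squares(
    ages, {"A": 55.0}, {"A": 5.0})
```

The toy tree also shows why a scale parameter matters: doubling every node age leaves the log rate differences unchanged but doubles every duration, so the rate-change penalty halves. An unscaled penalty can therefore be lowered simply by inflating times.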

Homo denisova, Correspondence Spectral Analysis, Finite Sites Reticulate Hierarchical Coalescent Models and the Ron Jeremy Hypothesis

Author: Ramos Jorge
Tan Xi
Waddell Peter J.
Publication date: 29/12/2011

This article shows how to fit reticulate finite and infinite sites sequence spectra to aligned data from five modern human genomes (San, Yoruba, French, Han and Papuan) plus two archaic humans (Denisovan and Neanderthal), to better infer demographic parameters, including interbreeding between distinct lineages. Major improvements in the fit of the sequence spectrum are made with successively more complicated models. Findings include some evidence of male-biased gene flow from the Denisova lineage to Papuan ancestors and possibly even more archaic gene flow. It is unclear whether there is evidence for more than one Neanderthal interbreeding, as the evidence suggesting this largely disappears when a finite sites model is fitted.
Comment: 43 pages, 9 figures, 9 tables

arXiv.org e-Print Archive
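Sequence spectra of this kind start from counts of site patterns across the aligned genomes. A minimal sketch, assuming one haploid sequence per genome, an outgroup used to polarize each site (the outgroup and the toy alignment are hypothetical; the abstract does not name one), and only clean biallelic A/C/G/T sites. It tallies which set of samples carries the non-outgroup base; fitting finite or infinite sites reticulate models to such counts is beyond this sketch.

```python
from collections import Counter

def site_pattern_spectrum(alignment, outgroup):
    """Count polarized site patterns.

    alignment: dict name -> aligned sequence string (all the same length).
    Returns a Counter mapping frozenset(samples carrying the non-outgroup base) -> count.
    """
    names = [n for n in alignment if n != outgroup]
    spectrum = Counter()
    for i, ancestral in enumerate(alignment[outgroup]):
        column = {n: alignment[n][i] for n in names}
        bases = set(column.values()) | {ancestral}
        if len(bases) != 2 or not bases <= set("ACGT"):
            continue  # keep only clean biallelic sites
        derived = frozenset(n for n, b in column.items() if b != ancestral)
        if derived:
            spectrum[derived] += 1
    return spectrum

# Hypothetical toy alignment; real analyses would use whole-genome data.
toy = {"San": "AACG", "Yoruba": "AACT", "Neanderthal": "GACT", "Chimp": "AAAT"}
spectrum = site_pattern_spectrum(toy, "Chimp")
```

A tree or reticulate model then predicts an expected count for each pattern, and the fit of the spectrum is judged by comparing those expectations with counts like these.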

What use are Exponential Weights for flexi-Weighted Least Squares Phylogenetic Trees?

Author: Khan Ishita
Tan Xi
Waddell Peter J.
Publication date: 29/12/2010

The method of flexi-Weighted Least Squares on evolutionary trees uses simple polynomial or exponential functions of the evolutionary distance in place of model-based variances. This has the advantage that unexpected deviations from additivity can be modeled more flexibly. At present, only polynomial weights have been used. However, a general family of exponential weights is desirable, both for comparison with polynomial weights and to potentially exploit recent insights into fast least squares edge length estimation on trees. Here we describe families of weights that are multiplicative on trees, along with measures of fit of data to tree. It is shown that both polynomial and multiplicative weights can approximate the model-based variance of evolutionary distances well. Both models are fitted to evolutionary data from yeast genomes, and while the polynomial weights model fits better, the exponential weights model can fit much better than ordinary least squares. Iterated least squares is evaluated and is seen to converge quickly, with minimal change in the fit statistics, when the data are in the range expected for useful evolutionary distances and simple Markov models of character change. In summary, both polynomial and exponential weighted least squares work well and justify further investment in developing the fastest possible algorithms for evaluating evolutionary trees.
Comment: 16 pages, 7 figures

arXiv.org e-Print Archive
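A minimal sketch of flexi-weighted least squares edge length estimation on a fixed quartet topology ((a,b),(c,d)). Each distance gets weight 1/v(d), where v is either a polynomial d^P or an exponential exp(k·d); the exponential form is multiplicative along a path, since exp(k(d1 + d2)) = exp(k·d1)·exp(k·d2). The values P = 2 and k = 1, the topology, and the function names are illustrative assumptions. Passing iterations > 1 gives a simple iterated least squares variant in which the weights are recomputed from the fitted distances.

```python
import numpy as np

# Path design matrix for the unrooted quartet ((a,b),(c,d)).
# Rows are pairs ab, ac, ad, bc, bd, cd; columns are edges a, b, c, d, internal.
X = np.array([
    [1, 1, 0, 0, 0],
    [1, 0, 1, 0, 1],
    [1, 0, 0, 1, 1],
    [0, 1, 1, 0, 1],
    [0, 1, 0, 1, 1],
    [0, 0, 1, 1, 0],
], dtype=float)

def polynomial_variance(d, power=2.0):
    return d ** power

def exponential_variance(d, k=1.0):
    return np.exp(k * d)  # multiplicative over additive path lengths

def flexi_wls(distances, variance_fn, iterations=1):
    """Weighted least squares edge lengths and weighted residual sum of squares.

    iterations must be >= 1; later passes take weights from the fitted distances.
    """
    distances = np.asarray(distances, dtype=float)
    target = distances  # distances used to compute the variances
    for _ in range(iterations):
        weights = 1.0 / variance_fn(target)
        W = np.diag(weights)
        edges = np.linalg.solve(X.T @ W @ X, X.T @ W @ distances)
        target = X @ edges
    residuals = distances - X @ edges
    wss = float(np.sum(weights * residuals ** 2))
    return edges, wss
```

With perfectly additive distances every weighting recovers the true edge lengths; the choice of weights only matters once the data deviate from additivity, which is where the weighted sum of squares becomes a fit statistic.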

Measuring Fit of Sequence Data to Phylogenetic Model: Gain of Power using Marginal Tests

Author: Ota Rissa
Penny David
Waddell Peter J.
Publication date: 30/12/2008

Testing the fit of data to model is fundamentally important to any science, but publications in the field of phylogenetics rarely do this. Analyses that omit such tests discard fundamental aspects of science as prescribed by Karl Popper. Indeed, not without cause, Popper (1978) once argued that evolutionary biology was unscientific as its hypotheses were untestable. Here we trace developments in assessing fit from Penny et al. (1982) to the present. We compare the general log-likelihood ratio statistic (the G or G^2 statistic) between the evolutionary tree model and the multinomial model with marginalized tests applied to an alignment (using placental mammal coding sequence data). The most general test does not reject the fit of data to model (p ~ 0.5), but the marginalized tests do. Tests on pairwise frequency (F) matrices strongly (p < 0.001) reject the most general phylogenetic (GTR) models commonly in use. It is also clear (p < 0.01) that the sequences are not stationary in their nucleotide composition. Deviations from stationarity and homogeneity seem to be unevenly distributed amongst taxa, and not necessarily in the taxa expected from examining other regions of the genome. By marginalizing the 4^t patterns of the i.i.d. model (for t taxa) to observed and expected parsimony counts, that is, from constant sites, to singletons, to parsimony informative characters of a minimum possible length, the likelihood ratio test regains power, and it too rejects the evolutionary model with p << 0.001. Given such behavior over relatively recent evolutionary time, readers in general should maintain a healthy skepticism of results, as the scale of the systematic errors in published analyses may really be far larger than the analytical methods (e.g., bootstrap) report.

arXiv.org e-Print Archive
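One standard way to test a pairwise F matrix for the symmetry expected under stationary, reversible models such as GTR is Bowker's matched-pairs test of symmetry. The abstract does not say which marginal test statistics were used, so treat this as an illustrative stand-in: the sketch builds the 4x4 F matrix for two aligned sequences and returns Bowker's statistic with its degrees of freedom, to be compared against a chi-square distribution.

```python
BASES = "ACGT"

def pairwise_f_matrix(seq1, seq2):
    """4x4 counts F[i][j] of sites with base i in seq1 and base j in seq2.

    Gaps and ambiguity codes are skipped.
    """
    index = {b: i for i, b in enumerate(BASES)}
    f = [[0] * 4 for _ in range(4)]
    for x, y in zip(seq1.upper(), seq2.upper()):
        if x in index and y in index:
            f[index[x]][index[y]] += 1
    return f

def bowker_symmetry(f):
    """Bowker's statistic sum_{i<j} (F_ij - F_ji)^2 / (F_ij + F_ji) and its df.

    Pairs with F_ij + F_ji = 0 carry no information and are left out of the df.
    """
    stat, df = 0.0, 0
    for i in range(4):
        for j in range(i + 1, 4):
            total = f[i][j] + f[j][i]
            if total > 0:
                stat += (f[i][j] - f[j][i]) ** 2 / total
                df += 1
    return stat, df
```

A large statistic relative to its chi-square reference indicates that substitutions between the two sequences are directional, which a stationary, reversible model cannot produce.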

Extended Distance-based Phylogenetic Analyses Applied to 3D Homo Fossil Skull Evolution

Author: Waddell Peter J.
Publication date: 30/12/2014

This article shows how 3D geometric morphometric data can be analyzed using newly developed distance-based evolutionary tree inference methods, with extensions to planar graphs. Application of these methods to 3D representations of the skullcap (calvaria) of 13 diverse skulls in the genus Homo, ranging from Homo erectus (ergaster) at about 1.6 mya all the way forward to modern humans, yields a remarkably clear phylogenetic tree. Various evolutionary hypotheses are tested. Results of these tests include rejection of the monophyly of Homo heidelbergensis, of the Multi-Regional hypothesis, and of the hypothesis that the unusual 12,000 year old (12kya) Iwo Eleru skull represents a modern human. Rather, quantitative phylogenetic analyses show the latter to be an old (200-400kya) lineage that probably represents a novel African species, Homo iwoelerueensis. It diverged after the lineage leading to Neanderthals, and may have been driven to extinction in the last 10kya by modern humans, Homo sapiens, another African species of Homo that appeared about 100kya. Another enigmatic skull, Qafzeh 6 from the Middle East at about 90kya, appears to be a hybrid: two thirds from a near-modern, but not anatomically modern, human lineage and one third from an archaic lineage diverging close to classic European Neanderthals. Overall, the tree clearly implies an accelerating rate of skullcap shape change, and by extension of change in the underlying brain, over the last 400kya in Africa. This acceleration may have extended right up to the origin of modern humans. Methods of distance-based evolutionary tree inference are refined and extended, with particular attention to diagnosing the model and achieving a better fit. This includes power transformations of the input data, which favor root Procrustes distances.
Comment: 42 pages, 18 figures

arXiv.org e-Print Archive
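As an illustration of the distances involved, the sketch below computes a partial Procrustes distance between two landmark configurations (centered, scaled to unit centroid size, then optimally rotated with the Kabsch/SVD solution) and applies a power transformation to it. Reading "root Procrustes distance" as the square root of this distance (power 0.5) is an assumption, as are the function names; the article's own preprocessing of the 3D calvaria data may differ.

```python
import numpy as np

def procrustes_distance(a, b, allow_reflection=False):
    """Partial Procrustes distance between two (n_landmarks, n_dims) configurations."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    a = a - a.mean(axis=0)           # remove location
    b = b - b.mean(axis=0)
    a /= np.linalg.norm(a)           # unit centroid size removes scale
    b /= np.linalg.norm(b)
    u, _, vt = np.linalg.svd(a.T @ b)
    d = np.eye(a.shape[1])
    if not allow_reflection and np.linalg.det(u @ vt) < 0:
        d[-1, -1] = -1.0             # force a proper rotation
    rotation = u @ d @ vt            # minimizes ||a @ R - b|| (Kabsch)
    return float(np.linalg.norm(a @ rotation - b))

def root_procrustes_distance(a, b, power=0.5):
    """Power-transformed Procrustes distance; power = 0.5 is the assumed 'root' form."""
    return procrustes_distance(a, b) ** power
```

A matrix of such transformed distances between all pairs of skulls is the kind of input a distance-based tree method would then fit.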

Comparing a Menagerie of Models for Estimating Molecular Divergence Times

Author: Waddell Peter J
Publication date: 28/12/2007

Estimation of molecular evolutionary divergence times requires models of rate change. These vary with regard to what quantity is penalized. The possibilities considered are the rate of evolution, the log of the rate of evolution, and the inverse of the rate of evolution. These models also vary with regard to how time affects the expected variance of rate change. Here the alternatives are: not at all, linearly with time, and as the product of rate and time. This results in a set of nine models, including both random walks and Brownian motion. A priori any of these models could be correct, yet different researchers may well prefer, or simply use, one rather than the others. Another variable is whether to use a scaling factor to take account of the variance of the process of rate change being unknown, and therefore to avoid minimizing the penalty function with unrealistically large times. Here the difference these models and assumptions make to estimated times on a tree of mammals is measured, both with the root fixed and with a single internal node fixed. The similarity of models is measured as the correlation of their time estimates and visualized with a least squares tree. The fit of model to data is measured and Q-Q plots are shown. Comparing model estimates with each other, the ages of clades within Laurasiatheria are seen to vary far more across models than those within Supraprimates (informally called Euarchontoglires). Especially problematic are the often-used fossil-calibrated horse/rhino and whale/hippo nodes, which clash with times within Supraprimates, and in particular with the absence of fossil rodent teeth older than ~60 mybp. A scaling factor in addition to penalizing rate change is seen to yield consistent relative time estimates irrespective of exactly where the calibration point is placed.

arXiv.org e-Print Archive
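The nine models described above can be enumerated as three penalized quantities crossed with three variance assumptions. The sketch below is an illustration under stated assumptions: the penalty for a single parent-to-child rate change is taken as the squared change in the transformed rate divided by the variance term, and the rate-times-time variance uses the child edge's rate, so it is proportional to the child's expected branch length. These conventions and all names are hypothetical rather than the article's exact definitions, and the scaling factor is omitted.

```python
import math
from itertools import product

# Quantity whose change is penalized.
TRANSFORMS = {
    "rate": lambda r: r,
    "log_rate": math.log,
    "inverse_rate": lambda r: 1.0 / r,
}

# How the expected variance of rate change depends on time (assumed forms).
VARIANCES = {
    "constant": lambda r, t: 1.0,
    "time": lambda r, t: t,
    "rate_times_time": lambda r, t: r * t,
}

MODELS = list(product(TRANSFORMS, VARIANCES))  # 3 x 3 = 9 models

def penalty_term(model, parent_rate, child_rate, duration):
    """Squared change in the transformed rate, divided by the model's variance term."""
    transform_name, variance_name = model
    f = TRANSFORMS[transform_name]
    v = VARIANCES[variance_name](child_rate, duration)
    return (f(child_rate) - f(parent_rate)) ** 2 / v

# Compare how each model scores the same hypothetical doubling of rate over 10 time units.
scores = {model: penalty_term(model, 1.0, 2.0, 10.0) for model in MODELS}
```

Because the same rate change is scored very differently across models, summing such terms over a tree can shift which node ages minimize the total, which is the kind of model dependence the comparison above measures.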

Happy New Year Homo erectus? More evidence for interbreeding with archaics predating the modern human/Neanderthal split

Author: Waddell Peter J.
Publication date: 30/12/2013

A range of a priori hypotheses about the evolution of modern and archaic genomes are further evaluated and tested. In addition to the well-known splits/introgressions involving Neanderthal genes into out-of-Africa people, or Denisovan genes into Oceanians, a further series of archaic splits and hypotheses proposed in Waddell et al. (2011) is considered in detail. These include signals linking Denisovans with something markedly more archaic, and possibly of something more archaic entering Papuans as well. These are compared and contrasted with some well-advertised introgressions, such as Denisovan genes across East Asia, archaic genes into San, or non-tree mixing between Oceanians, East Asians and Europeans. The general result is that these less appreciated and surprising archaic splits have just as much or more support in genome sequence data. Further, the evaluation confirms the hypothesis that archaic genes are much rarer on modern X chromosomes, and may even be almost totally absent, suggesting strong selection against their introgression. Modeling of relative split weights allows an inference of the proportion of the genome the Denisovan seems to have obtained from an older archaic, and the best estimate is around 2%. Using a mix of quantitative and qualitative morphological data and novel phylogenetic methods, robust support is found for multiple distinct middle Pleistocene lineages. Of these, fossil hominids such as SH5, Petralona, and Dali in particular look like prime candidates for contributing pre-Neanderthal/modern archaic genes to Denisovans, while the Jinniu-Shan fossil looks like the best candidate for a close relative of the Denisovan. That the Papuans might have received some truly archaic genes appears a good possibility, and they might even be from Homo erectus.
Comment: 29 pages, 10 figures, 6 tables

arXiv.org e-Print Archive

Expanded Distance-based Phylogenetic Analyses of Fossil Homo Skull Shape Evolution

Author: Waddell Peter J.
Publication date: 30/12/2015

Analyses of a set of 47 fossil and 4 modern skulls using phylogenetic geometric morphometric methods corroborate and refine earlier results. These include evidence that the African Iwo Eleru skull, only about 12,000 years old, indeed represents a new species of near human. In contrast, the earliest known anatomically modern human skull, Qafzeh 9, the skull of Eve from Israel/Palestine, is validated as fully modern in form. Analyses clearly show evidence of archaic introgression into Gravettian, pre-Gravettian, Qafzeh, and Upper Cave (China) populations of near modern humans, in about that order of increasing archaic content. The enigmatic Saldanha (Elandsfontein) skull emerges as a probable first representative of the lineage that, exclusive of Neanderthals, eventually led to modern humans. There is also evidence that the poorly dated Kabwe (Broken Hill) skull represents a much earlier distinct lineage. The clarity of the results bodes well for quantitative statistical phylogenetic methods making significant inroads into the stalemates of paleoanthropology.

arXiv.org e-Print Archive