Fit of Fossils and Mammalian Molecular Trees: Dating Inconsistencies Revisited
Divergence time estimation requires the reconciliation of two major sources
of data. These are fossil and/or biogeographic evidence that give estimates of
the absolute age of nodes (ancestors) and molecular estimates that give us
estimates of the relative ages of nodes in a molecular evolutionary tree. Both
forms of data are often best characterized as yielding continuous probability
distributions on nodes. Here, the distributions modeling older fossil
calibrations within the tree of placental (eutherian) mammals are reconsidered.
In particular, the Horse/Rhino, Human/Tarsier, Whale/Hippo, Rabbit/Pika and
Rodentia calibrations are reexamined and adjusted. Inferring the relative ages
of nodes in a phylogeny also requires the assumption of a model of evolutionary
rate change across the tree. Here, nine models of evolutionary rate change are
combined with various continuous distributions modeling fossil calibrations.
Fit of model is measured both relative to a normalized fit, which assumes that
all models fit well in the absence of multiple fossil calibrations, and also by
the linearity of their residuals. The normalized fit used attempts to track
twice the log likelihood difference from the best expected model. The results
suggest there is a very large difference in the age of the root proposed by
calibrations in Supraprimates (informally Euarchontoglires) versus
Laurasiatheria. Combining both sets of calibrations results in the penalty
function vastly increasing in all cases. These issues remain irrespective of
the model used or whether the newer calibrations are used.
New g%AIC, g%AICc, g%BIC, and Power Divergence Fit Statistics Expose Mating between Modern Humans, Neanderthals and other Archaics
The purpose of this article is to look at how information criteria, such as
AIC and BIC, relate to the g%SD fit criterion derived in Waddell et al. (2007,
2010a). The g%SD criterion measures the fit of data to model based on a
normalized weighted root mean square percentage deviation between the observed
data and model estimates of the data, with g%SD = 0 being a perfectly fitting
model. However, this criterion may not be adjusting for the number of
parameters in the model comprehensively. Thus, its relationship to more
traditional measures for maximizing useful information in a model, including
AIC and BIC, is examined. This results in an extended set of fit criteria
including g%AIC and g%BIC. Further, a broader range of asymptotically most
powerful fit criteria of the power divergence family, which includes maximum
likelihood (or minimum G^2) and minimum X^2 modeling as special cases, are used
to replace the sum of squares fit criterion within the g%SD criterion. Results
are illustrated with a set of genetic distances looking particularly at a range
of Jewish populations, plus a genomic data set that looks at how Neanderthals
and Denisovans are related to each other and modern humans. Evidence that Homo
erectus may have left a significant fraction of its genome within the Denisovan
is shown to persist with the new modeling criteria.
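The quantities named in this abstract can be sketched with their textbook definitions. The block below is a minimal illustration only: it uses the standard AIC, AICc and BIC formulas and the Cressie-Read power divergence statistic, not the g%AIC, g%AICc or g%BIC variants, whose exact form is not given in the abstract. All function names and the toy counts are assumptions made for the example.

```python
import math

def aic(log_lik, k):
    """Standard AIC for a model with k free parameters."""
    return 2 * k - 2 * log_lik

def aicc(log_lik, k, n):
    """Small-sample corrected AIC; requires n > k + 1."""
    return aic(log_lik, k) + (2 * k * (k + 1)) / (n - k - 1)

def bic(log_lik, k, n):
    """Standard BIC for n observations."""
    return k * math.log(n) - 2 * log_lik

def power_divergence(observed, expected, lam):
    """Cressie-Read power divergence statistic.

    lam = 1 gives Pearson's X^2 (when observed and expected totals agree);
    the limit lam -> 0 gives the likelihood ratio statistic G^2.
    lam = -1 is another limiting case and is not handled in this sketch.
    """
    if abs(lam) < 1e-12:
        # Limiting case lam -> 0: G^2 = 2 * sum O * ln(O / E)
        return 2 * sum(o * math.log(o / e)
                       for o, e in zip(observed, expected) if o > 0)
    return (2 / (lam * (lam + 1))) * sum(
        o * ((o / e) ** lam - 1) for o, e in zip(observed, expected))

# Toy counts (hypothetical), with equal observed and expected totals.
toy_observed = [10, 20, 30]
toy_expected = [20, 20, 20]
x2 = power_divergence(toy_observed, toy_expected, 1.0)   # Pearson X^2
g2 = power_divergence(toy_observed, toy_expected, 0.0)   # G^2
```

Varying `lam` between 0 and 1 moves continuously between the two special cases, which is the sense in which the family contains both minimum G^2 and minimum X^2 fitting.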
Combined Sum of Squares Penalties for Molecular Divergence Time Estimation
Estimates of molecular divergence times when rates of evolution vary require
the assumption of a model of rate change. Brownian motion is one such model,
and since rates cannot become negative, a log Brownian model seems appropriate.
Divergence time estimates can then be made using weighted least squares
penalties. As sequences become long, this approach effectively becomes
equivalent to penalized likelihood or Bayesian approaches. Different forms of
the least squares penalty are considered to take into account correlation due
to shared ancestors. It is shown that a scale parameter is also needed since
the sum of squares changes with the scale of time. Errors or uncertainty on
fossil calibrations may be folded in with errors due to the stochastic nature
of Brownian motion and ancestral polymorphism, giving a total sum of squares to
be minimized. Applying these methods to placental mammal data, the estimated age
of the root decreases from 125 to about 94 mybp. However, multiple fossil
calibration points and relative molecular divergence times inflate the sum of
squares more than expected. If fossil data are also bootstrapped, then the
confidence interval for the root of placental mammals varies widely from ~70 to
130 mybp. Such a wide interval suggests that more and better fossil calibration
data are needed and/or better models of rate evolution are needed and/or better
molecular data are needed. Until these issues are thoroughly investigated, it
is premature to declare either the old molecular dates frequently obtained
(e.g. > 110 mybp) or the lack of identified placental fossils in the
Cretaceous more indicative of when crown-group placental mammals evolved.
Comment: 18 pages, 3 figures, 2 tables
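The need for a scale parameter can be seen in a small sketch. The block below assumes one simple form of the penalty, squared differences of log rates between parent and child branches divided by the child branch duration, plus a squared standardized deviation for each fossil calibration. The function names, the data layout and the toy numbers are all hypothetical, and the correlation corrections discussed in the abstract are not modeled.

```python
import math

def log_brownian_penalty(branches, pairs):
    """Sum of squared log-rate changes, each divided by child duration.

    branches: name -> (substitutions per site, duration in time units)
    pairs: list of (parent branch name, child branch name)
    """
    total = 0.0
    for parent, child in pairs:
        subs_p, time_p = branches[parent]
        subs_c, time_c = branches[child]
        diff = math.log(subs_c / time_c) - math.log(subs_p / time_p)
        total += diff ** 2 / time_c
    return total

def calibration_penalty(estimates, calibrations):
    """Squared standardized deviations from fossil calibrations.

    calibrations: node name -> (calibration age, standard deviation)
    """
    return sum(((estimates[node] - age) / sd) ** 2
               for node, (age, sd) in calibrations.items())

def rescale(branches, c):
    """Multiply every branch duration by c, leaving substitutions fixed."""
    return {name: (subs, time * c) for name, (subs, time) in branches.items()}

# Toy tree fragment (hypothetical values): one parent branch, two children.
toy_branches = {"a": (0.10, 50.0), "b": (0.08, 30.0), "c": (0.03, 20.0)}
toy_pairs = [("a", "b"), ("a", "c")]

base = log_brownian_penalty(toy_branches, toy_pairs)
doubled = log_brownian_penalty(rescale(toy_branches, 2.0), toy_pairs)
total = base + calibration_penalty({"x": 60.0}, {"x": (55.0, 2.5)})
```

Under this assumed form, doubling all durations leaves the log-rate differences unchanged but halves the rate penalty, so without a scale parameter the penalty alone would favor arbitrarily large times.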
Homo denisova, Correspondence Spectral Analysis, Finite Sites Reticulate Hierarchical Coalescent Models and the Ron Jeremy Hypothesis
This article shows how to fit reticulate finite and infinite sites sequence
spectra to aligned data from five modern human genomes (San, Yoruba, French,
Han and Papuan) plus two archaic humans (Denisovan and Neanderthal), to better
infer demographic parameters. These include interbreeding between distinct
lineages. Major improvements in the fit of the sequence spectrum are made with
successively more complicated models. Findings include some evidence of
male-biased gene flow from the Denisova lineage to Papuan ancestors and possibly
even more archaic gene flow. It is unclear if there is evidence for more than
one Neanderthal interbreeding, as the evidence suggesting this largely
disappears when a finite sites model is fitted.
Comment: 43 pages, 9 figures, 9 tables
What use are Exponential Weights for flexi-Weighted Least Squares Phylogenetic Trees?
The method of flexi-Weighted Least Squares on evolutionary trees uses simple
polynomial or exponential functions of the evolutionary distance in place of
model-based variances. This has the advantage that unexpected deviations from
additivity can be modeled in a more flexible way. At present, only polynomial
weights have been used. However, a general family of exponential weights is
desirable to compare with polynomial weights and to potentially exploit recent
insights into fast least squares edge length estimation on trees. Here we describe
families of weights that are multiplicative on trees, along with measures of
fit of data to tree. It is shown that polynomial, and also multiplicative,
weights can approximate model-based variance of evolutionary distances well.
Both models are fitted to evolutionary data from yeast genomes and while the
polynomial weights model fits better, the exponential weights model can fit a
lot better than ordinary least squares. Iterated least squares is evaluated and
is seen to converge quickly and with minimal change in the fit statistics when
the data are in the range expected for the useful evolutionary distances and
simple Markov models of character change. In summary, both polynomial and
exponential weighted least squares work well and justify further investment
into developing the fastest possible algorithms for evaluating evolutionary
trees.
Comment: 16 pages, 7 figures
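The two weight families can be illustrated with a short sketch. It assumes the weighted sum of squares takes the form sum((observed - fitted)^2 / w(observed)), with w(d) = d^P for polynomial weights and w(d) = exp(P * d) for exponential weights. The function names, the parameter `p` and the toy distances are assumptions for the example, not the paper's notation.

```python
import math

def polynomial_weight(d, p):
    """Variance proxy d**p; p = 0 is ordinary least squares."""
    return d ** p

def exponential_weight(d, p):
    """Variance proxy exp(p * d); also reduces to 1 when p = 0."""
    return math.exp(p * d)

def weighted_ss(observed, fitted, weight_fn, p):
    """Weighted residual sum of squares of distances fitted on a tree."""
    return sum((o - f) ** 2 / weight_fn(o, p)
               for o, f in zip(observed, fitted))

# Toy pairwise distances (hypothetical) and their tree-fitted values.
toy_obs = [0.10, 0.25, 0.40]
toy_fit = [0.12, 0.24, 0.37]

ols = weighted_ss(toy_obs, toy_fit, polynomial_weight, 0.0)
poly = weighted_ss(toy_obs, toy_fit, polynomial_weight, 2.0)
expo = weighted_ss(toy_obs, toy_fit, exponential_weight, 2.0)

# Exponential weights factorize over additive path lengths:
# w(d1 + d2) = w(d1) * w(d2).
factorizes = math.isclose(exponential_weight(0.1 + 0.3, 2.0),
                          exponential_weight(0.1, 2.0)
                          * exponential_weight(0.3, 2.0))
```

The final check shows one reading of "multiplicative on trees": because path lengths add along edges, an exponential weight on a path is the product of the weights of its parts, which polynomial weights do not satisfy in general.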
Measuring Fit of Sequence Data to Phylogenetic Model: Gain of Power using Marginal Tests
Testing fit of data to model is fundamentally important to any science, but
publications in the field of phylogenetics rarely do this. Such analyses
discard fundamental aspects of science as prescribed by Karl Popper. Indeed,
not without cause, Popper (1978) once argued that evolutionary biology was
unscientific as its hypotheses were untestable. Here we trace developments in
assessing fit from Penny et al. (1982) to the present. We compare the general
log-likelihood ratio statistic (the G or G^2 statistic) between the evolutionary
tree model and the multinomial model with that of marginalized tests applied to
an alignment (using placental mammal coding sequence data). It is seen that the
most general test does not reject the fit of data to model (p~0.5), but the
marginalized tests do. Tests on pair-wise frequency (F) matrices strongly (p <
0.001) reject the most general phylogenetic (GTR) models commonly in use. It is
also clear (p < 0.01) that the sequences are not stationary in their nucleotide
composition. Deviations from stationarity and homogeneity seem to be unevenly
distributed amongst taxa; not necessarily those expected from examining other
regions of the genome. By marginalizing the 4^t patterns of the i.i.d. model to
observed and expected parsimony counts, that is, from constant sites, to
singletons, to parsimony informative characters of a minimum possible length,
the likelihood ratio test regains power, and it too rejects the
evolutionary model with p << 0.001. Given such behavior over relatively recent
evolutionary time, readers in general should maintain a healthy skepticism of
results, as the scale of the systematic errors in published analyses may really
be far larger than the analytical methods (e.g., bootstrap) report.
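The idea of marginalizing site patterns before testing can be sketched as follows. The block computes the G statistic on a full set of pattern counts and again after pooling patterns into a few classes. The class labels, the counts and the function names are hypothetical, and no p-values are computed, since the appropriate degrees of freedom depend on how many parameters were estimated.

```python
import math
from collections import defaultdict

def g_statistic(observed, expected):
    """Likelihood ratio statistic G = 2 * sum O * ln(O / E)."""
    return 2.0 * sum(o * math.log(o / e)
                     for o, e in zip(observed, expected) if o > 0)

def marginalize(counts, classes):
    """Pool counts that share a class label; classes are sorted by label."""
    pooled = defaultdict(float)
    for count, label in zip(counts, classes):
        pooled[label] += count
    return [pooled[label] for label in sorted(pooled)]

# Toy pattern counts (hypothetical): four site patterns in three classes.
toy_observed = [50, 30, 12, 8]
toy_expected = [45.0, 33.0, 14.0, 8.0]
toy_classes = ["constant", "constant", "singleton", "informative"]

g_full = g_statistic(toy_observed, toy_expected)
g_marginal = g_statistic(marginalize(toy_observed, toy_classes),
                         marginalize(toy_expected, toy_classes))
cells_full = len(toy_observed)
cells_marginal = len(set(toy_classes))
```

Pooling cannot increase G, but with 4^t possible patterns it reduces the number of cells, and so the degrees of freedom, by a far larger factor, which is roughly why a marginal test can detect misfit that the fully general test misses.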
Extended Distance-based Phylogenetic Analyses Applied to 3D Homo Fossil Skull Evolution
This article shows how 3D geometric morphometric data can be analyzed using
newly developed distance-based evolutionary tree inference methods, with
extensions to planar graphs. Application of these methods to 3D representations
of the skullcap (calvaria) of 13 diverse skulls in the genus Homo, ranging from
Homo erectus (ergaster) at about 1.6 mya, all the way forward to modern humans,
yields a remarkably clear phylogenetic tree. Various evolutionary hypotheses
are tested. Results of these tests include rejection of the monophyly of Homo
heidelbergensis, the Multi-Regional hypothesis, and the hypothesis that the
unusual 12,000 year old (12kya) Iwo Eleru skull represents a modern human.
Rather, by quantitative phylogenetic analyses the latter is seen to be an old
(200-400kya) lineage that probably represents a novel African species, Homo
iwoelerueensis. It diverged after the lineage leading to Neanderthals, and may
have been driven to extinction in the last 10kya by modern humans, Homo
sapiens, another African species of Homo that appeared about 100kya. Another
enigmatic skull, Qafzeh 6 from the Middle East about 90kya, appears to be a
hybrid of two thirds near, but not, anatomically modern human and one third of
an archaic lineage diverging close to classic European Neanderthals. Overall,
the tree clearly implies an accelerating rate of skullcap shape change, and by
extension, change of the underlying brain, over the last 400kya in Africa. This
acceleration may have extended right up to the origin of modern humans. Methods
of distance-based evolutionary tree inference are refined and extended, with
particular attention to diagnosing the model and achieving a better fit. This
includes power transformations of the input data which favor root Procrustes
distances.
Comment: 42 pages, 18 figures
Comparing a Menagerie of Models for Estimating Molecular Divergence Times
Estimation of molecular evolutionary divergence times requires models of rate
change. These vary with regard to the assumption of what quantity is penalized.
The possibilities considered are the rate of evolution, the log of the rate of
evolution and the inverse of the rate of evolution. These models also vary with
regard to how time affects the expected variance of rate change. Here the
alternatives are not at all, linearly with time and as the product of rate and
time. This results in a set of nine models, both random walks and Brownian
motion. A priori any of these models could be correct, yet different
researchers may well prefer, or simply use, one rather than the others. Another
variable is whether to use a scaling factor to take account of the variance of
the process of rate change being unknown and therefore avoid minimizing the
penalty function with unrealistically large times. Here the difference these
models and assumptions make on a tree of mammals, with the root fixed and with
a single internal node fixed, is measured. The similarity of models is measured
as the correlation of their time estimates and visualized with a least squares
tree. The fit of model to data is measured and Q-Q plots are shown. Comparing
model estimates with each other, the ages of clades within Laurasiatheria are
seen to vary far more across models than those within Supraprimates (informally
called Euarchontoglires). Especially problematic are the often-used fossil
calibrated nodes of horse/rhino and whale/hippo clashing with times within
Supraprimates and in particular no fossil rodent teeth older than ~60 mybp. A
scaling factor in addition to penalizing rate change is seen to yield
consistent relative time estimates irrespective of exactly where the
calibration point is placed.
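The nine models arise as a 3 x 3 grid: three choices of penalized quantity crossed with three choices of how the variance scales. A minimal sketch of that grid is given below. It assumes a squared-difference penalty between a parent and a child branch and, for the third variance option, uses the parent rate in the product of rate and time. These details, and all names, are assumptions for illustration only.

```python
import math
from itertools import product

# Quantity whose change is penalized.
TRANSFORMS = {
    "rate": lambda r: r,
    "log_rate": math.log,
    "inverse_rate": lambda r: 1.0 / r,
}

# How the expected variance of change depends on rate r and time t.
VARIANCES = {
    "constant": lambda r, t: 1.0,
    "time": lambda r, t: t,
    "rate_x_time": lambda r, t: r * t,
}

# All nine (transform, variance) combinations.
MODELS = list(product(TRANSFORMS, VARIANCES))

def penalty(r_parent, r_child, t, transform, variance):
    """Squared change in the transformed rate, scaled by its variance."""
    f = TRANSFORMS[transform]
    v = VARIANCES[variance]
    return (f(r_child) - f(r_parent)) ** 2 / v(r_parent, t)

# Penalty of one toy parent/child pair (hypothetical) under each model.
toy_penalties = {model: penalty(1.0, 2.0, 4.0, *model) for model in MODELS}
```

Tabulating `toy_penalties` shows how strongly the same rate change is penalized under each assumption, which is the kind of difference the abstract measures on the mammal tree.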
Happy New Year Homo erectus? More evidence for interbreeding with archaics predating the modern human/Neanderthal split
A range of a priori hypotheses about the evolution of modern and archaic
genomes are further evaluated and tested. In addition to the well-known
splits/introgressions involving Neanderthal genes into out-of-Africa people,
or Denisovan genes into Oceanians, a further series of archaic splits and
hypotheses proposed in Waddell et al. (2011) are considered in detail. These
include signals of Denisovans with something markedly more archaic and possibly
something more archaic into Papuans as well. These are compared and contrasted
with some well-advertised introgressions such as Denisovan genes across East
Asia, archaic genes into San or non-tree mixing between Oceanians, East Asians
and Europeans. The general result is that these less appreciated and surprising
archaic splits have just as much or more support in genome sequence data.
Further, evaluation confirms the hypothesis that archaic genes are much rarer
on modern X chromosomes, and may even be near totally absent, suggesting strong
selection against their introgression. Modeling of relative split weights
allows an inference of the proportion of the genome the Denisovan seems to have
gotten from an older archaic, and the best estimate is around 2%. Using a mix
of quantitative and qualitative morphological data and novel phylogenetic
methods, robust support is found for multiple distinct middle Pleistocene
lineages. Of these, fossil hominids such as SH5, Petralona, and Dali, in
particular, look like prime candidates for contributing pre-Neanderthal/Modern
archaic genes to Denisovans, while the Jinniu-Shan fossil looks like the best
candidate for a close relative of the Denisovan. That the Papuans might have
received some truly archaic genes appears a good possibility and they might
even be from Homo erectus.
Comment: 29 pages, 10 figures, 6 tables
Expanded Distance-based Phylogenetic Analyses of Fossil Homo Skull Shape Evolution
Analyses of a set of 47 fossil and 4 modern skulls using phylogenetic
geometric morphometric methods corroborate and refine earlier results. These
include evidence that the African Iwo Eleru skull, only about 12,000 years old,
indeed represents a new species of near human. In contrast, the earliest known
anatomically modern human skull, Qafzeh 9, the skull of Eve from
Israel/Palestine, is validated as fully modern in form. Analyses clearly show
evidence of archaic introgression into Gravettian, pre-Gravettian, Qafzeh, and
Upper Cave (China) populations of near modern humans, and in about that order
of increasing archaic content. The enigmatic Saldanha (Elandsfontein) skull
emerges as a probable first representative of that lineage which, exclusive of
Neanderthals, eventually led to modern humans. There is also evidence
that the poorly dated Kabwe (Broken Hill) skull represents a much earlier
distinct lineage. The clarity of the results bodes well for quantitative
statistical phylogenetic methods making significant inroads in the stalemates
of paleoanthropology.