Evaluation of phylogenetic reconstruction methods using bacterial whole genomes: a simulation based study
Background: Phylogenetic reconstruction is a necessary first step in many analyses which use whole genome sequence data from bacterial populations. There are many available methods to infer phylogenies, and these have various advantages and disadvantages, but few unbiased comparisons of the range of approaches have been made. Methods: We simulated data from a defined "true tree" using a realistic evolutionary model. We built phylogenies from this data using a range of methods, and compared reconstructed trees to the true tree using two measures, noting the computational time needed for different phylogenetic reconstructions. We also used real data from Streptococcus pneumoniae alignments to compare individual core gene trees to a core genome tree. Results: We found that, as expected, maximum likelihood trees from good quality alignments were the most accurate, but also the most computationally intensive. Using less accurate phylogenetic reconstruction methods, we were able to obtain results of comparable accuracy; we found that approximate results can rapidly be obtained using genetic distance based methods. In real data we found that highly conserved core genes, such as those involved in translation, gave an inaccurate tree topology, whereas genes involved in recombination events gave inaccurate branch lengths. We also show a tree-of-trees, relating the results of different phylogenetic reconstructions to each other. Conclusions: We recommend three approaches, depending on requirements for accuracy and computational time. Quicker approaches that do not perform full maximum likelihood optimisation may be useful for many analyses requiring a phylogeny, as generating a high quality input alignment is likely to be the major limiting factor of accurate tree topology. We have publicly released our simulated data and code to enable further comparisons
Inferring phylogenetic networks with maximum pseudolikelihood under incomplete lineage sorting
Phylogenetic networks are necessary to represent the tree of life expanded by
edges to represent events such as horizontal gene transfers, hybridizations or
gene flow. Not all species follow the paradigm of vertical inheritance of their
genetic material. While a great deal of research has flourished into the
inference of phylogenetic trees, statistical methods to infer phylogenetic
networks are still limited and under development. The main disadvantage of
existing methods is a lack of scalability. Here, we present a statistical
method to infer phylogenetic networks from multi-locus genetic data in a
pseudolikelihood framework. Our model accounts for incomplete lineage sorting
through the coalescent model, and for horizontal inheritance of genes through
reticulation nodes in the network. Computation of the pseudolikelihood is fast
and simple, and it avoids the burdensome calculation of the full likelihood
which can be intractable with many species. Moreover, estimation at the
quartet-level has the added computational benefit that it is easily
parallelizable. Simulation studies comparing our method to a full likelihood
approach show that our pseudolikelihood approach is much faster without
compromising accuracy. We applied our method to reconstruct the evolutionary
relationships among swordtails and platyfishes (Poeciliidae),
a group characterized by widespread hybridizations
Using neutral cline decay to estimate contemporary dispersal: a generic tool and its application to a major crop pathogen
Dispersal is a key parameter of adaptation, invasion and persistence. Yet standard population genetics inference methods hardly distinguish it from drift, and many species cannot be studied by direct mark-recapture methods. Here, we introduce a method using rates of change in cline shapes for neutral markers to estimate contemporary dispersal. We apply it to the devastating banana pest Mycosphaerella fijiensis, a wind-dispersed fungus for which a secondary contact zone had previously been detected using landscape genetics tools. By tracking the spatio-temporal frequency change of 15 microsatellite markers, we find that σ, the standard deviation of parent–offspring dispersal distances, is 1.2 km/generation^(1/2). The analysis is further shown to be robust to a large range of dispersal kernels. We conclude that combining landscape genetics approaches to detect breaks in allelic frequencies with analyses of changes in neutral genetic clines offers a powerful way to obtain ecologically relevant estimates of dispersal in many species
Integrating genealogical and dynamical modelling to infer escape and reversion rates in HIV epitopes
The rates of escape and reversion in response to selection pressure arising
from the host immune system, notably the cytotoxic T-lymphocyte (CTL) response,
are key factors determining the evolution of HIV. Existing methods for
estimating these parameters from cross-sectional population data using ordinary
differential equations (ODE) ignore information about the genealogy of sampled
HIV sequences, which has the potential to cause systematic bias and
over-estimate certainty. Here, we describe an integrated approach, validated
through extensive simulations, which combines genealogical inference and
epidemiological modelling, to estimate rates of CTL escape and reversion in HIV
epitopes. We show that there is substantial uncertainty about rates of viral
escape and reversion from cross-sectional data, which arises from the inherent
stochasticity in the evolutionary process. By application to empirical data, we
find that point estimates of rates from a previously published ODE model and
the integrated approach presented here are often similar, but can also differ
several-fold depending on the structure of the genealogy. The model-based
approach we apply provides a framework for the statistical analysis of escape
and reversion in population data and highlights the need for longitudinal and
denser cross-sectional sampling to enable accurate estimation of these key
parameters
Increasing the density of available Pareto optimal solutions
The set of available multi-objective optimization algorithms continues to grow. This fact can be partially attributed to their widespread use and applicability. However, this increase also suggests that several issues remain to be addressed satisfactorily. One such issue is the diversity and the number of solutions available to the decision maker (DM). Even for algorithms very well suited to a particular problem, it is difficult, mainly due to the computational cost, to use a population large enough to ensure a high likelihood of obtaining a solution close to the DM's preferences. In this paper we present a novel methodology that produces additional Pareto optimal solutions from a Pareto optimal set obtained at the end of a run of any multi-objective optimization algorithm. This method, which we refer to as Pareto estimation, is tested against a set of 2- and 3-objective test problems and a 3-objective portfolio optimization problem to illustrate its utility for a real-world problem
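To make the term "Pareto optimal set" in the abstract above concrete, here is a minimal sketch of how a non-dominated set can be extracted from a list of candidate solutions. It assumes every objective is minimized, and the names `dominates` and `pareto_front` are illustrative only. This is not the Pareto estimation method the abstract describes; it only shows the kind of set that method would take as input.

```python
from typing import List, Sequence, Tuple


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if `a` Pareto-dominates `b` (all objectives minimized).

    `a` dominates `b` when it is no worse in every objective and
    strictly better in at least one.
    """
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(points: List[Tuple[float, ...]]) -> List[Tuple[float, ...]]:
    """Keep only the points that no other point dominates (O(n^2) scan)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


if __name__ == "__main__":
    # Hypothetical 2-objective values, purely for illustration.
    candidates = [(1, 5), (2, 3), (4, 1), (3, 4), (5, 5)]
    print(pareto_front(candidates))
```

The quadratic scan is adequate for the small populations typical of an illustration; larger sets would normally call for a sorting-based approach.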
Inference of population splits and mixtures from genome-wide allele frequency data
Many aspects of the historical relationships between populations in a species
are reflected in genetic data. Inferring these relationships from genetic data,
however, remains a challenging task. In this paper, we present a statistical
model for inferring the patterns of population splits and mixtures in multiple
populations. In this model, the sampled populations in a species are related to
their common ancestor through a graph of ancestral populations. Using
genome-wide allele frequency data and a Gaussian approximation to genetic
drift, we infer the structure of this graph. We applied this method to a set of
55 human populations and a set of 82 dog breeds and wild canids. In both
species, we show that a simple bifurcating tree does not fully describe the
data; in contrast, we infer many migration events. While some of the migration
events that we find have been detected previously, many have not. For example,
in the human data we infer that Cambodians trace approximately 16% of their
ancestry to a population ancestral to other extant East Asian populations. In
the dog data, we infer that both the boxer and basenji trace a considerable
fraction of their ancestry (9% and 25%, respectively) to wolves subsequent to
domestication, and that East Asian toy breeds (the Shih Tzu and the Pekingese)
result from admixture between modern toy breeds and "ancient" Asian breeds.
Software implementing the model described here, called TreeMix, is available at
http://treemix.googlecode.com
Comment: 28 pages, 6 figures in main text. Attached supplement is 22 pages, 15
figures. This is an updated version of the preprint available at
http://precedings.nature.com/documents/6956/version/
Discovery of an M4 Spectroscopic Binary in Upper Scorpius: A Calibration Point for Young Low-Mass Evolutionary Models
We report the discovery of a new low-mass spectroscopic (SB2) stellar binary
system in the star-forming region of Upper Scorpius. This object, UScoCTIO5,
was discovered by Ardila (2000), who assigned it a spectral class of M4. A
KeckI HIRES spectrum revealed it to be double-lined, and we then carried out a
program at several observatories to determine its orbit. The orbital period is
34 days, and the eccentricity is nearly 0.3. The importance of such a discovery
is that it can be used to help calibrate evolutionary models at low masses and
young ages. This is one of the outstanding problems in the study of formation
mechanisms and initial mass functions at low masses. The orbit allows us to
place a lower limit of 0.64 +- 0.02 M_sol on the total system mass. The
components appear to be of almost equal mass. We are able to show that this
mass is significantly higher than predicted by evolutionary models for an
object of this luminosity and age, in agreement with other recent results. More
precise determination of the temperature and surface gravity of the components
would be helpful in further solidifying this conclusion.
Comment: 17 pages, 4 figures, accepted for publication in Ap
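As background on why a double-lined orbit such as the one in the abstract above yields only a lower limit on the total mass: the standard SB2 relation constrains the mass only up to the unknown orbital inclination. The symbols below are not from the abstract and are assumed here: $K_1, K_2$ are the radial-velocity semi-amplitudes of the two components, $P$ the orbital period, $e$ the eccentricity, $i$ the inclination, and $G$ the gravitational constant.

```latex
\[
  (M_1 + M_2)\,\sin^3 i \;=\; \frac{P\,(K_1 + K_2)^3\,(1 - e^2)^{3/2}}{2\pi G},
  \qquad
  \frac{M_1}{M_2} \;=\; \frac{K_2}{K_1}.
\]
% Because \sin^3 i \le 1, the measured right-hand side is a lower bound:
\[
  M_1 + M_2 \;\ge\; (M_1 + M_2)\,\sin^3 i .
\]
```

The second relation is why nearly equal velocity amplitudes imply components of almost equal mass.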