Evaluation of phylogenetic reconstruction methods using bacterial whole genomes: a simulation based study

Author: Bentley SD
Colijn C
Harris SR
Kendall M
Lees JA
Parkhill J
Publication venue: 'F1000 Research Ltd'
Publication date: 01/01/2018

Background: Phylogenetic reconstruction is a necessary first step in many analyses which use whole genome sequence data from bacterial populations. There are many available methods to infer phylogenies, and these have various advantages and disadvantages, but few unbiased comparisons of the range of approaches have been made. Methods: We simulated data from a defined "true tree" using a realistic evolutionary model. We built phylogenies from this data using a range of methods, and compared reconstructed trees to the true tree using two measures, noting the computational time needed for different phylogenetic reconstructions. We also used real data from Streptococcus pneumoniae alignments to compare individual core gene trees to a core genome tree. Results: We found that, as expected, maximum likelihood trees from good quality alignments were the most accurate, but also the most computationally intensive. Using less accurate phylogenetic reconstruction methods, we were able to obtain results of comparable accuracy; we found that approximate results can rapidly be obtained using genetic distance based methods. In real data we found that highly conserved core genes, such as those involved in translation, gave an inaccurate tree topology, whereas genes involved in recombination events gave inaccurate branch lengths. We also show a tree-of-trees, relating the results of different phylogenetic reconstructions to each other. Conclusions: We recommend three approaches, depending on requirements for accuracy and computational time. Quicker approaches that do not perform full maximum likelihood optimisation may be useful for many analyses requiring a phylogeny, as generating a high quality input alignment is likely to be the major limiting factor of accurate tree topology. We have publicly released our simulated data and code to enable further comparisons
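As a rough illustration of the "genetic distance based methods" mentioned in the abstract above, the following minimal sketch computes pairwise p-distances (the proportion of differing sites) from a toy alignment. The function names, the toy sequences, and the choice to skip gap and N characters are assumptions made for illustration; they are not taken from the authors' released code.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned sequences.

    Sites where either sequence has a gap ('-') or an 'N' are skipped
    (an assumption of this sketch, not a rule from the paper).
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    diffs = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in "-N" or y in "-N":
            continue
        compared += 1
        if x != y:
            diffs += 1
    return diffs / compared if compared else float("nan")


def distance_matrix(alignment: dict[str, str]) -> dict[tuple[str, str], float]:
    """Pairwise p-distances keyed by (name_a, name_b) with name_a < name_b."""
    return {
        (m, n): p_distance(alignment[m], alignment[n])
        for m, n in combinations(sorted(alignment), 2)
    }


# Hypothetical toy alignment, for illustration only.
toy_alignment = {"s1": "ACGTACGT", "s2": "ACGTACGA", "s3": "ACGAAC-A"}
toy_distances = distance_matrix(toy_alignment)
```

A matrix like this could then be passed to a distance-based tree builder such as neighbour joining. The abstract does not say which distance measure or tree-building algorithm the authors used.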

Oxford University Research Archive

Spiral - Imperial College Digital Repository

Inferring phylogenetic networks with maximum pseudolikelihood under incomplete lineage sorting

Author: Ané Cécile
Solís-Lemus Claudia
Publication date: 12/02/2016

Phylogenetic networks are necessary to represent the tree of life expanded by edges to represent events such as horizontal gene transfers, hybridizations or gene flow. Not all species follow the paradigm of vertical inheritance of their genetic material. While a great deal of research has flourished into the inference of phylogenetic trees, statistical methods to infer phylogenetic networks are still limited and under development. The main disadvantage of existing methods is a lack of scalability. Here, we present a statistical method to infer phylogenetic networks from multi-locus genetic data in a pseudolikelihood framework. Our model accounts for incomplete lineage sorting through the coalescent model, and for horizontal inheritance of genes through reticulation nodes in the network. Computation of the pseudolikelihood is fast and simple, and it avoids the burdensome calculation of the full likelihood which can be intractable with many species. Moreover, estimation at the quartet-level has the added computational benefit that it is easily parallelizable. Simulation studies comparing our method to a full likelihood approach show that our pseudolikelihood approach is much faster without compromising accuracy. We applied our method to reconstruct the evolutionary relationships among swordtails and platyfishes (Xiphophorus: Poeciliidae), which is characterized by widespread hybridizations.
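To illustrate why quartet-level estimation parallelizes easily, here is a minimal sketch that enumerates every 4-taxon subset and sums a per-quartet log-likelihood. The per-quartet scoring function is a hypothetical placeholder supplied by the caller, and the additive form is an assumption of this sketch; the authors' actual model is not reproduced here.

```python
from itertools import combinations
from typing import Callable, Iterable


def quartets(taxa: Iterable[str]) -> list[tuple[str, ...]]:
    """All 4-taxon subsets, in sorted order; there are C(n, 4) of them."""
    return list(combinations(sorted(taxa), 4))


def log_pseudolikelihood(
    taxa: Iterable[str],
    quartet_loglik: Callable[[tuple[str, ...]], float],
) -> float:
    """Sum of independent per-quartet terms.

    Each term depends only on its own four taxa, so the terms could be
    evaluated in separate worker processes and added up afterwards.
    `quartet_loglik` is a hypothetical user-supplied function.
    """
    return sum(quartet_loglik(q) for q in quartets(taxa))


# Toy usage with a dummy scorer that returns -1.0 for every quartet.
example_taxa = ["A", "B", "C", "D", "E"]
example_quartets = quartets(example_taxa)
example_score = log_pseudolikelihood(example_taxa, lambda q: -1.0)
```

Because the number of quartets grows as C(n, 4), splitting them across workers is the natural unit of parallelism in a scheme like this.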

arXiv.org e-Print Archive

Directory of Open Access Journals

PubMed Central

Using neutral cline decay to estimate contemporary dispersal: a generic tool and its application to a major crop pathogen

Author: Amil
Barton
Barton
Barton
Barton
Barton
Broquet
Brown
Burt
Carlier
Chen
Daguin
Dieckmann
Endler
Fisher
Gay
Giraud
Goudet
Guillot
Guillot
Halkett
Jones
Lapeyre de Bellaire
Lebreton
Lenormand
Lenormand
Lenormand
Lenormand
Lockwood
McCartney
McDonald
Mourichon
Neu
Pennisi
Rieux
Rieux
Robert
Ronce
Rousset
Rousset
Rousset
Saccheri
Sackett
Slatkin
Storey
Turelli
Zapater
Publication date: 01/01/2013

Dispersal is a key parameter of adaptation, invasion and persistence. Yet standard population genetics inference methods hardly distinguish it from drift and many species cannot be studied by direct mark-recapture methods. Here, we introduce a method using rates of change in cline shapes for neutral markers to estimate contemporary dispersal. We apply it to the devastating banana pest Mycosphaerella fijiensis, a wind-dispersed fungus for which a secondary contact zone had previously been detected using landscape genetics tools. By tracking the spatio-temporal frequency change of 15 microsatellite markers, we find that σ, the standard deviation of parent–offspring dispersal distances, is 1.2 km/generation^(1/2). The analysis is further shown to be robust to a large range of dispersal kernels. We conclude that combining landscape genetics approaches to detect breaks in allelic frequencies with analyses of changes in neutral genetic clines offers a powerful way to obtain ecologically relevant estimates of dispersal in many species.

Integrating genealogical and dynamical modelling to infer escape and reversion rates in HIV epitopes

Author: Frater John
McLean Angela
McVean Gil
Palmer Duncan
Philips Rodney
Publication date: 01/01/2013

The rates of escape and reversion in response to selection pressure arising from the host immune system, notably the cytotoxic T-lymphocyte (CTL) response, are key factors determining the evolution of HIV. Existing methods for estimating these parameters from cross-sectional population data using ordinary differential equations (ODE) ignore information about the genealogy of sampled HIV sequences, which has the potential to cause systematic bias and over-estimate certainty. Here, we describe an integrated approach, validated through extensive simulations, which combines genealogical inference and epidemiological modelling, to estimate rates of CTL escape and reversion in HIV epitopes. We show that there is substantial uncertainty about rates of viral escape and reversion from cross-sectional data, which arises from the inherent stochasticity in the evolutionary process. By application to empirical data, we find that point estimates of rates from a previously published ODE model and the integrated approach presented here are often similar, but can also differ several-fold depending on the structure of the genealogy. The model-based approach we apply provides a framework for the statistical analysis of escape and reversion in population data and highlights the need for longitudinal and denser cross-sectional sampling to enable accurate estimation of these key parameters.

arXiv.org e-Print Archive

PubMed Central

Oxford University Research Archive

Increasing the density of available pareto optimal solutions

Author: Fleming P.J.
Giagkiozis I.
Publication venue: Automatic Control and Systems Engineering, University of Sheffield
Publication date: 01/11/2012

The set of available multi-objective optimization algorithms continues to grow. This fact can be partially attributed to their widespread use and applicability. However, this increase also suggests several issues remain to be addressed satisfactorily. One such issue is the diversity and the number of solutions available to the decision maker (DM). Even for algorithms very well suited for a particular problem, it is difficult, mainly due to the computational cost, to use a population large enough to ensure the likelihood of obtaining a solution close to the DM's preferences. In this paper we present a novel methodology that produces additional Pareto optimal solutions from a Pareto optimal set obtained at the end run of any multi-objective optimization algorithm. This method, which we refer to as Pareto estimation, is tested against a set of 2 and 3-objective test problems and a 3-objective portfolio optimization problem to illustrate its utility for a real-world problem.
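For readers unfamiliar with the term, the following minimal sketch shows what a "Pareto optimal set" is by filtering the non-dominated points from a list of objective vectors. It assumes every objective is minimized and uses made-up points. It is not the Pareto estimation method from the paper, which generates additional solutions from such a set.

```python
from typing import Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if `a` is no worse than `b` in every objective and strictly
    better in at least one (all objectives minimized, by assumption)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(points: Sequence[Sequence[float]]) -> list[Sequence[float]]:
    """Return the points that no other point dominates, in input order."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical 2-objective values, for illustration only.
candidates = [(1, 5), (2, 3), (3, 4), (4, 1), (2, 6)]
front = pareto_front(candidates)
```

In this toy set, (3, 4) is dominated by (2, 3) and (2, 6) is dominated by (1, 5), so three points remain on the front. The pairwise check is quadratic in the number of points, which is adequate for small illustrative sets.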

White Rose Research Online

Inference of population splits and mixtures from genome-wide allele frequency data

Author: A Keinan
A RoyChoudhury
AL Price
AR Boyko
BM Henn
BM vonHoldt
BS Weir
C Becquet
D Reich
D Reich
D Reich
DH Huson
DJ Lawson
EY Durand
G Bhatia
G Coop
G Hellenthal
G Liti
G McVean
G Nicholson
GM Lathrop
HG Parker
Hua Tang
I Gronau
J Felsenstein
J Felsenstein
J Felsenstein
J Hey
J Novembre
J Novembre
J Novembre
J Sirén
J Sukumaran
JK Pritchard
JK Pritchard
Jonathan K. Pritchard
Joseph K. Pickrell
JZ Li
K Lindblad-Toh
LL Cavalli-Sforza
LL Cavalli-Sforza
LL Cavalli-Sforza
LL Cavalli-Sforza
LS Kubatko
M Bonhomme
M DeGiorgio
M Jakobsson
M Nei
M Nei
M Rasmussen
MA Beaumont
N Patterson
N Patterson
N Saitou
NA Rosenberg
O François
P Beerli
P Menozzi
P Moorjani
R Nielsen
RE Green
RJ Dyer
RL Cann
RN Gutenkunst
RR Hudson
S Xu
SF Schaffner
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/01/2012

Many aspects of the historical relationships between populations in a species are reflected in genetic data. Inferring these relationships from genetic data, however, remains a challenging task. In this paper, we present a statistical model for inferring the patterns of population splits and mixtures in multiple populations. In this model, the sampled populations in a species are related to their common ancestor through a graph of ancestral populations. Using genome-wide allele frequency data and a Gaussian approximation to genetic drift, we infer the structure of this graph. We applied this method to a set of 55 human populations and a set of 82 dog breeds and wild canids. In both species, we show that a simple bifurcating tree does not fully describe the data; in contrast, we infer many migration events. While some of the migration events that we find have been detected previously, many have not. For example, in the human data we infer that Cambodians trace approximately 16% of their ancestry to a population ancestral to other extant East Asian populations. In the dog data, we infer that both the boxer and basenji trace a considerable fraction of their ancestry (9% and 25%, respectively) to wolves subsequent to domestication, and that East Asian toy breeds (the Shih Tzu and the Pekingese) result from admixture between modern toy breeds and "ancient" Asian breeds. Software implementing the model described here, called TreeMix, is available at http://treemix.googlecode.com

Comment: 28 pages, 6 figures in main text. Attached supplement is 22 pages, 15 figures. This is an updated version of the preprint available at http://precedings.nature.com/documents/6956/version/

arXiv.org e-Print Archive

CiteSeerX

Crossref

Directory of Open Access Journals

FigShare

Discovery of an M4 Spectroscopic Binary in Upper Scorpius: A Calibration Point for Young Low-Mass Evolutionary Models

Author: Ansgar Reiners
Baraffe I.
Delfosse X.
Gibor Basri
Subhanjoy Mohanty
Publication venue: 'University of Chicago Press'
Publication date: 21/06/2005

We report the discovery of a new low-mass spectroscopic (SB2) stellar binary system in the star-forming region of Upper Scorpius. This object, UScoCTIO5, was discovered by Ardila (2000), who assigned it a spectral class of M4. A Keck I HIRES spectrum revealed it to be double-lined, and we then carried out a program at several observatories to determine its orbit. The orbital period is 34 days, and the eccentricity is nearly 0.3. The importance of such a discovery is that it can be used to help calibrate evolutionary models at low masses and young ages. This is one of the outstanding problems in the study of formation mechanisms and initial mass functions at low masses. The orbit allows us to place a lower limit of 0.64 ± 0.02 M_sol on the total system mass. The components appear to be of almost equal mass. We are able to show that this mass is significantly higher than predicted by evolutionary models for an object of this luminosity and age, in agreement with other recent results. More precise determination of the temperature and surface gravity of the components would be helpful in further solidifying this conclusion.

Comment: 17 pages, 4 figures, accepted for publication in Ap
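The reason a spectroscopic orbit yields only a lower limit on the total mass can be sketched with the standard double-lined binary relation. The symbols K_1 and K_2 (the radial-velocity semi-amplitudes of the two components) and i (the orbital inclination) are introduced here for illustration and do not appear in the abstract, which reports only the period P, the eccentricity e, and the resulting limit.

```latex
% Standard SB2 relation (textbook result, not quoted from the paper):
% the orbit fixes the total mass only up to the unknown factor sin^3 i.
(M_1 + M_2)\,\sin^3 i \;=\; \frac{P}{2\pi G}\,\bigl(1 - e^2\bigr)^{3/2}\,(K_1 + K_2)^3

% Since \sin i \le 1, the right-hand side is a lower bound on the total mass:
M_1 + M_2 \;\ge\; \frac{P}{2\pi G}\,\bigl(1 - e^2\bigr)^{3/2}\,(K_1 + K_2)^3
```

Equality holds only for an edge-on orbit (i = 90°); any other inclination implies a larger true total mass.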

arXiv.org e-Print Archive

Crossref

CERN Document Server