Search CORE

4,696 research outputs found

Fast computation of distance estimators

Author: A Rambaut
D Swofford
F Barker
H Kishino
I Elias
Isaac Elias
J Felsenstein
Jens Lagergren
K Tamura
K Tuplin
L Arvestad
M Kimura
N Saitou
T Jukes
Publication venue: BioMed Central
Publication date: 01/01/2007

BACKGROUND: Some distance methods are among the most commonly used methods for reconstructing phylogenetic trees from sequence data. The input to a distance method is a distance matrix containing estimated pairwise distances between all pairs of taxa. Distance methods themselves are often fast; e.g., the famous and popular Neighbor Joining (NJ) algorithm reconstructs a phylogeny of n taxa in O(n³) time. Unfortunately, the fastest practical algorithms known for computing the distance matrix from n sequences of length l take time proportional to l·n². Since the sequence length is typically much larger than the number of taxa, distance estimation is the bottleneck in phylogeny reconstruction. This bottleneck is especially apparent when reconstructing large phylogenies or in applications where many trees have to be reconstructed, e.g., bootstrapping and genome-wide applications.

RESULTS: We give an advanced algorithm for computing the number of mutational events between DNA sequences that is significantly faster than both Phylip and Paup. Moreover, we give a new method for estimating pairwise distances between sequences that contain ambiguity symbols. This new method is shown to be more accurate as well as faster than earlier methods.

CONCLUSION: Our novel algorithm for computing distance estimators provides a valuable tool in phylogeny reconstruction. Since the running time of our distance estimation algorithm is comparable to that of most distance methods, the previous bottleneck is removed. All distance methods, such as NJ, require a distance matrix as input; hence, our novel algorithm significantly improves the overall running time of all distance methods. In particular, we show for real-world biological applications how the running time of phylogeny reconstruction using NJ is improved from a matter of hours to a matter of seconds.
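For orientation, the l·n² baseline described in this abstract can be sketched as follows: each of the n(n−1)/2 pairs is scanned across all l sites, and the observed proportion of differing sites p is converted into an evolutionary distance. This is a minimal sketch, not the authors' faster algorithm. It assumes the Jukes-Cantor (JC69) correction d = −(3/4)·ln(1 − 4p/3), and it simply skips any site where either sequence has a gap or ambiguity symbol, which is the kind of crude treatment the authors' ambiguity method is meant to improve on. The function names are illustrative.

```python
import math
from itertools import combinations

NUCLEOTIDES = set("ACGT")


def p_distance(a: str, b: str) -> float:
    """Observed proportion of differing sites between two aligned sequences.

    Sites where either sequence has a gap or ambiguity symbol are skipped
    (a simplifying assumption, not the paper's ambiguity method).
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = diffs = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in NUCLEOTIDES and y in NUCLEOTIDES:
            compared += 1
            diffs += x != y
    if compared == 0:
        raise ValueError("no comparable sites")
    return diffs / compared


def jukes_cantor(p: float) -> float:
    """JC69 distance d = -3/4 * ln(1 - 4p/3); infinite (saturated) when p >= 3/4."""
    arg = 1.0 - 4.0 * p / 3.0
    if arg <= 0.0:
        return math.inf
    return -0.75 * math.log(arg)


def distance_matrix(seqs: dict) -> dict:
    """All pairwise JC69 distances: O(l * n^2) work for n sequences of length l."""
    d = {}
    for i, j in combinations(sorted(seqs), 2):
        dist = jukes_cantor(p_distance(seqs[i], seqs[j]))
        d[(i, j)] = d[(j, i)] = dist
    for name in seqs:
        d[(name, name)] = 0.0
    return d
```

The resulting matrix is what a distance method such as NJ would consume; the quadratic loop over pairs, times the linear scan per pair, is exactly the cost the abstract identifies as the bottleneck.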

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Inference of population splits and mixtures from genome-wide allele frequency data

Author: A Keinan
A RoyChoudhury
AL Price
AR Boyko
BM Henn
BM vonHoldt
BS Weir
C Becquet
D Reich
DH Huson
DJ Lawson
EY Durand
G Bhatia
G Coop
G Hellenthal
G Liti
G McVean
G Nicholson
GM Lathrop
HG Parker
Hua Tang
I Gronau
J Felsenstein
J Hey
J Novembre
J Sirén
J Sukumaran
JK Pritchard
Jonathan K. Pritchard
Joseph K. Pickrell
JZ Li
K Lindblad-Toh
LL Cavalli-Sforza
LS Kubatko
M Bonhomme
M DeGiorgio
M Jakobsson
M Nei
M Rasmussen
MA Beaumont
N Patterson
N Saitou
NA Rosenberg
O François
P Beerli
P Menozzi
P Moorjani
R Nielsen
RE Green
RJ Dyer
RL Cann
RN Gutenkunst
RR Hudson
S Xu
SF Schaffner
Publication venue: Public Library of Science (PLoS)
Publication date: 01/01/2012

Many aspects of the historical relationships between populations in a species are reflected in genetic data. Inferring these relationships from genetic data, however, remains a challenging task. In this paper, we present a statistical model for inferring the patterns of population splits and mixtures in multiple populations. In this model, the sampled populations in a species are related to their common ancestor through a graph of ancestral populations. Using genome-wide allele frequency data and a Gaussian approximation to genetic drift, we infer the structure of this graph. We applied this method to a set of 55 human populations and a set of 82 dog breeds and wild canids. In both species, we show that a simple bifurcating tree does not fully describe the data; in contrast, we infer many migration events. While some of the migration events that we find have been detected previously, many have not. For example, in the human data we infer that Cambodians trace approximately 16% of their ancestry to a population ancestral to other extant East Asian populations. In the dog data, we infer that both the boxer and basenji trace a considerable fraction of their ancestry (9% and 25%, respectively) to wolves subsequent to domestication, and that East Asian toy breeds (the Shih Tzu and the Pekingese) result from admixture between modern toy breeds and "ancient" Asian breeds. Software implementing the model described here, called TreeMix, is available at http://treemix.googlecode.com. Comment: 28 pages, 6 figures in main text. Attached supplement is 22 pages, 15 figures. This is an updated version of the preprint available at http://precedings.nature.com/documents/6956/version/

arXiv.org e-Print Archive

CiteSeerX

Crossref

Directory of Open Access Journals

FigShare

Bacterial community dynamics and activity in relation to dissolved organic matter availability during sea-ice formation in a mesocosm experiment

Author: Anderson M. J.
Clarke K. R.
Felsenstein J.
Grossmann S.
Kirchman D.
R Development Core Team
Riedel A.
Staden R.
Publication date: 01/01/2014

Peer reviewed

Crossref

PubMed Central

Electronic Publication Information Center

Helsingin yliopiston digitaalinen arkisto

Recognizing Treelike k-Dissimilarities

Author: A Schrijver
AD Gordon
Andreas Spillner
AWM Dress
C Bocci
C Hayashi
D Levy
DP Faith
E Rubei
G Soete de
H-J Bandelt
J Culberson
J Felsenstein
K Zaretsky
Katharina T. Huber
L Pachter
M Steel
M-M Deza
MJ Warrens
N Grishin
S Joly
Sven Herrmann
Vincent Moulton
WJ Heiser
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2012

A k-dissimilarity D on a finite set X, |X| >= k, is a map from the set of size-k subsets of X to the real numbers. Such maps naturally arise from edge-weighted trees T with leaf-set X: given a subset Y of X of size k, D(Y) is defined to be the total length of the smallest subtree of T with leaf-set Y. In case k = 2, it is well known that 2-dissimilarities arising in this way can be characterized by the so-called "4-point condition". However, in case k > 2, Pachter and Speyer recently posed the following question: given an arbitrary k-dissimilarity, how do we test whether this map comes from a tree? In this paper, we provide an answer to this question, showing that for k >= 3 a k-dissimilarity on a set X arises from a tree if and only if its restriction to every 2k-element subset of X arises from some tree, and that 2k is the least possible subset size to ensure that this is the case. As a corollary, we show that there exists a polynomial-time algorithm to determine when a k-dissimilarity arises from a tree. We also give a 6-point condition, similar to the aforementioned 4-point condition, for determining when a 3-dissimilarity arises from a tree. Comment: 18 pages, 4 figures
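For k = 2, the 4-point condition mentioned in this abstract can be checked directly: for every four taxa x, y, z, w, the two largest of the three sums d(x,y)+d(z,w), d(x,z)+d(y,w) and d(x,w)+d(y,z) must be equal. The sketch below assumes the input is already a metric (symmetric, zero on the diagonal, satisfying the triangle inequality), in which case checking distinct quartets suffices. It brute-forces all quartets and does not implement the authors' 2k-subset criterion or their 6-point condition for k = 3; the helper names are illustrative.

```python
from itertools import combinations


def symmetric_matrix(pairs: dict) -> dict:
    """Build a dict-of-dicts distance lookup from {(x, y): distance} entries."""
    d = {}
    for (x, y), value in pairs.items():
        d.setdefault(x, {})[y] = value
        d.setdefault(y, {})[x] = value
    return d


def satisfies_four_point(d: dict, taxa, tol: float = 1e-9) -> bool:
    """Return True if every quartet of taxa satisfies the 4-point condition.

    Assumes d is already a metric; for metrics, checking distinct quartets
    is enough to decide whether d comes from an edge-weighted tree.
    """
    for x, y, z, w in combinations(taxa, 4):
        sums = sorted((
            d[x][y] + d[z][w],
            d[x][z] + d[y][w],
            d[x][w] + d[y][z],
        ))
        # The two largest pair-sums must coincide (up to rounding).
        if abs(sums[2] - sums[1]) > tol:
            return False
    return True
```

As a usage example, the distances {ab: 3, cd: 7, ac: 9, ad: 10, bc: 10, bd: 11} come from a quartet tree with pendant edges of length 1 and 2 (to a and b), 3 and 4 (to c and d), and an internal edge of length 5; they pass the check, while raising d(a,c) to 12 breaks it.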

arXiv.org e-Print Archive

Crossref

University of East Anglia digital repository

Understanding the errors of SHAPE-directed RNA structure modeling

Author: Aviran S.
Brunel C.
Butler E. B.
Byrne R. T.
Cate J. H.
Christopher C. VanLang
Collins K.
Correll C. C.
Cruz J. A.
Darty K.
Das R.
Deigan K. E.
Efron B.
Felsenstein J.
Gesteland R. F.
Hickey D. R.
Hofacker I. L.
Kladwang W.
Kulshina N.
Kwon M.
Lemay J. F.
Leontis N. B.
Levitt M.
Mandal M.
Mathews D. H.
Mills D. R.
Mitra S.
Mortimer S. A.
Noller H. F.
Pablo Cordero
Panning B.
Pedersen J. S.
Regulski E. E.
Rhiju Das
Russell R.
Serganov A.
Smith K. D.
Staley J. P.
Sudarsan N.
Sussman J. L.
Takamoto K.
Vasa S. M.
Watts J. M.
Wilkinson K. A.
Winkler W. C.
Wipapat Kladwang
Yoon S.
Publication venue: American Chemical Society (ACS)
Publication date: 07/09/2011

Single-nucleotide-resolution chemical mapping for structured RNA is being rapidly advanced by new chemistries, faster readouts, and coupling to computational algorithms. Recent tests have shown that selective 2'-hydroxyl acylation by primer extension (SHAPE) can give near-zero error rates (0-2%) in modeling the helices of RNA secondary structure. Here, we benchmark the method using six molecules for which crystallographic data are available: tRNA(phe) and 5S rRNA from Escherichia coli, the P4-P6 domain of the Tetrahymena group I ribozyme, and ligand-bound domains from riboswitches for adenine, cyclic di-GMP, and glycine. SHAPE-directed modeling of these highly structured RNAs gave an overall false negative rate (FNR) of 17% and a false discovery rate (FDR) of 21%, with at least one helix prediction error in five of the six cases. Extensive variations of data processing, normalization, and modeling parameters did not significantly mitigate modeling errors. Only one variation, filtering out data collected with deoxyinosine triphosphate during primer extension, gave a modest improvement (FNR = 12% and FDR = 14%). The residual structure modeling errors are explained by the insufficient information content of these RNAs' SHAPE data, as evaluated by a nonparametric bootstrapping analysis. Beyond these benchmark cases, bootstrapping suggests a low level of confidence (<50%) in the majority of helices in a previously proposed SHAPE-directed model for the HIV-1 RNA genome. Thus, SHAPE-directed RNA modeling is not always unambiguous, and helix-by-helix confidence estimates, as described herein, may be critical for interpreting results from this powerful methodology. Comment: Biochemistry, Article ASAP (Aug. 15, 2011)
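The two error rates in this benchmark are simple set ratios: FNR = FN/(TP + FN), the fraction of crystallographic helices the model misses, and FDR = FP/(TP + FP), the fraction of predicted helices that are wrong. The sketch below computes both from two sets of helix identifiers, and also computes per-helix bootstrap support as the fraction of replicate models that contain each helix. How helices are identified and matched, and how the replicate models are produced (by re-running structure modeling on resampled SHAPE data), are assumptions left outside the sketch; the helix labels in the tests are hypothetical.

```python
from collections import Counter


def fnr_fdr(predicted, reference):
    """False negative rate and false discovery rate of a predicted helix set.

    FNR = FN / (TP + FN) = missed reference helices / all reference helices
    FDR = FP / (TP + FP) = wrong predicted helices / all predicted helices
    """
    predicted, reference = set(predicted), set(reference)
    tp = len(predicted & reference)
    fn = len(reference - predicted)
    fp = len(predicted - reference)
    fnr = fn / (tp + fn) if reference else 0.0
    fdr = fp / (tp + fp) if predicted else 0.0
    return fnr, fdr


def bootstrap_support(replicate_models):
    """Fraction of bootstrap replicate models that contain each helix."""
    n = len(replicate_models)
    counts = Counter(h for model in replicate_models for h in set(model))
    return {helix: c / n for helix, c in counts.items()}
```

A helix whose support falls below 0.5 would correspond to the "low level of confidence (<50%)" the abstract reports for most helices of the HIV-1 model.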

arXiv.org e-Print Archive

Crossref

Epitope Discovery with Phylogenetic Hidden Markov Models

Author: Bhattacharya
Brusic
C. Seoighe
Cao
Carlson
Currier
Felsenstein
Flower
Halpern
K. Scheffler
Kawashima
Larsen
M. Lacerda
Navis
Nielsen
Rognan
Schuler
Stewart-Jones
Vyas
Yu
Zhao
Publication venue: Oxford University Press
Publication date: 20/01/2010

Existing methods for the prediction of immunologically active T-cell epitopes are based on the amino acid sequence or structure of pathogen proteins. Additional information regarding the locations of epitopes may be acquired by considering the evolution of viruses in hosts with different immune backgrounds. In particular, immune-dependent evolutionary patterns at sites within or near T-cell epitopes can be used to enhance epitope identification. We have developed a mutation–selection model of T-cell epitope evolution that allows the human leukocyte antigen (HLA) genotype of the host to influence the evolutionary process. This is one of the first examples of the incorporation of environmental parameters into a phylogenetic model and has many other potential applications where the selection pressures exerted on an organism can be related directly to environmental factors. We combine this novel evolutionary model with a hidden Markov model to identify contiguous amino acid positions that appear to evolve under immune pressure in the presence of specific host immune alleles and that therefore represent potential epitopes. This phylogenetic hidden Markov model provides a rigorous probabilistic framework that can be combined with sequence or structural information to improve epitope prediction. As a demonstration, we apply the model to a data set of HIV-1 protein-coding sequences and host HLA genotypes.
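The hidden Markov layer is what turns per-site evidence into contiguous candidate epitopes. A minimal sketch, assuming the per-site log-likelihoods under each hidden state (for example, "neutral" versus "under immune pressure") have already been computed, as a phylogenetic model for each state would supply them, is the standard Viterbi recursion in log space. The two-state setup, transition probabilities and emission values used in the tests are hypothetical and are not taken from the paper.

```python
import math
from typing import List, Sequence


def viterbi(log_emit: Sequence[Sequence[float]],
            log_trans: Sequence[Sequence[float]],
            log_init: Sequence[float]) -> List[int]:
    """Most probable hidden-state path of an HMM, computed in log space.

    log_emit[t][s]  : log-likelihood of site t under state s (assumed precomputed,
                      e.g. by a phylogenetic model for that state)
    log_trans[r][s] : log-probability of moving from state r to state s
    log_init[s]     : log-probability of starting in state s
    """
    n_states = len(log_init)
    score = [log_init[s] + log_emit[0][s] for s in range(n_states)]
    backpointers = []
    for t in range(1, len(log_emit)):
        pointers, new_score = [], []
        for s in range(n_states):
            # Best predecessor state for being in state s at site t.
            prev = max(range(n_states), key=lambda r: score[r] + log_trans[r][s])
            pointers.append(prev)
            new_score.append(score[prev] + log_trans[prev][s] + log_emit[t][s])
        backpointers.append(pointers)
        score = new_score
    # Trace back from the best final state.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    return path[::-1]
```

With "sticky" transitions (high probability of staying in a state), a single weakly discordant site does not break a run, because switching in and out costs two low-probability transitions; this is how the HMM favors contiguous segments.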

Crossref

PubMed Central

Access to Research at National University of Ireland, Galway

Inferring stabilizing mutations from protein phylogenies : application to influenza hemagglutinin

Author: A Akasako
A Cao
A Martin
A Mitraki
A Rambaut
AA Pakula
AR Dinner
AR Fersht
AS Yang
AV Gribenko
B Steipe
BM Broome
C Pal
C Park
CB Anfinsen
CB Do
CM Dobson
CT Saunders
D Gilis
D Perl
D Shortle
DA Cowan
DA Drummond
DD Loeb
DM Taverna
E Capriotti
E Hoffmann
E van Nimwegen
EPC Rocha
Eugene I. Shakhnovich
F Chiti
F Ronquist
G Parisi
GG Brownlee
H Akashi
H Li
H Schindelin
H Zhao
H Zhou
HW Hellinga
I Keller
IE Sanchez
IMP del Pino
J Felsenstein
J Kyte
JA Wells
JB Garrett
JD Bloom
Jesse D. Bloom
JL Thorne
JM Koshi
JP Huelsenbeck
JR Cochran
JR Lepock
JV Chamary
K Ishikawa
K Katayanagi
KA Bava
KA Gray
KB Zeldovich
KJ Szretter
KL Maxwell
L Giver
L Serrano
M Dai
M Haruki
M Jacob
M Lehmann
M Matrosovich
M Ueda
M Wunderlich
Matthew J. Glassman
MD Kumar
MF Sippl
MM Garcia-Mira
MM Gromiha
MP Canadillas
MS Fornasari
MW Pantoliano
N Amin
N Goldman
N Lartillot
N Tong
R Godoy-Ruiz
R Guerois
R Rabadan
R Sakaue
RC Edgar
RJ Ellis
S Govindarajan
S Kimura
S Nakajima
S Sato
SC Choi
SH White
SJ Gamblin
SS Jaswal
U Bastolla
V Parthiban
VG Dugan
VN Uversky
W Besenmatter
WS Sandberg
WSW Wong
XJ Zhang
Y Bao
YY Tseng
Z Chen
Publication venue: International Society for Computational Biology
Publication date: 01/04/2009

One selection pressure shaping sequence evolution is the requirement that a protein fold with sufficient stability to perform its biological functions. We present a conceptual framework that explains how this requirement causes the probability that a particular amino acid mutation is fixed during evolution to depend on its effect on protein stability. We mathematically formalize this framework to develop a Bayesian approach for inferring the stability effects of individual mutations from homologous protein sequences of known phylogeny. This approach is able to predict published experimentally measured mutational stability effects (ΔΔG values) with an accuracy that exceeds both a state-of-the-art physicochemical modeling program and the sequence-based consensus approach. As a further test, we use our phylogenetic inference approach to predict stabilizing mutations to influenza hemagglutinin. We introduce these mutations into a temperature-sensitive influenza virus with a defect in its hemagglutinin gene and experimentally demonstrate that some of the mutations allow the virus to grow at higher temperatures. Our work therefore describes a powerful new approach for predicting stabilizing mutations that can be successfully applied even to large, complex proteins such as hemagglutinin. This approach also makes a mathematical link between phylogenetics and experimentally measurable protein properties, potentially paving the way for more accurate analyses of molecular evolution.

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

Caltech Authors

Neighborhoods of trees in circular orderings

Author: Andreas Spillner
B Allen
C Semple
D Bryant
D Robinson
D Sleator
E Tepe
F Luccio
J De Loera
J Felsenstein
K Gordon
L Kubatko
L Pournin
M Li
MJ Sanderson
MS Bansal
P Humphries
P Lemey
R Desper
R Jamison
S Bastkowski
S Cleary
S Whelan
Sarah Bastkowski
Taoyang Wu
Vincent Moulton
Y Ding
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2015

In phylogenetics, a common strategy used to construct an evolutionary tree for a set of species X is to search in the space of all such trees for one that optimizes some given score function (such as the minimum evolution, parsimony or likelihood score). As this can be computationally intensive, it was recently proposed to restrict such searches to the set of all those trees that are compatible with some circular ordering of the set X. To inform the design of efficient algorithms to perform such searches, it is therefore of interest to find bounds for the number of trees compatible with a fixed ordering in the neighborhood of a tree that is determined by certain tree operations commonly used to search for trees: the nearest neighbor interchange (nni), the subtree prune and regraft (spr) and the tree bisection and reconnection (tbr) operations. We show that the size of such a neighborhood of a binary tree associated with the nni operation is independent of the tree’s topology, but that this is not the case for the spr and tbr operations. We also give tight upper and lower bounds for the size of the neighborhood of a binary tree for the spr and tbr operations and characterize those trees for which these bounds are attained.
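For comparison with the constrained setting studied here, the unconstrained nni neighborhood of an unrooted binary tree with n leaves always has exactly 2(n−3) members: the tree has n−3 internal edges, and each internal edge admits two distinct interchanges. The sketch below counts internal edges from an edge list to obtain that number. It deliberately does not restrict neighbors to trees compatible with a circular ordering, which is the case the paper analyzes, and the edge-list representation is an assumption made for illustration.

```python
from collections import Counter


def nni_neighborhood_size(edges) -> int:
    """Size of the unconstrained NNI neighbourhood of an unrooted binary tree.

    Each internal edge (both endpoints are internal vertices) admits exactly two
    nearest neighbor interchanges, giving 2 * (#internal edges) = 2(n - 3)
    neighbouring trees for a binary tree with n leaves.
    """
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    # Leaves have degree 1; an edge is internal if neither endpoint is a leaf.
    internal_edges = sum(1 for u, v in edges if degree[u] > 1 and degree[v] > 1)
    return 2 * internal_edges
```

For the five-leaf caterpillar tree with internal vertices a, b, c, the two internal edges a–b and b–c give 4 = 2(5−3) neighbors, independent of how the leaves are labeled.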

Crossref

University of East Anglia digital repository

The statistical neuroanatomy of frontal networks in the macaque

We were interested in gaining insight into the functional properties of frontal networks based upon their anatomical inputs. We took a neuroinformatics approach, carrying out maximum likelihood hierarchical cluster analysis on 25 frontal cortical areas based upon their anatomical connections, with 68 input areas representing exterosensory, chemosensory, motor, limbic, and other frontal inputs. The analysis revealed a set of statistically robust clusters. We used these clusters to divide the frontal areas into 5 groups, including ventral-lateral, ventral-medial, dorsal-medial, dorsal-lateral, and caudal-orbital groups. Each of these groups was defined by a unique set of inputs. This organization provides insight into the differential roles of each group of areas and suggests a gradient by which orbital and ventral-medial areas may be responsible for decision-making processes based on emotion and primary reinforcers, and lateral frontal areas are more involved in integrating affective and rational information into a common framework.

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

UCL Discovery

PubMed Central

Optimizing the order of taxon addition in phylogenetic tree construction using genetic algorithm

Author: A. Rambaut
A. W. F. Edwards
D. Cavicchio
D. E. Goldberg
D. H. Colless
D. Whitley
G. J. Olsen
J. Felsenstein
J. H. Camin
J. Holland
J. Neyman
K. Hayasaka
K. Katoh
M. Hasegawa
M. Mitchell
M. Nei
P. O. Lewis
R. Chakraborty
R. L. Graham
W. H. E. Day
W. H. Li
W. Messier
Z. Yang
Publication venue: Springer-Verlag
Publication date: 01/01/2003

As molecular biology has advanced, phylogenetics has gained favor for the analysis of DNA sequence data. Among the many algorithms for phylogenetics, fastDNAml is considered to have reasonable computational cost and performance. However, its performance is likely to be significantly affected by the order in which taxa are added. In this paper, we propose a genetic algorithm for optimizing the order of taxon addition in fastDNAml. Experimental results show that fastDNAml with the optimized order of taxon addition constructs more probable evolutionary trees in terms of maximum likelihood.
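A genetic algorithm over taxon-addition orders works on permutations, so it needs operators that always produce valid permutations. The sketch below uses tournament selection, order crossover (OX), swap mutation and elitism; these operator choices are assumptions and not necessarily the ones used in the paper. The `fitness` argument is a placeholder: in the setting described here it would be the log-likelihood of the tree fastDNAml builds when taxa are added in the given order, which this sketch does not compute.

```python
import random
from typing import Callable, List, Sequence

Order = List[int]


def order_crossover(p1: Sequence[int], p2: Sequence[int], rng: random.Random) -> Order:
    """OX: keep a random slice of parent 1, fill the rest in parent 2's order."""
    n = len(p1)
    i, j = sorted(rng.sample(range(n), 2))
    child: List = [None] * n
    child[i:j + 1] = p1[i:j + 1]
    kept = set(p1[i:j + 1])
    filler = iter(t for t in p2 if t not in kept)
    for pos in range(n):
        if child[pos] is None:
            child[pos] = next(filler)
    return child


def swap_mutation(order: Sequence[int], rate: float, rng: random.Random) -> Order:
    """With probability `rate`, swap two randomly chosen positions."""
    order = list(order)
    if rng.random() < rate:
        a, b = rng.sample(range(len(order)), 2)
        order[a], order[b] = order[b], order[a]
    return order


def evolve_order(n_taxa: int, fitness: Callable[[Order], float],
                 pop_size: int = 30, generations: int = 100,
                 mutation_rate: float = 0.2, seed: int = 0) -> Order:
    """Search taxon-addition orders; `fitness` is a placeholder (higher is better)."""
    rng = random.Random(seed)
    population = [rng.sample(range(n_taxa), n_taxa) for _ in range(pop_size)]

    def tournament() -> Order:
        a, b = rng.sample(population, 2)
        return a if fitness(a) >= fitness(b) else b

    best = max(population, key=fitness)
    for _ in range(generations):
        children = [best]  # elitism: the best order found so far is never lost
        while len(children) < pop_size:
            child = order_crossover(tournament(), tournament(), rng)
            children.append(swap_mutation(child, mutation_rate, rng))
        population = children
        best = max(population, key=fitness)
    return best
```

Because of elitism, the best fitness never decreases from one generation to the next. In a real run each fitness call would mean a full tree build, so caching fitness values per order would likely dominate any practical design.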

CiteSeerX

Crossref