23 research outputs found

Structural Analysis of Biodiversity

Author: A Chapman
C Moritz
D Bryant
J Ausubel
J de Rosnay
K Atteson
K Kerr
K Kimura
K Petersen
K Tamura
L Sirovich
Lawrence Sirovich
M Caterino
M Hasegawa
M Kallersjo
M McMahon
M Sanderson
Mark Y. Stoeckle
Mukund Thattai
N Eldredge
N Saitou
O Gascuel
P Goloboff
P Hebert
R Mihaescu
S Hackett
S Smith
T Castoe
Yu Zhang
Publication venue: Public Library of Science
Publication date: 01/02/2010

Large, recently available genomic databases cover a wide range of life forms, suggesting an opportunity for insights into the genetic structure of biodiversity. In this study we refine our recently described technique using indicator vectors to analyze and visualize nucleotide sequences. The indicator vector approach generates correlation matrices, dubbed Klee diagrams, which represent a novel way of assembling and viewing large genomic datasets. To explore its potential utility, here we apply the improved algorithm to a collection of almost 17000 DNA barcode sequences covering 12 widely separated animal taxa, demonstrating that indicator vectors for classification gave correct assignment in all 11000 test cases. Indicator vector analysis revealed discontinuities corresponding to species- and higher-level taxonomic divisions, suggesting an efficient approach to classification of organisms from poorly studied groups. As compared to standard distance metrics, indicator vectors preserve diagnostic character probabilities, enable automated classification of test sequences, and generate high-information-density single-page displays. These results support the application of indicator vectors for comparative analysis of large nucleotide data sets and raise the prospect of gaining insight into broad-scale patterns in the genetic structure of biodiversity.

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

Alignment-Free Phylogenetic Reconstruction

Author: A. Loytynoja
B.D. Thatte
C. Daskalakis
C. Semple
D. Graur
D. Metzler
D.G. Higgins
E. Mossel
I. Elias
I. Gronau
I. Miklos
J. Felsenstein
J.L. Thorne
K. Atteson
K. Katoh
K. Liu
K.B. Athreya
K.M. Wong
L. Wang
M. Csurös
M. Hohl
M.A. Steel
M.A. Suchard
M.R. Lacey
P. Buneman
P.L. Erdös
R.C. Edgar
S. Karlin
V. King
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2009

14th Annual International Conference, RECOMB 2010, Lisbon, Portugal, April 25-28, 2010. Proceedings.

We introduce the first polynomial-time phylogenetic reconstruction algorithm under a model of sequence evolution allowing insertions and deletions (or indels). Given appropriate assumptions, our algorithm requires sequence lengths growing polynomially in the number of leaf taxa. Our techniques are distance-based and largely bypass the problem of multiple alignment.

CiteSeerX

DSpace@MIT

Crossref

Large-Scale Neighbor-Joining with NINJA

Author: D. Bryant
D.A. Patterson
I. Elias
J. Evans
J.A. Studier
K. Atteson
K. Brengel
K. Howe
L. Sheneman
L. Zaslavsky
M. Simonsen
M.N. Price
N. Goldman
N. Saitou
O. Gascuel
R. Bayer
R. Desper
R.D. Finn
S. Griffiths Jones
S.A. Smith
T. Mailund
T.H. Cormen
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2009

Neighbor-joining is a well-established hierarchical clustering algorithm for inferring phylogenies. It begins with observed distances between pairs of sequences, and clustering order depends on a metric related to those distances. The canonical algorithm requires O(n^3) time and O(n^2) space for n sequences, which precludes application to very large sequence families, e.g. those containing 100,000 sequences. Datasets of this size are available today, and such phylogenies will play an increasingly important role in comparative genomics studies. Recent algorithmic advances have greatly sped up neighbor-joining for inputs of thousands of sequences, but are limited to fewer than 13,000 sequences on a system with 4GB RAM. In this paper, I describe an algorithm that speeds up neighbor-joining by dramatically reducing the number of distance values that are viewed in each iteration of the clustering procedure, while still computing a correct neighbor-joining tree. This algorithm can scale to inputs larger than 100,000 sequences because of external-memory-efficient data structures. A free implementation may be obtained fro

CiteSeerX

Crossref
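
For orientation, the canonical neighbor-joining procedure that the NINJA abstract above takes as its baseline can be sketched as follows. This is a minimal illustration of the textbook O(n^3)-time, O(n^2)-space algorithm, not NINJA's filtered or external-memory variant. The function name `neighbor_joining`, the internal node labels `u0`, `u1`, ... and the edge-list output format are assumptions made for this example.

```python
from itertools import combinations


def neighbor_joining(names, matrix):
    """Canonical neighbor-joining on a symmetric distance matrix.

    names  -- list of leaf labels
    matrix -- square list of lists; matrix[i][j] is the distance
              between names[i] and names[j]
    Returns a list of (node, neighbor, branch_length) edges of the
    unrooted tree. Internal node labels are hypothetical ("u0", ...).
    """
    # Working copy of the distances, keyed by node label.
    d = {a: {b: float(matrix[i][j]) for j, b in enumerate(names)}
         for i, a in enumerate(names)}
    nodes = list(names)
    edges = []
    next_id = 0

    while len(nodes) > 2:
        n = len(nodes)
        # Row sums over the currently active nodes.
        r = {a: sum(d[a][b] for b in nodes) for a in nodes}
        # Join the pair minimizing Q(a, b) = (n - 2) d(a, b) - r(a) - r(b).
        a, b = min(
            combinations(nodes, 2),
            key=lambda p: (n - 2) * d[p[0]][p[1]] - r[p[0]] - r[p[1]],
        )
        # Branch lengths from the new internal node to a and to b.
        la = 0.5 * d[a][b] + (r[a] - r[b]) / (2 * (n - 2))
        lb = d[a][b] - la
        u = f"u{next_id}"
        next_id += 1
        edges += [(a, u, la), (b, u, lb)]
        # Distances from the new node to every remaining node.
        d[u] = {u: 0.0}
        for c in nodes:
            if c not in (a, b):
                d[u][c] = d[c][u] = 0.5 * (d[a][c] + d[b][c] - d[a][b])
        nodes = [c for c in nodes if c not in (a, b)] + [u]

    # Connect the last two nodes with the remaining distance.
    a, b = nodes
    edges.append((a, b, d[a][b]))
    return edges
```

Each iteration scans all O(n^2) active pairs, and there are n - 2 iterations, which is where the cubic running time comes from. Reducing the number of pairs examined per iteration is the part the abstract says NINJA improves.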

Rec-DCM-Eigen: Reconstructing a Less Parsimonious but More Accurate Tree in Shorter Time

Author: A Bhutkar
A Coghlan
A Pothen
A Wei Xu
B Mohar
BME Moret
CA Stewart
Christian Schönbach
D Sankoff
DA Bader
David A. Bader
DH Huson
G Bourque
G Fertin
G Li
J Bergsten
J Tang
JA Hartigan
Jijun Tang
K Atteson
KM Swenson
M Bernt
M Blanchette
MD Hendy
MEJ Newman
N Saitou
ND Pattengale
Seunghwa Kang
Stephen W. Schaeffer
U von Luxburg
UW Roshan
W Arndt
WM Fitch
Y Lin
Publication venue: Public Library of Science

Maximum parsimony (MP) methods aim to reconstruct the phylogeny of extant species by finding the most parsimonious evolutionary scenario using the species' genome data. MP methods are considered to be accurate, but they are also computationally expensive, especially for a large number of species. Several disk-covering methods (DCMs), which decompose the input species into multiple overlapping subgroups (or disks), have been proposed to solve the problem in a divide-and-conquer way.

Crossref

Directory of Open Access Journals

PubMed Central

Final technical report: analysis of molecular data using statistical and evolutionary approaches

Author: Atteson K.
Kim Junhyong
Publication venue: Office of Scientific and Technical Information (OSTI)
Publication date: 15/02/2000

This document describes the research and training accomplishments of Dr. Kevin Atteson during the DOE fellowship period of September 1997 to September 1999. Dr. Atteson received training in molecular evolution during this period and made progress on seven research topics: computation of DNA pattern probability, asymptotic redundancy of Bayes rules, performance of neighbor-joining evolutionary tree estimation, convex evolutionary tree estimation, identifiability of trees under mixed rates, gene expression analysis, and population genetics of unequal crossover.

UNT Digital Library

NARPL: A solution to the student compiler project problem

Author: Aho A. V.
K. Atteson
M. Lorenz
W. F. Dowling
Publication venue: Association for Computing Machinery (ACM)

Crossref

Exact-IEBP: A New Technique For Estimating Evolutionary Distances Between Whole Genomes

Author: D. Huson
J.H. Nadeau
K. Atteson
L.A. Raubeson
M. Blanchette
N. Saitou
R.G. Olmstead
S. Kumar
Publication venue: Springer Verlag
Publication date: 01/01/2001

Evolution acts on whole genomes through operations that change the order and strandedness of genes within the genomes. This type of data presents new opportunities for discoveries about deep evolutionary rearrangement events, provided that sufficiently accurate methods can be developed to reconstruct evolutionary trees in these models [3, 11, 13, 18]. A necessary component of any such method is the ability to accurately estimate the true evolutionary distance between two genomes, which is the number of rearrangement events that took place in the evolutionary history between them. We improve the technique (IEBP) in [21] with a new method, Exact-IEBP, for estimating the true evolutionary distance between two signed genomes. Our simulation study shows that Exact-IEBP gives a better estimate of true evolutionary distances. Furthermore, Exact-IEBP produces more accurate trees than IEBP when used with the popular distance-based method, neighbor joining [16].

CiteSeerX

Crossref

The Performance of Phylogenetic Methods on Trees of Bounded Diameter

Author: A. Rambaut
C. McGeoch
D. F. Robinson
D. Huson
D. L. Swofford
J. Huelsenbeck
K. Atteson
M. Kimura
N. Saitou
P. L. Erdos
V. Berry
Publication date: 01/01/2001

We study the convergence rates of neighbor-joining and several new phylogenetic reconstruction methods on families of trees of bounded diameter. Our study presents theoretically obtained convergence rates, as well as an empirical study based upon simulation of evolution on random birth-death trees. We find that the new phylogenetic methods offer an advantage over the neighbor-joining method, except at low rates of evolution where they have comparable performance. The improvement in performance of the new methods over neighbor-joining increases with the number of taxa and the rate of evolution.

CiteSeerX

Crossref

Distance Corrections on Recombinant Sequences

Author: A. Rambaut
A. Rzhetsky
C. Wiuf
D. Bryant
D. Posada
F. Rodriguez
H.-J. Bandelt
J. Chang
J. Maynard-Smith
K. Atteson
K. Strimmer
K. Tamura
M. Schierup
M.K. Kuhner
O. Gascuel
R.R. Hudson
T.H. Jukes
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2003

Crossref

Sequence-Length Requirements for Phylogenetic Methods

Author: A. Rambaut
B. Rannala
D. F. Robinson
D. Huson
J. Huelsenbeck
K. Atteson
K. Kuhner
L. R. Foulds
M. Kimura
M.J. Sanderson
N. Saitou
P. L. Erdős
W. J. Bruno
Z. Yang

We study the sequence lengths required by neighbor-joining, greedy parsimony, and a phylogenetic reconstruction method (DCM-NJ+MP) based on disk-covering and the maximum parsimony criterion. We use extensive simulations based on random birth-death trees, with controlled deviations from ultrametricity, to collect data on the scaling of sequence-length requirements for each of the three methods as a function of the number of taxa, the rate of evolution on the tree, and the deviation from ultrametricity. Our experiments show that DCM-NJ+MP has consistently lower sequence-length requirements than the other two methods when trees of high topological accuracy are desired, although all methods require much longer sequences as the deviation from ultrametricity or the height of the tree grows. Our study has significant implications for large-scale phylogenetic reconstruction (where sequence-length requirements are a crucial factor), but also for future performance analyses in phylogenetics (since deviations from ultrametricity are proving pivotal).

CiteSeerX

Crossref