
46 research outputs found

Using tree diversity to compare phylogenetic heuristics

Author: Matthews Suzanne
Sul Seung-Jin
Williams Tiffani L
Publication venue: BioMed Central
Publication date: 01/01/2009

Crossref

Springer - Publisher Connector

PubMed Central

Texas A&M Repository

Rec-I-DCM3: A Fast Algorithmic Technique for Reconstructing Large Phylogenetic Trees

Author: Moret Bernard M. E.
Roshan Usman
Warnow Tandy
Williams Tiffani L.
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 12/12/2006

Phylogenetic trees are commonly reconstructed by solving hard optimization problems such as maximum parsimony (MP) and maximum likelihood (ML). Conventional MP heuristics produce good solutions within reasonable time on small datasets (up to a few thousand sequences), while ML heuristics are limited to smaller datasets (up to a few hundred sequences). However, since MP (and presumably ML) is NP-hard, such approaches do not scale when applied to large datasets. In this paper, we present a new technique called Recursive-Iterative-DCM3 (Rec-I-DCM3), which belongs to our family of disk-covering methods (DCMs). We tested this new technique on ten large biological datasets ranging from 1,322 to 13,921 sequences and obtained dramatic speedups as well as significant improvements in accuracy (better than 99.99%) in comparison to existing approaches. Thus, high-quality reconstructions can be obtained for datasets at least ten times larger than was previously possible.

Infoscience - École polytechnique fédérale de Lausanne
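
The Rec-I-DCM3 abstract above refers to maximum parsimony (MP) without defining it. MP scores a tree by the minimum number of character-state changes needed to explain the aligned sequences. As a minimal sketch, and not Rec-I-DCM3 itself (which builds on a base MP search rather than replacing the scoring step), the classic Fitch algorithm below computes that score for one fixed rooted binary tree. The nested-tuple tree encoding, the toy alignment, and the names `fitch_site` and `fitch_score` are illustrative assumptions.

```python
from typing import Dict, Set, Tuple, Union

# Hypothetical encoding: a tree is a leaf name (str) or a pair of subtrees.
Tree = Union[str, Tuple["Tree", "Tree"]]


def fitch_site(tree: Tree, states: Dict[str, str]) -> Tuple[Set[str], int]:
    """Return (candidate state set, number of changes) for one alignment column."""
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    ls, lc = fitch_site(left, states)
    rs, rc = fitch_site(right, states)
    common = ls & rs
    if common:
        # Children can agree on a state: no extra change is needed here.
        return common, lc + rc
    # Disjoint candidate sets: take the union and charge one state change.
    return ls | rs, lc + rc + 1


def fitch_score(tree: Tree, alignment: Dict[str, str]) -> int:
    """Parsimony score of `tree`, summed over all alignment columns."""
    length = len(next(iter(alignment.values())))
    total = 0
    for i in range(length):
        column = {taxon: seq[i] for taxon, seq in alignment.items()}
        total += fitch_site(tree, column)[1]
    return total


# Toy alignment (assumed data, for illustration only).
alignment = {"A": "ACGT", "B": "ACGA", "C": "TGGA", "D": "TGGT"}
t1: Tree = (("A", "B"), ("C", "D"))
t2: Tree = (("A", "D"), ("B", "C"))
print(fitch_score(t1, alignment), fitch_score(t2, alignment))  # prints: 4 5
```

Here the first tree needs 4 changes and the second 5, so an MP heuristic would prefer the first. Scoring a single tree is cheap; the hard part that methods like Rec-I-DCM3 address is searching the enormous space of candidate trees.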

An efficient and extensible approach for compressing phylogenetic trees

Author: AD Molin
DE Soltis
HE Williams
JP Huelsenbeck
LA Lewis
N Amenta
PA Goloboff
RS Boyer
SJ Matthews
SJ Sul
Suzanne J Matthews
Tiffani L Williams
WA Hunt Jr
Publication venue: BioMed Central
Publication date: 01/01/2011

Biologists require new algorithms to efficiently compress and store their large collections of phylogenetic trees. TreeZip is a novel method for compressing phylogenetic trees. Recently, we extended our TreeZip algorithm to support branch lengths, and we show how it can be used to quickly extract sets of trees of interest. The key advantage of TreeZip over standard compression methods such as 7zip is its ability to interpret and compress tree collections semantically, which makes it immune to branch rotations and allows key operations (such as calculating a consensus tree) to be performed quickly and without a loss of space savings. On unweighted phylogenetic trees, TreeZip compresses Newick files by more than 98%. On weighted phylogenetic trees, TreeZip compresses a Newick file by at least 73%. TreeZip can be combined with 7zip with little overhead, yielding space savings in excess of 99% (unweighted) and 92% (weighted). Unlike TreeZip, 7zip is not immune to branch rotations and performs worse as the variability in the Newick string representation increases. Finally, since the TreeZip compressed text (TRZ) file contains all the semantic information in a collection of trees, we can easily filter and decompress a subset of trees of interest (such as the set of unique trees), or build the resulting consensus tree in a matter of seconds. We also show the ease with which set operations can be performed on TRZ files, at speeds faster than those on Newick or 7zip-compressed Newick files, and without loss of space savings. TreeZip is an efficient approach for compressing large collections of phylogenetic trees. The semantic and compact nature of the TRZ file allows it to be operated upon directly and quickly, without needing to decompress the original Newick file. We believe that TreeZip will be vital for compressing and archiving trees in the biological community.

CiteSeerX

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Texas A&M Repository
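
The TreeZip abstract's claim of immunity to branch rotations can be illustrated with bipartitions: every internal edge of an unrooted tree splits the taxa into two groups, and swapping the children of a node changes the Newick string but not that set of splits. The Python sketch below shows only this idea; it is not TreeZip's TRZ encoding. It assumes topology-only Newick strings (no branch lengths, internal labels, or quoted names), and the helpers `parse_newick` and `bipartitions` are hypothetical.

```python
from typing import FrozenSet, List, Set, Union

# Hypothetical representation: a leaf name or a list of child nodes.
Node = Union[str, List["Node"]]


def parse_newick(text: str) -> Node:
    """Parse a topology-only Newick string such as '((A,B),(C,D));'."""
    text = "".join(text.split()).rstrip(";")  # drop whitespace and ';'
    pos = 0

    def parse() -> Node:
        nonlocal pos
        if text[pos] == "(":
            pos += 1  # skip '('
            children = [parse()]
            while text[pos] == ",":
                pos += 1
                children.append(parse())
            pos += 1  # skip ')'
            return children
        start = pos
        while pos < len(text) and text[pos] not in ",()":
            pos += 1
        return text[start:pos]

    return parse()


def bipartitions(tree: Node) -> Set[FrozenSet[str]]:
    """Non-trivial splits, each stored as the side lacking a reference taxon."""
    clades: Set[FrozenSet[str]] = set()

    def collect(node: Node) -> FrozenSet[str]:
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(collect(c) for c in node))
        clades.add(below)
        return below

    taxa = collect(tree)
    ref = min(taxa)  # canonical orientation: the side without this taxon
    result = set()
    for side in clades:
        canon = taxa - side if ref in side else side
        if 1 < len(canon) < len(taxa) - 1:  # skip trivial and empty splits
            result.add(canon)
    return result


a = "((A,B),(C,(D,E)));"
b = "(((E,D),C),(B,A));"  # same topology with children rotated
print(a == b)  # False: the strings differ
print(bipartitions(parse_newick(a)) == bipartitions(parse_newick(b)))  # True

collection = [a, b, "((A,C),(B,(D,E)));"]
unique = {frozenset(bipartitions(parse_newick(s))) for s in collection}
print(len(unique))  # 2 distinct topologies among 3 strings
```

Because each topology maps to one set of splits, duplicate trees written with different rotations collapse to a single entry, which is the property a string compressor such as 7zip cannot exploit.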

MrsRF: an efficient MapReduce algorithm for analyzing large collections of evolutionary trees

Author: C Ranger
C Stockham
DE Soltis
DF Robinson
DM Hillis
E Gabriel
J Dean
LA Lewis
MC Schatz
SJ Sul
Suzanne J Matthews
Tiffani L Williams
Publication venue: BioMed Central
Publication date: 01/01/2010

Background: MapReduce is a parallel framework that has been used effectively to design large-scale parallel applications for large computing clusters. In this paper, we evaluate the viability of the MapReduce framework for designing phylogenetic applications. The problem of interest is generating the all-to-all Robinson-Foulds (RF) distance matrix, which has many applications for visualizing and clustering large collections of evolutionary trees. We introduce MrsRF (MapReduce Speeds up RF), a multi-core algorithm that generates a t × t Robinson-Foulds distance matrix between t trees using the MapReduce paradigm.

Results: We studied the performance of our MrsRF algorithm on two large biological tree sets, one consisting of 20,000 trees of 150 taxa each and the other of 33,306 trees of 567 taxa each. Our experiments show that MrsRF is a scalable approach, reaching a speedup of over 18 on 32 total cores. Our results also show that achieving top speedup on a multi-core cluster requires trying different cluster configurations. Finally, we show how to use an RF matrix to summarize collections of phylogenetic trees visually.

Conclusion: Our results show that MapReduce is a promising paradigm for developing multi-core phylogenetic applications. The results also demonstrate that different multi-core configurations must be tested in order to obtain optimum performance. We conclude that RF matrices play a critical role in developing techniques to summarize large collections of trees.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Texas A&M Repository
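
For binary trees, one common definition of the RF distance is RF(i, j) = (|Bi - Bj| + |Bj - Bi|) / 2, where Bi is the set of bipartitions of tree i. The sketch below is a sequential, single-process imitation of a MapReduce-style decomposition of the all-to-all problem; it is not the MrsRF implementation. A map step emits (bipartition, tree id) pairs, a shuffle groups them by bipartition, and a reduce step counts how many bipartitions each pair of trees shares. Trees are assumed to be given already as sets of canonical bipartitions (frozensets of taxa), and all function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, List, Set, Tuple

Split = FrozenSet[str]


def map_phase(trees: List[Set[Split]]) -> Iterable[Tuple[Split, int]]:
    """Emit one (bipartition, tree id) pair per bipartition of each tree."""
    for tree_id, splits in enumerate(trees):
        for split in splits:
            yield split, tree_id


def shuffle(pairs: Iterable[Tuple[Split, int]]) -> Dict[Split, List[int]]:
    """Group tree ids by bipartition, as a MapReduce runtime would."""
    groups: Dict[Split, List[int]] = defaultdict(list)
    for split, tree_id in pairs:
        groups[split].append(tree_id)
    return groups


def reduce_phase(groups: Dict[Split, List[int]], t: int) -> List[List[int]]:
    """Count, for every pair of trees, the bipartitions they share."""
    shared = [[0] * t for _ in range(t)]
    for tree_ids in groups.values():
        for i, j in combinations(tree_ids, 2):
            shared[i][j] += 1
            shared[j][i] += 1
    return shared


def rf_matrix(trees: List[Set[Split]]) -> List[List[float]]:
    """t x t matrix with RF(i, j) = (|Bi - Bj| + |Bj - Bi|) / 2."""
    t = len(trees)
    shared = reduce_phase(shuffle(map_phase(trees)), t)
    return [
        [
            0.0 if i == j
            else ((len(trees[i]) - shared[i][j]) + (len(trees[j]) - shared[i][j])) / 2
            for j in range(t)
        ]
        for i in range(t)
    ]


def s(*taxa: str) -> Split:
    return frozenset(taxa)


# Three 5-taxon trees; each split is written as the side without taxon A.
trees = [
    {s("C", "D", "E"), s("D", "E")},  # ((A,B),(C,(D,E)))
    {s("B", "D", "E"), s("D", "E")},  # ((A,C),(B,(D,E)))
    {s("B", "C", "D"), s("C", "D")},  # ((A,E),(B,(C,D)))
]
for row in rf_matrix(trees):
    print(row)
# [0.0, 1.0, 2.0]
# [1.0, 0.0, 2.0]
# [2.0, 2.0, 0.0]
```

Counting shared bipartitions per bucket avoids comparing every pair of trees split by split, and in a real MapReduce job the map and reduce steps could run on different cores.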

A General-Purpose Model for Heterogeneous Computation

Author: Tiffani L. Williams
Publication date: 01/01/2000

Heterogeneous computing environments are becoming an increasingly popular platform for executing parallel applications. Such environments consist of a diverse set of machines and offer considerably more computational power at a lower cost than a parallel computer. Efficient heterogeneous parallel applications must account for the differences inherent in such an environment. For example, faster machines should possess more data items than their slower counterparts and communication should be minimized over slow network links. Current parallel applications are not designed with such heterogeneity in mind. Thus, a new approach is necessary for designing efficient heterogeneous parallel programs. We propos

CiteSeerX

University of Central Florida (UCF): STARS (Showcase of Text, Archives, Research & Scholarship)
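
The heterogeneous-computation abstract states that faster machines should hold more data items than slower ones. The Python sketch below makes that principle concrete with a speed-proportional partition; it illustrates the principle only and is not the model proposed in the paper, whose description is truncated above. The integer relative speeds are assumed inputs, and largest-remainder rounding is one assumed way to turn fractional shares into whole item counts.

```python
from typing import List


def proportional_split(n_items: int, speeds: List[int]) -> List[int]:
    """Assign n_items to machines in proportion to integer relative speeds.

    Integer arithmetic keeps the result exact; largest-remainder rounding
    makes the counts sum to n_items.
    """
    total = sum(speeds)
    counts = [n_items * s // total for s in speeds]  # floor of each share
    remainders = [n_items * s % total for s in speeds]
    leftover = n_items - sum(counts)  # always fewer than len(speeds)
    # Hand the leftover items to the machines with the largest remainders.
    order = sorted(range(len(speeds)), key=lambda i: remainders[i], reverse=True)
    for i in order[:leftover]:
        counts[i] += 1
    return counts


# Hypothetical cluster: one machine twice as fast as each of the other two.
print(proportional_split(1000, [2, 1, 1]))  # [500, 250, 250]
```

A fuller model would also weigh communication cost over slow links, which the abstract names as a second concern; this sketch covers only the data-distribution half.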


High-Performance Phylogeny Reconstruction

Author: Williams Tiffani L.
Publication venue: Office of Scientific and Technical Information (OSTI)
Publication date: 10/11/2004

Under the Alfred P. Sloan Fellowship in Computational Biology, I have had the opportunity to study phylogenetics, one of the most important and exciting disciplines in computational biology. A phylogeny depicts the evolutionary relationships among a set of organisms (or taxa). Typically, a phylogeny is represented by a binary tree, where modern organisms are placed at the leaves and ancestral organisms occupy internal nodes, with the edges of the tree denoting evolutionary relationships. The task of phylogenetics is to infer this tree from observations of present-day organisms. Reconstructing phylogenies is a major component of modern research programs in many areas of biology and medicine, but it is enormously expensive. The most commonly used techniques attempt to solve NP-hard problems such as maximum likelihood and maximum parsimony, typically by bounded searches through an exponentially sized tree space. For example, there are over 13 billion possible unrooted binary trees for 13 organisms. Phylogenetic heuristics that quickly and accurately analyze large amounts of data will revolutionize the biological field. This final report highlights my activities in phylogenetics during my two-year postdoctoral period at the University of New Mexico under Prof. Bernard Moret. Specifically, it describes my scientific, community, and professional activities as an Alfred P. Sloan Postdoctoral Fellow in Computational Biology.

UNT Digital Library
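
The report's figure of over 13 billion trees for 13 organisms matches the standard count of unrooted binary trees on n labeled leaves, (2n - 5)!!, the product of the odd numbers 3, 5, ..., 2n - 5. A short Python check (the function name is illustrative):

```python
def num_unrooted_binary_trees(n: int) -> int:
    """Number of distinct unrooted binary trees on n >= 3 labeled leaves: (2n - 5)!!."""
    if n < 3:
        raise ValueError("need at least 3 leaves")
    count = 1
    for k in range(3, 2 * n - 4, 2):  # multiply 3 * 5 * ... * (2n - 5)
        count *= k
    return count


for n in (4, 5, 10, 13):
    print(n, num_unrooted_binary_trees(n))
# 4 3
# 5 15
# 10 2027025
# 13 13749310575
```

The count grows this way because a tree on n - 1 leaves has 2n - 5 edges, and the n-th leaf can be attached to any one of them.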