29 research outputs found

Fast Hash-Based Algorithms for Analyzing Large Collections of Evolutionary Trees

Author: Sul Seung Jin

Phylogenetic analysis can easily produce tens of thousands of equally plausible evolutionary trees. Consensus trees and topological distance matrices are often used to summarize the evolutionary relationships among the trees of interest. However, current approaches are not designed to analyze very large tree collections. In this dissertation, we present two fast algorithms, HashCS and HashRF, for analyzing large collections of evolutionary trees. Both are based on a novel hash table data structure, which provides a convenient and fast way to store and access the bipartition information collected from the tree collections. Our HashCS algorithm is a fast O(nt) technique for constructing consensus trees, where n is the number of taxa and t is the number of trees. By reprocessing the bipartition information in our hash table, HashCS constructs strict and majority consensus trees. In addition to a consensus algorithm, we design a fast topological distance algorithm called HashRF to compute the t × t Robinson-Foulds (RF) distance matrix, which requires O(nt^2) running time. An RF distance matrix provides plenty of data-mining opportunities to help researchers understand the evolutionary relationships contained in their collection of trees. We also introduce a series of extensions based on HashRF to provide researchers with a more convenient set of tools for analyzing their trees. We report extensive experiments on the practical performance of our hash-based algorithms across a diverse collection of biological and artificial trees. Our results show that both algorithms easily outperform existing consensus and RF matrix implementations. For example, on our biological trees, HashCS and HashRF are 1.8 and 100 times faster than PAUP*, respectively. We show two real-world applications of our fast hashing algorithms: (i) comparing phylogenetic heuristic implementations, and (ii) clustering and visualizing trees.
In our first application, we design novel methods to compare PaupRat and Rec-I-DCM3, two popular phylogenetic heuristics that use the Maximum Parsimony criterion, and show that RF distances are more effective than parsimony scores at identifying heterogeneity within a collection of trees. In our second application, we show empirically how to determine the distinct clusters of trees within large tree collections. We use two different techniques to identify distinct tree groups. Both techniques show that partitioning the trees into distinct groups and summarizing each group separately gives a better representation of the data. Additional benefits of our approach are better consensus trees as well as insight into the convergence behavior of phylogenetic heuristics. Our fast hash-based algorithms provide scientists with very powerful tools for analyzing the relationships within their large phylogenetic tree collections in new and exciting ways. Our work opens many opportunities for future research, including detecting convergence and designing better heuristics. Furthermore, our hash tables have many potential extensions. For example, we can also use our novel hashing structure to design algorithms for computing other distance metrics such as Nearest Neighbor Interchange (NNI), Subtree Pruning and Regrafting (SPR), and Tree Bisection and Reconnection (TBR) distances.
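The idea of storing bipartitions in a hash table and reading both consensus splits and RF distances from it can be illustrated with a minimal Python sketch. This is not the authors' HashCS or HashRF implementation: the nested-tuple tree representation, the function names, and the use of a plain Python dict as the hash table are all assumptions made for illustration, and the sketch makes no attempt to match the running times stated in the abstract. RF distance is taken here as the size of the symmetric difference of the two trees' non-trivial bipartition sets (some tools report half that value).

```python
from collections import defaultdict
from itertools import combinations


def bipartitions(tree, taxa):
    """Non-trivial bipartitions of a tree given as nested tuples (assumed format).

    Each bipartition is stored as the side that does NOT contain the
    reference taxon, so the same split always maps to the same key.
    """
    taxa = frozenset(taxa)
    ref = min(taxa)
    found = set()

    def leaves_below(node):
        if not isinstance(node, tuple):
            return frozenset([node])
        below = frozenset().union(*(leaves_below(child) for child in node))
        # Skip trivial splits (a single leaf on either side, or the root).
        if 1 < len(below) < len(taxa) - 1:
            found.add(taxa - below if ref in below else below)
        return below

    leaves_below(tree)
    return found


def build_table(trees, taxa):
    """Hash table: bipartition -> list of indices of the trees containing it."""
    table = defaultdict(list)
    for idx, tree in enumerate(trees):
        for split in bipartitions(tree, taxa):
            table[split].append(idx)
    return table


def consensus_splits(table, n_trees, strict=False):
    """Bipartitions found in all trees (strict) or in more than half (majority)."""
    if strict:
        return {s for s, ids in table.items() if len(ids) == n_trees}
    return {s for s, ids in table.items() if len(ids) > n_trees / 2}


def rf_matrix(table, n_trees):
    """All-pairs RF distances computed from the shared-bipartition counts."""
    counts = [0] * n_trees
    shared = [[0] * n_trees for _ in range(n_trees)]
    for ids in table.values():
        for i in ids:
            counts[i] += 1
        for i, j in combinations(ids, 2):
            shared[i][j] += 1
            shared[j][i] += 1
    return [
        [0 if i == j else counts[i] + counts[j] - 2 * shared[i][j]
         for j in range(n_trees)]
        for i in range(n_trees)
    ]


# Hypothetical example collection of three five-taxon trees.
TAXA = ["A", "B", "C", "D", "E"]
TREES = [
    (("A", "B"), ("C", ("D", "E"))),
    (("A", "C"), ("B", ("D", "E"))),
    (("A", "B"), ("D", ("C", "E"))),
]
TABLE = build_table(TREES, TAXA)
print(rf_matrix(TABLE, len(TREES)))
print(consensus_splits(TABLE, len(TREES)))
```

Because every tree pair that shares a bipartition appears together in one table entry, the distance matrix is filled in by walking the table once rather than by comparing trees pair by pair.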

Texas A&M Repository

Using tree diversity to compare phylogenetic heuristics

Author: Matthews Suzanne
Sul Seung-Jin
Williams Tiffani L
Publication venue: BioMed Central
Publication date: 01/01/2009

Crossref

Springer - Publisher Connector

PubMed Central

Texas A&M Repository

A randomized algorithm for comparing sets of phylogenetic trees

Author: Seung-Jin Sul
Tiffani L. Williams
Publication date: 01/01/2007

Phylogenetic analysis often produces a large number of candidate evolutionary trees, each a hypothesis of the "true" tree. Post-processing techniques such as strict consensus trees are widely used to summarize the evolutionary relationships into a single tree. However, valuable information is lost during the summarization process. A more elementary step is to produce estimates of the topological differences that exist among all pairs of trees. We design a new randomized algorithm, called Hash-RF, that computes the all-to-all Robinson-Foulds (RF) distance, the most common distance metric for comparing two phylogenetic trees. Our approach uses a hash table to organize the bipartitions of a tree, and a universal hashing function makes our algorithm randomized. We compare the performance of our Hash-RF algorithm to PAUP*’s implementation of computing the all-to-all RF distance matrix. Our experiments focus on the algorithmic performance of comparing sets of biological trees, where the size of each tree ranged from 500 to 2,000 taxa and the collection of trees varied from 200 to 1,000 trees. Our experimental results clearly show that our Hash-RF algorithm is up to 500 times faster than PAUP*’s approach. Thus, Hash-RF provides an efficient alternative to a single tree summary of a collection of trees and potentially gives researchers the ability to explore their data in new and interesting ways.
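To make the role of the randomized hash function concrete, here is a small Python sketch of one common universal-style hash over bipartition bitstrings: a random linear combination of the bits, reduced modulo a table size. The abstract does not specify the hash function, so the construction, the name `make_universal_hash`, and its parameters are assumptions for illustration rather than the paper's definition. The modulus is assumed to be a prime chosen by the caller.

```python
import random


def make_universal_hash(n_taxa, modulus, seed=None):
    """Return h(bits) = (sum of a_i over positions where bits[i] == 1) mod modulus.

    The coefficients a_i are drawn at random once, which is what makes the
    scheme randomized: a different draw gives a different hash function.
    """
    rng = random.Random(seed)
    coeffs = [rng.randrange(modulus) for _ in range(n_taxa)]

    def h(bits):
        # bits is a sequence of 0/1 flags, one per taxon, marking one side
        # of the bipartition.
        return sum(a for a, b in zip(coeffs, bits) if b) % modulus

    return h


# Hypothetical usage: hash the split {D, E} over taxa A..E into 101 slots.
h = make_universal_hash(n_taxa=5, modulus=101, seed=1)
slot = h((0, 0, 0, 1, 1))
```

Two distinct bipartitions can still land in the same slot, so a table built this way needs some collision handling, for example storing a second, independent hash value alongside each entry.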

CiteSeerX

A New Support Measure to Quantify the Impact of Local Optima in Phylogenetic Analyses

Author: Grant Brammer
Seung-jin Sul
Tiffani L. Williams
Publication date: 01/01/2011

CiteSeerX

Directory of Open Access Journals

Texas A&M Repository

PubMed Central

Rapidly Growing, High-Risk Gastrointestinal Stromal Tumor of the Stomach: A Case Report

Author: Dong Soo Lee
Hae Joung Sul
Han Mo Yoo
Seung-Woo Lee
Sung Jin Lim
Publication venue: Yong Chan Lee
Publication date: 01/12/2023

A recent increase in the volume of endoscopic procedures has led to higher detection rates of asymptomatic gastrointestinal subepithelial tumors. However, accurate preoperative diagnosis and risk assessment of these tumors is challenging. A 70-year-old male patient visited the emergency department for evaluation of melena. Emergency endoscopy revealed an ulcerated subepithelial tumor (8 cm in size) in the gastric cardia and fundus. Computed tomography and upper endoscopy performed at another hospital 6 months earlier were reviewed; the mass showed a significant increase in size (from 2 cm to 8 cm). The patient underwent surgical resection of the mass and was diagnosed with a high-risk gastrointestinal stromal tumor (GIST). In this article, we describe a rare case of a GIST that grew at a rate significantly greater than commonly reported rates.

Directory of Open Access Journals

Formation of an Anti-Contamination Layer with Polystyrene Nanobeads over Cover Glass for Solar Cells

Author: Min Jung Kim
No-Kuk Park
Seung Hun Lee
Seung Hyun Lee
Tae Jin Lee
Xi J.-Q.
Yong Sul Kim
Publication venue: Informa UK Limited

Crossref