
37 research outputs found

Enumerating All Maximal Frequent Subtrees

Author: Deepak Akshay
Fernández-Baca David
Publication venue: Iowa State University Digital Repository
Publication date: 01/01/2012

Given a collection of leaf-labeled trees on a common leafset and a fraction f in (1/2, 1], a frequent subtree (FST) is a subtree isomorphically included in at least a fraction f of the input trees. The well-known maximum agreement subtree (MAST) problem identifies the FST with f = 1 that has the largest number of leaves. Apart from its intrinsic interest from the algorithmic perspective, MAST has practical applications as a metric for tree similarity, for computing tree congruence, in detecting horizontal gene transfer events, and as a consensus approach. Enumerating FSTs extends the MAST problem by definition and reveals additional subtrees not displayed by MAST. This can happen in two ways: such a subtree is included in a majority but not all of the input trees, or such a subtree, though included in all the input trees, does not have the maximum number of leaves. Further, FSTs can be enumerated on collections of trees having partially overlapping leafsets. MAST may not be useful here, especially if the common overlap among leafsets is very low. Though very useful, FSTs suffer from a combinatorial explosion in number, which motivates enumerating only maximal frequent subtrees (MFSTs). An MFST is an FST that is not a subtree of any other FST. The set of MFSTs is a compact, non-redundant summary of all FSTs and is much smaller in size. Here we tackle the novel problem of enumerating all MFSTs in collections of phylogenetic trees. We demonstrate its utility in returning larger consensus trees in comparison to MAST. The current implementation is available on the web.
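The FST definition above can be made concrete with a small sketch. It is not the authors' implementation: it assumes rooted trees encoded as nested tuples of string labels, and the names `leaf_set`, `restrict`, `frequent_topology`, and `EXAMPLE_TREES` are hypothetical. For a chosen set of leaves, it restricts every input tree to those leaves and reports the induced topology if at least a fraction f of the trees agree on it.

```python
from collections import Counter

def leaf_set(tree):
    """Set of leaf labels of a rooted tree given as nested tuples of strings."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))

def restrict(tree, keep):
    """Restrict `tree` to the leaves in `keep`.

    Returns a canonical form (children sorted), or None if no leaf survives.
    """
    if isinstance(tree, str):
        return tree if tree in keep else None
    kids = [r for r in (restrict(child, keep) for child in tree) if r is not None]
    if not kids:
        return None
    if len(kids) == 1:  # suppress a node left with a single child
        return kids[0]
    return tuple(sorted(kids, key=repr))

def frequent_topology(trees, leaves, f):
    """Topology on `leaves` induced by at least fraction f of `trees`, else None."""
    keep = frozenset(leaves)
    counts = Counter()
    for tree in trees:
        induced = restrict(tree, keep)
        # Count a tree only if it contains every requested leaf.
        if induced is not None and leaf_set(induced) == keep:
            counts[induced] += 1
    if not counts:
        return None
    topology, support = counts.most_common(1)[0]
    return topology if support >= f * len(trees) else None

# Illustrative input: three rooted trees on the leaves A, B, C, D.
EXAMPLE_TREES = [
    (("A", "B"), ("C", "D")),
    ((("A", "B"), "C"), "D"),
    (("A", "C"), ("B", "D")),
]
```

With f = 0.6, two of the three example trees agree on the leaves A, B, C, so that subtree counts as frequent. With f = 1 it does not, which is the sense in which FSTs can reveal structure that MAST misses.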

Digital Repository @ Iowa State University (ISU)

EvoMiner: Frequent Subtree Mining in Phylogenetic Databases

Author: Deepak Akshay
Fernández-Baca David
McMahon Michelle M.
Sanderson Michael J.
Tirthapura Srikanta
Publication venue: Iowa State University Digital Repository
Publication date: 01/01/2013

The problem of mining collections of trees to identify common patterns, called frequent subtrees (FSTs), arises often when trying to interpret the results of phylogenetic analysis. FST mining generalizes the well-known maximum agreement subtree problem. Here we present EvoMiner, a new algorithm for mining frequent subtrees in collections of phylogenetic trees. EvoMiner is an Apriori-like level-wise method, which uses a novel phylogeny-specific constant-time candidate generation scheme, an efficient fingerprinting-based technique for downward closure, and a lowest-common-ancestor-based support counting step that requires neither costly subtree operations nor database traversal. Our algorithm achieves speed-ups of up to 100 times or more over Phylominer, the current state-of-the-art algorithm for mining phylogenetic trees. EvoMiner can also work in depth-first enumeration mode, to use less memory at the expense of speed. We demonstrate the utility of FST mining as a way to extract meaningful phylogenetic information from collections of trees when compared to maximum agreement subtrees and majority rule trees, two commonly used approaches in phylogenetic analysis for extracting consensus information from a collection of trees over a common leaf set.
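To illustrate what lowest-common-ancestor-based support counting can look like, here is a minimal sketch for the smallest informative pattern, a rooted triple ab|c. It is an assumption-laden toy and not EvoMiner's actual scheme: trees are encoded as child-to-parent dictionaries, LCAs are found by naive upward walks rather than constant-time queries, and the function names are hypothetical.

```python
def depth(parent, v):
    """Number of edges from node v up to the root."""
    d = 0
    while parent[v] is not None:
        v = parent[v]
        d += 1
    return d

def lca(parent, u, v):
    """Lowest common ancestor of u and v, found by walking up from both nodes."""
    ancestors = set()
    while u is not None:
        ancestors.add(u)
        u = parent[u]
    while v not in ancestors:
        v = parent[v]
    return v

def displays_triple(parent, a, b, c):
    """True if the tree displays ab|c, i.e. LCA(a, b) lies strictly below LCA(a, c)."""
    return depth(parent, lca(parent, a, b)) > depth(parent, lca(parent, a, c))

def triple_support(trees, a, b, c):
    """Fraction of the trees that display the rooted triple ab|c."""
    return sum(displays_triple(t, a, b, c) for t in trees) / len(trees)

# Illustrative input: child -> parent maps, with the root mapped to None.
T1 = {"r": None, "x": "r", "y": "r", "A": "x", "B": "x", "C": "y", "D": "y"}
T2 = {"r": None, "u": "r", "D": "r", "v": "u", "C": "u", "A": "v", "B": "v"}
T3 = {"r": None, "x": "r", "y": "r", "A": "x", "C": "x", "B": "y", "D": "y"}
```

Because both LCAs are ancestors of the same leaf, they lie on one root-to-leaf path, so comparing depths is enough to decide which is lower. No subtree is extracted or compared, which is the property the abstract highlights.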

Digital Repository @ Iowa State University (ISU)

EvoMiner: Frequent Subtree Mining in Phylogenetic Databases

Author: Deepak Akshay
Fernández-Baca David
McMahon Michelle M.
Sanderson Michael J.
Tirthapura Srikanta
Publication venue: Iowa State University Digital Repository
Publication date: 01/01/2011

The problem of mining collections of trees to identify common patterns, called frequent subtrees (FSTs), arises often when trying to make sense of the results of phylogenetic analysis. FST mining generalizes the well-known maximum agreement subtree problem. Here we present EvoMiner, a new algorithm for mining frequent subtrees in collections of phylogenetic trees. EvoMiner is an Apriori-like level-wise method, which uses a novel phylogeny-specific constant-time candidate generation scheme, an efficient fingerprinting-based technique for the downward closure operation, and a lowest-common-ancestor-based support counting step that requires neither costly subtree operations nor database traversal. As a result of these techniques, our algorithm achieves speed-ups of up to 100 times or more over Phylominer, another algorithm for mining phylogenetic trees. EvoMiner can also work in vertical mining mode, to use less memory at the expense of speed.

Digital Repository @ Iowa State University (ISU)

EvoMiner: frequent subtree mining in phylogenetic databases

Author: Akshay Deepak
David Fernández-Baca
Michael J. Sanderson
Michelle M. McMahon
Srikanta Tirthapura
Publication venue: Springer Science and Business Media LLC

Crossref

A Study on the Enumeration of Acyclic Substructures in Graphs and Hypergraphs

Author: 和佐州洋
Publication date: 24/03/2016

Hokkaido University Collection of Scholarly and Academic Papers

Consistency Properties of Species Tree Inference by Minimizing Deep Coalescences

Author: Rosenberg Noah A.
Than Cuong V.
Publication venue: Mary Ann Liebert Inc
Publication date: 01/01/2011

Methods for inferring species trees from sets of gene trees need to account for the possibility of discordance among the gene trees. Assuming that discordance is caused by incomplete lineage sorting, species tree estimates can be obtained by finding those species trees that minimize the number of "deep" coalescence events required for a given collection of gene trees. Efficient algorithms now exist for applying the minimizing-deep-coalescence (MDC) criterion, and simulation experiments have demonstrated its promising performance. However, it has also been noted from simulation results that the MDC criterion is not guaranteed to infer the correct species tree. In this article, we investigate the consistency of the MDC criterion. Using the multispecies coalescent model, we show that there are indeed anomaly zones for the MDC criterion for asymmetric four-taxon species tree topologies, and for all species tree topologies with five or more taxa. Peer Reviewed. http://deepblue.lib.umich.edu/bitstream/2027.42/90434/1/cmb-2E2010-2E0102.pd
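As a rough illustration of the MDC criterion, the sketch below scores a candidate species tree by counting extra lineages against each gene tree. It assumes rooted trees as nested tuples, one sampled allele per species, and gene trees on the same leaf set as the species tree. The function names are hypothetical, and this is a toy scorer rather than the efficient algorithms the abstract refers to.

```python
def leaf_set(tree):
    """Set of leaf labels of a rooted tree given as nested tuples of strings."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))

def clusters(tree, is_root=True):
    """Leaf sets below every non-root internal node (one per internal species-tree branch)."""
    if isinstance(tree, str):
        return []
    out = [] if is_root else [leaf_set(tree)]
    for child in tree:
        out.extend(clusters(child, False))
    return out

def maximal_clades_within(gene, cluster):
    """Number of maximal gene-tree subtrees whose leaves all lie in `cluster`."""
    if leaf_set(gene) <= cluster:
        return 1
    if isinstance(gene, str):
        return 0
    return sum(maximal_clades_within(child, cluster) for child in gene)

def extra_lineages(species, gene):
    """Extra lineages: on each species-tree branch, lineages leaving it minus one."""
    return sum(maximal_clades_within(gene, c) - 1 for c in clusters(species))

def mdc_score(species, gene_trees):
    """Total number of extra lineages over a collection of gene trees."""
    return sum(extra_lineages(species, g) for g in gene_trees)

def best_species_tree(candidates, gene_trees):
    """Candidate species tree with the smallest MDC score."""
    return min(candidates, key=lambda s: mdc_score(s, gene_trees))
```

For example, the gene tree ((A,C),B) needs one extra lineage on the species tree ((A,B),C), because the lineages of A and B cannot coalesce below the root. The paper's point is that picking the minimizer in this way can still converge on the wrong species tree in certain regions of branch-length space.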

Deep Blue Documents at the University of Michigan

GiRaF: robust, computational identification of influenza reassortments via graph mining

Author: Carl Kingsford
Niranjan Nagarajan
Publication venue: Oxford University Press

Reassortments in the influenza virus, a process where strains exchange genetic segments, have been implicated in two out of three pandemics of the 20th century as well as the 2009 H1N1 outbreak. While advances in sequencing have led to an explosion in the number of whole-genome sequences that are available, an understanding of the rate and distribution of reassortments and their role in viral evolution is still lacking. An important factor in this is the paucity of automated tools for confident identification of reassortments from sequence data due to the challenges of analyzing large, uncertain viral phylogenies. We describe here a novel computational method, called GiRaF (Graph-incompatibility-based Reassortment Finder), that robustly identifies reassortments in a fully automated fashion while accounting for uncertainties in the inferred phylogenies. The algorithms behind GiRaF search large collections of Markov chain Monte Carlo (MCMC)-sampled trees for groups of incompatible splits using a fast biclique enumeration algorithm coupled with several statistical tests to identify sets of taxa with differential phylogenetic placement. GiRaF correctly finds known reassortments in human, avian, and swine influenza populations, including the evolutionary events that led to the recent ‘swine flu’ outbreak. GiRaF also identifies several previously unreported reassortments via whole-genome studies to catalog events in H5N1 and swine influenza isolates.
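The notion of incompatible splits mentioned above has a standard set-based test, sketched here. Each split is represented by one of its two sides, and two splits of the same taxon set are compatible exactly when at least one of the four pairwise intersections of their sides is empty. The helper names are hypothetical. GiRaF's actual pipeline (MCMC tree samples, biclique enumeration, statistical tests) is not reproduced.

```python
def compatible(split1, split2, taxa):
    """True if two splits, each given by one side, can coexist in a single tree."""
    taxa = frozenset(taxa)
    a, c = frozenset(split1), frozenset(split2)
    b, d = taxa - a, taxa - c
    # Compatible iff one of A&C, A&D, B&C, B&D is empty.
    return any(not (x & y) for x in (a, b) for y in (c, d))

def incompatibility_edges(splits1, splits2, taxa):
    """Index pairs (i, j) where split i of the first list conflicts with split j of the second."""
    return [
        (i, j)
        for i, s in enumerate(splits1)
        for j, t in enumerate(splits2)
        if not compatible(s, t, taxa)
    ]
```

If the two lists hold splits from trees of two different genome segments, the resulting edges form a bipartite incompatibility graph. Dense subgraphs of such a graph are the kind of structure a biclique search would look for.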

Crossref

PubMed Central

On finding bicliques in bipartite graphs: a novel algorithm and its application to the integration of diverse biological data types

Author: Charles A Phillips
Elissa J Chesler
Erich J Baker
Gary L Rogers
Michael A Langston
Yun Zhang
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2014

BACKGROUND: Integrating and analyzing heterogeneous genome-scale data is a huge algorithmic challenge for modern systems biology. Bipartite graphs can be useful for representing relationships across pairs of disparate data types, with the interpretation of these relationships accomplished through an enumeration of maximal bicliques. Most previously known techniques are generally ill-suited to this foundational task, because they are relatively inefficient and do not scale effectively. In this paper, a powerful new algorithm is described that produces all maximal bicliques in a bipartite graph. Unlike most previous approaches, the new method neither places undue restrictions on its input nor inflates the problem size. Efficiency is achieved through an innovative exploitation of bipartite graph structure, and through computational reductions that rapidly eliminate non-maximal candidates from the search space. An iterative selection of vertices for consideration based on non-decreasing common neighborhood sizes boosts efficiency and leads to more balanced recursion trees.

RESULTS: The new technique is implemented and compared to previously published approaches from graph theory and data mining. Formal time and space bounds are derived. Experiments are performed on both random graphs and graphs constructed from functional genomics data. It is shown that the new method substantially outperforms the best previous alternatives.

CONCLUSIONS: The new method is streamlined, efficient, and particularly well-suited to the study of huge and diverse biological data. A robust implementation has been incorporated into GeneWeaver, an online tool for integrating and analyzing functional genomics experiments, available at http://geneweaver.org. The enormous increase in scalability it provides empowers users to study complex and previously unassailable gene-set associations between genes and their biological functions in a hierarchical fashion and on a genome-wide scale. This practical computational resource is adaptable to almost any application environment in which bipartite graphs can be used to model relationships between pairs of heterogeneous entities.

BMC Bioinformatics 2014; 15(1):110
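To make the object being enumerated concrete, the following brute-force sketch lists every maximal biclique of a small bipartite graph. It is emphatically not the paper's algorithm: it tries all subsets of one side and is exponential in that side's size. The input encoding (a dictionary from left vertices to sets of right neighbors) and the function name are assumptions for illustration.

```python
from itertools import combinations

def maximal_bicliques(adj):
    """All maximal bicliques (left_set, right_set) with both sides non-empty.

    `adj` maps each left vertex to the set of its right-side neighbors.
    """
    left = sorted(adj)
    found = set()
    for k in range(1, len(left) + 1):
        for subset in combinations(left, k):
            # Right vertices adjacent to every chosen left vertex.
            common = frozenset.intersection(*(frozenset(adj[u]) for u in subset))
            if not common:
                continue
            # Close the left side: every left vertex adjacent to all of `common`.
            closed = frozenset(u for u in left if common <= frozenset(adj[u]))
            found.add((closed, common))
    return found
```

Each reported pair is maximal because the right side is the full common neighborhood of the left side and vice versa, so neither side can be extended. The cost of trying all subsets is what practical algorithms such as the one described above are designed to avoid.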

Crossref

The Jackson Laboratory: The Mouseion at the JAXlibrary

PubMed Central