Managing and analyzing phylogenetic databases
The ever-growing availability of phylogenomic data makes it increasingly possible to study and analyze phylogenetic relationships across a wide range of species. Indeed, current phylogenetic analyses now produce enormous collections of trees that vary greatly in size. Our proposed research addresses the challenges posed by storing, querying, and analyzing such phylogenetic databases.
Our first contribution is the further development of STBase, a phylogenetic tree database consisting of a billion trees whose leaf sets range in size from four to 20,000. STBase applies techniques from different areas of computer science for efficient tree storage and retrieval. It also introduces new ideas that are specific to tree databases.
STBase provides a unique opportunity to explore innovative ways to analyze the results of queries on large sets of phylogenetic trees. We propose new ways of extracting consensus information from a collection of phylogenetic trees. Specifically, this involves extending the maximum agreement subtree problem. We greatly improve upon an existing approach based on frequent subtrees and propose two new approaches, based on agreement subtrees and frequent subtrees, respectively.
The final part of our proposed work deals with the problem of simplifying multi-labeled trees and handling rogue taxa. We propose a novel technique to extract conflict-free information from multi-labeled trees as a much smaller single-labeled tree. We show that the problem inherent in identifying rogue taxa is NP-hard and give fixed-parameter tractable and integer linear programming solutions.
Bounds on graviton mass using weak lensing and SZ effect in galaxy clusters
In General Relativity (GR), the graviton is massless. However, a common
feature in several theoretical alternatives of GR is a non-zero mass for the
graviton. These theories can be described as massive gravity theories. Despite
many theoretical complexities in these theories, on phenomenological grounds,
the implications of massive gravity have been widely used to put bounds on
graviton mass. One of the generic implications of giving a mass to the graviton
is that the gravitational potential will follow a Yukawa-like fall off. We use
this feature of massive gravity theories to probe the mass of graviton by using
the largest gravitationally bound objects, namely galaxy clusters. In this
work, we use the mass estimates of galaxy clusters measured at various
cosmologically defined radial distances measured via weak lensing (WL) and
Sunyaev-Zel'dovich (SZ) effect. We also use the model independent values of
Hubble parameter smoothed by a non-parametric method, Gaussian process.
Within confidence region, we obtain the mass of graviton eV with the corresponding Compton length scale Mpc from weak lensing and eV with Mpc from SZ effect. This analysis improves the upper bound on graviton
mass obtained earlier from galaxy clusters.
Comment: 9 Pages, 3 Figures, 2 Tables, Accepted for publication in Physics
Letters
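The Yukawa-like fall-off mentioned in the abstract can be written out as a short worked equation. This is a sketch of the standard textbook form, not necessarily the exact parametrization used in the paper; the symbols $M$ (enclosed cluster mass), $r$ (radial distance), and $\lambda_g$ (Compton length scale of the graviton) are notation assumed here, and some authors define $\lambda_g$ with $\hbar$ instead of $h$.

```latex
% Yukawa-type gravitational potential for a graviton of mass m_g
V(r) = -\frac{G M}{r}\, e^{-r/\lambda_g},
\qquad
\lambda_g = \frac{h}{m_g c}

% Corresponding acceleration, a(r) = -dV/dr
a(r) = -\frac{G M}{r^{2}} \left( 1 + \frac{r}{\lambda_g} \right) e^{-r/\lambda_g}

% Massless limit: m_g \to 0 implies \lambda_g \to \infty, recovering Newtonian gravity
\lim_{\lambda_g \to \infty} a(r) = -\frac{G M}{r^{2}}
```

Under this form, an upper bound on the graviton mass is equivalent to a lower bound on $\lambda_g$, which is why cluster-scale measurements at large radii are informative.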
Extracting Conflict-free Information from Multi-labeled Trees
A multi-labeled tree, or MUL-tree, is a phylogenetic tree where two or more
leaves share a label, e.g., a species name. A MUL-tree can imply multiple
conflicting phylogenetic relationships for the same set of taxa, but can also
contain conflict-free information that is of interest and yet is not obvious.
We define the information content of a MUL-tree T as the set of all
conflict-free quartet topologies implied by T, and define the maximal reduced
form of T as the smallest tree that can be obtained from T by pruning leaves
and contracting edges while retaining the same information content. We show
that any two MUL-trees with the same information content exhibit the same
reduced form. This introduces an equivalence relation in MUL-trees with
potential applications to comparing MUL-trees. We present an efficient
algorithm to reduce a MUL-tree to its maximally reduced form and evaluate its
performance on empirical datasets in terms of both quality of the reduced tree
and the degree of data reduction achieved.
Comment: Submitted in Workshop on Algorithms in Bioinformatics 2012
(http://algo12.fri.uni-lj.si/?file=wabi
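The notion of conflict-free quartet topologies can be illustrated with a small brute-force sketch. It rests on several assumptions that are not taken from the paper: trees are written as nested Python tuples with string leaf labels, the tree is treated as unrooted, and a quartet on four labels counts as conflict-free when every choice of leaf instances that resolves it yields the same topology. The paper's formal definition may differ, and the function names `index_tree`, `quartet_topologies`, and `conflict_free` are hypothetical.

```python
from itertools import combinations, product


def index_tree(tree):
    """Number the leaf instances and collect one leaf-index set per internal node."""
    labels = []    # labels[i] is the label carried by leaf instance i
    clusters = []  # frozensets of leaf indices found below each internal node

    def visit(node):
        if not isinstance(node, tuple):
            labels.append(node)
            return frozenset([len(labels) - 1])
        below = frozenset().union(*(visit(child) for child in node))
        clusters.append(below)
        return below

    visit(tree)
    return labels, clusters


def quartet_topologies(tree):
    """Map each set of four labels to the quartet topologies implied by the tree."""
    labels, clusters = index_tree(tree)
    instances = {}
    for i, label in enumerate(labels):
        instances.setdefault(label, []).append(i)

    implied = {}
    for four in combinations(sorted(instances), 4):
        seen = set()
        # Try every way of choosing one leaf instance per label.
        for pick in product(*(instances[label] for label in four)):
            chosen = set(pick)
            for cluster in clusters:
                inside = cluster & chosen
                if len(inside) == 2:  # this edge separates the four leaves 2 vs 2
                    side = frozenset(labels[i] for i in inside)
                    seen.add(frozenset([side, frozenset(four) - side]))
        implied[four] = seen
    return implied


def conflict_free(tree):
    """Keep only the quartets for which exactly one topology is implied."""
    return {
        four: next(iter(topologies))
        for four, topologies in quartet_topologies(tree).items()
        if len(topologies) == 1
    }


# Label "a" appears twice, but both copies agree on ab|cd.
consistent = ((("a", "b"), ("c", "d")), "a")
# Extra copies of "a" and "c" grouped together also imply ac|bd, a conflict.
conflicting = ((("a", "b"), ("c", "d")), ("a", "c"))

print(conflict_free(consistent))
print(conflict_free(conflicting))
```

The enumeration is exponential in the number of duplicated labels and is meant only to make the definition concrete; it is not the efficient reduction algorithm the abstract describes.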
Enumerating All Maximal Frequent Subtrees
Given a collection of leaf-labeled trees on a common leaf set and a fraction f in (1/2, 1], a frequent subtree (FST) is a subtree isomorphically included in at least a fraction f of the input trees. The well-known maximum agreement subtree (MAST) problem asks for an FST with f = 1 that has the largest number of leaves. Apart from its intrinsic algorithmic interest, MAST has practical applications as a metric for tree similarity, for computing tree congruence, in detecting horizontal gene transfer events, and as a consensus approach. Enumerating FSTs extends the MAST problem by definition and reveals additional subtrees not displayed by MAST. This can happen in two ways: such a subtree is included in a majority but not all of the input trees, or such a subtree, though included in all the input trees, does not have the maximum number of leaves. Further, FSTs can be enumerated on collections of trees having partially overlapping leaf sets. MAST may not be useful here, especially if the common overlap among leaf sets is very low. Though very useful, the number of FSTs suffers from combinatorial explosion, which motivates the enumeration of maximal frequent subtrees (MFSTs). An MFST is an FST that is not a subtree of any other FST. The set of MFSTs is a compact, non-redundant summary of all FSTs and is much smaller in size. Here we tackle the novel problem of enumerating all MFSTs in collections of phylogenetic trees. We demonstrate its utility in returning larger consensus trees in comparison to MAST. The current implementation is available on the web
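The FST definition can be made concrete with a minimal sketch that checks how often a candidate subtree occurs in a collection. It assumes rooted trees written as nested Python tuples with unique string labels, and it tests inclusion by comparing clusters (the leaf sets below internal nodes) after restricting each input tree to the candidate's leaves. The names `clusters`, `restricted_clusters`, `frequency`, and `is_frequent` are hypothetical and not taken from the authors' implementation; the sketch checks a given candidate only and does not enumerate MFSTs.

```python
def clusters(tree):
    """Return (leaf set, set of clusters) for a rooted tree given as nested tuples."""
    if not isinstance(tree, tuple):
        return frozenset([tree]), set()
    leaves = frozenset()
    found = set()
    for child in tree:
        child_leaves, child_clusters = clusters(child)
        leaves |= child_leaves
        found |= child_clusters
    found.add(leaves)  # the cluster of this internal node
    return leaves, found


def restricted_clusters(tree, keep):
    """Clusters of the tree after pruning every leaf that is not in keep."""
    keep = frozenset(keep)
    _, all_clusters = clusters(tree)
    return {c & keep for c in all_clusters if len(c & keep) >= 2}


def frequency(candidate, trees):
    """Fraction of input trees that include the candidate as a subtree."""
    leaves, wanted = clusters(candidate)
    hits = 0
    for tree in trees:
        tree_leaves, _ = clusters(tree)
        if leaves <= tree_leaves and restricted_clusters(tree, leaves) == wanted:
            hits += 1
    return hits / len(trees)


def is_frequent(candidate, trees, f):
    """True if the candidate occurs in at least a fraction f of the trees."""
    return frequency(candidate, trees) >= f


trees = [
    ((("a", "b"), "c"), ("d", "e")),
    ((("a", "b"), "c"), ("e", "d")),
    ((("a", "c"), "b"), ("d", "e")),
]

# Included in two of the three trees: frequent for f = 0.6, but not an agreement subtree.
print(is_frequent((("a", "b"), "c"), trees, 0.6))
print(is_frequent((("a", "b"), "c"), trees, 1.0))
# Included in all three trees, so it is also an agreement subtree (f = 1).
print(is_frequent((("a", "b"), ("d", "e")), trees, 1.0))
```

The first candidate illustrates the case described above where a subtree appears in a majority but not all of the input trees, so it would be missed by an approach restricted to f = 1.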