
Fast Structural Search in Phylogenetic Databases

By Jason T. L. Wang, Huiyuan Shan, Dennis Shasha and William H. Piel


As phylogenetic databases grow, so does the need to search them efficiently. Thanks to previous and ongoing research, searching by attribute value and by text has become commonplace in these databases. However, searching by topological or physical structure, especially for large databases and especially for approximate matches, is still an art. We propose structural search techniques that, given a query or pattern tree P and a database of phylogenies D, find trees in D that are sufficiently close to P. The "closeness" is a measure of the topological relationships in P that are found to be the same or similar in a tree drawn from the database D. We develop a filtering technique that accelerates searches and present algorithms for rooted and unrooted trees, where the trees can be weighted or unweighted. Experiments that compare the similarity measure with existing tree metrics and evaluate the efficiency of the search techniques indicate that the proposed approach is promising.
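The search problem described in the abstract can be illustrated with a minimal sketch. The abstract does not define the paper's similarity measure or its filter, so the code below substitutes a simple stand-in: the fraction of the query's clades (leaf clusters) that also appear in a database tree once that tree is restricted to the query's taxa, with a cheap leaf-set check as the filter. All names (`leaves`, `clusters`, `similarity`, `search`), the nested-tuple tree encoding, and the threshold are assumptions made for illustration; the sketch handles rooted, unweighted trees only.

```python
from typing import Dict, FrozenSet, List, Set, Tuple, Union

# Hypothetical encoding: a leaf is a taxon name, an internal node is a tuple of children.
Tree = Union[str, tuple]


def leaves(tree: Tree) -> FrozenSet[str]:
    """Return the set of taxon names at the leaves of a rooted tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def clusters(tree: Tree, keep: FrozenSet[str]) -> Set[FrozenSet[str]]:
    """Collect the nontrivial clusters of `tree` restricted to the taxa in `keep`.

    A cluster is the set of kept taxa below an internal node. Clusters with
    fewer than two taxa, or containing all of `keep`, carry no information
    and are skipped.
    """
    found: Set[FrozenSet[str]] = set()

    def visit(node: Tree) -> FrozenSet[str]:
        if isinstance(node, str):
            return frozenset([node]) & keep
        below = frozenset().union(*(visit(child) for child in node))
        if 2 <= len(below) < len(keep):
            found.add(below)
        return below

    visit(tree)
    return found


def similarity(query: Tree, tree: Tree) -> float:
    """Stand-in measure: share of the query's clusters also present in `tree`."""
    taxa = leaves(query)
    wanted = clusters(query, taxa)
    if not wanted:
        return 0.0
    return len(wanted & clusters(tree, taxa)) / len(wanted)


def search(query: Tree, database: Dict[str, Tree], threshold: float) -> List[Tuple[str, float]]:
    """Return (name, score) for database trees scoring at least `threshold`."""
    taxa = leaves(query)
    hits = []
    for name, tree in database.items():
        # Stand-in filter: skip trees that lack any of the query's taxa
        # before paying for the full structural comparison.
        if not taxa <= leaves(tree):
            continue
        score = similarity(query, tree)
        if score >= threshold:
            hits.append((name, score))
    return sorted(hits, key=lambda hit: -hit[1])


# Toy data, invented for this example.
P = (("a", "b"), "c")
D = {
    "t1": ((("a", "b"), "d"), ("c", "e")),  # keeps a and b together
    "t2": (("a", "c"), ("b", "d")),         # groups a with c instead
    "t3": (("a", "b"), "x"),                # lacks taxon c, so it is filtered out
}
results = search(P, D, threshold=0.5)
```

On this toy data only `t1` passes the threshold. Ordering the filter before the comparison reflects the general idea of pruning candidates cheaply; the filter and measure actually used in the paper may differ substantially.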

Topics: Original Research
Publisher: Libertas Academica
Provided by: PubMed Central

