2 research outputs found

    An approximate search engine for structure

    Get PDF
    As the size of structural databases grows, the need for efficiently searching these databases arises. Thanks to previous and ongoing research, searching by attribute-value and by text has become commonplace in these databases. However, searching by topological or physical structure, especially for large databases and especially for approximate matches, is still an art. In this dissertation, efficient search techniques are presented for retrieving trees from a database that are similar to a given query tree. Rooted ordered labeled trees, rooted unordered labeled trees and free trees are considered. Ordered labeled trees are trees in which each node has a label and the left-to-right order among siblings matters. Unordered labeled trees are trees in which the parent-child relationship is significant, but the order among siblings is unimportant. Free trees (unrooted unordered trees) are acyclic graphs. These trees find many applications in bioinformatics, Web log analysis, phyloinformatics, XML processing, etc. Two types of similarity measures are investigated: (i) counting the mismatching paths in the query tree and a data tree, and (ii) measuring the topological relationship between the trees. The proposed approaches include storing the paths of trees in a suffix array, employing hashing techniques to speed up retrieval, and counting the number of up-down operations to move a token from one node to another node in a tree. Various filters for accelerating a search, different strategies for parallelizing these search algorithms and applications of these algorithms to XML and phylogenetic data management are discussed. The proposed techniques have been implemented into a phylogenetic search engine which is fully operational and is available on the World Wide Web. Experimental results on comparing the similarity measures with existing tree metrics and on evaluating the efficiency of the search techniques demonstrate the effectiveness of the search engine. Future work includes extending the techniques to other structural data, as well as developing new filters and algorithms for speeding up searching and mining in complex structures

    Optimal Evolutionary Tree Comparison by Sparse Dynamic Programming (Extended Abstract)

    No full text
    ) Martin Farach 3 Mikkel Thorup y Department of Computer Science Department of Computer Science Rutgers University University of Copenhagen Piscataway, NJ 08855 2100 København Ø USA Denmark Abstract In computational biology one is often interested in finding the concensus between different evolutionary trees for the same set of species. A popular formalizations is the Maximum Agreement Subtree Problem (MAST) defined as follows: given a set A and two rooted trees T 0 and T 1 leaf-labeled by the elements of A, find a maximum cardinality subset B of A such that the restrictions of T 0 and T 1 to B are topologically isomorphic. Polynomial time solutions exist, but they rely on a dynamic program with 2(n 2 ) nodes---and 2(n 2 ) running time. We sparsify this dynamic program and show that MAST is equivalent to Unary Weighted Bipartite Matching (UWBM) modulo an O(nc p log n ) additive overhead. Applying the best bound for UWBM, we get an O(n 1:5 log n) algorithm for MAST. From ..
    corecore