105 research outputs found

A clique-based method for the edit distance between unordered trees and its application to analysis of glycan structures

Author: Akutsu Tatsuya
Fukagawa Daiji
Takasu Atsuhiro
Tamura Takeyuki
Tomita Etsuji
Publication venue: BioMed Central
Publication date: 01/01/2011

[Background] Measuring similarities between tree-structured data is important for the analysis of RNA secondary structures, phylogenetic trees, glycan structures, and vascular trees. The edit distance is one of the most widely used measures for comparing tree-structured data. However, computing the edit distance for rooted unordered trees is known to be NP-hard. Furthermore, almost no software tools are available that can compute the exact edit distance for unordered trees. [Results] In this paper, we present a practical method for computing the edit distance between rooted unordered trees. In this method, the edit distance problem for unordered trees is transformed into the maximum clique problem, and then efficient solvers for the maximum clique problem are applied. We applied the proposed method to similar-structure search for glycan structures. The results suggest that the proposed method can efficiently compute the edit distance for moderate-size unordered trees. They also suggest that its accuracy is comparable to that of the edit distance for ordered trees and of an existing method for glycan search. [Conclusions] The proposed method is simple but useful for computing the edit distance between unordered trees. The object code is available upon request.

Crossref

Springer - Publisher Connector

PubMed Central

Kyoto University Research Information Repository

An edit script for taxonomic classifications

Author: Page Roderic DM
Valiente Gabriel
Publication venue: BioMed Central
Publication date: 01/01/2005

BACKGROUND: The NCBI taxonomy provides one of the most powerful ways to navigate sequence databases, but currently users are forced to formulate queries according to a single taxonomic classification. Given that there is no universal agreement on the classification of organisms, providing a single classification places constraints on the questions biologists can ask. However, maintaining multiple classifications is burdensome in the face of a constantly growing NCBI classification. RESULTS: In this paper, we present a solution to the problem of generating modifications of the NCBI taxonomy, based on the computation of an edit script that summarises the differences between two classification trees. Our algorithms find the shortest possible edit script based on the identification of all shared subtrees, and take time only quasi-linear in the size of the trees because classification trees have unique node labels. CONCLUSION: These algorithms have been recently implemented, and the software is freely available for download from

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Enlighten
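
The edit-script idea in the abstract above can be illustrated with a minimal sketch. It assumes, as the abstract states, that node labels are unique, so each tree can be reduced to a label-to-parent map and the two maps compared directly. This is a naive illustration only, not the authors' shared-subtree algorithm, and it makes no claim to produce the shortest script. The function names, the dict-of-children tree encoding, and the operation names ("delete", "insert", "move") are all assumptions made for this example.

```python
def parent_map(tree, root):
    """Map each node label to its parent label (the root maps to None).

    `tree` is a hypothetical encoding: a dict from a label to its child labels.
    """
    parents = {root: None}
    stack = [root]
    while stack:
        node = stack.pop()
        for child in tree.get(node, []):
            parents[child] = node
            stack.append(child)
    return parents


def naive_edit_script(old_tree, old_root, new_tree, new_root):
    """List the operations that turn the old parent map into the new one.

    Assumption: labels are unique within each tree, so a label identifies
    the same taxon in both classifications.
    """
    old_p = parent_map(old_tree, old_root)
    new_p = parent_map(new_tree, new_root)
    script = []
    # Labels present only in the old classification are deleted.
    for node in sorted(old_p.keys() - new_p.keys()):
        script.append(("delete", node))
    # Labels present only in the new classification are inserted under their parent.
    for node in sorted(new_p.keys() - old_p.keys()):
        script.append(("insert", node, new_p[node]))
    # Labels present in both but with a different parent are moved.
    for node in sorted(old_p.keys() & new_p.keys()):
        if old_p[node] != new_p[node]:
            script.append(("move", node, new_p[node]))
    return script


old = {"Life": ["Bacteria", "Eukaryota"], "Eukaryota": ["Fungi", "Animalia"]}
new = {
    "Life": ["Bacteria", "Archaea", "Eukaryota"],
    "Eukaryota": ["Opisthokonta"],
    "Opisthokonta": ["Fungi", "Animalia"],
}
script = naive_edit_script(old, "Life", new, "Life")
for op in script:
    print(op)
```

Because labels are unique, building and comparing the two maps takes a single pass over each tree plus sorting, which is in the spirit of the quasi-linear bound mentioned in the abstract, although the sketch does not reproduce the paper's method.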

Mirroring co-evolving trees in the light of their topologies

Author: Hajirasouliha Iman
Juan David
Sahinalp S. Cenk
Schönhuth Alexander
Valencia Alfonso
Publication date: 01/01/2011

Determining the interaction partners among protein/domain families poses hard computational problems, in particular in the presence of paralogous proteins. Available approaches aim to identify interaction partners among protein/domain families through maximizing the similarity between trimmed versions of their phylogenetic trees. Since maximization of any natural similarity score is computationally difficult, many approaches employ heuristics to maximize the distance matrices corresponding to the tree topologies in question. In this paper we devise an efficient deterministic algorithm which directly maximizes the similarity between two leaf-labeled trees with edge lengths, obtaining a score-optimal alignment of the two trees in question. Our algorithm is significantly faster than those methods based on distance matrix comparison: 1 minute on a single processor vs. 730 hours on a supercomputer. Furthermore, we have advantages over the current state-of-the-art heuristic search approach in terms of precision as well as a recently suggested overall performance measure for mirrortree approaches, while incurring only acceptable losses in recall. A C implementation of the method demonstrated in this paper is available at http://compbio.cs.sfu.ca/mirrort.htm. Comment: 13 pages, 2 figures; Iman Hajirasouliha and Alexander Schönhuth are joint first authors.

arXiv.org e-Print Archive

CiteSeerX

Fast alignment of fragmentation trees

Author: Franziska Hufsky
Kai Dührkop
Florian Rasche
Markus Chimani
Sebastian Böcker
Publication venue: Oxford University Press
Publication date: 11/06/2012

Motivation: Mass spectrometry allows sensitive, automated and high-throughput analysis of small molecules such as metabolites. One major bottleneck in metabolomics is the identification of ‘unknown’ small molecules not in any database. Recently, fragmentation tree alignments have been introduced for the automated comparison of the fragmentation patterns of small molecules. Fragmentation pattern similarities are strongly correlated with the chemical similarity of the molecules, and allow us to cluster compounds based solely on their fragmentation patterns.

Crossref

PubMed Central

MPG.PuRe

Faster Algorithms for the Maximum Common Subtree Isomorphism Problem

Author: Droschinsky Andre
Kriege Nils M.
Mutzel Petra
Publication date: 01/01/2016

The maximum common subtree isomorphism problem asks for the largest possible isomorphism between subtrees of two given input trees. This problem is a natural restriction of the maximum common subgraph problem, which is $\mathsf{NP}$-hard in general graphs. Confining to trees renders polynomial time algorithms possible and is of fundamental importance for approaches on more general graph classes. Various variants of this problem in trees have been intensively studied. We consider the general case, where trees are neither rooted nor ordered and the isomorphism is maximum w.r.t. a weight function on the mapped vertices and edges. For trees of order $n$ and maximum degree $\Delta$ our algorithm achieves a running time of $\mathcal{O}(n^2\Delta)$ by exploiting the structure of the matching instances arising as subproblems. Thus our algorithm outperforms the best previously known approaches. No faster algorithm is possible for trees of bounded degree and for trees of unbounded degree we show that a further reduction of the running time would directly improve the best known approach to the assignment problem. Combining a polynomial-delay algorithm for the enumeration of all maximum common subtree isomorphisms with central ideas of our new algorithm leads to an improvement of its running time from $\mathcal{O}(n^6+Tn^2)$ to $\mathcal{O}(n^3+Tn\Delta)$, where $n$ is the order of the larger tree, $T$ is the number of different solutions, and $\Delta$ is the minimum of the maximum degrees of the input trees. Our theoretical results are supplemented by an experimental evaluation on synthetic and real-world instances.

arXiv.org e-Print Archive

Dagstuhl Research Online Publication Server
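
To make the "matching instances arising as subproblems" in the abstract above concrete, here is a minimal sketch of a much simpler variant. It assumes rooted, unlabeled, unweighted trees in which the two roots must be mapped to each other, whereas the paper treats the unrooted, weighted case. The matching step is brute force over permutations, so it is exponential in the degree and does not achieve the running time stated in the abstract. The function names and the dict-of-children encoding are assumptions made for this example.

```python
from itertools import permutations


def max_matching_weight(weights):
    """Brute-force maximum-weight bipartite matching on a small weight matrix.

    All weights are assumed positive, so matching every vertex on the
    smaller side is optimal.
    """
    rows = len(weights)
    cols = len(weights[0]) if rows else 0
    if rows == 0 or cols == 0:
        return 0
    if rows > cols:
        # Transpose so that the side being fully matched is the smaller one.
        weights = [list(col) for col in zip(*weights)]
        rows, cols = cols, rows
    best = 0
    for perm in permutations(range(cols), rows):
        best = max(best, sum(weights[i][perm[i]] for i in range(rows)))
    return best


def rooted_mcs(tree_a, a, tree_b, b):
    """Size of the largest common subtree that maps node a onto node b.

    Each tree is a dict from a node to its list of children (hypothetical
    encoding). The children of a and b are paired by a maximum-weight
    matching whose weights are the recursive results for each child pair.
    """
    kids_a = tree_a.get(a, [])
    kids_b = tree_b.get(b, [])
    weights = [
        [rooted_mcs(tree_a, ca, tree_b, cb) for cb in kids_b] for ca in kids_a
    ]
    return 1 + max_matching_weight(weights)


tree_a = {"r": ["x", "y"], "x": ["x1", "x2"]}
tree_b = {"s": ["p", "q", "t"], "q": ["q1"]}
result = rooted_mcs(tree_a, "r", tree_b, "s")
print(result)
```

In the example, the best pairing maps x to q (sharing one child) and y to a leaf, giving a common subtree of four nodes. Replacing the brute-force step with a polynomial assignment solver is the natural next refinement, which is where the assignment problem named in the abstract enters.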

Exploiting syntactic relations for question answering

Author: Loreto Daniel
Publication venue: Massachusetts Institute of Technology
Publication date: 01/01/2006

Thesis (M. Eng.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2006. Includes bibliographical references (p. 61-66). Recently there has been a resurgent interest in syntax-based approaches to information access, as a means of overcoming the limitations of keyword-based approaches. So far attempts to use syntax have been ad hoc, choosing to use some syntactic information but still ignoring most of the tree structure. This thesis describes the design and implementation of SMARTQA, a proof-of-concept question answering system that compares syntactic trees in a principled manner. Specifically, SMARTQA uses a tree edit-distance algorithm to calculate the similarity between unordered, unrooted syntactic trees. The general case of this problem is NP-complete; in practice, SMARTQA demonstrates that an optimized implementation of the algorithm can be feasibly used for question answering applications. By Daniel Loreto. M.Eng.

DSpace@MIT