13,521 research outputs found

    A Multi-labeled Tree Edit Distance for Comparing "Clonal Trees" of Tumor Progression

    Get PDF
    We introduce a new edit distance measure between a pair of "clonal trees", each representing the progression and mutational heterogeneity of a tumor sample, constructed by the use of single cell or bulk high throughput sequencing data. In a clonal tree, each vertex represents a specific tumor clone, and is labeled with one or more mutations in a way that each mutation is assigned to the oldest clone that harbors it. Given two clonal trees, our multi-labeled tree edit distance (MLTED) measure is defined as the minimum number of mutation/label deletions, (empty) leaf deletions, and vertex (clonal) expansions, applied in any order, to convert each of the two trees to the maximal common tree. We show that the MLTED measure can be computed efficiently in polynomial time and it captures the similarity between trees of different clonal granularity well. We have implemented our algorithm to compute MLTED exactly and applied it to a variety of data sets successfully. The source code of our method can be found in: https://github.com/khaled-rahman/leafDelTED

    A generalized Robinson-Foulds distance for labeled trees

    Get PDF
    Background: The Robinson-Foulds (RF) distance is a well-established measure between phylogenetic trees. Despite a lack of biological justification, it has the advantages of being a proper metric and being computable in linear time. For phylogenetic applications involving genes, however, a crucial aspect of the trees ignored by the RF metric is the type of the branching event (e.g. speciation, duplication, transfer, etc). Results: We extend RF to trees with labeled internal nodes by including a node flip operation, alongside edge contractions and extensions. We explore properties of this extended RF distance in the case of a binary labeling. In particular, we show that contrary to the unlabeled case, an optimal edit path may require contracting “good” edges, i.e. edges shared between the two trees. Conclusions: We provide a 2-approximation algorithm which is shown to perform well empirically. Looking ahead, computing distances between labeled trees opens up a variety of new algorithmic directions. Implementation and simulations available at https://github.com/DessimozLab/pylabeledrf

    A generalized Robinson-Foulds distance for labeled trees.

    Get PDF
    The Robinson-Foulds (RF) distance is a well-established measure between phylogenetic trees. Despite a lack of biological justification, it has the advantages of being a proper metric and being computable in linear time. For phylogenetic applications involving genes, however, a crucial aspect of the trees ignored by the RF metric is the type of the branching event (e.g. speciation, duplication, transfer, etc). We extend RF to trees with labeled internal nodes by including a node flip operation, alongside edge contractions and extensions. We explore properties of this extended RF distance in the case of a binary labeling. In particular, we show that contrary to the unlabeled case, an optimal edit path may require contracting "good" edges, i.e. edges shared between the two trees. We provide a 2-approximation algorithm which is shown to perform well empirically. Looking ahead, computing distances between labeled trees opens up a variety of new algorithmic directions.Implementation and simulations available at https://github.com/DessimozLab/pylabeledrf

    Automatic Wrapper Adaptation by Tree Edit Distance Matching

    Get PDF
    Information distributed through the Web keeps growing faster day by day,\ud and for this reason, several techniques for extracting Web data have been suggested\ud during last years. Often, extraction tasks are performed through so called wrappers,\ud procedures extracting information from Web pages, e.g. implementing logic-based\ud techniques. Many fields of application today require a strong degree of robustness\ud of wrappers, in order not to compromise assets of information or reliability of data\ud extracted.\ud Unfortunately, wrappers may fail in the task of extracting data from a Web page, if\ud its structure changes, sometimes even slightly, thus requiring the exploiting of new\ud techniques to be automatically held so as to adapt the wrapper to the new structure\ud of the page, in case of failure. In this work we present a novel approach of automatic wrapper adaptation based on the measurement of similarity of trees through\ud improved tree edit distance matching techniques

    Process model comparison based on cophenetic distance

    Get PDF
    The automated comparison of process models has received increasing attention in the last decade, due to the growing existence of process models and repositories, and the consequent need to assess similarities between the underlying processes. Current techniques for process model comparison are either structural (based on graph edit distances), or behavioural (through activity profiles or the analysis of the execution semantics). Accordingly, there is a gap between the quality of the information provided by these two families, i.e., structural techniques may be fast but inaccurate, whilst behavioural are accurate but complex. In this paper we present a novel technique, that is based on a well-known technique to compare labeled trees through the notion of Cophenetic distance. The technique lays between the two families of methods for comparing a process model: it has an structural nature, but can provide accurate information on the differences/similarities of two process models. The experimental evaluation on various benchmarks sets are reported, that position the proposed technique as a valuable tool for process model comparison.Peer ReviewedPostprint (author's final draft
    corecore