
    Optimal Decomposition Strategy For Tree Edit Distance

    An ordered labeled tree is a tree in which the left-to-right order among siblings is significant. Given two ordered labeled trees, the edit distance between them is the minimum cost of a sequence of edit operations that converts one tree into the other. In this thesis, we present an algorithm for the tree edit distance problem that uses an optimal tree decomposition strategy. By combining vertical compression of trees with optimal decomposition, we can significantly reduce the running time of the algorithm. We compare our method with other methods both theoretically and experimentally. The test results show that our strategy on compressed trees is by far the best decomposition strategy, creating the smallest number of relevant subproblems.
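
    One plausible reading of the vertical compression mentioned above is collapsing chains of nodes that have exactly one child into a single node; the sketch below illustrates only that reading, and the `Node` class and `compress` function are illustrative names, not the thesis's exact construction.

```python
# A minimal sketch of one plausible reading of "vertical compression":
# chains of nodes with exactly one child are merged into a single node
# whose label concatenates the chain's labels.  This is an illustrative
# assumption, not the thesis's construction.

class Node:
    def __init__(self, label, children=None):
        self.label = label
        self.children = children or []   # left-to-right order is significant

def compress(node):
    """Return a vertically compressed copy of the ordered labeled tree."""
    labels = [node.label]
    # Follow the chain while the current node has exactly one child.
    while len(node.children) == 1:
        node = node.children[0]
        labels.append(node.label)
    # Recurse into the (zero or several) children below the chain.
    return Node("/".join(labels), [compress(c) for c in node.children])

# Example: a -> b -> c, where c has two leaves, collapses to "a/b/c".
t = Node("a", [Node("b", [Node("c", [Node("d"), Node("e")])])])
print(compress(t).label)   # a/b/c
```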

    Decomposition algorithms for the tree edit distance problem

    We study the behavior of dynamic programming methods for the tree edit distance problem, such as [P. Klein, Computing the edit-distance between unrooted ordered trees, in: Proceedings of the 6th European Symposium on Algorithms, 1998, pp. 91–102; K. Zhang, D. Shasha, SIAM J. Comput. 18 (6) (1989) 1245–1262]. We show that those two algorithms may be described as decomposition strategies. We introduce the general framework of cover strategies, and we provide an exact characterization of the complexity of cover strategies. This analysis allows us to define a new tree edit distance algorithm that is optimal for cover strategies.
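
    As a concrete reference point for the dynamic programming methods discussed above, here is a compact sketch of the Zhang-Shasha recurrence with unit costs. The `Node` class and function names are illustrative; real implementations add general cost functions and further optimizations.

```python
# A sketch of the Zhang-Shasha dynamic program for ordered tree edit
# distance with unit insert/delete/relabel costs.  Trees are postorder
# numbered; keyroots index the "relevant subproblems" of the strategy.

class Node:
    def __init__(self, label, children=None):
        self.label = label
        self.children = children or []   # ordered left to right

def _index(root):
    """Postorder labels and, for each node, its leftmost-leaf index."""
    labels, lml = [], []
    def walk(node):
        first = None
        for child in node.children:
            f = walk(child)
            if first is None:
                first = f
        labels.append(node.label)
        idx = len(labels) - 1
        lml.append(first if first is not None else idx)
        return lml[idx]
    walk(root)
    return labels, lml

def tree_edit_distance(t1, t2):
    la, l1 = _index(t1)
    lb, l2 = _index(t2)
    # Keyroots: for each leftmost-leaf value, the highest postorder node.
    kr1 = sorted({l1[i]: i for i in range(len(la))}.values())
    kr2 = sorted({l2[j]: j for j in range(len(lb))}.values())
    td = [[0] * len(lb) for _ in range(len(la))]

    for i in kr1:
        for j in kr2:
            m, n = i - l1[i] + 2, j - l2[j] + 2
            fd = [[0] * n for _ in range(m)]      # forest distances
            for x in range(1, m):
                fd[x][0] = fd[x - 1][0] + 1       # delete
            for y in range(1, n):
                fd[0][y] = fd[0][y - 1] + 1       # insert
            for x in range(1, m):
                for y in range(1, n):
                    ni, nj = x + l1[i] - 1, y + l2[j] - 1
                    if l1[ni] == l1[i] and l2[nj] == l2[j]:
                        rel = 0 if la[ni] == lb[nj] else 1
                        fd[x][y] = min(fd[x - 1][y] + 1,
                                       fd[x][y - 1] + 1,
                                       fd[x - 1][y - 1] + rel)
                        td[ni][nj] = fd[x][y]
                    else:
                        p, q = l1[ni] - l1[i], l2[nj] - l2[j]
                        fd[x][y] = min(fd[x - 1][y] + 1,
                                       fd[x][y - 1] + 1,
                                       fd[p][q] + td[ni][nj])
    return td[len(la) - 1][len(lb) - 1]
```

    Each keyroot pair spawns one forest-distance table, which is roughly the kind of relevant subproblem that strategy analyses count.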

    An O(n^3)-Time Algorithm for Tree Edit Distance

    The {\em edit distance} between two ordered trees with vertex labels is the minimum cost of transforming one tree into the other by a sequence of elementary operations consisting of deleting and relabeling existing nodes, as well as inserting new nodes. In this paper, we present a worst-case $O(n^3)$-time algorithm for this problem, improving the previous best $O(n^3 \log n)$-time algorithm~\cite{Klein}. Our result requires a novel adaptive strategy for deciding how a dynamic program divides into subproblems (which is interesting in its own right), together with a deeper understanding of the previous algorithms for the problem. We also prove the optimality of our algorithm among the family of \emph{decomposition strategy} algorithms--which also includes the previous fastest algorithms--by tightening the known lower bound of $\Omega(n^2 \log^2 n)$~\cite{Touzet} to $\Omega(n^3)$, matching our algorithm's running time. Furthermore, we obtain matching upper and lower bounds of $\Theta(n m^2 (1 + \log \frac{n}{m}))$ when the two trees have different sizes $m$ and~$n$, where $m < n$. Comment: 10 pages, 5 figures, 5 .tex files where TED.tex is the main one
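
    The decomposition strategies referred to above differ mainly in which subforests the recursion keeps intact; a common ingredient, going back to Klein's algorithm, is to follow heavy children (the children with the largest subtrees). The sketch below shows only that heavy-child selection, assuming trees are plain (label, children) tuples; it is not the paper's adaptive strategy.

```python
# Heavy-child selection: at each node, keep the child with the largest
# subtree on the main path; the remaining (light) subtrees become the
# subproblems of a decomposition strategy.  Trees are (label, children)
# tuples; all names here are illustrative.

def subtree_size(t):
    _, children = t
    return 1 + sum(subtree_size(c) for c in children)

def heavy_child(t):
    """Child rooted at the largest subtree, or None at a leaf."""
    _, children = t
    return max(children, key=subtree_size, default=None)

def heavy_path(t):
    """Follow heavy children down from the root."""
    path = [t]
    while heavy_child(path[-1]) is not None:
        path.append(heavy_child(path[-1]))
    return [label for label, _ in path]

# Example: the heavy path hugs the long spine of the tree.
t = ("a", [("b", []), ("c", [("d", [("e", [])]), ("f", [])])])
print(heavy_path(t))   # ['a', 'c', 'd', 'e']
```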

    Approximating Edit Distance Within Constant Factor in Truly Sub-Quadratic Time

    Edit distance is a measure of similarity of two strings based on the minimum number of character insertions, deletions, and substitutions required to transform one string into the other. The edit distance can be computed exactly using a dynamic programming algorithm that runs in quadratic time. Andoni, Krauthgamer and Onak (2010) gave a nearly linear time algorithm that approximates edit distance within approximation factor $\text{poly}(\log n)$. In this paper, we provide an algorithm with running time $\tilde{O}(n^{2-2/7})$ that approximates the edit distance within a constant factor.
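
    For contrast with the sub-quadratic approximation, the exact quadratic-time dynamic program mentioned above is short enough to state in full; the function name below is illustrative.

```python
# The standard quadratic-time dynamic program for exact string edit
# distance with unit-cost insertions, deletions and substitutions.

def edit_distance(a: str, b: str) -> int:
    m, n = len(a), len(b)
    # dp[i][j] = edit distance between a[:i] and b[:j]
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i                      # delete all of a[:i]
    for j in range(n + 1):
        dp[0][j] = j                      # insert all of b[:j]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = 0 if a[i - 1] == b[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # delete a[i-1]
                           dp[i][j - 1] + 1,        # insert b[j-1]
                           dp[i - 1][j - 1] + sub)  # match / substitute
    return dp[m][n]

print(edit_distance("kitten", "sitting"))   # 3
```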

    Interactive visualisation and exploration of biological data

    International audience. No abstract available.

    Structural Rounding: Approximation Algorithms for Graphs Near an Algorithmically Tractable Class

    We develop a framework for generalizing approximation algorithms from the structural graph algorithm literature so that they apply to graphs somewhat close to that class (a scenario we expect is common when working with real-world networks) while still guaranteeing approximation ratios. The idea is to edit a given graph via vertex- or edge-deletions to put the graph into an algorithmically tractable class, apply known approximation algorithms for that class, and then lift the solution to apply to the original graph. We give a general characterization of when an optimization problem is amenable to this approach, and show that it includes many well-studied graph problems, such as Independent Set, Vertex Cover, Feedback Vertex Set, Minimum Maximal Matching, Chromatic Number, (l-)Dominating Set, Edge (l-)Dominating Set, and Connected Dominating Set. To enable this framework, we develop new editing algorithms that find the approximately-fewest edits required to bring a given graph into one of a few important graph classes (in some cases these are bicriteria algorithms which simultaneously approximate both the number of editing operations and the target parameter of the family). For bounded degeneracy, we obtain an O(r log n)-approximation and a bicriteria (4,4)-approximation which also extends to a smoother bicriteria trade-off. For bounded treewidth, we obtain a bicriteria (O(log^{1.5} n), O(sqrt{log w}))-approximation, and for bounded pathwidth, we obtain a bicriteria (O(log^{1.5} n), O(sqrt{log w} * log n))-approximation. For treedepth 2 (related to bounded expansion), we obtain a 4-approximation. We also prove complementary hardness-of-approximation results assuming P != NP: in particular, these problems are all log-factor inapproximable, except the last, which is not approximable below some constant factor (2 assuming UGC).
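
    As a toy illustration of the edit / solve / lift pattern (not the paper's editing algorithms), the sketch below deletes high-degree vertices until the maximum degree is bounded, runs a greedy Independent Set heuristic on the edited graph, and lifts the result unchanged, which is valid because an independent set in an induced subgraph remains independent in the original graph. The function names and the degree-based editing rule are assumptions made for the example.

```python
# Edit / solve / lift on a plain adjacency dict (vertex -> set of
# neighbors).  The editing rule here is a crude stand-in, not one of the
# paper's approximation algorithms.

def edit_to_bounded_degree(adj, d):
    """Edit: delete maximum-degree vertices until every degree is <= d."""
    alive = set(adj)
    deg = {v: len(adj[v] & alive) for v in alive}
    while alive and max(deg[v] for v in alive) > d:
        v = max(alive, key=lambda u: deg[u])
        alive.discard(v)
        for u in adj[v] & alive:
            deg[u] -= 1
    return alive

def greedy_independent_set(adj, alive):
    """Solve: min-degree greedy Independent Set on the edited graph."""
    alive = set(alive)
    chosen = set()
    while alive:
        v = min(alive, key=lambda u: len(adj[u] & alive))
        chosen.add(v)
        alive -= adj[v] | {v}
    return chosen   # lift: still independent in the original graph

# Example: a 5-cycle plus a hub adjacent to every cycle vertex.
adj = {0: {1, 4, 5}, 1: {0, 2, 5}, 2: {1, 3, 5},
       3: {2, 4, 5}, 4: {3, 0, 5}, 5: {0, 1, 2, 3, 4}}
alive = edit_to_bounded_degree(adj, d=2)      # deletes the hub
print(greedy_independent_set(adj, alive))     # e.g. {0, 2}
```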

    A Survey on Metric Learning for Feature Vectors and Structured Data

    The need for appropriate ways to measure the distance or similarity between data is ubiquitous in machine learning, pattern recognition and data mining, but handcrafting such good metrics for specific problems is generally difficult. This has led to the emergence of metric learning, which aims at automatically learning a metric from data and has attracted a lot of interest in machine learning and related fields for the past ten years. This survey paper proposes a systematic review of the metric learning literature, highlighting the pros and cons of each approach. We pay particular attention to Mahalanobis distance metric learning, a well-studied and successful framework, but additionally present a wide range of methods that have recently emerged as powerful alternatives, including nonlinear metric learning, similarity learning and local metric learning. Recent trends and extensions, such as semi-supervised metric learning, metric learning for histogram data and the derivation of generalization guarantees, are also covered. Finally, this survey addresses metric learning for structured data, in particular edit distance learning, and attempts to give an overview of the remaining challenges in metric learning for the years to come. Comment: Technical report, 59 pages. Changes in v2: fixed typos and improved presentation. Changes in v3: fixed typos. Changes in v4: fixed typos and new method
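
    To make the Mahalanobis family mentioned above concrete: the learned metric is d_M(x, y) = sqrt((x - y)^T M (x - y)) with M positive semi-definite, and factoring M = L^T L shows it equals Euclidean distance after the linear map L. The numpy sketch below checks that identity with a random stand-in for L (not a learned matrix).

```python
# Mahalanobis-style metric: d_M(x, y) = sqrt((x - y)^T M (x - y)).
# With M = L^T L this is Euclidean distance after the linear map L,
# which is the quantity most Mahalanobis metric learning methods tune.

import numpy as np

def mahalanobis(x, y, M):
    d = x - y
    return float(np.sqrt(d @ M @ d))

rng = np.random.default_rng(0)
L = rng.normal(size=(2, 3))     # stand-in for a learned linear map
M = L.T @ L                     # positive semi-definite by construction

x, y = rng.normal(size=3), rng.normal(size=3)
print(mahalanobis(x, y, M))                 # metric form
print(np.linalg.norm(L @ x - L @ y))        # identical: Euclidean after L
```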