2,487 research outputs found

    Computational Performance Evaluation of Two Integer Linear Programming Models for the Minimum Common String Partition Problem

    Full text link
    In the minimum common string partition (MCSP) problem two related input strings are given. "Related" refers to the property that both strings consist of the same set of letters appearing the same number of times in each of the two strings. The MCSP seeks a minimum cardinality partitioning of one string into non-overlapping substrings that is also a valid partitioning for the second string. This problem has applications in bioinformatics e.g. in analyzing related DNA or protein sequences. For strings with lengths less than about 1000 letters, a previously published integer linear programming (ILP) formulation yields, when solved with a state-of-the-art solver such as CPLEX, satisfactory results. In this work, we propose a new, alternative ILP model that is compared to the former one. While a polyhedral study shows the linear programming relaxations of the two models to be equally strong, a comprehensive experimental comparison using real-world as well as artificially created benchmark instances indicates substantial computational advantages of the new formulation.Comment: arXiv admin note: text overlap with arXiv:1405.5646 This paper version replaces the one submitted on January 10, 2015, due to detected error in the calculation of the variables involved in the ILP model

    Edit Distance: Sketching, Streaming and Document Exchange

    Full text link
    We show that in the document exchange problem, where Alice holds x{0,1}nx \in \{0,1\}^n and Bob holds y{0,1}ny \in \{0,1\}^n, Alice can send Bob a message of size O(K(log2K+logn))O(K(\log^2 K+\log n)) bits such that Bob can recover xx using the message and his input yy if the edit distance between xx and yy is no more than KK, and output "error" otherwise. Both the encoding and decoding can be done in time O~(n+poly(K))\tilde{O}(n+\mathsf{poly}(K)). This result significantly improves the previous communication bounds under polynomial encoding/decoding time. We also show that in the referee model, where Alice and Bob hold xx and yy respectively, they can compute sketches of xx and yy of sizes poly(Klogn)\mathsf{poly}(K \log n) bits (the encoding), and send to the referee, who can then compute the edit distance between xx and yy together with all the edit operations if the edit distance is no more than KK, and output "error" otherwise (the decoding). To the best of our knowledge, this is the first result for sketching edit distance using poly(Klogn)\mathsf{poly}(K \log n) bits. Moreover, the encoding phase of our sketching algorithm can be performed by scanning the input string in one pass. Thus our sketching algorithm also implies the first streaming algorithm for computing edit distance and all the edits exactly using poly(Klogn)\mathsf{poly}(K \log n) bits of space.Comment: Full version of an article to be presented at the 57th Annual IEEE Symposium on Foundations of Computer Science (FOCS 2016

    Dissimilarity Clustering by Hierarchical Multi-Level Refinement

    Full text link
    We introduce in this paper a new way of optimizing the natural extension of the quantization error using in k-means clustering to dissimilarity data. The proposed method is based on hierarchical clustering analysis combined with multi-level heuristic refinement. The method is computationally efficient and achieves better quantization errors than theComment: 20-th European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning (ESANN 2012), Bruges : Belgium (2012

    Approximate Matching of Hierarchial Data

    Get PDF

    Approximating reversal distance for strings with bounded number of duplicates

    Get PDF
    AbstractFor a string A=a1…an, a reversal ρ(i,j), 1⩽i⩽j⩽n, transforms the string A into a string A′=a1…ai-1ajaj-1…aiaj+1… an, that is, the reversal ρ(i,j) reverses the order of symbols in the substring ai…aj of A. In the case of signed strings, where each symbol is given a sign + or -, the reversal operation also flips the sign of each symbol in the reversed substring. Given two strings, A and B, signed or unsigned, sorting by reversals (SBR) is the problem of finding the minimum number of reversals that transform the string A into the string B.Traditionally, the problem was studied for permutations, that is, for strings in which every symbol appears exactly once. We consider a generalization of the problem, k-SBR, and allow each symbol to appear at most k times in each string, for some k⩾1. The main result of the paper is an O(k2)-approximation algorithm running in time O(n). For instances with 3<k⩽O(lognlog*n), this is the best known approximation algorithm for k-SBRand, moreover, it is faster than the previous best approximation algorithm

    Fast Quasi-Threshold Editing

    Full text link
    We introduce Quasi-Threshold Mover (QTM), an algorithm to solve the quasi-threshold (also called trivially perfect) graph editing problem with edge insertion and deletion. Given a graph it computes a quasi-threshold graph which is close in terms of edit count. This edit problem is NP-hard. We present an extensive experimental study, in which we show that QTM is the first algorithm that is able to scale to large real-world graphs in practice. As a side result we further present a simple linear-time algorithm for the quasi-threshold recognition problem.Comment: 26 pages, 4 figures, submitted to ESA 201
    corecore