
    Computational Performance Evaluation of Two Integer Linear Programming Models for the Minimum Common String Partition Problem

    In the minimum common string partition (MCSP) problem, two related input strings are given. "Related" refers to the property that both strings consist of the same set of letters appearing the same number of times in each of the two strings. The MCSP seeks a minimum cardinality partitioning of one string into non-overlapping substrings that is also a valid partitioning for the second string. This problem has applications in bioinformatics, e.g., in analyzing related DNA or protein sequences. For strings with lengths less than about 1000 letters, a previously published integer linear programming (ILP) formulation yields satisfactory results when solved with a state-of-the-art solver such as CPLEX. In this work, we propose a new, alternative ILP model that is compared to the former one. While a polyhedral study shows the linear programming relaxations of the two models to be equally strong, a comprehensive experimental comparison using real-world as well as artificially created benchmark instances indicates substantial computational advantages of the new formulation.
    Comment: arXiv admin note: text overlap with arXiv:1405.5646. This paper version replaces the one submitted on January 10, 2015, due to a detected error in the calculation of the variables involved in the ILP model
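The problem definition in this abstract can be illustrated with a small brute-force sketch. This is not the ILP model from the paper; it is an exhaustive search that is only practical for very short strings, and the function names `is_related`, `can_tile`, and `min_common_partition` are hypothetical names chosen for this example.

```python
from collections import Counter
from itertools import combinations


def is_related(s1, s2):
    """Two strings are related if every letter occurs equally often in both."""
    return Counter(s1) == Counter(s2)


def can_tile(target, blocks):
    """Check by backtracking whether the multiset `blocks` (a Counter)
    can be concatenated in some order to spell `target`."""
    if not target:
        return sum(blocks.values()) == 0
    for block in list(blocks):
        if blocks[block] > 0 and target.startswith(block):
            blocks[block] -= 1
            ok = can_tile(target[len(block):], blocks)
            blocks[block] += 1
            if ok:
                return True
    return False


def min_common_partition(s1, s2):
    """Exhaustive search (illustration only): return a smallest list of
    substrings of s1, in order, that can be rearranged to form s2."""
    if not is_related(s1, s2):
        raise ValueError("input strings are not related")
    n = len(s1)
    if n == 0:
        return []
    for k in range(1, n + 1):                      # try fewer blocks first
        for cuts in combinations(range(1, n), k - 1):
            bounds = (0, *cuts, n)
            blocks = [s1[i:j] for i, j in zip(bounds, bounds[1:])]
            if can_tile(s2, Counter(blocks)):
                return blocks
    raise AssertionError("unreachable: single letters always form a partition")
```

For example, `min_common_partition("abcab", "ababc")` cuts the first string into `"abc"` and `"ab"`, which also tile the second string in the order `"ab"`, `"abc"`.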

    Approximating Weighted Duo-Preservation in Comparative Genomics

    Motivated by comparative genomics, Chen et al. [9] introduced the Maximum Duo-preservation String Mapping (MDSM) problem, in which we are given two strings $s_1$ and $s_2$ from the same alphabet and the goal is to find a mapping $\pi$ between them so as to maximize the number of duos preserved. A duo is any two consecutive characters in a string, and it is preserved in the mapping if its two consecutive characters in $s_1$ are mapped to the same two consecutive characters in $s_2$. The MDSM problem is known to be NP-hard and there are approximation algorithms for this problem [3, 5, 13], but all of them consider only the "unweighted" version of the problem, in the sense that a duo from $s_1$ is preserved by mapping to any same duo in $s_2$ regardless of their positions in the respective strings. However, in comparative genomics it is desirable to find mappings that preserve duos that are "closer" to each other under some distance measure [19]. In this paper, we introduce a generalized version of the problem, called the Maximum-Weight Duo-preservation String Mapping (MWDSM) problem, that captures both duos-preservation and duos-distance measures, in the sense that mapping a duo from $s_1$ to each preserved duo in $s_2$ has a weight, indicating the "closeness" of the two duos. The objective of the MWDSM problem is to find a mapping so as to maximize the total weight of preserved duos. In this paper, we give a polynomial-time 6-approximation algorithm for this problem.
    Comment: Appeared in proceedings of the 23rd International Computing and Combinatorics Conference (COCOON 2017)
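The notion of a preserved duo can be made concrete with a short sketch that scores a given mapping. This is not the 6-approximation algorithm from the paper. It assumes the mapping is supplied as a list `pi` where `pi[i]` is the position in the second string that position `i` of the first string maps to, and the `weight` callable is a hypothetical stand-in for the duo weights described in the abstract.

```python
def preserved_duo_weight(s1, s2, pi, weight=lambda i, j: 1):
    """Score a character mapping between two strings.

    pi[i] is the index in s2 that position i of s1 maps to. The duo at
    positions (i, i+1) of s1 is preserved when it lands on consecutive
    positions (j, j+1) of s2. `weight(i, j)` is an assumed interface for
    the weight of mapping that duo; the default of 1 gives the unweighted
    count of preserved duos.
    """
    n = len(s1)
    if len(s2) != n or sorted(pi) != list(range(n)):
        raise ValueError("pi must be a bijection between positions")
    if any(s1[i] != s2[pi[i]] for i in range(n)):
        raise ValueError("pi must map each character to an equal character")
    return sum(
        weight(i, pi[i])
        for i in range(n - 1)
        if pi[i + 1] == pi[i] + 1
    )
```

With `s1 = "abcab"`, `s2 = "ababc"`, and `pi = [2, 3, 4, 0, 1]`, the duos `ab`, `bc`, and the second `ab` of `s1` are preserved, while `ca` is not.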

    Minimum Convex Partitions and Maximum Empty Polytopes

    Let $S$ be a set of $n$ points in $\mathbb{R}^d$. A Steiner convex partition is a tiling of ${\rm conv}(S)$ with empty convex bodies. For every integer $d$, we show that $S$ admits a Steiner convex partition with at most $\lceil (n-1)/d\rceil$ tiles. This bound is the best possible for points in general position in the plane, and it is best possible apart from constant factors in every fixed dimension $d\geq 3$. We also give the first constant-factor approximation algorithm for computing a minimum Steiner convex partition of a planar point set in general position. Establishing a tight lower bound for the maximum volume of a tile in a Steiner convex partition of any $n$ points in the unit cube is equivalent to a famous problem of Danzer and Rogers. It is conjectured that the volume of the largest tile is $\omega(1/n)$. Here we give a $(1-\varepsilon)$-approximation algorithm for computing the maximum volume of an empty convex body amidst $n$ given points in the $d$-dimensional unit box $[0,1]^d$.
    Comment: 16 pages, 4 figures; revised write-up with some running times improved

    On Optimally Partitioning Variable-Byte Codes

    The ubiquitous Variable-Byte encoding is one of the fastest compressed representations for integer sequences. However, its compression ratio is usually not competitive with other, more sophisticated encoders, especially when the integers to be compressed are small, which is the typical case for inverted indexes. This paper shows that the compression ratio of Variable-Byte can be improved by 2x by adopting a partitioned representation of the inverted lists. This makes Variable-Byte surprisingly competitive in space with the best bit-aligned encoders, hence disproving the folklore belief that Variable-Byte is space-inefficient for inverted index compression. Despite the significant space savings, we show that our optimization almost comes for free, given that: we introduce an optimal partitioning algorithm that does not affect indexing time because of its linear-time complexity; we show that the query processing speed of Variable-Byte is preserved, with an extensive experimental analysis and comparison with several other state-of-the-art encoders.
    Comment: Published in IEEE Transactions on Knowledge and Data Engineering (TKDE), 15 April 201
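For readers unfamiliar with the baseline encoding, the following is a minimal sketch of plain Variable-Byte coding, not the partitioned representation proposed in the paper. It assumes one common convention (7 payload bits per byte, least-significant group first, high bit set when more bytes follow); other implementations flip the flag bit or the byte order. The names `vbyte_encode` and `vbyte_decode` are hypothetical.

```python
def vbyte_encode(numbers):
    """Encode non-negative integers, 7 payload bits per byte.

    Assumed convention: low-order group first, high bit set on every
    byte except the last byte of each integer.
    """
    out = bytearray()
    for n in numbers:
        if n < 0:
            raise ValueError("only non-negative integers are supported")
        while n >= 128:
            out.append((n & 0x7F) | 0x80)  # more bytes follow
            n >>= 7
        out.append(n)                      # final byte, high bit clear
    return bytes(out)


def vbyte_decode(data):
    """Inverse of vbyte_encode."""
    numbers = []
    value = shift = 0
    for byte in data:
        value |= (byte & 0x7F) << shift
        if byte & 0x80:
            shift += 7
        else:
            numbers.append(value)
            value = shift = 0
    return numbers
```

Small integers occupy a single byte each, which is why the encoding is fast to decode but cannot go below 8 bits per integer without further structure such as the partitioning the abstract describes.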

    The Salesman's Improved Tours for Fundamental Classes

    Finding the exact integrality gap $\alpha$ for the LP relaxation of the metric Travelling Salesman Problem (TSP) has been an open problem for over thirty years, with little progress made. It is known that $4/3 \leq \alpha \leq 3/2$, and a famous conjecture states $\alpha = 4/3$. For this problem, essentially two "fundamental" classes of instances have been proposed. This fundamental property means that in order to show that the integrality gap is at most $\rho$ for all instances of metric TSP, it is sufficient to show it only for the instances in the fundamental class. However, despite the importance and the simplicity of such classes, no apparent effort has been deployed for improving the integrality gap bounds for them. In this paper we take a natural first step in this endeavour, and consider the $1/2$-integer points of one such class. We successfully improve the upper bound for the integrality gap from $3/2$ to $10/7$ for a superclass of these points, as well as prove a lower bound of $4/3$ for the superclass. Our methods involve innovative applications of tools from combinatorial optimization which have the potential to be more broadly applied.

    Robust and MaxMin Optimization under Matroid and Knapsack Uncertainty Sets

    Consider the following problem: given a set system (U, I) and an edge-weighted graph G = (U, E) on the same universe U, find the set A in I such that the Steiner tree cost with terminals A is as large as possible: "which set in I is the most difficult to connect up?" This is an example of a max-min problem: find the set A in I such that the value of some minimization (covering) problem is as large as possible. In this paper, we show that for certain covering problems which admit good deterministic online algorithms, we can give good algorithms for max-min optimization when the set system I is given by a p-system or q-knapsacks or both. This result is similar to results for constrained maximization of submodular functions. Although many natural covering problems are not even approximately submodular, we show that one can use properties of the online algorithm as a surrogate for submodularity. Moreover, we give stronger connections between max-min optimization and two-stage robust optimization, and hence give improved algorithms for robust versions of various covering problems, for cases where the uncertainty sets are given by p-systems and q-knapsacks.
    Comment: 17 pages. Preliminary version combining this paper and http://arxiv.org/abs/0912.1045 appeared in ICALP 201