1,037 research outputs found

    Approximating Weighted Duo-Preservation in Comparative Genomics

    Motivated by comparative genomics, Chen et al. [9] introduced the Maximum Duo-preservation String Mapping (MDSM) problem, in which we are given two strings s_1 and s_2 over the same alphabet and the goal is to find a mapping π between them that maximizes the number of duos preserved. A duo is any two consecutive characters in a string, and it is preserved by the mapping if its two consecutive characters in s_1 are mapped to the same two consecutive characters in s_2. The MDSM problem is known to be NP-hard and there are approximation algorithms for it [3, 5, 13], but all of them consider only the "unweighted" version of the problem, in the sense that a duo from s_1 is preserved by mapping it to any identical duo in s_2 regardless of their positions in the respective strings. However, in comparative genomics it is desirable to find mappings that favor preserving duos that are "closer" to each other under some distance measure [19]. In this paper, we introduce a generalized version of the problem, called the Maximum-Weight Duo-preservation String Mapping (MWDSM) problem, that captures both duo-preservation and duo-distance measures: mapping a duo from s_1 to each preserved duo in s_2 has a weight indicating the "closeness" of the two duos. The objective of the MWDSM problem is to find a mapping that maximizes the total weight of preserved duos. We give a polynomial-time 6-approximation algorithm for this problem.
    Comment: Appeared in proceedings of the 23rd International Computing and Combinatorics Conference (COCOON 2017)
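The duo-preservation objective defined in the abstract above can be made concrete with a small sketch. It only evaluates a given mapping for the unweighted problem; it is not the paper's approximation algorithm. The function name `preserved_duos` and the representation of the mapping as a list `pi` of positions are assumptions made here for illustration.

```python
def preserved_duos(s1, s2, pi):
    """Count the duos of s1 preserved by the mapping pi (unweighted).

    Assumed representation: pi[i] is the position in s2 that
    position i of s1 is mapped to.
    """
    if len(s1) != len(s2):
        raise ValueError("strings must have equal length")
    if sorted(pi) != list(range(len(s2))):
        raise ValueError("pi must be a bijection between positions")
    if any(s1[i] != s2[pi[i]] for i in range(len(s1))):
        raise ValueError("pi must map each character to an equal character")
    # The duo (i, i+1) of s1 is preserved when its two characters land on
    # two consecutive positions of s2, in the same order.
    return sum(1 for i in range(len(s1) - 1) if pi[i + 1] == pi[i] + 1)


s1, s2 = "abcab", "ababc"
print(preserved_duos(s1, s2, [2, 3, 4, 0, 1]))  # maps "abc" and "ab" as blocks
print(preserved_duos(s1, s2, [0, 1, 4, 2, 3]))  # splits "abc" after "ab"
```

For the weighted variant, one would presumably sum a per-duo weight instead of counting 1 for each preserved duo; the abstract does not fix a particular weight function, so none is assumed here.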

    Fast Matching-based Approximations for Maximum Duo-Preservation String Mapping and its Weighted Variant


    The Maximum Duo-Preservation String Mapping Problem with Bounded Alphabet

    Given two strings A and B such that B is a permutation of A, the max duo-preservation string mapping (MPSM) problem asks to find a mapping π between them so as to preserve a maximum number of duos. A duo is any pair of consecutive characters in a string, and it is preserved by π if its two consecutive characters in A are mapped to the same two consecutive characters in B. This problem has received growing attention in recent years, partly as an alternative way to produce approximation algorithms for its minimization counterpart, min common string partition, a widely studied problem due to its applications in comparative genomics. Considering this favored field of application with short alphabets, it is surprising that MPSM^ℓ, the variant of MPSM with bounded alphabet, has received so little attention, with a single yet impressive work that provides a 2.67-approximation achieved in O(n) [Brubach, 2018], where n = |A| = |B|. Our work focuses on MPSM^ℓ, and our main contribution is the demonstration that this problem admits a Polynomial Time Approximation Scheme (PTAS) when ℓ = O(1). We also provide an alternate, somewhat simpler, proof of NP-hardness for this problem compared with the NP-hardness proof presented in [Haitao Jiang et al., 2012].

    A Family of Approximation Algorithms for the Maximum Duo-Preservation String Mapping Problem

    In the Maximum Duo-Preservation String Mapping problem we are given two strings and wish to map the letters of the former to the letters of the latter so as to maximise the number of duos. A duo is a pair of consecutive letters that is mapped to a pair of consecutive letters in the same order. This is complementary to the well-studied Minimum Common String Partition problem, where the goal is to partition the former string into blocks that can be permuted and concatenated to obtain the latter string. Maximum Duo-Preservation String Mapping is APX-hard. After a series of improvements, Brubach [WABI 2016] showed a polynomial-time 3.25-approximation algorithm. Our main contribution is that, for any eps > 0, there exists a polynomial-time (2+eps)-approximation algorithm. Similarly to a previous solution by Boria et al. [CPM 2016], our algorithm uses the local search technique. However, this is used only after a certain preliminary greedy procedure, which gives us more structure and makes a more general local search possible. We complement this with a specialised version of the algorithm that achieves a 2.67-approximation in quadratic time.

    Markets, Elections, and Microbes: Data-driven Algorithms from Theory to Practice

    Many modern problems in algorithms and optimization are driven by data which often carries with it an element of uncertainty. In this work, we conduct an investigation into algorithmic foundations and applications across three main areas. The first area is online matching algorithms for e-commerce applications such as online sales and advertising. The importance of e-commerce in modern business cannot be overstated and even minor algorithmic improvements can have huge impacts. In online matching problems, we generally have a known offline set of goods or advertisements while users arrive online and allocations must be made immediately and irrevocably when a user arrives. However, in the real world, there is also uncertainty about a user's true interests and this can be modeled by considering matching problems in a graph with stochastic edges that only have a probability of existing. These edges can represent the probability of a user purchasing a product or clicking on an ad. Thus, we optimize over data which only provides an estimate of what types of users will arrive and what they will prefer. We survey a broad landscape of problems in this area, gain a deeper understanding of the algorithmic challenges, and present algorithms with improved worst-case performance. The second area is constrained clustering, where we explore classical clustering problems with additional constraints on which data points should be clustered together. Utilizing these constraints is important for many clustering problems because they can be used to ensure fairness, exploit expert advice, or capture natural properties of the data. In the simplest case, this can mean some pairs of points have "must-link" constraints requiring that they be clustered together. Moving into stochastic settings, we can describe more general pairwise constraints such as bounding the probability that two points are separated into different clusters. This lets us introduce a new notion of fairness for clustering and address stochastic problems such as semi-supervised learning with advice from imperfect experts. Here, we introduce new models of constrained clustering including new notions of fairness for clustering applications. Since these problems are NP-hard, we give approximation algorithms and in some cases conduct experiments to explore how the algorithms perform in practice. Finally, we look closely at the particular clustering problem of drawing election districts and show how constraining the clusters based on past voting data can interact with voter incentives. The third area is string algorithms for bioinformatics, and metagenomics specifically, where the data deluge from next-generation sequencing drives the necessity for new algorithms that are both fast and accurate. For metagenomic analysis, we present a tool for clustering a microbial marker gene, the 16S ribosomal RNA gene. On the more theoretical side, we present a succinct application of the Method of the Four Russians to edit distance computation as well as new algorithms and bounds for the maximum duo-preservation string mapping (MPSM) problem.

    28th Annual Symposium on Combinatorial Pattern Matching : CPM 2017, July 4-6, 2017, Warsaw, Poland

    Peer reviewed

    Design and architecture of a stochastic programming modelling system

    This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University. Decision making under uncertainty is an important yet challenging task; a number of alternative paradigms which address this problem have been proposed. Stochastic Programming (SP) and Robust Optimization (RO) are two such modelling approaches, which we consider; these are natural extensions of Mathematical Programming modelling. The process that goes from the conceptualization of an SP model to its solution and the use of the optimization results is complex compared with its deterministic counterpart. Many factors contribute to this complexity: (i) the representation of the random behaviour of the model parameters, (ii) the interfacing of the decision model with the model of randomness, (iii) the difficulty in solving (very) large model instances, (iv) the requirements for result analysis and performance evaluation through simulation techniques. An overview of the software tools which support stochastic programming modelling is given, and a conceptual structure and the architecture of such tools are presented. This conceptualization is presented as various interacting modules, namely (i) scenario generators, (ii) model generators, (iii) solvers and (iv) performance evaluation. Reflecting this research, we have redesigned and extended an established modelling system to support modelling under uncertainty. The collective system, which integrates this otherwise disparate set of model formulations within a common framework, is innovative and makes the resulting system a powerful modelling tool. The introduction of scenario generation in the ex-ante decision model and the integration with simulation and evaluation for the purpose of ex-post analysis by the use of workflows is novel and makes a contribution to knowledge.

    String Factorizations Under Various Collision Constraints

    In the NP-hard Equality-Free String Factorization problem, we are given a string S and ask whether S can be partitioned into k factors that are pairwise distinct. We describe a randomized algorithm for Equality-Free String Factorization with running time 2^k · k^{O(1)} + O(n), improving over previous algorithms with running time k^{O(k)} + O(n) [Schmid, TCS 2016; Mincu and Popa, Proc. SOFSEM 2020]. Our algorithm works for the generalization of Equality-Free String Factorization where equality can be replaced by an arbitrary polynomial-time computable equivalence relation on strings. We also consider two factorization problems to which this algorithm does not apply, namely Prefix-Free String Factorization, where we ask for a factorization of size k such that no factor is a prefix of another factor, and Substring-Free String Factorization, where we ask for a factorization of size k such that no factor is a substring of another factor. We show that these two problems are NP-hard as well. Then, we show that Prefix-Free String Factorization with the prefix-free relation is fixed-parameter tractable with respect to k by providing a polynomial problem kernel. Finally, we show a generic ILP formulation for R-Free String Factorization where R is an arbitrary relation on strings. This formulation improves over a previous one for Equality-Free String Factorization in terms of the number of variables.
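To illustrate the problem statement in the abstract above, here is a minimal brute-force sketch that searches for an equality-free factorization of size k by trying every set of cut positions. It is exponential in general and is not the randomized algorithm the abstract describes; the function names are hypothetical and chosen here for illustration.

```python
from itertools import combinations


def is_equality_free(s, factors):
    """Check that the non-empty factors concatenate to s and are pairwise distinct."""
    return (
        all(factors)
        and "".join(factors) == s
        and len(set(factors)) == len(factors)
    )


def find_equality_free_factorization(s, k):
    """Return some factorization of s into k pairwise distinct factors, or None.

    Brute force: choose k - 1 cut positions among the len(s) - 1 gaps.
    """
    n = len(s)
    if k < 1 or k > n:
        return None
    for cuts in combinations(range(1, n), k - 1):
        bounds = (0, *cuts, n)
        factors = [s[a:b] for a, b in zip(bounds, bounds[1:])]
        if len(set(factors)) == k:
            return factors
    return None


print(find_equality_free_factorization("aaaaaa", 3))
print(find_equality_free_factorization("aaaa", 3))
```

Replacing the `len(set(factors)) == k` test with a pairwise prefix or substring check would give the same kind of brute-force baseline for the prefix-free and substring-free variants.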