Approximating Weighted Duo-Preservation in Comparative Genomics
Motivated by comparative genomics, Chen et al. [9] introduced the Maximum Duo-preservation String Mapping (MDSM) problem, in which we are given two strings A and B from the same alphabet and the goal is to find a mapping between them so as to maximize the number of duos preserved. A duo is any two consecutive characters in a string, and it is preserved in the mapping if its two consecutive characters in A are mapped to the same two consecutive characters in B. The MDSM problem is known to be NP-hard, and there are approximation algorithms for it [3, 5, 13], but all of them consider only the "unweighted" version of the problem, in the sense that a duo from A is preserved by mapping it to any identical duo in B regardless of their positions in the respective strings. However, in comparative genomics it is desirable to find mappings that favor preserving duos that are "closer" to each other under some distance measure [19]. In this paper, we introduce a generalized version of the problem, called the Maximum-Weight Duo-preservation String Mapping (MWDSM) problem, that captures both duo-preservation and duo-distance measures in the sense that mapping a duo from A to each preserved duo in B has a weight indicating the "closeness" of the two duos. The objective of the MWDSM problem is to find a mapping that maximizes the total weight of preserved duos. In this paper, we give a polynomial-time 6-approximation algorithm for this problem.
Comment: Appeared in proceedings of the 23rd International Computing and Combinatorics Conference (COCOON 2017).
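To make the two objectives concrete, here is a minimal sketch that evaluates a given mapping rather than searching for one. It assumes the mapping is stored as a list `pi` with `pi[i]` the position in B that A[i] is sent to, and that a preserved duo starting at A[i] is scored by a weight `weight(i, pi[i])`; the `closeness` function below is a hypothetical weight chosen for illustration, not the distance measure of [19]. With unit weights the weighted value reduces to the MDSM count.

```python
from typing import Callable, List, Sequence

def preserved_duos(a: str, b: str, pi: Sequence[int]) -> List[int]:
    """Return every i such that the duo a[i]a[i+1] is preserved by pi.

    pi[i] is the position in b that a[i] is mapped to. We assume pi is a
    bijection that only maps a letter onto an identical letter.
    """
    if len(a) != len(b) or sorted(pi) != list(range(len(b))):
        raise ValueError("pi must be a bijection between positions of a and b")
    if any(a[i] != b[pi[i]] for i in range(len(a))):
        raise ValueError("pi must map each letter onto an identical letter")
    # A duo is preserved when consecutive letters land on consecutive positions.
    return [i for i in range(len(a) - 1) if pi[i + 1] == pi[i] + 1]

def weighted_value(a: str, b: str, pi: Sequence[int],
                   weight: Callable[[int, int], float]) -> float:
    """MWDSM-style objective: total weight of the preserved duos.

    weight(i, j) scores mapping the duo starting at a[i] onto the duo
    starting at b[j]; the weight function itself is a hypothetical input.
    """
    return sum(weight(i, pi[i]) for i in preserved_duos(a, b, pi))

def closeness(i: int, j: int) -> float:
    """Hypothetical closeness weight: 1 for aligned duos, decaying with offset."""
    return 1.0 / (1 + abs(i - j))

a, b, pi = "abac", "acab", [2, 3, 0, 1]
print(preserved_duos(a, b, pi))  # [0, 2]: duos "ab" and "ac"
print(weighted_value(a, b, pi, closeness))
```

Here both preserved duos are displaced by two positions, so under this weight they count for less than they would in the unweighted MDSM objective.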
The Maximum Duo-Preservation String Mapping Problem with Bounded Alphabet
Given two strings A and B such that B is a permutation of A, the max duo-preservation string mapping (MPSM) problem asks to find a mapping π between them so as to preserve a maximum number of duos. A duo is any pair of consecutive characters in a string, and it is preserved by π if its two consecutive characters in A are mapped to the same two consecutive characters in B. This problem has received growing attention in recent years, partly as an alternative way to produce approximation algorithms for its minimization counterpart, min common string partition, a widely studied problem due to its applications in comparative genomics. Considering that this favored field of application has a short alphabet, it is surprising that MPSM^σ, the variant of MPSM with an alphabet of bounded size σ, has received so little attention, with a single yet impressive work that provides a 2.67-approximation achieved in O(n) time [Brubach, 2018], where n = |A| = |B|. Our work focuses on MPSM^σ, and our main contribution is the demonstration that this problem admits a Polynomial Time Approximation Scheme (PTAS) when σ = O(1). We also provide an alternate, somewhat simpler, proof of NP-hardness for this problem compared with the NP-hardness proof presented in [Haitao Jiang et al., 2012].
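For checking approximation algorithms on tiny instances, an exact reference value is handy. The sketch below is plain exhaustive search, not the PTAS: a letter-preserving mapping is just one ordering of each letter's positions in B, so it enumerates all of them. The function name is hypothetical. Even with σ = O(1) the number of mappings tried is the product of (n_c)! over the letters c, where n_c is the number of occurrences of c, which is why polynomial-time approximation is the interesting question.

```python
from collections import defaultdict
from itertools import permutations, product

def max_duos_bruteforce(a: str, b: str) -> int:
    """Exact MPSM optimum by exhaustive search (exponential; tiny inputs only)."""
    if sorted(a) != sorted(b):
        raise ValueError("b must be a permutation of a")
    pos_a, pos_b = defaultdict(list), defaultdict(list)
    for i, ch in enumerate(a):
        pos_a[ch].append(i)
    for j, ch in enumerate(b):
        pos_b[ch].append(j)
    letters = sorted(pos_a)
    best = 0
    # A letter-preserving bijection is one ordering of each letter's positions in b.
    for choice in product(*(permutations(pos_b[ch]) for ch in letters)):
        pi = [0] * len(a)
        for ch, targets in zip(letters, choice):
            for i, j in zip(pos_a[ch], targets):
                pi[i] = j
        duos = sum(1 for i in range(len(a) - 1) if pi[i + 1] == pi[i] + 1)
        best = max(best, duos)
    return best

print(max_duos_bruteforce("abac", "acab"))  # 2
```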
A Family of Approximation Algorithms for the Maximum Duo-Preservation String Mapping Problem
In the Maximum Duo-Preservation String Mapping problem, we are given two strings and wish to map the letters of the former to the letters of the latter so as to maximise the number of duos. A duo is a pair of consecutive letters that is mapped to a pair of consecutive letters in the same order. This is complementary to the well-studied Minimum Common String Partition problem, where the goal is to partition the former string into blocks that can be permuted and concatenated to obtain the latter string.
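The complementarity can be seen directly: cutting the first string wherever a duo is broken yields blocks that can be permuted into the second string, and a mapping with d preserved duos on strings of length n gives exactly n - d blocks. A minimal sketch, assuming the mapping is given as a list `pi` of target positions (the function names are illustrative):

```python
from typing import List, Sequence, Tuple

def blocks_from_mapping(a: str, pi: Sequence[int]) -> List[Tuple[str, int]]:
    """Cut a wherever a duo is broken; return (block, start of its image in b)."""
    if not a:
        return []
    blocks, start = [], 0
    for i in range(1, len(a)):
        if pi[i] != pi[i - 1] + 1:  # duo a[i-1]a[i] is not preserved
            blocks.append((a[start:i], pi[start]))
            start = i
    blocks.append((a[start:], pi[start]))
    return blocks

def reassemble(blocks: List[Tuple[str, int]]) -> str:
    """Permute the blocks into their order in b and concatenate them."""
    return "".join(block for block, _ in sorted(blocks, key=lambda t: t[1]))

blocks = blocks_from_mapping("abac", [2, 3, 0, 1])
print(blocks)              # [('ab', 2), ('ac', 0)]
print(reassemble(blocks))  # acab
```

Here n = 4 and two duos are preserved, giving 4 - 2 = 2 blocks; maximising preserved duos is therefore the same as minimising the number of blocks, although approximation ratios do not transfer one-to-one between the two objectives.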
Maximum Duo-Preservation String Mapping is APX-hard. After a series of improvements, Brubach [WABI 2016] showed a polynomial-time 3.25-approximation algorithm. Our main contribution is that, for any ε > 0, there exists a polynomial-time (2+ε)-approximation algorithm. Similarly to a previous solution by Boria et al. [CPM 2016], our algorithm uses the local search technique. However, this is used only after a certain preliminary greedy procedure, which gives us more structure and makes a more general local search possible. We complement this with a specialised version of the algorithm that achieves a 2.67-approximation in quadratic time.
Markets, Elections, and Microbes: Data-driven Algorithms from Theory to Practice
Many modern problems in algorithms and optimization are driven by data which often carries with it an element of uncertainty. In this work, we conduct an investigation into algorithmic foundations and applications across three main areas.
The first area is online matching algorithms for e-commerce applications such as online sales and advertising. The importance of e-commerce in modern business cannot be overstated, and even minor algorithmic improvements can have huge impacts. In online matching problems, we generally have a known offline set of goods or advertisements while users arrive online, and allocations must be made immediately and irrevocably when a user arrives. However, in the real world, there is also uncertainty about a user's true interests, and this can be modeled by considering matching problems in a graph with stochastic edges that only have a probability of existing. These edges can represent the probability of a user purchasing a product or clicking on an ad. Thus, we optimize over data that only provides an estimate of what types of users will arrive and what they will prefer. We survey a broad landscape of problems in this area, gain a deeper understanding of the algorithmic challenges, and present algorithms with improved worst-case performance.
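The online model above can be illustrated with the simplest policy, a greedy rule that gives each arriving user the available item with the highest edge probability. This is a hypothetical baseline for illustration, not one of the algorithms developed in this work, and it assumes an item is used up once assigned even if the stochastic edge fails to materialise, so the expected number of successful matches is the sum of the chosen probabilities.

```python
from typing import Dict, List, Set, Tuple

def greedy_online(items: Set[str],
                  arrivals: List[Dict[str, float]]) -> Tuple[List[Tuple[int, str]], float]:
    """Greedily match each arriving user immediately and irrevocably.

    arrivals[t] maps an offline item to the probability that the edge to user t
    exists (e.g. that the user buys it). Simplifying assumption: an item is used
    up once assigned, so the expected number of successful matches is the sum of
    the chosen edge probabilities.
    """
    available = set(items)
    matches: List[Tuple[int, str]] = []
    expected = 0.0
    for t, probs in enumerate(arrivals):
        options = [(p, item) for item, p in probs.items() if item in available and p > 0]
        if not options:
            continue  # user t leaves unmatched
        p, item = max(options)  # highest-probability available item
        available.remove(item)
        matches.append((t, item))
        expected += p
    return matches, expected

matches, value = greedy_online({"x", "y"}, [{"x": 0.9, "y": 0.5}, {"x": 0.8}, {"y": 0.3}])
print(matches)  # [(0, 'x'), (2, 'y')]
```

On this instance the greedy policy collects an expected 0.9 + 0.3 = 1.2, whereas assigning user 0 to y and user 1 to x would collect 0.5 + 0.8 = 1.3. Because the greedy choice cannot anticipate future arrivals, it can fall short of the optimum, and worst-case guarantees quantify how far.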
The second area is constrained clustering, where we explore classical clustering problems with additional constraints on which data points should be clustered together. Utilizing these constraints is important for many clustering problems because they can be used to ensure fairness, exploit expert advice, or capture natural properties of the data. In the simplest case, this can mean that some pairs of points have "must-link" constraints requiring that they be clustered together. Moving into stochastic settings, we can describe more general pairwise constraints, such as bounding the probability that two points are separated into different clusters. This lets us introduce a new notion of fairness for clustering and address stochastic problems such as semi-supervised learning with advice from imperfect experts. Here, we introduce new models of constrained clustering, including new notions of fairness for clustering applications. Since these problems are NP-hard, we give approximation algorithms and in some cases conduct experiments to explore how the algorithms perform in practice. Finally, we look closely at the particular clustering problem of drawing election districts and show how constraining the clusters based on past voting data can interact with voter incentives.
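Hard must-link constraints are transitive: if a must join b and b must join c, then a must join c. A common preprocessing step therefore contracts must-linked points into groups with a union-find structure before any clustering is run. The sketch below shows only that standard reduction (names are illustrative); it is not one of the approximation algorithms described here, and it does not handle the stochastic pairwise constraints.

```python
from typing import Dict, List, Tuple

class DisjointSet:
    """Union-find with path halving."""

    def __init__(self, n: int) -> None:
        self.parent = list(range(n))

    def find(self, x: int) -> int:
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, x: int, y: int) -> None:
        rx, ry = self.find(x), self.find(y)
        if rx != ry:
            self.parent[ry] = rx

def must_link_groups(n: int, must_links: List[Tuple[int, int]]) -> List[List[int]]:
    """Contract points 0..n-1 into groups that every feasible clustering keeps together."""
    ds = DisjointSet(n)
    for u, v in must_links:
        ds.union(u, v)
    groups: Dict[int, List[int]] = {}
    for p in range(n):
        groups.setdefault(ds.find(p), []).append(p)
    return sorted(groups.values())

print(must_link_groups(5, [(0, 1), (1, 2), (3, 4)]))  # [[0, 1, 2], [3, 4]]
```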
The third area is string algorithms for bioinformatics and, specifically, metagenomics, where the data deluge from next-generation sequencing drives the need for new algorithms that are both fast and accurate. For metagenomic analysis, we present a tool for clustering a microbial marker gene, the 16S ribosomal RNA gene. On the more theoretical side, we present a succinct application of the Method of the Four Russians to edit distance computation, as well as new algorithms and bounds for the maximum duo-preservation string mapping (MPSM) problem.
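For reference, the baseline that the Method of the Four Russians accelerates is the classic quadratic dynamic program for edit distance, sketched below. The Four Russians step, which is not shown, tiles this table into small square blocks and looks up each block's output from precomputed tables, using the fact that horizontally or vertically adjacent cells differ by at most one.

```python
def edit_distance(s: str, t: str) -> int:
    """Levenshtein distance via the classic O(|s|*|t|) dynamic program.

    Keeps only one previous row; at step i, prev[j] is the distance between
    s[:i-1] and t[:j].
    """
    prev = list(range(len(t) + 1))  # distance from "" to t[:j]
    for i in range(1, len(s) + 1):
        cur = [i] + [0] * len(t)
        for j in range(1, len(t) + 1):
            cur[j] = min(prev[j] + 1,      # delete s[i-1]
                         cur[j - 1] + 1,   # insert t[j-1]
                         prev[j - 1] + (s[i - 1] != t[j - 1]))  # substitute or match
        prev = cur
    return prev[-1]

print(edit_distance("kitten", "sitting"))  # 3
```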
28th Annual Symposium on Combinatorial Pattern Matching: CPM 2017, July 4-6, 2017, Warsaw, Poland
Peer reviewed
Design and architecture of a stochastic programming modelling system
This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University. Decision making under uncertainty is an important yet challenging task; a number of alternative paradigms that address this problem have been proposed. Stochastic Programming (SP) and Robust Optimization (RO) are two such modelling approaches, which we consider; these are natural extensions of Mathematical Programming modelling. The process that goes from the conceptualization of an SP model to its solution and the use of the optimization results is more complex than for its deterministic counterpart. Many factors contribute to this complexity: (i) the representation of the random behaviour of the model parameters, (ii) the interfacing of the decision model with the model of randomness, (iii) the difficulty in solving (very) large model instances, and (iv) the requirements for result analysis and performance evaluation through simulation techniques. An overview of the software tools that support stochastic programming modelling is given, and a conceptual structure and the architecture of such tools are presented. This conceptualization is presented as various interacting modules, namely (i) scenario generators, (ii) model generators, (iii) solvers and (iv) performance evaluation. Reflecting this research, we have redesigned and extended an established modelling system to support modelling under uncertainty. The collective system, which integrates this otherwise disparate set of model formulations within a common framework, is innovative and makes the resulting system a powerful modelling tool. The introduction of scenario generation in the ex-ante decision model and the integration with simulation and evaluation for the purpose of ex-post analysis by the use of workflows is novel and makes a contribution to knowledge.
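The four modules can be illustrated end to end on a toy newsvendor problem. This is a hypothetical sketch, not the modelling system described in the thesis. A scenario generator samples demand (assumed uniform on 50..150). The "model generator plus solver" picks the order quantity that maximises average profit over the equally likely scenarios (sample average approximation, solved here by enumeration). Performance evaluation then simulates the chosen decision ex post against fresh demand samples. Prices, costs, and function names are all assumptions.

```python
import random
from typing import Iterable, List

def generate_scenarios(n: int, seed: int = 0) -> List[int]:
    """Scenario generator: hypothetical demand drawn uniformly from 50..150."""
    rng = random.Random(seed)
    return [rng.randint(50, 150) for _ in range(n)]

def profit(order: int, demand: int, price: float = 5.0, cost: float = 3.0) -> float:
    """Newsvendor profit: sell up to demand, pay for everything ordered."""
    return price * min(order, demand) - cost * order

def solve_saa(scenarios: List[int], candidates: Iterable[int]) -> int:
    """Ex-ante decision: maximise average profit over equally likely scenarios."""
    return max(candidates,
               key=lambda q: sum(profit(q, d) for d in scenarios) / len(scenarios))

def evaluate(order: int, n_sim: int = 10_000, seed: int = 1) -> float:
    """Ex-post evaluation: simulate fresh, independent demand and average profit."""
    sims = generate_scenarios(n_sim, seed)
    return sum(profit(order, d) for d in sims) / n_sim

q = solve_saa(generate_scenarios(200), range(50, 151))
print(q, evaluate(q))
```

Keeping scenario generation, the decision model, and the evaluation as separate functions mirrors the modular split described above: any one of them can be swapped out, for example with a different demand model or a real solver, without touching the others.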
String Factorizations Under Various Collision Constraints
In the NP-hard Equality-Free String Factorization problem, we are given a string S and ask whether S can be partitioned into k factors that are pairwise distinct. We describe a randomized algorithm for Equality-Free String Factorization with running time 2^k · k^{O(1)} + O(n), improving over previous algorithms with running time k^{O(k)} + O(n) [Schmid, TCS 2016; Mincu and Popa, Proc. SOFSEM 2020]. Our algorithm works for the generalization of Equality-Free String Factorization where equality can be replaced by an arbitrary polynomial-time computable equivalence relation on strings. We also consider two factorization problems to which this algorithm does not apply, namely Prefix-Free String Factorization, where we ask for a factorization of size k such that no factor is a prefix of another factor, and Substring-Free String Factorization, where we ask for a factorization of size k such that no factor is a substring of another factor. We show that these two problems are NP-hard as well. Then, we show that Prefix-Free String Factorization with the prefix-free relation is fixed-parameter tractable with respect to k by providing a polynomial problem kernel. Finally, we show a generic ILP formulation for R-Free String Factorization, where R is an arbitrary relation on strings. This formulation improves over a previous one for Equality-Free String Factorization in terms of the number of variables.
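To make the three collision constraints concrete, here is an exhaustive reference check for R-Free String Factorization on tiny inputs. It is neither the randomized algorithm, the kernel, nor the ILP from the abstract. It tries every way to cut S into k factors and tests both orders of each pair because R may be asymmetric, as "is a prefix of" is. The function names are illustrative.

```python
from itertools import combinations
from typing import Callable, List, Optional

Relation = Callable[[str, str], bool]

def r_free_factorization(s: str, k: int, related: Relation) -> Optional[List[str]]:
    """Find k factors of s, no two of which are related, by exhaustive search.

    Tries all C(n-1, k-1) cut sets, so it is only usable on tiny inputs. Both
    orders are tested because R may be asymmetric (e.g. "is a prefix of").
    """
    n = len(s)
    if not 1 <= k <= n:
        return None
    for cuts in combinations(range(1, n), k - 1):
        bounds = (0, *cuts, n)
        factors = [s[bounds[i]:bounds[i + 1]] for i in range(k)]
        if not any(related(x, y) or related(y, x) for x, y in combinations(factors, 2)):
            return factors
    return None

def equal(x: str, y: str) -> bool:
    return x == y

def is_prefix(x: str, y: str) -> bool:
    return y.startswith(x)

def is_substring(x: str, y: str) -> bool:
    return x in y

print(r_free_factorization("aab", 2, equal))      # ['a', 'ab']
print(r_free_factorization("aab", 2, is_prefix))  # ['aa', 'b']
print(r_free_factorization("aaaa", 3, equal))     # None
```

The last call returns None because pairwise distinct factors of a unary string must have distinct lengths, and 1 + 2 + 3 = 6 exceeds 4.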