
    Improved Approximability Result for Test Set with Small Redundancy

    Test set with redundancy is one of the focuses of recent bioinformatics research. The set cover greedy algorithm (SGA for short) is a commonly used algorithm for test set with redundancy. This paper proves, using the potential function technique, that the approximation ratio of SGA can be $(2-\frac{1}{2r})\ln n+\frac{3}{2}\ln r+O(\ln\ln n)$. This result is better than the approximation ratio $2\ln n$ that derives directly from set multicover when $r=o(\frac{\ln n}{\ln\ln n})$, and it extends the approximability results for the plain test set problem.
    Comment: 7 pages
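    As a concrete reference point, here is a minimal sketch of the classical greedy rule that SGA applies, stated for plain set multicover with redundancy r (the instance is a toy assumption; the paper's test-set-specific coverage function is not reproduced here):

        from typing import Dict, List, Set

        def greedy_set_multicover(universe: Set[str], sets: Dict[str, Set[str]], r: int) -> List[str]:
            """Greedy set multicover: every element must be covered r times.

            At each step, pick the set that reduces the most residual demand;
            this is the greedy rule whose ratio SGA-style analyses bound.
            """
            demand = {e: r for e in universe}      # residual coverage demand
            chosen: List[str] = []
            available = dict(sets)
            while any(d > 0 for d in demand.values()) and available:
                def gain(s: Set[str]) -> int:
                    return sum(1 for e in s if demand.get(e, 0) > 0)
                name = max(available, key=lambda nm: gain(available[nm]))
                if gain(available[name]) == 0:
                    break                          # remaining demand cannot be met
                for e in available.pop(name):      # each set is used at most once
                    if demand.get(e, 0) > 0:
                        demand[e] -= 1
                chosen.append(name)
            return chosen

        # Toy instance: every element must be covered twice (r = 2).
        U = {"a", "b", "c"}
        S = {"S1": {"a", "b"}, "S2": {"b", "c"}, "S3": {"a", "c"}, "S4": {"a"}}
        print(greedy_set_multicover(U, S, r=2))    # e.g. ['S1', 'S2', 'S3']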

    On optimal approximability results for computing the strong metric dimension

    The strong metric dimension of a graph was first introduced by Seb\"{o} and Tannier (Mathematics of Operations Research, 29(2), 383-393, 2004) as an alternative to the (weak) metric dimension of graphs previously introduced independently by Slater (Proc. 6th Southeastern Conference on Combinatorics, Graph Theory, and Computing, 549-559, 1975) and by Harary and Melter (Ars Combinatoria, 2, 191-195, 1976), and it has since been investigated in several research papers. However, the exact worst-case computational complexity of computing the strong metric dimension has remained open beyond being NP-complete. In this communication, we show that the problem of computing the strong metric dimension of a graph of $n$ nodes:
    - admits a polynomial-time $2$-approximation;
    - admits an $O^\ast\big(2^{0.287n}\big)$-time exact computation algorithm;
    - admits an $O\big(1.2738^k+nk\big)$-time exact computation algorithm if the strong metric dimension is at most $k$;
    - does not admit a polynomial-time $(2-\varepsilon)$-approximation algorithm assuming the Unique Games Conjecture is true;
    - does not admit a polynomial-time $(10\sqrt{5}-21-\varepsilon)$-approximation algorithm assuming P $\neq$ NP;
    - does not admit an $O^\ast\big(2^{o(n)}\big)$-time exact computation algorithm assuming the Exponential Time Hypothesis is true; and
    - does not admit an $O^\ast\big(n^{o(k)}\big)$-time exact computation algorithm, if the strong metric dimension is at most $k$, assuming the Exponential Time Hypothesis is true.
    Comment: revised version based on reviewer comments; to appear in Discrete Applied Mathematics
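    The $O\big(1.2738^k+nk\big)$ term matches the best known vertex cover FPT bound, which hints at the standard reduction used in this literature: the strong metric dimension equals the minimum vertex cover of the "strong resolving graph" on mutually maximally distant vertex pairs (Oellermann and Peters-Fransen, 2007), so a maximal matching on that graph yields a 2-approximation. A minimal sketch of that route for a connected unweighted graph (whether this is exactly the paper's algorithm is an assumption):

        from collections import deque
        from itertools import combinations

        def bfs_dist(adj, s):
            """Single-source shortest-path distances in an unweighted graph."""
            dist = {s: 0}
            q = deque([s])
            while q:
                u = q.popleft()
                for v in adj[u]:
                    if v not in dist:
                        dist[v] = dist[u] + 1
                        q.append(v)
            return dist

        def strong_dim_2approx(adj):
            """2-approximate strong metric dimension via the strong resolving graph.

            u is maximally distant from v if no neighbor of u is farther from v;
            the strong resolving graph joins mutually maximally distant pairs, and
            a greedy maximal matching 2-approximates its minimum vertex cover.
            """
            nodes = list(adj)
            dist = {s: bfs_dist(adj, s) for s in nodes}

            def max_distant(u, v):
                return all(dist[w][v] <= dist[u][v] for w in adj[u])

            srg_edges = [(u, v) for u, v in combinations(nodes, 2)
                         if max_distant(u, v) and max_distant(v, u)]

            cover, used = set(), set()
            for u, v in srg_edges:          # greedy maximal matching
                if u not in used and v not in used:
                    used.update((u, v))
                    cover.update((u, v))
            return cover

        # Toy example: the 4-cycle a-b-c-d.
        adj = {"a": ["b", "d"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c", "a"]}
        print(strong_dim_2approx(adj))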

    Highly Scalable Algorithms for Robust String Barcoding

    String barcoding is a recently introduced technique for genomic-based identification of microorganisms. In this paper we describe the engineering of highly scalable algorithms for robust string barcoding. Our methods enable distinguisher selection based on whole genomic sequences of hundreds of microorganisms of up to bacterial size on a well-equipped workstation, and they can easily be parallelized to extend the applicability range to thousands of bacterial-size genomes. Experimental results on both randomly generated and NCBI genomic data show that whole-genome-based selection yields a number of distinguishers nearly matching the information-theoretic lower bound for the problem.
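    String barcoding asks for a small set of "distinguishers" (substrings) such that every pair of genomes differs on at least one of them; presence/absence then forms a unique barcode per genome. A minimal greedy sketch of that selection (candidate generation and data are toy assumptions, not the paper's engineered pipeline):

        from itertools import combinations

        def greedy_barcoding(genomes, max_len=3):
            """Pick substrings until every genome pair is distinguished.

            A substring distinguishes a pair if it occurs in exactly one of the
            two genomes; greedy takes the candidate separating the most pairs.
            """
            candidates = {g[i:i + k] for g in genomes
                          for k in range(1, max_len + 1)
                          for i in range(len(g) - k + 1)}
            pairs = set(combinations(range(len(genomes)), 2))
            barcode = []
            while pairs:
                def separated(s):
                    return {p for p in pairs
                            if (s in genomes[p[0]]) != (s in genomes[p[1]])}
                best = max(candidates, key=lambda s: len(separated(s)))
                sep = separated(best)
                if not sep:
                    raise ValueError("some genomes are indistinguishable")
                pairs -= sep
                barcode.append(best)
                candidates.discard(best)
            return barcode

        genomes = ["ACGT", "ACCA", "TGGT", "GGCA"]
        print(greedy_barcoding(genomes))

    Since $d$ distinguishers induce at most $2^d$ distinct barcodes, any feasible solution needs at least $\lceil\log_2 n\rceil$ of them for $n$ genomes, which is the information-theoretic lower bound referred to above.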

    On Approximability of Clustering Problems Without Candidate Centers

    The k-means objective is arguably the most widely-used cost function for modeling clustering tasks in a metric space. In practice and historically, k-means is thought of in a continuous setting, namely where the centers can be located anywhere in the metric space. For example, the popular Lloyd's heuristic locates a center at the mean of each cluster. Despite persistent efforts on understanding the approximability of k-means, and other classic clustering problems such as k-median and k-minsum, our knowledge of the hardness of approximation factors of these problems remains quite poor. In this paper, we significantly improve upon the hardness of approximation factors known in the literature for these objectives. We show that if the input lies in a general metric space, it is NP-hard to approximate:
    - Continuous k-median to a factor of $2-o(1)$; this improves upon the previous inapproximability factor of 1.36 shown by Guha and Khuller (J. Algorithms '99).
    - Continuous k-means to a factor of $4-o(1)$; this improves upon the previous inapproximability factor of 2.10 shown by Guha and Khuller (J. Algorithms '99).
    - k-minsum to a factor of $1.415$; this improves upon the APX-hardness shown by Guruswami and Indyk (SODA '03).
    Our results shed new and perhaps counter-intuitive light on the differences between clustering problems in the continuous setting versus the discrete setting (where the candidate centers are given as part of the input).
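    For reference, the continuous setting's canonical algorithm, Lloyd's heuristic mentioned above, in a minimal NumPy sketch (random initialization for brevity; practical implementations use k-means++-style seeding):

        import numpy as np

        def lloyd(points: np.ndarray, k: int, iters: int = 100, seed: int = 0):
            """Lloyd's heuristic for continuous k-means in Euclidean space.

            Alternates: assign each point to its nearest center, then move each
            center to the mean of its cluster -- the continuous-setting step
            that has no analogue when centers must come from a candidate set.
            """
            rng = np.random.default_rng(seed)
            centers = points[rng.choice(len(points), size=k, replace=False)]
            labels = np.zeros(len(points), dtype=int)
            for _ in range(iters):
                # distance matrix (n_points, k); assign to the nearest center
                d = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
                labels = d.argmin(axis=1)
                new_centers = np.array([points[labels == j].mean(axis=0)
                                        if np.any(labels == j) else centers[j]
                                        for j in range(k)])
                if np.allclose(new_centers, centers):
                    break
                centers = new_centers
            return centers, labels

        rng = np.random.default_rng(1)
        pts = np.vstack([rng.normal(size=(50, 2)), rng.normal(size=(50, 2)) + 5.0])
        centers, labels = lloyd(pts, k=2)
        print(centers)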

    A Birthday Repetition Theorem and Complexity of Approximating Dense CSPs

    A $(k \times l)$-birthday repetition $\mathcal{G}^{k \times l}$ of a two-prover game $\mathcal{G}$ is a game in which the two provers are sent random sets of questions from $\mathcal{G}$ of sizes $k$ and $l$ respectively. These two sets are sampled independently and uniformly among all sets of questions of those particular sizes. We prove the following birthday repetition theorem: when $\mathcal{G}$ satisfies some mild conditions, $val(\mathcal{G}^{k \times l})$ decreases exponentially in $\Omega(kl/n)$, where $n$ is the total number of questions. Our result positively resolves an open question posed by Aaronson, Impagliazzo and Moshkovitz (CCC 2014). As an application of our birthday repetition theorem, we obtain new fine-grained hardness of approximation results for dense CSPs. Specifically, we establish a tight trade-off between running time and approximation ratio for dense CSPs by showing conditional lower bounds, integrality gaps and approximation algorithms. In particular, for any sufficiently large $i$ and for every $k \geq 2$, we show the following results:
    - We exhibit an $O(q^{1/i})$-approximation algorithm for dense Max $k$-CSPs with alphabet size $q$ via an $O_k(i)$-level Sherali-Adams relaxation.
    - Through our birthday repetition theorem, we obtain an integrality gap of $q^{1/i}$ for the $\tilde\Omega_k(i)$-level Lasserre relaxation for fully-dense Max $k$-CSP.
    - Assuming that there is a constant $\epsilon > 0$ such that Max 3SAT cannot be approximated to within $(1-\epsilon)$ of the optimal in sub-exponential time, our birthday repetition theorem implies that any algorithm that approximates fully-dense Max $k$-CSP to within a $q^{1/i}$ factor takes $(nq)^{\tilde\Omega_k(i)}$ time, almost tightly matching the algorithmic result based on the Sherali-Adams relaxation.
    Comment: 45 pages
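    The name reflects the birthday paradox: if $S$ and $T$ are independent uniform question sets of sizes $k$ and $l$ out of $n$, a short calculation (an illustration, not taken from the paper) bounds the probability that the two sets collide, letting the verifier cross-check the provers' answers:

        \Pr[S \cap T \neq \emptyset] \;=\; 1 - \binom{n-k}{l} \Big/ \binom{n}{l} \;\geq\; 1 - \Bigl(1 - \frac{k}{n}\Bigr)^{l} \;\geq\; 1 - e^{-kl/n}

    so the sets intersect with probability $\Omega(kl/n)$ whenever $kl \leq n$, matching the rate at which the repeated game's value is shown to decay.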

    Approximation algorithms for two-machine flow-shop scheduling with a conflict graph

    Path cover is a well-known intractable problem that asks for a minimum number of vertex-disjoint paths in a given graph that together cover all the vertices. We show that a variant, where the objective function is not the number of paths but the number of length-$0$ paths (that is, isolated vertices), turns out to be polynomial-time solvable. We further show that another variant, where the objective function is the total number of length-$0$ and length-$1$ paths, is also polynomial-time solvable. Both variants find applications in approximating the two-machine flow-shop scheduling problem in which job processing has constraints formulated as a conflict graph. For unit jobs, we present a $4/3$-approximation algorithm for the scheduling problem with an arbitrary conflict graph, based on the exact algorithms for the path cover variants. For arbitrary jobs where the conflict graph is the union of two disjoint cliques, that is, all the jobs can be partitioned into two groups such that the jobs within a group are pairwise conflicting, we present a simple $3/2$-approximation algorithm.
    Comment: 15 pages, 2 figures
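    As a tiny illustration of the two objectives, a brute-force enumeration (toy code, not the paper's polynomial-time algorithms) over all path covers of the star $K_{1,3}$: the best cover uses 2 paths and must leave 1 vertex isolated, and it is the isolated-vertex count, not the path count, that the abstract says is polynomial-time minimizable in general:

        from itertools import combinations

        def path_forests(nodes, edges):
            """Yield every edge subset whose components are vertex-disjoint paths."""
            for r in range(len(edges) + 1):
                for sub in combinations(edges, r):
                    deg = {v: 0 for v in nodes}
                    parent = {v: v for v in nodes}

                    def find(v):
                        while parent[v] != v:
                            parent[v] = parent[parent[v]]
                            v = parent[v]
                        return v

                    ok = True
                    for u, v in sub:
                        deg[u] += 1
                        deg[v] += 1
                        ru, rv = find(u), find(v)
                        if ru == rv or deg[u] > 2 or deg[v] > 2:
                            ok = False     # a cycle or a branching: not paths
                            break
                        parent[ru] = rv
                    if ok:
                        yield sub, deg     # paths in cover = |V| - |sub|

        # Star K_{1,3}: any path cover leaves at least one leaf isolated.
        nodes = ["c", "x", "y", "z"]
        edges = [("c", "x"), ("c", "y"), ("c", "z")]
        best_paths = min(len(nodes) - len(s) for s, _ in path_forests(nodes, edges))
        best_isolated = min(sum(1 for v in nodes if d[v] == 0)
                            for _, d in path_forests(nodes, edges))
        print(best_paths, best_isolated)   # 2 paths minimum, 1 isolated minimum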

    Greed is Still Good: Maximizing Monotone Submodular+Supermodular Functions

    We analyze the performance of the greedy algorithm, and also of a discrete semi-gradient based algorithm, for maximizing the sum of a suBmodular and suPermodular (BP) function (both of which are non-negative monotone non-decreasing) under two types of constraints: either a cardinality constraint or $p\geq 1$ matroid independence constraints. These problems occur naturally in several real-world applications in data science, machine learning, and artificial intelligence. The problems are ordinarily inapproximable to any factor (as we show). Using the curvature $\kappa_f$ of the submodular term, and introducing $\kappa^g$ for the supermodular term (a natural dual curvature for supermodular functions), however, both of which are computable in linear time, we show that BP maximization can be efficiently approximated by both the greedy and the semi-gradient based algorithm. The algorithms yield multiplicative guarantees of $\frac{1}{\kappa_f}\left[1-e^{-(1-\kappa^g)\kappa_f}\right]$ and $\frac{1-\kappa^g}{(1-\kappa^g)\kappa_f + p}$ for the two types of constraints respectively. For pure monotone supermodular constrained maximization, these yield $1-\kappa^g$ and $(1-\kappa^g)/p$ for the two types of constraints respectively. We also analyze the hardness of BP maximization and show that our guarantees match the hardness up to a constant factor and up to an $O(\ln(p))$ factor, respectively. Computational experiments supporting our analysis are also provided.
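    A minimal sketch of the cardinality-constrained greedy that the guarantee $\frac{1}{\kappa_f}\left[1-e^{-(1-\kappa^g)\kappa_f}\right]$ applies to, with a toy BP objective (coverage is submodular; a convex-in-cardinality pair bonus is supermodular; both monotone; the oracle and data are assumptions for illustration):

        def greedy_bp(ground, h, k):
            """Greedy maximization of h(S) = f(S) + g(S) subject to |S| <= k.

            h is a set-function oracle; greedy repeatedly adds the element
            with the largest marginal gain h(S + e) - h(S).
            """
            S = set()
            while len(S) < k:
                gains = {e: h(S | {e}) - h(S) for e in ground - S}
                e = max(gains, key=gains.get)
                if gains[e] <= 0:       # never triggers for monotone h; safety
                    break
                S.add(e)
            return S

        # Toy BP objective: f = coverage (submodular), g = pair bonus
        # (supermodular, since it is convex in |S|); both monotone.
        cover = {1: {"a", "b"}, 2: {"b", "c"}, 3: {"c"}, 4: {"d"}}

        def h(S):
            f = len(set().union(*(cover[e] for e in S))) if S else 0
            g = 0.1 * len(S) * (len(S) - 1) / 2
            return f + g

        print(greedy_bp(set(cover), h, k=2))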

    New Algorithms and Lower Bounds for Sequential-Access Data Compression

    This thesis concerns sequential-access data compression, i.e., compression by algorithms that read the input one or more times from beginning to end. In one chapter we consider adaptive prefix coding, for which we must read the input character by character, outputting each character's self-delimiting codeword before reading the next one. We show how to encode and decode each character in constant worst-case time while producing an encoding whose length is worst-case optimal. In another chapter we consider one-pass compression with memory bounded in terms of the alphabet size and context length, and prove a nearly tight tradeoff between the amount of memory we can use and the quality of the compression we can achieve. In a third chapter we consider compression in the read/write streams model, which allows a number of passes and an amount of memory that are both polylogarithmic in the size of the input. We first show how to achieve universal compression using only one pass over one stream. We then show that one stream is not sufficient for achieving good grammar-based compression. Finally, we show that two streams are necessary and sufficient for achieving entropy-only bounds.
    Comment: draft of PhD thesis
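    To make "adaptive prefix coding" concrete, here is a minimal sketch that rebuilds a Huffman code from running symbol counts before each character; this naive per-character rebuild is nowhere near the thesis's constant-worst-case-time construction and only illustrates the setting:

        import heapq
        from collections import Counter

        def huffman_code(counts):
            """Huffman code from symbol counts (ties broken by symbol name)."""
            heap = [(c, sym, {sym: ""}) for sym, c in sorted(counts.items())]
            heapq.heapify(heap)
            while len(heap) > 1:
                c1, s1, code1 = heapq.heappop(heap)
                c2, s2, code2 = heapq.heappop(heap)
                merged = {s: "0" + w for s, w in code1.items()}
                merged.update({s: "1" + w for s, w in code2.items()})
                heapq.heappush(heap, (c1 + c2, min(s1, s2), merged))
            return heap[0][2]

        def adaptive_encode(text, alphabet):
            """Adaptive prefix coding: the codeword for position i depends only
            on text[:i], so a decoder maintaining the same counts stays in sync."""
            counts = Counter({a: 1 for a in alphabet})  # every symbol starts known
            out = []
            for ch in text:
                out.append(huffman_code(counts)[ch])    # emit, then update model
                counts[ch] += 1
            return "".join(out)

        print(adaptive_encode("abracadabra", alphabet="abcdr"))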

    Phylogenetic CSPs are Approximation Resistant

    We study the approximability of a broad class of computational problems -- originally motivated in evolutionary biology and phylogenetic reconstruction -- concerning the aggregation of potentially inconsistent (local) information about $n$ items of interest, and we present optimal hardness of approximation results under the Unique Games Conjecture. The class of problems studied here can be described as Constraint Satisfaction Problems (CSPs) over infinite domains, where instead of values $\{0,1\}$ or a fixed-size domain, the variables can be mapped to any of the $n$ leaves of a phylogenetic tree. The topology of the tree then determines whether a given constraint on the variables is satisfied or not, and the resulting CSPs are called Phylogenetic CSPs. Prominent examples of Phylogenetic CSPs with a long history and applications in various disciplines include Triplet Reconstruction, Quartet Reconstruction, and Subtree Aggregation (Forbidden or Desired). For example, in Triplet Reconstruction we are given $m$ triplets of the form $ij|k$ (indicating that ``items $i,j$ are more similar to each other than to $k$'') and we want to construct a hierarchical clustering on the $n$ items that respects the constraints as much as possible. Despite more than four decades of research, the basic question of maximizing the number of satisfied constraints is not well understood. The current best approximation is achieved by outputting a random tree (for triplets, this achieves a $1/3$-approximation). Our main result is that every Phylogenetic CSP is approximation resistant, i.e., there is no polynomial-time algorithm that does asymptotically better than a (biased) random assignment. This generalizes the results of Guruswami, Hastad, Manokaran, Raghavendra, and Charikar (2011), who showed that ordering CSPs are approximation resistant (e.g., Max Acyclic Subgraph, Betweenness).
    Comment: 45 pages, 11 figures, abstract shortened for arXiv
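    A minimal sketch of the triplet objective and the random-tree baseline (toy tuple representation; the tree-building distribution here is an assumption, chosen only to illustrate the objective): a triplet $ij|k$ is satisfied exactly when some subtree contains $i$ and $j$ but not $k$.

        import random

        def random_binary_tree(leaves, rng):
            """Build a random binary tree by repeatedly merging two subtrees."""
            forest = [(leaf,) for leaf in leaves]
            while len(forest) > 1:
                i, j = rng.sample(range(len(forest)), 2)
                a, b = forest[i], forest[j]
                forest = [t for t in forest if t is not a and t is not b] + [(a, b)]
            return forest[0]

        def leaves_of(tree):
            if len(tree) == 1:                  # leaf: (item,)
                return {tree[0]}
            return leaves_of(tree[0]) | leaves_of(tree[1])

        def satisfies(tree, triplet):
            """ij|k is satisfied iff some subtree contains i and j but not k."""
            i, j, k = triplet
            if len(tree) == 1:
                return False
            for child in tree:
                s = leaves_of(child)
                if {i, j} <= s and k not in s:
                    return True
                if {i, j, k} <= s:              # all three went one way: recurse
                    return satisfies(child, triplet)
            return False                        # i and j split with k present

        rng = random.Random(0)
        items = list(range(6))
        triplets = [(0, 1, 2), (2, 3, 4), (0, 5, 3), (1, 4, 5)]
        tree = random_binary_tree(items, rng)
        print(sum(satisfies(tree, t) for t in triplets) / len(triplets))

    Averaged over random trees, each triplet is satisfied about a third of the time, which is the $1/3$ random-tree baseline the abstract refers to.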