81 research outputs found
Improved Approximability Result for Test Set with Small Redundancy
Test set with redundancy is one of the focuses of recent bioinformatics
research. The set cover greedy algorithm (SGA for short) is a commonly used
algorithm for test set with redundancy. This paper proves an improved bound
on the approximation ratio of SGA using the potential function technique.
This result is better than the approximation ratio that derives directly
from set multicover for a range of redundancy parameters, and is an
extension of the approximability results for the plain test set problem.
Comment: 7 pages
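As a rough illustration of the greedy strategy, here is a minimal sketch of SGA in the plain test set setting (no redundancy), where the chosen tests must jointly distinguish every pair of items. All names are illustrative, not the paper's implementation.

```python
# Hypothetical sketch of the set cover greedy algorithm (SGA) for the plain
# test set problem: repeatedly pick the test that distinguishes the most
# currently-indistinguishable pairs of items.

from itertools import combinations

def sga(items, tests):
    """items: list of hashable ids; tests: dict name -> subset of items.
    Returns test names whose answers pairwise distinguish all items."""
    # A test T distinguishes the pair (u, v) iff exactly one of u, v is in T.
    unresolved = {(u, v) for u, v in combinations(items, 2)}
    chosen = []
    while unresolved:
        # Greedy step: test resolving the largest number of unresolved pairs.
        name, resolved = max(
            ((n, {p for p in unresolved if (p[0] in t) != (p[1] in t)})
             for n, t in tests.items()),
            key=lambda x: len(x[1]))
        if not resolved:           # no test helps: instance is infeasible
            raise ValueError("items cannot be fully distinguished")
        chosen.append(name)
        unresolved -= resolved
    return chosen
```

On ties, Python's `max` keeps the first maximal candidate, so the output depends on the iteration order of `tests`.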
On optimal approximability results for computing the strong metric dimension
The strong metric dimension of a graph was first introduced by Seb\"{o} and
Tannier (Mathematics of Operations Research, 29(2), 383-393, 2004) as an
alternative to the (weak) metric dimension of graphs previously introduced
independently by Slater (Proc. 6th Southeastern Conference on Combinatorics,
Graph Theory, and Computing, 549-559, 1975) and by Harary and Melter (Ars
Combinatoria, 2, 191-195, 1976), and has since been investigated in several
research papers. However, the exact worst-case computational complexity of
computing the strong metric dimension has remained open beyond being
NP-complete. In this communication, we show that the problem of computing the
strong metric dimension of a graph on n nodes admits a polynomial-time
approximation algorithm, admits an exponential-time exact computation
algorithm, admits a faster exact computation algorithm if the strong metric
dimension is small, does not admit a polynomial-time approximation algorithm
below a certain factor assuming the unique games conjecture is true, does not
admit a polynomial-time approximation algorithm below a (weaker) factor
assuming P ≠ NP, does not admit a sub-exponential-time exact computation
algorithm assuming the exponential time hypothesis is true, and does not
admit a faster exact computation algorithm even if the strong metric
dimension is small, again assuming the exponential time hypothesis is true.
Comment: revised version based on reviewer comments; to appear in Discrete
Applied Mathematics
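For intuition, the strong metric dimension can be computed on tiny graphs directly from its definition: a vertex w strongly resolves a pair u, v if u lies on a shortest v–w path or v lies on a shortest u–w path. The brute-force sketch below (illustrative names) is exactly the kind of exponential enumeration that the exact algorithms above improve upon.

```python
# Brute-force strong metric dimension on tiny unweighted graphs, straight
# from the definition. Exponential in the number of vertices; for
# illustration only.

from itertools import combinations

def dists(adj):
    """All-pairs shortest path lengths via BFS."""
    d = {}
    for s in adj:
        d[s] = {s: 0}
        frontier = [s]
        while frontier:
            nxt = []
            for u in frontier:
                for v in adj[u]:
                    if v not in d[s]:
                        d[s][v] = d[s][u] + 1
                        nxt.append(v)
            frontier = nxt
    return d

def strong_metric_dimension(adj):
    d = dists(adj)
    V = list(adj)

    def resolves(w, u, v):
        # u on a shortest v-w path, or v on a shortest u-w path
        return (d[w][u] == d[w][v] + d[v][u]) or (d[w][v] == d[w][u] + d[u][v])

    pairs = list(combinations(V, 2))
    for k in range(1, len(V) + 1):
        for S in combinations(V, k):
            if all(any(resolves(w, u, v) for w in S) for u, v in pairs):
                return k
    return len(V)
```

Sanity checks: a path has strong metric dimension 1 (one endpoint resolves every pair), and the complete graph K_n has dimension n - 1 (the chosen set must hit every pair).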
Highly Scalable Algorithms for Robust String Barcoding
String barcoding is a recently introduced technique for genomic-based
identification of microorganisms. In this paper we describe the engineering of
highly scalable algorithms for robust string barcoding. Our methods enable
distinguisher selection based on whole genomic sequences of hundreds of
microorganisms of up to bacterial size on a well-equipped workstation, and can
be easily parallelized to further extend the applicability range to thousands
of bacterial size genomes. Experimental results on both randomly generated and
NCBI genomic data show that whole-genome based selection results in a number of
distinguishers nearly matching the information theoretic lower bounds for the
problem.
On Approximability of Clustering Problems Without Candidate Centers
The k-means objective is arguably the most widely-used cost function for
modeling clustering tasks in a metric space. In practice and historically,
k-means is thought of in a continuous setting, namely where the centers can be
located anywhere in the metric space. For example, the popular Lloyd's
heuristic locates a center at the mean of each cluster.
Despite persistent efforts on understanding the approximability of k-means,
and other classic clustering problems such as k-median and k-minsum, our
knowledge of the hardness of approximation factors of these problems remains
quite poor. In this paper, we significantly improve upon the hardness of
approximation factors known in the literature for these objectives. We show
that if the input lies in a general metric space, it is NP-hard to approximate:
- Continuous k-median, improving upon the previous inapproximability factor
of 1.36 shown by Guha and Khuller (J. Algorithms '99).
- Continuous k-means, improving upon the previous inapproximability factor
of 2.10 shown by Guha and Khuller (J. Algorithms '99).
- k-minsum, improving upon the APX-hardness shown by Guruswami and Indyk
(SODA '03).
Our results shed new and perhaps counter-intuitive light on the differences
between clustering problems in the continuous setting versus the discrete
setting (where the candidate centers are given as part of the input).
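The Lloyd's heuristic mentioned above can be sketched in a few lines for points on a line; a minimal illustration of the continuous setting, where centers are placed at cluster means rather than restricted to input points.

```python
# Minimal 1-D sketch of Lloyd's heuristic for the continuous k-means
# objective: alternate assigning points to nearest centers and moving each
# center to its cluster's mean.

def lloyd(points, centers, iters=20):
    for _ in range(iters):
        clusters = [[] for _ in centers]
        for p in points:                       # assignment step
            i = min(range(len(centers)), key=lambda i: (p - centers[i]) ** 2)
            clusters[i].append(p)
        centers = [sum(c) / len(c) if c else centers[i]   # update: cluster mean
                   for i, c in enumerate(clusters)]
    return centers

def cost(points, centers):                     # k-means objective
    return sum(min((p - c) ** 2 for c in centers) for p in points)
```

On two well-separated groups, the heuristic converges to the two group means, which is also the continuous optimum for this instance.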
A Birthday Repetition Theorem and Complexity of Approximating Dense CSPs
A (k, l)-birthday repetition of a two-prover game G is a game in which the
two provers are sent random sets of questions from G of sizes k and l
respectively. These two sets are sampled independently and uniformly among
all sets of questions of those particular sizes. We prove the following
birthday repetition theorem: when G satisfies some mild conditions, the value
of its (k, l)-birthday repetition decreases exponentially in kl/n, where n is
the total number of questions. Our result positively resolves an open
question posed by Aaronson, Impagliazzo and Moshkovitz (CCC 2014).
As an application of our birthday repetition theorem, we obtain new
fine-grained hardness of approximation results for dense CSPs. Specifically, we
establish a tight trade-off between running time and approximation ratio for
dense CSPs by showing conditional lower bounds, integrality gaps and
approximation algorithms. In particular, for any sufficiently large
parameter i and for every k, we show the following results:
- We exhibit an approximation algorithm for dense Max k-CSPs with bounded
alphabet size via a bounded number of levels of the Sherali-Adams relaxation.
- Through our birthday repetition theorem, we obtain a matching integrality
gap for a corresponding number of levels of the Lasserre relaxation for
fully-dense Max k-CSP.
- Assuming that there is a constant ε > 0 such that Max 3SAT cannot be
approximated to within a factor (1 - ε) of the optimal in sub-exponential
time, our birthday repetition theorem implies that any algorithm that
approximates fully-dense Max k-CSP to within the same factor requires a
running time that almost tightly matches the algorithmic result based on the
Sherali-Adams relaxation.
Comment: 45 pages
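The birthday-paradox phenomenon behind the theorem can be checked exactly: two independent uniform question sets of sizes k and l out of n intersect with probability roughly 1 - e^(-kl/n). A small sketch (illustrative function name):

```python
# Exact intersection probability for two independent uniform random subsets,
# the birthday-paradox fact underlying birthday repetition: with set sizes
# around sqrt(n), the provers' question sets overlap with constant probability.

from math import comb

def intersect_prob(n, k, l):
    """P[two uniform random subsets of sizes k and l of an n-set intersect]."""
    # The second set avoids the first iff all l of its elements are drawn
    # from the n - k elements outside the first set.
    return 1 - comb(n - k, l) / comb(n, l)

p = intersect_prob(100, 10, 10)
# close to the heuristic estimate 1 - e^(-10*10/100) ≈ 0.632
```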
Approximation algorithms for two-machine flow-shop scheduling with a conflict graph
Path cover is a well-known intractable problem that finds a minimum number of
vertex disjoint paths in a given graph to cover all the vertices. We show that
a variant, where the objective function is not the number of paths but the
number of length-0 paths (that is, isolated vertices), turns out to be
polynomial-time solvable. We further show that another variant, where the
objective function is the total number of length-0 and length-1 paths, is
also polynomial-time solvable. Both variants find applications in
approximating the two-machine flow-shop scheduling problem in which job
processing has constraints that are formulated as a conflict graph. For unit
jobs, we present a constant-factor approximation algorithm for the
scheduling problem with an arbitrary conflict graph, based on the exact
algorithms for the variants of the path cover problem. For arbitrary jobs
where the conflict graph is the union of two disjoint cliques, that is, all
the jobs can be partitioned into two groups such that the jobs within a
group are pairwise conflicting, we present a simple constant-factor
approximation algorithm.
Comment: 15 pages, 2 figures
Greed is Still Good: Maximizing Monotone Submodular+Supermodular Functions
We analyze the performance of the greedy algorithm, and also a discrete
semi-gradient based algorithm, for maximizing the sum of a suBmodular and
suPermodular (BP) function (both of which are non-negative monotone
non-decreasing) under two types of constraints, either a cardinality
constraint or matroid independence constraints. These problems occur
naturally in several real-world applications in data science, machine
learning, and artificial intelligence. The problems are ordinarily
inapproximable to any factor (as we show). Using the curvature of the
submodular term, and introducing a natural dual notion of curvature for the
supermodular term, however, both of which are computable in linear time, we
show that BP maximization can be efficiently approximated by both the greedy
and the semi-gradient based algorithms, with multiplicative guarantees
expressed in terms of the two curvatures for the two types of constraints
respectively. For pure monotone supermodular constrained maximization, these
specialize to guarantees depending only on the supermodular curvature. We
also analyze the hardness of BP maximization and show that our guarantees
nearly match the corresponding hardness bounds for the two types of
constraints. Computational experiments are also provided supporting our
analysis.
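In the classical special case where the supermodular term vanishes, the greedy algorithm analyzed above reduces to standard monotone submodular maximization under a cardinality constraint. A minimal coverage-function sketch (illustrative names):

```python
# Greedy maximization of a monotone submodular function (here, set coverage)
# under a cardinality constraint: repeatedly add the set with the largest
# marginal gain in covered elements.

def greedy_max_coverage(sets, budget):
    """Pick up to `budget` sets greedily maximizing newly covered elements."""
    covered, chosen = set(), []
    for _ in range(budget):
        name, gain = max(((n, s - covered) for n, s in sets.items()),
                         key=lambda x: len(x[1]))
        if not gain:               # no remaining set adds anything
            break
        chosen.append(name)
        covered |= gain
    return chosen, covered
```

For coverage functions this greedy rule achieves the familiar (1 - 1/e) guarantee, the curvature-free endpoint of the bounds in the abstract.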
New Algorithms and Lower Bounds for Sequential-Access Data Compression
This thesis concerns sequential-access data compression, i.e., by algorithms
that read the input one or more times from beginning to end. In one chapter we
consider adaptive prefix coding, for which we must read the input character by
character, outputting each character's self-delimiting codeword before reading
the next one. We show how to encode and decode each character in constant
worst-case time while producing an encoding whose length is worst-case optimal.
In another chapter we consider one-pass compression with memory bounded in
terms of the alphabet size and context length, and prove a nearly tight
tradeoff between the amount of memory we can use and the quality of the
compression we can achieve. In a third chapter we consider compression in the
read/write streams model, which allows a number of passes and an amount of
memory both polylogarithmic in the size of the input. We first show how to
achieve universal compression using only one pass over one stream. We then
show that one stream is not sufficient for achieving good grammar-based
compression. Finally, we show that two streams are necessary and sufficient
for achieving entropy-only bounds.
Comment: draft of PhD thesis
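As a hedged illustration of the self-delimiting codewords used in the adaptive prefix coding chapter (this is Elias gamma coding, a standard example, not the thesis's adaptive scheme): a decoder knows where each codeword ends without any separator.

```python
# Elias gamma codes: a simple self-delimiting prefix code for positive
# integers. The unary run of zeros announces the codeword's length, so
# concatenated codewords can be split unambiguously.

def gamma_encode(n):                 # n >= 1
    b = bin(n)[2:]                   # binary representation, no leading zeros
    return "0" * (len(b) - 1) + b    # unary length prefix + binary value

def gamma_decode_stream(bits):
    """Decode a concatenation of gamma codewords back into integers."""
    out, i = [], 0
    while i < len(bits):
        z = 0
        while bits[i] == "0":        # count leading zeros = length - 1
            z += 1
            i += 1
        out.append(int(bits[i:i + z + 1], 2))
        i += z + 1
    return out

stream = "".join(gamma_encode(n) for n in [1, 5, 3])
# stream == "1" + "00101" + "011"; decodes back to [1, 5, 3]
```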
Phylogenetic CSPs are Approximation Resistant
We study the approximability of a broad class of computational problems --
originally motivated in evolutionary biology and phylogenetic reconstruction --
concerning the aggregation of potentially inconsistent (local) information
about items of interest, and we present optimal hardness of approximation
results under the Unique Games Conjecture. The class of problems studied here
can be described as Constraint Satisfaction Problems (CSPs) over infinite
domains, where, instead of taking values in a fixed-size domain, the
variables can be mapped to any of the leaves of a phylogenetic tree. The
topology of the tree then determines whether a given constraint on the
variables is satisfied or not, and the resulting CSPs are called Phylogenetic
CSPs. Prominent examples of Phylogenetic CSPs with a long history and
applications in various disciplines include: Triplet Reconstruction, Quartet
Reconstruction, Subtree Aggregation (Forbidden or Desired). For example, in
Triplet Reconstruction, we are given triplets of the form ab|c (indicating
that ``items a and b are more similar to each other than to c'') and we want
to construct a hierarchical clustering on the items that
respects the constraints as much as possible. Despite more than four decades of
research, the basic question of maximizing the number of satisfied constraints
is not well-understood. The current best approximation is achieved by
outputting a random tree (for triplets, this achieves a 1/3 approximation). Our
main result is that every Phylogenetic CSP is approximation resistant, i.e.,
there is no polynomial-time algorithm that does asymptotically better than a
(biased) random assignment. This is a generalization of the results in
Guruswami, Hastad, Manokaran, Raghavendra, and Charikar (2011), who showed that
ordering CSPs are approximation resistant (e.g., Max Acyclic Subgraph,
Betweenness).
Comment: 45 pages, 11 figures, abstract shortened for arXiv
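The notion of a hierarchy satisfying a triplet can be made concrete: ab|c is satisfied exactly when some cluster of the hierarchy contains a and b but not c. A small sketch under that standard definition (illustrative names, trees as nested tuples):

```python
# Counting satisfied triplet constraints for a hierarchical clustering.
# A triplet (a, b, c), read as ab|c, is satisfied iff some cluster
# (subtree leaf set) contains a and b but not c.

def clusters(tree):
    """All subtree leaf sets of a tree given as nested tuples of leaf names."""
    if isinstance(tree, str):
        return [{tree}]
    subs = [c for child in tree for c in clusters(child)]
    # the last cluster returned for each child is that child's full leaf set
    return subs + [set().union(*(clusters(child)[-1] for child in tree))]

def satisfied(tree, triplets):
    cl = clusters(tree)
    return sum(any(a in c and b in c and x not in c for c in cl)
               for a, b, x in triplets)
```

For the balanced tree ((a,b),(c,d)), the triplets ab|c and cd|a are satisfied while ac|b is not, illustrating that no tree can satisfy conflicting triplets simultaneously.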