81 research outputs found
Improved Approximability Result for Test Set with Small Redundancy
Test set with redundancy is one of the focuses of recent bioinformatics
research. The set cover greedy algorithm (SGA for short) is a commonly used
algorithm for test set with redundancy. This paper proves an improved bound
on the approximation ratio of SGA using the potential function technique.
This result is better than the approximation ratio that derives directly
from set multicover for a range of redundancy parameters, and is an
extension of the approximability results for the plain test set problem.
Comment: 7 pages
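As a rough illustration of the greedy strategy, here is a minimal sketch of SGA in the plain test set setting (no redundancy), where the chosen tests must jointly distinguish every pair of items. All names are illustrative, not the paper's implementation.

```python
# Hypothetical sketch of the set cover greedy algorithm (SGA) for the plain
# test set problem: repeatedly pick the test that distinguishes the most
# currently-indistinguishable pairs of items.

from itertools import combinations

def sga(items, tests):
    """items: list of hashable ids; tests: dict name -> subset of items.
    Returns test names whose answers pairwise distinguish all items."""
    # A test T distinguishes the pair (u, v) iff exactly one of u, v is in T.
    unresolved = {(u, v) for u, v in combinations(items, 2)}
    chosen = []
    while unresolved:
        # Greedy step: test resolving the largest number of unresolved pairs.
        name, resolved = max(
            ((n, {p for p in unresolved if (p[0] in t) != (p[1] in t)})
             for n, t in tests.items()),
            key=lambda x: len(x[1]))
        if not resolved:           # no test helps: instance is infeasible
            raise ValueError("items cannot be fully distinguished")
        chosen.append(name)
        unresolved -= resolved
    return chosen
```

On ties, Python's `max` keeps the first maximal candidate, so the output depends on the iteration order of `tests`.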
On optimal approximability results for computing the strong metric dimension
The strong metric dimension of a graph was first introduced by Seb\"{o} and
Tannier (Mathematics of Operations Research, 29(2), 383-393, 2004) as an
alternative to the (weak) metric dimension of graphs previously introduced
independently by Slater (Proc. 6th Southeastern Conference on Combinatorics,
Graph Theory, and Computing, 549-559, 1975) and by Harary and Melter (Ars
Combinatoria, 2, 191-195, 1976), and has since been investigated in several
research papers. However, the exact worst-case computational complexity of
computing the strong metric dimension has remained open beyond being
NP-complete. In this communication, we show that the problem of computing the
strong metric dimension of a graph on n nodes admits a polynomial-time
approximation algorithm, admits an exponential-time exact computation
algorithm, admits a faster exact computation algorithm if the strong metric
dimension is small, does not admit a polynomial-time approximation algorithm
below a certain factor assuming the unique games conjecture is true, does not
admit a polynomial-time approximation algorithm below a (weaker) factor
assuming P ≠ NP, does not admit a sub-exponential-time exact computation
algorithm assuming the exponential time hypothesis is true, and does not
admit a faster exact computation algorithm even if the strong metric
dimension is small, again assuming the exponential time hypothesis is true.
Comment: revised version based on reviewer comments; to appear in Discrete
Applied Mathematics
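For intuition, the strong metric dimension can be computed on tiny graphs directly from its definition: a vertex w strongly resolves a pair u, v if u lies on a shortest v–w path or v lies on a shortest u–w path. The brute-force sketch below (illustrative names) is exactly the kind of exponential enumeration that the exact algorithms above improve upon.

```python
# Brute-force strong metric dimension on tiny unweighted graphs, straight
# from the definition. Exponential in the number of vertices; for
# illustration only.

from itertools import combinations

def dists(adj):
    """All-pairs shortest path lengths via BFS."""
    d = {}
    for s in adj:
        d[s] = {s: 0}
        frontier = [s]
        while frontier:
            nxt = []
            for u in frontier:
                for v in adj[u]:
                    if v not in d[s]:
                        d[s][v] = d[s][u] + 1
                        nxt.append(v)
            frontier = nxt
    return d

def strong_metric_dimension(adj):
    d = dists(adj)
    V = list(adj)

    def resolves(w, u, v):
        # u on a shortest v-w path, or v on a shortest u-w path
        return (d[w][u] == d[w][v] + d[v][u]) or (d[w][v] == d[w][u] + d[u][v])

    pairs = list(combinations(V, 2))
    for k in range(1, len(V) + 1):
        for S in combinations(V, k):
            if all(any(resolves(w, u, v) for w in S) for u, v in pairs):
                return k
    return len(V)
```

Sanity checks: a path has strong metric dimension 1 (one endpoint resolves every pair), and the complete graph K_n has dimension n - 1 (the chosen set must hit every pair).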
Highly Scalable Algorithms for Robust String Barcoding
String barcoding is a recently introduced technique for genomic-based
identification of microorganisms. In this paper we describe the engineering of
highly scalable algorithms for robust string barcoding. Our methods enable
distinguisher selection based on whole genomic sequences of hundreds of
microorganisms of up to bacterial size on a well-equipped workstation, and can
be easily parallelized to further extend the applicability range to thousands
of bacterial size genomes. Experimental results on both randomly generated and
NCBI genomic data show that whole-genome based selection results in a number of
distinguishers nearly matching the information theoretic lower bounds for the
problem.
On Approximability of Clustering Problems Without Candidate Centers
The k-means objective is arguably the most widely-used cost function for
modeling clustering tasks in a metric space. In practice and historically,
k-means is thought of in a continuous setting, namely where the centers can be
located anywhere in the metric space. For example, the popular Lloyd's
heuristic locates a center at the mean of each cluster.
Despite persistent efforts on understanding the approximability of k-means,
and other classic clustering problems such as k-median and k-minsum, our
knowledge of the hardness of approximation factors of these problems remains
quite poor. In this paper, we significantly improve upon the hardness of
approximation factors known in the literature for these objectives. We show
that if the input lies in a general metric space, it is NP-hard to approximate:
- Continuous k-median, improving upon the previous inapproximability factor
of 1.36 shown by Guha and Khuller (J. Algorithms '99).
- Continuous k-means, improving upon the previous inapproximability factor
of 2.10 shown by Guha and Khuller (J. Algorithms '99).
- k-minsum, improving upon the APX-hardness shown by Guruswami and Indyk
(SODA '03).
Our results shed new and perhaps counter-intuitive light on the differences
between clustering problems in the continuous setting versus the discrete
setting (where the candidate centers are given as part of the input).
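The Lloyd's heuristic mentioned above can be sketched in a few lines for points on a line; a minimal illustration of the continuous setting, where centers are placed at cluster means rather than restricted to input points.

```python
# Minimal 1-D sketch of Lloyd's heuristic for the continuous k-means
# objective: alternate assigning points to nearest centers and moving each
# center to its cluster's mean.

def lloyd(points, centers, iters=20):
    for _ in range(iters):
        clusters = [[] for _ in centers]
        for p in points:                       # assignment step
            i = min(range(len(centers)), key=lambda i: (p - centers[i]) ** 2)
            clusters[i].append(p)
        centers = [sum(c) / len(c) if c else centers[i]   # update: cluster mean
                   for i, c in enumerate(clusters)]
    return centers

def cost(points, centers):                     # k-means objective
    return sum(min((p - c) ** 2 for c in centers) for p in points)
```

On two well-separated groups, the heuristic converges to the two group means, which is also the continuous optimum for this instance.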
A Birthday Repetition Theorem and Complexity of Approximating Dense CSPs
A (k, l)-birthday repetition of a two-prover game G is a game in which the
two provers are sent random sets of questions from G of sizes k and l
respectively. These two sets are sampled independently and uniformly among
all sets of questions of those particular sizes. We prove the following
birthday repetition theorem: when G satisfies some mild conditions, the value
of its (k, l)-birthday repetition decreases exponentially in kl/n, where n is
the total number of questions. Our result positively resolves an open
question posed by Aaronson, Impagliazzo and Moshkovitz (CCC 2014).
As an application of our birthday repetition theorem, we obtain new
fine-grained hardness of approximation results for dense CSPs. Specifically, we
establish a tight trade-off between running time and approximation ratio for
dense CSPs by showing conditional lower bounds, integrality gaps and
approximation algorithms. In particular, for any sufficiently large
parameter i and for every k, we show the following results:
- We exhibit an approximation algorithm for dense Max k-CSPs with bounded
alphabet size via a bounded number of levels of the Sherali-Adams relaxation.
- Through our birthday repetition theorem, we obtain a matching integrality
gap for a corresponding number of levels of the Lasserre relaxation for
fully-dense Max k-CSP.
- Assuming that there is a constant ε > 0 such that Max 3SAT cannot be
approximated to within a factor (1 - ε) of the optimal in sub-exponential
time, our birthday repetition theorem implies that any algorithm that
approximates fully-dense Max k-CSP to within the same factor requires a
running time that almost tightly matches the algorithmic result based on the
Sherali-Adams relaxation.
Comment: 45 pages
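The birthday-paradox phenomenon behind the theorem can be checked exactly: two independent uniform question sets of sizes k and l out of n intersect with probability roughly 1 - e^(-kl/n). A small sketch (illustrative function name):

```python
# Exact intersection probability for two independent uniform random subsets,
# the birthday-paradox fact underlying birthday repetition: with set sizes
# around sqrt(n), the provers' question sets overlap with constant probability.

from math import comb

def intersect_prob(n, k, l):
    """P[two uniform random subsets of sizes k and l of an n-set intersect]."""
    # The second set avoids the first iff all l of its elements are drawn
    # from the n - k elements outside the first set.
    return 1 - comb(n - k, l) / comb(n, l)

p = intersect_prob(100, 10, 10)
# close to the heuristic estimate 1 - e^(-10*10/100) ≈ 0.632
```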
Approximation algorithms for two-machine flow-shop scheduling with a conflict graph
Path cover is a well-known intractable problem that finds a minimum number of
vertex disjoint paths in a given graph to cover all the vertices. We show that
a variant, where the objective function is not the number of paths but the
number of length-0 paths (that is, isolated vertices), turns out to be
polynomial-time solvable. We further show that another variant, where the
objective function is the total number of length-0 and length-1 paths, is
also polynomial-time solvable. Both variants find applications in
approximating the two-machine flow-shop scheduling problem in which job
processing has constraints that are formulated as a conflict graph. For unit
jobs, we present a constant-factor approximation algorithm for the
scheduling problem with an arbitrary conflict graph, based on the exact
algorithms for the variants of the path cover problem. For arbitrary jobs
where the conflict graph is the union of two disjoint cliques, that is, all
the jobs can be partitioned into two groups such that the jobs within a
group are pairwise conflicting, we present a simple constant-factor
approximation algorithm.
Comment: 15 pages, 2 figures
Greed is Still Good: Maximizing Monotone Submodular+Supermodular Functions
We analyze the performance of the greedy algorithm, and also a discrete
semi-gradient based algorithm, for maximizing the sum of a suBmodular and
suPermodular (BP) function (both of which are non-negative monotone
non-decreasing) under two types of constraints, either a cardinality
constraint or matroid independence constraints. These problems occur
naturally in several real-world applications in data science, machine
learning, and artificial intelligence. The problems are ordinarily
inapproximable to any factor (as we show). Using the curvature of the
submodular term, and introducing a natural dual notion of curvature for the
supermodular term, however, both of which are computable in linear time, we
show that BP maximization can be efficiently approximated by both the greedy
and the semi-gradient based algorithms, with multiplicative guarantees
expressed in terms of the two curvatures for the two types of constraints
respectively. For pure monotone supermodular constrained maximization, these
specialize to guarantees depending only on the supermodular curvature. We
also analyze the hardness of BP maximization and show that our guarantees
nearly match the corresponding hardness bounds for the two types of
constraints. Computational experiments are also provided supporting our
analysis.
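In the classical special case where the supermodular term vanishes, the greedy algorithm analyzed above reduces to standard monotone submodular maximization under a cardinality constraint. A minimal coverage-function sketch (illustrative names):

```python
# Greedy maximization of a monotone submodular function (here, set coverage)
# under a cardinality constraint: repeatedly add the set with the largest
# marginal gain in covered elements.

def greedy_max_coverage(sets, budget):
    """Pick up to `budget` sets greedily maximizing newly covered elements."""
    covered, chosen = set(), []
    for _ in range(budget):
        name, gain = max(((n, s - covered) for n, s in sets.items()),
                         key=lambda x: len(x[1]))
        if not gain:               # no remaining set adds anything
            break
        chosen.append(name)
        covered |= gain
    return chosen, covered
```

For coverage functions this greedy rule achieves the familiar (1 - 1/e) guarantee, the curvature-free endpoint of the bounds in the abstract.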
New Algorithms and Lower Bounds for Sequential-Access Data Compression
This thesis concerns sequential-access data compression, i.e., by algorithms
that read the input one or more times from beginning to end. In one chapter we
consider adaptive prefix coding, for which we must read the input character by
character, outputting each character's self-delimiting codeword before reading
the next one. We show how to encode and decode each character in constant
worst-case time while producing an encoding whose length is worst-case optimal.
In another chapter we consider one-pass compression with memory bounded in
terms of the alphabet size and context length, and prove a nearly tight
tradeoff between the amount of memory we can use and the quality of the
compression we can achieve. In a third chapter we consider compression in the
read/write streams model, which allows a number of passes and an amount of
memory both polylogarithmic in the size of the input. We first show how to
achieve universal compression using only one pass over one stream. We then
show that one stream is not sufficient for achieving good grammar-based
compression. Finally, we show that two streams are necessary and sufficient
for achieving entropy-only bounds.
Comment: draft of PhD thesis
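As a hedged illustration of the self-delimiting codewords used in the adaptive prefix coding chapter (this is Elias gamma coding, a standard example, not the thesis's adaptive scheme): a decoder knows where each codeword ends without any separator.

```python
# Elias gamma codes: a simple self-delimiting prefix code for positive
# integers. The unary run of zeros announces the codeword's length, so
# concatenated codewords can be split unambiguously.

def gamma_encode(n):                 # n >= 1
    b = bin(n)[2:]                   # binary representation, no leading zeros
    return "0" * (len(b) - 1) + b    # unary length prefix + binary value

def gamma_decode_stream(bits):
    """Decode a concatenation of gamma codewords back into integers."""
    out, i = [], 0
    while i < len(bits):
        z = 0
        while bits[i] == "0":        # count leading zeros = length - 1
            z += 1
            i += 1
        out.append(int(bits[i:i + z + 1], 2))
        i += z + 1
    return out

stream = "".join(gamma_encode(n) for n in [1, 5, 3])
# stream == "1" + "00101" + "011"; decodes back to [1, 5, 3]
```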
Phylogenetic CSPs are Approximation Resistant
We study the approximability of a broad class of computational problems --
originally motivated in evolutionary biology and phylogenetic reconstruction --
concerning the aggregation of potentially inconsistent (local) information
about items of interest, and we present optimal hardness of approximation
results under the Unique Games Conjecture. The class of problems studied here
can be described as Constraint Satisfaction Problems (CSPs) over infinite
domains, where, instead of taking values in a fixed-size domain, the
variables can be mapped to any of the leaves of a phylogenetic tree. The
topology of the tree then determines whether a given constraint on the
variables is satisfied or not, and the resulting CSPs are called Phylogenetic
CSPs. Prominent examples of Phylogenetic CSPs with a long history and
applications in various disciplines include: Triplet Reconstruction, Quartet
Reconstruction, Subtree Aggregation (Forbidden or Desired). For example, in
Triplet Reconstruction, we are given triplets of the form ab|c (indicating
that ``items a and b are more similar to each other than to c'') and we want
to construct a hierarchical clustering on the items that
respects the constraints as much as possible. Despite more than four decades of
research, the basic question of maximizing the number of satisfied constraints
is not well-understood. The current best approximation is achieved by
outputting a random tree (for triplets, this achieves a 1/3 approximation). Our
main result is that every Phylogenetic CSP is approximation resistant, i.e.,
there is no polynomial-time algorithm that does asymptotically better than a
(biased) random assignment. This is a generalization of the results in
Guruswami, Hastad, Manokaran, Raghavendra, and Charikar (2011), who showed that
ordering CSPs are approximation resistant (e.g., Max Acyclic Subgraph,
Betweenness).
Comment: 45 pages, 11 figures, abstract shortened for arXiv
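The notion of a hierarchy satisfying a triplet can be made concrete: ab|c is satisfied exactly when some cluster of the hierarchy contains a and b but not c. A small sketch under that standard definition (illustrative names, trees as nested tuples):

```python
# Counting satisfied triplet constraints for a hierarchical clustering.
# A triplet (a, b, c), read as ab|c, is satisfied iff some cluster
# (subtree leaf set) contains a and b but not c.

def clusters(tree):
    """All subtree leaf sets of a tree given as nested tuples of leaf names."""
    if isinstance(tree, str):
        return [{tree}]
    subs = [c for child in tree for c in clusters(child)]
    # the last cluster returned for each child is that child's full leaf set
    return subs + [set().union(*(clusters(child)[-1] for child in tree))]

def satisfied(tree, triplets):
    cl = clusters(tree)
    return sum(any(a in c and b in c and x not in c for c in cl)
               for a, b, x in triplets)
```

For the balanced tree ((a,b),(c,d)), the triplets ab|c and cd|a are satisfied while ac|b is not, illustrating that no tree can satisfy conflicting triplets simultaneously.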