Hardness Amplification of Optimization Problems
In this paper, we prove a general hardness amplification scheme for optimization problems based on the technique of direct products.
We say that an optimization problem Π is direct product feasible if it is possible to efficiently aggregate any k instances of Π and form one large instance of Π such that given an optimal feasible solution to the larger instance, we can efficiently find optimal feasible solutions to all the k smaller instances. Given a direct product feasible optimization problem Π, our hardness amplification theorem may be informally stated as follows:
If there is a distribution D over instances of Π of size n such that every randomized algorithm running in time t(n) fails to solve Π on a 1/α(n) fraction of inputs sampled from D, then, assuming some relationships on α(n) and t(n), there is a distribution D' over instances of Π of size O(n·α(n)) such that every randomized algorithm running in time t(n)/poly(α(n)) fails to solve Π on a 99/100 fraction of inputs sampled from D'.
As a consequence of the above theorem, we show hardness amplification of problems in various classes such as NP-hard problems like Max-Clique, Knapsack, and Max-SAT, problems in P such as Longest Common Subsequence, Edit Distance, and Matrix Multiplication, and even problems in TFNP such as Factoring and computing Nash equilibrium.
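To make the notion of direct product feasibility concrete, here is a minimal sketch (my own illustration, not taken from the paper) for Max-SAT, one of the problems listed above: k instances are aggregated by renaming their variables into disjoint blocks, so an optimal assignment of the combined instance restricts to an optimal assignment of every original instance.

```python
# Illustrative sketch (not from the paper): direct product feasibility for Max-SAT.
# An instance is (n_vars, clauses); a clause is a list of signed variable indices
# (positive = variable, negative = its negation), variables numbered 1..n_vars.

def aggregate(instances):
    """Combine k Max-SAT instances into one by shifting each instance's
    variables into its own disjoint block."""
    combined, offsets, offset = [], [], 0
    for n_vars, clauses in instances:
        offsets.append(offset)
        for clause in clauses:
            combined.append([lit + offset if lit > 0 else lit - offset
                             for lit in clause])
        offset += n_vars
    return (offset, combined), offsets

def split_solution(assignment, instances, offsets):
    """Restrict an assignment (list of booleans, one per variable of the
    combined instance) back to the k original instances. Since the blocks
    share no variables, the number of satisfied clauses decomposes as a sum,
    so an optimal combined assignment is optimal on every block."""
    return [assignment[off:off + n_vars]
            for (n_vars, _), off in zip(instances, offsets)]
```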
Towards a General Direct Product Testing Theorem
The Direct Product encoding of a string a in {0,1}^n on an underlying domain V subseteq ([n] choose k) is a function DP_V(a) which gets as input a set S in V and outputs a restricted to S. In the Direct Product Testing Problem, we are given a function F:V -> {0,1}^k, and our goal is to test whether F is close to a direct product encoding, i.e., whether there exists some a in {0,1}^n such that on most sets S, we have F(S)=DP_V(a)(S). A natural test is as follows: select a pair (S, S') in V x V according to some underlying distribution over V x V, query F on this pair, and check for consistency on their intersection. Note that the above distribution may be viewed as a weighted graph over the vertex set V and is referred to as a test graph.
The testability of direct products was studied over various domains and test graphs: Dinur and Steurer (CCC '14) analyzed it when V equals the k-th slice of the Boolean hypercube and the test graph is a member of the Johnson graph family. Dinur and Kaufman (FOCS '17) analyzed it for the case where V is the set of faces of a Ramanujan complex, where in this case |V| = O_k(n). In this paper, we study the testability of direct products in a general setting, addressing the question: what properties of the domain and the test graph allow one to prove a direct product testing theorem?
Towards this goal we introduce the notion of coordinate expansion of a test graph. Roughly speaking, a test graph is a coordinate expander if it has global and local expansion, and has certain nice intersection properties on sampling. We show that whenever the test graph has coordinate expansion, it admits a direct product testing theorem. Additionally, for every k and n we provide a direct product domain V subseteq ([n] choose k) of size n, called the Sliding Window domain, for which we prove direct product testability.
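As a small illustration of the objects defined above (my own sketch; the domain and the pair distribution here are arbitrary and are not one of the paper's constructions), the encoding and the two-query consistency test can be written as follows.

```python
import random

def dp_encode(a, V):
    """Direct product encoding: map every set S in V to the restriction of
    the string a to the coordinates in S."""
    return {S: tuple(a[i] for i in S) for S in V}

def consistency_test(F, V, trials=100, rng=random):
    """Two-query test: sample a pair (S, S') (here simply uniformly, standing
    in for the test-graph distribution) and check that F(S) and F(S') agree
    on every coordinate in the intersection of S and S'."""
    V = list(V)
    for _ in range(trials):
        S, S2 = rng.choice(V), rng.choice(V)
        for i in set(S) & set(S2):
            if F[S][S.index(i)] != F[S2][S2.index(i)]:
                return False  # caught an inconsistency
    return True

# A genuine encoding always passes, e.g. on a sliding-window-style domain:
a = [0, 1, 1, 0, 1]
V = [(0, 1, 2), (1, 2, 3), (2, 3, 4)]
assert consistency_test(dp_encode(a, V), V)
```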
Approximating Edit Distance Within Constant Factor in Truly Sub-Quadratic Time
Edit distance is a measure of similarity of two strings based on the minimum
number of character insertions, deletions, and substitutions required to
transform one string into the other. The edit distance can be computed exactly
using a dynamic programming algorithm that runs in quadratic time. Andoni,
Krauthgamer and Onak (2010) gave a nearly linear time algorithm that
approximates edit distance within a polylogarithmic approximation factor.
In this paper, we provide an algorithm with running time Õ(n^{2-2/7}) that
approximates the edit distance within a constant factor.
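For context, here is a minimal sketch of the standard quadratic-time dynamic program mentioned above (textbook material, not the sub-quadratic algorithm of this paper).

```python
def edit_distance(x: str, y: str) -> int:
    """Classic O(|x|*|y|) dynamic program: dp[i][j] is the edit distance
    between the prefixes x[:i] and y[:j]."""
    n, m = len(x), len(y)
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dp[i][0] = i                        # delete all of x[:i]
    for j in range(m + 1):
        dp[0][j] = j                        # insert all of y[:j]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if x[i - 1] == y[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + sub)   # match / substitution
    return dp[n][m]

# e.g. edit_distance("kitten", "sitting") == 3
```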
An Algorithmic Bridge Between Hamming and Levenshtein Distances
The edit distance between strings classically assigns unit cost to every
character insertion, deletion, and substitution, whereas the Hamming distance
only allows substitutions. In many real-life scenarios, insertions and
deletions (abbreviated indels) appear frequently but significantly less so than
substitutions. To model this, we consider substitutions being cheaper than
indels, with cost 1/a for a parameter a ≥ 1. This basic variant, denoted
ED_a, bridges classical edit distance (a = 1) with Hamming distance
(a → ∞), leading to interesting algorithmic challenges: Does the time
complexity of computing ED_a interpolate between that of Hamming distance
(linear time) and edit distance (quadratic time)? What about approximating
ED_a?
We first present a simple deterministic exact algorithm for ED_a and
further prove that it is near-optimal assuming the Orthogonal Vectors
Conjecture. Our main result is a randomized algorithm computing a
(1+ε)-approximation of ED_a(X, Y), given strings X and Y of total length n
and a bound k ≥ ED_a(X, Y). For simplicity, let us focus on k ≥ 1 and a
constant ε > 0; then, for a wide range of the parameters a and k, our
algorithm runs in time sublinear in n. We also consider a very natural
version that asks to find a (k_I, k_S)-alignment -- an alignment with at
most k_I indels and k_S substitutions. In this setting, we give an exact
algorithm and, more importantly, a substantially faster bicriteria
approximation algorithm. The latter solution is based on the techniques we
develop for ED_a with an appropriately chosen a. These bounds are in stark
contrast to unit-cost edit distance, where state-of-the-art algorithms are
far from achieving a (1+ε)-approximation in sublinear time, even for a
favorable choice of k.
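As a reference point (my own sketch, not the paper's algorithm), the quadratic dynamic program extends verbatim to ED_a; here I assume the convention that indels cost 1 and substitutions cost 1/a, which matches the parameterization described above up to normalization.

```python
from fractions import Fraction

def ed_a(x: str, y: str, a: int) -> Fraction:
    """Weighted edit distance ED_a via the standard quadratic DP, under the
    assumed convention: insertions/deletions cost 1, substitutions cost 1/a.
    With a = 1 this is classic edit distance; as a grows, substitutions
    become relatively cheaper than indels."""
    n, m = len(x), len(y)
    sub_cost = Fraction(1, a)
    dp = [[Fraction(0)] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dp[i][0] = Fraction(i)
    for j in range(m + 1):
        dp[0][j] = Fraction(j)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = Fraction(0) if x[i - 1] == y[j - 1] else sub_cost
            dp[i][j] = min(dp[i - 1][j] + 1,           # deletion
                           dp[i][j - 1] + 1,           # insertion
                           dp[i - 1][j - 1] + sub)     # match / substitution
    return dp[n][m]

# e.g. ed_a("abc", "abd", a=2) == Fraction(1, 2): one substitution at cost 1/2.
```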
Gap Edit Distance via Non-Adaptive Queries: Simple and Optimal
We study the problem of approximating edit distance in sublinear time. This
is formalized as a promise problem (k, K)-Gap Edit Distance, where the input
is a pair of strings X and Y and parameters k < K, and the goal is to return
YES if ED(X, Y) ≤ k and NO if ED(X, Y) > K. Recent years have witnessed
significant interest in designing sublinear-time algorithms for Gap Edit
Distance.
We resolve the non-adaptive query complexity of Gap Edit Distance, improving
over several previous results. Specifically, we design a non-adaptive
algorithm and further prove that its query complexity is optimal up to
polylogarithmic factors.
Our algorithm also achieves optimal time complexity for a wide range of the
parameters k and K; in the remaining regime, its running time is somewhat
higher. For a restricted special case, this matches a known result [Batu,
Ergün, Kilian, Magen, Raskhodnikova, Rubinfeld, and Sami, STOC 2003], and in
all other (nontrivial) cases, our running time is strictly better than that
of all previous algorithms, including the adaptive ones.
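To make the promise problem concrete, here is a minimal reference decider (my own illustration using the exact quadratic dynamic program; it does not attempt to reproduce the paper's sublinear-query algorithm).

```python
def gap_edit_distance(x: str, y: str, k: int, K: int) -> str:
    """Reference decider for the (k, K)-Gap Edit Distance promise problem:
    return "YES" if ED(x, y) <= k and "NO" if ED(x, y) > K; on inputs that
    violate the promise (k < ED(x, y) <= K), either answer is acceptable."""
    # Exact edit distance via the classic row-by-row dynamic program.
    prev = list(range(len(y) + 1))
    for i, cx in enumerate(x, start=1):
        cur = [i]
        for j, cy in enumerate(y, start=1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (cx != cy)))  # match / substitution
        prev = cur
    d = prev[len(y)]
    return "YES" if d <= k else ("NO" if d > K else "YES")
```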
Can You Solve Closest String Faster than Exhaustive Search?
We study the fundamental problem of finding the best string to represent a
given set, in the form of the Closest String problem: Given a set X ⊆ Σ^d
of n strings, find the string x* minimizing the radius of the smallest
Hamming ball around x* that encloses all the strings in X. In
this paper, we investigate whether the Closest String problem admits algorithms
that are faster than the trivial exhaustive search algorithm. We obtain the
following results for the two natural versions of the problem:
In the continuous Closest String problem, the goal is to find the
solution string x* anywhere in Σ^d. For binary strings, the exhaustive
search algorithm runs in time O(2^d · poly(nd)) and we prove that it cannot
be improved to time O(2^{(1-ε)d} · poly(nd)), for any ε > 0, unless the
Strong Exponential Time Hypothesis fails.
In the discrete Closest String problem, x* is required to be in the
input set X. While this problem is clearly in polynomial time, its
fine-grained complexity has been pinpointed to be quadratic time whenever
the dimension satisfies ω(log n) ≤ d ≤ n^{o(1)}. We complement this known
hardness result with new algorithms, proving essentially that whenever d
falls out of this hard range, the discrete Closest String problem can be
solved faster than exhaustive search. In the small-d regime, our algorithm
is based on a novel application of the inclusion-exclusion principle.
Interestingly, all of our results apply (and some are even stronger) to the
natural dual of the Closest String problem, called the Remotest String problem,
where the task is to find a string maximizing the Hamming distance to all the
strings in X.
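For concreteness, here is a minimal sketch (my own illustration) of the exhaustive-search baselines discussed above: the continuous version enumerates all 2^d binary candidate centers, the discrete version tries only the n input strings, and the dual Remotest String objective maximizes the minimum Hamming distance instead of minimizing the maximum.

```python
from itertools import product

def hamming(x, y):
    """Hamming distance between two equal-length strings."""
    return sum(a != b for a, b in zip(x, y))

def radius(center, strings):
    """Radius of the smallest Hamming ball around `center` enclosing `strings`."""
    return max(hamming(center, s) for s in strings)

def continuous_closest_string(strings):
    """Exhaustive search over all binary centers: O(2^d * poly(nd)) time."""
    d = len(strings[0])
    candidates = ("".join(bits) for bits in product("01", repeat=d))
    return min(candidates, key=lambda c: radius(c, strings))

def discrete_closest_string(strings):
    """Exhaustive search restricted to the input set: O(n^2 * d) time."""
    return min(strings, key=lambda c: radius(c, strings))

def remotest_string(strings):
    """Dual objective: a binary string maximizing the minimum Hamming
    distance to the input set (exhaustive search)."""
    d = len(strings[0])
    candidates = ("".join(bits) for bits in product("01", repeat=d))
    return max(candidates, key=lambda c: min(hamming(c, s) for s in strings))

# e.g. discrete_closest_string(["0011", "0101", "0110"]) returns one of the inputs.
```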