Lower bounds for approximation schemes for Closest String
In the Closest String problem one is given a family S of equal-length strings over some fixed alphabet, and the task is to find a string y that minimizes the maximum Hamming distance between y and a string from S. While polynomial-time approximation schemes (PTASes) for this problem have been known for a long time [Li et al., J. ACM'02], no efficient polynomial-time approximation scheme (EPTAS) has been proposed so far. In this paper, we prove that the existence of an EPTAS for Closest String is in fact unlikely, as it would imply that FPT = W[1], a highly unexpected collapse in the hierarchy of parameterized complexity classes. Our proof also shows that the existence of a PTAS for Closest String with running time f(eps) * n^o(1/eps), for any computable function f, would contradict the Exponential Time Hypothesis.
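As a concrete illustration of the objective described above, the following is a minimal Python sketch of the max-Hamming-distance radius and an exhaustive solver. The function names are hypothetical, and the search is exponential in the string length, so it only conveys the problem definition, not a practical algorithm:

```python
from itertools import product

def hamming(a: str, b: str) -> int:
    """Hamming distance between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))

def closest_string(strings, alphabet):
    """Exhaustively try every string over `alphabet` of the input length,
    returning one that minimizes the maximum Hamming distance (the radius)
    to the family. Exponential time -- illustration only."""
    length = len(strings[0])
    best, best_radius = None, length + 1
    for cand in product(alphabet, repeat=length):
        c = "".join(cand)
        radius = max(hamming(c, s) for s in strings)
        if radius < best_radius:
            best, best_radius = c, radius
    return best, best_radius
```

For example, `closest_string(["aab", "bab", "abb"], "ab")` returns `("aab", 1)`: no string has distance 0 to all three inputs, and "aab" is within distance 1 of each.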
Maximum Scatter TSP in Doubling Metrics
We study the problem of finding a tour of points in which every edge is
long. More precisely, we wish to find a tour that visits every point exactly
once, maximizing the length of the shortest edge in the tour. The problem is
known as Maximum Scatter TSP, and was introduced by Arkin et al. (SODA 1997),
motivated by applications in manufacturing and medical imaging. Arkin et al.
gave a 0.5-approximation for the metric version of the problem and showed
that this is the best possible ratio achievable in polynomial time (assuming P != NP). Arkin et al. raised the question of whether a better approximation
ratio can be obtained in the Euclidean plane.
We answer this question in the affirmative in a more general setting, by giving a (1-eps)-approximation algorithm for d-dimensional doubling metrics, with running time ~O(n^3) + 2^(O(K log K)), where K <= (13/eps)^d. As a corollary we obtain (i) an efficient polynomial-time approximation scheme (EPTAS) for all constant dimensions d, (ii) a polynomial-time approximation scheme (PTAS) for dimension d = (log log n)/c, for a sufficiently large constant c, and (iii) a PTAS for constant d and eps = Omega(1/log log n). Furthermore, we show the dependence on eps in our approximation scheme to be essentially optimal, unless Satisfiability can be solved in subexponential time.
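The bottleneck objective of Maximum Scatter TSP can be made concrete with a short sketch: the value of a tour is the length of its shortest edge, and the goal is to maximize that value. The brute-force solver below (hypothetical helper names, factorial running time) is only meant to pin down the objective, not the paper's algorithm:

```python
from itertools import permutations
from math import dist

def scatter(tour, points):
    """Length of the shortest edge of a closed tour -- the quantity that
    Maximum Scatter TSP asks to maximize."""
    n = len(tour)
    return min(dist(points[tour[i]], points[tour[(i + 1) % n]])
               for i in range(n))

def max_scatter_tour(points):
    """Brute force over all tours starting at point 0; factorial time,
    for tiny instances only."""
    n = len(points)
    return max(((0,) + perm for perm in permutations(range(1, n))),
               key=lambda t: scatter(t, points))
```

On the unit square `[(0, 0), (1, 0), (1, 1), (0, 1)]` every tour has a shortest edge of length 1, so any returned tour has scatter 1.0.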
On Computing Centroids According to the p-Norms of Hamming Distance Vectors
In this paper we consider the p-Norm Hamming Centroid problem which asks to determine whether some given strings have a centroid with a bound on the p-norm of its Hamming distances to the strings. Specifically, given a set S of strings and a real k, we consider the problem of determining whether there exists a string s^* with (sum_{s in S} d^{p}(s^*,s))^(1/p) <= k, where d(.,.) denotes the Hamming distance metric. This problem has important applications in data clustering and multi-winner committee elections, and is a generalization of the well-known polynomial-time solvable Consensus String (p=1) problem, as well as the NP-hard Closest String (p=infty) problem.
Our main result shows that the problem is NP-hard for all fixed rational p > 1, closing the gap for all rational values of p between 1 and infty. Under standard complexity assumptions the reduction also implies that the problem has no 2^o(n+m)-time or 2^o(k^(p/(p+1)))-time algorithm, where m denotes the number of input strings and n denotes the length of each string, for any fixed p > 1. The first bound matches a straightforward brute-force algorithm. The second bound is tight in the sense that for each fixed epsilon > 0, we provide a 2^(k^(p/(p+1)+epsilon))-time algorithm. In the last part of the paper, we complement our hardness result by presenting a fixed-parameter algorithm and a factor-2 approximation algorithm for the problem.
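The objective above is easy to state in code, and restricting the centroid to one of the input strings already gives a factor-2 approximation by a standard triangle-inequality argument (a generic sketch with hypothetical names, not necessarily the paper's algorithm):

```python
def hamming(a: str, b: str) -> int:
    """Hamming distance between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))

def pnorm_cost(center, strings, p):
    """The p-Norm Hamming Centroid objective:
    (sum_{s in S} d(center, s)^p)^(1/p)."""
    return sum(hamming(center, s) ** p for s in strings) ** (1 / p)

def best_input_center(strings, p):
    """Pick the input string minimizing the objective. By the triangle
    inequality this is within factor 2 of the optimal (unrestricted)
    centroid -- a standard argument, sketched here for illustration."""
    return min(strings, key=lambda s: pnorm_cost(s, strings, p))
```

For instance, over `["aa", "ab", "bb"]` with p=2, the string "ab" has cost sqrt(1 + 0 + 1) while "aa" and "bb" each have cost sqrt(5), so "ab" is selected.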
Hashing for Similarity Search: A Survey
Similarity search (nearest neighbor search) is the problem of finding the data
items in a large database whose distances to a query item are smallest.
Various methods have been developed to address this problem, and recently much
effort has been devoted to approximate search. In this paper, we present a
survey on one of the main solutions, hashing, which has been widely studied
since the pioneering work on locality-sensitive hashing. We divide hashing
algorithms into two main categories: locality-sensitive hashing, which designs
hash functions without exploring the data distribution, and learning to hash,
which learns hash functions according to the data distribution. We review them
from various aspects, including hash function design, distance measure, and
search scheme in the hash coding space.
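To make the locality-sensitive hashing idea concrete, here is a minimal sketch of classic bit-sampling LSH for Hamming distance: each table hashes a binary vector to a few randomly chosen coordinates, so that nearby vectors collide in some table with higher probability than distant ones. Names and parameters are illustrative, not from any particular library:

```python
import random
from collections import defaultdict

def build_lsh_tables(vectors, k=8, num_tables=4, seed=0):
    """Bit-sampling LSH for Hamming distance: each table keys a binary
    vector by k randomly sampled coordinates. Sketch, not tuned."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    tables = []
    for _ in range(num_tables):
        coords = rng.sample(range(dim), k)
        table = defaultdict(list)
        for idx, v in enumerate(vectors):
            table[tuple(v[c] for c in coords)].append(idx)
        tables.append((coords, table))
    return tables

def query(tables, q):
    """Candidate indices: vectors colliding with q in at least one table.
    Candidates are then re-ranked by exact distance in a real system."""
    candidates = set()
    for coords, table in tables:
        candidates.update(table.get(tuple(q[c] for c in coords), []))
    return candidates
```

A query identical to an indexed vector always collides with it, while a vector differing in every coordinate never does; vectors at intermediate distances collide with probability decreasing in their Hamming distance.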
Online Multistage Subset Maximization Problems
Numerous combinatorial optimization problems (knapsack, maximum-weight matching, etc.) can be expressed as subset maximization problems: One is given a ground set N={1,...,n}, a collection F subseteq 2^N of subsets thereof such that the empty set is in F, and an objective (profit) function p: F -> R_+. The task is to choose a set S in F that maximizes p(S). We consider the multistage version (Eisenstat et al., Gupta et al., both ICALP 2014) of such problems: The profit function p_t (and possibly the set of feasible solutions F_t) may change over time. Since in many applications changing the solution is costly, the task becomes to find a sequence of solutions that optimizes the trade-off between good per-time solutions and stable solutions taking into account an additional similarity bonus. As similarity measure for two consecutive solutions, we consider either the size of the intersection of the two solutions or the difference of n and the Hamming distance between the two characteristic vectors.
We study multistage subset maximization problems in the online setting, that is, p_t (along with possibly F_t) only arrive one by one and, upon such an arrival, the online algorithm has to output the corresponding solution without knowledge of the future.
We develop general techniques for online multistage subset maximization and thereby characterize those models (given by the type of data evolution and the type of similarity measure) that admit a constant-competitive online algorithm. When no constant competitive ratio is possible, we employ lookahead to circumvent this issue. When a constant competitive ratio is possible, we provide almost matching lower and upper bounds on the best achievable one.
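The two similarity measures described above, and the resulting multistage objective, can be sketched in a few lines (hypothetical names; the exact weighting of profit versus similarity bonus varies across the cited models):

```python
def intersection_similarity(s1: set, s2: set) -> int:
    """Size of the intersection of two consecutive solutions."""
    return len(s1 & s2)

def hamming_similarity(s1: set, s2: set, n: int) -> int:
    """n minus the Hamming distance of the characteristic vectors, i.e.
    the number of ground-set elements on which the solutions agree
    (both in, or both out)."""
    return n - len(s1 ^ s2)

def multistage_value(solutions, profits, similarity):
    """Total per-time profit plus the similarity bonus between each pair
    of consecutive solutions (unweighted sketch of the objective)."""
    total = sum(p(s) for p, s in zip(profits, solutions))
    total += sum(similarity(solutions[t], solutions[t + 1])
                 for t in range(len(solutions) - 1))
    return total
```

For example, with n=3, solutions {1,2} then {2,3}, and profit equal to solution size at both stages, the intersection bonus is 1 and the total value is 2 + 2 + 1 = 5; the Hamming-based bonus also equals 1 here, since the solutions agree only on element 2.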