    Lower bounds for approximation schemes for Closest String

    In the Closest String problem one is given a family $\mathcal{S}$ of equal-length strings over some fixed alphabet, and the task is to find a string $y$ that minimizes the maximum Hamming distance between $y$ and a string from $\mathcal{S}$. While polynomial-time approximation schemes (PTASes) for this problem have been known for a long time [Li et al., J. ACM'02], no efficient polynomial-time approximation scheme (EPTAS) has been proposed so far. In this paper, we prove that the existence of an EPTAS for Closest String is in fact unlikely, as it would imply that $\mathrm{FPT}=\mathrm{W}[1]$, a highly unexpected collapse in the hierarchy of parameterized complexity classes. Our proof also shows that the existence of a PTAS for Closest String with running time $f(\varepsilon)\cdot n^{o(1/\varepsilon)}$, for any computable function $f$, would contradict the Exponential Time Hypothesis.
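
    To make the objective concrete, here is a minimal Python sketch (our own illustration, not code from the paper) that evaluates a candidate string against a family of input strings; the function names are illustrative assumptions.

```python
def hamming(x, y):
    """Hamming distance between two equal-length strings."""
    assert len(x) == len(y)
    return sum(a != b for a, b in zip(x, y))

def closest_string_objective(candidate, strings):
    """Maximum Hamming distance from the candidate to any string in the
    family; Closest String asks for a candidate minimizing this value."""
    return max(hamming(candidate, s) for s in strings)

# Example: the all-'a' string is within distance 1 of every input string.
family = ["aab", "aba", "baa"]
print(closest_string_objective("aaa", family))  # -> 1
```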

    Maximum Scatter TSP in Doubling Metrics

    We study the problem of finding a tour of $n$ points in which every edge is long. More precisely, we wish to find a tour that visits every point exactly once, maximizing the length of the shortest edge in the tour. The problem is known as Maximum Scatter TSP, and was introduced by Arkin et al. (SODA 1997), motivated by applications in manufacturing and medical imaging. Arkin et al. gave a $0.5$-approximation for the metric version of the problem and showed that this is the best possible ratio achievable in polynomial time (assuming $\mathrm{P} \neq \mathrm{NP}$). Arkin et al. raised the question of whether a better approximation ratio can be obtained in the Euclidean plane. We answer this question in the affirmative in a more general setting, by giving a $(1-\epsilon)$-approximation algorithm for $d$-dimensional doubling metrics, with running time $\tilde{O}\big(n^3 + 2^{O(K \log K)}\big)$, where $K \leq \left(\frac{13}{\epsilon}\right)^d$. As a corollary we obtain (i) an efficient polynomial-time approximation scheme (EPTAS) for all constant dimensions $d$, (ii) a polynomial-time approximation scheme (PTAS) for dimension $d = \log\log n / c$, for a sufficiently large constant $c$, and (iii) a PTAS for constant $d$ and $\epsilon = \Omega(1/\log\log n)$. Furthermore, we show the dependence on $d$ in our approximation scheme to be essentially optimal, unless Satisfiability can be solved in subexponential time.
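
    For illustration, a short Python sketch (again our own, not the paper's algorithm) that evaluates the Maximum Scatter objective, i.e. the length of the shortest edge of a tour, together with an exponential-time brute force usable only on tiny instances.

```python
import itertools
import math

def scatter(tour, points):
    """Length of the shortest edge in a closed tour (the Maximum Scatter
    TSP objective, to be maximized)."""
    n = len(tour)
    return min(
        math.dist(points[tour[i]], points[tour[(i + 1) % n]])
        for i in range(n)
    )

def max_scatter_brute_force(points):
    """Exact optimum by enumerating all tours starting at point 0;
    factorial time, for sanity checks on very small n only."""
    n = len(points)
    return max(
        (scatter((0,) + perm, points)
         for perm in itertools.permutations(range(1, n))),
        default=0.0,
    )

pts = [(0, 0), (1, 0), (1, 1), (0, 1)]
print(max_scatter_brute_force(pts))  # -> 1.0 for the unit square
```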

    On Computing Centroids According to the p-Norms of Hamming Distance Vectors

    In this paper we consider the p-Norm Hamming Centroid problem, which asks to determine whether some given strings have a centroid with a bound on the p-norm of its Hamming distances to the strings. Specifically, given a set $S$ of strings and a real $k$, we consider the problem of determining whether there exists a string $s^*$ with $\big(\sum_{s \in S} d^{p}(s^*,s)\big)^{1/p} \leq k$, where $d(\cdot,\cdot)$ denotes the Hamming distance metric. This problem has important applications in data clustering and multi-winner committee elections, and is a generalization of the well-known polynomial-time solvable Consensus String ($p=1$) problem, as well as the NP-hard Closest String ($p=\infty$) problem. Our main result shows that the problem is NP-hard for all fixed rational $p > 1$, closing the gap for all rational values of $p$ between 1 and $\infty$. Under standard complexity assumptions the reduction also implies that the problem has no $2^{o(n+m)}$-time or $2^{o(k^{p/(p+1)})}$-time algorithm, where $m$ denotes the number of input strings and $n$ denotes the length of each string, for any fixed $p > 1$. The first bound matches a straightforward brute-force algorithm. The second bound is tight in the sense that, for each fixed $\epsilon > 0$, we provide a $2^{k^{p/(p+1)+\epsilon}}$-time algorithm. In the last part of the paper, we complement our hardness result by presenting a fixed-parameter algorithm and a factor-2 approximation algorithm for the problem.
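
    The decision objective above is easy to state in code; the following minimal Python sketch (illustrative, not from the paper) computes the p-norm of the Hamming distance vector for a candidate centroid.

```python
def hamming(x, y):
    """Hamming distance between two equal-length strings."""
    return sum(a != b for a, b in zip(x, y))

def p_norm_objective(candidate, strings, p):
    """(sum_s d(candidate, s)^p)^(1/p): the p-norm of the vector of Hamming
    distances. p = 1 recovers the Consensus String objective; as p grows,
    the value approaches Closest String's max-distance objective."""
    return sum(hamming(candidate, s) ** p for s in strings) ** (1.0 / p)

family = ["aab", "aba", "baa"]
print(p_norm_objective("aaa", family, 2))  # -> sqrt(3) ~ 1.732
```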

    Hashing for Similarity Search: A Survey

    Similarity search (nearest neighbor search) is the problem of retrieving, from a large database, the data items whose distances to a query item are the smallest. Various methods have been developed to address this problem, and recently a lot of effort has been devoted to approximate search. In this paper, we present a survey on one of the main solutions, hashing, which has been widely studied since the pioneering work on locality sensitive hashing. We divide the hashing algorithms into two main categories: locality sensitive hashing, which designs hash functions without exploring the data distribution, and learning to hash, which learns hash functions according to the data distribution. We review them from various aspects, including hash function design, distance measure, and search scheme in the hash coding space.
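
    As one concrete instance of locality sensitive hashing, here is a minimal random-hyperplane (sign) hashing sketch for angular/cosine similarity in Python; this is a textbook construction used to illustrate the idea, not a specific method from the survey.

```python
import random

def random_hyperplane_hash(dim, n_bits, seed=0):
    """Return an LSH function mapping a vector to an n_bits-bit signature:
    bit i is the sign of the dot product with a random Gaussian hyperplane
    normal. Vectors at a small angle tend to collide on most bits."""
    rng = random.Random(seed)
    planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)]
              for _ in range(n_bits)]

    def h(v):
        return tuple(
            1 if sum(p_j * v_j for p_j, v_j in zip(plane, v)) >= 0 else 0
            for plane in planes
        )

    return h

h = random_hyperplane_hash(dim=3, n_bits=8)
print(h([1.0, 0.2, 0.1]))    # similar vectors...
print(h([0.9, 0.25, 0.05]))  # ...usually share most signature bits
```

    In practice such signatures are used as bucket keys: candidates sharing a bucket with the query are re-ranked by exact distance, trading accuracy for search speed.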

    Online Multistage Subset Maximization Problems

    Numerous combinatorial optimization problems (knapsack, maximum-weight matching, etc.) can be expressed as subset maximization problems: one is given a ground set $N=\{1,\ldots,n\}$, a collection $\mathcal{F} \subseteq 2^N$ of subsets thereof such that the empty set is in $\mathcal{F}$, and an objective (profit) function $p: \mathcal{F} \to \mathbb{R}_+$. The task is to choose a set $S \in \mathcal{F}$ that maximizes $p(S)$. We consider the multistage version (Eisenstat et al., Gupta et al., both ICALP 2014) of such problems: the profit function $p_t$ (and possibly the set of feasible solutions $\mathcal{F}_t$) may change over time. Since in many applications changing the solution is costly, the task becomes to find a sequence of solutions that optimizes the trade-off between good per-time solutions and stable solutions, taking into account an additional similarity bonus. As the similarity measure for two consecutive solutions, we consider either the size of the intersection of the two solutions or the difference of $n$ and the Hamming distance between the two characteristic vectors (see the sketch after this abstract). We study multistage subset maximization problems in the online setting, that is, $p_t$ (along with possibly $\mathcal{F}_t$) arrive only one by one and, upon such an arrival, the online algorithm has to output the corresponding solution without knowledge of the future. We develop general techniques for online multistage subset maximization and thereby characterize those models (given by the type of data evolution and the type of similarity measure) that admit a constant-competitive online algorithm. When no constant competitive ratio is possible, we employ lookahead to circumvent this issue. When a constant competitive ratio is possible, we provide almost matching lower and upper bounds on the best achievable one.
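
    A minimal Python sketch of the two similarity measures named in the abstract, under the assumption that solutions are represented as subsets of the ground set $\{1,\ldots,n\}$; the function names are our own.

```python
def intersection_similarity(s_prev, s_curr):
    """Similarity as the size of the intersection of two solutions (sets)."""
    return len(s_prev & s_curr)

def hamming_similarity(s_prev, s_curr, n):
    """Similarity as n minus the Hamming distance between the characteristic
    vectors over ground set {1, ..., n}: the number of elements on which the
    two solutions agree (both kept in, or both kept out)."""
    # Symmetric difference size equals the Hamming distance of the vectors.
    return n - len(s_prev ^ s_curr)

n = 5
a, b = {1, 2, 3}, {2, 3, 4}
print(intersection_similarity(a, b))  # -> 2
print(hamming_similarity(a, b, n))    # -> 3 (agree on elements 2, 3, 5)
```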