    Lower bounds for approximation schemes for Closest String

    In the Closest String problem one is given a family $\mathcal{S}$ of equal-length strings over some fixed alphabet, and the task is to find a string $y$ that minimizes the maximum Hamming distance between $y$ and a string from $\mathcal{S}$. While polynomial-time approximation schemes (PTASes) for this problem have been known for a long time [Li et al., J. ACM'02], no efficient polynomial-time approximation scheme (EPTAS) has been proposed so far. In this paper, we prove that the existence of an EPTAS for Closest String is in fact unlikely, as it would imply that $\mathrm{FPT}=\mathrm{W}[1]$, a highly unexpected collapse in the hierarchy of parameterized complexity classes. Our proof also shows that the existence of a PTAS for Closest String with running time $f(\varepsilon)\cdot n^{o(1/\varepsilon)}$, for any computable function $f$, would contradict the Exponential Time Hypothesis.
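
    To make the objective concrete, here is a minimal Python sketch (our own illustration, not code from the paper) that evaluates a candidate string against a family of input strings; the function names are illustrative assumptions.

```python
def hamming(x, y):
    """Hamming distance between two equal-length strings."""
    assert len(x) == len(y)
    return sum(a != b for a, b in zip(x, y))

def closest_string_objective(candidate, strings):
    """Maximum Hamming distance from the candidate to any string in the
    family; Closest String asks for a candidate minimizing this value."""
    return max(hamming(candidate, s) for s in strings)

# Example: the all-'a' string is within distance 1 of every input string.
family = ["aab", "aba", "baa"]
print(closest_string_objective("aaa", family))  # -> 1
```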

    Maximum Scatter TSP in Doubling Metrics

    We study the problem of finding a tour of $n$ points in which every edge is long. More precisely, we wish to find a tour that visits every point exactly once, maximizing the length of the shortest edge in the tour. The problem is known as Maximum Scatter TSP, and was introduced by Arkin et al. (SODA 1997), motivated by applications in manufacturing and medical imaging. Arkin et al. gave a $0.5$-approximation for the metric version of the problem and showed that this is the best possible ratio achievable in polynomial time (assuming $\mathrm{P} \neq \mathrm{NP}$). Arkin et al. raised the question of whether a better approximation ratio can be obtained in the Euclidean plane. We answer this question in the affirmative in a more general setting, by giving a $(1-\epsilon)$-approximation algorithm for $d$-dimensional doubling metrics, with running time $\tilde{O}\big(n^3 + 2^{O(K \log K)}\big)$, where $K \leq \left(\frac{13}{\epsilon}\right)^d$. As a corollary we obtain (i) an efficient polynomial-time approximation scheme (EPTAS) for all constant dimensions $d$, (ii) a polynomial-time approximation scheme (PTAS) for dimension $d = \log\log n / c$, for a sufficiently large constant $c$, and (iii) a PTAS for constant $d$ and $\epsilon = \Omega(1/\log\log n)$. Furthermore, we show the dependence on $d$ in our approximation scheme to be essentially optimal, unless Satisfiability can be solved in subexponential time.
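
    For illustration, a short Python sketch (again our own, not the paper's algorithm) that evaluates the Maximum Scatter objective, i.e. the length of the shortest edge of a tour, together with an exponential-time brute force usable only on tiny instances.

```python
import itertools
import math

def scatter(tour, points):
    """Length of the shortest edge in a closed tour (the Maximum Scatter
    TSP objective, to be maximized)."""
    n = len(tour)
    return min(
        math.dist(points[tour[i]], points[tour[(i + 1) % n]])
        for i in range(n)
    )

def max_scatter_brute_force(points):
    """Exact optimum by enumerating all tours starting at point 0;
    factorial time, for sanity checks on very small n only."""
    n = len(points)
    return max(
        (scatter((0,) + perm, points)
         for perm in itertools.permutations(range(1, n))),
        default=0.0,
    )

pts = [(0, 0), (1, 0), (1, 1), (0, 1)]
print(max_scatter_brute_force(pts))  # -> 1.0 for the unit square
```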

    On Computing Centroids According to the p-Norms of Hamming Distance Vectors

    In this paper we consider the p-Norm Hamming Centroid problem, which asks to determine whether some given strings have a centroid with a bound on the p-norm of its Hamming distances to the strings. Specifically, given a set $S$ of strings and a real $k$, we consider the problem of determining whether there exists a string $s^*$ with $\big(\sum_{s \in S} d^{p}(s^*,s)\big)^{1/p} \leq k$, where $d(\cdot,\cdot)$ denotes the Hamming distance metric. This problem has important applications in data clustering and multi-winner committee elections, and is a generalization of the well-known polynomial-time solvable Consensus String ($p=1$) problem, as well as the NP-hard Closest String ($p=\infty$) problem. Our main result shows that the problem is NP-hard for all fixed rational $p > 1$, closing the gap for all rational values of $p$ between 1 and $\infty$. Under standard complexity assumptions the reduction also implies that the problem has no $2^{o(n+m)}$-time or $2^{o(k^{p/(p+1)})}$-time algorithm, where $m$ denotes the number of input strings and $n$ denotes the length of each string, for any fixed $p > 1$. The first bound matches a straightforward brute-force algorithm. The second bound is tight in the sense that, for each fixed $\epsilon > 0$, we provide a $2^{k^{p/(p+1)+\epsilon}}$-time algorithm. In the last part of the paper, we complement our hardness result by presenting a fixed-parameter algorithm and a factor-2 approximation algorithm for the problem.
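
    The decision objective above is easy to state in code; the following minimal Python sketch (illustrative, not from the paper) computes the p-norm of the Hamming distance vector for a candidate centroid.

```python
def hamming(x, y):
    """Hamming distance between two equal-length strings."""
    return sum(a != b for a, b in zip(x, y))

def p_norm_objective(candidate, strings, p):
    """(sum_s d(candidate, s)^p)^(1/p): the p-norm of the vector of Hamming
    distances. p = 1 recovers the Consensus String objective; as p grows,
    the value approaches Closest String's max-distance objective."""
    return sum(hamming(candidate, s) ** p for s in strings) ** (1.0 / p)

family = ["aab", "aba", "baa"]
print(p_norm_objective("aaa", family, 2))  # -> sqrt(3) ~ 1.732
```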

    Hashing for Similarity Search: A Survey

    Similarity search (nearest neighbor search) is the problem of retrieving, from a large database, the data items whose distances to a query item are the smallest. Various methods have been developed to address this problem, and recently a lot of effort has been devoted to approximate search. In this paper, we present a survey on one of the main solutions, hashing, which has been widely studied since the pioneering work on locality sensitive hashing. We divide the hashing algorithms into two main categories: locality sensitive hashing, which designs hash functions without exploring the data distribution, and learning to hash, which learns hash functions according to the data distribution. We review them from various aspects, including hash function design, distance measure, and search scheme in the hash coding space.
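
    As one concrete instance of locality sensitive hashing, here is a minimal random-hyperplane (sign) hashing sketch for angular/cosine similarity in Python; this is a textbook construction used to illustrate the idea, not a specific method from the survey.

```python
import random

def random_hyperplane_hash(dim, n_bits, seed=0):
    """Return an LSH function mapping a vector to an n_bits-bit signature:
    bit i is the sign of the dot product with a random Gaussian hyperplane
    normal. Vectors at a small angle tend to collide on most bits."""
    rng = random.Random(seed)
    planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)]
              for _ in range(n_bits)]

    def h(v):
        return tuple(
            1 if sum(p_j * v_j for p_j, v_j in zip(plane, v)) >= 0 else 0
            for plane in planes
        )

    return h

h = random_hyperplane_hash(dim=3, n_bits=8)
print(h([1.0, 0.2, 0.1]))    # similar vectors...
print(h([0.9, 0.25, 0.05]))  # ...usually share most signature bits
```

    In practice such signatures are used as bucket keys: candidates sharing a bucket with the query are re-ranked by exact distance, trading accuracy for search speed.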

    Online Multistage Subset Maximization Problems

    Numerous combinatorial optimization problems (knapsack, maximum-weight matching, etc.) can be expressed as subset maximization problems: one is given a ground set $N=\{1,\ldots,n\}$, a collection $\mathcal{F} \subseteq 2^N$ of subsets thereof such that the empty set is in $\mathcal{F}$, and an objective (profit) function $p: \mathcal{F} \to \mathbb{R}_+$. The task is to choose a set $S \in \mathcal{F}$ that maximizes $p(S)$. We consider the multistage version (Eisenstat et al., Gupta et al., both ICALP 2014) of such problems: the profit function $p_t$ (and possibly the set of feasible solutions $\mathcal{F}_t$) may change over time. Since in many applications changing the solution is costly, the task becomes to find a sequence of solutions that optimizes the trade-off between good per-time solutions and stable solutions, taking into account an additional similarity bonus. As the similarity measure for two consecutive solutions, we consider either the size of the intersection of the two solutions or the difference of $n$ and the Hamming distance between the two characteristic vectors (see the sketch after this abstract). We study multistage subset maximization problems in the online setting, that is, $p_t$ (along with possibly $\mathcal{F}_t$) arrive only one by one and, upon such an arrival, the online algorithm has to output the corresponding solution without knowledge of the future. We develop general techniques for online multistage subset maximization and thereby characterize those models (given by the type of data evolution and the type of similarity measure) that admit a constant-competitive online algorithm. When no constant competitive ratio is possible, we employ lookahead to circumvent this issue. When a constant competitive ratio is possible, we provide almost matching lower and upper bounds on the best achievable one.
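
    A minimal Python sketch of the two similarity measures named in the abstract, under the assumption that solutions are represented as subsets of the ground set $\{1,\ldots,n\}$; the function names are our own.

```python
def intersection_similarity(s_prev, s_curr):
    """Similarity as the size of the intersection of two solutions (sets)."""
    return len(s_prev & s_curr)

def hamming_similarity(s_prev, s_curr, n):
    """Similarity as n minus the Hamming distance between the characteristic
    vectors over ground set {1, ..., n}: the number of elements on which the
    two solutions agree (both kept in, or both kept out)."""
    # Symmetric difference size equals the Hamming distance of the vectors.
    return n - len(s_prev ^ s_curr)

n = 5
a, b = {1, 2, 3}, {2, 3, 4}
print(intersection_similarity(a, b))  # -> 2
print(hamming_similarity(a, b, n))    # -> 3 (agree on elements 2, 3, 5)
```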