Lower bounds for approximation schemes for Closest String

Author: Cygan Marek
Lokshtanov Daniel
Pilipczuk Marcin
Pilipczuk Michał
Saurabh Saket
Publication date: 18/09/2015

In the Closest String problem one is given a family $\mathcal S$ of equal-length strings over some fixed alphabet, and the task is to find a string $y$ that minimizes the maximum Hamming distance between $y$ and a string from $\mathcal S$. While polynomial-time approximation schemes (PTASes) for this problem have been known for a long time [Li et al., J. ACM'02], no efficient polynomial-time approximation scheme (EPTAS) has been proposed so far. In this paper, we prove that the existence of an EPTAS for Closest String is in fact unlikely, as it would imply that $\mathrm{FPT}=\mathrm{W}[1]$, a highly unexpected collapse in the hierarchy of parameterized complexity classes. Our proof also shows that the existence of a PTAS for Closest String with running time $f(\varepsilon)\cdot n^{o(1/\varepsilon)}$, for any computable function $f$, would contradict the Exponential Time Hypothesis.
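To make the objective in the abstract above concrete, here is a minimal sketch of the Closest String problem solved by exhaustive search. It is not the approximation scheme of Li et al. or anything from the paper; the function names `hamming`, `radius` and `closest_string_brute_force` are hypothetical, and the search is exponential in the string length, so it only suits toy inputs.

```python
from itertools import product


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("strings must have equal length")
    return sum(x != y for x, y in zip(a, b))


def radius(y, strings):
    """Maximum Hamming distance from y to any string in the family."""
    return max(hamming(y, s) for s in strings)


def closest_string_brute_force(strings, alphabet):
    """Try every string over `alphabet` and keep one of minimum radius.

    Illustrative only: the candidate space has len(alphabet) ** length
    elements, so this is feasible for very short strings only.
    """
    length = len(strings[0])
    candidates = ("".join(chars) for chars in product(alphabet, repeat=length))
    best = min(candidates, key=lambda y: radius(y, strings))
    return best, radius(best, strings)


if __name__ == "__main__":
    family = ["0011", "0101", "1001"]
    print(closest_string_brute_force(family, "01"))
```

The value being minimized, `radius(y, strings)`, is exactly the "maximum Hamming distance between $y$ and a string from $\mathcal S$" in the problem statement.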

arXiv.org e-Print Archive

Dagstuhl Research Online Publication Server

Approximate Hamming distance in a stream

Author: Clifford Raphael
Starikovskaya Tatiana
Publication date: 01/01/2016

We consider the problem of computing a $(1+\epsilon)$-approximation of the Hamming distance between a pattern of length $n$ and successive substrings of a stream. We first look at the one-way randomised communication complexity of this problem, giving Alice the first half of the stream and Bob the second half. We show the following: (1) If Alice and Bob both share the pattern then there is an $O(\epsilon^{-4} \log^2 n)$ bit randomised one-way communication protocol. (2) If only Alice has the pattern then there is an $O(\epsilon^{-2}\sqrt{n}\log n)$ bit randomised one-way communication protocol. We then go on to develop small space streaming algorithms for $(1+\epsilon)$-approximate Hamming distance which give worst case running time guarantees per arriving symbol. (1) For binary input alphabets there is an $O(\epsilon^{-3} \sqrt{n} \log^{2} n)$ space and $O(\epsilon^{-2} \log{n})$ time streaming $(1+\epsilon)$-approximate Hamming distance algorithm. (2) For general input alphabets there is an $O(\epsilon^{-5} \sqrt{n} \log^{4} n)$ space and $O(\epsilon^{-4} \log^3 {n})$ time streaming $(1+\epsilon)$-approximate Hamming distance algorithm. Comment: Submitted to ICALP' 201

arXiv.org e-Print Archive

INRIA a CCSD electronic archive server

Explore Bristol Research

Fast Exact Search in Hamming Space with Multi-Index Hashing

Author: Fleet David J.
Norouzi Mohammad
Punjani Ali
Publication date: 24/04/2014

There is growing interest in representing image data and feature descriptors using compact binary codes for fast near neighbor search. Although binary codes are motivated by their use as direct indices (addresses) into a hash table, codes longer than 32 bits are not being used as such, as it was thought to be ineffective. We introduce a rigorous way to build multiple hash tables on binary code substrings that enables exact k-nearest neighbor search in Hamming space. The approach is storage efficient and straightforward to implement. Theoretical analysis shows that the algorithm exhibits sub-linear run-time behavior for uniformly distributed codes. Empirical results show dramatic speedups over a linear scan baseline for datasets of up to one billion codes of 64, 128, or 256 bits
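The idea of indexing binary code substrings in several hash tables can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: it performs an exact radius-`r` search (a k-nearest-neighbor search could be layered on top by growing `r`), it relies on the pigeonhole observation that two codes within Hamming distance `r` must agree to within `r // m` bits on at least one of `m` disjoint substrings, and all function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def split_code(code, bits, m):
    """Split a `bits`-bit integer into m disjoint chunks: [(value, width), ...]."""
    base, extra = divmod(bits, m)
    chunks, shift = [], 0
    for i in range(m):
        width = base + (1 if i < extra else 0)
        chunks.append(((code >> shift) & ((1 << width) - 1), width))
        shift += width
    return chunks


def build_index(codes, bits, m):
    """One hash table per chunk position, mapping chunk value -> code indices."""
    tables = [defaultdict(list) for _ in range(m)]
    for idx, code in enumerate(codes):
        for table, (chunk, _) in zip(tables, split_code(code, bits, m)):
            table[chunk].append(idx)
    return tables


def neighbors_within(value, width, radius):
    """Yield every `width`-bit value within Hamming distance `radius` of `value`."""
    for d in range(min(radius, width) + 1):
        for positions in combinations(range(width), d):
            flipped = value
            for p in positions:
                flipped ^= 1 << p
            yield flipped


def search(query, r, codes, tables, bits):
    """Indices of all codes within Hamming distance r of `query` (exact)."""
    m = len(tables)
    sub_radius = r // m  # pigeonhole bound on the best-matching chunk
    candidates = set()
    for table, (chunk, width) in zip(tables, split_code(query, bits, m)):
        for probe in neighbors_within(chunk, width, sub_radius):
            candidates.update(table.get(probe, ()))
    # Verify candidates against the full code to discard false positives.
    return sorted(i for i in candidates if bin(codes[i] ^ query).count("1") <= r)


if __name__ == "__main__":
    codes = [0b00000000, 0b00001111, 0b11110000, 0b10000001]
    tables = build_index(codes, bits=8, m=2)
    print(search(0b00000001, 1, codes, tables, bits=8))
```

Because every candidate is checked against the full code, the result is identical to a linear scan; the tables only reduce how many codes need to be examined.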

arXiv.org e-Print Archive

CiteSeerX

On Computing Centroids According to the p-Norms of Hamming Distance Vectors

Author: Chen Jiehua
Hermelin Danny
Sorge Manuel
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 27th Annual European Symposium on Algorithms (ESA 2019)
Publication date: 01/01/2019

In this paper we consider the $p$-Norm Hamming Centroid problem which asks to determine whether some given strings have a centroid with a bound on the $p$-norm of its Hamming distances to the strings. Specifically, given a set $S$ of strings and a real $k$, we consider the problem of determining whether there exists a string $s^*$ with $\left(\sum_{s \in S} d^{p}(s^*,s)\right)^{1/p} \le k$, where $d(\cdot,\cdot)$ denotes the Hamming distance metric. This problem has important applications in data clustering and multi-winner committee elections, and is a generalization of the well-known polynomial-time solvable Consensus String ($p=1$) problem, as well as the NP-hard Closest String ($p=\infty$) problem. Our main result shows that the problem is NP-hard for all fixed rational $p > 1$, closing the gap for all rational values of $p$ between $1$ and $\infty$. Under standard complexity assumptions the reduction also implies that the problem has no $2^{o(n+m)}$-time or $2^{o(k^{p/(p+1)})}$-time algorithm, where $m$ denotes the number of input strings and $n$ denotes the length of each string, for any fixed $p > 1$. The first bound matches a straightforward brute-force algorithm. The second bound is tight in the sense that for each fixed $\epsilon > 0$, we provide a $2^{k^{p/(p+1)+\epsilon}}$-time algorithm. In the last part of the paper, we complement our hardness result by presenting a fixed-parameter algorithm and a factor-2 approximation algorithm for the problem.
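The cost function in the abstract above can be evaluated directly for a given candidate string. The following sketch only checks a candidate; it does not search for a centroid and is not one of the paper's algorithms. The names `p_norm_cost` and `is_centroid` are hypothetical, and the comparison uses floating-point arithmetic, which is an assumption made for brevity.

```python
import math


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("strings must have equal length")
    return sum(x != y for x, y in zip(a, b))


def p_norm_cost(candidate, strings, p):
    """p-norm of the vector of Hamming distances from candidate to each string.

    p = 1 gives the Consensus String objective (sum of distances);
    p = math.inf gives the Closest String objective (maximum distance).
    """
    distances = [hamming(candidate, s) for s in strings]
    if math.isinf(p):
        return max(distances)
    return sum(d ** p for d in distances) ** (1.0 / p)


def is_centroid(candidate, strings, p, k):
    """Decision version: does candidate meet the bound k? (floating point)"""
    return p_norm_cost(candidate, strings, p) <= k


if __name__ == "__main__":
    family = ["0011", "0101", "1001"]
    for p in (1, 2, math.inf):
        print(p, p_norm_cost("0001", family, p))
```

The two special cases in the loop correspond to the Consensus String and Closest String problems named in the abstract.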

arXiv.org e-Print Archive

Dagstuhl Research Online Publication Server

An Efficient Rank Based Approach for Closest String and Closest Substring

Author: A Ben-Dor
A Dinu
AS Fraser
AWC Liew
C de la Higuera
Chuhsing Kate Hsiao
DJ States
EV Koonin
F Nicolas
J Gramm
J Palmer
JC Wooley
K Lanctot
L Schmitt
L Wang
Liviu P. Dinu
LP Dinu
M Chimani
M Frances
M Karpovsky
M Li
P Diaconis
R Holmquist
Radu Ionescu
S Roman
VI Levenshtein
VY Popov
W Banzhaf
X Deng
X Liu
Publication venue: Public Library of Science
Publication date: 01/01/2011

This paper aims to present a new genetic approach that uses rank distance for solving two known NP-hard problems, and to compare rank distance with other distance measures for strings. The two NP-hard problems we are trying to solve are closest string and closest substring. For each problem we build a genetic algorithm and we describe the genetic operations involved. Both genetic algorithms use a fitness function based on rank distance. We compare our algorithms with other genetic algorithms that use different distance measures, such as Hamming distance or Levenshtein distance, on real DNA sequences. Our experiments show that the genetic algorithms based on rank distance have the best results

CiteSeerX

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

Accurate long read mapping using enhanced suffix arrays

Author: Dawyndt Peter
De Schrijver Joachim
Fack Veerle
Van Criekinge Wim
Vyverman Michaël
Publication venue: Scitepress
Publication date: 01/01/2010

With the rise of high throughput sequencing, new programs have been developed for dealing with the alignment of a huge amount of short read data to reference genomes. Recent developments in sequencing technology allow longer reads, but the mappers for short reads are not suited for reads of several hundreds of base pairs. We propose an algorithm for mapping longer reads, which is based on chaining maximal exact matches and uses heuristics and the Needleman-Wunsch algorithm to bridge the gaps. To compute maximal exact matches we use a specialized index structure, called enhanced suffix array. The proposed algorithm is very accurate and can handle large reads with mutations and long insertions and deletions
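The abstract above mentions the Needleman-Wunsch algorithm for bridging gaps between chained matches. As a reminder of what that step computes, here is a minimal sketch of the global alignment score. The scoring values (`match=1`, `mismatch=-1`, `gap=-1`) and the function name are assumptions for illustration; the abstract does not state which scoring scheme or implementation the authors use.

```python
def needleman_wunsch_score(a, b, match=1, mismatch=-1, gap=-1):
    """Optimal global alignment score of sequences a and b.

    Uses a linear gap penalty and keeps only two rows of the dynamic
    programming table, so memory is O(len(b)).
    """
    # Row 0: aligning the empty prefix of a against prefixes of b (all gaps).
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # prefix of a aligned against the empty prefix of b
        for j in range(1, len(b) + 1):
            diagonal = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # gap in b
            left = curr[j - 1] + gap  # gap in a
            curr.append(max(diagonal, up, left))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(needleman_wunsch_score("ACGT", "AGT"))
```

In a read mapper of the kind described, such an alignment would be run only on the short regions between consecutive exact matches rather than on the whole read.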

Ghent University Academic Bibliography

Consensus Strings with Small Maximum Distance and Small Distance Sum

Author: Bulteau Laurent
Schmid Markus L.
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 43rd International Symposium on Mathematical Foundations of Computer Science (MFCS 2018)
Publication date: 01/01/2018

The parameterised complexity of consensus string problems (Closest String, Closest Substring, Closest String with Outliers) is investigated in a more general setting, i.e., with a bound on the maximum Hamming distance and a bound on the sum of Hamming distances between solution and input strings. We completely settle the parameterised complexity of these generalised variants of Closest String and Closest Substring, and partly settle it for Closest String with Outliers; in addition, we answer some open questions from the literature regarding the classical problem variants with only one distance bound. Finally, we investigate the question of polynomial kernels and respective lower bounds.

HAL Descartes

Dagstuhl Research Online Publication Server

Hal-Diderot

HAL-Ecole des Ponts ParisTech

HAL - UPEC / UPEM