On Computing Centroids According to the p-Norms of Hamming Distance Vectors
In this paper we consider the p-Norm Hamming Centroid problem, which asks to determine whether some given strings have a centroid with a bound on the p-norm of its Hamming distances to the strings. Specifically, given a set S of strings and a real k, we consider the problem of determining whether there exists a string s^* with (sum_{s in S} d^p(s^*,s))^(1/p) <= k, where d(·,·) denotes the Hamming distance metric. This problem has important applications in data clustering and multi-winner committee elections, and is a generalization of the well-known polynomial-time solvable Consensus String (p=1) problem, as well as the NP-hard Closest String (p=infty) problem.
Our main result shows that the problem is NP-hard for all fixed rational p > 1, closing the gap for all rational values of p between 1 and infty. Under standard complexity assumptions the reduction also implies that the problem has no 2^o(n+m)-time or 2^o(k^(p/(p+1)))-time algorithm, where m denotes the number of input strings and n denotes the length of each string, for any fixed p > 1. The first bound matches a straightforward brute-force algorithm. The second bound is tight in the sense that, for each fixed epsilon > 0, we provide a 2^(k^(p/(p+1)+epsilon))-time algorithm. In the last part of the paper, we complement our hardness result by presenting a fixed-parameter algorithm and a factor-2 approximation algorithm for the problem.
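As a concrete illustration of the objective only (not of the paper's algorithms), the sketch below evaluates the p-norm of the Hamming distance vector for a candidate centroid and brute-forces over all candidate strings; the function names and the tiny instance are illustrative assumptions.

```python
# A minimal sketch, assuming a small alphabet and short strings; the brute force
# below is the straightforward exponential baseline the abstract alludes to,
# not the paper's fixed-parameter or approximation algorithms.
from itertools import product

def hamming(a: str, b: str) -> int:
    """Hamming distance between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))

def p_norm_objective(center: str, strings: list[str], p: float) -> float:
    """(sum_{s in S} d^p(center, s))^(1/p), the quantity bounded by k."""
    return sum(hamming(center, s) ** p for s in strings) ** (1.0 / p)

def has_centroid(strings: list[str], alphabet: str, p: float, k: float) -> bool:
    """Brute force over all length-n candidate strings; exponential in n."""
    n = len(strings[0])
    return any(
        p_norm_objective("".join(cand), strings, p) <= k
        for cand in product(alphabet, repeat=n)
    )

# p = 1 recovers Consensus String; letting p grow approaches Closest String.
S = ["0110", "0011", "1110"]
print(has_centroid(S, "01", p=2, k=2.5))   # True: "0110" has objective sqrt(5)
```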
Lower bounds for approximation schemes for Closest String
In the Closest String problem one is given a family S of equal-length strings over some fixed alphabet, and the task is to find a string y that minimizes the maximum Hamming distance between y and a string from S. While polynomial-time approximation schemes (PTASes) for this problem have been known for a long time [Li et al., J. ACM'02], no efficient polynomial-time approximation scheme (EPTAS) has been proposed so far. In this paper, we prove that the existence of an EPTAS for Closest String is in fact unlikely, as it would imply that FPT = W[1], a highly unexpected collapse in the hierarchy of parameterized complexity classes. Our proof also shows that the existence of a PTAS for Closest String with running time f(epsilon) * n^(o(1/epsilon)), for any computable function f, would contradict the Exponential Time Hypothesis.
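To make the problem statement concrete, here is a minimal sketch (an illustration of ours, not the approximation schemes discussed above) of the quantity that Closest String minimizes, namely the maximum Hamming distance from a candidate string to the family:

```python
# Illustrative only: the Closest String objective ("radius") of a candidate string.
def hamming(a: str, b: str) -> int:
    return sum(x != y for x, y in zip(a, b))

def radius(candidate: str, family: list[str]) -> int:
    """max over s in the family of d(candidate, s); Closest String minimizes this."""
    return max(hamming(candidate, s) for s in family)

S = ["ACCA", "ACGT", "TCGA"]
print(radius("ACGA", S))   # 1: "ACGA" is within Hamming distance 1 of every string in S
```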
Distributed Computing with Adaptive Heuristics
We use ideas from distributed computing to study dynamic environments in
which computational nodes, or decision makers, follow adaptive heuristics (Hart
2005), i.e., simple and unsophisticated rules of behavior, e.g., repeatedly
"best replying" to others' actions, and minimizing "regret", that have been
extensively studied in game theory and economics. We explore when convergence
of such simple dynamics to an equilibrium is guaranteed in asynchronous
computational environments, where nodes can act at any time. Our research
agenda, distributed computing with adaptive heuristics, lies on the borderline
of computer science (including distributed computing and learning) and game
theory (including game dynamics and adaptive heuristics). We exhibit a general
non-termination result for a broad class of heuristics with bounded
recall---that is, simple rules of behavior that depend only on recent history
of interaction between nodes. We consider implications of our result across a
wide variety of interesting and timely applications: game theory, circuit
design, social networks, routing and congestion control. We also study the
computational and communication complexity of asynchronous dynamics and present
some basic observations regarding the effects of asynchrony on no-regret
dynamics. We believe that our work opens a new avenue for research in both
distributed computing and game theory. Comment: 36 pages, four figures. Expands both technical results and discussion of v1. Revised version will appear in the proceedings of Innovations in Computer Science 2011.
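As a toy illustration of the kind of dynamics considered (best replying under asynchronous activations), the sketch below simulates a two-player matrix game in which, at every step, an arbitrarily chosen node switches to a best response against the other's current action. The payoff matrix, random scheduler, and all names are illustrative assumptions, not the paper's model.

```python
# A hypothetical sketch of asynchronous best-reply dynamics in a 2x2 game.
import random

# PAYOFF[i][a0][a1] is player i's payoff when player 0 plays a0 and player 1 plays a1.
PAYOFF = {
    0: [[3, 0], [5, 1]],
    1: [[3, 5], [0, 1]],
}

def best_response(player: int, actions: list[int]) -> int:
    """Action maximizing `player`'s payoff given the other player's current action."""
    other = actions[1 - player]
    if player == 0:
        return max((0, 1), key=lambda a: PAYOFF[0][a][other])
    return max((0, 1), key=lambda a: PAYOFF[1][other][a])

def async_best_reply(steps: int = 20, seed: int = 0) -> list[int]:
    rng = random.Random(seed)
    actions = [0, 0]
    for _ in range(steps):
        p = rng.randrange(2)            # asynchrony: any node may act at any time
        actions[p] = best_response(p, actions)
    return actions

# In this dominance-solvable game the dynamics settle once both nodes have moved.
print(async_best_reply())
```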
Complexity of Combinatorial Matrix Completion With Diameter Constraints
We thoroughly study a novel and still basic combinatorial matrix completion
problem: Given a binary incomplete matrix, fill in the missing entries so that
the resulting matrix has a specified maximum diameter (that is, upper-bounding
the maximum Hamming distance between any two rows of the completed matrix) as
well as a specified minimum Hamming distance between any two of the matrix
rows. This scenario is closely related to consensus string problems as well as
to recently studied clustering problems on incomplete data.
We obtain an almost complete complexity dichotomy between polynomial-time
solvable and NP-hard cases in terms of the minimum distance lower bound and the
number of missing entries per row of the incomplete matrix. Further, we develop
polynomial-time algorithms for maximum diameter three, which are based on
Deza's theorem from extremal set theory. On the negative side we prove
NP-hardness for diameter at least four. For the parameter number of missing
entries per row, we show polynomial-time solvability when there is only one
missing entry and NP-hardness when there can be at least two missing entries.
In general, our algorithms heavily rely on Deza's theorem, and the correspondingly identified sunflower structures pave the way towards solutions based on computing graph factors and solving 2-SAT instances.
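To make the two distance constraints concrete, the following small checker (our illustration, not part of the paper) verifies whether a given completed binary matrix respects a maximum diameter and a minimum pairwise Hamming distance; the hard part studied above is finding such a completion for an incomplete matrix.

```python
# A hypothetical verifier for a completed matrix under the diameter and
# minimum-distance constraints described in the abstract.
from itertools import combinations

def hamming(r1: list[int], r2: list[int]) -> int:
    return sum(a != b for a, b in zip(r1, r2))

def valid_completion(matrix: list[list[int]], diameter: int, min_dist: int) -> bool:
    """True iff every pair of rows has Hamming distance in [min_dist, diameter]."""
    return all(
        min_dist <= hamming(r1, r2) <= diameter
        for r1, r2 in combinations(matrix, 2)
    )

completed = [
    [0, 1, 1, 0],
    [0, 0, 1, 1],
    [1, 1, 1, 0],
]
print(valid_completion(completed, diameter=3, min_dist=1))   # True for this toy matrix
```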
On the Complexity of the Single Individual SNP Haplotyping Problem
We present several new results pertaining to haplotyping. These results
concern the combinatorial problem of reconstructing haplotypes from incomplete
and/or imperfectly sequenced haplotype fragments. We consider the complexity of
the problems Minimum Error Correction (MEC) and Longest Haplotype
Reconstruction (LHR) for different restrictions on the input data.
Specifically, we look at the gapless case, where every row of the input
corresponds to a gapless haplotype-fragment, and the 1-gap case, where at most
one gap per fragment is allowed. We prove that MEC is APX-hard in the 1-gap
case and still NP-hard in the gapless case. In addition, we question earlier
claims that MEC is NP-hard even when the input matrix is restricted to being
completely binary. Concerning LHR, we show that this problem is NP-hard and
APX-hard in the 1-gap case (and thus also in the general case), but is
polynomial-time solvable in the gapless case. Comment: 26 pages. Related to the WABI2005 paper, "On the Complexity of Several Haplotyping Problems", but with more/different results. This paper has just been submitted to the IEEE/ACM Transactions on Computational Biology and Bioinformatics and we are awaiting a decision on acceptance. It differs from the mid-August version of this paper because here we prove that 1-gap LHR is APX-hard. (In the earlier version of the paper we could prove only that it was NP-hard.)
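For illustration, the brute-force sketch below evaluates the MEC objective on a toy fragment matrix: it tries every bipartition of the fragments into two haplotype classes and returns the cheapest number of corrections, treating '-' as a missing (gap) entry. The exponential enumeration and the example fragments are our own; they only make the objective concrete and are unrelated to the paper's reductions and algorithms.

```python
# A hypothetical brute-force evaluation of the Minimum Error Correction objective.
from itertools import product

def column_cost(column_values: list[str]) -> int:
    """Cheapest way to make all non-missing entries of one column agree on 0 or 1."""
    zeros = sum(v == "0" for v in column_values)
    ones = sum(v == "1" for v in column_values)
    return min(zeros, ones)

def mec(fragments: list[str]) -> int:
    """Minimum number of corrections over all bipartitions of the fragments."""
    n_cols = len(fragments[0])
    best = len(fragments) * n_cols          # trivial upper bound
    for assignment in product((0, 1), repeat=len(fragments)):
        cost = 0
        for side in (0, 1):
            group = [f for f, a in zip(fragments, assignment) if a == side]
            cost += sum(column_cost([f[j] for f in group]) for j in range(n_cols))
        best = min(best, cost)
    return best

# Fragments from two haplotypes plus one conflicting entry -> MEC value 1.
print(mec(["0011", "00-1", "1100", "1-00", "0111"]))
```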
Pattern Matching and Consensus Problems on Weighted Sequences and Profiles
We study pattern matching problems on two major representations of uncertain
sequences used in molecular biology: weighted sequences (also known as position
weight matrices, PWM) and profiles (i.e., scoring matrices). In the simple
version, in which only the pattern or only the text is uncertain, we obtain
efficient algorithms with theoretically-provable running times using a
variation of the lookahead scoring technique. We also consider a general
variant of the pattern matching problems in which both the pattern and the text
are uncertain. Central to our solution is a special case where the sequences
have equal length, called the consensus problem. We propose algorithms for the
consensus problem parameterized by the number of strings that match one of the
sequences. Our basic approach is a careful adaptation of the classic meet-in-the-middle algorithm for the knapsack problem. On the lower
bound side, we prove that our dependence on the parameter is optimal up to
lower-order terms conditioned on the optimality of the original algorithm for
the knapsack problem. Comment: 22 pages.
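Since the abstract names the classic meet-in-the-middle algorithm for knapsack as the basic approach, here is a brief sketch of that textbook technique in its original 0/1 knapsack setting; the paper's adaptation to weighted sequences and profiles is more involved, and the items and capacity below are arbitrary.

```python
# A sketch of textbook meet-in-the-middle for 0/1 knapsack: enumerate subset sums
# of each half, then combine a left subset with the best-fitting right subset by
# binary search over weights (roughly 2^(n/2) subsets per half instead of 2^n).
from bisect import bisect_right
from itertools import combinations

def half_subsets(items):
    """All (weight, value) sums over subsets of `items`."""
    sums = []
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            sums.append((sum(w for w, _ in combo), sum(v for _, v in combo)))
    return sums

def knapsack_mitm(items, capacity):
    left, right = items[: len(items) // 2], items[len(items) // 2:]
    a, b = half_subsets(left), half_subsets(right)
    b.sort()                                   # sort right-half subsets by weight
    weights, best_value = [], []
    running = 0
    for w, v in b:                             # prefix maxima of value by weight
        running = max(running, v)
        weights.append(w)
        best_value.append(running)
    answer = 0
    for w, v in a:
        if w > capacity:
            continue
        i = bisect_right(weights, capacity - w) - 1
        if i >= 0:
            answer = max(answer, v + best_value[i])
    return answer

print(knapsack_mitm([(3, 4), (4, 5), (2, 3), (5, 8)], capacity=9))   # 13
```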
Multivariate Fine-Grained Complexity of Longest Common Subsequence
We revisit the classic combinatorial pattern matching problem of finding a
longest common subsequence (LCS). For strings x and y of length n, a textbook algorithm solves LCS in time O(n^2), but although much effort has been spent, no O(n^(2-epsilon))-time algorithm is known for any constant epsilon > 0. Recent work indeed shows that such an algorithm would refute the Strong Exponential Time Hypothesis (SETH) [Abboud, Backurs, Vassilevska Williams and Bringmann, Künnemann, FOCS'15].
Despite the quadratic-time barrier, for over 40 years an enduring scientific
interest continued to produce fast algorithms for LCS and its variations.
Particular attention was put into identifying and exploiting input parameters
that yield strongly subquadratic time algorithms for special cases of interest,
e.g., differential file comparison. This line of research was successfully
pursued until 1990, at which time significant improvements came to a halt. In
this paper, using the lens of fine-grained complexity, our goal is to (1)
justify the lack of further improvements and (2) determine whether some special
cases of LCS admit faster algorithms than currently known.
To this end, we provide a systematic study of the multivariate complexity of LCS, taking into account all parameters previously discussed in the literature: the input size n, the length m of the shorter string, the length L of an LCS of x and y, the numbers of deletions delta := m - L and Delta := n - L, the alphabet size, as well as the numbers of matching pairs M and dominant pairs d. For any class of instances defined by fixing each parameter individually to a polynomial in terms of the input size, we prove a SETH-based lower bound matching one of three known algorithms. Specifically, we determine the optimal running time for LCS under SETH as (n + min{d, delta*Delta, delta*m})^(1 ± o(1)).
[...] Comment: Presented at SODA'18. Full version. 66 pages.
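For reference, this is the textbook quadratic-time dynamic program for LCS referred to at the start of the abstract; it is the classical O(n*m) algorithm, not one of the parameter-sensitive algorithms whose optimality the paper establishes.

```python
# Classical dynamic program: dp[i][j] = length of an LCS of x[:i] and y[:j].
def lcs_length(x: str, y: str) -> int:
    dp = [[0] * (len(y) + 1) for _ in range(len(x) + 1)]
    for i in range(1, len(x) + 1):
        for j in range(1, len(y) + 1):
            if x[i - 1] == y[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[len(x)][len(y)]

print(lcs_length("ABCBDAB", "BDCABA"))   # 4, e.g. "BDAB"
```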