24,633 research outputs found

    Lower bounds for approximation schemes for Closest String

    Get PDF
    In the Closest String problem one is given a family S of equal-length strings over some fixed alphabet, and the task is to find a string y that minimizes the maximum Hamming distance between y and a string from S. While polynomial-time approximation schemes (PTASes) for this problem have been known for a long time [Li et al., J. ACM'02], no efficient polynomial-time approximation scheme (EPTAS) has been proposed so far. In this paper, we prove that the existence of an EPTAS for Closest String is in fact unlikely, as it would imply that FPT = W[1], a highly unexpected collapse in the hierarchy of parameterized complexity classes. Our proof also shows that the existence of a PTAS for Closest String with running time f(ε) · n^{o(1/ε)}, for any computable function f, would contradict the Exponential Time Hypothesis.
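
    To make the objective concrete, here is a minimal Python sketch (the function names and the toy family are ours, not the paper's) that evaluates the radius a candidate center string achieves:

        def hamming(a, b):
            """Number of positions at which two equal-length strings differ."""
            return sum(x != y for x, y in zip(a, b))

        def closest_string_radius(y, S):
            """The Closest String objective: maximum Hamming distance from y to the family S."""
            return max(hamming(y, s) for s in S)

        # Hypothetical toy family; the center "ACA" has radius 1 here.
        S = ["ACA", "ACT", "GCA"]
        print(closest_string_radius("ACA", S))  # -> 1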

    On Computing Centroids According to the p-Norms of Hamming Distance Vectors

    Get PDF
    In this paper we consider the p-Norm Hamming Centroid problem, which asks to determine whether some given strings have a centroid with a bound on the p-norm of its Hamming distances to the strings. Specifically, given a set S of strings and a real k, we consider the problem of determining whether there exists a string s* with (sum_{s in S} d^p(s*, s))^{1/p} ≤ k, where d(·,·) denotes the Hamming distance metric. This problem has important applications in data clustering and multi-winner committee elections, and is a generalization of the well-known polynomial-time solvable Consensus String (p = 1) problem, as well as the NP-hard Closest String (p = ∞) problem. Our main result shows that the problem is NP-hard for all fixed rational p > 1, closing the gap for all rational values of p between 1 and ∞. Under standard complexity assumptions the reduction also implies that the problem has no 2^{o(n+m)}-time or 2^{o(k^{p/(p+1)})}-time algorithm, where m denotes the number of input strings and n denotes the length of each string, for any fixed p > 1. The first bound matches a straightforward brute-force algorithm. The second bound is tight in the sense that for each fixed ε > 0, we provide a 2^{k^{p/(p+1)+ε}}-time algorithm. In the last part of the paper, we complement our hardness result by presenting a fixed-parameter algorithm and a factor-2 approximation algorithm for the problem.
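
    As a small illustration of the objective and of how p interpolates between Consensus String and Closest String, here is a hedged Python sketch (function names and data are ours):

        def hamming(a, b):
            return sum(x != y for x, y in zip(a, b))

        def p_norm_cost(center, strings, p):
            """(sum over s of d(center, s)^p)^(1/p): p = 1 gives Consensus String,
            and as p grows the cost approaches the Closest String (max-distance) objective."""
            return sum(hamming(center, s) ** p for s in strings) ** (1.0 / p)

        S = ["0011", "0101", "1001"]
        print(p_norm_cost("0001", S, p=1))  # 3.0, the sum of distances
        print(p_norm_cost("0001", S, p=2))  # about 1.73, i.e. sqrt(1 + 1 + 1)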

    Approximation and Parameterized Complexity of Minimax Approval Voting

    Full text link
    We present three results on the complexity of Minimax Approval Voting. First, we study Minimax Approval Voting parameterized by the Hamming distance d from the solution to the votes. We show that Minimax Approval Voting admits no algorithm running in time O*(2^{o(d log d)}), unless the Exponential Time Hypothesis (ETH) fails. This means that the O*(d^{2d}) algorithm of Misra et al. [AAMAS 2015] is essentially optimal. Motivated by this, we then show a parameterized approximation scheme, running in time O*((3/ε)^{2d}), which is essentially tight assuming ETH. Finally, we give a new polynomial-time randomized approximation scheme for Minimax Approval Voting, which runs in time n^{O(1/ε^2 · log(1/ε))} · poly(m), almost matching the running time of the fastest known PTAS for Closest String due to Ma and Sun [SIAM J. Comp. 2009].
    Comment: 14 pages, 3 figures, 2 pseudocode
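
    For concreteness, here is a brute-force Python sketch of the Minimax Approval Voting objective, with votes modeled as 0/1 vectors over the candidates (the function names, the committee-size parameter k, and the toy votes are our illustration, not the paper's algorithms):

        from itertools import combinations

        def hamming(a, b):
            return sum(x != y for x, y in zip(a, b))

        def minimax_approval(votes, k):
            """Exhaustive search over all size-k committees: return the committee
            (as a 0/1 vector) minimizing the maximum Hamming distance to the votes.
            Exponential in the number of candidates; for illustration only."""
            m = len(votes[0])
            best = None
            for committee in combinations(range(m), k):
                chi = tuple(1 if i in committee else 0 for i in range(m))
                radius = max(hamming(chi, v) for v in votes)
                if best is None or radius < best[0]:
                    best = (radius, chi)
            return best

        votes = [(1, 1, 0, 0), (1, 0, 1, 0), (0, 1, 1, 0)]
        print(minimax_approval(votes, k=2))  # -> (2, (1, 1, 0, 0))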

    On the String Consensus Problem and the Manhattan Sequence Consensus Problem

    Full text link
    In the Manhattan Sequence Consensus problem (MSC problem) we are given k integer sequences, each of length l, and we are to find an integer sequence x of length l (called a consensus sequence) such that the maximum Manhattan distance of x from each of the input sequences is minimized. For binary sequences Manhattan distance coincides with Hamming distance, hence in this case the string consensus problem (also called the string center problem or closest string problem) is a special case of MSC. Our main result is a practically efficient O(l)-time algorithm solving MSC for k ≤ 5 sequences. The practicality of our algorithms has been verified experimentally. It improves upon the quadratic algorithm by Amir et al. (SPIRE 2012) for the string consensus problem for k = 5 binary strings. As in Amir et al.'s algorithm, we use a column-based framework. We replace the implied general integer linear programming by its easy special cases, due to combinatorial properties of MSC for k ≤ 5. We also show that for a general parameter k any instance can be reduced in linear time to a kernel of size k!, so the problem is fixed-parameter tractable. Nevertheless, for k ≥ 4 this is still too large for any naive solution to be feasible in practice.
    Comment: accepted to SPIRE 201
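
    A minimal Python sketch of the MSC objective (names and data are ours), which also makes the binary Hamming special case visible:

        def manhattan(a, b):
            """Manhattan (L1) distance between two equal-length integer sequences."""
            return sum(abs(x - y) for x, y in zip(a, b))

        def msc_radius(x, sequences):
            """The MSC objective: maximum Manhattan distance from the candidate consensus x
            to the input sequences. On binary sequences this equals the Hamming radius."""
            return max(manhattan(x, s) for s in sequences)

        seqs = [(1, 4, 2), (3, 4, 0), (2, 2, 2)]
        print(msc_radius((2, 4, 1), seqs))  # -> 3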

    Optimal Parameter Choices Through Self-Adjustment: Applying the 1/5-th Rule in Discrete Settings

    Full text link
    While evolutionary algorithms are known to be very successful for a broad range of applications, the algorithm designer is often left with many algorithmic choices, for example, the size of the population, the mutation rates, and the crossover rates of the algorithm. These parameters are known to have a crucial influence on the optimization time, and thus need to be chosen carefully, a task that often requires substantial effort. Moreover, the optimal parameters can change during the optimization process. It is therefore of great interest to design mechanisms that dynamically choose best-possible parameters. An example of such an update mechanism is the one-fifth success rule for step-size adaptation in evolution strategies. While in continuous domains this principle is well understood also from a mathematical point of view, no comparable theory is available for problems in discrete domains. In this work we show that the one-fifth success rule can be effective also in discrete settings. We consider the (1+(λ,λ)) GA proposed in [Doerr/Doerr/Ebel: From black-box complexity to designing new genetic algorithms, TCS 2015]. We prove that if its population size is chosen according to the one-fifth success rule then the expected optimization time on OneMax is linear. This is better than what any static population size λ can achieve and is asymptotically optimal also among all adaptive parameter choices.
    Comment: This is the full version of a paper that is to appear at GECCO 201
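
    The one-fifth success rule itself is a simple multiplicative update; the following Python sketch shows one common form of it (the update factor F and the clamping bounds are illustrative assumptions, not the exact scheme analysed in the paper):

        def one_fifth_update(lmbda, success, F=1.5, lmbda_min=1.0, lmbda_max=None):
            """One-fifth success rule for a parameter such as the offspring population size:
            shrink it after an improving iteration, grow it slightly otherwise, so that a
            success rate of about 1/5 keeps it stable on average. F and the clamping
            bounds are illustrative choices, not the constants analysed in the paper."""
            lmbda = lmbda / F if success else lmbda * F ** 0.25
            lmbda = max(lmbda, lmbda_min)
            if lmbda_max is not None:
                lmbda = min(lmbda, lmbda_max)
            return lmbda

        # Inside a (1+(lambda,lambda)) GA loop one would call, per iteration (sketch):
        # lmbda = one_fifth_update(lmbda, success=offspring_fitness > parent_fitness)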

    Query-Driven Sampling for Collective Entity Resolution

    Full text link
    Probabilistic databases play a preeminent role in the processing and management of uncertain data. Recently, many database research efforts have integrated probabilistic models into databases to support tasks such as information extraction and labeling. Many of these efforts are based on batch-oriented inference, which inhibits a real-time workflow. One important task is entity resolution (ER). ER is the process of determining which records (mentions) in a database correspond to the same real-world entity. Traditional pairwise ER methods can lead to inconsistencies and low accuracy due to localized decisions. Leading ER systems solve this problem by collectively resolving all records using a probabilistic graphical model and Markov chain Monte Carlo (MCMC) inference. However, for large datasets this is an extremely expensive process. One key observation is that such an exhaustive ER process incurs a huge up-front cost, which is wasteful in practice because most users are interested in only a small subset of entities. In this paper, we advocate pay-as-you-go entity resolution by developing a number of query-driven collective ER techniques. We introduce two classes of SQL queries that involve ER operators: selection-driven ER and join-driven ER. We implement novel variations of the MCMC Metropolis-Hastings algorithm to generate biased samples, and selectivity-based scheduling algorithms to support the two classes of ER queries. Finally, we show that query-driven ER algorithms can converge and return results within minutes over a database populated with the extraction from a newswire dataset containing 71 million mentions.
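
    To make the sampling step concrete, the following Python sketch shows a generic Metropolis-Hastings acceptance step of the kind such collective ER samplers rely on (a sketch assuming a symmetric proposal; the propose and score arguments are hypothetical placeholders, not the paper's API):

        import math
        import random

        def metropolis_hastings_step(state, propose, score):
            """One generic Metropolis-Hastings step: propose a local change (e.g. move a
            mention to another entity cluster) and accept it with probability
            min(1, exp(score(new) - score(old))), assuming a symmetric proposal.
            propose() and score() stand in for a model-specific proposal and an
            unnormalized log-probability; this is not the paper's query-driven sampler."""
            candidate = propose(state)
            log_ratio = score(candidate) - score(state)
            if log_ratio >= 0 or random.random() < math.exp(log_ratio):
                return candidate  # accept the proposed re-assignment
            return state          # reject and keep the current assignment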