22 research outputs found

    NP-hardness of hypercube 2-segmentation

    The hypercube 2-segmentation problem is a biclustering problem that was previously claimed to be NP-hard, but for which there does not appear to be a publicly available proof of NP-hardness. This manuscript provides such a proof.

    On the Complexity of the Single Individual SNP Haplotyping Problem

    We present several new results pertaining to haplotyping. These results concern the combinatorial problem of reconstructing haplotypes from incomplete and/or imperfectly sequenced haplotype fragments. We consider the complexity of the problems Minimum Error Correction (MEC) and Longest Haplotype Reconstruction (LHR) for different restrictions on the input data. Specifically, we look at the gapless case, where every row of the input corresponds to a gapless haplotype fragment, and the 1-gap case, where at most one gap per fragment is allowed. We prove that MEC is APX-hard in the 1-gap case and still NP-hard in the gapless case. In addition, we question earlier claims that MEC is NP-hard even when the input matrix is restricted to being completely binary. Concerning LHR, we show that this problem is NP-hard and APX-hard in the 1-gap case (and thus also in the general case), but is polynomial-time solvable in the gapless case.
    Comment: 26 pages. Related to the WABI2005 paper, "On the Complexity of Several Haplotyping Problems", but with more/different results. This paper has just been submitted to the IEEE/ACM Transactions on Computational Biology and Bioinformatics and we are awaiting a decision on acceptance. It differs from the mid-August version of this paper because here we prove that 1-gap LHR is APX-hard. (In the earlier version of the paper we could prove only that it was NP-hard.)

    Clustering Boolean Tensors

    Tensor factorizations are computationally hard problems, and in particular, are often significantly harder than their matrix counterparts. In the case of Boolean tensor factorizations -- where the input tensor and all the factors are required to be binary and we use Boolean algebra -- much of that hardness comes from the possibility of overlapping components. Yet, in many applications we are perfectly happy to partition at least one of the modes. In this paper we investigate what consequences this partitioning has on the computational complexity of Boolean tensor factorizations and present a new algorithm for the resulting clustering problem. This algorithm can alternatively be seen as a particularly regularized clustering algorithm that can handle extremely high-dimensional observations. We analyse our algorithms with the goal of maximizing the similarity and argue that this is more meaningful than minimizing the dissimilarity. As a by-product we obtain a PTAS and an efficient 0.828-approximation algorithm for rank-1 binary factorizations. Our algorithm for Boolean tensor clustering achieves high scalability, high similarity, and good generalization to unseen data with both synthetic and real-world data sets.
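The distinction between maximizing similarity and minimizing dissimilarity can be illustrated on a rank-1 binary factorization. The sketch below assumes that similarity means the number of entries where the input matrix and its reconstruction agree, and dissimilarity the number where they differ; these definitions and all function names are illustrative assumptions, not taken from the paper.

```python
from typing import List

Matrix = List[List[int]]

def rank1_reconstruction(u: List[int], v: List[int]) -> Matrix:
    """Outer product of two binary vectors (hypothetical helper)."""
    return [[ui & vj for vj in v] for ui in u]

def similarity(a: Matrix, b: Matrix) -> int:
    """Number of entries on which the two matrices agree (assumed definition)."""
    return sum(x == y for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))

def dissimilarity(a: Matrix, b: Matrix) -> int:
    """Number of entries on which the two matrices differ (assumed definition)."""
    return sum(x != y for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))

A = [[1, 1, 0],
     [1, 1, 0],
     [0, 0, 1]]
B = rank1_reconstruction([1, 1, 0], [1, 1, 0])
print(similarity(A, B), dissimilarity(A, B))
```

The two quantities always sum to the number of entries, so they share the same optimal solutions; they differ only in how approximation ratios behave.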

    A QPTAS for Gapless MEC

    We consider the problem Minimum Error Correction (MEC). A MEC instance is an n x m matrix M with entries from {0,1,-}. Feasible solutions are composed of two binary m-bit strings, together with an assignment of each row of M to one of the two strings. The objective is to minimize the number of mismatches (errors) where the row has a value that differs from the assigned solution string. The symbol "-" is a wildcard that matches both 0 and 1. A MEC instance is gapless if in each row of M all binary entries are consecutive. Gapless-MEC is a relevant problem in computational biology, and it is closely related to segmentation problems that were introduced by [Kleinberg-Papadimitriou-Raghavan STOC'98] in the context of data mining. Without restrictions, it is known to be UG-hard to compute an O(1)-approximate solution to MEC. For both MEC and Gapless-MEC, the best polynomial time approximation algorithm has a logarithmic performance guarantee. We partially settle the approximation status of Gapless-MEC by providing a quasi-polynomial time approximation scheme (QPTAS). Additionally, for the relevant case where the binary part of a row is not contained in the binary part of another row, we provide a polynomial time approximation scheme (PTAS).
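The MEC objective and the gapless condition described above can be sketched in a few lines. This is a minimal illustration of the definitions only, not the QPTAS or PTAS from the paper; rows are represented as strings over "0", "1", "-", and all function names are hypothetical.

```python
from typing import List, Sequence

def mec_cost(rows: Sequence[str], h0: str, h1: str, assignment: Sequence[int]) -> int:
    """Count mismatches between each row and its assigned solution string.

    The wildcard "-" matches both 0 and 1, so it never counts as an error.
    """
    strings = (h0, h1)
    errors = 0
    for row, side in zip(rows, assignment):
        target = strings[side]
        errors += sum(1 for a, b in zip(row, target) if a != "-" and a != b)
    return errors

def best_assignment(rows: Sequence[str], h0: str, h1: str) -> List[int]:
    """For fixed solution strings, assign each row to the string it fits best."""
    return [0 if mec_cost([row], h0, h1, [0]) <= mec_cost([row], h0, h1, [1]) else 1
            for row in rows]

def is_gapless(row: str) -> bool:
    """True if all binary (non-wildcard) entries of the row are consecutive."""
    positions = [i for i, c in enumerate(row) if c != "-"]
    return not positions or positions[-1] - positions[0] + 1 == len(positions)

rows = ["01--", "-110", "10--", "-111"]
h0, h1 = "0110", "1001"
assignment = best_assignment(rows, h0, h1)
print(assignment, mec_cost(rows, h0, h1, assignment))
```

Once the two solution strings are fixed, the optimal row assignment is independent per row, which is why `best_assignment` can be greedy; the hardness lies in choosing the strings.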

    Sub-Markov random walk for image segmentation

    A novel sub-Markov random walk (subRW) algorithm with label prior is proposed for seeded image segmentation, which can be interpreted as a traditional random walker on a graph with added auxiliary nodes. Under this interpretation, we unify the proposed subRW and other popular random walk (RW) algorithms. This unifying view makes it possible to transfer intrinsic findings between different RW algorithms, and offers new ideas for designing novel RW algorithms by adding or changing auxiliary nodes. To verify the second benefit, we design a new subRW algorithm with label prior to solve the segmentation problem of objects with thin and elongated parts. The experimental results on both synthetic and natural images with twigs demonstrate that the proposed subRW method outperforms previous RW algorithms for seeded image segmentation.

    Parameterized Low-Rank Binary Matrix Approximation

    We provide a number of algorithmic results for the following family of problems: for a given binary m x n matrix A and a nonnegative integer k, decide whether there is a "simple" binary matrix B which differs from A in at most k entries. For an integer r, the "simplicity" of B is characterized as follows.
    - Binary r-Means: Matrix B has at most r different columns. This problem is known to be NP-complete already for r=2. We show that the problem is solvable in time 2^{O(k log k)}*(nm)^{O(1)} and thus is fixed-parameter tractable parameterized by k. We also complement this result by showing that when being parameterized by r and k, the problem admits an algorithm of running time 2^{O(r^{3/2} * sqrt{k log k})}(nm)^{O(1)}, which is subexponential in k for r in o((k/log k)^{1/3}).
    - Low GF(2)-Rank Approximation: Matrix B is of GF(2)-rank at most r. This problem is known to be NP-complete already for r=1. It is also known to be W[1]-hard when parameterized by k. Interestingly, when parameterized by r and k, the problem is not only fixed-parameter tractable, but it is solvable in time 2^{O(r^{3/2} * sqrt{k log k})}(nm)^{O(1)}, which is subexponential in k for r in o((k/log k)^{1/3}).
    - Low Boolean-Rank Approximation: Matrix B is of Boolean rank at most r. The problem is known to be NP-complete for k=0 as well as for r=1. We show that it is solvable in time 2^{O(r 2^r * sqrt{k log k})}(nm)^{O(1)}, which is subexponential in k.
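The Binary r-Means decision problem can be made concrete with an exponential-time brute force over all candidate center columns. This is only a sketch of the problem statement for tiny inputs, not the fixed-parameter algorithms from the paper; the matrix is represented as a list of its columns, and all names are hypothetical.

```python
from itertools import combinations, product
from typing import Sequence, Tuple

Column = Tuple[int, ...]

def hamming(u: Column, v: Column) -> int:
    """Number of positions where two binary columns differ."""
    return sum(a != b for a, b in zip(u, v))

def r_means_cost(columns: Sequence[Column], centers: Sequence[Column]) -> int:
    """Entries that must change so that every column equals one of the centers."""
    return sum(min(hamming(col, c) for c in centers) for col in columns)

def brute_force_r_means(columns: Sequence[Column], r: int) -> int:
    """Minimum number of entry changes over all sets of at most r centers.

    Tries every set of r distinct binary columns, so it is only usable for
    very small instances (2^m candidates for m rows).
    """
    m = len(columns[0])
    candidates = list(product((0, 1), repeat=m))
    size = min(r, len(candidates))
    return min(r_means_cost(columns, cs) for cs in combinations(candidates, size))

def is_yes_instance(columns: Sequence[Column], r: int, k: int) -> bool:
    """Decision version: is there a B with at most r distinct columns within k changes?"""
    return brute_force_r_means(columns, r) <= k

cols = [(0, 0, 1), (0, 0, 1), (1, 1, 0), (1, 0, 0)]
print(brute_force_r_means(cols, 2), is_yes_instance(cols, 2, 1))
```

Picking exactly r distinct centers also covers solutions with fewer distinct columns, since an unused center adds no cost.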