

    Lower Bounds for Combinatorial Algorithms for Boolean Matrix Multiplication

    In this paper we propose models of combinatorial algorithms for Boolean Matrix Multiplication (BMM), and prove lower bounds on computing BMM in these models. First, we give a relatively relaxed combinatorial model which is an extension of the model by Angluin (1976), and we prove that the time required by any algorithm for BMM in this model is at least $\Omega(n^3 / 2^{O(\sqrt{\log n})})$. Subsequently, we propose a more general model capable of simulating the "Four Russians" algorithm. We prove a lower bound of $\Omega(n^{7/3} / 2^{O(\sqrt{\log n})})$ for BMM under this model. We use a special class of graphs, called (r,t)-graphs, originally discovered by Ruzsa and Szemerédi (1978), along with randomization, to construct matrices that are hard instances for our combinatorial models.
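For readers unfamiliar with the problem, the following is a minimal sketch of the textbook cubic-time Boolean matrix product, i.e. the kind of baseline the lower bounds above are measured against. It is not the model or construction from the paper; the function name `boolean_matmul` and the list-of-lists representation are assumptions made for illustration.

```python
def boolean_matmul(a, b):
    """Boolean product: c[i][j] is True iff some k has a[i][k] and b[k][j].

    Hypothetical helper for illustration; assumes non-empty rectangular
    inputs given as lists of rows with truthy/falsy entries.
    """
    rows, inner, cols = len(a), len(b), len(b[0])
    c = [[False] * cols for _ in range(rows)]
    for i in range(rows):
        for k in range(inner):
            if a[i][k]:
                # Row k of b contributes to row i of c only when a[i][k] is set.
                row_b = b[k]
                for j in range(cols):
                    if row_b[j]:
                        c[i][j] = True
    return c
```

The triple loop performs on the order of n^3 elementary steps for n x n inputs, which is the bound the first lower-bound result says cannot be improved by much within the relaxed combinatorial model.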

    Approximate Online Pattern Matching in Sublinear Time


    Approximating LCS and Alignment Distance over Multiple Sequences

    We study the problem of aligning multiple sequences with the goal of finding an alignment that either maximizes the number of aligned symbols (the longest common subsequence (LCS)), or minimizes the number of unaligned symbols (the alignment distance (AD)). Multiple sequence alignment is a well-studied problem in bioinformatics and is used to identify regions of similarity among DNA, RNA, or protein sequences to detect functional, structural, or evolutionary relationships among them. It is known that exact computation of LCS or AD of $m$ sequences each of length $n$ requires $\Theta(n^m)$ time unless the Strong Exponential Time Hypothesis is false. In this paper, we provide several results to approximate LCS and AD of multiple sequences. If the LCS of $m$ sequences each of length $n$ is $\lambda n$ for some $\lambda \in [0,1]$, then in $\tilde{O}_m(n^{\lfloor\frac{m}{2}\rfloor+1})$ time, we can return a common subsequence of length at least $\frac{\lambda^2 n}{2+\epsilon}$ for any arbitrary constant $\epsilon > 0$. It is possible to approximate the AD within a factor of two in time $\tilde{O}_m(n^{\lceil\frac{m}{2}\rceil})$. However, going below-2 approximation requires breaking the triangle inequality barrier which is a major challenge in this area. No such algorithm with a running time of $O(n^{\alpha m})$ for any $\alpha < 1$ is known. If the AD is $\theta n$, then we design an algorithm that approximates the AD within an approximation factor of $\left(2-\frac{3\theta}{16}+\epsilon\right)$ in $\tilde{O}_m(n^{\lfloor\frac{m}{2}\rfloor+2})$ time. Thus, if $\theta$ is a constant, we get a below-two approximation in $\tilde{O}_m(n^{\lfloor\frac{m}{2}\rfloor+2})$ time. Moreover, we show if just one out of $m$ sequences is $(p,B)$-pseudorandom then, we get a below-2 approximation in $\tilde{O}_m(nB^{m-1}+n^{\lfloor \frac{m}{2}\rfloor+3})$ time irrespective of $\theta$.
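To make the exact baseline concrete, here is a minimal sketch of the standard dynamic program for the LCS of several sequences, whose table has one cell per tuple of prefix lengths (roughly n^m cells for m sequences of length n). It illustrates the exact computation the abstract contrasts against, not the paper's approximation algorithms. The names `multi_lcs_length` and `unaligned_symbols` are hypothetical, and counting unaligned symbols as total length minus m times the LCS length is an assumption about how the alignment distance is normalized.

```python
from functools import lru_cache


def multi_lcs_length(seqs):
    """Length of the longest common subsequence of all sequences in seqs.

    Hypothetical helper: memoized recursion over tuples of prefix lengths,
    so it is only practical for a few short sequences.
    """
    seqs = tuple(seqs)
    m = len(seqs)

    @lru_cache(maxsize=None)
    def solve(pos):
        # pos[i] is the length of the prefix of seqs[i] under consideration.
        if any(p == 0 for p in pos):
            return 0
        last = {seqs[i][pos[i] - 1] for i in range(m)}
        if len(last) == 1:
            # All prefixes end in the same symbol: align it and recurse.
            return 1 + solve(tuple(p - 1 for p in pos))
        # Otherwise at least one final symbol is unaligned; try dropping each.
        return max(
            solve(pos[:i] + (pos[i] - 1,) + pos[i + 1:]) for i in range(m)
        )

    return solve(tuple(len(s) for s in seqs))


def unaligned_symbols(seqs):
    """Symbols left unaligned by an optimal alignment (assumed normalization)."""
    seqs = tuple(seqs)
    return sum(len(s) for s in seqs) - len(seqs) * multi_lcs_length(seqs)
```

For example, `multi_lcs_length(["ACGT", "AGT", "ACT"])` evaluates the three sequences jointly rather than pairwise; the number of memoized states grows as the product of the sequence lengths, which is why approximation becomes attractive as m increases.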

    A Linear-Time n^{0.4}-Approximation for Longest Common Subsequence
