Approximating LCS and Alignment Distance over Multiple Sequences

Author: Das Debarati
Saha Barna
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2022)
Publication date: 01/01/2022

Dagstuhl Research Online Publication Server

Lower Bounds for Combinatorial Algorithms for Boolean Matrix Multiplication

Author: Das Debarati
Saks Michael
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 35th Symposium on Theoretical Aspects of Computer Science (STACS 2018)
Publication date: 01/01/2018

In this paper we propose models of combinatorial algorithms for the Boolean Matrix Multiplication (BMM), and prove lower bounds on computing BMM in these models. First, we give a relatively relaxed combinatorial model which is an extension of the model by Angluin (1976), and we prove that the time required by any algorithm for the BMM is at least $\Omega(n^3 / 2^{O(\sqrt{\log n})})$. Subsequently, we propose a more general model capable of simulating the "Four Russian Algorithm". We prove a lower bound of $\Omega(n^{7/3} / 2^{O(\sqrt{\log n})})$ for the BMM under this model. We use a special class of graphs, called (r,t)-graphs, originally discovered by Ruzsa and Szemerédi (1978), along with randomization, to construct matrices that are hard instances for our combinatorial models.

arXiv.org e-Print Archive

Dagstuhl Research Online Publication Server

Approximate Online Pattern Matching in Sublinear Time

Author: Chakraborty Diptarka
Das Debarati
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 39th IARCS Annual Conference on Foundations of Software Technology and Theoretical Computer Science (FSTTCS 2019)
Publication date: 01/01/2019

Dagstuhl Research Online Publication Server

Approximating LCS and Alignment Distance over Multiple Sequences

Author: Das Debarati
Saha Barna
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2022)
Publication date: 24/10/2021

We study the problem of aligning multiple sequences with the goal of finding an alignment that either maximizes the number of aligned symbols (the longest common subsequence (LCS)), or minimizes the number of unaligned symbols (the alignment distance (AD)). Multiple sequence alignment is a well-studied problem in bioinformatics and is used to identify regions of similarity among DNA, RNA, or protein sequences to detect functional, structural, or evolutionary relationships among them. It is known that exact computation of LCS or AD of $m$ sequences each of length $n$ requires $\Theta(n^m)$ time unless the Strong Exponential Time Hypothesis is false. In this paper, we provide several results to approximate LCS and AD of multiple sequences. If the LCS of $m$ sequences each of length $n$ is $\lambda n$ for some $\lambda \in [0,1]$, then in $\tilde{O}_m(n^{\lfloor\frac{m}{2}\rfloor+1})$ time, we can return a common subsequence of length at least $\frac{\lambda^2 n}{2+\epsilon}$ for any arbitrary constant $\epsilon > 0$. It is possible to approximate the AD within a factor of two in time $\tilde{O}_m(n^{\lceil\frac{m}{2}\rceil})$. However, going below-2 approximation requires breaking the triangle inequality barrier which is a major challenge in this area. No such algorithm with a running time of $O(n^{\alpha m})$ for any $\alpha < 1$ is known. If the AD is $\theta n$, then we design an algorithm that approximates the AD within an approximation factor of $\left(2-\frac{3\theta}{16}+\epsilon\right)$ in $\tilde{O}_m(n^{\lfloor\frac{m}{2}\rfloor+2})$ time. Thus, if $\theta$ is a constant, we get a below-two approximation in $\tilde{O}_m(n^{\lfloor\frac{m}{2}\rfloor+2})$ time. Moreover, we show if just one out of $m$ sequences is $(p,B)$-pseudorandom then, we get a below-2 approximation in $\tilde{O}_m(nB^{m-1}+n^{\lfloor \frac{m}{2}\rfloor+3})$ time irrespective of $\theta$.

arXiv.org e-Print Archive

Dagstuhl Research Online Publication Server

A Linear-Time n^{0.4}-Approximation for Longest Common Subsequence

Author: Bringmann Karl
Das Debarati
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 48th International Colloquium on Automata, Languages, and Programming (ICALP 2021)
Publication date: 01/01/2021

Copenhagen University Research Information System

Dagstuhl Research Online Publication Server

MPG.PuRe