368 research outputs found

An Efficient Dynamic Programming Algorithm for the Generalized LCS Problem with Multiple Substring Exclusion Constrains

Authors: Wang Lei, Wang Xiaodong, Wu Yingjie, Zhu Daxin
Publication date: 07/03/2013

In this paper, we consider a generalized longest common subsequence problem with multiple substring exclusion constraints. For the two input sequences $X$ and $Y$ of lengths $n$ and $m$, and a set of $d$ constraints $P=\{P_1,\ldots,P_d\}$ of total length $r$, the problem is to find a common subsequence $Z$ of $X$ and $Y$ that excludes each constraint string in $P$ as a substring and whose length is maximized. The problem was declared to be NP-hard \cite{1}, but we finally found that this is not true. A new dynamic programming solution for this problem is presented in this paper. The correctness of the new algorithm is proved. The time complexity of our algorithm is $O(nmr)$.

Comment: arXiv admin note: substantial text overlap with arXiv:1301.718
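The problem statement above can be illustrated with a small sketch. This is not the paper's algorithm, whose details are not given in the abstract. It is one plausible approach, assuming an Aho-Corasick automaton built over the constraint strings and a table indexed by a prefix of X, a prefix of Y, and the automaton state reached after reading Z. The names `build_automaton` and `constrained_lcs` are hypothetical, and all constraint strings are assumed to be non-empty.

```python
from collections import deque

def build_automaton(patterns, alphabet):
    """Aho-Corasick automaton (illustrative helper, not from the paper).
    Returns (goto, bad): a full transition table and a per-state flag that is
    True when the text read so far ends with some pattern."""
    goto = [{}]            # goto[state][char] -> next state
    bad = [False]
    for p in patterns:     # build the trie of all patterns
        s = 0
        for ch in p:
            if ch not in goto[s]:
                goto.append({})
                bad.append(False)
                goto[s][ch] = len(goto) - 1
            s = goto[s][ch]
        bad[s] = True
    fail = [0] * len(goto)
    queue = deque()
    for ch in alphabet:    # complete the root's transitions
        if ch in goto[0]:
            queue.append(goto[0][ch])
        else:
            goto[0][ch] = 0
    while queue:           # breadth-first: shallower states are finished first
        s = queue.popleft()
        bad[s] = bad[s] or bad[fail[s]]
        for ch in alphabet:
            if ch in goto[s]:
                t = goto[s][ch]
                fail[t] = goto[fail[s]][ch]
                queue.append(t)
            else:
                goto[s][ch] = goto[fail[s]][ch]
    return goto, bad

def constrained_lcs(x, y, patterns):
    """Length of a longest common subsequence of x and y that contains
    none of the (non-empty) patterns as a substring."""
    alphabet = set(x) | set(y) | set("".join(patterns))
    goto, bad = build_automaton(patterns, alphabet)
    n, m, k = len(x), len(y), len(goto)
    # best[i][j][s]: longest valid Z drawn from x[:i] and y[:j] that leaves
    # the automaton in state s; -1 marks an unreachable combination.
    best = [[[-1] * k for _ in range(m + 1)] for _ in range(n + 1)]
    best[0][0][0] = 0
    for i in range(n + 1):
        for j in range(m + 1):
            for s in range(k):
                cur = best[i][j][s]
                if cur < 0:
                    continue
                if i < n:   # skip x[i]
                    best[i + 1][j][s] = max(best[i + 1][j][s], cur)
                if j < m:   # skip y[j]
                    best[i][j + 1][s] = max(best[i][j + 1][s], cur)
                if i < n and j < m and x[i] == y[j]:
                    t = goto[s][x[i]]
                    if not bad[t]:   # appending x[i] completes no pattern
                        best[i + 1][j + 1][t] = max(best[i + 1][j + 1][t], cur + 1)
    return max(best[n][m])
```

The automaton has at most r + 1 states, so the three nested loops in this sketch do work proportional to nm(r + 1) once the automaton is built. For example, `constrained_lcs("abc", "abc", ["b"])` evaluates to 2, since the only maximal valid choice is "ac".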

arXiv.org e-Print Archive

CiteSeerX

Quantifying sequential subsumption

Authors: Elzinga Cees, Lin Zhiwei, Vincent Jordan, Wang H.
Publication venue: 'Elsevier BV'
Publication date: 01/11/2019
Queen's University Belfast Research Portal

Ulster University's Research Portal

The String-to-String Correction Problem with Block Moves

Author: Tichy Walter F.
Publication venue: 'Purdue University (bepress)'
Publication date: 01/01/1983
Purdue E-Pubs

CAD Tools for DNA Micro-Array Design, Manufacture and Application

Author: Hundewale Nisar
Publication venue: ScholarWorks @ Georgia State University
Publication date: 04/12/2006

Motivation: As the human genome project progresses and some microbial and eukaryotic genomes are recognized, numerous biotechnological processes have attracted an increasing number of biologists, bioengineers and computer scientists. Biotechnological processes profoundly involve the production and analysis of high-throughput experimental data. Numerous sequence libraries of DNA and protein structures of a large number of micro-organisms, and a variety of other databases related to biology and chemistry, are available. For example, microarray technology, a novel biotechnology, promises to monitor the whole genome at once, so that researchers can study the whole genome on the global level and have a better picture of the expressions among millions of genes simultaneously. Today, it is widely used in many fields: disease diagnosis, gene classification, gene regulatory networks, and drug discovery. For example, designing an organism-specific microarray and analyzing experimental data require combining heterogeneous computational tools that usually differ in their data formats, such as GeneMark for ORF extraction, Promide for DNA probe selection, Chip for probe placement on the microarray chip, BLAST to compare sequences, MEGA for phylogenetic analysis, and ClustalX for multiple alignments.
Solution: Surprisingly enough, despite huge research efforts invested in DNA array applications, very few works are devoted to computer-aided optimization of DNA array design and manufacturing. Current design practices are dominated by ad-hoc heuristics incorporated in proprietary tools with unknown suboptimality. This will soon become a bottleneck for the new generation of high-density arrays, such as the ones currently being designed at Perlegen [109]. The goal of the already accomplished research was to develop highly scalable tools, with predictable runtime and quality, for cost-effective, computer-aided design and manufacturing of DNA probe arrays.
We illustrate the utility of our approach by taking a concrete example of combining the design tools of microarray technology for Herpes B virus DNA data.

ScholarWorks @ Georgia State University

Efficient Parallel Output-Sensitive Edit Distance

Authors: Ding Xiangyun, Dong Xiaojun, Gu Yan, Liu Youzhe, Sun Yihan
Publication date: 01/01/2023

Given two strings $A[1..n]$ and $B[1..m]$, and a set of operations allowed to edit the strings, the edit distance between $A$ and $B$ is the minimum number of operations required to transform $A$ into $B$. Sequentially, a standard Dynamic Programming (DP) algorithm solves edit distance with $\Theta(nm)$ cost. In many real-world applications, the strings to be compared are similar and have small edit distances. To achieve highly practical implementations, we focus on output-sensitive parallel edit-distance algorithms, i.e., to achieve asymptotically better cost bounds than the standard $\Theta(nm)$ algorithm when the edit distance is small. We study four algorithms in the paper, including three algorithms based on Breadth-First Search (BFS) and one algorithm based on Divide-and-Conquer (DaC). Our BFS-based solution is based on the Landau-Vishkin algorithm. We implement three different data structures for the longest common prefix (LCP) queries needed in the algorithm: the classic solution using parallel suffix array, and two hash-based solutions proposed in this paper. Our DaC-based solution is inspired by the output-insensitive solution proposed by Apostolico et al., and we propose a non-trivial adaption to make it output-sensitive. All our algorithms have good theoretical guarantees, and they achieve different tradeoffs between work (total number of operations), span (longest dependence chain in the computation), and space. We test and compare our algorithms on both synthetic data and real-world data. Our BFS-based algorithms outperform the existing parallel edit-distance implementation in ParlayLib in all test cases. By comparing our algorithms, we also provide a better understanding of the choice of algorithms for different input patterns. We believe that our paper is the first systematic study in the theory and practice of parallel edit distance.

arXiv.org e-Print Archive

Dagstuhl Research Online Publication Server

Quantum Meets Fine-Grained Complexity: Sublinear Time Quantum Algorithms for String Problems

Author: Seddighin Saeed
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 13th Innovations in Theoretical Computer Science Conference (ITCS 2022)
Publication date: 01/01/2022
Dagstuhl Research Online Publication Server