3,304 research outputs found

A Dynamic Algorithm for the Longest Common Subsequence Problem using Ant Colony Optimization Technique

Author: Chaudhuri Arindam
Publication date: 07/07/2013

We present a dynamic algorithm for solving the Longest Common Subsequence Problem using the Ant Colony Optimization Technique. The Ant Colony Optimization Technique has been applied to many problems in Optimization Theory, Machine Learning, Telecommunication Networks, and other areas. Its application to NP-Hard Problems is particularly significant. Given two strings, the traditional technique for finding the Longest Common Subsequence is based on Dynamic Programming, which consists of creating a recurrence relation and filling a table whose size depends on the lengths of the two strings. The proposed algorithm draws an analogy with the way ant colonies function; this computational paradigm is known as Ant System. It is a viable new approach to Stochastic Combinatorial Optimization. The main characteristics of this model are positive feedback, distributed computation, and the use of a constructive greedy heuristic. Positive feedback accounts for rapid discovery of good solutions, distributed computation avoids premature convergence, and the greedy heuristic helps find acceptable solutions in a minimum number of stages. We apply the proposed methodology to the Longest Common Subsequence Problem and give the simulation results. The effectiveness of this approach is demonstrated by its efficient Computational Complexity. To the best of our knowledge, this is the first Ant Colony Optimization Algorithm for the Longest Common Subsequence Problem. Comment: Proceedings of 2nd International Conference on Mathematics: Trends and Developments, Al Azhar University, Cairo, Egypt, 200
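For orientation, the traditional dynamic-programming baseline that the abstract contrasts against can be sketched as follows. This is the textbook recurrence, not the ant colony algorithm of the paper; the function name `lcs_length` and the choice of an (m+1) by (n+1) table are assumptions made for this illustration.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b (textbook DP)."""
    m, n = len(a), len(b)
    # table[i][j] holds the LCS length of the prefixes a[:i] and b[:j].
    table = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                # Matching characters extend the LCS of the shorter prefixes.
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                # Otherwise drop the last character of one prefix or the other.
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table[m][n]
```

Time and space are both proportional to the product of the two string lengths, which is the cost that heuristic approaches such as the one described here try to avoid.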

arXiv.org e-Print Archive

Computing a Longest Common Palindromic Subsequence

Authors: Chowdhury Shihabur Rahman; Hasan Md. Mahbubul; Iqbal Sumaiya; Rahman M. Sohel
Publication date: 24/10/2011

The longest common subsequence (LCS) problem is a classic and well-studied problem in computer science. A palindrome is a word which reads the same forward as it does backward. The longest common palindromic subsequence (LCPS) problem is an interesting variant of the classic LCS problem which finds the longest common subsequence between two given strings such that the computed subsequence is also a palindrome. In this paper, we study the LCPS problem and give efficient algorithms to solve this problem. To the best of our knowledge, this is the first attempt to study and solve this interesting problem.
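To make the LCPS definition concrete, here is a minimal memoized sketch that computes the LCPS length over all pairs of substrings. It is a straightforward quartic-time recurrence written for illustration, not the efficient algorithms the paper proposes; the name `lcps_length` is hypothetical, and the recursion is only suitable for short inputs.

```python
from functools import lru_cache


def lcps_length(x: str, y: str) -> int:
    """Length of the longest palindrome that is a subsequence of both x and y."""

    @lru_cache(maxsize=None)
    def solve(i: int, j: int, k: int, l: int) -> int:
        # Answer for the substrings x[i..j] and y[k..l] (inclusive bounds).
        if i > j or k > l:
            return 0
        if x[i] == x[j] == y[k] == y[l]:
            if i == j or k == l:
                # Only a single copy of the character fits in one of the strings.
                return 1
            # Use the common character as both ends of the palindrome.
            return 2 + solve(i + 1, j - 1, k + 1, l - 1)
        # Otherwise at least one of the four end characters can be dropped.
        return max(
            solve(i + 1, j, k, l),
            solve(i, j - 1, k, l),
            solve(i, j, k + 1, l),
            solve(i, j, k, l - 1),
        )

    return solve(0, len(x) - 1, 0, len(y) - 1)
```

For example, `lcps_length("aab", "baa")` is 2, because "aa" is a palindrome contained in both strings as a subsequence while no common palindrome of length 3 exists.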

arXiv.org e-Print Archive

Space efficient streaming algorithms for the distance to monotonicity and asymmetric edit distance

Authors: Saks Michael; Seshadhri C.
Publication date: 12/04/2013

Approximating the length of the longest increasing sequence (LIS) of an array is a well-studied problem. We study this problem in the data stream model, where the algorithm is allowed to make a single left-to-right pass through the array and the key resource to be minimized is the amount of additional memory used. We present an algorithm which, for any $\delta > 0$, given streaming access to an array of length $n$, provides a $(1+\delta)$-multiplicative approximation to the distance to monotonicity ($n$ minus the length of the LIS), and uses only $O((\log^2 n)/\delta)$ space. The previous best known approximation using polylogarithmic space was a multiplicative 2-factor. Our algorithm can be used to estimate the length of the LIS to within an additive $\delta n$ for any $\delta > 0$, while previous algorithms could only achieve additive error $n(1/2 - o(1))$. Our algorithm is very simple, being just 3 lines of pseudocode, and has a small update time. It is essentially a polylogarithmic space approximate implementation of a classic dynamic program that computes the LIS. We also give a streaming algorithm for approximating $LCS(x,y)$, the length of the longest common subsequence between strings $x$ and $y$, each of length $n$. Our algorithm works in the asymmetric setting (inspired by [AKO10]), in which we have random access to $y$ and streaming access to $x$, and runs in small space provided that no single symbol appears very often in $y$. More precisely, it gives an additive-$\delta n$ approximation to $LCS(x,y)$ (and hence also to $E(x,y) = n - LCS(x,y)$, the edit distance between $x$ and $y$ when insertions and deletions, but not substitutions, are allowed), with space complexity $O(k(\log^2 n)/\delta)$, where $k$ is the maximum number of times any one symbol appears in $y$. Comment: Final SODA 2013 version. Fixed bugs. We get a $\delta n$-additive approximation for edit distance, not multiplicative as said in the earlier tech repor
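As a reference point for the quantity being approximated, the exact distance to monotonicity can be computed offline as sketched below. This is the standard linear-space method based on binary search, not the streaming algorithm of the paper. The function names are hypothetical, and the sketch assumes that "monotone" means non-decreasing, which the abstract does not state.

```python
from bisect import bisect_right


def lis_length(arr) -> int:
    """Length of the longest non-decreasing subsequence of arr."""
    # tails[k] is the smallest possible last value of a non-decreasing
    # subsequence of length k + 1 seen so far.
    tails = []
    for value in arr:
        pos = bisect_right(tails, value)
        if pos == len(tails):
            tails.append(value)   # value extends the longest subsequence
        else:
            tails[pos] = value    # value gives a smaller tail for length pos + 1
    return len(tails)


def distance_to_monotonicity(arr) -> int:
    """Minimum number of elements to delete so that arr becomes non-decreasing."""
    return len(arr) - lis_length(arr)
```

This version stores up to one entry per array element, so its memory use grows linearly in the worst case; the abstract's contribution is achieving a comparable estimate in polylogarithmic space.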

arXiv.org e-Print Archive

A W[1]-Completeness Result for Generalized Permutation Pattern Matching

Authors: Bruner Marie-Louise; Lackner Martin
Publication date: 14/01/2013

The NP-complete Permutation Pattern Matching problem asks whether a permutation P (the pattern) can be matched into a permutation T (the text). A matching is an order-preserving embedding of P into T. In the Generalized Permutation Pattern Matching problem one can additionally enforce that certain adjacent elements in the pattern must be mapped to adjacent elements in the text. This paper studies the parameterized complexity of this more general problem. We show W[1]-completeness with respect to the length of the pattern P. Under standard complexity-theoretic assumptions this implies that no fixed-parameter tractable algorithm can be found for any parameter depending solely on P. Comment: The contents of this paper have been integrated in the more comprehensive paper "The computational landscape of permutation patterns", arXiv:1301.034
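The notion of a matching can be illustrated with a brute-force check that tries every subsequence of the text. This sketch covers only the plain Permutation Pattern Matching problem, without the adjacency constraints of the generalized version; the function name `contains_pattern` is hypothetical, and both inputs are assumed to contain distinct values.

```python
from itertools import combinations


def contains_pattern(text, pattern) -> bool:
    """True if some subsequence of text is order-isomorphic to pattern."""

    def order(seq):
        # Positions of seq listed from smallest value to largest value.
        # Two sequences of distinct values have the same relative order
        # exactly when these tuples are equal.
        return tuple(sorted(range(len(seq)), key=seq.__getitem__))

    target = order(pattern)
    # combinations() keeps the elements in their original left-to-right order.
    return any(order(sub) == target for sub in combinations(text, len(pattern)))
```

The number of subsequences examined grows as the text length raised to the pattern length, which is the kind of dependence on the pattern that the hardness result suggests is difficult to avoid for the generalized problem.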

arXiv.org e-Print Archive

Practical Algorithmic Techniques for Several String Processing Problems

Authors: Andreica Mugurel Ionut; Tapus Nicolae
Publication date: 04/12/2009

The domains of data mining and knowledge discovery make use of large amounts of textual data, which need to be handled efficiently. Specific problems, like finding the maximum weight ordered common subset of a set of ordered sets or searching for specific patterns within texts, occur frequently in this context. In this paper we present several novel and practical algorithmic techniques for processing textual data (strings) in order to efficiently solve multiple problems. Our techniques make use of efficient string algorithms and data structures, like KMP, suffix arrays, tries and deterministic finite automata.

arXiv.org e-Print Archive

$LCSk$++: Practical similarity metric for long strings

Authors: Pavetić Filip; Šikić Mile; Žužić Goran
Publication date: 09/07/2014

In this paper we present $LCSk$++: a new metric for measuring the similarity of long strings, and provide an algorithm for its efficient computation. With the ever-increasing size of strings occurring in practice, e.g. large genomes of plants and animals, classic algorithms such as Longest Common Subsequence (LCS) fail due to demanding computational complexity. Recently, Benson et al. defined a similarity metric named $LCSk$. By relaxing the requirement that the $k$-length substrings should not overlap, we extend their definition into a new metric. An efficient algorithm is presented which computes $LCSk$++ with complexity of $O((|X|+|Y|)\log(|X|+|Y|))$ for strings $X$ and $Y$ under a realistic random model. The algorithm has been designed with implementation simplicity in mind. Additionally, we describe how it can be adjusted to compute $LCSk$ as well, which gives an improvement of the $O(|X| \cdot |Y|)$ algorithm presented in the original $LCSk$ paper.
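For readers unfamiliar with $LCSk$, a quadratic table-filling sketch of the original metric is given below. It assumes the usual reading of $LCSk$ as the maximum number of non-overlapping common substrings of length exactly $k$ that appear in the same order in both strings. It does not implement $LCSk$++ or the faster algorithm described in the abstract, and the function name `lcsk` is hypothetical.

```python
def lcsk(x: str, y: str, k: int) -> int:
    """Maximum number of ordered, non-overlapping common substrings of length k."""
    if k < 1:
        raise ValueError("k must be at least 1")
    m, n = len(x), len(y)
    # dp[i][j] is the LCSk value of the prefixes x[:i] and y[:j].
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            best = max(dp[i - 1][j], dp[i][j - 1])
            # A k-length match ending at positions i and j adds one block.
            if i >= k and j >= k and x[i - k:i] == y[j - k:j]:
                best = max(best, dp[i - k][j - k] + 1)
            dp[i][j] = best
    return dp[m][n]
```

With `k = 1` this reduces to the ordinary LCS length. The slice comparison makes the sketch cost an extra factor of `k` per cell; it is written for clarity rather than speed.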

arXiv.org e-Print Archive

Simple, efficient maxima-finding algorithms for multidimensional samples

Authors: Chen Wei-Mei; Hwang Hsien-Kuei; Tsai Tsung-Hsi
Publication date: 07/10/2009

New algorithms are devised for finding the maxima of multidimensional point samples, one of the very first problems studied in computational geometry. The algorithms are very simple and easily coded and modified for practical needs. The expected complexity of some measures related to the performance of the algorithms is analyzed. We also compare the efficiency of the algorithms with a few major ones used in practice, and apply our algorithms to find the maximal layers and the longest common subsequences of multiple sequences.
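The maxima-finding problem itself can be illustrated in two dimensions with a sort-and-sweep sketch. This is the textbook planar method, not one of the paper's multidimensional algorithms. It assumes the usual dominance definition (a point is dominated if another distinct point is at least as large in every coordinate), and the function name `maxima_2d` is hypothetical.

```python
def maxima_2d(points):
    """Return the points of a 2-D sample that no other point dominates."""
    best_y = float("-inf")
    result = []
    # Visit points from largest x to smallest; ties are broken by larger y first.
    # Duplicates are removed so that equal points do not hide each other.
    for x, y in sorted(set(points), key=lambda p: (-p[0], -p[1])):
        # Every point visited earlier has x at least as large, so the current
        # point is maximal exactly when its y exceeds all y values seen so far.
        if y > best_y:
            result.append((x, y))
            best_y = y
    return result
```

For the sample `[(1, 5), (2, 3), (3, 4), (4, 1), (2, 2)]` the maxima are `(4, 1)`, `(3, 4)` and `(1, 5)`; the other two points are dominated by `(3, 4)`.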

arXiv.org e-Print Archive

Shouji: A Fast and Efficient Pre-Alignment Filter for Sequence Alignment

Authors: Alkan Can; Alser Mohammed; Hassan Hasan; Kumar Akash; Mutlu Onur
Publication venue: Oxford University Press (OUP)
Publication date: 15/04/2019

Motivation: The ability to generate massive amounts of sequencing data continues to overwhelm the processing capability of existing algorithms and compute infrastructures. In this work, we explore the use of hardware/software co-design and hardware acceleration to significantly reduce the execution time of short sequence alignment, a crucial step in analyzing sequenced genomes. We introduce Shouji, a highly parallel and accurate pre-alignment filter that remarkably reduces the need for computationally costly dynamic programming algorithms. The first key idea of our proposed pre-alignment filter is to provide high filtering accuracy by correctly detecting all common subsequences shared between two given sequences. The second key idea is to design a hardware accelerator that adopts modern FPGA (Field-Programmable Gate Array) architectures to further boost the performance of our algorithm. Results: Shouji significantly improves the accuracy of pre-alignment filtering by up to two orders of magnitude compared to the state-of-the-art pre-alignment filters, GateKeeper and SHD. Our FPGA-based accelerator is up to three orders of magnitude faster than the equivalent CPU implementation of Shouji. Using a single FPGA chip, we benchmark the benefits of integrating Shouji with five state-of-the-art sequence aligners, designed for different computing platforms. The addition of Shouji as a pre-alignment step reduces the execution time of the five state-of-the-art sequence aligners by up to 18.8x. Shouji can be adapted for any bioinformatics pipeline that performs sequence alignment for verification. Unlike most existing methods that aim to accelerate sequence alignment, Shouji does not sacrifice any of the aligner capabilities, as it does not modify or replace the alignment step. Availability: https://github.com/CMU-SAFARI/Shouji Comment: https://academic.oup.com/bioinformatics/advance-article-abstract/doi/10.1093/bioinformatics/btz234/5421509, Bioinformatics Journal 201

arXiv.org e-Print Archive

Reusing an FM-index

Authors: Belazzougui Djamal; Gagie Travis; Gog Simon; Manzini Giovanni; Sirén Jouni
Publication date: 09/05/2014

Intuitively, if two strings $S_1$ and $S_2$ are sufficiently similar and we already have an FM-index for $S_1$ then, by storing a little extra information, we should be able to reuse parts of that index in an FM-index for $S_2$. We formalize this intuition and show that it can lead to significant space savings in practice, as well as to some interesting theoretical problems.
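As background on what an FM-index provides, the sketch below builds a Burrows-Wheeler transform naively and counts pattern occurrences by backward search. It illustrates a single plain index only and does not show the index-reuse technique of the paper. The function names are hypothetical, the construction sorts whole rotations and is far from space-efficient, and the text is assumed not to contain the `"\0"` sentinel character.

```python
def bwt(text: str) -> str:
    """Burrows-Wheeler transform of text with a "\\0" sentinel appended."""
    s = text + "\0"  # the sentinel sorts before every other character
    # Sort the start positions of all rotations of s.
    starts = sorted(range(len(s)), key=lambda i: s[i:] + s[:i])
    # The last character of the rotation starting at i is s[i - 1].
    return "".join(s[i - 1] for i in starts)


def count_occurrences(text: str, pattern: str) -> int:
    """Count occurrences of pattern in text using FM-index backward search."""
    last = bwt(text)
    # first_row[c]: index of the first sorted rotation that starts with c.
    first_row = {}
    for idx, c in enumerate(sorted(last)):
        first_row.setdefault(c, idx)

    def occ(c: str, i: int) -> int:
        # Occurrences of c in last[:i]; a real index answers this in O(1).
        return last.count(c, 0, i)

    lo, hi = 0, len(last)  # half-open range of matching rotations
    for c in reversed(pattern):
        if c not in first_row:
            return 0
        lo = first_row[c] + occ(c, lo)
        hi = first_row[c] + occ(c, hi)
        if lo >= hi:
            return 0
    return hi - lo
```

A practical FM-index replaces the linear-time `occ` scan with compressed rank structures over the transformed string; those structures are the part whose storage the abstract proposes to share between similar strings.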

arXiv.org e-Print Archive

Solving The Longest Overlap Region Problem for Noncoding DNA Sequences with GPU

Authors: Lin JianBiao; Nian Che; Tao Chen; Wang BaoQiu; Wen Xie; Zhong YuKun
Publication date: 27/10/2014

Early hardware limitations of GPUs (lack of synchronization primitives and limited memory caching mechanisms) can make GPU-based computation inefficient. Biotechnology now brings new opportunities to bioinformatics and biological engineering. Our paper introduces a way to solve the longest overlap region problem for non-coding DNA sequences using the Compute Unified Device Architecture (CUDA) platform on an Intel(R) Core(TM) i3-3110m quad-core. Compared with a standard CPU implementation, the CUDA performance shows that this method of longest overlap region recognition for noncoding DNA is an efficient approach to high-performance bioinformatics applications. Studies show that the GPU implementation achieves a speedup of more than 20 times over the serial CPU implementation. We believe our method gives a cost-efficient solution to the bioinformatics community for solving the longest overlap region recognition problem and problems in other related fields. Comment: 6 pages, 6 figure

arXiv.org e-Print Archive