A Dynamic Algorithm for the Longest Common Subsequence Problem using Ant Colony Optimization Technique
We present a dynamic algorithm for solving the Longest Common Subsequence
Problem using Ant Colony Optimization Technique. The Ant Colony Optimization
Technique has been applied to solve many problems in Optimization Theory,
Machine Learning, and Telecommunication Networks, among others. In particular, application
of this theory in NP-Hard Problems has a remarkable significance. Given two
strings, the traditional technique for finding Longest Common Subsequence is
based on Dynamic Programming which consists of creating a recurrence relation
and filling a table of size . The proposed algorithm draws an analogy with
the way ant colonies function; this new computational paradigm is known
as Ant System. It is a viable new approach to Stochastic Combinatorial
Optimization. The main characteristics of this model are positive feedback,
distributed computation, and the use of a constructive greedy heuristic. Positive
feedback accounts for rapid discovery of good solutions, distributed
computation avoids premature convergence, and the greedy heuristic helps find
acceptable solutions in a minimal number of stages. We apply the proposed
methodology to the Longest Common Subsequence Problem and give the simulation
results. The effectiveness of this approach is demonstrated by efficient
Computational Complexity. To the best of our knowledge, this is the first Ant
Colony Optimization Algorithm for Longest Common Subsequence Problem.
Comment: Proceedings of 2nd International Conference on Mathematics: Trends
and Developments, Al Azhar University, Cairo, Egypt, 200
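For reference, the dynamic-programming baseline mentioned in the abstract above can be sketched as follows. This is the textbook table fill, not the ant colony algorithm the paper proposes; the function name `lcs_length` and the (m+1) x (n+1) table layout are assumptions made for illustration.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of a longest common subsequence of a and b (textbook DP)."""
    m, n = len(a), len(b)
    # table[i][j] holds the LCS length of the prefixes a[:i] and b[:j].
    table = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                # Matching characters extend the LCS of the shorter prefixes.
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                # Otherwise drop the last character of one prefix or the other.
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table[m][n]
```

The sketch runs in time and space proportional to the product of the two string lengths.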
Computing a Longest Common Palindromic Subsequence
The longest common subsequence (LCS) problem is a classic and
well-studied problem in computer science. A palindrome is a word that reads the
same forward as it does backward. The longest common palindromic
subsequence (LCPS) problem is an interesting variant of the classic LCS
problem: it asks for the longest common subsequence of two given strings
such that the computed subsequence is also a palindrome. In this paper, we
study the LCPS problem and give efficient algorithms to solve this problem. To
the best of our knowledge, this is the first attempt to study and solve this
interesting problem.
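The LCPS definition above can be illustrated with a straightforward memoized recurrence over a substring of each input. This is a sketch for small inputs under that definition, not necessarily the efficient algorithms the paper gives; the name `lcps_length` and the four-index state are assumptions made for illustration.

```python
from functools import lru_cache


def lcps_length(x: str, y: str) -> int:
    """Length of a longest common palindromic subsequence of x and y."""

    @lru_cache(maxsize=None)
    def solve(i: int, j: int, k: int, l: int) -> int:
        # State: substrings x[i..j] and y[k..l], both ends inclusive.
        if i > j or k > l:
            return 0
        if x[i] == x[j] == y[k] == y[l]:
            # All four ends agree, so they can form the outer pair of the
            # palindrome. If either substring is a single character, only
            # that one character can be used.
            if i == j or k == l:
                return 1
            return 2 + solve(i + 1, j - 1, k + 1, l - 1)
        # Otherwise shrink one end of one substring and take the best result.
        return max(
            solve(i + 1, j, k, l),
            solve(i, j - 1, k, l),
            solve(i, j, k + 1, l),
            solve(i, j, k, l - 1),
        )

    return solve(0, len(x) - 1, 0, len(y) - 1)
```

The number of states grows with the fourth power of the input length, so this form is only practical for short strings.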
Space efficient streaming algorithms for the distance to monotonicity and asymmetric edit distance
Approximating the length of the longest increasing sequence (LIS) of an array
is a well-studied problem. We study this problem in the data stream model,
where the algorithm is allowed to make a single left-to-right pass through the
array and the key resource to be minimized is the amount of additional memory
used. We present an algorithm which, for any , given streaming
access to an array of length provides a -multiplicative
approximation to the distance to monotonicity ( minus the length of
the LIS), and uses only space. The previous best known
approximation using polylogarithmic space was a multiplicative 2-factor. Our
algorithm can be used to estimate the length of the LIS to within an additive
for any while previous algorithms could only achieve
additive error .
Our algorithm is very simple, being just 3 lines of pseudocode, and has a
small update time. It is essentially a polylogarithmic space approximate
implementation of a classic dynamic program that computes the LIS.
We also give a streaming algorithm for approximating , the length
of the longest common subsequence between strings and , each of length
. Our algorithm works in the asymmetric setting (inspired by \cite{AKO10}),
in which we have random access to and streaming access to , and runs in
small space provided that no single symbol appears very often in . More
precisely, it gives an additive- approximation to (and
hence also to , the edit distance between and when
insertions and deletions, but not substitutions, are allowed), with space
complexity , where is the maximum number of times
any one symbol appears in .
Comment: Final SODA 2013 version. Fixed bugs. We get a \delta n-additive
approximation for edit distance, not multiplicative as said in the earlier
tech repor
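The quantities in the abstract above can be made concrete with an exact, non-streaming computation. The sketch below uses one common exact formulation of the LIS computation (binary search over a list of tail values) and keeps up to linear extra memory, so it is not the polylogarithmic-space approximation the paper describes. The function names are hypothetical, and "increasing" is assumed here to mean strictly increasing.

```python
from bisect import bisect_left


def lis_length(arr) -> int:
    """Exact length of a longest strictly increasing subsequence of arr."""
    # tails[k] is the smallest possible last value of an increasing
    # subsequence of length k + 1 among the elements read so far.
    tails = []
    for value in arr:
        pos = bisect_left(tails, value)
        if pos == len(tails):
            tails.append(value)   # value extends the longest subsequence
        else:
            tails[pos] = value    # value gives a smaller tail for length pos + 1
    return len(tails)


def distance_to_monotonicity(arr) -> int:
    """Array length minus LIS length: deletions needed to leave an increasing array."""
    return len(arr) - lis_length(arr)
```

Like the streaming setting, this makes a single left-to-right pass, but the `tails` list can grow as long as the input.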
A W[1]-Completeness Result for Generalized Permutation Pattern Matching
The NP-complete Permutation Pattern Matching problem asks whether a
permutation P (the pattern) can be matched into a permutation T (the text). A
matching is an order-preserving embedding of P into T. In the Generalized
Permutation Pattern Matching problem one can additionally enforce that certain
adjacent elements in the pattern must be mapped to adjacent elements in the
text. This paper studies the parameterized complexity of this more general
problem. We show W[1]-completeness with respect to the length of the pattern P.
Under standard complexity theoretic assumptions this implies that no
fixed-parameter tractable algorithm can be found for any parameter depending
solely on P.
Comment: The contents of this paper have been integrated in the more
comprehensive paper "The computational landscape of permutation patterns",
arXiv:1301.034
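To make "order-preserving embedding" concrete, here is a brute-force Python sketch of the basic (non-generalized) matching question. It tries every choice of text positions, so it takes exponential time in general, and it does not model the adjacency constraints of the generalized problem. The name `contains_pattern` is hypothetical, and both inputs are assumed to be sequences of distinct values.

```python
from itertools import combinations


def contains_pattern(text, pattern) -> bool:
    """True if some subsequence of text has the same relative order as pattern."""
    k = len(pattern)
    # Indices of the pattern sorted by value; this encodes its relative order.
    pattern_order = sorted(range(k), key=lambda i: pattern[i])
    for positions in combinations(range(len(text)), k):
        values = [text[p] for p in positions]
        # The chosen text values match if sorting them by value visits the
        # same index sequence as sorting the pattern does.
        if sorted(range(k), key=lambda i: values[i]) == pattern_order:
            return True
    return False
```

For example, the text 5 3 1 4 2 contains the pattern 1 3 2 through the values 1 4 2.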
Practical Algorithmic Techniques for Several String Processing Problems
The domains of data mining and knowledge discovery make use of large amounts
of textual data, which need to be handled efficiently. Specific problems, like
finding the maximum weight ordered common subset of a set of ordered sets or
searching for specific patterns within texts, occur frequently in this context.
In this paper we present several novel and practical algorithmic techniques for
processing textual data (strings) in order to efficiently solve multiple
problems. Our techniques make use of efficient string algorithms and data
structures, like KMP, suffix arrays, tries and deterministic finite automata.
++: Practical similarity metric for long strings
In this paper we present ++: a new metric for measuring the similarity
of long strings, and provide an algorithm for its efficient computation. With
the ever-increasing size of strings occurring in practice, e.g. large genomes of
plants and animals, classic algorithms such as Longest Common Subsequence (LCS)
fail due to demanding computational complexity. Recently, Benson et al. defined
a similarity metric named . By relaxing the requirement that the
-length substrings should not overlap, we extend their definition into a new
metric. An efficient algorithm is presented which computes ++ with
complexity of for strings and under a
realistic random model. The algorithm has been designed with implementation
simplicity in mind. Additionally, we describe how it can be adjusted to compute
as well, which gives an improvement of the algorithm
presented in the original paper.
Simple, efficient maxima-finding algorithms for multidimensional samples
New algorithms are devised for finding the maxima of multidimensional point
samples, one of the very first problems studied in computational geometry. The
algorithms are very simple and easily coded and modified for practical needs.
The expected complexity of some measures related to the performance of the
algorithms is analyzed. We also compare the efficiency of the algorithms with a
few major ones used in practice, and apply our algorithms to find the maximal
layers and the longest common subsequences of multiple sequences.
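As a baseline for the maxima-finding problem named in the abstract above, a point is maximal when no other point in the sample dominates it. The quadratic-time Python sketch below only illustrates that definition; it is not one of the paper's algorithms, and the names `dominates` and `maxima` are assumptions.

```python
def dominates(p, q) -> bool:
    """True if p is at least q in every coordinate and strictly greater in one."""
    return all(a >= b for a, b in zip(p, q)) and any(a > b for a, b in zip(p, q))


def maxima(points):
    """Points not dominated by any other point, in their input order."""
    return [p for p in points if not any(dominates(q, p) for q in points)]
```

Identical points do not dominate each other under this definition, so duplicates of a maximal point are all kept.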
Shouji: A Fast and Efficient Pre-Alignment Filter for Sequence Alignment
Motivation: The ability to generate massive amounts of sequencing data
continues to overwhelm the processing capability of existing algorithms and
compute infrastructures. In this work, we explore the use of hardware/software
co-design and hardware acceleration to significantly reduce the execution time
of short sequence alignment, a crucial step in analyzing sequenced genomes. We
introduce Shouji, a highly-parallel and accurate pre-alignment filter that
remarkably reduces the need for computationally-costly dynamic programming
algorithms. The first key idea of our proposed pre-alignment filter is to
provide high filtering accuracy by correctly detecting all common subsequences
shared between two given sequences. The second key idea is to design a hardware
accelerator that adopts modern FPGA (Field-Programmable Gate Array)
architectures to further boost the performance of our algorithm.
Results: Shouji significantly improves the accuracy of pre-alignment
filtering by up to two orders of magnitude compared to the state-of-the-art
pre-alignment filters, GateKeeper and SHD. Our FPGA-based accelerator is up to
three orders of magnitude faster than the equivalent CPU implementation of
Shouji. Using a single FPGA chip, we benchmark the benefits of integrating
Shouji with five state-of-the-art sequence aligners, designed for different
computing platforms. The addition of Shouji as a pre-alignment step reduces the
execution time of the five state-of-the-art sequence aligners by up to 18.8x.
Shouji can be adapted for any bioinformatics pipeline that performs sequence
alignment for verification. Unlike most existing methods that aim to accelerate
sequence alignment, Shouji does not sacrifice any of the aligner capabilities,
as it does not modify or replace the alignment step.
Availability: https://github.com/CMU-SAFARI/Shouji
Comment: https://academic.oup.com/bioinformatics/advance-article-abstract/doi/10.1093/bioinformatics/btz234/5421509,
Bioinformatics Journal 201
Reusing an FM-index
Intuitively, if two strings and are sufficiently similar and we
already have an FM-index for then, by storing a little extra information,
we should be able to reuse parts of that index in an FM-index for . We
formalize this intuition and show that it can lead to significant space savings
in practice, as well as to some interesting theoretical problems.
Solving The Longest Overlap Region Problem for Noncoding DNA Sequences with GPU
Early hardware limitations of GPUs (lack of synchronization primitives and
limited memory caching mechanisms) can make GPU-based computation inefficient.
Biotechnology now brings more opportunities to Bioinformatics and Biological
Engineering. Our paper introduces a way to solve the longest overlap region
problem for noncoding DNA sequences using the Compute Unified Device
Architecture (CUDA) platform on an Intel(R) Core(TM) i3-3110m quad-core
machine. Compared with a standard CPU implementation, the CUDA results show
that the method for recognizing the longest overlap region of noncoding DNA is
an efficient approach to high-performance bioinformatics applications. The
results show that the GPU implementation is more than 20 times faster than the
serial CPU implementation. We believe our method gives the bioinformatics
community a cost-efficient solution to the longest overlap region recognition
problem and to problems in related fields.
Comment: 6 pages, 6 figure