3,304 research outputs found

    A Dynamic Algorithm for the Longest Common Subsequence Problem using Ant Colony Optimization Technique

    We present a dynamic algorithm for solving the Longest Common Subsequence Problem using the Ant Colony Optimization Technique. The Ant Colony Optimization Technique has been applied to many problems in Optimization Theory, Machine Learning, and Telecommunication Networks. In particular, its application to NP-Hard Problems is of remarkable significance. Given two strings, the traditional technique for finding the Longest Common Subsequence is based on Dynamic Programming, which consists of creating a recurrence relation and filling a table whose size is proportional to the product of the two string lengths. The proposed algorithm draws an analogy with the behavior of ant colonies, and this new computational paradigm is known as Ant System. It is a viable new approach to Stochastic Combinatorial Optimization. The main characteristics of this model are positive feedback, distributed computation, and the use of a constructive greedy heuristic. Positive feedback accounts for rapid discovery of good solutions, distributed computation avoids premature convergence, and the greedy heuristic helps find acceptable solutions in a minimum number of stages. We apply the proposed methodology to the Longest Common Subsequence Problem and give the simulation results. The effectiveness of this approach is demonstrated by efficient Computational Complexity. To the best of our knowledge, this is the first Ant Colony Optimization Algorithm for the Longest Common Subsequence Problem. Comment: Proceedings of 2nd International Conference on Mathematics: Trends and Developments, Al Azhar University, Cairo, Egypt, 200
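For orientation, the traditional dynamic-programming baseline that this abstract contrasts itself with can be sketched as below. This is the textbook recurrence, not the ant-colony algorithm proposed in the paper; the function name `lcs_length` is a hypothetical choice for illustration.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of a longest common subsequence of a and b (textbook DP)."""
    m, n = len(a), len(b)
    # table[i][j] holds the LCS length of the prefixes a[:i] and b[:j].
    table = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                # Matching characters extend the LCS of the shorter prefixes.
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                # Otherwise drop the last character of one string or the other.
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table[m][n]
```

The table has (m+1) by (n+1) entries, so both time and space grow with the product of the two string lengths.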

    Computing a Longest Common Palindromic Subsequence

    The longest common subsequence (LCS) problem is a classic and well-studied problem in computer science. A palindrome is a word which reads the same forward as it does backward. The longest common palindromic subsequence (LCPS) problem is an interesting variant of the classic LCS problem which finds the longest common subsequence between two given strings such that the computed subsequence is also a palindrome. In this paper, we study the LCPS problem and give efficient algorithms to solve this problem. To the best of our knowledge, this is the first attempt to study and solve this interesting problem.
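To make the LCPS definition concrete, here is a minimal sketch of one straightforward dynamic program over pairs of substrings. It is not claimed to be the algorithm from the paper; the function name `lcps_length` is hypothetical, and the recursion is only practical for short strings because it explores on the order of |x|^2 |y|^2 states.

```python
from functools import lru_cache

def lcps_length(x: str, y: str) -> int:
    """Length of a longest palindromic subsequence common to x and y."""

    @lru_cache(maxsize=None)
    def best(i: int, j: int, k: int, l: int) -> int:
        # Answer for the substrings x[i..j] and y[k..l] (inclusive bounds).
        if i > j or k > l:
            return 0
        if x[i] == x[j] == y[k] == y[l]:
            # All four end characters agree, so they can wrap a palindrome.
            if i == j or k == l:
                return 1  # only a single character is available on one side
            return 2 + best(i + 1, j - 1, k + 1, l - 1)
        # Otherwise at least one end character is unused; try dropping each.
        return max(best(i + 1, j, k, l), best(i, j - 1, k, l),
                   best(i, j, k + 1, l), best(i, j, k, l - 1))

    return best(0, len(x) - 1, 0, len(y) - 1)
```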

    Space efficient streaming algorithms for the distance to monotonicity and asymmetric edit distance

    Approximating the length of the longest increasing sequence (LIS) of an array is a well-studied problem. We study this problem in the data stream model, where the algorithm is allowed to make a single left-to-right pass through the array and the key resource to be minimized is the amount of additional memory used. We present an algorithm which, for any $\delta > 0$, given streaming access to an array of length $n$ provides a $(1+\delta)$-multiplicative approximation to the distance to monotonicity ($n$ minus the length of the LIS), and uses only $O((\log^2 n)/\delta)$ space. The previous best known approximation using polylogarithmic space was a multiplicative 2-factor. Our algorithm can be used to estimate the length of the LIS to within an additive $\delta n$ for any $\delta > 0$ while previous algorithms could only achieve additive error $n(1/2-o(1))$. Our algorithm is very simple, being just 3 lines of pseudocode, and has a small update time. It is essentially a polylogarithmic space approximate implementation of a classic dynamic program that computes the LIS. We also give a streaming algorithm for approximating $LCS(x,y)$, the length of the longest common subsequence between strings $x$ and $y$, each of length $n$. Our algorithm works in the asymmetric setting (inspired by \cite{AKO10}), in which we have random access to $y$ and streaming access to $x$, and runs in small space provided that no single symbol appears very often in $y$. More precisely, it gives an additive-$\delta n$ approximation to $LCS(x,y)$ (and hence also to $E(x,y) = n - LCS(x,y)$, the edit distance between $x$ and $y$ when insertions and deletions, but not substitutions, are allowed), with space complexity $O(k(\log^2 n)/\delta)$, where $k$ is the maximum number of times any one symbol appears in $y$. Comment: Final SODA 2013 version. Fixed bugs. We get a $\delta n$-additive approximation for edit distance, not multiplicative as said in the earlier tech report
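As a point of comparison, the quantity being approximated can be computed exactly in one pass with the standard patience-sorting technique, at the cost of linear rather than polylogarithmic space. The sketch below is that exact baseline, not the streaming algorithm of the paper; it assumes "increasing" means strictly increasing, and the function names are hypothetical.

```python
from bisect import bisect_left
from typing import Sequence

def lis_length(arr: Sequence[int]) -> int:
    """Exact length of a longest strictly increasing subsequence."""
    # tails[k] is the smallest possible last element of an increasing
    # subsequence of length k + 1 seen so far.
    tails: list[int] = []
    for value in arr:
        pos = bisect_left(tails, value)
        if pos == len(tails):
            tails.append(value)   # value extends the longest subsequence
        else:
            tails[pos] = value    # value gives a smaller tail for length pos + 1
    return len(tails)

def distance_to_monotonicity(arr: Sequence[int]) -> int:
    """n minus the LIS length: elements to delete to leave an increasing array."""
    return len(arr) - lis_length(arr)
```

In the worst case `tails` grows to length $n$, which is the memory cost the streaming setting aims to avoid.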

    A W[1]-Completeness Result for Generalized Permutation Pattern Matching

    The NP-complete Permutation Pattern Matching problem asks whether a permutation P (the pattern) can be matched into a permutation T (the text). A matching is an order-preserving embedding of P into T. In the Generalized Permutation Pattern Matching problem one can additionally enforce that certain adjacent elements in the pattern must be mapped to adjacent elements in the text. This paper studies the parameterized complexity of this more general problem. We show W[1]-completeness with respect to the length of the pattern P. Under standard complexity theoretic assumptions this implies that no fixed-parameter tractable algorithm can be found for any parameter depending solely on P. Comment: The contents of this paper have been integrated in the more comprehensive paper "The computational landscape of permutation patterns", arXiv:1301.034
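The notion of an order-preserving embedding can be illustrated with a brute-force check for the plain (non-generalized) problem. This sketch simply tries every subsequence of the text, so its running time is exponential in the pattern length; it does not handle the adjacency constraints of the generalized problem, and the function name `contains_pattern` is hypothetical.

```python
from itertools import combinations
from typing import Sequence

def contains_pattern(text: Sequence[int], pattern: Sequence[int]) -> bool:
    """True if some subsequence of text has the same relative order as pattern."""
    k = len(pattern)
    # Positions of the pattern sorted by value: its "shape" as a permutation.
    shape = sorted(range(k), key=lambda i: pattern[i])
    for positions in combinations(range(len(text)), k):
        candidate = [text[p] for p in positions]
        # Entries are distinct, so equal shapes mean order-isomorphic sequences.
        if sorted(range(k), key=lambda i: candidate[i]) == shape:
            return True
    return False
```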

    Practical Algorithmic Techniques for Several String Processing Problems

    The domains of data mining and knowledge discovery make use of large amounts of textual data, which need to be handled efficiently. Specific problems, like finding the maximum weight ordered common subset of a set of ordered sets or searching for specific patterns within texts, occur frequently in this context. In this paper we present several novel and practical algorithmic techniques for processing textual data (strings) in order to efficiently solve multiple problems. Our techniques make use of efficient string algorithms and data structures, like KMP, suffix arrays, tries and deterministic finite automata.

    LCSk++: Practical similarity metric for long strings

    In this paper we present LCSk++: a new metric for measuring the similarity of long strings, and provide an algorithm for its efficient computation. With the ever increasing size of strings occurring in practice, e.g. large genomes of plants and animals, classic algorithms such as Longest Common Subsequence (LCS) fail due to demanding computational complexity. Recently, Benson et al. defined a similarity metric named LCSk. By relaxing the requirement that the $k$-length substrings should not overlap, we extend their definition into a new metric. An efficient algorithm is presented which computes LCSk++ with complexity of $O((|X|+|Y|)\log(|X|+|Y|))$ for strings $X$ and $Y$ under a realistic random model. The algorithm has been designed with implementation simplicity in mind. Additionally, we describe how it can be adjusted to compute LCSk as well, which gives an improvement of the $O(|X| \cdot |Y|)$ algorithm presented in the original LCSk paper.

    Simple, efficient maxima-finding algorithms for multidimensional samples

    New algorithms are devised for finding the maxima of multidimensional point samples, one of the very first problems studied in computational geometry. The algorithms are very simple and easily coded and modified for practical needs. The expected complexity of some measures related to the performance of the algorithms is analyzed. We also compare the efficiency of the algorithms with a few major ones used in practice, and apply our algorithms to find the maximal layers and the longest common subsequences of multiple sequences.

    Shouji: A Fast and Efficient Pre-Alignment Filter for Sequence Alignment

    Motivation: The ability to generate massive amounts of sequencing data continues to overwhelm the processing capability of existing algorithms and compute infrastructures. In this work, we explore the use of hardware/software co-design and hardware acceleration to significantly reduce the execution time of short sequence alignment, a crucial step in analyzing sequenced genomes. We introduce Shouji, a highly-parallel and accurate pre-alignment filter that remarkably reduces the need for computationally-costly dynamic programming algorithms. The first key idea of our proposed pre-alignment filter is to provide high filtering accuracy by correctly detecting all common subsequences shared between two given sequences. The second key idea is to design a hardware accelerator that adopts modern FPGA (Field-Programmable Gate Array) architectures to further boost the performance of our algorithm. Results: Shouji significantly improves the accuracy of pre-alignment filtering by up to two orders of magnitude compared to the state-of-the-art pre-alignment filters, GateKeeper and SHD. Our FPGA-based accelerator is up to three orders of magnitude faster than the equivalent CPU implementation of Shouji. Using a single FPGA chip, we benchmark the benefits of integrating Shouji with five state-of-the-art sequence aligners, designed for different computing platforms. The addition of Shouji as a pre-alignment step reduces the execution time of the five state-of-the-art sequence aligners by up to 18.8x. Shouji can be adapted for any bioinformatics pipeline that performs sequence alignment for verification. Unlike most existing methods that aim to accelerate sequence alignment, Shouji does not sacrifice any of the aligner capabilities, as it does not modify or replace the alignment step. Availability: https://github.com/CMU-SAFARI/Shouji Comment: https://academic.oup.com/bioinformatics/advance-article-abstract/doi/10.1093/bioinformatics/btz234/5421509, Bioinformatics Journal 201

    Reusing an FM-index

    Intuitively, if two strings $S_1$ and $S_2$ are sufficiently similar and we already have an FM-index for $S_1$ then, by storing a little extra information, we should be able to reuse parts of that index in an FM-index for $S_2$. We formalize this intuition and show that it can lead to significant space savings in practice, as well as to some interesting theoretical problems.

    Solving The Longest Overlap Region Problem for Noncoding DNA Sequences with GPU

    Early hardware limitations of GPUs (lack of synchronization primitives and limited memory caching mechanisms) can make GPU-based computation inefficient. Biotechnology now brings more opportunities to Bioinformatics and Biological Engineering. Our paper introduces a way to solve the longest overlap region problem for non-coding DNA sequences using the Compute Unified Device Architecture (CUDA) platform with an Intel(R) Core(TM) i3-3110m quad-core processor. Compared to a standard CPU implementation, CUDA performance shows that this method of longest overlap region recognition for noncoding DNA is an efficient approach to high-performance bioinformatics applications. Studies show that the GPU implementation achieves a speedup of more than 20 times over the serial CPU implementation. We believe our method gives a cost-efficient solution to the bioinformatics community for solving the longest overlap region recognition problem and problems in other related fields. Comment: 6 pages, 6 figures