
    Heuristic algorithms for the Longest Filled Common Subsequence Problem

    At CPM 2017, Castelli et al. define and study a new variant of the Longest Common Subsequence Problem, termed the Longest Filled Common Subsequence Problem (LFCS). For the LFCS problem, the input consists of two strings $A$ and $B$ and a multiset of characters $\mathcal{M}$. The goal is to insert the characters from $\mathcal{M}$ into the string $B$, thus obtaining a new string $B^*$, such that the Longest Common Subsequence (LCS) between $A$ and $B^*$ is maximized. Castelli et al. show that the problem is NP-hard and provide a 3/5-approximation algorithm for the problem. In this paper we study the problem from the experimental point of view. We introduce, implement and test new heuristic algorithms and compare them with the approximation algorithm of Castelli et al. Moreover, we introduce an Integer Linear Program (ILP) model for the problem and we use the state-of-the-art ILP solver, Gurobi, to obtain exact solutions for moderate-sized instances. Comment: Accepted and presented as a proceedings paper at SYNASC 201
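
    The problem statement in this abstract can be made concrete with a tiny exhaustive search. The sketch below is not any of the heuristics, the approximation algorithm, or the ILP model mentioned above; it is a brute-force illustration that tries every way of inserting the multiset into the second string, so it is only usable on very small inputs. The function names `lcs_length` and `lfcs_brute_force` are hypothetical and chosen for this example.

```python
def lcs_length(a, b):
    """Length of a longest common subsequence of a and b (row-by-row DP)."""
    prev = [0] * (len(b) + 1)
    for ch in a:
        cur = [0]
        for j, bj in enumerate(b, 1):
            if ch == bj:
                cur.append(prev[j - 1] + 1)
            else:
                cur.append(max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def lfcs_brute_force(a, b, m):
    """Best LCS(a, b_star) over all strings b_star obtained by inserting
    every character of the multiset m (given as a string or list) into b.

    Exponential in len(m): an illustration of the definition only.
    """
    # Insert the characters one at a time, at every possible position.
    # Using a set removes duplicate intermediate strings.
    candidates = {b}
    for ch in m:
        candidates = {
            s[:i] + ch + s[i:]
            for s in candidates
            for i in range(len(s) + 1)
        }
    return max(lcs_length(a, s) for s in candidates)


if __name__ == "__main__":
    # Inserting "b" and "c" into "ad" can produce "abcd".
    print(lfcs_brute_force("abcd", "ad", "bc"))
```

    Since inserting an extra character never shortens a common subsequence, inserting all of the multiset loses nothing compared with inserting only part of it, which is why the sketch always inserts every character.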

    Bounds on the Number of Longest Common Subsequences

    This paper performs the analysis necessary to bound the running time of known, efficient algorithms for generating all longest common subsequences. That is, we bound the running time as a function of input size for algorithms with time essentially proportional to the output size. This paper considers both the case of computing all distinct LCSs and the case of computing all LCS embeddings. Also included is an analysis of how much better the efficient algorithms are than the standard method of generating LCS embeddings. A full analysis is carried out with running times measured as a function of the total number of input characters, and much of the analysis is also provided for cases in which the two input sequences are of the same specified length or of two independently specified lengths. Comment: 13 pages. Corrected typos, corrected operation of hyperlinks, improved presentation
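
    The difference between distinct LCSs and LCS embeddings is easy to see on a small input. The sketch below is a plain dynamic-programming illustration, not the efficient output-sensitive algorithms the abstract analyzes. It assumes an embedding means a set of matched index pairs realizing an LCS, and the names `count_lcs_embeddings` and `distinct_lcs` are hypothetical.

```python
from functools import lru_cache


def lcs_table(a, b):
    """Standard LCS length table: L[i][j] is the LCS length of a[:i], b[:j]."""
    L = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                L[i][j] = L[i - 1][j - 1] + 1
            else:
                L[i][j] = max(L[i - 1][j], L[i][j - 1])
    return L


def count_lcs_embeddings(a, b):
    """Number of index-pair matchings of maximum length (assumed meaning
    of 'LCS embeddings'), counted by inclusion-exclusion over prefixes."""
    L = lcs_table(a, b)
    C = [[1] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            total = 0
            if a[i - 1] == b[j - 1]:
                total += C[i - 1][j - 1]      # embeddings ending with pair (i, j)
            if L[i - 1][j] == L[i][j]:
                total += C[i - 1][j]          # embeddings not using a[i-1]
            if L[i][j - 1] == L[i][j]:
                total += C[i][j - 1]          # embeddings not using b[j-1]
            if L[i - 1][j - 1] == L[i][j]:
                total -= C[i - 1][j - 1]      # counted twice above
            C[i][j] = total
    return C[len(a)][len(b)]


def distinct_lcs(a, b):
    """Set of distinct LCS strings (may be exponentially large)."""
    @lru_cache(maxsize=None)
    def go(i, j):
        if i == 0 or j == 0:
            return frozenset({""})
        if a[i - 1] == b[j - 1]:
            return frozenset(s + a[i - 1] for s in go(i - 1, j - 1))
        up, left = go(i - 1, j), go(i, j - 1)
        len_up, len_left = len(next(iter(up))), len(next(iter(left)))
        if len_up > len_left:
            return up
        if len_left > len_up:
            return left
        return up | left

    return set(go(len(a), len(b)))


if __name__ == "__main__":
    # "aa" vs "a": one distinct LCS ("a"), but two ways to embed it.
    print(distinct_lcs("aa", "a"), count_lcs_embeddings("aa", "a"))
```

    For "aa" and "a" there is a single distinct LCS but two embeddings, since either "a" of the first string can be matched.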

    High precision simulations of the longest common subsequence problem

    The longest common subsequence problem is a long-studied prototype of pattern matching problems. In spite of the effort dedicated to it, the numerical value of its central quantity, the Chvatal-Sankoff constant, is not yet known. Numerical estimations of this constant are very difficult due to finite-size effects. We propose a numerical method to estimate the Chvatal-Sankoff constant which combines the advantages of an analytically known functional form of the finite-size effects with an efficient multi-spin coding scheme. This method yields very high precision estimates of the Chvatal-Sankoff constant. Our results correct earlier estimates for small alphabet size while they are consistent with (albeit more precise than) earlier results for larger alphabet size. Comment: 8 pages, 4 figures
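
    The quantity being estimated can be illustrated with a naive Monte Carlo experiment. The Chvatal-Sankoff constant for an alphabet of size k is the limit, as n grows, of the expected LCS length of two independent uniformly random strings of length n, divided by n. The sketch below simply averages LCS(a, b)/n over random pairs at one fixed n; it does not use the finite-size functional form or the multi-spin coding scheme of the abstract, so it shows exactly the finite-size bias the abstract refers to. The name `estimate_lcs_ratio` and its parameters are hypothetical.

```python
import random


def lcs_length(a, b):
    """Length of a longest common subsequence of two sequences."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, 1):
            if x == y:
                cur.append(prev[j - 1] + 1)
            else:
                cur.append(max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def estimate_lcs_ratio(n, trials, alphabet_size=2, seed=0):
    """Average of LCS(a, b) / n over `trials` random pairs of length n.

    A crude finite-n estimate; it approaches the constant only as n grows.
    """
    rng = random.Random(seed)  # fixed seed so runs are reproducible
    total = 0
    for _ in range(trials):
        a = [rng.randrange(alphabet_size) for _ in range(n)]
        b = [rng.randrange(alphabet_size) for _ in range(n)]
        total += lcs_length(a, b)
    return total / (trials * n)


if __name__ == "__main__":
    for n in (50, 100, 200):
        print(n, estimate_lcs_ratio(n, trials=20))
```

    Running the loop for increasing n gives a rough picture of how slowly the ratio converges, which is the difficulty that motivates a more careful method.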

    Fast Arc-Annotated Subsequence Matching in Linear Space

    An arc-annotated string is a string of characters, called bases, augmented with a set of pairs, called arcs, each connecting two bases. Given arc-annotated strings $P$ and $Q$, the arc-preserving subsequence problem is to determine if $P$ can be obtained from $Q$ by deleting bases from $Q$. Whenever a base is deleted, any arc with an endpoint in that base is also deleted. Arc-annotated strings where the arcs are "nested" are a natural model of RNA molecules that captures both the primary and secondary structure of these. The arc-preserving subsequence problem for nested arc-annotated strings is a basic primitive for investigating the function of RNA molecules. Gramm et al. [ACM Trans. Algorithms 2006] gave an algorithm for this problem using $O(nm)$ time and space, where $m$ and $n$ are the lengths of $P$ and $Q$, respectively. In this paper we present a new algorithm using $O(nm)$ time and $O(n + m)$ space, thereby matching the previous time bound while significantly reducing the space from a quadratic term to linear. This is essential to process large RNA molecules where the space is likely to be a bottleneck. To obtain our result we introduce several novel ideas which may be of independent interest for related problems on arc-annotated strings. Comment: To appear in Algoritmic