Heuristic algorithms for the Longest Filled Common Subsequence Problem
At CPM 2017, Castelli et al. define and study a new variant of the Longest
Common Subsequence Problem, termed the Longest Filled Common Subsequence
Problem (LFCS). For the LFCS problem, the input consists of two strings and
a multiset of characters. The goal is to insert the characters from the
multiset into one of the strings, thus obtaining a new string, such that the
Longest Common Subsequence (LCS) between the other string and the new string
is maximized. Castelli et al. show that the problem is NP-hard and provide a
3/5-approximation algorithm for the problem.
In this paper we study the problem from the experimental point of view. We
introduce, implement and test new heuristic algorithms and compare them with
the approximation algorithm of Castelli et al. Moreover, we introduce an Integer
Linear Program (ILP) model for the problem and we use the state-of-the-art ILP
solver, Gurobi, to obtain exact solutions for moderate-sized instances. Comment: Accepted and presented as a proceedings paper at SYNASC 201
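To make the LFCS objective concrete, here is a minimal brute-force sketch in Python. It is not the approximation algorithm, the heuristics, or the ILP model from the paper; it tries every way of inserting the multiset into one string and keeps the best LCS length, so it is only usable on tiny inputs. The function names `lcs_length`, `fills`, and `lfcs_brute_force` are hypothetical, and the sketch assumes that all characters of the multiset are inserted (extra insertions can never shorten the LCS).

```python
def lcs_length(a, b):
    """Length of a longest common subsequence, by the textbook row-by-row DP."""
    prev = [0] * (len(b) + 1)
    for ch in a:
        cur = [0]
        for j, bj in enumerate(b, 1):
            if ch == bj:
                cur.append(prev[j - 1] + 1)
            else:
                cur.append(max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def fills(b, extra):
    """Yield every string obtained by inserting all characters of `extra` into `b`.

    `extra` is a tuple of characters (the multiset). The same string may be
    yielded more than once; that is harmless for taking a maximum.
    """
    if not extra:
        yield b
        return
    if b:
        # Next output character comes from the original string.
        for rest in fills(b[1:], extra):
            yield b[0] + rest
    for c in set(extra):
        # Next output character is one of the multiset characters.
        remaining = list(extra)
        remaining.remove(c)
        for rest in fills(b, tuple(remaining)):
            yield c + rest


def lfcs_brute_force(a, b, multiset):
    """Best LCS length between `a` and any filling of `b` (exponential time)."""
    return max(lcs_length(a, filled) for filled in fills(b, tuple(multiset)))
```

For example, with the strings "abcd" and "ad" and the multiset {b, c}, the filling "abcd" is reachable, so the brute force reports an LCS length of 4, compared with 2 before filling.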
Bounds on the Number of Longest Common Subsequences
This paper performs the analysis necessary to bound the running time of
known, efficient algorithms for generating all longest common subsequences.
That is, we bound the running time as a function of input size for algorithms
with time essentially proportional to the output size. This paper considers
both the case of computing all distinct LCSs and the case of computing all LCS
embeddings. Also included is an analysis of how much better the efficient
algorithms are than the standard method of generating LCS embeddings. A full
analysis is carried out with running times measured as a function of the total
number of input characters, and much of the analysis is also provided for cases
in which the two input sequences are of the same specified length or of two
independently specified lengths. Comment: 13 pages. Corrected typos, corrected operation of hyperlinks,
improved presentatio
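The distinction between distinct LCSs and LCS embeddings can be illustrated with a small dynamic program. The sketch below is not one of the output-sensitive algorithms the paper analyzes; it is a plain quadratic-table computation, with the hypothetical name `lcs_counts`, that returns the LCS length, the number of embeddings (distinct sets of matched position pairs), and the set of distinct LCS strings. Storing the string sets explicitly is an assumption made for readability and is only practical for short inputs.

```python
def lcs_counts(a, b):
    """Return (LCS length, number of LCS embeddings, set of distinct LCS strings)."""
    n, m = len(a), len(b)
    L = [[0] * (m + 1) for _ in range(n + 1)]   # LCS length of prefixes
    C = [[1] * (m + 1) for _ in range(n + 1)]   # embeddings of that length
    S = [[{""} for _ in range(m + 1)] for _ in range(n + 1)]  # distinct strings

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            match = a[i - 1] == b[j - 1]
            best = max(L[i - 1][j], L[i][j - 1],
                       L[i - 1][j - 1] + (1 if match else 0))
            count, strings = 0, set()
            if match and L[i - 1][j - 1] + 1 == best:
                # Embeddings that pair position i with position j.
                count += C[i - 1][j - 1]
                strings |= {s + a[i - 1] for s in S[i - 1][j - 1]}
            if L[i - 1][j] == best:
                count += C[i - 1][j]
                strings |= S[i - 1][j]
            if L[i][j - 1] == best:
                count += C[i][j - 1]
                strings |= S[i][j - 1]
            if L[i - 1][j - 1] == best:
                # Counted in both neighbours above: inclusion-exclusion.
                count -= C[i - 1][j - 1]
            L[i][j], C[i][j], S[i][j] = best, count, strings
    return L[n][m], C[n][m], S[n][m]
```

For "aa" and "a" there is one distinct LCS ("a") but two embeddings, since either occurrence of "a" in the first string can be matched.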
High precision simulations of the longest common subsequence problem
The longest common subsequence problem is a long-studied prototype of pattern
matching problems. In spite of the effort dedicated to it, the numerical value
of its central quantity, the Chvatal-Sankoff constant, is not yet known.
Numerical estimations of this constant are very difficult due to finite size
effects. We propose a numerical method to estimate the Chvatal-Sankoff constant
which combines the advantages of an analytically known functional form of the
finite size effects with an efficient multi-spin coding scheme. This method
yields very high precision estimates of the Chvatal-Sankoff constant. Our
results correct earlier estimates for small alphabet size while they are
consistent with (albeit more precise than) earlier results for larger alphabet
size. Comment: 8 pages, 4 figure
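For orientation, the Chvatal-Sankoff constant is the limit of the expected LCS length of two independent uniformly random strings of length n, divided by n. The sketch below is a naive Monte Carlo estimate of that ratio at a single finite n. It does not implement the paper's method (no finite-size extrapolation and no multi-spin coding), so it is subject to exactly the finite-size bias the abstract mentions. The name `estimate_lcs_ratio` and its parameters are hypothetical.

```python
import random


def lcs_length(a, b):
    """Length of a longest common subsequence, by the textbook row-by-row DP."""
    prev = [0] * (len(b) + 1)
    for ch in a:
        cur = [0]
        for j, bj in enumerate(b, 1):
            if ch == bj:
                cur.append(prev[j - 1] + 1)
            else:
                cur.append(max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def estimate_lcs_ratio(n, trials, alphabet="01", seed=0):
    """Average LCS(n) / n over `trials` pairs of random strings of length n."""
    rng = random.Random(seed)  # fixed seed so runs are reproducible
    total = 0
    for _ in range(trials):
        a = [rng.choice(alphabet) for _ in range(n)]
        b = [rng.choice(alphabet) for _ in range(n)]
        total += lcs_length(a, b)
    return total / (trials * n)


if __name__ == "__main__":
    # Finite-n estimates sit below the limiting constant.
    print(estimate_lcs_ratio(n=200, trials=20))
```

Increasing n reduces the bias slowly, which is why a plain average like this one is a poor route to high-precision values.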
Fast Arc-Annotated Subsequence Matching in Linear Space
An arc-annotated string is a string of characters, called bases, augmented
with a set of pairs, called arcs, each connecting two bases. Given two
arc-annotated strings, the arc-preserving subsequence problem is to
determine whether the first can be obtained from the second by deleting bases.
Whenever
a base is deleted any arc with an endpoint in that base is also deleted.
Arc-annotated strings where the arcs are ``nested'' are a natural model of RNA
molecules that captures both the primary and secondary structure of these. The
arc-preserving subsequence problem for nested arc-annotated strings is a basic
primitive for investigating the function of RNA molecules. Gramm et al. [ACM
Trans. Algorithms 2006] gave an algorithm for this problem whose space usage
is quadratic in the lengths of the two strings. In this paper we present a new
algorithm that matches the previous time bound while significantly reducing the space
from a quadratic term to linear. This is essential to process large RNA
molecules where the space is likely to be a bottleneck. To obtain our result we
introduce several novel ideas which may be of independent interest for related
problems on arc-annotated strings. Comment: To appear in Algoritmic
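The problem definition above can be checked directly on small inputs. The following Python sketch is an exponential-time brute force, not the algorithm of Gramm et al. or the linear-space algorithm of the paper, and it does not exploit nestedness. It assumes arcs are given as pairs of 0-based positions; the function name `is_arc_preserving_subsequence` is hypothetical.

```python
from itertools import combinations


def is_arc_preserving_subsequence(p, p_arcs, q, q_arcs):
    """True if (p, p_arcs) can be obtained from (q, q_arcs) by deleting bases.

    Deleting a base also deletes every arc with an endpoint at that base,
    so the arcs that survive must coincide exactly with p_arcs.
    """
    p_arcs = {tuple(sorted(arc)) for arc in p_arcs}
    q_arcs = {tuple(sorted(arc)) for arc in q_arcs}
    for kept in combinations(range(len(q)), len(p)):
        # The kept bases must spell p.
        if any(q[k] != c for k, c in zip(kept, p)):
            continue
        # Renumber the kept positions 0..len(p)-1 and collect surviving arcs.
        new_index = {old: new for new, old in enumerate(kept)}
        surviving = {
            (new_index[u], new_index[v])
            for u, v in q_arcs
            if u in new_index and v in new_index
        }
        if surviving == p_arcs:
            return True
    return False
```

For instance, with the string "ACGU" carrying a single arc between positions 0 and 3, the string "AU" is an arc-preserving subsequence only if it carries the arc (0, 1), because keeping both endpoints keeps the arc.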