3,183 research outputs found

Heuristic algorithms for the Longest Filled Common Subsequence Problem

Author: Mincu Radu Stefan
Popa Alexandru
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 16/04/2019

At CPM 2017, Castelli et al. define and study a new variant of the Longest Common Subsequence Problem, termed the Longest Filled Common Subsequence Problem (LFCS). For the LFCS problem, the input consists of two strings $A$ and $B$ and a multiset of characters $\mathcal{M}$. The goal is to insert the characters from $\mathcal{M}$ into the string $B$, thus obtaining a new string $B^*$, such that the Longest Common Subsequence (LCS) between $A$ and $B^*$ is maximized. Castelli et al. show that the problem is NP-hard and provide a 3/5-approximation algorithm for the problem. In this paper we study the problem from the experimental point of view. We introduce, implement and test new heuristic algorithms and compare them with the approximation algorithm of Castelli et al. Moreover, we introduce an Integer Linear Program (ILP) model for the problem and we use the state-of-the-art ILP solver, Gurobi, to obtain exact solutions for moderate-sized instances.

Comment: Accepted and presented as a proceedings paper at SYNASC 201
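The LFCS definition in this abstract can be made concrete with a tiny exhaustive search. The sketch below is only an illustration for very small inputs, not the heuristics, the approximation algorithm, or the ILP model from the paper. The function names `lcs_length` and `lfcs_brute_force` are hypothetical, and the code assumes that every character of the multiset is inserted (inserting extra characters can never shorten the LCS).

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence, via the standard row-by-row DP."""
    prev = [0] * (len(b) + 1)
    for ch in a:
        cur = [0]
        for j, bj in enumerate(b, 1):
            if ch == bj:
                cur.append(prev[j - 1] + 1)
            else:
                cur.append(max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def lfcs_brute_force(a: str, b: str, multiset) -> int:
    """Best LCS(a, B*) over every B* obtained by inserting all characters
    of `multiset` into b. Exponential time: tiny instances only."""
    candidates = {b}
    for ch in multiset:
        # Insert the next character at every position of every candidate;
        # the set removes duplicate strings.
        candidates = {
            s[:i] + ch + s[i:]
            for s in candidates
            for i in range(len(s) + 1)
        }
    return max(lcs_length(a, s) for s in candidates)


if __name__ == "__main__":
    print(lfcs_brute_force("abcd", "ad", ["b", "c"]))
```

Because the problem is NP-hard, this enumeration only serves as a reference answer against which a heuristic could be compared on toy inputs.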

arXiv.org e-Print Archive

Crossref

Bounds on the Number of Longest Common Subsequences

Author: Greenberg Ronald I.
Publication date: 01/08/2003

This paper performs the analysis necessary to bound the running time of known, efficient algorithms for generating all longest common subsequences. That is, we bound the running time as a function of input size for algorithms with time essentially proportional to the output size. This paper considers both the case of computing all distinct LCSs and the case of computing all LCS embeddings. Also included is an analysis of how much better the efficient algorithms are than the standard method of generating LCS embeddings. A full analysis is carried out with running times measured as a function of the total number of input characters, and much of the analysis is also provided for cases in which the two input sequences are of the same specified length or of two independently specified lengths.

Comment: 13 pages. Corrected typos, corrected operation of hyperlinks, improved presentatio
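The distinction this abstract draws between distinct LCSs and LCS embeddings can be illustrated on small inputs. The sketch below is not one of the algorithms analysed in the paper. It assumes that an embedding means a choice of matched position pairs, strictly increasing in both strings, whose length equals the LCS length. The function name `lcs_counts` is hypothetical.

```python
from functools import lru_cache


def lcs_counts(a: str, b: str):
    """Return (lcs_length, distinct_lcs_count, embedding_count) for small inputs."""
    n, m = len(a), len(b)

    @lru_cache(maxsize=None)
    def strings(i: int, j: int) -> frozenset:
        # All distinct LCS strings of a[i:] and b[j:]; they share one length.
        if i == n or j == m:
            return frozenset({""})
        if a[i] == b[j]:
            return frozenset(a[i] + s for s in strings(i + 1, j + 1))
        s1, s2 = strings(i + 1, j), strings(i, j + 1)
        l1, l2 = len(next(iter(s1))), len(next(iter(s2)))
        if l1 != l2:
            return s1 if l1 > l2 else s2
        return s1 | s2

    # Embeddings: longest chains of matched pairs (i, j), counted from the back.
    pairs = [(i, j) for i in range(n) for j in range(m) if a[i] == b[j]]
    length, count = {}, {}
    for p in reversed(pairs):
        later = [q for q in pairs if q[0] > p[0] and q[1] > p[1]]
        best = max((length[q] for q in later), default=0)
        length[p] = best + 1
        count[p] = sum(count[q] for q in later if length[q] == best) if best else 1
    lcs_len = max(length.values(), default=0)
    # With no matched pair, the empty embedding is the only one.
    embeddings = sum(count[p] for p in pairs if length[p] == lcs_len) if pairs else 1

    return lcs_len, len(strings(0, 0)), embeddings


if __name__ == "__main__":
    print(lcs_counts("a", "aa"))
```

For `"a"` and `"aa"` there is a single distinct LCS but two ways to place it, which is the gap between the two output sizes that the running-time bounds are measured against.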

arXiv.org e-Print Archive

Loyola eCommons

High precision simulations of the longest common subsequence problem

Author: Bundschuh R.
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 17/06/2001

The longest common subsequence problem is a long studied prototype of pattern matching problems. In spite of the effort dedicated to it, the numerical value of its central quantity, the Chvatal-Sankoff constant, is not yet known. Numerical estimations of this constant are very difficult due to finite size effects. We propose a numerical method to estimate the Chvatal-Sankoff constant which combines the advantages of an analytically known functional form of the finite size effects with an efficient multi-spin coding scheme. This method yields very high precision estimates of the Chvatal-Sankoff constant. Our results correct earlier estimates for small alphabet size while they are consistent with (albeit more precise than) earlier results for larger alphabet size.

Comment: 8 pages, 4 figure
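The quantity discussed in this abstract is the limit, as the string length grows, of the expected LCS length of two independent uniformly random strings divided by that length. A naive Monte Carlo estimate at a fixed finite length can be sketched as follows. This is not the multi-spin coding scheme or the finite-size extrapolation of the paper, and the function names and parameters (`estimate_lcs_ratio`, `n`, `trials`, `seed`) are hypothetical.

```python
import random


def lcs_length(a, b) -> int:
    """Length of the longest common subsequence, via the standard row-by-row DP."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, 1):
            if x == y:
                cur.append(prev[j - 1] + 1)
            else:
                cur.append(max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def estimate_lcs_ratio(alphabet_size: int, n: int, trials: int, seed: int = 0) -> float:
    """Average of LCS(X, Y) / n over random string pairs of length n."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        x = [rng.randrange(alphabet_size) for _ in range(n)]
        y = [rng.randrange(alphabet_size) for _ in range(n)]
        total += lcs_length(x, y)
    return total / (trials * n)


if __name__ == "__main__":
    # Binary alphabet, modest length: a rough finite-size estimate only.
    print(estimate_lcs_ratio(alphabet_size=2, n=200, trials=20))
```

Estimates of this kind are biased at any finite `n`, which is the finite-size effect the abstract refers to.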

arXiv.org e-Print Archive

EDP Sciences OAI-PMH repository (1.2.0)

Fast Arc-Annotated Subsequence Matching in Linear Space

Author: D. Harel
G. Blin
G. Lin
I. Munro
J. Alber
P. Bille
P. Damaschke
P. Kilpeläinen
T. Kida
V. Bafna
W. Chen
Publication date: 01/01/2010

An arc-annotated string is a string of characters, called bases, augmented with a set of pairs, called arcs, each connecting two bases. Given arc-annotated strings $P$ and $Q$, the arc-preserving subsequence problem is to determine if $P$ can be obtained from $Q$ by deleting bases from $Q$. Whenever a base is deleted, any arc with an endpoint in that base is also deleted. Arc-annotated strings where the arcs are "nested" are a natural model of RNA molecules that captures both their primary and secondary structure. The arc-preserving subsequence problem for nested arc-annotated strings is a basic primitive for investigating the function of RNA molecules. Gramm et al. [ACM Trans. Algorithms 2006] gave an algorithm for this problem using $O(nm)$ time and space, where $m$ and $n$ are the lengths of $P$ and $Q$, respectively. In this paper we present a new algorithm using $O(nm)$ time and $O(n + m)$ space, thereby matching the previous time bound while significantly reducing the space from a quadratic term to linear. This is essential to process large RNA molecules where the space is likely to be a bottleneck. To obtain our result we introduce several novel ideas which may be of independent interest for related problems on arc-annotated strings.

Comment: To appear in Algoritmic
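The problem definition in this abstract can be checked directly on tiny inputs by trying every set of bases of $Q$ to keep. The sketch below is an exponential-time illustration of the definition only, not the algorithm of the paper or of Gramm et al. The function name and the representation of arcs as pairs of 0-based indices are assumptions made for this example.

```python
from itertools import combinations


def is_arc_preserving_subsequence(p, p_arcs, q, q_arcs) -> bool:
    """True if (p, p_arcs) results from (q, q_arcs) by deleting bases of q,
    where deleting a base also deletes every arc touching it."""
    p_arcs = {tuple(sorted(arc)) for arc in p_arcs}
    q_arcs = {tuple(sorted(arc)) for arc in q_arcs}
    for kept in combinations(range(len(q)), len(p)):
        # The kept bases, read left to right, must spell p.
        if any(q[k] != c for k, c in zip(kept, p)):
            continue
        index_in_p = {k: t for t, k in enumerate(kept)}
        # Arcs of q that survive: both endpoints are kept.
        surviving = {
            (index_in_p[u], index_in_p[v])
            for (u, v) in q_arcs
            if u in index_in_p and v in index_in_p
        }
        if surviving == p_arcs:
            return True
    return False


if __name__ == "__main__":
    # Q = ACGU with one arc joining A and U; keeping A and U keeps the arc.
    print(is_arc_preserving_subsequence("AU", {(0, 1)}, "ACGU", {(0, 3)}))
```

The check requires the surviving arcs to equal the arcs of $P$ exactly, since an arc disappears only when one of its endpoints is deleted.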

arXiv.org e-Print Archive

CiteSeerX

Crossref

Online Research Database In Technology