1,403 research outputs found
Variants of Constrained Longest Common Subsequence
In this work, we consider a variant of the classical Longest Common
Subsequence problem called Doubly-Constrained Longest Common Subsequence
(DC-LCS). Given two strings s1 and s2 over an alphabet A, a set C_s of strings,
and a function Co from A to N, the DC-LCS problem consists in finding the
longest subsequence s of s1 and s2 such that s is a supersequence of all the
strings in Cs and such that the number of occurrences in s of each symbol a in
A is upper bounded by Co(a). The DC-LCS problem provides a clear mathematical
formulation of a sequence comparison problem in Computational Biology and
generalizes two other constrained variants of the LCS problem: the Constrained
LCS and the Repetition-Free LCS. We present two results for the DC-LCS problem.
First, we illustrate a fixed-parameter algorithm where the parameter is the
length of the solution. Secondly, we prove a parameterized hardness result for
the Constrained LCS problem when the parameter is the number of the constraint
strings and the size of the alphabet A. This hardness result also implies the
parameterized hardness of the DC-LCS problem (with the same parameters) and its
NP-hardness when the size of the alphabet is constant
The substring inclusion constraint longest common subsequence problem can be solved in quadratic time
AbstractIn this paper, we study some variants of the Constrained Longest Common Subsequence (CLCS) problem, namely, the substring inclusion CLCS (Substring-IC-CLCS) problem and a generalized version thereof. In the Substring-IC-CLCS problem, we are to find a longest common subsequence (LCS) of two given strings containing a third constraint string (given) as a substring. Previous solution to this problem runs in cubic time, i.e, O(nmk) time, where n,m and k are the length of the 3 input strings. In this paper, we present simple O(nm) time algorithms to solve the Substring-IC-CLCS problem. We also study the Generalized Substring-IC-LCS problem where we are given two strings of length n and m respectively and an ordered list of p strings and the goal is to find an LCS containing each of them as a substring in the order they appear in the list. We present an O(nmp) algorithm for this generalized version of the problem
Heuristic algorithms for the Longest Filled Common Subsequence Problem
At CPM 2017, Castelli et al. define and study a new variant of the Longest
Common Subsequence Problem, termed the Longest Filled Common Subsequence
Problem (LFCS). For the LFCS problem, the input consists of two strings and
and a multiset of characters . The goal is to insert the
characters from into the string , thus obtaining a new string
, such that the Longest Common Subsequence (LCS) between and is
maximized. Casteli et al. show that the problem is NP-hard and provide a
3/5-approximation algorithm for the problem.
In this paper we study the problem from the experimental point of view. We
introduce, implement and test new heuristic algorithms and compare them with
the approximation algorithm of Casteli et al. Moreover, we introduce an Integer
Linear Program (ILP) model for the problem and we use the state of the art ILP
solver, Gurobi, to obtain exact solution for moderate sized instances.Comment: Accepted and presented as a proceedings paper at SYNASC 201
Repetition-free longest common subsequence of random sequences
A repetition free Longest Common Subsequence (LCS) of two sequences x and y
is an LCS of x and y where each symbol may appear at most once. Let R denote
the length of a repetition free LCS of two sequences of n symbols each one
chosen randomly, uniformly, and independently over a k-ary alphabet. We study
the asymptotic, in n and k, behavior of R and establish that there are three
distinct regimes, depending on the relative speed of growth of n and k. For
each regime we establish the limiting behavior of R. In fact, we do more, since
we actually establish tail bounds for large deviations of R from its limiting
behavior.
Our study is motivated by the so called exemplar model proposed by Sankoff
(1999) and the related similarity measure introduced by Adi et al. (2007). A
natural question that arises in this context, which as we show is related to
long standing open problems in the area of probabilistic combinatorics, is to
understand the asymptotic, in n and k, behavior of parameter R.Comment: 15 pages, 1 figur
Tight Conditional Lower Bounds for Longest Common Increasing Subsequence
We consider the canonical generalization of the well-studied Longest Increasing Subsequence problem to multiple sequences, called k-LCIS: Given k integer sequences X_1,...,X_k of length at most n, the task is to determine the length of the longest common subsequence of X_1,...,X_k that is also strictly increasing. Especially for the case of k=2 (called LCIS for short), several algorithms have been proposed that require quadratic time in the worst case.
Assuming the Strong Exponential Time Hypothesis (SETH), we prove a tight lower bound, specifically, that no algorithm solves LCIS in (strongly) subquadratic time. Interestingly, the proof makes no use of normalization tricks common to hardness proofs for similar problems such as LCS. We further strengthen this lower bound to rule out O((nL)^{1-epsilon}) time algorithms for LCIS, where L denotes the solution size, and to rule out O(n^{k-epsilon}) time algorithms for k-LCIS. We obtain the same conditional lower bounds for the related Longest Common Weakly Increasing Subsequence problem
The Longest Filled Common Subsequence Problem
Inspired by a recent approach for genome reconstruction from incomplete data, we consider a variant of the longest common subsequence problem for the comparison of two sequences, one of which is incomplete, i.e. it has some missing elements. The new combinatorial problem, called Longest Filled Common Subsequence, given two sequences A and B, and a multiset M of symbols missing in B, asks for a sequence B* obtained by inserting the symbols of M into B so that B* induces a common subsequence with A of maximum length. First, we investigate the computational and approximation complexity of the problem and we show that it is NP-hard and APX-hard when A contains at most two occurrences of each symbol. Then, we give a 3/5 approximation algorithm for the problem. Finally, we present a fixed-parameter algorithm, when the problem is parameterized by the number of symbols inserted in B that "match" symbols of A
- …