Search CORE


Efficient algorithms for enumerating maximal common subsequences of two strings

Author: Hirota Miyuji
Sakai Yoshifumi
Publication date: 19/07/2023

We propose efficient algorithms for enumerating maximal common subsequences (MCSs) of two strings. The efficiency of the algorithms is measured by their preprocessing-time, space, and delay-time complexities. One algorithm prepares a cubic-space data structure in cubic time and then outputs each MCS in linear time. This data structure can be used to search for particular MCSs satisfying some condition without performing an explicit enumeration. Another prepares a quadratic-space data structure in quadratic time to output each MCS in linear time, and a third prepares a linear-space data structure in quadratic time to output each MCS in linearithmic time.
Comment: 23 pages, 5 PostScript figures

arXiv.org e-Print Archive
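To make the notion of a maximal common subsequence concrete, the sketch below is an exponential brute-force reference, not any of the efficient algorithms described in the abstract. It assumes the usual definition: a common subsequence Z of X and Y is maximal if no common subsequence properly contains Z. It is enough to test single-character insertions, because any longer common supersequence of Z contains one with exactly one extra character. All function names are hypothetical.

```python
from itertools import combinations


def is_subsequence(z: str, s: str) -> bool:
    """True if z can be obtained from s by deleting characters."""
    it = iter(s)
    # `ch in it` advances the iterator past the first match, preserving order
    return all(ch in it for ch in z)


def is_maximal_common_subsequence(z: str, x: str, y: str) -> bool:
    """True if z is a common subsequence of x and y and no single-character
    insertion into z yields another common subsequence."""
    if not (is_subsequence(z, x) and is_subsequence(z, y)):
        return False
    # Only characters occurring in both strings can ever be inserted
    for ch in set(x) & set(y):
        for pos in range(len(z) + 1):
            w = z[:pos] + ch + z[pos:]
            if is_subsequence(w, x) and is_subsequence(w, y):
                return False
    return True


def enumerate_mcs_bruteforce(x: str, y: str) -> list[str]:
    """Exponential-time reference enumeration: try every subsequence of x."""
    found = set()
    for r in range(len(x) + 1):
        for idx in combinations(range(len(x)), r):
            z = "".join(x[i] for i in idx)
            if z not in found and is_maximal_common_subsequence(z, x, y):
                found.add(z)
    return sorted(found)


print(enumerate_mcs_bruteforce("abc", "cab"))  # ['ab', 'c']
```

The example shows why MCSs differ from longest common subsequences: "c" is maximal even though it is shorter than the LCS "ab".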

Faster STR-IC-LCS Computation via RLE

Author: Bannai Hideo
Fujishige Yuta
Inenaga Shunsuke
Kuboi Keita
Takeda Masayuki
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 28th Annual Symposium on Combinatorial Pattern Matching (CPM 2017)
Publication date: 01/01/2017

The constrained LCS problem asks one to find a longest common subsequence of two input strings A and B under some constraints. The STR-IC-LCS problem is a variant of the constrained LCS problem in which the solution must include a given constraint string C as a substring. Given two strings A and B of respective lengths M and N, and a constraint string C of length at most min{M, N}, the best known algorithm for the STR-IC-LCS problem, proposed by Deorowicz (Inf. Process. Lett., 11:423-426, 2012), runs in O(MN) time. In this work, we present an O(mN + nM)-time solution to the STR-IC-LCS problem, where m and n denote the sizes of the run-length encodings of A and B, respectively. Since m <= M and n <= N always hold, our algorithm is never slower than Deorowicz's algorithm, and is faster when the input strings are compressible via RLE.

arXiv.org e-Print Archive

Dagstuhl Research Online Publication Server
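As a baseline for the problem statement above, here is a simple quadratic sketch of STR-IC-LCS on uncompressed strings. It is not the RLE-based algorithm of the paper, and it is only an assumption that it resembles Deorowicz's method. The idea: combine a prefix LCS table, a suffix LCS table, and, for every start position of C in each string, the shortest window that contains C as a subsequence. Because suffix LCS values never increase as the suffix shrinks, the shortest window is always the best choice. Function names are hypothetical; `None` signals that no common subsequence contains C as a substring.

```python
def lcs_prefix_table(a, b):
    # P[i][j] = LCS length of a[:i] and b[:j]
    P = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                P[i][j] = P[i - 1][j - 1] + 1
            else:
                P[i][j] = max(P[i - 1][j], P[i][j - 1])
    return P


def lcs_suffix_table(a, b):
    # S[i][j] = LCS length of a[i:] and b[j:]
    S = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) - 1, -1, -1):
        for j in range(len(b) - 1, -1, -1):
            if a[i] == b[j]:
                S[i][j] = S[i + 1][j + 1] + 1
            else:
                S[i][j] = max(S[i + 1][j], S[i][j + 1])
    return S


def min_window_ends(s, c):
    # ends[i] = smallest e such that c[0] matches s[i] and c is a
    # subsequence of s[i:e] (found by greedy leftmost matching); else None
    ends = []
    for i in range(len(s)):
        if s[i] != c[0]:
            ends.append(None)
            continue
        k, e = 1, i + 1
        while k < len(c) and e < len(s):
            if s[e] == c[k]:
                k += 1
            e += 1
        ends.append(e if k == len(c) else None)
    return ends


def str_ic_lcs_length(a, b, c):
    P = lcs_prefix_table(a, b)
    if not c:
        return P[len(a)][len(b)]
    S = lcs_suffix_table(a, b)
    ea, eb = min_window_ends(a, c), min_window_ends(b, c)
    best = None
    for i, ei in enumerate(ea):
        if ei is None:
            continue
        for j, ej in enumerate(eb):
            if ej is None:
                continue
            # prefix LCS + the constraint string itself + suffix LCS
            cand = P[i][j] + len(c) + S[ei][ej]
            if best is None or cand > best:
                best = cand
    return best


print(str_ic_lcs_length("abcd", "abcd", "ac"))  # 3, e.g. "acd"
```

The constraint can shorten the answer: the unconstrained LCS of "abcd" with itself has length 4, but any solution containing "ac" contiguously must skip "b".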

Subsequence Automata with Default Transitions

Author: Bille Philip
Gørtz Inge Li
Skjoldjensen Frederik Rye
Publication date: 01/01/2016

Let $S$ be a string of length $n$ with characters from an alphabet of size $\sigma$. The \emph{subsequence automaton} of $S$ (often called the \emph{directed acyclic subsequence graph}) is the minimal deterministic finite automaton accepting all subsequences of $S$. A straightforward construction shows that the size (number of states and transitions) of the subsequence automaton is $O(n\sigma)$ and that this bound is asymptotically optimal. In this paper, we consider subsequence automata with \emph{default transitions}, that is, special transitions to be taken only if none of the regular transitions match the current character, and which do not consume the current character. We show that with default transitions, much smaller subsequence automata are possible, and provide a full trade-off between the size of the automaton and the \emph{delay}, i.e., the maximum number of consecutive default transitions followed before consuming a character. Specifically, given any integer parameter $k$ with $1 < k \leq \sigma$, we present a subsequence automaton with default transitions of size $O(nk\log_{k}\sigma)$ and delay $O(\log_k \sigma)$. Hence, with $k = 2$ we obtain an automaton of size $O(n \log \sigma)$ and delay $O(\log \sigma)$. On the other extreme, with $k = \sigma$, we obtain an automaton of size $O(n \sigma)$ and delay $O(1)$, thus matching the bound for the standard subsequence automaton construction. Finally, we generalize the result to multiple strings. The key component of our result is a novel hierarchical automata construction of independent interest.
Comment: Corrected typo

arXiv.org e-Print Archive

Crossref

Online Research Database In Technology
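For reference, the standard O(n sigma) construction mentioned in the abstract above can be sketched with a next-occurrence table. This shows only the baseline automaton without default transitions, not the paper's hierarchical construction. State i means that a prefix of length i of S has been used, and the transition on character ch jumps just past the first occurrence of ch at or after position i. Names are hypothetical.

```python
def build_subsequence_automaton(s: str) -> list[dict[str, int]]:
    """States 0..n; delta[i][ch] = 1 + (first index j >= i with s[j] == ch).
    Every state is accepting."""
    n = len(s)
    delta: list[dict[str, int]] = [{} for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        # State i inherits every transition of state i+1, then
        # overrides the one for s[i], whose nearest occurrence is i itself
        delta[i] = dict(delta[i + 1])
        delta[i][s[i]] = i + 1
    return delta


def accepts(delta: list[dict[str, int]], t: str) -> bool:
    state = 0
    for ch in t:
        nxt = delta[state].get(ch)
        if nxt is None:  # ch does not occur in the remaining suffix
            return False
        state = nxt
    return True


def num_transitions(delta: list[dict[str, int]]) -> int:
    # At most sigma transitions per state, hence O(n * sigma) in total
    return sum(len(d) for d in delta)


delta = build_subsequence_automaton("abcab")
print(accepts(delta, "acb"), accepts(delta, "cc"))  # True False
print(num_transitions(delta))  # 12
```

Each state stores up to sigma explicit transitions. Default transitions, as described in the abstract, are a way to avoid storing many of them, at the cost of extra steps (delay) per character.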

A fast algorithm for the constrained multiple sequence alignment problem

Author: Arslan Abdullah N.
He Dan
Ling Alan C. H.
Publication date: 01/01/2006

Given $n$ strings $S_1, S_2, \ldots, S_n$ and a pattern string $P$, the constrained multiple sequence alignment (CMSA) problem is to find an optimal multiple alignment of $S_1, S_2, \ldots, S_n$ such that the alignment contains $P$. That is, for every $k$ with $1 \le k \le |P|$, the alignment matrix contains a column entirely composed of the symbol $P[k]$ (the $k$th symbol of $P$), and in this sequence of columns the column containing $P[i]$ appears before the column containing $P[j]$ whenever $i < j$. The problem is motivated by the problem of comparing multiple sequences that share a common structure or sequence pattern. There are $O(2^n s_1 s_2 \cdots s_n r)$-time dynamic programming algorithms for the problem, where $s_1, s_2, \ldots, s_n$ and $r$ are, respectively, the lengths of the input strings and of the pattern string. These algorithms become impractical when the number of sequences is large or the sequences are long, because of their long running times. We present a new algorithm whose worst-case time complexity is also $O(2^n s_1 s_2 \cdots s_n r)$, but which avoids the redundant computations of existing dynamic programming solutions. Experiments on both randomly generated strings and real data show that this algorithm is much faster than the existing algorithms. We present an analysis that explains the speed-up obtained in our experiments by our algorithm over the naive dynamic programming algorithm for constrained multiple sequence alignment of protein sequences. The speed-up is more significant when the pattern is long or $n$ is large. For example, in the case of constrained pairwise sequence alignment (the CMSA problem with $n = 2$), when the pattern is sufficiently long for strings $S_1$ and $S_2$, the asymptotic time complexity is observed to be $O(s_1 s_2)$ instead of $O(s_1 s_2 r)$. The main ideas of our algorithm can also be used in other constrained sequence alignment problems.

University of Szeged
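To illustrate the O(s1 s2 r) pairwise baseline referred to above, here is a sketch of the naive three-dimensional dynamic program for the simplest scoring scheme. It assumes unit score per matched column and no mismatch or gap penalties, which turns the problem into the constrained longest common subsequence. It is not the authors' algorithm, and real alignment scoring would change the recurrence. L[i][j][k] is the best length for prefixes A[:i] and B[:j] whose solution contains P[:k] as a subsequence, with minus infinity marking "impossible". Names are hypothetical.

```python
NEG = float("-inf")  # marks "no solution containing P[:k] exists"


def constrained_lcs_length(a, b, p):
    m, n, r = len(a), len(b), len(p)
    # L[i][j][k]: longest common subsequence of a[:i], b[:j] containing p[:k]
    L = [[[NEG] * (r + 1) for _ in range(n + 1)] for _ in range(m + 1)]
    for i in range(m + 1):
        for j in range(n + 1):
            for k in range(r + 1):
                if i == 0 or j == 0:
                    # Only the empty sequence exists; it contains p[:0] only
                    L[i][j][k] = 0 if k == 0 else NEG
                    continue
                # Skip a[i-1] or skip b[j-1]
                best = max(L[i - 1][j][k], L[i][j - 1][k])
                if a[i - 1] == b[j - 1]:
                    # Use the matched pair without advancing in p
                    best = max(best, 1 + L[i - 1][j - 1][k])
                    # Use the matched pair as the next symbol of p
                    if k > 0 and p[k - 1] == a[i - 1]:
                        best = max(best, 1 + L[i - 1][j - 1][k - 1])
                L[i][j][k] = best
    result = L[m][n][r]
    return None if result == NEG else result


print(constrained_lcs_length("abc", "cab", ""))   # 2, e.g. "ab"
print(constrained_lcs_length("abc", "cab", "c"))  # 1, only "c"
```

The table has (s1 + 1)(s2 + 1)(r + 1) cells with constant work each, which is where the extra factor r in the naive bound comes from.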

Algebraic aspects of increasing subsequences

Author: Baik Jinho
Rains Eric M.
Publication date: 01/01/1999

We present a number of results relating partial Cauchy-Littlewood sums, integrals over the compact classical groups, and increasing subsequences of permutations. These include: integral formulae for the distribution of the longest increasing subsequence of a random involution with a constrained number of fixed points; new formulae for partial Cauchy-Littlewood sums, as well as new proofs of old formulae; relations of these expressions to orthogonal polynomials on the unit circle; and explicit bases for invariant spaces of the classical groups, together with appropriate generalizations of the straightening algorithm.
Comment: LaTeX+amsmath+eepic; 52 pages. Expanded introduction, new references, other minor changes

arXiv.org e-Print Archive

CiteSeerX

Caltech Authors
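The random variable studied in the first result above can be simulated directly. The sketch below samples a uniformly random involution of {0, ..., n-1} with a prescribed number of fixed points and measures its longest increasing subsequence by patience sorting. This is an illustrative Monte Carlo aid under these assumptions, not the paper's integral formulae. Function names are hypothetical.

```python
import bisect
import random


def lis_length(perm):
    """Length of the longest strictly increasing subsequence (patience sorting)."""
    # tails[k] = smallest possible last value of an increasing subsequence
    # of length k + 1 seen so far; tails stays sorted
    tails = []
    for x in perm:
        pos = bisect.bisect_left(tails, x)
        if pos == len(tails):
            tails.append(x)
        else:
            tails[pos] = x
    return len(tails)


def random_involution(n, fixed_points, rng=None):
    """Uniform random involution of range(n) with exactly `fixed_points` fixed points."""
    if fixed_points > n or (n - fixed_points) % 2:
        raise ValueError("n - fixed_points must be a non-negative even number")
    rng = rng or random.Random()
    elems = list(range(n))
    rng.shuffle(elems)
    perm = list(range(n))
    # The first `fixed_points` shuffled elements stay fixed; the rest are
    # paired consecutively into 2-cycles
    paired = elems[fixed_points:]
    for a, b in zip(paired[0::2], paired[1::2]):
        perm[a], perm[b] = b, a
    return perm


rng = random.Random(0)
samples = [lis_length(random_involution(100, 10, rng)) for _ in range(5)]
print(samples)
```

Every involution with the prescribed number of fixed points arises from the same number of shuffles, so the sampler is uniform over that set.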

Optimization flow control -- I: Basic algorithm and convergence

Author: David E. Lapsley
Steven H. Low
Publication date: 01/01/1999

We propose an optimization approach to flow control where the objective is to maximize the aggregate source utility over their transmission rates. We view network links and sources as processors of a distributed computation system to solve the dual problem using a gradient projection algorithm. In this system, sources select transmission rates that maximize their own benefits, utility minus bandwidth cost, and network links adjust bandwidth prices to coordinate the sources' decisions. We allow feedback delays to be different, substantial, and time-varying, and links and sources to update at different times and with different frequencies. We provide asynchronous distributed algorithms and prove their convergence in a static environment. We present measurements obtained from a preliminary prototype to illustrate the convergence of the algorithm in a slowly time-varying environment. We discuss its fairness property.

CiteSeerX

Caltech Authors
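The source/link interplay described above can be sketched as a synchronous simulation. The assumptions go beyond the abstract: logarithmic utilities U_s(x) = w_s log x, fixed routes, no feedback delays, a constant step size gamma, and rate bounds x_min and x_max. It is not the paper's asynchronous algorithm or its prototype. Each source picks the rate maximizing utility minus path price times rate. Each link then moves its price by gamma times (load minus capacity) and projects the result onto non-negative values. Names and defaults are hypothetical.

```python
def flow_control(weights, routes, capacities, gamma=0.5, iters=500,
                 x_min=1e-6, x_max=1e3, p0=1.0):
    """Synchronous dual gradient projection for utilities w_s * log(x_s).

    weights[s]    -- utility weight of source s (assumed log utility)
    routes[s]     -- list of link indices used by source s
    capacities[l] -- capacity of link l
    """
    prices = [p0] * len(capacities)
    rates = [0.0] * len(weights)
    for _ in range(iters):
        # Sources: maximize w*log(x) - q*x over [x_min, x_max], where q is
        # the total price of the path; the unconstrained maximizer is w / q
        for s, (w, path) in enumerate(zip(weights, routes)):
            q = sum(prices[l] for l in path)
            x = w / q if q > 0 else x_max
            rates[s] = min(x_max, max(x_min, x))
        # Links: step along the dual gradient (load - capacity),
        # projected onto prices >= 0
        for l, c in enumerate(capacities):
            load = sum(rates[s] for s, path in enumerate(routes) if l in path)
            prices[l] = max(0.0, prices[l] + gamma * (load - c))
    return rates, prices


# Two unit-capacity links; source 0 crosses both, sources 1 and 2 use one each
rates, prices = flow_control([1.0, 1.0, 1.0], [[0, 1], [0], [1]], [1.0, 1.0])
print([round(x, 4) for x in rates], [round(p, 4) for p in prices])
```

In this example the two-link source ends up with a lower rate than the one-link sources, because it pays the price of both links. This connects to the fairness question raised in the abstract.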

Matching and Compression of Strings with Automata and Word Packing

Author: Skjoldjensen Frederik Rye
Publication venue: DTU Compute
Publication date: 01/01/2017

Online Research Database In Technology