930 research outputs found
Efficient algorithms for enumerating maximal common subsequences of two strings
We propose efficient algorithms for enumerating maximal common subsequences
(MCSs) of two strings. Efficiency of the algorithms are estimated by the
preprocessing-time, space, and delay-time complexities. One algorithm prepares
a cubic-space data structure in cubic time to output each MCS in linear time.
This data structure can be used to search for particular MCSs satisfying some
condition without performing an explicit enumeration. Another prepares a
quadratic-space data structure in quadratic time to output each MCS in linear
time, and the other prepares a linear-space data structure in quadratic time to
output each MCS in linearithmic time.Comment: 23 pages, 5 Postscript figure
Faster STR-IC-LCS Computation via RLE
The constrained LCS problem asks one to find a longest common subsequence of two input strings A and B with some constraints. The STR-IC-LCS problem is a variant of the constrained LCS problem, where the solution must include a given constraint string C as a substring. Given two strings A and B of respective lengths M and N, and a constraint string C of length at most min{M, N}, the best known algorithm for the STR-IC-LCS problem, proposed by Deorowicz (Inf. Process. Lett., 11:423-426, 2012), runs in O(MN) time. In this work, we present an O(mN + nM)-time solution to the STR-IC-LCS problem, where m and n denote the sizes of the run-length encodings of A and B, respectively. Since m <= M and n <= N always hold, our algorithm is always as fast as Deorowicz\u27s algorithm, and is faster when input strings are compressible via RLE
Subsequence Automata with Default Transitions
Let be a string of length with characters from an alphabet of size
. The \emph{subsequence automaton} of (often called the
\emph{directed acyclic subsequence graph}) is the minimal deterministic finite
automaton accepting all subsequences of . A straightforward construction
shows that the size (number of states and transitions) of the subsequence
automaton is and that this bound is asymptotically optimal.
In this paper, we consider subsequence automata with \emph{default
transitions}, that is, special transitions to be taken only if none of the
regular transitions match the current character, and which do not consume the
current character. We show that with default transitions, much smaller
subsequence automata are possible, and provide a full trade-off between the
size of the automaton and the \emph{delay}, i.e., the maximum number of
consecutive default transitions followed before consuming a character.
Specifically, given any integer parameter , , we
present a subsequence automaton with default transitions of size
and delay . Hence, with we
obtain an automaton of size and delay . On
the other extreme, with , we obtain an automaton of size and delay , thus matching the bound for the standard subsequence
automaton construction. Finally, we generalize the result to multiple strings.
The key component of our result is a novel hierarchical automata construction
of independent interest.Comment: Corrected typo
A fast algorithm for the constrained multiple sequence alignment problem
Given n strings S1, S2, ..., Sn, and a pattern string P, the constrained multiple sequence alignment (CMSA) problem is to find an optimal multiple alignment of S1, S2, ..., Sn such that the alignment contains P, i.e. in the alignment matrix there exists a sequence of columns each entirely composed of symbol P[k] for every k, where P[k] is the kth symbol in P, 1 ⤠k ⤠|P|, and in the sequence, a column containing P[i] appears before the column containing P[j] for all i,j, i < j. The problem is motivated from the problem of comparing multiple sequences that share a common structure, or sequence pattern. There are O(2ns1s2...snr)-time dynamic programming algorithms for the problem, where s1,s2, ...,sn and r are, respectively, the lengths of the input strings and the pattern string. Feasibility of these algorithms in practice is limited when the number of sequences is large, or the sequences are long because of the impractically long time required by these algorithms. We present a new algorithm with worst-case time complexity also O(2ns1s2...snr), but the algorithm avoids redundant computations in existing dynamic programming solutions. Experiments on both randomly generated strings and real data show that this algorithm is much faster than the existing algorithms. We present an analysis that explains the speed-up obtained in our experiments by our algorithm over the naive dynamic programming algorithm for constrained multiple sequence alignment of protein sequences. The speed-up is more significant when pattern is long, or n is large. For example in the case of constrained pairwise sequence alignment (the CMSA problem with n=2) when the pattern is sufficiently long for strings S1 and S2, the asymptotic time complexity is observed to be O(s1s2) instead of O(s1s2r). Main ideas in our algorithm can also be used in other constrained sequence alignment problems
Algebraic aspects of increasing subsequences
We present a number of results relating partial Cauchy-Littlewood sums,
integrals over the compact classical groups, and increasing subsequences of
permutations. These include: integral formulae for the distribution of the
longest increasing subsequence of a random involution with constrained number
of fixed points; new formulae for partial Cauchy-Littlewood sums, as well as
new proofs of old formulae; relations of these expressions to orthogonal
polynomials on the unit circle; and explicit bases for invariant spaces of the
classical groups, together with appropriate generalizations of the
straightening algorithm.Comment: LaTeX+amsmath+eepic; 52 pages. Expanded introduction, new references,
other minor change
Optimization flow control -- I: Basic algorithm and convergence
We propose an optimization approach to flow control where the objective is to maximize the aggregate source utility over their transmission rates. We view network links and sources as processors of a distributed computation system to solve the dual problem using a gradient projection algorithm. In this system, sources select transmission rates that maximize their own benefits, utility minus bandwidth cost, and network links adjust bandwidth prices to coordinate the sources' decisions. We allow feedback delays to be different, substantial, and time varying, and links and sources to update at different times and with different frequencies. We provide asynchronous distributed algorithms and prove their convergence in a static environment. We present measurements obtained from a preliminary prototype to illustrate the convergence of the algorithm in a slowly time-varying environment. We discuss its fairness property
- âŚ