930 research outputs found

    Efficient algorithms for enumerating maximal common subsequences of two strings

    We propose efficient algorithms for enumerating maximal common subsequences (MCSs) of two strings. The efficiency of the algorithms is measured by their preprocessing-time, space, and delay-time complexities. One algorithm prepares a cubic-space data structure in cubic time to output each MCS in linear time. This data structure can be used to search for particular MCSs satisfying some condition without performing an explicit enumeration. Another prepares a quadratic-space data structure in quadratic time to output each MCS in linear time, and the third prepares a linear-space data structure in quadratic time to output each MCS in linearithmic time. Comment: 23 pages, 5 Postscript figures

    Faster STR-IC-LCS Computation via RLE

    The constrained LCS problem asks one to find a longest common subsequence of two input strings A and B subject to some constraints. The STR-IC-LCS problem is a variant of the constrained LCS problem in which the solution must include a given constraint string C as a substring. Given two strings A and B of respective lengths M and N, and a constraint string C of length at most min{M, N}, the best known algorithm for the STR-IC-LCS problem, proposed by Deorowicz (Inf. Process. Lett., 11:423-426, 2012), runs in O(MN) time. In this work, we present an O(mN + nM)-time solution to the STR-IC-LCS problem, where m and n denote the sizes of the run-length encodings of A and B, respectively. Since m <= M and n <= N always hold, our algorithm is always at least as fast as Deorowicz's algorithm, and is faster when the input strings are compressible via RLE.
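
    As an illustration of the quantities m and n above, the following minimal sketch computes a run-length encoding in Python. It is not the algorithm from the paper; the function name `run_length_encode` is a hypothetical helper chosen for this example.

```python
from itertools import groupby


def run_length_encode(s):
    """Return the run-length encoding of s as a list of (character, run length) pairs."""
    # groupby yields one group per maximal run of equal consecutive characters.
    return [(ch, len(list(run))) for ch, run in groupby(s)]


encoded = run_length_encode("aaabbc")
# The RLE size (number of runs) never exceeds the string length.
print(encoded, len(encoded), len("aaabbc"))
```

    The number of runs equals the string length only when no two adjacent characters are equal, which is the case where RLE gives no compression.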

    Subsequence Automata with Default Transitions

    Let $S$ be a string of length $n$ with characters from an alphabet of size $\sigma$. The subsequence automaton of $S$ (often called the directed acyclic subsequence graph) is the minimal deterministic finite automaton accepting all subsequences of $S$. A straightforward construction shows that the size (number of states and transitions) of the subsequence automaton is $O(n\sigma)$ and that this bound is asymptotically optimal. In this paper, we consider subsequence automata with default transitions, that is, special transitions to be taken only if none of the regular transitions match the current character, and which do not consume the current character. We show that with default transitions, much smaller subsequence automata are possible, and provide a full trade-off between the size of the automaton and the delay, i.e., the maximum number of consecutive default transitions followed before consuming a character. Specifically, given any integer parameter $k$, $1 < k \leq \sigma$, we present a subsequence automaton with default transitions of size $O(nk\log_{k}\sigma)$ and delay $O(\log_k \sigma)$. Hence, with $k = 2$ we obtain an automaton of size $O(n \log \sigma)$ and delay $O(\log \sigma)$. On the other extreme, with $k = \sigma$, we obtain an automaton of size $O(n\sigma)$ and delay $O(1)$, thus matching the bound for the standard subsequence automaton construction. Finally, we generalize the result to multiple strings. The key component of our result is a novel hierarchical automata construction of independent interest. Comment: Corrected typo
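
    The straightforward construction without default transitions can be sketched as follows. This is a minimal illustration, not the hierarchical construction from the paper; the names `build_subsequence_automaton` and `accepts` are hypothetical. State i stands for "the first i characters of the string have been passed", and every state is accepting.

```python
def build_subsequence_automaton(s):
    """Return a transition table: table[i][c] is the state reached from state i on c."""
    n = len(s)
    table = [dict() for _ in range(n + 1)]
    # Fill from right to left: state i inherits the transitions of state i + 1,
    # then the character s[i] is redirected to state i + 1 (its nearest occurrence).
    for i in range(n - 1, -1, -1):
        table[i] = dict(table[i + 1])
        table[i][s[i]] = i + 1
    return table


def accepts(table, w):
    """Check whether w is a subsequence of the string the table was built from."""
    state = 0
    for c in w:
        if c not in table[state]:
            return False  # no further occurrence of c: w is not a subsequence
        state = table[state][c]
    return True


table = build_subsequence_automaton("abcab")
print(accepts(table, "acb"), accepts(table, "cc"))
```

    Each of the n + 1 states stores at most one transition per alphabet character, which gives the $O(n\sigma)$ size mentioned in the abstract.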

    A fast algorithm for the constrained multiple sequence alignment problem

    Given n strings $S_1, S_2, \ldots, S_n$ and a pattern string P, the constrained multiple sequence alignment (CMSA) problem is to find an optimal multiple alignment of $S_1, S_2, \ldots, S_n$ such that the alignment contains P, i.e. in the alignment matrix there exists a sequence of columns each entirely composed of symbol P[k] for every k, where P[k] is the kth symbol in P, 1 ≤ k ≤ |P|, and in the sequence, a column containing P[i] appears before the column containing P[j] for all i, j with i < j. The problem is motivated by the task of comparing multiple sequences that share a common structure or sequence pattern. There are $O(2^n s_1 s_2 \cdots s_n r)$-time dynamic programming algorithms for the problem, where $s_1, s_2, \ldots, s_n$ and $r$ are, respectively, the lengths of the input strings and the pattern string. These algorithms are of limited practical use when the number of sequences is large or the sequences are long, because of their impractically long running times. We present a new algorithm whose worst-case time complexity is also $O(2^n s_1 s_2 \cdots s_n r)$, but which avoids redundant computations present in existing dynamic programming solutions. Experiments on both randomly generated strings and real data show that this algorithm is much faster than the existing algorithms. We present an analysis that explains the speed-up obtained in our experiments by our algorithm over the naive dynamic programming algorithm for constrained multiple sequence alignment of protein sequences. The speed-up is more significant when the pattern is long or n is large. For example, in the case of constrained pairwise sequence alignment (the CMSA problem with n=2), when the pattern is sufficiently long for strings $S_1$ and $S_2$, the asymptotic time complexity is observed to be $O(s_1 s_2)$ instead of $O(s_1 s_2 r)$. The main ideas in our algorithm can also be used in other constrained sequence alignment problems.

    Algebraic aspects of increasing subsequences

    We present a number of results relating partial Cauchy-Littlewood sums, integrals over the compact classical groups, and increasing subsequences of permutations. These include: integral formulae for the distribution of the longest increasing subsequence of a random involution with constrained number of fixed points; new formulae for partial Cauchy-Littlewood sums, as well as new proofs of old formulae; relations of these expressions to orthogonal polynomials on the unit circle; and explicit bases for invariant spaces of the classical groups, together with appropriate generalizations of the straightening algorithm. Comment: LaTeX+amsmath+eepic; 52 pages. Expanded introduction, new references, other minor changes

    Optimization flow control -- I: Basic algorithm and convergence

    We propose an optimization approach to flow control where the objective is to maximize the aggregate source utility over their transmission rates. We view network links and sources as processors of a distributed computation system to solve the dual problem using a gradient projection algorithm. In this system, sources select transmission rates that maximize their own benefits, utility minus bandwidth cost, and network links adjust bandwidth prices to coordinate the sources' decisions. We allow feedback delays to be different, substantial, and time varying, and links and sources to update at different times and with different frequencies. We provide asynchronous distributed algorithms and prove their convergence in a static environment. We present measurements obtained from a preliminary prototype to illustrate the convergence of the algorithm in a slowly time-varying environment. We discuss its fairness property.
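
    The price-and-rate interaction described above can be sketched in a simplified, synchronous form. This is only an illustration under stated assumptions, not the asynchronous algorithm from the paper: the logarithmic utility w * log(x), the step size `gamma`, the rate bounds, and the function name `dual_gradient_flow_control` are all assumptions made for this example.

```python
def dual_gradient_flow_control(weights, routes, capacity, gamma=0.01,
                               steps=5000, x_min=0.1, x_max=100.0):
    """Synchronous dual gradient projection for rate allocation (illustrative only).

    weights[s]  : weight w of source s, with assumed utility w * log(x)
    routes[s]   : list of link indices used by source s
    capacity[l] : capacity of link l
    """
    prices = [1.0] * len(capacity)
    rates = [x_min] * len(weights)
    for _ in range(steps):
        # Each source maximizes utility minus bandwidth cost given its path price.
        # For w * log(x) - x * p the maximizer is x = w / p, clipped to the rate bounds.
        for s, w in enumerate(weights):
            path_price = sum(prices[l] for l in routes[s])
            x = w / path_price if path_price > 0 else x_max
            rates[s] = min(x_max, max(x_min, x))
        # Each link moves its price along the dual gradient (load minus capacity),
        # projected onto the nonnegative reals.
        for l, c in enumerate(capacity):
            load = sum(rates[s] for s in range(len(weights)) if l in routes[s])
            prices[l] = max(0.0, prices[l] + gamma * (load - c))
    return rates, prices


# Two identical sources sharing one link of capacity 10.
rates, prices = dual_gradient_flow_control([1.0, 1.0], [[0], [0]], [10.0])
print(rates, prices)
```

    In this toy setting the two sources end up splitting the link capacity evenly, because they have equal weights and see the same price.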