    Subsequence Combinatorics and Applications to Microarray Production, DNA Sequencing and Chaining Algorithms

    Abstract. We investigate combinatorial enumeration problems related to subsequences of strings; in contrast to substrings, subsequences need not be contiguous. For a finite alphabet Σ, the following three problems are solved. (1) Number of distinct subsequences: Given a sequence s ∈ Σ^n and a nonnegative integer k ≤ n, how many distinct subsequences of length k does s contain? A previous result by Chase states that this number is maximized by choosing s as a repeated permutation of the alphabet. This has applications in DNA microarray production. (2) Number of ρ-restricted ρ-generated sequences: Given s ∈ Σ^n and integers k ≥ 1 and ρ ≥ 1, how many distinct sequences in Σ^k contain no single nucleotide repeat longer than ρ and can be written as s_1^{r_1} ... s_n^{r_n} with 0 ≤ r_i ≤ ρ for all i? For ρ = ∞, the question becomes how many length-k sequences match the regular expression s_1* s_2* ... s_n*. These considerations allow a detailed analysis of a new DNA sequencing technology (“454 sequencing”). (3) Exact length distribution of the longest increasing subsequence: Given Σ = {1,...,K} and an integer n ≥ 1, determine the number of sequences in Σ^n whose longest strictly increasing subsequence has length k, where 0 ≤ k ≤ K. This has applications to significance computations for chaining algorithms.
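
    Problem (1) can be illustrated with a short dynamic program. The sketch below is a standard textbook recurrence for counting distinct subsequences, not necessarily the method used in the paper; the function name count_distinct_subsequences and the example strings are assumptions made for illustration.

```python
def count_distinct_subsequences(s, k):
    """Count the distinct subsequences of length k contained in s.

    Illustrative sketch only (standard recurrence, hypothetical name);
    it is not taken from the paper.
    """
    if k < 0 or k > len(s):
        return 0
    # dp[j] = number of distinct subsequences of length j in the prefix read so far
    dp = [1] + [0] * k
    # last[c] = copy of dp taken just before the previous occurrence of c was read
    last = {}
    for c in s:
        prev = dp[:]
        old = last.get(c)
        for j in range(1, k + 1):
            # Appending c to every length-(j-1) subsequence yields prev[j-1]
            # candidates; those already produced at the previous occurrence
            # of c (old[j-1] of them) would be counted twice, so subtract them.
            duplicates = old[j - 1] if old is not None else 0
            dp[j] = prev[j] + prev[j - 1] - duplicates
        last[c] = prev
    return dp[k]


if __name__ == "__main__":
    # Compare a repeated permutation of the alphabet with a blocked arrangement.
    for s in ("ACGTACGT", "AACCGGTT"):
        print(s, count_distinct_subsequences(s, 3))
```

    The running time is O(nk) for a sequence of length n. Comparing the two printed counts gives a small-scale view of the statement attributed to Chase, namely that a repeated permutation of the alphabet maximizes the number of distinct subsequences.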