    Fluctuations of the Longest Common Subsequence for Sequences of Independent Blocks

    The problem of the fluctuation of the Longest Common Subsequence (LCS) of two i.i.d. sequences of length $n>0$ has been open for decades. There are conflicting conjectures on the topic: Chvatal and Sankoff conjectured in 1975 that asymptotically the order should be $n^{2/3}$, while Waterman conjectured in 1994 that asymptotically the order should be $n$. A contiguous substring consisting of only one type of symbol is called a block. In the present work, we determine the order of the fluctuation of the LCS for a special model of sequences consisting of i.i.d. blocks whose lengths are uniformly distributed on the set $\{l-1,l,l+1\}$, with $l$ a given positive integer. We show that the fluctuation in this model is asymptotically of order $n$, which confirms Waterman's conjecture. To achieve this, we developed a new method that allows us to reformulate the problem of the order of the variance as a (relatively) low-dimensional optimization problem. Comment: PDFLatex, 40 pages
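The block model in this abstract can be explored numerically. The following is a minimal sketch, not the authors' method: it assumes a binary alphabet with alternating blocks, truncates each sequence to length `n`, and uses the textbook dynamic program for the LCS length. The names `lcs_length`, `block_sequence`, and `lcs_variance` are hypothetical helpers introduced only for illustration.

```python
import random
from statistics import pvariance


def lcs_length(x, y):
    """Length of the longest common subsequence (standard O(len(x)*len(y)) DP)."""
    prev = [0] * (len(y) + 1)
    for a in x:
        curr = [0]
        for j, b in enumerate(y, start=1):
            if a == b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def block_sequence(n, l, rng):
    """Binary sequence of length n made of alternating blocks.

    Assumption: block lengths are drawn uniformly from {l-1, l, l+1},
    the symbols alternate between 0 and 1, and the last block is truncated.
    """
    if l < 2:
        raise ValueError("l must be at least 2 so that every block is non-empty")
    symbol = rng.randint(0, 1)
    seq = []
    while len(seq) < n:
        seq.extend([symbol] * rng.choice((l - 1, l, l + 1)))
        symbol = 1 - symbol
    return seq[:n]


def lcs_variance(n, l, trials, seed=0):
    """Empirical variance of the LCS length over independent pairs of sequences."""
    rng = random.Random(seed)
    samples = [
        lcs_length(block_sequence(n, l, rng), block_sequence(n, l, rng))
        for _ in range(trials)
    ]
    return pvariance(samples)
```

Calling `lcs_variance(n, l, trials)` for increasing `n` gives a rough empirical picture of how the variance grows. Such a simulation can only suggest a growth rate; it does not replace the proof described in the abstract.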

    Distribution of Aligned Letter Pairs in Optimal Alignments of Random Sequences

    Considering the optimal alignment of two i.i.d. random sequences of length $n$, we show that when the scoring function is chosen randomly, almost surely the empirical distribution of aligned letter pairs in all optimal alignments converges to a unique limiting distribution as $n$ tends to infinity. This result is interesting because it helps to understand the microscopic path structure of a special type of last passage percolation problem with correlated weights, an area of long-standing open problems. Characterizing the microscopic path structure furthermore yields a robust alternative to optimal alignment scores for testing the relatedness of genetic sequences.

    Convergence of the stochastic mesh estimator for pricing American options

    Broadie and Glasserman proposed a simulation-based method they named {\em stochastic mesh} for pricing high-dimensional American options. Based on simulated states of the assets underlying the option at each exercise opportunity, the method produces an estimator of the option value at each sampled state. Under the mild assumption of the finiteness of certain moments, we derive an asymptotic upper bound on the probability of error of the mesh estimator, where both the error size and the probability bound vanish as the sample size increases. We include the empirical performance for the test problems used by Broadie and Glasserman in a recent unpublished manuscript. We find that the mesh estimator has a large bias that decays very slowly with the sample size, suggesting that in applications it will most likely be necessary to employ bias and/or variance reduction techniques.

    An Upper Bound on the Convergence Rate of a Second Functional in Optimal Sequence Alignment

    Consider finite sequences $X_{[1,n]}=X_1\dots X_n$ and $Y_{[1,n]}=Y_1\dots Y_n$ of length $n$, consisting of i.i.d. samples of random letters from a finite alphabet, and let $S$ and $T$ be chosen i.i.d. randomly from the unit ball in the space of symmetric scoring functions over this alphabet augmented by a gap symbol. We prove a probabilistic upper bound of linear order in $n^{0.75}$ for the deviation of the score relative to $T$ of optimal alignments with gaps of $X_{[1,n]}$ and $Y_{[1,n]}$ relative to $S$. It remains an open problem to prove a lower bound. Our result contributes to the understanding of the microstructure of optimal alignments relative to one given scoring function, extending a theory begun by the first two authors.

    A Monte Carlo Approach to the Fluctuation Problem in Optimal Alignments of Random Strings

    The problem of determining the correct order of fluctuation of the optimal alignment score of two random strings of length $n$ has been open for several decades. It is known [12] that a biased expected effect of a random letter-change on the optimal score implies an order of fluctuation linear in $\sqrt{n}$. However, in many situations where such a biased effect is observed empirically, it has been impossible to prove analytically. The main result of this paper shows that when the rescaled limit of the optimal alignment score increases in a certain direction, then the biased effect exists. On the basis of this result one can quantify a confidence level for the existence of such a biased effect, and hence of an order $\sqrt{n}$ fluctuation, based on simulation of optimal alignment scores. This is an important step forward, as the correct order of fluctuation was previously known only for certain special distributions [12],[13],[5],[10]. To illustrate the usefulness of our new methodology, we apply it to optimal alignments of strings written in the DNA alphabet. As scoring function, we use the BLASTZ default substitution matrix together with a realistic gap penalty. BLASTZ is one of the most widely used sequence alignment methodologies in bioinformatics. For this DNA setting, we show that with a high level of confidence, the fluctuation of the optimal alignment score is of order $\Theta(\sqrt{n})$. An important special case of the optimal alignment score is the Longest Common Subsequence (LCS) of random strings. For binary sequences with equiprobable symbols the question of the fluctuation of the LCS remains open. The symmetry in that case does not allow for our method. On the other hand, in real-life DNA sequences, it is not the case that all letters occur with the same frequency. So, for many real-life situations, our method allows one to determine the order of the fluctuation up to a high confidence level.
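The simulation idea in this abstract can be illustrated with a small Monte Carlo experiment. The sketch below is a simplified illustration under stated assumptions and is not the paper's procedure: it uses a hypothetical match/mismatch scoring with a linear gap penalty instead of the BLASTZ matrix, and it models the "random letter-change" as turning one uniformly chosen occurrence of the letter `src` in the first string into `dst`. The function names and all parameter values are hypothetical.

```python
import random
from statistics import mean, stdev


def alignment_score(x, y, match=1.0, mismatch=-1.0, gap=-2.0):
    """Optimal global alignment score (Needleman-Wunsch, linear gap penalty).

    The scoring values are placeholders, not the BLASTZ defaults.
    """
    prev = [j * gap for j in range(len(y) + 1)]
    for i, a in enumerate(x, start=1):
        curr = [i * gap]
        for j, b in enumerate(y, start=1):
            sub = prev[j - 1] + (match if a == b else mismatch)
            curr.append(max(sub, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def letter_change_effect(n, weights, src, dst, trials, seed=0):
    """Estimate the mean score change caused by one src -> dst letter change.

    Returns (sample mean, standard error). Trials in which the first string
    contains no occurrence of `src` are skipped.
    """
    rng = random.Random(seed)
    letters = "ACGT"
    deltas = []
    for _ in range(trials):
        x = rng.choices(letters, weights=weights, k=n)
        y = rng.choices(letters, weights=weights, k=n)
        positions = [i for i, c in enumerate(x) if c == src]
        if not positions:
            continue
        before = alignment_score(x, y)
        x[rng.choice(positions)] = dst
        deltas.append(alignment_score(x, y) - before)
    return mean(deltas), stdev(deltas) / len(deltas) ** 0.5
```

For example, `letter_change_effect(200, [0.4, 0.2, 0.2, 0.2], "A", "C", 500)` returns an estimated mean effect with its standard error. A mean that lies several standard errors away from zero would be empirical evidence of a biased effect for this toy scoring; the letter frequencies used here are arbitrary.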