Fluctuations of the Longest Common Subsequence for Sequences of Independent Blocks
The problem of the fluctuation of the Longest Common Subsequence (LCS) of two
i.i.d. sequences of length $n$ has been open for decades. There exist
contradicting conjectures on the topic. Chvátal and Sankoff conjectured in 1975
that asymptotically the order should be $n^{2/3}$, while Waterman conjectured
in 1994 that asymptotically the order should be $n$. A contiguous substring
consisting only of one type of symbol is called a block. In the present work,
we determine the order of the fluctuation of the LCS for a special model of
sequences consisting of i.i.d. blocks whose lengths are uniformly distributed
on the set $\{l-1, l, l+1\}$, with $l$ a given positive integer. We showed that
the fluctuation in this model is asymptotically of order $n$, which confirms
Waterman's conjecture. To achieve this goal, we developed a new method which
allows us to reformulate the problem of the order of the variance as a
(relatively) low-dimensional optimization problem.
Comment: PDFLatex, 40 pages
Distribution of Aligned Letter Pairs in Optimal Alignments of Random Sequences
Considering the optimal alignment of two i.i.d. random sequences of length
$n$, we show that when the scoring function is chosen randomly, almost surely
the empirical distribution of aligned letter pairs in all optimal alignments
converges to a unique limiting distribution as $n$ tends to infinity. This
result is interesting because it helps in understanding the microscopic path
structure of a special type of last passage percolation problem with correlated
weights, an area of long-standing open problems. Characterizing the microscopic
path structure yields furthermore a robust alternative to optimal alignment
scores for testing the relatedness of genetic sequences.
Convergence of the stochastic mesh estimator for pricing American options
Broadie and Glasserman proposed a
simulation-based method they named the stochastic mesh for pricing
high-dimensional American options. Based on simulated states of the
assets underlying the option at each exercise opportunity, the
method produces an estimator of the option value at each sampled state.
Under the mild assumption of the finiteness of certain moments,
we derive an asymptotic upper bound on the probability of error
of the mesh estimator, where both the error size and the probability bound
vanish as the sample size increases.
We include the empirical performance
for the test problems used by Broadie and Glasserman in a recent unpublished
manuscript. We find that the mesh estimator
has large bias that decays very slowly with the sample size, suggesting that
in applications it will most likely be necessary to employ bias and/or
variance reduction techniques.
An Upper Bound on the Convergence Rate of a Second Functional in Optimal Sequence Alignment
Consider two finite sequences of length $n$, consisting of i.i.d. samples of random letters from a
finite alphabet, and let two scoring functions be chosen i.i.d. at random from the unit
ball in the space of symmetric scoring functions over this alphabet augmented
by a gap symbol. We prove a probabilistic upper bound of linear order in
for the deviation of the score, relative to one of these scoring functions, of the
alignments with gaps of the two sequences that are optimal relative to the other. It remains an open
problem to prove a lower bound. Our result contributes to the understanding of
the microstructure of optimal alignments relative to one given scoring
function, extending a theory begun by the first two authors.
A Monte Carlo Approach to the Fluctuation Problem in Optimal Alignments of Random Strings
The problem of determining the correct order of fluctuation of the optimal alignment score of two random strings of length $n$ has been open for several decades. It is known [12] that the biased expected effect of a random letter-change on the optimal score implies an order of fluctuation linear in $n$. However, in many situations where such a biased effect is observed empirically, it has been impossible to prove analytically. The main result of this paper shows that when the rescaled limit of the optimal alignment score increases in a certain direction, then the biased effect exists. On the basis of this result, one can quantify a confidence level for the existence of such a biased effect, and hence of a fluctuation of order $n$, based on simulation of optimal alignment scores. This is an important step forward, as the correct order of fluctuation was previously known only for certain special distributions [12],[13],[5],[10]. To illustrate the usefulness of our new methodology, we apply it to optimal alignments of strings written in the DNA alphabet. As scoring function, we use the BLASTZ default substitution matrix together with a realistic gap penalty. BLASTZ is one of the most widely used sequence alignment methodologies in bioinformatics. For this DNA setting, we show that with a high level of confidence, the fluctuation of the optimal alignment score is of order $\Theta(n)$. An important special case of the optimal alignment score is the Longest Common Subsequence (LCS) of random strings. For binary sequences with equiprobable symbols, the question of the fluctuation of the LCS remains open. The symmetry in that case does not allow our method to be applied. On the other hand, in real-life DNA sequences, it is not the case that all letters occur with the same frequency. So, for many real-life situations, our method allows one to determine the order of the fluctuation up to a high confidence level.
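The simulation idea in the abstract above can be illustrated with a minimal sketch. It is not the paper's procedure: it uses the LCS length as the score instead of the BLASTZ matrix, and the letter weights, the choice of which letter is changed into which (`src`, `dst`), and the function names are all assumptions made for illustration. It estimates the mean effect of one letter change on the score, with a normal-approximation 95% half-width.

```python
import random
import statistics

def lcs_length(x, y):
    """Length of the longest common subsequence (standard dynamic program)."""
    prev = [0] * (len(y) + 1)
    for a in x:
        cur = [0]
        for j, b in enumerate(y, start=1):
            if a == b:
                cur.append(prev[j - 1] + 1)
            else:
                cur.append(max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]

def letter_change_effect(x, y, src, dst, rng):
    """Change one random occurrence of src in x to dst (hypothetical scheme)
    and return the resulting change in LCS length."""
    positions = [i for i, c in enumerate(x) if c == src]
    if not positions:
        return 0
    i = rng.choice(positions)
    changed = x[:i] + dst + x[i + 1:]
    return lcs_length(changed, y) - lcs_length(x, y)

def estimate_mean_effect(n, trials, letters, weights, src, dst, seed=0):
    """Monte Carlo estimate of the mean effect, with a 95% half-width
    from the normal approximation. Requires trials >= 2."""
    rng = random.Random(seed)
    effects = []
    for _ in range(trials):
        x = "".join(rng.choices(letters, weights=weights, k=n))
        y = "".join(rng.choices(letters, weights=weights, k=n))
        effects.append(letter_change_effect(x, y, src, dst, rng))
    mean = statistics.fmean(effects)
    half_width = 1.96 * statistics.stdev(effects) / trials ** 0.5
    return mean, half_width

if __name__ == "__main__":
    # Assumed non-uniform DNA letter frequencies, chosen only for illustration.
    mean, hw = estimate_mean_effect(200, 100, "ACGT", [0.4, 0.3, 0.2, 0.1], "A", "T")
    print(f"estimated mean effect: {mean:.3f} +/- {hw:.3f}")
```

If the interval around the estimated mean excludes zero, that is empirical evidence of a biased effect under these assumed settings; it does not by itself establish the order of the fluctuation.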