Fluctuations of the Longest Common Subsequence for Sequences of Independent Blocks

Authors: Matzinger Heinrich; Torres Felipe
Publication date: 01/01/2010

The problem of the fluctuation of the Longest Common Subsequence (LCS) of two i.i.d. sequences of length $n>0$ has been open for decades. There exist contradicting conjectures on the topic. Chvatal and Sankoff conjectured in 1975 that asymptotically the order should be $n^{2/3}$, while Waterman conjectured in 1994 that asymptotically the order should be $n$. A contiguous substring consisting only of one type of symbol is called a block. In the present work, we determine the order of the fluctuation of the LCS for a special model of sequences consisting of i.i.d. blocks whose lengths are uniformly distributed on the set $\{l-1,l,l+1\}$, with $l$ a given positive integer. We show that the fluctuation in this model is asymptotically of order $n$, which confirms Waterman's conjecture. To achieve this, we develop a new method that allows us to reformulate the problem of the order of the variance as a (relatively) low-dimensional optimization problem.
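As a rough illustration of the quantities in this abstract, the sketch below samples sequences from a block model and estimates the variance of the LCS length by simulation. It is not the authors' method, which is analytic. The function names, the choice of a binary alphabet with alternating blocks, and the truncation to length `n` are all assumptions made for this example.

```python
import random
from statistics import pvariance


def block_sequence(n, l, rng):
    """Binary sequence of length n built from alternating blocks.

    Assumed construction: block lengths are drawn independently and
    uniformly from {l-1, l, l+1}; the final block is truncated at n.
    """
    symbols = []
    bit = rng.randrange(2)
    while len(symbols) < n:
        length = rng.choice((l - 1, l, l + 1))
        symbols.extend([bit] * length)
        bit = 1 - bit
    return symbols[:n]


def lcs_length(x, y):
    """Length of a longest common subsequence (classic row-by-row DP)."""
    prev = [0] * (len(y) + 1)
    for xi in x:
        curr = [0]
        for j, yj in enumerate(y, start=1):
            if xi == yj:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def lcs_variance(n, l, trials, seed=0):
    """Empirical variance of the LCS length over independent sequence pairs."""
    rng = random.Random(seed)
    scores = [
        lcs_length(block_sequence(n, l, rng), block_sequence(n, l, rng))
        for _ in range(trials)
    ]
    return pvariance(scores)
```

Calling `lcs_variance(n, l, trials)` for several values of `n` and comparing the results against `n` gives a crude empirical check of the claimed linear order. A simulation of this kind can only suggest the order, not prove it.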

arXiv.org e-Print Archive

CiteSeerX

Distribution of Aligned Letter Pairs in Optimal Alignments of Random Sequences

Authors: Hauser Raphael; Matzinger Heinrich
Publication date: 01/01/2012

Considering the optimal alignment of two i.i.d. random sequences of length $n$, we show that when the scoring function is chosen randomly, almost surely the empirical distribution of aligned letter pairs in all optimal alignments converges to a unique limiting distribution as $n$ tends to infinity. This result is interesting because it helps in understanding the microscopic path structure of a special type of last passage percolation problem with correlated weights, an area of long-standing open problems. Characterizing the microscopic path structure furthermore yields a robust alternative to optimal alignment scores for testing the relatedness of genetic sequences.
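The empirical distribution of aligned letter pairs can be made concrete with a small sketch. The code below computes one optimal global alignment by dynamic programming and tabulates the relative frequency of each aligned pair, with the gap symbol treated as an extra letter. The names `align_pairs`, `pair_distribution` and `lcs_score` are hypothetical. The abstract concerns all optimal alignments under a randomly chosen scoring function, whereas this example follows a single traceback under a fixed one.

```python
from collections import Counter

GAP = "-"


def lcs_score(a, b):
    """Example scoring function (an assumption): 1 for a letter match, else 0."""
    return 1 if a == b and a != GAP else 0


def align_pairs(x, y, score):
    """Return (optimal score, aligned pairs) for one optimal global alignment.

    score(a, b) gives the score of aligning a with b; either may be GAP.
    """
    m, n = len(x), len(y)
    dp = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        dp[i][0] = dp[i - 1][0] + score(x[i - 1], GAP)
    for j in range(1, n + 1):
        dp[0][j] = dp[0][j - 1] + score(GAP, y[j - 1])
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            dp[i][j] = max(
                dp[i - 1][j - 1] + score(x[i - 1], y[j - 1]),
                dp[i - 1][j] + score(x[i - 1], GAP),
                dp[i][j - 1] + score(GAP, y[j - 1]),
            )
    # Trace back one optimal path from the bottom-right corner.
    pairs = []
    i, j = m, n
    while i > 0 or j > 0:
        if i > 0 and j > 0 and dp[i][j] == dp[i - 1][j - 1] + score(x[i - 1], y[j - 1]):
            pairs.append((x[i - 1], y[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and dp[i][j] == dp[i - 1][j] + score(x[i - 1], GAP):
            pairs.append((x[i - 1], GAP))
            i -= 1
        else:
            pairs.append((GAP, y[j - 1]))
            j -= 1
    pairs.reverse()
    return dp[m][n], pairs


def pair_distribution(pairs):
    """Relative frequency of each aligned pair in the alignment."""
    counts = Counter(pairs)
    total = sum(counts.values())
    return {pair: c / total for pair, c in counts.items()}
```

For example, `pair_distribution(align_pairs(x, y, lcs_score)[1])` returns a dictionary mapping pairs such as `("A", "A")` or `("C", "-")` to their frequencies in the chosen alignment.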

arXiv.org e-Print Archive

CiteSeerX

Oxford University Research Archive

Convergence of the stochastic mesh estimator for pricing American options

Authors: Avramidis Athanassios; Matzinger Heinrich
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2001

Broadie and Glasserman proposed a simulation-based method they named stochastic mesh for pricing high-dimensional American options. Based on simulated states of the assets underlying the option at each exercise opportunity, the method produces an estimator of the option value at each sampled state. Under the mild assumption of the finiteness of certain moments, we derive an asymptotic upper bound on the probability of error of the mesh estimator, where both the error size and the probability bound vanish as the sample size increases. We include the empirical performance for the test problems used by Broadie and Glasserman in a recent unpublished manuscript. We find that the mesh estimator has large bias that decays very slowly with the sample size, suggesting that in applications it will most likely be necessary to employ bias and/or variance reduction techniques.

Southampton (e-Prints Soton)

Repository TU/e

Pure OAI Repository

eCommons@Cornell

An Upper Bound on the Convergence Rate of a Second Functional in Optimal Sequence Alignment

Authors: Hauser Raphael; Matzinger Heinrich; Popescu Ionel
Publication date: 26/09/2014

Consider finite sequences $X_{[1,n]}=X_1\dots X_n$ and $Y_{[1,n]}=Y_1\dots Y_n$ of length $n$, consisting of i.i.d. samples of random letters from a finite alphabet, and let $S$ and $T$ be chosen i.i.d. randomly from the unit ball in the space of symmetric scoring functions over this alphabet augmented by a gap symbol. We prove a probabilistic upper bound of linear order in $n^{0.75}$ for the deviation of the score relative to $T$ of optimal alignments with gaps of $X_{[1,n]}$ and $Y_{[1,n]}$ relative to $S$. It remains an open problem to prove a lower bound. Our result contributes to the understanding of the microstructure of optimal alignments relative to one given scoring function, extending a theory begun by the first two authors.

arXiv.org e-Print Archive

A Monte Carlo Approach to the Fluctuation Problem in Optimal Alignments of Random Strings

Authors: Amsalu Saba; Hauser Raphael; Matzinger Heinrich
Publication venue: Unspecified
Publication date: 01/01/2012

The problem of determining the correct order of fluctuation of the optimal alignment score of two random strings of length $n$ has been open for several decades. It is known [12] that the biased expected effect of a random letter-change on the optimal score implies an order of fluctuation linear in $\sqrt{n}$. However, in many situations where such a biased effect is observed empirically, it has been impossible to prove analytically. The main result of this paper shows that when the rescaled limit of the optimal alignment score increases in a certain direction, then the biased effect exists. On the basis of this result one can quantify a confidence level for the existence of such a biased effect, and hence of an order $\sqrt{n}$ fluctuation, based on simulation of optimal alignment scores. This is an important step forward, as the correct order of fluctuation was previously known only for certain special distributions [12],[13],[5],[10]. To illustrate the usefulness of our new methodology, we apply it to optimal alignments of strings written in the DNA alphabet. As scoring function, we use the BLASTZ default substitution matrix together with a realistic gap penalty. BLASTZ is one of the most widely used sequence alignment methodologies in bioinformatics. For this DNA setting, we show that with a high level of confidence, the fluctuation of the optimal alignment score is of order $\Theta(\sqrt{n})$. An important special case of optimal alignment score is the Longest Common Subsequence (LCS) of random strings. For binary sequences with equiprobable symbols the question of the fluctuation of the LCS remains open. The symmetry in that case does not allow for our method. On the other hand, in real-life DNA sequences, it is not the case that all letters occur with the same frequency. So, for many real-life situations, our method allows one to determine the order of the fluctuation up to a high confidence level.
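The idea of measuring the effect of a random letter-change by simulation can be sketched as follows. This is only a toy version under stated assumptions: it uses the LCS length as the alignment score rather than the BLASTZ substitution matrix, it changes one randomly chosen occurrence of a letter `old` into a letter `new`, and it reports a normal-approximation 95% interval for the mean change. The function name `letter_change_effect` and its parameters are hypothetical, and the paper's actual test statistic may differ.

```python
import math
import random
from statistics import mean, stdev


def lcs_length(x, y):
    """Length of a longest common subsequence (row-by-row DP)."""
    prev = [0] * (len(y) + 1)
    for xi in x:
        curr = [0]
        for j, yj in enumerate(y, start=1):
            curr.append(prev[j - 1] + 1 if xi == yj else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def letter_change_effect(n, probs, old, new, trials, seed=0):
    """Estimate the mean change in LCS length after one letter change.

    probs are the (assumed) letter probabilities for the alphabet "ACGT".
    Returns (mean change, half-width of an approximate 95% interval).
    """
    rng = random.Random(seed)
    alphabet = "ACGT"
    deltas = []
    for _ in range(trials):
        x = rng.choices(alphabet, weights=probs, k=n)
        y = rng.choices(alphabet, weights=probs, k=n)
        positions = [i for i, a in enumerate(x) if a == old]
        if not positions:
            continue  # no occurrence of `old` to change in this sample
        before = lcs_length(x, y)
        x[rng.choice(positions)] = new
        deltas.append(lcs_length(x, y) - before)
    if len(deltas) < 2:
        raise ValueError("too few usable trials")
    half_width = 1.96 * stdev(deltas) / math.sqrt(len(deltas))
    return mean(deltas), half_width
```

If the interval `mean ± half_width` excludes zero for unequal letter frequencies, that is weak simulation evidence of a biased effect in this toy setting. With equal frequencies the symmetry noted in the abstract suggests no such bias should be expected from this kind of change.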

arXiv.org e-Print Archive

Oxford University Research Archive