Finding analytically the statistics of the longest common subsequence (LCS)
of a pair of random sequences drawn from c alphabets is a challenging problem
in computational evolutionary biology. We present exact asymptotic results for
the distribution of the LCS in a simpler, yet nontrivial, variant of the
original model called the Bernoulli matching (BM) model which reduces to the
original model in the large c limit. We show that in the BM model, for all c,
the distribution of the asymptotic length of the LCS, suitably scaled, is
identical to the Tracy-Widom distribution of the largest eigenvalue of a random
matrix whose entries are drawn from a Gaussian unitary ensemble. In particular,
in the large c limit, this provides an exact expression for the asymptotic
length distribution in the original LCS problem.Comment: 4 pages Revtex, 2 .eps figures include