
2 research outputs found

Revisiting Waiting Times in DNA evolution

Author: Nicodème Pierre
Publication date: 29/05/2012

Transcription factor binding sites are short stretches of DNA (or $k$-mers), mainly located in promoter sequences, that enhance or repress gene expression. With respect to an initial distribution of letters on the DNA alphabet, Behrens and Vingron consider a random sequence of length $n$ that does not contain a given $k$-mer, or word of size $k$. Under an evolution model of the DNA, they compute the probability $\mathfrak{p}_n$ that this $k$-mer appears after a unit time of 20 years. They prove that the waiting time for the first appearance of the $k$-mer is well approximated by $T_n=1/\mathfrak{p}_n$. Their work relies on the simplifying assumption that the $k$-mer is not self-overlapping. They observe in particular that the waiting time is mostly driven by the initial distribution of letters. Behrens et al. use an automaton approach that relaxes the assumption about word overlaps. Their numerical evaluations confirm the validity of the Behrens and Vingron approach for non-self-overlapping words, but provide corrections of up to 44% for highly self-overlapping words such as $\mathtt{AAAAA}$. We devised an approach to the problem based on clump analysis and generating functions; this approach leads to a proof of the quasi-linear behaviour of $\mathfrak{p}_n$ for a large range of values of $n$, an important result for DNA evolution. We present this clump analysis here, first by language decomposition and then by an automaton construction; finally, we describe an equivalent approach by construction of Markov automata.

Comment: 19 pages, 3 figures, 2 tables
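The relation $T_n=1/\mathfrak{p}_n$ and the quasi-linear behaviour of $\mathfrak{p}_n$ can be illustrated numerically. The following is a minimal sketch, not code from the paper: the function names, the linear model $\mathfrak{p}_n \approx c\,n$ and the constant `c` are hypothetical choices made for illustration only, and the only figure taken from the abstract is the 20-year unit time.

```python
import math


def waiting_time_years(p_n: float, unit_years: float = 20.0) -> float:
    """Approximate waiting time T_n = 1/p_n, converted from time units to years.

    p_n is the probability that the k-mer appears within one unit of time.
    """
    if not 0.0 < p_n <= 1.0:
        raise ValueError("p_n must lie in (0, 1]")
    return unit_years / p_n


def linear_p(n: int, c: float) -> float:
    """Hypothetical linear model p_n = c * n (c is an illustrative constant)."""
    return c * n


if __name__ == "__main__":
    c = 1e-9  # assumed value for illustration, not taken from the paper
    for n in (500, 1000, 2000):
        print(n, waiting_time_years(linear_p(n, c)))
```

Under this assumed linear model, doubling the sequence length $n$ halves the approximate waiting time.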

arXiv.org e-Print Archive

HAL Descartes

HAL-Paris 13

Hal-Diderot

An Automaton Approach for Waiting Times in DNA Evolution

Authors: Behrens Sarah, Nicaud Cyril, Nicodème Pierre
Publication venue: Mary Ann Liebert
Publication date: 01/12/2011

In a recent article, Behrens and Vingron (J. Comput. Biol. 17/12, 2010) compute waiting times for k-mers to appear during DNA evolution under the assumption that the considered k-mers do not occur in the initial DNA sequence, an issue arising when studying the evolution of regulatory DNA sequences with regard to transcription factor (TF) binding site emergence. The mathematical analysis underlying their computation assumes that occurrences of the words of interest do not overlap. Here we relax this assumption by using an automaton approach. In an alphabet of size 4 such as the DNA alphabet, most words have no or low autocorrelation; therefore, globally, our results confirm those of Behrens and Vingron. The outcome is quite different for highly autocorrelated k-mers; in this case, the autocorrelation pushes down the probability of occurrence of these k-mers at generation 1 and, consequently, increases the waiting time for their appearance by up to 40%. An analysis of existing TF binding sites reveals a significant proportion of k-mers exhibiting autocorrelation. Thus, our automaton-based computations greatly improve the accuracy of predicted waiting times for the emergence of TF binding sites during DNA evolution. We do the computation in the Bernoulli or M0 model; computations in the M1 model, a Markov model of order 1, are more costly in terms of time and memory but should produce similar results. While Behrens and Vingron considered specifically promoters of length 1000, we extend the results to promoters of any size; we show that the probability that a k-mer occurs at generation time 1 while being absent at time 0 behaves linearly with respect to the length of the promoter, which induces a hyperbolic behaviour of the waiting time of any k-mer with respect to the length of the promoter. The C code is available at www.lipn.univ-paris13.fr/~nicodeme/
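To make the notion of autocorrelation concrete, the sketch below lists the shifts at which a word overlaps itself and evaluates the standard autocorrelation polynomial at 1 under a Bernoulli (M0) model. It is an illustration only and is not the authors' automaton construction or their C code; the function names and the uniform letter distribution are assumptions.

```python
import math

# Assumed uniform letter distribution for the Bernoulli (M0) model.
UNIFORM = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}


def autocorrelation_shifts(word: str) -> list[int]:
    """Return the shifts s (0 < s < k) at which the word overlaps itself,
    i.e. the suffix starting at position s equals the prefix of length k - s."""
    k = len(word)
    return [s for s in range(1, k) if word[s:] == word[: k - s]]


def autocorrelation_at_one(word: str, probs: dict[str, float] = UNIFORM) -> float:
    """Autocorrelation polynomial evaluated at 1: the trivial shift 0 contributes 1,
    and each overlap shift s contributes the probability of the last s letters."""
    k = len(word)
    total = 1.0
    for s in autocorrelation_shifts(word):
        total += math.prod(probs[ch] for ch in word[k - s:])
    return total


if __name__ == "__main__":
    for w in ("AAAAA", "ACACA", "ACGTG"):
        print(w, autocorrelation_shifts(w), autocorrelation_at_one(w))
```

A word such as AAAAA overlaps itself at every shift, whereas ACGTG has no self-overlap; values above 1 indicate that occurrences tend to come in clumps.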

arXiv.org e-Print Archive

Crossref

Hal-Diderot

HAL-Polytechnique

HAL-Ecole des Ponts ParisTech

HAL - UPEC / UPEM