    Large deviation properties for patterns

    Deciding whether a given pattern is over- or under-represented according to a given background model is a key question in computational biology. Such a decision is usually made by computing p-values that reflect the "exceptionality" of a pattern in a given sequence or set of sequences. In the simplest cases (short and simple patterns, a simple background model, a small number of sequences), an exact p-value can be computed with tractable complexity. Realistic cases are in general too complicated for an exact p-value. Approximations are therefore used (Gaussian, Poisson, and large deviation approximations). Each is valid only under certain conditions: Gaussian approximations hold in the central domain, while Poisson and large deviation approximations hold for rare events. In the present paper, we prove a large deviation approximation for the double-strand counting problem, that is, counting a given pattern in a set of sequences arising from both strands of the genome. In that case, dependencies between a sequence and its reverse complement cannot be neglected. Here they are captured for a Bernoulli model through general combinatorial properties of the pattern. A large deviation result is also provided for a set of short sequences.

    This paper establishes a large deviation result for sets of short sequences, or for sets of over- or under-represented words in long genomic sequences. This result applies to the detection of exceptional words in genomic sequences.
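
    To make the double-strand counting problem concrete, here is a minimal Python sketch (not the paper's method) that counts a DNA word on both strands of a single sequence. It assumes an uppercase ACGT alphabet and counts overlapping occurrences; the function names are illustrative. It only computes the observed count, not its p-value.

```python
# Map each nucleotide to its Watson-Crick complement (assumes an uppercase ACGT alphabet).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(word: str) -> str:
    """Word as read on the opposite strand: complement each base, then reverse."""
    return word.translate(COMPLEMENT)[::-1]


def count_overlapping(seq: str, word: str) -> int:
    """Number of (possibly overlapping) occurrences of word in seq.

    str.count() skips overlapping matches, so every start position is tested.
    """
    return sum(1 for i in range(len(seq) - len(word) + 1) if seq.startswith(word, i))


def double_strand_count(seq: str, word: str) -> int:
    """Occurrences of word on both strands of seq.

    An occurrence on the reverse strand is an occurrence of the reverse
    complement of word on the forward strand, so both counts are taken on seq.
    """
    forward = count_overlapping(seq, word)
    reverse = count_overlapping(seq, reverse_complement(word))
    return forward + reverse


if __name__ == "__main__":
    print(double_strand_count("AACGTTAAC", "AAC"))  # prints 3 (2 forward + 1 reverse)
    # "ACGT" is its own reverse complement, so each site is counted on both strands.
    print(double_strand_count("ACGTACGT", "ACGT"))  # prints 4
```

    For a word equal to its own reverse complement, such as ACGT, the forward and reverse counts are identical: the two strands are perfectly dependent. This is an extreme case of why the counts on a sequence and on its reverse complement cannot be treated as independent.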

    Large deviation properties for patterns

    Presentation at LSD&LAW 2012. To appear in JDA.

    Deciding whether a given pattern is over-represented or under-represented according to a given background model is a key question in computational biology. Such a decision is usually made by computing p-values reflecting the "exceptionality" of a pattern in a given sequence or set of sequences. In the simplest cases (short and simple patterns, a simple background model, a small number of sequences), an exact p-value can be computed with tractable complexity. Realistic cases are in general too complicated for an exact p-value. Approximations are therefore used (Gaussian, Poisson, and large deviation approximations). These approximations apply only under certain conditions: Gaussian approximations are valid in the central domain, while Poisson and large deviation approximations are valid for rare events. In the present paper, we prove a large deviation approximation for the double-strand counting problem, that is, counting a given pattern in a set of sequences arising from both strands of the genome. Here, dependencies between a sequence and its complement play a fundamental role. General combinatorial properties of the pattern make it possible to handle this dependency. A large deviation result is also provided for a set of short sequences.

    This paper presents large deviation results for words. The first case treated is the counting of two words, which covers the important case of searching for a pattern on both complementary strands of DNA. The second case is the search for a finite set of words in a set of short sequences, with differing occurrence probabilities.
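
    The contrast between the Gaussian approximation (central domain) and the large deviation regime (rare events) can be illustrated on a deliberately simplified model. This sketch is not the paper's result: it assumes the count N is Binomial(n, p), i.e. that occurrences at the n positions are independent with probability p, which ignores the overlaps and strand dependencies the paper handles combinatorially. It compares the exact tail P(N >= k), its normal approximation, and the Chernoff bound exp(-n I(k/n)), where I is the Cramér rate function of a Bernoulli(p) variable. All function names and the demo values are hypothetical.

```python
import math


def log_binom_pmf(i: int, n: int, p: float) -> float:
    """log P(N = i) for N ~ Binomial(n, p), in log space to avoid overflow."""
    return (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
            + i * math.log(p) + (n - i) * math.log1p(-p))


def exact_tail(k: int, n: int, p: float) -> float:
    """Exact P(N >= k), summed term by term (assumes independent positions)."""
    return sum(math.exp(log_binom_pmf(i, n, p)) for i in range(k, n + 1))


def gaussian_tail(k: int, n: int, p: float) -> float:
    """Normal approximation to P(N >= k), without continuity correction."""
    z = (k - n * p) / math.sqrt(n * p * (1 - p))
    return 0.5 * math.erfc(z / math.sqrt(2))


def rate(a: float, p: float) -> float:
    """Cramér rate function of a Bernoulli(p) mean: relative entropy of a w.r.t. p."""
    return a * math.log(a / p) + (1 - a) * math.log((1 - a) / (1 - p))


def chernoff_tail(k: int, n: int, p: float) -> float:
    """Large deviation (Chernoff) upper bound exp(-n * I(k/n)), for n*p < k < n."""
    a = k / n
    if not p < a < 1:
        raise ValueError("requires n*p < k < n")
    return math.exp(-n * rate(a, p))


if __name__ == "__main__":
    # Hypothetical values: 1000 positions, occurrence probability 0.01, 30 observed.
    n, p, k = 1000, 0.01, 30
    print(f"exact tail     {exact_tail(k, n, p):.2e}")
    print(f"Gaussian       {gaussian_tail(k, n, p):.2e}")
    print(f"Chernoff bound {chernoff_tail(k, n, p):.2e}")
```

    With these hypothetical values the observed count lies far in the tail: the normal approximation underestimates the exact tail probability by orders of magnitude, while the Chernoff bound stays above it and, by Cramér's theorem, has the correct exponential decay rate as n grows.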