22 research outputs found

    On the distribution of the number of cycles in the breakpoint graph of a random signed permutation

    We use the finite Markov chain embedding technique to obtain the distribution of the number of cycles in the breakpoint graph of a uniformly random signed permutation. This in turn gives a very good approximation of the distribution of the reversal distance between two random genomes.

    Prediction in high dimensional linear models and application to genomic selection under imperfect linkage disequilibrium

    Genomic selection (GS) consists of predicting the breeding values of selection candidates using a large number of genetic markers. An important question in GS is how many markers are required for a good prediction. When the genetic map is too sparse, imperfect linkage disequilibrium is likely to be observed: the alleles at a gene location and at a marker located nearby vary. We tackle here the problem of imperfect linkage disequilibrium and present theoretical results on the accuracy criterion, i.e. the correlation between the predicted value and the true value. Illustrations on simulated data and on real rice data are provided.

    Compound Poisson Approximation and Testing for Gene Clusters with Multigene Families

    We present a compound Poisson approximation for computing probabilities involved in significance tests for conserved genomic regions between different species. We consider the case in which the conserved genomic regions are found by the reference region approach. An important aspect of our computations is that they take into account the existence of multigene families. We obtain convergence results for the error of our approximation by using the Stein-Chen method for compound Poisson approximation.

    The distribution of cycles in breakpoint graphs of signed permutations

    Breakpoint graphs are ubiquitous structures in the field of genome rearrangements. Their cycle decomposition has proved useful in computing and bounding many measures of (dis)similarity between genomes, and studying the distribution of those cycles is therefore critical to gaining insight into the distributions of the genomic distances that rely on it. We extend here the work initiated by Doignon and Labarre, who enumerated unsigned permutations whose breakpoint graph contains k cycles, to signed permutations, and prove explicit formulas for computing the expected value and the variance of the corresponding distributions, both in the unsigned case and in the signed case. We also compare these distributions to those of several well-studied distances, emphasising the cases where approximations obtained in this way stand out. Finally, we show how our results can be used to derive simpler proofs of other previously known results.

    CASSIOPE: An expert system for conserved regions searches

    Background: Understanding genome evolution provides insight into biological mechanisms. For many years comparative genomics and analysis of conserved chromosomal regions have helped to unravel the mechanisms involved in genome evolution and their implications for the study of biological systems. Detection of conserved regions (descending from a common ancestor) not only helps clarify genome evolution but also makes it possible to identify quantitative trait loci (QTLs) and investigate gene function.
    The identification and comparison of conserved regions on a genome scale is computationally intensive, making process automation essential. Three key requirements are necessary: consideration of phylogeny to identify orthologs between multiple species, frequent updating of the annotation and panel of compared genomes and computation of statistical tests to assess the significance of identified conserved gene clusters.
    Results: We developed a modular system superimposed on a multi-agent framework, called CASSIOPE (Clever Agent System for Synteny Inheritance and Other Phenomena in Evolution). CASSIOPE automatically identifies statistically significant conserved regions between multiple genomes based on automated phylogenies and statistical testing. Conserved regions were searched for in 19 species and 1,561 hits were found. To our knowledge, CASSIOPE is the first system to date that integrates evolutionary biology-based concepts and fulfills all three key requirements stated above. All results are available at http://194.57.197.245/cassiopeWeb/displayCluster?clusterId=1
    Conclusion: CASSIOPE makes it possible to study conserved regions from a chosen query genetic region and to infer conserved gene clusters based on phylogenies and statistical tests assessing the significance of these conserved regions.
    Source code is freely available, please contact: [email protected]

    Measures for the exceptionality of gene order in conserved genomic regions

    We propose three measures for quantifying the exceptionality of gene order in conserved genomic regions found by the reference region approach. The three measures are based on the transposition distance in the permutation group. We obtain analytic expressions for their distribution in the case of a uniformly random permutation, i.e. under the null hypothesis of random gene order. Our results can be used to increase the power of significance tests for gene clusters that take into account only the proximity of the orthologous genes and not their order.

    Applications du calcul des probabilités à la recherche de régions génomiques conservées

    No full text
    This thesis focuses on several probabilistic and statistical problems related to comparative genomics. In the first part we present a compound Poisson approximation for computing probabilities involved in significance tests for conserved genomic regions found by the reference-region approach. An important aspect of our computations is that they take into account the existence of multigene families. In the second part we propose three measures, based on the transposition distance in the symmetric group, for quantifying the exceptionality of the gene order in conserved genomic regions. We obtain analytic expressions for their distribution in the case of a random permutation. In the third part we study the distribution of the number of cycles in the breakpoint graph of a random signed permutation. We use the Markov chain imbedding technique to obtain this distribution in terms of a product of transition matrices of a certain finite Markov chain. Knowledge of this distribution provides a very good approximation of the distribution of the reversal distance.

    Applications du calcul des probabilités à la recherche de régions génomiques conservées

    No full text
    This thesis focuses on several probabilistic and statistical problems related to comparative genomics. In the first part we present a compound Poisson approximation for computing probabilities involved in significance tests for conserved genomic regions found by the reference-region approach. An important aspect of our computations is that they take into account the existence of multigene families. In the second part we propose three measures, based on the transposition distance in the symmetric group, for quantifying the exceptionality of the gene order in conserved genomic regions. We obtain analytic expressions for their distribution in the case of a random permutation. In the third part we study the distribution of the number of cycles in the breakpoint graph of a random signed permutation. We use the Markov chain imbedding technique to obtain this distribution in terms of a product of transition matrices of a certain finite Markov chain. Knowledge of this distribution provides a very good approximation of the distribution of the reversal distance.
    AIX-MARSEILLE1-Inst.Médit.tech (130552107) / Sudoc, France