
    Fractionation statistics

    BACKGROUND: Paralog reduction, the loss of duplicate genes after whole genome duplication (WGD), is a pervasive process. Whether this loss proceeds gene by gene or through deletion of multi-gene DNA segments is controversial, as is the question of fractionation bias, namely whether one homeologous chromosome is more vulnerable to gene deletion than the other. RESULTS: As a null hypothesis, we first assume deletion events, on one homeolog only, excise a geometrically distributed number of genes with unknown mean µ, and these events combine to produce deleted runs of length l, distributed approximately as a negative binomial with unknown parameter r, itself a random variable with distribution π(·). A more realistic model requires deletion events on both homeologs distributed as a truncated geometric. We simulate the distribution of run lengths l in both models, as well as the underlying π(r), as a function of µ, and show how sampling l allows us to estimate µ. We apply this to data on a total of 15 genomes descended from 6 distinct WGD events and show how to correct the bias towards shorter runs caused by genome rearrangements. Because of the difficulty in deriving π(·) analytically, we develop a deterministic recurrence to calculate each π(r) as a function of µ and the proportion of unreduced paralog pairs. CONCLUSIONS: The parameter µ can be estimated based on run lengths of single-copy regions. Estimates of µ in real data do not exclude the possibility that duplicate gene deletion is largely gene by gene, although it may sometimes involve longer segments.
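    The one-homeolog null model described in this abstract can be illustrated with a small simulation. The following is only a minimal sketch under stated assumptions, not the authors' procedure: the function names (`geometric`, `run_lengths`, `simulate_deleted_runs`) are hypothetical, event start positions are assumed uniform along a linear chromosome, events that overlap already-deleted genes are assumed simply to merge with them, and the simulation stops once a chosen fraction of genes has been deleted.

```python
import random


def geometric(mean, rng):
    """Sample from a geometric distribution on {1, 2, ...} with the given mean."""
    p = 1.0 / mean  # success probability; mean of this distribution is 1/p
    k = 1
    while rng.random() > p:  # continue with probability 1 - p
        k += 1
    return k


def run_lengths(deleted):
    """Return the lengths of maximal runs of True values in a boolean list."""
    runs, current = [], 0
    for flag in deleted:
        if flag:
            current += 1
        elif current:
            runs.append(current)
            current = 0
    if current:
        runs.append(current)
    return runs


def simulate_deleted_runs(n_genes, mean_len, target_fraction, seed=0):
    """Apply deletion events to one homeolog until target_fraction is deleted.

    Assumptions (not taken from the source): uniform start positions,
    events truncated at the chromosome end, overlapping events merge.
    """
    rng = random.Random(seed)
    deleted = [False] * n_genes
    n_deleted = 0
    target = int(target_fraction * n_genes)
    while n_deleted < target:
        start = rng.randrange(n_genes)
        length = geometric(mean_len, rng)
        for i in range(start, min(start + length, n_genes)):
            if not deleted[i]:
                deleted[i] = True
                n_deleted += 1
    return run_lengths(deleted)


if __name__ == "__main__":
    # Compare mean observed run length for gene-by-gene (mu = 1) and longer events.
    for mu in (1.0, 2.0, 4.0):
        runs = simulate_deleted_runs(10_000, mu, 0.5, seed=42)
        print(mu, sum(runs) / len(runs))
```

    Because adjacent and overlapping events merge, the observed runs are longer on average than the individual deletion events, which is why µ has to be estimated from the run-length distribution rather than read off directly.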

    Gene order in rosid phylogeny, inferred from pairwise syntenies among extant genomes

    BACKGROUND: Ancestral gene order reconstruction for flowering plants has lagged behind developments in yeasts, insects and higher animals, because of the recency of widespread plant genome sequencing, sequencers' embargoes on public data use, paralogies due to whole genome duplication (WGD) and fractionation of undeleted duplicates, extensive paralogy from other sources, and the computational cost of existing methods. RESULTS: We address these problems, using the gene order of four core eudicot genomes (cacao, castor bean, papaya and grapevine) that have escaped any recent WGD events, and two others (poplar and cucumber) that descend from independent WGDs, in inferring the ancestral gene order of the rosid clade and those of its main subgroups, the fabids and malvids. We improve and adapt techniques including the OMG method for extracting large, paralogy-free, multiple orthologies from conflated pairwise synteny data among the six genomes and the PATHGROUPS approach for ancestral gene order reconstruction in a given phylogeny, where some genomes may be descendants of WGD events. We use the gene order evidence to evaluate the hypothesis that the order Malpighiales belongs to the malvids rather than to the fabids, as traditionally assigned. CONCLUSIONS: Gene orders of ancestral eudicot species, involving 10,000 or more genes, can be reconstructed in an efficient, parsimonious and consistent way, despite paralogies due to WGD and other processes. Pairwise genomic syntenies provide appropriate input to a parameter-free procedure of multiple ortholog identification followed by gene-order reconstruction in solving instances of the "small phylogeny" problem.