79 research outputs found

    Ploidy estimation for an artificially combined wheat dataset.

    Reads from the large arms of chromosomes 5A, 5B and 5D were pooled, assembled and used for ploidy estimation. Only contigs with an average coverage of 10X or above, and for which the individual ploidy in a given subgenome was estimated to be one, were considered. Footnote a: Based on BLAST alignments to the individual assemblies.

    The graphical model for ploidy estimation and variant calls.

    Each node represents a variable. Edges represent probabilistic dependencies. Each node is associated with a probability distribution of the corresponding variable conditioned on the variables corresponding to its parents. Variables within the same plate (rectangle) are replicated according to the number of positions in a contig (the “Positions” rectangle) or the number of reads overlapping a given position of a given contig (the “Reads” rectangle). Shaded variables represent the HiSeq error model, which is a component of the ploidy estimation model.

    ConPADE: Genome Assembly Ploidy Estimation from Next-Generation Sequencing Data

    As a result of improvements in genome assembly algorithms and the ever-decreasing costs of high-throughput sequencing technologies, new high-quality draft genome sequences are published at a striking pace. With well-established methodologies, larger and more complex genomes are being tackled, including polyploid plant genomes. Given the similarity between multiple copies of a basic genome in polyploid individuals, assembly of such data usually results in collapsed contigs that represent a variable number of homoeologous genomic regions. Unfortunately, such collapse is often not ideal, as keeping contigs separate can lead both to improved assembly and to insights about how haplotypes influence phenotype. Here, we describe a first step in avoiding inappropriate collapse during assembly. In particular, we describe ConPADE (Contig Ploidy and Allele Dosage Estimation), a probabilistic method that estimates the ploidy of any given contig/scaffold based on its allele proportions. In the process, we report findings regarding errors in sequencing. The method can be used for whole genome shotgun (WGS) sequencing data. We also show applicability of the method for variant calling and allele dosage estimation. Results for simulated and real datasets are discussed and provide evidence that ConPADE performs well as long as enough sequencing coverage is available, or the true contig ploidy is low. We show that ConPADE may also be used for related applications, such as the identification of duplicated genes in fragmented assemblies, although refinements are needed.
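To make the idea of estimating ploidy from allele proportions concrete, here is a toy sketch. It is not ConPADE's model (which, per the listing, is a graphical model that includes a HiSeq error model); it is a simplified stand-in that assumes error-free reads, biallelic heterozygous sites, a binomial read-sampling model, and a uniform prior over allele dosages. All function names are hypothetical.

```python
import math

def log_binom_pmf(x, n, q):
    """Log-probability of x alternate-allele reads out of n, success prob q."""
    return (math.log(math.comb(n, x))
            + x * math.log(q)
            + (n - x) * math.log(1.0 - q))

def site_log_lik(alt, depth, ploidy):
    """Log-likelihood of one heterozygous site under a candidate ploidy.

    Assumption: the alternate allele has dosage k in 1..ploidy-1 with a
    uniform prior, so the expected allele proportion is k / ploidy.
    """
    terms = [log_binom_pmf(alt, depth, k / ploidy) for k in range(1, ploidy)]
    m = max(terms)
    # log-sum-exp for numerical stability, then average over dosages
    return m + math.log(sum(math.exp(t - m) for t in terms)) - math.log(len(terms))

def toy_ploidy_estimate(sites, candidates=(2, 3, 4, 5, 6)):
    """Return the candidate ploidy with the highest total log-likelihood.

    sites: iterable of (alt_read_count, read_depth) pairs.
    """
    scores = {p: sum(site_log_lik(a, d, p) for a, d in sites) for p in candidates}
    best = max(scores, key=scores.get)
    return best, scores

# Illustrative, made-up read counts with proportions near 1/4, 1/2 and 3/4
tetraploid_like = [(25, 100), (26, 100), (50, 100), (75, 100), (24, 100)]
best, _ = toy_ploidy_estimate(tetraploid_like)
print("toy ploidy estimate:", best)
```

Averaging over dosages, rather than taking the best-fitting dosage, is a deliberate choice in this sketch: every proportion available at ploidy 2 is also available at ploidy 4 or 6, so a maximum over dosages could never prefer the lower ploidy.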

    Length simulation results.

    Color in each cell indicates the percentage of correct ploidy calls, out of 100 simulations of contigs sequenced at 50X coverage for each ploidy level.

    Bacterial datasets used to learn the error model.

    Footnote a: For K. oxytoca, only the largest contig was used, representing approximately 96.95% of the genome. Footnote b: For M. tuberculosis, we sampled a small portion of the data to avoid oversampling a single genome (original coverage for downloaded data was 5,598.69X).

    Coverage simulation results.

    Color in each cell indicates the percentage of correct ploidy calls, out of 100 simulations of 200 kb-long contigs for each ploidy level.

    Genotype distribution for switchgrass contigs with a ploidy estimate of four.

    Bars represent the frequency of each SNP genotype, for all identified variants in contigs estimated to have ploidy four.

    Ploidy estimate distribution for common wheat chromosome arm 5D contigs.

    Bars represent the frequency of each ploidy estimated by ConPADE, for a set of 16,684 wheat contigs from the de novo assembly of chromosome arm 5D.

    Sequencing error probabilities.

    Observed sequencing error probability as a function of the Phred quality score (dots connected by the dotted line) and the expected error probability according to the expression 10^(-QS/10), where QS represents the quality score (solid line). There is overall agreement between empirical observations and theoretical expectation, except for the quality score of 2.
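As a small illustration of the expected-error expression in this caption, the sketch below converts a Phred quality score QS into the theoretical error probability 10^(-QS/10). The function name is hypothetical and the listed scores are arbitrary examples; no empirical error rates from the paper are reproduced here.

```python
def phred_to_error_prob(qs: float) -> float:
    """Theoretical error probability for a Phred quality score: 10^(-QS/10)."""
    return 10.0 ** (-qs / 10.0)

# Arbitrary example scores, including the QS = 2 case singled out in the caption
for qs in (2, 10, 20, 30, 40):
    print(f"QS={qs:>2}  expected error probability = {phred_to_error_prob(qs):.4g}")
```

An observed error rate for a given score (mismatching bases divided by all bases carrying that score) could then be compared against this value, which is the comparison the figure describes.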