16 research outputs found

    Assembly statistics on three types of simulated data of human chromosome 22.


    Statistics of <i>P. brasiliensis</i> assemblies by four assemblers on three types of real data.


    Optimizing Information in Next-Generation-Sequencing (NGS) Reads for Improving <i>De Novo</i> Genome Assembly

    <div><p>Next-Generation-Sequencing is advantageous because of its much higher data throughput and much lower cost compared with the traditional Sanger method. However, NGS reads are shorter than Sanger reads, making <i>de novo</i> genome assembly very challenging. Because genome assembly is essential for all downstream biological studies, great efforts have been made to enhance the completeness of genome assembly, which requires the presence of long reads or long-distance information. To improve <i>de novo</i> genome assembly, we develop a computational program, ARF-PE, to increase the length of Illumina reads. ARF-PE takes as input Illumina paired-end (PE) reads and recovers the original DNA fragments from whose two ends the paired reads were obtained. On the PE data of four bacteria, ARF-PE recovered >87% of the DNA fragments and achieved >98% of perfect DNA fragment recovery. Using Velvet, SOAPdenovo, Newbler, and CABOG, we evaluated the benefits of recovered DNA fragments to genome assembly. For all four bacteria, the recovered DNA fragments increased the assembly contiguity. For example, the N50 lengths of the <i>P. brasiliensis</i> contigs assembled by SOAPdenovo and Newbler increased from 80,524 bp to 166,573 bp and from 80,655 bp to 193,388 bp, respectively. ARF-PE also increased assembly accuracy in many cases. On the PE data of two fungi and a human chromosome, ARF-PE doubled and tripled the N50 length. The assembly accuracies dropped in these cases, although they remained >91%. In general, ARF-PE can increase both assembly contiguity and accuracy for bacterial genomes. For complex eukaryotic genomes, ARF-PE is promising because it raises assembly contiguity, but future error correction is needed for ARF-PE to also increase the assembly accuracy. ARF-PE is freely available at <a href="http://140.116.235.124/~tliu/arf-pe/" target="_blank">http://140.116.235.124/~tliu/arf-pe/</a>.</p></div>
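The abstract reports contiguity as N50 length. As a minimal sketch, assuming the common definition (the largest length L such that contigs of length at least L together cover at least half of the assembled bases), N50 could be computed as below. The function name `n50` and the example lengths are illustrative and are not taken from the paper.

```python
from typing import Iterable


def n50(contig_lengths: Iterable[int]) -> int:
    """Return the N50 of a set of contig lengths (0 if there are no contigs)."""
    lengths = sorted(contig_lengths, reverse=True)  # longest contigs first
    total = sum(lengths)
    running = 0
    for length in lengths:
        running += length
        # Stop at the first contig where the running sum reaches half the total.
        if 2 * running >= total:
            return length
    return 0


# Illustrative lengths only: the total is 20, and 6 + 5 = 11 covers half of it.
print(n50([2, 3, 4, 5, 6]))  # prints 5
```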

    Statistics of DNA fragments recovered from the real PEs of bacteria.

    <p>On the real PE libraries of (a) <i>P. brasiliensis</i> and (b) <i>E. coli</i>, we obtain the statistics of the recovered DNA fragments using the same definitions as in <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0069503#pone-0069503-t004" target="_blank">Table 4</a>.</p>

    Workflow of ARF-PE.

    <p>ARF-PE runs in three steps at its kernel (in brown rectangles). First, ARF-PE assembles PE reads into contigs using Velvet and obtains the contig graph. Second, PE reads are re-aligned to the contig sequences. ARF-PE then splits the PEs into four categories based on the mappings of the two reads. Third, ARF-PE extracts the sequences between the two mapped loci as the recovered DNA fragments of PEs. Besides the kernel, ARF-PE offers two options (in green rectangles): filtering low-complexity reads before assembly and correcting errors in contigs after assembly. See main text for details.</p>
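The third step can be sketched as follows, assuming the simplest case in which both reads of a pair map to the same contig with the forward read upstream of the reverse read. The function name, its parameters, and the 0-based coordinate convention are hypothetical; the caption does not describe ARF-PE's internal data structures, and the four PE categories are not modelled here.

```python
from typing import Optional


def recover_fragment(contig: str, fwd_start: int, rev_start: int,
                     read_len: int) -> Optional[str]:
    """Return the contig substring spanned by a read pair, or None if invalid.

    fwd_start: 0-based start of the forward read on the contig (assumed).
    rev_start: 0-based leftmost position of the reverse read on the contig (assumed).
    read_len:  length of each read (assumed equal for both reads).
    """
    end = rev_start + read_len  # one past the last base covered by the reverse read
    if fwd_start < 0 or end > len(contig) or fwd_start >= end:
        return None  # the pair does not describe a fragment inside this contig
    return contig[fwd_start:end]


# Toy contig and coordinates, for illustration only.
print(recover_fragment("ACGTACGTAC", fwd_start=1, rev_start=6, read_len=3))  # prints CGTACGTA
```

A real implementation would also need to handle pairs whose reads map to different contigs, which is presumably where the contig graph from the first step comes in.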

    Metrics of four bacterial assemblies by four assemblers.

    <p>We show the corrected N50 lengths (y-axis) and accuracies (numbers on top of bars) of the four bacterial assemblies by four assemblers: (a) Velvet, (b) SOAPdenovo, (c) Newbler, and (d) CABOG. The accuracy of an assembly is defined as the ratio of the corrected N50 length to the N50 length before correcting assembly errors, and ranges from 0 to 100%. The four species are <i>C. marinum</i>, <i>E. coli</i>, <i>P. brasiliensis</i>, and <i>S. smaragdinae</i>. For each species, each assembler is run on three types of data: original PE reads, recovered DNA fragments and the remaining PEs, and recovered DNA fragments and original PEs.</p>
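The accuracy definition in this caption amounts to a single ratio. A minimal sketch, with a hypothetical function name and made-up N50 values rather than figures from the paper:

```python
def assembly_accuracy(corrected_n50: int, raw_n50: int) -> float:
    """Accuracy in percent: corrected N50 divided by N50 before error correction."""
    if raw_n50 <= 0:
        raise ValueError("raw_n50 must be positive")
    return 100.0 * corrected_n50 / raw_n50


# Made-up values: correction shortens N50 from 100,000 bp to 91,000 bp.
print(assembly_accuracy(91_000, 100_000))  # prints 91.0
```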

    Effects of ARF-PE parameters on genome assembly.

    <p>SOAPdenovo (s) and Newbler (n) are used to assemble the recovered DNA fragments (r) together with the remaining (u) and original (o) PEs of four bacteria. The x-axis indicates the combinations of parameter values defined in <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0069503#pone-0069503-g003" target="_blank">Figure 3</a>. The y-axis shows the corrected N50 length when three types of data are assembled.</p>

    Statistics of SOAPdenovo and Newbler assemblies on three types of simulated data of <i>P. brasiliensis</i>.

    <p>Assembly accuracy is defined as the ratio of the N50 length after error correction by GAGE to the N50 length before correction. For each assembler and metric, the better results among the three types of data are shown in bold.</p>

    Effects of ARF-PE parameters on the statistics of DNA fragment recovery.

    <p>Several combinations of ARF-PE parameter values are used to recover DNA fragments from the real data of four bacteria. The x-axis indicates the parameter values (f: filtering low-complexity reads, l: minimal number of continuous identical bases for a read to be considered low-complexity, ec: error correction to the initial Velvet assembly, r: minimal fraction of consensus bases required for correcting assembly errors, s: seed length of read alignment by SOAP2, v: maximal mismatches allowed in an alignment by SOAP2). The y-axis shows the percentage of DNA fragments that are recovered (red), correctly recovered (green), and perfectly recovered (blue) (the latter two values are relative to the number of recovered DNA fragments).</p>
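One reading of the parameter l is a threshold on the longest run of identical consecutive bases in a read. The sketch below illustrates that interpretation only; the function names are hypothetical, and ARF-PE's actual filter may differ.

```python
def longest_homopolymer_run(read: str) -> int:
    """Length of the longest run of identical consecutive bases in a read."""
    longest = 0
    current = 0
    previous = None
    for base in read:
        # Extend the current run, or start a new run of length 1.
        current = current + 1 if base == previous else 1
        longest = max(longest, current)
        previous = base
    return longest


def is_low_complexity(read: str, l: int) -> bool:
    """Assumed rule: a read is low-complexity if it has a run of at least l bases."""
    return longest_homopolymer_run(read) >= l


# Toy reads: the first contains a run of five A's, the second a run of four.
print(is_low_complexity("ACGTAAAAAT", 5))  # prints True
print(is_low_complexity("ACGTAAAAT", 5))   # prints False
```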

    Statistics of DNA fragments recovered from the simulated PEs of bacteria.

    <p>On the simulated PE libraries of (a) <i>P. brasiliensis</i> and (b) <i>E. coli</i>, we calculate the percentages of recovered DNA fragments by dividing by the corresponding numbers of PEs. The percentages of correctly and perfectly recovered fragments are calculated by dividing by the number of recovered DNA fragments for each category of PEs.</p>
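The two denominators described in this caption can be made explicit with a short sketch. The function name, the dictionary keys, and the counts are hypothetical and chosen only to show which quantity divides which.

```python
from typing import Dict


def recovery_percentages(num_pes: int, recovered: int,
                         correct: int, perfect: int) -> Dict[str, float]:
    """Percentages for one PE category, following the caption's denominators."""
    if num_pes <= 0 or recovered <= 0:
        raise ValueError("num_pes and recovered must be positive")
    return {
        # Recovered fragments are relative to the number of PEs.
        "recovered": 100.0 * recovered / num_pes,
        # Correct and perfect fragments are relative to the recovered fragments.
        "correct": 100.0 * correct / recovered,
        "perfect": 100.0 * perfect / recovered,
    }


# Made-up counts for illustration only.
print(recovery_percentages(num_pes=1000, recovered=800, correct=760, perfect=720))
# prints {'recovered': 80.0, 'correct': 95.0, 'perfect': 90.0}
```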