    Hybrid De Novo Genome Assembly Using MiSeq and SOLiD Short Read Data.

    A hybrid de novo assembly pipeline was constructed to use MiSeq and SOLiD short read data in combination. The short read data were converted to the pipeline's standard format and supplied to pipeline components such as ABySS and SOAPdenovo. The assembly proceeded through several stages, and MiSeq paired-end data, SOLiD mate-paired data, or both could be specified as input at each stage separately. The pipeline was evaluated on the filamentous fungus Aspergillus oryzae RIB40 by aligning the assembly results against the reference sequences. Using both the MiSeq and the SOLiD data in the hybrid assembly improved the alignment length by a factor of 3 to 8 compared with assemblies using only one of the data types. The hybrid assemblies also reproduced more of the gene cluster regions encoding secondary metabolite biosynthesis (SMB). These results imply that the MiSeq data, with their long read length, are essential for constructing accurate nucleotide sequences, while the SOLiD mate-paired reads, with their long insertion length, improve the long-range arrangement of the sequences. The pipeline was also tested on the actinomycete Streptomyces avermitilis MA-4680, whose genome is known to have a high GC content. Although the quality of the SOLiD reads was too low to produce any meaningful assembly by themselves, the hybrid assembly improved the alignment length to the reference by a factor of 2 compared with the assembly using only the MiSeq data.

    Dotplot alignments of assembled strands against the reference genome sequence of A. oryzae.

    Alignments shorter than 4000 bp were omitted from the plots. Forward and reverse alignments are plotted in red and blue, respectively. The Roman numerals I-VIII on the abscissa are the chromosome indices of A. oryzae. (a) The MSSH assembly, (b) the denovo2 assembly.
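The length cut-off and the colour convention in this caption can be sketched as a small filter. This is only an illustration: the `Alignment` record, its field names, and the rule that a reverse alignment has `query_end < query_start` are assumptions, not the format used by the authors' pipeline.

```python
from dataclasses import dataclass

MIN_ALIGNMENT_LENGTH = 4000  # bp, the threshold stated in the caption


@dataclass
class Alignment:
    """One alignment of an assembled strand to the reference (hypothetical record)."""
    ref_start: int
    ref_end: int
    query_start: int
    query_end: int  # assumed convention: smaller than query_start on the reverse strand

    @property
    def length(self) -> int:
        # Length measured on the reference, 1-based inclusive coordinates assumed.
        return abs(self.ref_end - self.ref_start) + 1

    @property
    def is_forward(self) -> bool:
        return self.query_end >= self.query_start


def select_for_dotplot(alignments, min_length=MIN_ALIGNMENT_LENGTH):
    """Drop alignments shorter than min_length and pair the rest with a plot colour."""
    selected = []
    for aln in alignments:
        if aln.length < min_length:
            continue  # "alignments shorter than 4000 bp were omitted"
        colour = "red" if aln.is_forward else "blue"
        selected.append((aln, colour))
    return selected
```

An alignment of exactly 4000 bp is kept here, since the caption only excludes alignments shorter than the threshold.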

    Characteristics of the A. oryzae [a] contigs/scaffolds/strands from several assemblies [b].

    [a] S. avermitilis results are omitted due to the erroneous SOLiD reads. [b] k-mer size is fixed at 45.

    Dotplot alignments of assembled strands against the reference genome sequence of S. avermitilis.

    Alignments shorter than 4000 bp were omitted from the plots. Forward and reverse alignments are plotted in red and blue, respectively. (a) The MSSH assembly, (b) the HHHH assembly.

    K-mer size dependence of the R50 values [a] of the A. oryzae [b] unitigs.

    [a] In kbp. [b] S. avermitilis results are omitted due to the erroneous SOLiD reads.

    Number of ORFs reproduced in the assemblies.

    [a] Number of ORFs aligned with e-values lower than 10^-100. [b] Total number of ORFs.
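The two table columns described in these footnotes amount to a thresholded count. A minimal sketch follows, assuming the best e-value per ORF has already been collected into a dictionary. The function name and input layout are hypothetical, and the source does not say which aligner produced the e-values.

```python
E_VALUE_CUTOFF = 1e-100  # threshold stated in footnote [a]


def count_reproduced_orfs(best_evalues, total_orfs, cutoff=E_VALUE_CUTOFF):
    """Count ORFs whose best alignment e-value is strictly below the cutoff.

    best_evalues: dict mapping ORF id -> best e-value (ORFs without any hit are absent).
    total_orfs:   total number of ORFs in the reference annotation (footnote [b]).
    Returns (number reproduced, fraction of all ORFs).
    """
    if total_orfs <= 0:
        raise ValueError("total_orfs must be positive")
    reproduced = sum(1 for evalue in best_evalues.values() if evalue < cutoff)
    return reproduced, reproduced / total_orfs
```

An e-value reported as 0.0 counts as reproduced, while a value exactly equal to the cutoff does not, because the footnote says "lower than".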

    Overview of the hybrid assembly pipeline.

    Raw data generated by several NGS platforms are preprocessed into a common format and registered in the library. The data set used at each assembly stage can be specified separately. The assembly results are denoted according to the supplied data, as illustrated at the bottom of the figure.
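One possible reading of the naming scheme, given labels such as MSSH and HHHH elsewhere in this listing, is one letter per assembly stage. The sketch below rests entirely on assumptions: that there are four stages in the order unitig, contig, scaffold, strand, and that M, S, and H stand for MiSeq only, SOLiD only, and both (hybrid). The figure itself is not available here to confirm any of this.

```python
# Assumed stage order; the source only says the pipeline has "several stages".
STAGES = ("unitig", "contig", "scaffold", "strand")

# Assumed letter codes: M = MiSeq only, S = SOLiD only, H = both (hybrid).
CODES = {
    frozenset({"MiSeq"}): "M",
    frozenset({"SOLiD"}): "S",
    frozenset({"MiSeq", "SOLiD"}): "H",
}


def assembly_label(stage_inputs):
    """Build a label such as 'MSSH' from the data sets supplied at each stage.

    stage_inputs: dict mapping stage name -> iterable of data set names.
    Raises KeyError for a missing stage or an unrecognised data set combination.
    """
    letters = []
    for stage in STAGES:
        datasets = frozenset(stage_inputs[stage])
        letters.append(CODES[datasets])
    return "".join(letters)
```

Under these assumptions, supplying both data sets at every stage yields `HHHH`.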

    Statistical information of input short read data.

    [a] Insertion length is estimated by mapping paired reads onto the assembled results. [b] Total length of the reference sequences [26, 29]. [c] SOLiD data are down-sampled at N_lowQ = 25 (qv25) and 6 (qv6).
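Footnote [a] can be illustrated with a rough estimator that takes the outer span of each read pair mapped to the same assembled sequence. The tuple layout is hypothetical, and the median is an assumed summary statistic; the source does not state which statistic or mapper the authors used.

```python
from statistics import median


def estimate_insert_length(mapped_pairs):
    """Estimate the insertion length from read pairs mapped onto an assembly.

    mapped_pairs: iterable of (seq_id_1, start_1, end_1, seq_id_2, start_2, end_2)
                  with 1-based inclusive coordinates (hypothetical layout).
    Pairs whose mates map to different sequences are skipped, because their
    distance is not defined. Returns the median outer span in bp.
    """
    spans = []
    for seq1, start1, end1, seq2, start2, end2 in mapped_pairs:
        if seq1 != seq2:
            continue
        spans.append(max(end1, end2) - min(start1, start2) + 1)
    if not spans:
        raise ValueError("no read pair maps to a single sequence")
    return median(spans)
```

The median is chosen here because it is less sensitive than the mean to the occasional mis-mapped pair.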