Scatter plots of GC content and read coverage of real Illumina data.

Author: Chi-Chuan Hwang (408053)
Chun-Hui Yu (408052)
Tsunglin Liu (33873)
Tzen-Yuh Chiang (138945)
Yen-Chun Chen (408051)

The data sets are from the S. aureus USA300 (A) and S. aureus MRSA252 (B) genomes. Read coverage is normalized to its mean value, which is marked by a horizontal dashed line; a vertical dashed line marks the mean GC content. The data points are fitted with a straight line, and its slope is defined as the degree of GC bias. The two cases represent a negative and a positive GC bias, respectively.
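
The caption defines the degree of GC bias as the slope of a straight line fitted to mean-normalized read coverage versus GC content. A minimal Python sketch of that calculation follows. It assumes an ordinary least-squares fit, GC content expressed as a fraction between 0 and 1, and coverage already summarized per genomic window; the windowing scheme and the helper names `gc_fraction` and `gc_bias_slope` are illustrative assumptions, not the authors' code.

```python
from typing import Sequence


def gc_fraction(seq: str) -> float:
    """Fraction of G/C among A/C/G/T bases in a window (other symbols such as N are ignored)."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        raise ValueError("window contains no A/C/G/T bases")
    return (seq.count("G") + seq.count("C")) / acgt


def gc_bias_slope(gc: Sequence[float], coverage: Sequence[float]) -> float:
    """Least-squares slope of mean-normalized coverage versus GC content (assumed definition)."""
    if len(gc) != len(coverage) or len(gc) < 2:
        raise ValueError("need at least two paired windows")
    mean_cov = sum(coverage) / len(coverage)
    norm = [c / mean_cov for c in coverage]  # normalized so that the mean coverage is 1
    mx = sum(gc) / len(gc)
    my = sum(norm) / len(norm)
    sxy = sum((x - mx) * (y - my) for x, y in zip(gc, norm))
    sxx = sum((x - mx) ** 2 for x in gc)
    if sxx == 0:
        raise ValueError("GC content does not vary across windows")
    return sxy / sxx


# Hypothetical toy windows and coverages, not data from the study.
windows = ["ATATATAT", "ATGCATGC", "GCGCGCGC"]
gc = [gc_fraction(w) for w in windows]     # [0.0, 0.5, 1.0]
print(gc_bias_slope(gc, [10, 20, 30]))     # 1.0: coverage rises with GC (positive bias)
```

With these toy values, reversing the coverage list gives a slope of -1.0, i.e. a negative GC bias, and flat coverage gives 0.0.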

The Francis Crick Institute

Correlation between the degree of GC bias obtained using reference sequences and assembled contigs.

Author: Chi-Chuan Hwang (408053)
Chun-Hui Yu (408052)
Tsunglin Liu (33873)
Tzen-Yuh Chiang (138945)
Yen-Chun Chen (408051)

The correlation is calculated for thirteen Illumina data sets, comprising eight data sets assembled by Edena, four by Velvet, and one by ABySS. The high R² value (0.88) indicates that estimating the degree of GC bias from the assembled contigs is appropriate.
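
The reported R² summarizes how closely the contig-based estimates of GC bias track the reference-based ones. Assuming it is the R² of an ordinary least-squares line with an intercept (for which R² equals the squared Pearson correlation), it can be computed as in the sketch below; `r_squared` is a hypothetical helper, and the input values are made-up toy numbers, not the thirteen data sets from the study.

```python
from typing import Sequence


def r_squared(xs: Sequence[float], ys: Sequence[float]) -> float:
    """R^2 of a least-squares line with intercept, i.e. the squared Pearson correlation."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired values")
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("both variables must vary")
    return sxy * sxy / (sxx * syy)


# Hypothetical toy slopes: estimated from reference sequences vs. from assembled contigs.
ref_slopes = [1.0, 2.0, 3.0]
contig_slopes = [1.0, 3.0, 2.0]
print(r_squared(ref_slopes, contig_slopes))  # 0.25 for these toy values
```

Perfectly proportional estimates (for example `[1, 2, 3]` against `[2, 4, 6]`) give an R² of 1.0, so values near 1 indicate that the two estimates agree closely up to a linear relationship.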

Distributions of coverage depths at all bases and at error bases.

Author: Chi-Chuan Hwang (408053)
Chun-Hui Yu (408052)
Tsunglin Liu (33873)
Tzen-Yuh Chiang (138945)
Yen-Chun Chen (408051)

Distributions of coverage depths at error bases (red curves) are compared with those at all bases (blue curves) in the Velvet assemblies of three bacterial genomes: E. coli (A), S. aureus (B), and M. tuberculosis (C), using data simulated at a strong negative (left column), zero (middle column), and strong positive (right column) GC bias.

Effects of GC Bias in Next-Generation-Sequencing Data on De Novo Genome Assembly

Author: Chi-Chuan Hwang (408053)
Chun-Hui Yu (408052)
Tsunglin Liu (33873)
Tzen-Yuh Chiang (138945)
Yen-Chun Chen (408051)
Publication date: 01/01/2013

Next-generation sequencing (NGS) has revolutionized the field of genome assembly because of its much higher data throughput and much lower cost compared with traditional Sanger sequencing. However, NGS poses new computational challenges to de novo genome assembly. Among these challenges, GC bias in NGS data is known to complicate genome assembly, but it is not clear to what extent GC bias affects genome assembly in general. In this work, we conduct a systematic analysis of the effects of GC bias on genome assembly. Our analyses reveal that GC bias lowers assembly completeness only when the degree of GC bias is above a threshold. At a strong GC bias, the assembly fragmentation caused by GC bias can be explained by the low coverage of reads in the GC-poor or GC-rich regions of a genome. This effect is observed for all the assemblers under study. Increasing the total amount of NGS data thus rescues the assembly fragmentation caused by GC bias. However, the amount of data needed for a full rescue depends on the distribution of GC contents. Both low and high coverage depths due to GC bias lower the accuracy of assembly. These findings provide guidance toward better de novo genome assembly in the presence of GC bias.

CiteSeerX

Directory of Open Access Journals

PubMed Central

Completeness of the E. coli assemblies using data of various coverages.

Author: Chi-Chuan Hwang (408053)
Chun-Hui Yu (408052)
Tsunglin Liu (33873)
Tzen-Yuh Chiang (138945)
Yen-Chun Chen (408051)

Assembly completeness is measured by the N50 length of the corrected contigs produced by eight assemblers from simulated reads at various coverages (50X, 100X, 250X, 500X, 1000X, and 2000X), at zero GC bias (blue line) and at a strong positive GC bias (slope 3.6, pink line).
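
N50 is the contig length at which the longest contigs, taken in decreasing order of length, first account for at least half of the total assembly length. A minimal sketch follows; it assumes the input lengths are already those of the corrected contigs (the correction step that produces them is not reproduced here) and that the 50% threshold is taken relative to the total assembly length rather than the reference genome size. `n50` is a hypothetical helper, not part of any assembler.

```python
from typing import Iterable


def n50(contig_lengths: Iterable[int]) -> int:
    """Length L such that contigs of length >= L cover at least half of the total assembly length."""
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("no contigs")
    total = sum(lengths)
    running = 0
    for length in lengths:
        running += length
        if 2 * running >= total:  # integer comparison avoids rounding issues at exactly 50%
            return length
    raise AssertionError("unreachable: running reaches total on the last contig")


# Hypothetical toy contig lengths.
print(n50([100, 80, 60, 40, 20]))  # 80, because 100 + 80 = 180 >= 300 / 2
```

Because N50 depends only on how assembly length is distributed among contigs, a fragmented assembly (many short contigs in low-coverage GC-poor or GC-rich regions) yields a smaller N50 even when the total assembled length is similar.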

Number of “major” errors in the assemblies at a strong negative, zero, and strong positive GC bias by the eight assemblers for the three bacteria.

Author: Chi-Chuan Hwang (408053)
Chun-Hui Yu (408052)
Tsunglin Liu (33873)
Tzen-Yuh Chiang (138945)
Yen-Chun Chen (408051)

Scatter plots of GC content and read coverage of data simulated with various degrees of background fluctuations.

Author: Chi-Chuan Hwang (408053)
Chun-Hui Yu (408052)
Tsunglin Liu (33873)
Tzen-Yuh Chiang (138945)
Yen-Chun Chen (408051)

The data are simulated from the E. coli genome at three degrees of background fluctuation: zero (top row), 10 (middle row), and 20 (bottom row). At each degree of background fluctuation, we simulated paired-end (PE) reads at a strong negative (A), zero (B), and strong positive (C) GC bias.

Ratio of corrected N50 length at a strong GC bias to that at no GC bias.

Author: Chi-Chuan Hwang (408053)
Chun-Hui Yu (408052)
Tsunglin Liu (33873)
Tzen-Yuh Chiang (138945)
Yen-Chun Chen (408051)

Ratio of the corrected N50 length at a strong negative GC bias (A) and at a strong positive GC bias (B) to that at no GC bias, when assembling the data of five species (shown in different colors) with eight assemblers.

Corrected N50 length of assemblies at three background fluctuations.

Author: Chi-Chuan Hwang (408053)
Chun-Hui Yu (408052)
Tsunglin Liu (33873)
Tzen-Yuh Chiang (138945)
Yen-Chun Chen (408051)

We show the corrected N50 length of the assemblies produced by eight assemblers for three bacterial genomes: E. coli (A), S. aureus (B), and M. tuberculosis (C), using data simulated at three degrees of background fluctuation (x-axis), each at three degrees of GC bias: negative (yellow), zero (dark blue), and positive (pink).

Correlation between the degree of GC bias and two statistics of GC contents.

Author: Chi-Chuan Hwang (408053)
Chun-Hui Yu (408052)
Tsunglin Liu (33873)
Tzen-Yuh Chiang (138945)
Yen-Chun Chen (408051)

No correlation can be observed between the degree of GC bias (y-axis) and either the mean GC content (A) or the standard deviation of GC contents (B).
