3 research outputs found
A Reference-Free Algorithm for Computational Normalization of Shotgun Sequencing Data
Deep shotgun sequencing and analysis of genomes, transcriptomes, amplified
single-cell genomes, and metagenomes has enabled investigation of a wide range
of organisms and ecosystems. However, sampling variation in short-read data
sets and high sequencing error rates of modern sequencers present many new
computational challenges in data interpretation. These challenges have led to
the development of new classes of mapping tools and {\em de novo} assemblers.
These algorithms are challenged by the continued improvement in sequencing
throughput. We here describe digital normalization, a single-pass computational
algorithm that systematizes coverage in shotgun sequencing data sets, thereby
decreasing sampling variation, discarding redundant data, and removing the
majority of errors. Digital normalization substantially reduces the size of
shotgun data sets and decreases the memory and time requirements for {\em de
novo} sequence assembly, all without significantly impacting content of the
generated contigs. We apply digital normalization to the assembly of microbial
genomic data, amplified single-cell genomic data, and transcriptomic data. Our
implementation is freely available for use and modification
A New Chicken Genome Assembly Provides Insight into Avian Genome Structure
The importance of the Gallus gallus (chicken) as a model organism and agricultural animal merits a continuation of sequence assembly improvement efforts. We present a new version of the chicken genome assembly (Gallus_gallus-5.0; GCA_000002315.3), built from combined long single molecule sequencing technology, finished BACs, and improved physical maps. In overall assembled bases, we see a gain of 183 Mb, including 16.4 Mb in placed chromosomes with a corresponding gain in the percentage of intact repeat elements characterized. Of the 1.21 Gb genome, we include three previously missing autosomes, GGA30, 31, and 33, and improve sequence contig length 10-fold over the previous Gallus_gallus-4.0. Despite the significant base representation improvements made, 138 Mb of sequence is not yet located to chromosomes. When annotated for gene content, Gallus_gallus-5.0 shows an increase of 4679 annotated genes (2768 noncoding and 1911 protein-coding) over those in Gallus_gallus-4.0. We also revisited the question of what genes are missing in the avian lineage, as assessed by the highest quality avian genome assembly to date, and found that a large fraction of the original set of missing genes are still absent in sequenced bird species. Finally, our new data support a detailed map of MHC-B, encompassing two segments: one with a highly stable gene copy number and another in which the gene copy number is highly variable. The chicken model has been a critical resource for many other fields of study, and this new reference assembly will substantially further these efforts
Recommended from our members
A New Chicken Genome Assembly Provides Insight into Avian Genome Structure
The importance of the Gallus gallus (chicken) as a model organism and agricultural animal merits a continuation of sequence assembly improvement efforts. We present a new version of the chicken genome assembly (Gallus_gallus-5.0; GCA_000002315.3), built from combined long single molecule sequencing technology, finished BACs, and improved physical maps. In overall assembled bases, we see a gain of 183 Mb, including 16.4 Mb in placed chromosomes with a corresponding gain in the percentage of intact repeat elements characterized. Of the 1.21 Gb genome, we include three previously missing autosomes, GGA30, 31, and 33, and improve sequence contig length 10-fold over the previous Gallus_gallus-4.0. Despite the significant base representation improvements made, 138 Mb of sequence is not yet located to chromosomes. When annotated for gene content, Gallus_gallus-5.0 shows an increase of 4679 annotated genes (2768 noncoding and 1911 protein-coding) over those in Gallus_gallus-4.0. We also revisited the question of what genes are missing in the avian lineage, as assessed by the highest quality avian genome assembly to date, and found that a large fraction of the original set of missing genes are still absent in sequenced bird species. Finally, our new data support a detailed map of MHC-B, encompassing two segments: one with a highly stable gene copy number and another in which the gene copy number is highly variable. The chicken model has been a critical resource for many other fields of study, and this new reference assembly will substantially further these efforts