A pilot study for channel catfish whole genome sequencing and de novo assembly
<p>Abstract</p> <p>Background</p> <p>Recent advances in next-generation sequencing technologies have drastically increased throughput and significantly reduced sequencing costs. However, the average read lengths of next-generation sequencing technologies are short compared with those of traditional Sanger sequencing. These short reads pose great challenges for <i>de novo</i> sequence assembly. As a pilot project for whole genome sequencing of the catfish genome, we attempted to determine the proper sequence coverage, the proper assembly software, and the assembly parameters for a BAC physical map contig spanning approximately one million base pairs.</p> <p>Results</p> <p>A combination of low-coverage 454 sequencing and Illumina sequencing appeared to provide effective assembly, as reflected by a high N50 value. Using 454 sequencing alone, a sequencing depth of 18X was sufficient to obtain a good-quality assembly, whereas with Illumina sequencing alone, 70X appeared to be sufficient. Additional coverage beyond 18X of 454 or 70X of Illumina sequencing did not significantly improve the assembly. Considering the cost of sequencing, 2X 454 sequencing coupled with 70X Illumina sequencing provided an assembly of reasonably good quality. Of the software packages tested, Newbler with a seed length of 16 and ABySS with a K-value of 60 appeared to be appropriate for assembling 454 reads alone and Illumina paired-end reads alone, respectively.
Using both 454 and Illumina paired-end reads, a hybrid strategy (Newbler for the initial 454 assembly, Velvet for the initial Illumina assembly, followed by a second-step assembly with MIRA) provided the best assembly of the physical map contig, resulting in 193 contigs with an N50 value of 13,123 bp.</p> <p>Conclusions</p> <p>A hybrid sequencing strategy using low-depth 454 and high-depth Illumina sequencing provided a good-quality assembly with a high N50 value at relatively low cost. A combination of Newbler, Velvet, and MIRA can be used to assemble the 454 and Illumina reads effectively. The assembled sequence can serve as a resource for comparative genome analysis. Additional long reads from third-generation sequencing platforms are needed to sequence through repetitive genome regions, which should further enhance the assembly.</p>
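The abstract above uses the N50 value as its measure of assembly quality. As a minimal sketch of how that statistic is commonly computed (the function name and the example lengths are illustrative assumptions, not taken from the study): sort the contig lengths from longest to shortest and report the length of the contig at which the running total first reaches half of the total assembled bases.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths (in bp)."""
    if not lengths:
        raise ValueError("need at least one contig length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first contig that pushes the running sum to at least half
        # of the total assembly size defines the N50.
        if running >= half_total:
            return length


# Hypothetical contig lengths, for illustration only.
example_lengths = [100, 200, 300, 400, 500]
print(n50(example_lengths))
```

A higher N50 means that more of the assembly sits in long contigs, which is why the abstract reports it alongside the contig count.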
Comprehensive Transcriptome Study to Develop Molecular Resources of the Copepod Calanus sinicus for Their Potential Ecological Applications
Calanus sinicus Brodsky (Copepoda, Crustacea) is a dominant zooplanktonic species widely distributed in the marginal seas of the Northwest Pacific Ocean. In this study, we used an RNA-Seq-based approach to develop molecular resources for C. sinicus. Adult samples were sequenced on the Illumina HiSeq 2000 platform. Assembly of 58.9 million filtered reads generated 69,751 contigs with an average length of 928.8 bp. Gene annotation identified 43,417 unigene hits against the NCBI database. Gene ontology (GO) and KEGG pathway mapping analysis revealed various functional genes related to diverse biological functions and processes. Transcripts potentially involved in stress response and lipid metabolism were identified among these genes. Furthermore, 4,871 microsatellites and 110,137 single nucleotide polymorphisms (SNPs) were identified in the C. sinicus transcriptome sequences. SNP validation by the melting temperature (Tm)-shift method suggested that 16 primer pairs amplified target products and showed biallelic polymorphism among 30 individuals. The present work demonstrates the power of Illumina-based RNA-Seq for the rapid development of molecular resources in nonmodel species. The validated SNP set from our study is currently being used in an ongoing ecological analysis to support a future study of C. sinicus population genetics.
The association between different dimensions of social capital and cognition among older adults in China
Background: Social capital is a multidimensional concept including social trust, social support, social participation, and reciprocity, and each dimension may be associated differently with the cognition of older adults. Existing research on social capital and cognition rarely constructs a comprehensive social capital framework or compares the associations between cognition and multiple dimensions of social capital. Objective: To determine whether four social capital domains have different associations with cognition among older adults in China. Methods: Baseline and four-year follow-up data (N = 6291) from community-dwelling participants aged ≥55 years at baseline from CHARLS were used. Generalized linear regression was conducted to assess the association between four dimensions of social capital at baseline and cognition at four-year follow-up in the full sample, controlling for baseline cognitive scores. Results: Both financial support (β = 0.177, p = 0.013) and reciprocity (β = 0.280, p = 0.022) at baseline were associated with better executive function four years later, and social participation (β = 0.123, p = 0.004) at baseline was associated with better episodic memory after four years. Age, gender, education and hukou modified the association between social capital and cognition (p < 0.05). Conclusion: Financial support, reciprocity and social participation were associated with cognition among Chinese older adults to different extents.
Development of Molecular Resources for an Intertidal Clam, <i>Sinonovacula constricta</i>, Using 454 Transcriptome Sequencing
<div><p>Background</p><p>The razor clam <i>Sinonovacula constricta</i> is a benthic intertidal bivalve species with important commercial value. Despite its economic importance, knowledge of its transcriptome is scarce. Next-generation sequencing technologies offer rapid and efficient tools for generating large numbers of sequences, which can be used to characterize the transcriptome, to develop effective molecular markers and to identify genes associated with growth, a key breeding trait.</p><p>Results</p><p>Total RNA was isolated from the mantle, gill, liver, siphon, gonad and muscular foot tissues. High-throughput deep sequencing of <i>S. constricta</i> using 454 pyrosequencing technology yielded 859,313 high-quality reads with an average read length of 489 bp. Clustering and assembly of these reads produced 16,323 contigs and 131,346 singletons with average lengths of 1,376 bp and 458 bp, respectively. Based on transcriptome sequencing, 14,615 sequences had significant matches with known genes encoding 147,669 predicted proteins. Subsequently, previously unknown growth-related genes were identified. A total of 13,563 microsatellites (SSRs) and 13,634 high-confidence single nucleotide polymorphism loci (SNPs) were discovered, of which almost half were validated.</p><p>Conclusion</p><p><i>De novo</i> sequencing of the razor clam <i>S. constricta</i> transcriptome on the 454 GS FLX platform generated a large number of ESTs. Candidate growth factors and a large number of SSRs and SNPs were identified. These results will impact genetic studies of <i>S. constricta</i>.</p></div>
Species matched to the annotated sequences of <i>S. constricta</i> by BLASTx.
Distribution of simple sequence repeats (SSR) and other nucleotide repeats in the transcriptome.
<p>(A) Distribution of five nucleotide repeat types (di-, tri-, tetra-, penta-, and hexa-nucleotide repeats). (B) Distribution of tri-nucleotide repeats. (C) Distribution of di-nucleotide repeats. SSRs were required to have at least six repeat units for di-nucleotide motifs and at least five repeat units for tri-, tetra-, penta-, and hexa-nucleotide motifs.</p>
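The repeat thresholds in the caption above (at least six units for di-nucleotide motifs, at least five for tri- to hexa-nucleotide motifs) can be illustrated with a small regular-expression scan. This is only a sketch under stated assumptions: the function name `find_ssrs`, the restriction to perfect repeats over an A/C/G/T alphabet, and the example sequence are all hypothetical, and it is not the tool used in the study.

```python
import re

# Minimum number of repeat units per motif length, following the caption:
# six for di-nucleotide motifs, five for tri- to hexa-nucleotide motifs.
MIN_REPEATS = {2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def find_ssrs(sequence):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs."""
    sequence = sequence.upper()
    hits = []
    for unit_len, min_rep in MIN_REPEATS.items():
        # One captured motif followed by at least (min_rep - 1) further copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_rep - 1))
        for match in pattern.finditer(sequence):
            motif = match.group(1)
            # Skip motifs that are themselves repeats of a shorter unit
            # (for example "AA" or "ATAT"), so each SSR is reported once.
            if any(
                motif == motif[:k] * (unit_len // k)
                for k in range(1, unit_len)
                if unit_len % k == 0
            ):
                continue
            hits.append((match.start(), motif, len(match.group(0)) // unit_len))
    return sorted(hits)


# Hypothetical sequence: seven AT units and five CAG units.
example = "GG" + "AT" * 7 + "CC" + "CAG" * 5 + "TT"
print(find_ssrs(example))
```

Mononucleotide runs are deliberately not reported here, since the caption lists only di- to hexa-nucleotide repeat types.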
Length distribution of total reads and contigs from the <i>S. constricta</i> transcriptome.