30 research outputs found

    Assessing the feasibility of GS FLX Pyrosequencing for sequencing the Atlantic salmon genome

    Get PDF
    <p>Abstract</p> <p>Background</p> <p>With a whole genome duplication event and wealth of biological data, salmonids are excellent model organisms for studying evolutionary processes, fates of duplicated genes and genetic and physiological processes associated with complex behavioral phenotypes. It is surprising therefore, that no salmonid genome has been sequenced. Atlantic salmon (<it>Salmo salar</it>) is a good representative salmonid for sequencing given its importance in aquaculture and the genomic resources available. However, the size and complexity of the genome combined with the lack of a sequenced reference genome from a closely related fish makes assembly challenging. Given the cost and time limitations of Sanger sequencing as well as recent improvements to next generation sequencing technologies, we examined the feasibility of using the Genome Sequencer (GS) FLX pyrosequencing system to obtain the sequence of a salmonid genome. Eight pooled BACs belonging to a minimum tiling path covering ~1 Mb of the Atlantic salmon genome were sequenced by GS FLX shotgun and Long Paired End sequencing and compared with a ninth BAC sequenced by Sanger sequencing of a shotgun library.</p> <p>Results</p> <p>An initial assembly using only GS FLX shotgun sequences (average read length 248.5 bp) with ~30× coverage allowed gene identification, but was incomplete even when 126 Sanger-generated BAC-end sequences (~0.09× coverage) were incorporated. The addition of paired end sequencing reads (additional ~26× coverage) produced a final assembly comprising 175 contigs assembled into four scaffolds with 171 gaps. Sanger sequencing of the ninth BAC (~10.5× coverage) produced nine contigs and two scaffolds. The number of scaffolds produced by the GS FLX assembly was comparable to Sanger-generated sequencing; however, the number of gaps was much higher in the GS FLX assembly.</p> <p>Conclusion</p> <p>These results represent the first use of GS FLX paired end reads for <it>de novo </it>sequence assembly. Our data demonstrated that this improved the GS FLX assemblies; however, with respect to <it>de novo </it>sequencing of complex genomes, the GS FLX technology is limited to gene mining and establishing a set of ordered sequence contigs. Currently, for a salmonid reference sequence, it appears that a substantial portion of sequencing should be done using Sanger technology.</p

    The Cassava Genome: Current Progress, Future Directions

    Get PDF
    The starchy swollen roots of cassava provide an essential food source for nearly a billion people, as well as possibilities for bioenergy, yet improvements to nutritional content and resistance to threatening diseases are currently impeded. A 454-based whole genome shotgun sequence has been assembled, which covers 69% of the predicted genome size and 96% of protein-coding gene space, with genome finishing underway. The predicted 30,666 genes and 3,485 alternate splice forms are supported by 1.4 M expressed sequence tags (ESTs). Maps based on simple sequence repeat (SSR)-, and EST-derived single nucleotide polymorphisms (SNPs) already exist. Thanks to the genome sequence, a high-density linkage map is currently being developed from a cross between two diverse cassava cultivars: one susceptible to cassava brown streak disease; the other resistant. An efficient genotyping-by-sequencing (GBS) approach is being developed to catalog SNPs both within the mapping population and among diverse African farmer-preferred varieties of cassava. These resources will accelerate marker-assisted breeding programs, allowing improvements in disease-resistance and nutrition, and will help us understand the genetic basis for disease resistance

    Comprehensive resequence analysis of a 136 kb region of human chromosome 8q24 associated with prostate and colon cancers

    Get PDF
    Recently, genome-wide association studies have identified loci across a segment of chromosome 8q24 (128,100,000–128,700,000) associated with the risk of breast, colon and prostate cancers. At least three regions of 8q24 have been independently associated with prostate cancer risk; the most centromeric of which appears to be population specific. Haplotypes in two contiguous but independent loci, marked by rs6983267 and rs1447295, have been identified in the Cancer Genetic Markers of Susceptibility project (http://cgems.cancer.gov), which genotyped more than 5,000 prostate cancer cases and 5,000 controls of European origin. The rs6983267 locus is also strongly associated with colorectal cancer. To ascertain a comprehensive catalog of common single-nucleotide polymorphisms (SNPs) across the two regions, we conducted a resequence analysis of 136 kb (chr8: 128,473,000–128,609,802) using the Roche/454 next-generation sequencing technology in 39 prostate cancer cases and 40 controls of European origin. We have characterized a comprehensive catalog of common (MAF > 1%) SNPs within this region, including 442 novel SNPs and have determined the pattern of linkage disequilibrium across the region. Our study has generated a detailed map of genetic variation across the region, which should be useful for choosing SNPs for fine mapping of association signals in 8q24 and investigations of the functional consequences of select common variants

    High-throughput 454 resequencing for allele discovery and recombination mapping in Plasmodium falciparum

    Get PDF
    <p>Abstract</p> <p>Background</p> <p>Knowledge of the origins, distribution, and inheritance of variation in the malaria parasite (<it>Plasmodium falciparum</it>) genome is crucial for understanding its evolution; however the 81% (A+T) genome poses challenges to high-throughput sequencing technologies. We explore the viability of the Roche 454 Genome Sequencer FLX (GS FLX) high throughput sequencing technology for both whole genome sequencing and fine-resolution characterization of genetic exchange in malaria parasites.</p> <p>Results</p> <p>We present a scheme to survey recombination in the haploid stage genomes of two sibling parasite clones, using whole genome pyrosequencing that includes a sliding window approach to predict recombination breakpoints. Whole genome shotgun (WGS) sequencing generated approximately 2 million reads, with an average read length of approximately 300 bp. <it>De novo </it>assembly using a combination of WGS and 3 kb paired end libraries resulted in contigs ≤ 34 kb. More than 8,000 of the 24,599 SNP markers identified between parents were genotyped in the progeny, resulting in a marker density of approximately 1 marker/3.3 kb and allowing for the detection of previously unrecognized crossovers (COs) and many non crossover (NCO) gene conversions throughout the genome.</p> <p>Conclusions</p> <p>By sequencing the 23 Mb genomes of two haploid progeny clones derived from a genetic cross at more than 30× coverage, we captured high resolution information on COs, NCOs and genetic variation within the progeny genomes. This study is the first to resequence progeny clones to examine fine structure of COs and NCOs in malaria parasites.</p

    Sequencing of diverse mandarin, pummelo and orange genomes reveals complex history of admixture during citrus domestication

    Get PDF
    Cultivated citrus are selections from, or hybrids of, wild progenitor species whose identities and contributions to citrus domestication remain controversial. Here we sequence and compare citrus genomes-a high-quality reference haploid clementine genome and mandarin, pummelo, sweet-orange and sour-orange genomes-and show that cultivated types derive from two progenitor species. Although cultivated pummelos represent selections from one progenitor species, Citrus maxima, cultivated mandarins are introgressions of C. maxima into the ancestral mandarin species Citrus reticulata. The most widely cultivated citrus, sweet orange, is the offspring of previously admixed individuals, but sour orange is an F1 hybrid of pure C. maxima and C. reticulata parents, thus implying that wild mandarins were part of the early breeding germplasm. A Chinese wild 'mandarin' diverges substantially from C. reticulata, thus suggesting the possibility of other unrecognized wild citrus species. Understanding citrus phylogeny through genome analysis clarifies taxonomic relationships and facilitates sequence-directed genetic improvement

    Sequencing of diverse mandarin, pummelo and orange genomes reveals complex history of admixture during citrus domestication

    Get PDF
    Cultivated citrus are selections from, or hybrids of, wild progenitor species whose identities and contributions to citrus domestication remain controversial. Here we sequence and compare citrus genomes-a high-quality reference haploid clementine genome and mandarin, pummelo, sweet-orange and sour-orange genomes-and show that cultivated types derive from two progenitor species. Although cultivated pummelos represent selections from one progenitor species, Citrus maxima, cultivated mandarins are introgressions of C. maxima into the ancestral mandarin species Citrus reticulata. The most widely cultivated citrus, sweet orange, is the offspring of previously admixed individuals, but sour orange is an F1 hybrid of pure C. maxima and C. reticulata parents, thus implying that wild mandarins were part of the early breeding germplasm. A Chinese wild 'mandarin' diverges substantially from C. reticulata, thus suggesting the possibility of other unrecognized wild citrus species. Understanding citrus phylogeny through genome analysis clarifies taxonomic relationships and facilitates sequence-directed genetic improvement. (Résumé d'auteur

    Comparative and Functional Genomics of Rhodococcus opacus PD630 for Biofuels Development

    Get PDF
    The Actinomycetales bacteria Rhodococcus opacus PD630 and Rhodococcus jostii RHA1 bioconvert a diverse range of organic substrates through lipid biosynthesis into large quantities of energy-rich triacylglycerols (TAGs). To describe the genetic basis of the Rhodococcus oleaginous metabolism, we sequenced and performed comparative analysis of the 9.27 Mb R. opacus PD630 genome. Metabolic-reconstruction assigned 2017 enzymatic reactions to the 8632 R. opacus PD630 genes we identified. Of these, 261 genes were implicated in the R. opacus PD630 TAGs cycle by metabolic reconstruction and gene family analysis. Rhodococcus synthesizes uncommon straight-chain odd-carbon fatty acids in high abundance and stores them as TAGs. We have identified these to be pentadecanoic, heptadecanoic, and cis-heptadecenoic acids. To identify bioconversion pathways, we screened R. opacus PD630, R. jostii RHA1, Ralstonia eutropha H16, and C. glutamicum 13032 for growth on 190 compounds. The results of the catabolic screen, phylogenetic analysis of the TAGs cycle enzymes, and metabolic product characterizations were integrated into a working model of prokaryotic oleaginy.Cambridge-MIT InstituteMassachusetts Institute of Technology. (Seed Grant program)Shell Oil CompanyNational Institute of Allergy and Infectious Diseases (U.S.)United States. National Institutes of HealthNational Institutes of Health. Department of Health and Human Services (Contract No. HHSN272200900006C

    Global mRNA decay analysis at single nucleotide resolution reveals segmental and positional degradation patterns in a Gram-positive bacterium

    Get PDF
    Background Recent years have shown a marked increase in the use of next-generation sequencing technologies for quantification of gene expression (RNA sequencing, RNA-Seq). The expression level of a gene is a function of both its rate of transcription and RNA decay, and the influence of mRNA decay rates on gene expression in genome-wide studies of Gram-positive bacteria is under-investigated. Results In this work, we employed RNA-Seq in a genome-wide determination of mRNA half-lives in the Gram-positive bacterium Bacillus cereus. By utilizing a newly developed normalization protocol, RNA-Seq was used successfully to determine global mRNA decay rates at the single nucleotide level. The analysis revealed positional degradation patterns, with mRNAs being degraded from both ends of the molecule, indicating that both 5' to 3' and 3' to 5' directions of RNA decay are present in B. cereus. Other operons showed segmental degradation patterns where specific ORFs within polycistrons were degraded at variable rates, underlining the importance of RNA processing in gene regulation. We determined the half-lives for more than 2,700 ORFs in B. cereus ATCC 10987, ranging from less than one minute to more than fifteen minutes, and showed that mRNA decay rate correlates globally with mRNA expression level, GC content, and functional class of the ORF. Conclusions To our knowledge, this study presents the first global analysis of mRNA decay in a bacterium at single nucleotide resolution. We provide a proof of principle for using RNA-Seq in bacterial mRNA decay analysis, revealing RNA processing patterns at the single nucleotide level
    corecore