Integrating Sequencing Technologies in Personal Genomics: Optimal Low Cost Reconstruction of Structural Variants
The goal of human genome re-sequencing is obtaining an accurate assembly of an individual's genome. Recently, there has been great excitement in the development of many technologies for this (e.g. medium and short read sequencing from companies such as 454 and SOLiD, and high-density oligo-arrays from Affymetrix and NimbleGen), with even more expected to appear. The costs and sensitivities of these technologies differ considerably from each other. As an important goal of personal genomics is to reduce the cost of re-sequencing to an affordable point, it is worthwhile to consider optimally integrating technologies. Here, we build a simulation toolbox that will help us optimally combine different technologies for genome re-sequencing, especially in reconstructing large structural variants (SVs). SV reconstruction is considered the most challenging step in human genome re-sequencing. (It is sometimes even harder than de novo assembly of small genomes because of the duplications and repetitive sequences in the human genome.) To this end, we formulate canonical problems that are representative of issues in reconstruction and are of small enough scale to be computationally tractable and simulatable. Using semi-realistic simulations, we show how we can combine different technologies to optimally solve the assembly at low cost. With mappability maps, our simulations efficiently handle the inhomogeneous repeat-containing structure of the human genome and the computational complexity of practical assembly algorithms. They quantitatively show how combining different read lengths is more cost-effective than using one length, how an optimal mixed sequencing strategy for reconstructing large novel SVs usually also gives accurate detection of SNPs/indels, how paired-end reads can improve reconstruction efficiency, and how adding in arrays is more efficient than just sequencing for disentangling some complex SVs.
Our strategy should facilitate the sequencing of human genomes at maximum accuracy and low cost.
The Natural Product Domain Seeker NaPDoS: A Phylogeny Based Bioinformatic Tool to Classify Secondary Metabolite Gene Diversity
New bioinformatic tools are needed to analyze the growing volume of DNA sequence data. This is especially true in the case of secondary metabolite biosynthesis, where the highly repetitive nature of the associated genes creates major challenges for accurate sequence assembly and analysis. Here we introduce the web tool Natural Product Domain Seeker (NaPDoS), which provides an automated method to assess the secondary metabolite biosynthetic gene diversity and novelty of strains or environments. NaPDoS analyses are based on the phylogenetic relationships of sequence tags derived from polyketide synthase (PKS) and non-ribosomal peptide synthetase (NRPS) genes. The sequence tags correspond to PKS-derived ketosynthase domains and NRPS-derived condensation domains and are compared to an internal database of experimentally characterized biosynthetic genes. NaPDoS provides a rapid mechanism to extract and classify ketosynthase and condensation domains from PCR products, genomes, and metagenomic datasets. Close database matches provide a mechanism to infer the generalized structures of secondary metabolites while new phylogenetic lineages provide targets for the discovery of new enzyme architectures or mechanisms of secondary metabolite assembly. Here we outline the main features of NaPDoS and test it on four draft genome sequences and two metagenomic datasets. The results provide a rapid method to assess secondary metabolite biosynthetic gene diversity and richness in organisms or environments and a mechanism to identify genes that may be associated with uncharacterized biochemistry.
Genome Sequence and Transcriptome Analysis of the Radioresistant Bacterium Deinococcus gobiensis: Insights into the Extreme Environmental Adaptations
The desert is an excellent model for studying evolution under extreme environments. We present here the complete genome and ultraviolet (UV) radiation-induced transcriptome of Deinococcus gobiensis I-0, which was isolated from the cold Gobi desert and shows higher tolerance to gamma radiation and UV light than all other known microorganisms. Nearly half of the genes in the genome encode proteins of unknown function, suggesting that the extreme resistance phenotype may be attributed to unknown genes and pathways. D. gobiensis also contains a surprisingly large number of horizontally acquired genes and predicted mobile elements of different classes, which is indicative of adaptation to extreme environments through genomic plasticity. High-resolution RNA-Seq transcriptome analyses indicated that 30 regulatory proteins, including several well-known regulators and uncharacterized protein kinases, and 13 noncoding RNAs were induced immediately after UV irradiation. Particularly interesting is the UV irradiation induction of the phrB and recB genes involved in photoreactivation and recombinational repair, respectively. These proteins likely include key players in the immediate global transcriptional response to UV irradiation. Our results help to explain the exceptional ability of D. gobiensis to withstand environmental extremes of the Gobi desert, and highlight the metabolic features of this organism that have biotechnological potential.
BAC-pool sequencing and analysis of large segments of A12 and D12 homoeologous chromosomes in upland cotton.
Acknowledgments
“Dedicated to Dr. Ramesh Kantety, a mentor, colleague and friend”. We would like to acknowledge the support offered by Padmini Sripathi during data analysis and submissions.
Author Contributions
Conceived and designed the experiments: RVK JZY. Performed the experiments: RB ZX SM GBW. Analyzed the data: RB. Contributed reagents/materials/analysis tools: RVK RB JZY RJK BAR. Wrote the manuscript: RB. Revised the manuscript: RB RVK JZY RGP BAR GCS. Advised the research: RVK JZY RGP BAR GCS.
Although new and emerging next-generation sequencing (NGS) technologies have reduced sequencing costs significantly, much work remains to implement them for de novo sequencing of complex and highly repetitive genomes such as the tetraploid genome of Upland cotton (Gossypium hirsutum L.). Herein we report the results from implementing a novel, hybrid Sanger/454-based BAC-pool sequencing strategy using minimum tiling path (MTP) BACs from Ctg-3301 and Ctg-465, two large genomic segments in A12 and D12 homoeologous chromosomes (Ctg). To enable generation of longer contig sequences in assembly, we implemented a hybrid assembly method to process ~35x data from 454 technology and 2.8-3x data from the Sanger method. Hybrid assemblies offered higher sequence coverage and better sequence assemblies. Homology studies revealed the presence of retrotransposon regions like Copia and Gypsy elements in these contigs and also helped in identifying new genomic SSRs. Unigenes were anchored to the sequences in Ctg-3301 and Ctg-465 to support the physical map. Gene density, gene structure and protein sequence information derived from protein prediction programs were used to obtain the functional annotation of these genes. Comparative analysis of both contigs with the Arabidopsis genome exhibited synteny and microcollinearity with a conserved gene order in both genomes. This study provides insight into the use of an MTP-based BAC-pool sequencing approach for sequencing complex polyploid genomes, with limited constraints on generating better sequence assemblies to build reference scaffold sequences.
Combining the utilities of MTP-based BAC-pool sequencing with current longer and short read NGS technologies in a multiplexed format would provide a new direction to cost-effectively and precisely sequence complex plant genomes.
