60 research outputs found

    Analysing 454 amplicon resequencing experiments using the modular and database oriented Variant Identification Pipeline

    Background: Next-generation amplicon sequencing enables high-throughput genetic diagnostics by sequencing multiple genes from several patients together in a single sequencing run. Currently, no open-source, out-of-the-box software solution exists that reliably reports detected genetic variations and that can also be used to improve future sequencing effectiveness by analyzing the PCR reactions. Results: We developed an integrated, database-oriented software pipeline for the analysis of 454/Roche GS-FLX amplicon resequencing experiments using Perl and a relational database. The pipeline enables variation detection, validation of detected variations, and advanced data analysis that provides information for optimizing PCR efficiency by traditional means. Its modular design allows the pipeline to be customized where needed, so researchers can adapt the analysis to their own experiments. Clear documentation and training data are available to test and validate the pipeline before using it on real sequencing data. Conclusions: We designed an open-source, database-oriented pipeline that enables advanced analysis of 454/Roche GS-FLX amplicon resequencing experiments using SQL statements. This modular database approach allows easy coupling with other pipeline modules, such as variant interpretation, or with a LIMS. A set of standard reporting scripts is also available.
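The abstract describes using SQL statements over a relational database to obtain information for optimizing PCR efficiency. As a minimal sketch of that idea, assuming a hypothetical `coverage` table with one row per sample and amplicon (this is not the actual pipeline's schema), the query below flags amplicons whose mean read count falls below half of the run-wide mean. The 50% cut-off, the amplicon names and the read counts are all illustrative.

```python
import sqlite3

# Hypothetical schema (not the pipeline's real one): one row per
# (sample, amplicon) with the number of reads assigned to that amplicon.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE coverage (sample TEXT, amplicon TEXT, reads INTEGER)")
conn.executemany(
    "INSERT INTO coverage VALUES (?, ?, ?)",
    [
        ("P1", "EXON1", 120), ("P2", "EXON1", 140),
        ("P1", "EXON2", 15), ("P2", "EXON2", 25),
        ("P1", "EXON3", 95), ("P2", "EXON3", 105),
    ],
)

# Flag amplicons whose mean read count is below 50% of the mean over all
# rows; such PCR reactions are candidates for re-optimisation.
query = """
SELECT amplicon, AVG(reads) AS mean_reads
FROM coverage
GROUP BY amplicon
HAVING AVG(reads) < 0.5 * (SELECT AVG(reads) FROM coverage)
ORDER BY mean_reads
"""
weak = conn.execute(query).fetchall()
print(weak)
```

Keeping the analysis in plain SQL means the same statement could, in principle, run against any relational back end that stores equivalent per-amplicon read counts.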

    Practical Tools to Implement Massive Parallel Pyrosequencing of PCR Products in Next Generation Molecular Diagnostics

    Despite improvements in sequence quality and price per base pair, Sanger sequencing remains restricted to the screening of individual disease genes. The development of massively parallel sequencing (MPS) technologies heralded an era in which molecular diagnostics for multigenic disorders became a reality. Here, we outline different PCR amplification-based strategies for screening a multitude of genes in a patient cohort. We performed a thorough evaluation of set-up, coverage and sequence variants using data from 10 GS-FLX experiments (over 200 patients). Crucially, we determined the actual coverage required for reliable diagnostic results using MPS, and we provide a tool to calculate the number of patients that can be screened in a single run. Finally, we provide an overview of factors contributing to false-negative or false-positive mutation calls and suggest ways to maximize sensitivity and specificity, both of which are important in a routine setting. By describing practical strategies for screening multigenic disorders in a multitude of samples, and by answering questions about the minimum required coverage, the number of patients that can be screened in a single run and the factors that may affect sensitivity and specificity, we hope to facilitate the implementation of MPS technology in molecular diagnostics.
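The abstract mentions a tool for calculating how many patients can be screened in a single run, but its exact model is not given here. A back-of-the-envelope sketch, assuming that the minimum coverage must be reached by the weakest amplicon and that coverage unevenness can be summarized by a single hypothetical `uniformity` ratio (weakest-amplicon coverage divided by mean coverage), is:

```python
import math


def patients_per_run(usable_reads: int, amplicons_per_patient: int,
                     min_coverage: int, uniformity: float = 1.0) -> int:
    """Rough, hypothetical estimate of how many patients fit in one run.

    usable_reads: reads expected to pass filtering and map to an amplicon.
    amplicons_per_patient: number of amplicons sequenced per patient.
    min_coverage: reads required per amplicon for a reliable call.
    uniformity: weakest-amplicon coverage divided by mean coverage
        (1.0 means perfectly even coverage; lower values need more reads).
    """
    if not 0 < uniformity <= 1:
        raise ValueError("uniformity must be in (0, 1]")
    # Uneven coverage raises the mean depth needed so that the weakest
    # amplicon still reaches min_coverage.
    mean_reads_needed = min_coverage / uniformity
    reads_per_patient = amplicons_per_patient * mean_reads_needed
    return math.floor(usable_reads / reads_per_patient)


print(patients_per_run(400_000, 50, 40, uniformity=0.5))
```

With 400,000 usable reads, 50 amplicons per patient, a 40-fold minimum coverage and a uniformity of 0.5, this gives 100 patients; the numbers are illustrative only and are not taken from the paper.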

    Extensive pyrosequencing reveals frequent intra-genomic variations of internal transcribed spacer regions of nuclear ribosomal DNA

    BACKGROUND: The internal transcribed spacer of nuclear ribosomal DNA (nrDNA) is already one of the most popular phylogenetic and DNA barcoding markers. However, the existence of multiple copies has complicated such usage, and a detailed characterization of intra-genomic variation is critical to address these concerns. METHODOLOGY/PRINCIPAL FINDINGS: In this study, we used sequence-tagged pyrosequencing and genome-wide analyses to characterize intra-genomic variation of internal transcribed spacer 2 (ITS2) regions from 178 plant species. We found that mutation of ITS2 is frequent, with a mean of 35 variants per species. On average, the three most abundant variants make up 91% of all ITS2 copies. Moreover, we found that different congeneric species share identical variants in 13 genera. Interestingly, different species across different genera also share identical variants. In particular, one minor ITS2 variant in Eleutherococcus giraldii was identical to the major ITS2 variant of Panax ginseng, both from the family Araliaceae. In addition, DNA barcoding gap analysis showed that intra-genomic distances were markedly smaller than intra-specific or inter-specific distances. When each of the 5543 variants was examined for its species discrimination efficiency, a 97% success rate was obtained at the species level. CONCLUSIONS: The identification of identical ITS2 variants across intra-generic or inter-generic species reveals a complex evolutionary history, possibly involving horizontal gene transfer and ancestral hybridization. Although multiple intra-genomic variants are frequently found within each genome, using the major variants alone is sufficient for phylogeny construction and species determination in most cases. Furthermore, including minor variants further improves the resolution of species identification.
Jingyuan Song, Linchun Shi, Dezhu Li, Yongzhen Sun, Yunyun Niu, Zhiduan Chen, Hongmei Luo, Xiaohui Pang, Zhiying Sun, Chang Liu, Aiping Lv, Youping Deng, Zachary Larson-Rabin, Mike Wilkinson and Shilin Che
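The barcoding gap analysis compares distances within a group against distances between groups. A minimal sketch, assuming pre-aligned sequences of equal length and using the simple uncorrected p-distance (many barcoding studies use model-corrected distances such as Kimura 2-parameter instead, so this is a stand-in, not the study's method), is shown below; the species names and sequences are hypothetical.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned sequences,
    skipping columns where either sequence has a gap ('-')."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    sites = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not sites:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in sites) / len(sites)


def has_barcoding_gap(groups: dict[str, list[str]]) -> bool:
    """True if the largest within-group distance is smaller than the
    smallest between-group distance."""
    if len(groups) < 2:
        raise ValueError("need at least two groups")
    intra = [p_distance(a, b)
             for seqs in groups.values()
             for a, b in combinations(seqs, 2)]
    inter = [p_distance(a, b)
             for g1, g2 in combinations(groups, 2)
             for a in groups[g1] for b in groups[g2]]
    return max(intra, default=0.0) < min(inter)


# Hypothetical aligned variants for two species.
groups = {
    "sp1": ["ACGTACGTAC", "ACGTACGTAT"],
    "sp2": ["TTGTACCTAC", "TTGTACCTAA"],
}
print(has_barcoding_gap(groups))
```

Here the within-species distances are 0.1 and the smallest between-species distance is 0.3, so a gap exists for this toy data.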

    Statistical methods for high-throughput genomic data

    P. patens genomic and transcriptomic analyses

    The model organism Physcomitrium patens (formerly Physcomitrella patens) is a moss in the family Funariaceae. Because transgenic P. patens plants can easily be generated via homologous recombination, the moss has attracted the interest of scientists worldwide. P. patens was the world's first non-seed plant with a completely sequenced genome (V1). Continuous improvements of the genome assembly and the associated gene annotations resulted in the current pseudo-chromosomal P. patens genome version (V3), which is the basis of all analyses performed in this thesis. Since P. patens became a U.S. Department of Energy Joint Genome Institute (DOE JGI) plant flagship genome [1] and a member of the JGI Gene Atlas project [2], hundreds of P. patens RNA-seq samples have been generated. During my time as a PhD student, I analysed the JGI Gene Atlas RNA-seq samples and several dozen other RNA-seq samples from different projects. These samples contained data from five different P. patens ecotypes/accessions (Gransden, Kaskaskia, Reute, Villersexel, and Wisconsin). To analyse these data efficiently, I developed an RNA-seq pipeline for differentially expressed gene (DEG) calling. Its performance was tested by comparing its results with those of commercial software solutions, using multiple RNA-seq samples from different species. My newly generated gene expression results, together with previously published expression data from a variety of other projects, were stored in our novel online tool PEATmoss. Furthermore, my gene version lookup tables were implemented in a database structure, which allows PEATmoss users to find gene models from different gene annotation versions and use them in PEATmoss. With an updated version of my RNA-seq pipeline, I identified and analysed sequence variations in P. patens accessions; the samples clustered clearly by accession.
I could demonstrate that, due to decades of vegetative propagation in laboratories, somatic mutations have accumulated in Gransden laboratory plants. In addition, we used restriction fragment length polymorphism (RFLP) analysis to offer a simple method for the quick identification of unknown P. patens plants.
[1] https://jgi.doe.gov/our-science/science-programs/plant-genomics/plant-flagship-genomes/
[2] https://jgi.doe.gov/doe-jgi-plant-flagship-gene-atlas
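RFLP identification works because a sequence difference that creates or destroys a restriction site changes the sizes of the resulting fragments. The enzymes and markers used in the thesis are not given here, so the sketch below is a hypothetical in silico digest: it uses the well-known EcoRI site (G^AATTC) on two made-up amplicons that differ by a single base.

```python
import re


def digest(seq: str, site: str, cut_offset: int) -> list[int]:
    """Return fragment lengths after cutting a linear DNA sequence at every
    occurrence of a restriction site (top strand only, no ambiguity codes).

    cut_offset is the cut position within the site, e.g. 1 for EcoRI (G^AATTC).
    """
    seq, site = seq.upper(), site.upper()
    if not re.fullmatch(r"[ACGT]+", site):
        raise ValueError("site must contain only A, C, G, T")
    # A zero-width lookahead also finds overlapping occurrences of the site.
    cuts = [m.start() + cut_offset for m in re.finditer(f"(?={site})", seq)]
    bounds = [0] + cuts + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


# Hypothetical amplicons from two accessions differing by one SNP.
accession_a = "AAAAGAATTCAAAAAAAAAA"  # contains GAATTC -> cut
accession_b = "AAAAGAATACAAAAAAAAAA"  # SNP destroys the site -> uncut
print(digest(accession_a, "GAATTC", 1), digest(accession_b, "GAATTC", 1))
```

On a gel, the first accession would show two bands (5 and 15 bp) and the second a single 20 bp band, which is the kind of pattern difference an RFLP assay relies on.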

    On the Origin of Phenotypic Variation: Novel Technologies to Dissect Molecular Determinants of Phenotype

    This thesis describes the conception, design, and development of novel computational tools, theoretical models, and experimental techniques applied to dissecting the molecular factors underlying phenotypic variation. The first part of my work focuses on finding rare genetic variants in pooled DNA samples and led to the development of a novel set of algorithms, SNPseeker and SPLINTER, applied to next-generation sequencing data. The second part describes the creation of a reporter system for DNA methylation, with the aim of dissecting the genetic contribution to tissue-specific patterns of DNA methylation across the genome. The final part focuses on understanding the basis of stochastic variation in gene expression, in particular on modeling and dissecting the relationship between single-cell protein variance and mean at a genome-wide scale.
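Finding a rare variant in a DNA pool means distinguishing a few true variant reads from sequencing errors at the same position. The sketch below is not the SNPseeker or SPLINTER algorithm; it is a simpler, swapped-in binomial test, assuming a single known per-base error rate, that flags a position when the observed number of non-reference reads is too large to be explained by error alone.

```python
from math import exp, lgamma, log, log1p


def binom_upper_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p), summed in log space to avoid
    overflow of the binomial coefficients at high read depth."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    log_p, log_q = log(p), log1p(-p)
    total = 0.0
    for i in range(max(k, 0), n + 1):
        log_term = (lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
                    + i * log_p + (n - i) * log_q)
        total += exp(log_term)
    return min(total, 1.0)


def call_rare_variant(alt_reads: int, depth: int, error_rate: float,
                      n_positions: int, alpha: float = 0.05) -> bool:
    """Flag a position when the number of non-reference reads is unlikely
    to arise from sequencing error alone, using a Bonferroni correction
    over all n_positions tested."""
    return binom_upper_tail(alt_reads, depth, error_rate) < alpha / n_positions


# Hypothetical pool of 100 diploid individuals at 2,000-fold depth: one
# heterozygous carrier contributes about 1/200 of reads (about 10 reads).
print(call_rare_variant(12, 2000, 0.001, n_positions=1000))
print(call_rare_variant(4, 2000, 0.001, n_positions=1000))
```

The error rate, depth and number of tested positions are illustrative; real data usually need position- and base-specific error models rather than one global rate.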

    The application of genomic technologies to cancer and companion diagnostics.

    This thesis describes work undertaken by the author between 1996 and 2014. Genomics is the study of the genome, although the term is often used as a catch-all phrase and applied to the transcriptome (the study of RNAs) and the methylome (the study of DNA methylation). As cancer is a disease of the genome, rapid advances in genomic technology, specifically microarrays and next-generation sequencing, are creating a wave of change in our understanding of its molecular pathology. Molecular pathology and personalised medicine are being driven by discoveries in genomics, and genomics is being driven by the development of faster, better and cheaper genome sequencing. The next decade is likely to see significant changes in the way cancer is managed for individual patients as next-generation sequencing enters the clinic. In chapter 3 I discuss how ERBB2 amplification testing for breast cancer is currently dominated by immunohistochemistry (a single-gene test), and I present the development, by the author, of a semi-quantitative PCR test for ERBB2 amplification. I also show that estimating ERBB2 amplification from microarray copy-number analysis of the genome is possible. In chapter 4 I present a review of microarray comparison studies and outline the case for careful and considered comparison of technologies when selecting a platform for a research study. Similar, indeed more stringent, care needs to be applied when selecting a platform for a clinical test. In chapter 5 I present co-authored work on the development of amplicon and exome methods for the detection and quantitation of somatic mutations in circulating tumour DNA, and demonstrate the impact this can have on understanding tumour heterogeneity and evolution during treatment. I also demonstrate how next-generation sequencing technologies may allow multiple genetic abnormalities to be analysed in a single test, including in low-cellularity tumours and/or heterogeneous cancers.
Keywords: Genome, exome, transcriptome, amplicon, next-generation sequencing, differential gene expression, RNA-seq, ChIP-seq, microarray, ERBB2, companion diagnostic
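The abstract does not describe how the semi-quantitative PCR test for ERBB2 amplification computes its result. One common way to estimate relative gene copy number by real-time PCR is the comparative Ct (2^-ΔΔCt) method; the sketch below assumes that method, a hypothetical reference gene, a normal diploid calibrator sample and roughly 100% amplification efficiency for both assays. It is not the thesis's assay.

```python
def relative_copy_number(ct_target: float, ct_ref: float,
                         ct_target_cal: float, ct_ref_cal: float) -> float:
    """Comparative Ct (2^-ddCt) estimate of target copy number relative to a
    calibrator (e.g. normal diploid DNA), assuming ~100% PCR efficiency
    for both the target and the reference assay."""
    d_ct_sample = ct_target - ct_ref          # normalise tumour to reference
    d_ct_cal = ct_target_cal - ct_ref_cal     # normalise calibrator likewise
    return 2 ** -(d_ct_sample - d_ct_cal)


# Hypothetical Ct values: ERBB2 amplifies two cycles earlier than the
# reference gene in the tumour, but at the same cycle in the calibrator.
print(relative_copy_number(24.0, 26.0, 26.0, 26.0))
```

A ratio of 4.0 here means roughly four times as many ERBB2 copies per reference-gene copy as in the calibrator; any cut-off for calling amplification would have to be validated clinically and is not implied by this example.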

    Biotechnologies for Plant Mutation Breeding: Protocols

    Plant Breeding/Biotechnology; Agriculture; Genetic Engineering; Plant Genetics & Genomic