88 research outputs found

    Discovery of Single Nucleotide Polymorphisms and Mutations by Pyrosequencing

    Comparative genomics, analyzing variation among individual genomes, is an area of intense investigation. DNA sequencing is usually employed to look for polymorphisms and mutations. Pyrosequencing, a real-time DNA sequencing method, is emerging as a popular platform for comparative genomics. Here we review the use of this technology for mutation scanning, polymorphism discovery and chemical haplotyping. We describe the methodology and accuracy of this technique and discuss how to reduce the cost for large-scale analysis.

    High Throughput Automated Allele Frequency Estimation by Pyrosequencing

    Pyrosequencing is a DNA sequencing method based on the principle of sequencing-by-synthesis and pyrophosphate detection through a series of enzymatic reactions. This bioluminometric, real-time DNA sequencing technique offers unique applications that are cost-effective and user-friendly. In this study, we have combined a number of methods to develop an accurate, robust and cost-efficient method to determine allele frequencies in large populations for association studies. The assay offers the advantage of minimal systemic sampling errors, uses a general biotin amplification approach, and replaces dTTP for dATP-alpha-thio to avoid non-uniform higher peaks in order to increase accuracy. We demonstrate that this newly developed assay is a robust, cost-effective, accurate and reproducible approach for large-scale genotyping of DNA pools. We also discuss potential improvements of the software for more accurate allele frequency analysis.
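
    As a rough illustration of allele frequency estimation in a DNA pool, the sketch below takes the frequency of an allele as its share of the summed peak signal at a biallelic site. This is an assumed simplification for illustration only, not the assay's actual software; the function name `allele_frequency` and the peak values are hypothetical.

```python
def allele_frequency(peak_a: float, peak_b: float) -> float:
    """Estimate the pooled frequency of allele A at a biallelic site.

    Assumption: each peak height is proportional to the amount of the
    corresponding allele in the pooled sample.
    """
    total = peak_a + peak_b
    if total <= 0:
        raise ValueError("peak heights must sum to a positive value")
    return peak_a / total


# Hypothetical peak heights (arbitrary light units) for alleles A and B.
freq_a = allele_frequency(30.0, 10.0)
print(f"Estimated frequency of allele A: {freq_a:.2f}")
```

    In practice, peak heights would typically need normalization and averaging over replicates before being compared this way.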

    Bacterial flora-typing with targeted, chip-based Pyrosequencing

    Background: The metagenomic analysis of microbial communities holds the potential to improve our understanding of the role of microbes in clinical conditions. Recent, dramatic improvements in DNA sequencing throughput and cost will enable such analyses on individuals. However, such advances in throughput generally come at the cost of shorter read-lengths, limiting the discriminatory power of each read. In particular, classifying the microbial content of samples by sequencing the <1,600 bp 16S rRNA gene will be affected by such limitations.
    Results: We describe a method for identifying the phylogenetic content of bacterial samples using high-throughput Pyrosequencing targeted at the 16S rRNA gene. Our analysis is adapted to the shorter read-lengths of such technology and uses a database of 16S rDNA to determine the most specific phylogenetic classification for reads, resulting in a weighted phylogenetic tree characterizing the content of the sample. We present results for six samples obtained from the human vagina during pregnancy that corroborate previous studies using conventional techniques. Next, we analyze the power of our method to classify reads at each level of the phylogeny using simulation experiments. We assess the impacts of read-length and database completeness on our method, and predict how it will perform as technology improves and more bacteria are sequenced. Finally, we study the utility of targeting specific 16S variable regions and show that such an approach considerably improves results for certain types of microbial samples. Using simulation, our method can be used to determine the most informative variable region.
    Conclusion: This study provides positive validation of the effectiveness of targeting 16S metagenomes using short-read sequencing technology. Our methodology allows us to infer the most specific assignment of the sequence reads within the phylogeny, and to identify the most discriminative variable region to target. The analysis of high-throughput Pyrosequencing on human flora samples will accelerate the study of the relationship between the microbial world and ourselves.
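
    One common way to obtain a "most specific classification" for a short read is to take the deepest taxonomic rank shared by all of its best database hits, then count reads along each assigned path to weight the tree. The sketch below illustrates that idea under stated assumptions; it is not the authors' implementation, and the function names and example lineages are illustrative.

```python
from collections import Counter


def most_specific_assignment(hit_lineages):
    """Return the deepest lineage prefix shared by every database hit.

    Each lineage is a tuple of taxon names ordered from root to leaf.
    """
    if not hit_lineages:
        return ()
    shared = []
    for ranks in zip(*hit_lineages):
        if len(set(ranks)) != 1:
            break  # hits disagree at this rank, so stop one level above
        shared.append(ranks[0])
    return tuple(shared)


def weighted_tree(assignments):
    """Count, for every node, how many reads were assigned at or below it."""
    weights = Counter()
    for path in assignments:
        for depth in range(1, len(path) + 1):
            weights[path[:depth]] += 1
    return weights


# Illustrative lineages; a real analysis would read them from a 16S database.
base = ("Bacteria", "Firmicutes", "Bacilli", "Lactobacillales",
        "Lactobacillaceae", "Lactobacillus")
crispatus = base + ("Lactobacillus crispatus",)
iners = base + ("Lactobacillus iners",)

read_1 = most_specific_assignment([crispatus, iners])  # ambiguous at species
read_2 = most_specific_assignment([crispatus])         # unique best hit
tree = weighted_tree([read_1, read_2])
print(read_1[-1], tree[base], tree[crispatus])
```

    A read whose hits disagree at the species level is placed at the genus node, so shorter or less discriminative reads add weight higher in the tree rather than being discarded.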

    Whole genome survey of coding SNPs reveals a reproducible pathway determinant of Parkinson disease

    It is quickly becoming apparent that situating human variation in a pathway context is crucial to understanding its phenotypic significance. Toward this end, we have developed a general method for finding pathways associated with traits that controls for pathway size. We have applied this method to a new whole genome survey of coding SNP variation in 187 patients afflicted with Parkinson disease (PD) and 187 controls. We show that our dataset provides an independent replication of the axon guidance association recently reported by Lesnick et al. [PLoS Genet 2007;3:e98], and also indicates that variation in the ubiquitin-mediated proteolysis and T-cell receptor signaling pathways may predict PD susceptibility. Given this result, it is reasonable to hypothesize that pathway associations are more replicable than individual SNP associations in whole genome association studies. However, this hypothesis is complicated by a detailed comparison of our dataset to the second recent PD association study by Fung et al. [Lancet Neurol 2006;5:911–916]. Surprisingly, we find that the axon guidance pathway does not rank at the very top of the Fung dataset after controlling for pathway size. More generally, in comparing the studies, we find that SNP frequencies replicate well despite technologically different assays, but that both SNP and pathway associations are globally uncorrelated across studies. We thus have a situation in which an association between axon guidance pathway variation and PD has been found in 2 out of 3 studies. We conclude by relating this seeming inconsistency to the molecular heterogeneity of PD, and suggest future analyses that may resolve such discrepancies.

    Ontology-Based Meta-Analysis of Global Collections of High-Throughput Public Data

    The investigation of the interconnections between the molecular and genetic events that govern biological systems is essential if we are to understand the development of disease and design effective novel treatments. Microarray and next-generation sequencing technologies have the potential to provide this information. However, taking full advantage of these approaches requires that biological connections be made across large quantities of highly heterogeneous genomic datasets. Leveraging the increasingly huge quantities of genomic data in the public domain is fast becoming one of the key challenges in the research community today. We have developed a novel data mining framework that enables researchers to use this growing collection of public high-throughput data to investigate any set of genes or proteins. The connectivity between molecular states across thousands of heterogeneous datasets from microarrays and other genomic platforms is determined through a combination of rank-based enrichment statistics, meta-analyses, and biomedical ontologies. We address data quality concerns through dataset replication and meta-analysis and ensure that the majority of the findings are derived using multiple lines of evidence. As an example of our strategy and the utility of this framework, we apply our data mining approach to explore the biology of brown fat within the context of the thousands of publicly available gene expression datasets. Our work presents a practical strategy for organizing, mining, and correlating global collections of large-scale genomic data to explore normal and disease biology. Using a hypothesis-free approach, we demonstrate how a data-driven analysis across very large collections of genomic data can reveal novel discoveries and evidence to support existing hypotheses.

    Defining KIR and HLA Class I Genotypes at Highest Resolution via High-Throughput Sequencing.

    The physiological functions of natural killer (NK) cells in human immunity and reproduction depend upon diverse interactions between killer cell immunoglobulin-like receptors (KIRs) and their HLA class I ligands: HLA-A, HLA-B, and HLA-C. The genomic regions containing the KIR and HLA class I genes are unlinked, structurally complex, and highly polymorphic. They are also strongly associated with a wide spectrum of diseases, including infections, autoimmune disorders, cancers, and pregnancy disorders, as well as the efficacy of transplantation and other immunotherapies. To facilitate study of these extraordinary genes, we developed a method that captures, sequences, and analyzes the 13 KIR genes and HLA-A, HLA-B, and HLA-C from genomic DNA. We also devised a bioinformatics pipeline that attributes sequencing reads to specific KIR genes, determines copy number by read depth, and calls high-resolution genotypes for each KIR gene. We validated this method by using DNA from well-characterized cell lines, comparing it to established methods of HLA and KIR genotyping, and determining KIR genotypes from 1000 Genomes sequence data. This identified 116 previously uncharacterized KIR alleles, which were all demonstrated to be authentic by sequencing from source DNA via standard methods. Analysis of just two KIR genes showed that 22% of the 1000 Genomes individuals have a previously uncharacterized allele or a structural variant. The method we describe is suited to the large-scale analyses that are needed for characterizing human populations and defining the precise HLA and KIR factors associated with disease. The methods are applicable to other highly polymorphic genes. This study was supported by U.S. National Institutes of Health grants U01 AI090905, R01 GM109030, R01 AI17892 and U19 AI119350. Authors Steven Norberg and Mostafa Ronaghi are employees of Illumina Inc. This is the author accepted manuscript. It is currently under an indefinite embargo pending publication by Elsevier.
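
    Determining copy number by read depth can be sketched as comparing the mean depth over a gene with the depth over a reference locus of known copy number. The example below is a minimal illustration under that assumption, not the pipeline described in the abstract; the function name, the two-copy reference, and the depth values are hypothetical.

```python
def estimate_copy_number(gene_depth: float, reference_depth: float,
                         reference_copies: int = 2) -> int:
    """Estimate gene copy number from mean read depth.

    Assumption: the reference locus is present in `reference_copies`
    copies and depth scales linearly with copy number.
    """
    if reference_depth <= 0:
        raise ValueError("reference depth must be positive")
    return round(reference_copies * gene_depth / reference_depth)


# Hypothetical mean depths: the gene of interest versus a two-copy reference.
print(estimate_copy_number(148.0, 100.0))
print(estimate_copy_number(52.0, 100.0))
```

    Real data would usually call for depth normalization and a check on how close each ratio falls to an integer before rounding.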

    Viral population estimation using pyrosequencing

    The diversity of virus populations within single infected hosts presents a major difficulty for the natural immune response as well as for vaccine design and antiviral drug therapy. Recently developed pyrophosphate-based sequencing technologies (pyrosequencing) can be used for quantifying this diversity by ultra-deep sequencing of virus samples. We present computational methods for the analysis of such sequence data and apply these techniques to pyrosequencing data obtained from HIV populations within patients harboring drug resistant virus strains. Our main result is the estimation of the population structure of the sample from the pyrosequencing reads. This inference is based on a statistical approach to error correction, followed by a combinatorial algorithm for constructing a minimal set of haplotypes that explain the data. Using this set of explaining haplotypes, we apply a statistical model to infer the frequencies of the haplotypes in the population via an EM algorithm. We demonstrate that pyrosequencing reads allow for effective population reconstruction by extensive simulations and by comparison to 165 sequences obtained directly from clonal sequencing of four independent, diverse HIV populations. Thus, pyrosequencing can be used for cost-effective estimation of the structure of virus populations, promising new insights into viral evolutionary dynamics and disease control strategies. Comment: 23 pages, 13 figures
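
    The final step, inferring haplotype frequencies with an EM algorithm, can be illustrated with a generic mixture-proportion EM. The sketch assumes each error-corrected read is simply compatible or incompatible with each candidate haplotype, which is a simplification and not necessarily the statistical model used in the paper; the function name and the toy data are hypothetical.

```python
def em_haplotype_frequencies(compatible, n_haplotypes, n_iter=200, tol=1e-9):
    """Estimate haplotype frequencies from read compatibility sets.

    compatible[i] lists the indices of the haplotypes that read i matches.
    """
    freqs = [1.0 / n_haplotypes] * n_haplotypes
    for _ in range(n_iter):
        # E-step: split each read among its compatible haplotypes in
        # proportion to the current frequency estimates.
        counts = [0.0] * n_haplotypes
        for haps in compatible:
            total = sum(freqs[h] for h in haps)
            if total == 0:
                continue  # read matches no haplotype with nonzero frequency
            for h in haps:
                counts[h] += freqs[h] / total
        assigned = sum(counts)
        if assigned == 0:
            break  # no usable reads, keep the current estimate
        # M-step: renormalize expected counts into frequencies.
        updated = [c / assigned for c in counts]
        converged = max(abs(u - f) for u, f in zip(updated, freqs)) < tol
        freqs = updated
        if converged:
            break
    return freqs


# Toy data: 6 reads match only haplotype 0, 2 match only haplotype 1,
# and 2 reads are compatible with both.
reads = [[0]] * 6 + [[1]] * 2 + [[0, 1]] * 2
frequencies = em_haplotype_frequencies(reads, n_haplotypes=2)
print([round(f, 3) for f in frequencies])
```

    The ambiguous reads carry no information about which haplotype they came from, so the estimate settles on the ratio set by the unambiguous reads.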

    Highly parallel oligonucleotide purification and functionalization using reversible chemistry

    We have developed a cost-effective, highly parallel method for purification and functionalization of 5′-labeled oligonucleotides. The approach is based on 5′-hexa-His phase tag purification, followed by exchange of the hexa-His tag for a functional group using reversible reaction chemistry. These methods are suitable for large-scale (micromole to millimole) production of oligonucleotides and are amenable to highly parallel processing of many oligonucleotides individually or in high complexity pools. Examples of the preparation of 5′-biotin, 95-mer, oligonucleotide pools of >40K complexity at micromole scale are shown. These pools are prepared in up to ~16% yield and 90–99% purity. Approaches for using this method in other applications are also discussed.