Discovery of Single Nucleotide Polymorphisms and Mutations by Pyrosequencing
Comparative genomics, analyzing variation among individual genomes, is an area of
intense investigation. DNA sequencing is usually employed to look for polymorphisms and
mutations. Pyrosequencing, a real-time DNA sequencing method, is emerging as a popular
platform for comparative genomics. Here we review the use of this technology for mutation
scanning, polymorphism discovery and chemical haplotyping. We describe the methodology
and accuracy of this technique and discuss how to reduce the cost for large-scale analysis.
High Throughput Automated Allele Frequency Estimation by Pyrosequencing
Pyrosequencing is a DNA sequencing method based on the principle of sequencing-by-synthesis and pyrophosphate detection through a series of enzymatic reactions. This bioluminometric, real-time DNA sequencing technique offers unique applications that are cost-effective and user-friendly. In this study, we have combined a number of methods to develop an accurate, robust and cost-efficient method to determine allele frequencies in large populations for association studies. The assay offers the advantage of minimal systemic sampling errors, uses a general biotin amplification approach, and replaces dTTP for dATP-alpha-thio to avoid non-uniform higher peaks in order to increase accuracy. We demonstrate that this newly developed assay is a robust, cost-effective, accurate and reproducible approach for large-scale genotyping of DNA pools. We also discuss potential improvements of the software for more accurate allele frequency analysis.
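As a rough illustration of how an allele frequency can be read off a pooled pyrogram, the sketch below assumes that each allele-specific peak height is proportional to the amount of that allele in the pool, so the frequency is simply one peak's share of the two. The function names and the replicate averaging are illustrative assumptions, not the software discussed in the abstract.

```python
def allele_frequency(peak_a: float, peak_b: float) -> float:
    """Estimate the frequency of allele A in a DNA pool.

    Assumes (hypothetically) that each allele-specific peak height is
    proportional to the amount of that allele present in the pool.
    """
    total = peak_a + peak_b
    if total <= 0:
        raise ValueError("peak heights must sum to a positive value")
    return peak_a / total


def mean_frequency(replicates):
    """Average the per-replicate estimates to damp run-to-run noise.

    `replicates` is a list of (peak_a, peak_b) tuples, one per run.
    """
    estimates = [allele_frequency(a, b) for a, b in replicates]
    return sum(estimates) / len(estimates)
```

For example, `allele_frequency(30.0, 10.0)` gives the share of allele A when its peak is three times the height of the allele B peak.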
Viral Population Estimation Using Pyrosequencing
The diversity of virus populations within single infected hosts presents a major difficulty for the natural immune response as well as for vaccine design and antiviral drug therapy. Recently developed pyrophosphate-based sequencing technologies (pyrosequencing) can be used for quantifying this diversity by ultra-deep sequencing of virus samples. We present computational methods for the analysis of such sequence data and apply these techniques to pyrosequencing data obtained from HIV populations within patients harboring drug-resistant virus strains. Our main result is the estimation of the population structure of the sample from the pyrosequencing reads. This inference is based on a statistical approach to error correction, followed by a combinatorial algorithm for constructing a minimal set of haplotypes that explain the data. Using this set of explaining haplotypes, we apply a statistical model to infer the frequencies of the haplotypes in the population via an expectation–maximization (EM) algorithm. We demonstrate that pyrosequencing reads allow for effective population reconstruction by extensive simulations and by comparison to 165 sequences obtained directly from clonal sequencing of four independent, diverse HIV populations. Thus, pyrosequencing can be used for cost-effective estimation of the structure of virus populations, promising new insights into viral evolutionary dynamics and disease control strategies.
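The final step described above, inferring haplotype frequencies by EM, can be sketched in a few lines. This is a minimal toy version under simplifying assumptions that are not taken from the paper: reads are already error-corrected, each read is represented only by the set of candidate haplotypes it is compatible with, and a read is equally likely to arise from any position of a compatible haplotype. The function name and input layout are hypothetical.

```python
def em_haplotype_frequencies(compat, n_haplotypes, n_iter=200, tol=1e-10):
    """Estimate haplotype frequencies from read/haplotype compatibility.

    compat: list with one entry per read; each entry is a list of indices
            of the haplotypes that read is compatible with (assumed input).
    Returns a list of frequencies summing to 1.
    """
    freqs = [1.0 / n_haplotypes] * n_haplotypes
    for _ in range(n_iter):
        # E-step: split each read fractionally among its compatible
        # haplotypes, in proportion to the current frequency estimates.
        counts = [0.0] * n_haplotypes
        for haps in compat:
            weight = sum(freqs[h] for h in haps)
            if weight == 0.0:
                continue  # read explained by no haplotype; ignore it
            for h in haps:
                counts[h] += freqs[h] / weight
        total = sum(counts)
        if total == 0.0:
            raise ValueError("no read is compatible with any haplotype")
        # M-step: new frequencies are the normalized expected counts.
        updated = [c / total for c in counts]
        converged = max(abs(a - b) for a, b in zip(updated, freqs)) < tol
        freqs = updated
        if converged:
            break
    return freqs
```

Reads compatible with several haplotypes carry no information about which one they came from, so in this toy model the estimate is driven by the unambiguous reads while the ambiguous ones are shared out accordingly.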
Bacterial flora-typing with targeted, chip-based Pyrosequencing
Background: The metagenomic analysis of microbial communities holds the potential to improve our understanding of the role of microbes in clinical conditions. Recent, dramatic improvements in DNA sequencing throughput and cost will enable such analyses on individuals. However, such advances in throughput generally come at the cost of shorter read-lengths, limiting the discriminatory power of each read. In particular, classifying the microbial content of samples by sequencing the <1,600 bp 16S rRNA gene will be affected by such limitations. Results: We describe a method for identifying the phylogenetic content of bacterial samples using high-throughput Pyrosequencing targeted at the 16S rRNA gene. Our analysis is adapted to the shorter read-lengths of such technology and uses a database of 16S rDNA to determine the most specific phylogenetic classification for reads, resulting in a weighted phylogenetic tree characterizing the content of the sample. We present results for six samples obtained from the human vagina during pregnancy that corroborate previous studies using conventional techniques. Next, we analyze the power of our method to classify reads at each level of the phylogeny using simulation experiments. We assess the impacts of read-length and database completeness on our method, and predict how it will perform as technology improves and more bacteria are sequenced. Finally, we study the utility of targeting specific 16S variable regions and show that such an approach considerably improves results for certain types of microbial samples. Using simulation, our method can be used to determine the most informative variable region. Conclusion: This study provides positive validation of the effectiveness of targeting 16S metagenomes using short-read sequencing technology. Our methodology allows us to infer the most specific assignment of the sequence reads within the phylogeny, and to identify the most discriminative variable region to target. The analysis of high-throughput Pyrosequencing on human flora samples will accelerate the study of the relationship between the microbial world and ourselves.
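One common way to obtain a "most specific classification" for a short read is to take the deepest taxonomic node shared by all of its database hits. The sketch below illustrates that idea and the resulting weighted tree; it is an assumption about how such an assignment could work, not the authors' implementation, and the lineage tuples used in the usage note are toy data.

```python
from collections import Counter


def most_specific_assignment(lineages):
    """Return the longest lineage prefix shared by all hits of one read.

    lineages: list of tuples, each running from the root toward the leaf,
              e.g. ("Bacteria", "Firmicutes", ...). Hypothetical input.
    """
    if not lineages:
        return ()
    prefix = []
    for ranks in zip(*lineages):
        # Stop at the first rank where the hits disagree.
        if all(rank == ranks[0] for rank in ranks):
            prefix.append(ranks[0])
        else:
            break
    return tuple(prefix)


def weighted_tree(read_hits):
    """Count, for every node of the taxonomy, the reads assigned at or below it.

    read_hits: list with one list of hit lineages per read.
    Returns a Counter keyed by lineage prefix (the path to the node).
    """
    weights = Counter()
    for lineages in read_hits:
        assignment = most_specific_assignment(lineages)
        for depth in range(1, len(assignment) + 1):
            weights[assignment[:depth]] += 1
    return weights
```

A read whose hits agree down to the genus but differ at the species level is assigned to the genus, so shorter reads naturally end up at shallower nodes of the tree.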
Whole genome survey of coding SNPs reveals a reproducible pathway determinant of Parkinson disease
It is quickly becoming apparent that situating human variation in a pathway context is crucial to understanding its phenotypic significance. Toward this end, we have developed a general method for finding pathways associated with traits that control for pathway size. We have applied this method to a new whole genome survey of coding SNP variation in 187 patients afflicted with Parkinson disease (PD) and 187 controls. We show that our dataset provides an independent replication of the axon guidance association recently reported by Lesnick et al. [PLoS Genet 2007;3:e98], and also indicates that variation in the ubiquitin-mediated proteolysis and T-cell receptor signaling pathways may predict PD susceptibility. Given this result, it is reasonable to hypothesize that pathway associations are more replicable than individual SNP associations in whole genome association studies. However, this hypothesis is complicated by a detailed comparison of our dataset to the second recent PD association study by Fung et al. [Lancet Neurol 2006;5:911–916]. Surprisingly, we find that the axon guidance pathway does not rank at the very top of the Fung dataset after controlling for pathway size. More generally, in comparing the studies, we find that SNP frequencies replicate well despite technologically different assays, but that both SNP and pathway associations are globally uncorrelated across studies. We thus have a situation in which an association between axon guidance pathway variation and PD has been found in 2 out of 3 studies. We conclude by relating this seeming inconsistency to the molecular heterogeneity of PD, and suggest future analyses that may resolve such discrepancies.
Ontology-Based Meta-Analysis of Global Collections of High-Throughput Public Data
The investigation of the interconnections between the molecular and genetic events that govern biological systems is essential if we are to understand the development of disease and design effective novel treatments. Microarray and next-generation sequencing technologies have the potential to provide this information. However, taking full advantage of these approaches requires that biological connections be made across large quantities of highly heterogeneous genomic datasets. Leveraging the increasingly huge quantities of genomic data in the public domain is fast becoming one of the key challenges in the research community today. We have developed a novel data mining framework that enables researchers to use this growing collection of public high-throughput data to investigate any set of genes or proteins. The connectivity between molecular states across thousands of heterogeneous datasets from microarrays and other genomic platforms is determined through a combination of rank-based enrichment statistics, meta-analyses, and biomedical ontologies. We address data quality concerns through dataset replication and meta-analysis and ensure that the majority of the findings are derived using multiple lines of evidence. As an example of our strategy and the utility of this framework, we apply our data mining approach to explore the biology of brown fat within the context of the thousands of publicly available gene expression datasets. Our work presents a practical strategy for organizing, mining, and correlating global collections of large-scale genomic data to explore normal and disease biology. Using a hypothesis-free approach, we demonstrate how a data-driven analysis across very large collections of genomic data can reveal novel discoveries and evidence to support existing hypotheses.
Defining KIR and HLA Class I Genotypes at Highest Resolution via High-Throughput Sequencing.
The physiological functions of natural killer (NK) cells in human immunity and reproduction depend upon diverse interactions between killer cell immunoglobulin-like receptors (KIRs) and their HLA class I ligands: HLA-A, HLA-B, and HLA-C. The genomic regions containing the KIR and HLA class I genes are unlinked, structurally complex, and highly polymorphic. They are also strongly associated with a wide spectrum of diseases, including infections, autoimmune disorders, cancers, and pregnancy disorders, as well as the efficacy of transplantation and other immunotherapies. To facilitate study of these extraordinary genes, we developed a method that captures, sequences, and analyzes the 13 KIR genes and HLA-A, HLA-B, and HLA-C from genomic DNA. We also devised a bioinformatics pipeline that attributes sequencing reads to specific KIR genes, determines copy number by read depth, and calls high-resolution genotypes for each KIR gene. We validated this method by using DNA from well-characterized cell lines, comparing it to established methods of HLA and KIR genotyping, and determining KIR genotypes from 1000 Genomes sequence data. This identified 116 previously uncharacterized KIR alleles, which were all demonstrated to be authentic by sequencing from source DNA via standard methods. Analysis of just two KIR genes showed that 22% of the 1000 Genomes individuals have a previously uncharacterized allele or a structural variant. The method we describe is suited to the large-scale analyses that are needed for characterizing human populations and defining the precise HLA and KIR factors associated with disease. The methods are applicable to other highly polymorphic genes. This study was supported by U.S. National Institutes of Health grants U01 AI090905, R01 GM109030, R01 AI17892 and U19 AI119350. Authors Steven Norberg and Mostafa Ronaghi are employees of Illumina Inc. This is the author accepted manuscript. It is currently under an indefinite embargo pending publication by Elsevier.
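Determining copy number from read depth generally means comparing the depth over a gene with the depth over a locus whose copy number is known. The sketch below shows that calculation under the assumption of a hypothetical reference locus present in two copies per diploid genome; the normalization, rounding rule, and parameter names are illustrative and are not taken from the pipeline described in the abstract.

```python
import math


def copy_number(gene_depth: float, reference_depth: float,
                reference_copies: int = 2) -> int:
    """Estimate a gene's copy number from mean read depth.

    gene_depth:       mean depth of reads attributed to the gene.
    reference_depth:  mean depth over a reference locus (hypothetical)
                      assumed to be present in `reference_copies` copies.
    """
    if reference_depth <= 0:
        raise ValueError("reference depth must be positive")
    scaled = gene_depth / reference_depth * reference_copies
    # Round half up to the nearest whole copy; Python's built-in round()
    # rounds halves to even, which is not what is wanted here.
    return math.floor(scaled + 0.5)
```

For example, a gene covered at about one and a half times the depth of a two-copy reference would be called at three copies, and a gene with no attributed reads at zero.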
Viral population estimation using pyrosequencing
The diversity of virus populations within single infected hosts presents a
major difficulty for the natural immune response as well as for vaccine design
and antiviral drug therapy. Recently developed pyrophosphate-based sequencing
technologies (pyrosequencing) can be used for quantifying this diversity by
ultra-deep sequencing of virus samples. We present computational methods for
the analysis of such sequence data and apply these techniques to pyrosequencing
data obtained from HIV populations within patients harboring drug-resistant
virus strains. Our main result is the estimation of the population structure of
the sample from the pyrosequencing reads. This inference is based on a
statistical approach to error correction, followed by a combinatorial algorithm
for constructing a minimal set of haplotypes that explain the data. Using this
set of explaining haplotypes, we apply a statistical model to infer the
frequencies of the haplotypes in the population via an EM algorithm. We
demonstrate that pyrosequencing reads allow for effective population
reconstruction by extensive simulations and by comparison to 165 sequences
obtained directly from clonal sequencing of four independent, diverse HIV
populations. Thus, pyrosequencing can be used for cost-effective estimation of
the structure of virus populations, promising new insights into viral
evolutionary dynamics and disease control strategies.
Comment: 23 pages, 13 figures
Highly parallel oligonucleotide purification and functionalization using reversible chemistry
We have developed a cost-effective, highly parallel method for purification and functionalization of 5′-labeled oligonucleotides. The approach is based on 5′-hexa-His phase tag purification, followed by exchange of the hexa-His tag for a functional group using reversible reaction chemistry. These methods are suitable for large-scale (micromole to millimole) production of oligonucleotides and are amenable to highly parallel processing of many oligonucleotides individually or in high complexity pools. Examples of the preparation of 5′-biotin, 95-mer, oligonucleotide pools of >40K complexity at micromole scale are shown. These pools are prepared in up to ~16% yield and 90–99% purity. Approaches for using this method in other applications are also discussed.