
    Quantifying single nucleotide variant detection sensitivity in exome sequencing

    BACKGROUND: The targeted capture and sequencing of genomic regions has rapidly demonstrated its utility in genetic studies. Inherent in this technology is considerable heterogeneity of target coverage, and this is expected to systematically impact our sensitivity to detect genuine polymorphisms. To fully interpret the polymorphisms identified in a genetic study, it is often essential both to detect polymorphisms and to understand where, and with what probability, real polymorphisms may have been missed. RESULTS: Using down-sampling of 30 deeply sequenced exomes and a set of gold-standard single nucleotide variant (SNV) genotype calls for each sample, we developed an empirical model relating the read depth at a polymorphic site to the probability of calling the correct genotype at that site. We find that measured sensitivity in SNV detection is substantially worse than that predicted from the naive expectation of sampling from a binomial distribution. This calibrated model allows us to produce single-nucleotide-resolution SNV sensitivity estimates, which can be merged to give summary sensitivity measures for any arbitrary partition of the target sequences (nucleotide, exon, gene, pathway, exome). These metrics are directly comparable between platforms and can be combined across samples to give “power estimates” for an entire study. We estimate that a local read depth of 13X is required to detect the alleles and genotype of a heterozygous SNV 95% of the time, but only 3X for a homozygous SNV. At a mean on-target read depth of 20X, commonly used in rare disease exome sequencing studies, we predict that 5–15% of heterozygous and 1–4% of homozygous SNVs in the targeted regions will be missed. CONCLUSIONS: Non-reference alleles in the heterozygous state have a high chance of being missed when commonly applied read coverage thresholds are used, despite the widely held assumption that polymorphism detection is good at these coverage levels. 
Such alleles are likely to be of functional importance in population-based studies of rare diseases, in studies of somatic mutations in cancer, and in explaining the “missing heritability” of quantitative traits.
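
The naive binomial expectation that the measured sensitivity is compared against can be sketched as follows. This is a minimal sketch, not the study's calibrated empirical model: it assumes that reads at a heterozygous site sample each allele with probability 0.5, ignores sequencing, mapping and allele-bias error, and treats a site as detected when at least `min_alt_reads` alternate reads are seen (a hypothetical caller threshold). All function names are illustrative. Averaging per-site values mirrors the idea of merging nucleotide-level estimates over an arbitrary partition such as an exon or gene.

```python
from math import comb


def naive_het_sensitivity(depth: int, min_alt_reads: int = 3) -> float:
    """P(at least `min_alt_reads` alternate reads) when alt reads ~ Binomial(depth, 0.5).

    Naive model only: no sequencing, mapping or allele-bias error is modelled.
    """
    if depth < min_alt_reads:
        return 0.0
    # Probability of seeing fewer than `min_alt_reads` alternate reads (a missed call).
    p_miss = sum(comb(depth, k) for k in range(min_alt_reads)) / 2 ** depth
    return 1.0 - p_miss


def min_depth_for(target: float, min_alt_reads: int = 3, max_depth: int = 500) -> int:
    """Smallest depth at which the naive heterozygote sensitivity reaches `target`."""
    for depth in range(max_depth + 1):
        if naive_het_sensitivity(depth, min_alt_reads) >= target:
            return depth
    raise ValueError("target sensitivity not reached within max_depth")


def region_sensitivity(depths: list[int], min_alt_reads: int = 3) -> float:
    """Mean per-site sensitivity over a region (e.g. an exon), given per-site depths."""
    if not depths:
        raise ValueError("region has no sites")
    per_site = [naive_het_sensitivity(d, min_alt_reads) for d in depths]
    return sum(per_site) / len(per_site)


if __name__ == "__main__":
    print(min_depth_for(0.95))  # 11
    print(round(region_sensitivity([5, 12, 20, 35]), 3))
```

With a three-read threshold this naive model reaches 95% heterozygote sensitivity at 11X; the study's point is that real data fall short of this kind of binomial expectation, which is why an empirically calibrated model is needed.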

    Type 2 diabetes genetic association database manually curated for the study design and odds ratio

    BACKGROUND: The prevalence of type 2 diabetes has reached epidemic proportions worldwide, and the incidence of life-threatening complications arising from continued exposure of tissues to high glucose levels is increasing. Advances in genotyping technology have increased the scale and accuracy of genotype data, so that genetic association studies have expanded enormously. Consequently, it is difficult to search the published association data efficiently. Several databases of association results have been constructed, but they have limitations for researchers: some provide only genome-wide association data, some focus on integrative data rather than on association, and some are not user-friendly. In this study, a user-friendly database of manually curated type 2 diabetes genetic association information was constructed. DESCRIPTION: The list of publications used in this study was collected from the HuGE Navigator, an online database of published genome epidemiology literature. Because the type 2 diabetes genetic association database (T2DGADB) aims to provide specialized information on the genetic risk factors involved in the development of type 2 diabetes, 701 of the 1,771 type 2 diabetes publications, those reporting case-control studies of disease development, were extracted. CONCLUSIONS: In the database, the association results were grouped as either positive or negative. Gene and SNP names were replaced with gene symbols and rs numbers, the association p-values were determined manually, and the results are displayed in graphs and tables. In addition, study design details from each publication, such as population type and size, are described. 
This database can be used for research purposes, such as association and functional studies of type 2 diabetes-related genes, and as a primary genetic resource for constructing a diabetes risk test in preparation for future personalized medicine.
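
Because the database is organised around case-control odds ratios, it may help to see how an odds ratio and its confidence interval come out of a 2×2 table. This is a minimal sketch using the standard Woolf (log-scale) interval; the counts are hypothetical, and the rule that labels a result "positive" when the interval excludes 1 is an assumption made for illustration, since the abstract does not state how results were grouped.

```python
from math import exp, log, sqrt


def odds_ratio_ci(a: int, b: int, c: int, d: int, z: float = 1.96) -> tuple[float, float, float]:
    """Odds ratio with a Woolf (log-scale) confidence interval from a 2x2 table.

    a: cases carrying the risk allele     b: cases without it
    c: controls carrying the risk allele  d: controls without it
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("zero cell: apply a continuity correction before using this sketch")
    odds_ratio = (a * d) / (b * c)
    se_log_or = sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of ln(OR)
    lower = exp(log(odds_ratio) - z * se_log_or)
    upper = exp(log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


def classify(lower: float, upper: float) -> str:
    """Assumed rule for this sketch: 'positive' if the CI excludes 1, else 'negative'."""
    return "positive" if lower > 1 or upper < 1 else "negative"


if __name__ == "__main__":
    # Hypothetical counts, not taken from T2DGADB.
    or_, lo, hi = odds_ratio_ci(a=30, b=70, c=20, d=80)
    print(f"OR={or_:.2f} (95% CI {lo:.2f}-{hi:.2f}) -> {classify(lo, hi)}")
```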

    Multi-locus Test Conditional on Confirmed Effects Leads to Increased Power in Genome-wide Association Studies

    Complex diseases or phenotypes may involve multiple genetic variants and interactions among genetic, environmental and other factors. Genome-wide association studies (GWAS) have mostly used single-locus analysis and have identified genetic effects that have been confirmed multiple times. Such confirmed single-nucleotide polymorphism (SNP) effects are likely to be true genetic effects, and ignoring this information when testing new effects on the same phenotype decreases statistical power, because the omitted effects remain in, and inflate, the residual variance. In this study, a multi-locus association test (MLT) conditional on SNPs with confirmed effects was proposed for GWAS analysis to improve statistical power. Analytical formulae for statistical power were derived, and verified by simulation, both for the MLT accounting for confirmed SNPs and for the single-locus test (SLT) without accounting for confirmed SNPs. The statistical power of the two methods was compared in case studies using simulated data and Framingham Heart Study (FHS) GWAS data. The MLT method had higher statistical power than the SLT. In the GWAS case study of four cholesterol phenotypes and serum metabolites, the MLT method improved statistical power by 5% to 38%, depending on the number and effect sizes of the conditional SNPs. For the analysis of HDL cholesterol (HDL-C) and total cholesterol (TC) in the FHS data, the MLT method conditional on confirmed SNPs from the GWAS catalog and NCBI yielded considerably more significant results than the SLT.
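
The residual-variance argument can be illustrated with a standard normal-approximation power calculation for a single additively coded SNP. This is a minimal sketch under stated assumptions, not the formulae derived in the study: it assumes a quantitative trait, Hardy-Weinberg equilibrium, confirmed SNPs that are unlinked to the tested SNP, and a hypothetical significance level of 5e-8; the function names are illustrative. Conditioning on the confirmed SNPs is modelled purely as removing their share of variance from the residual, which raises the non-centrality parameter.

```python
from statistics import NormalDist


def wald_power(n: int, beta: float, maf: float, resid_var: float, alpha: float = 5e-8) -> float:
    """Approximate power of a two-sided 1-df test for an additively coded SNP.

    Uses the non-centrality lambda = n * beta^2 * Var(g) / residual variance,
    with Var(g) = 2 * maf * (1 - maf) under Hardy-Weinberg equilibrium.
    """
    std_normal = NormalDist()
    var_g = 2 * maf * (1 - maf)
    ncp = n * beta ** 2 * var_g / resid_var
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    root = ncp ** 0.5
    return std_normal.cdf(root - z_crit) + std_normal.cdf(-root - z_crit)


def compare_slt_mlt(n, beta, maf, total_var, confirmed_var, alpha=5e-8):
    """Power without (SLT) and with (MLT) adjustment for confirmed SNP effects.

    Assumes the confirmed SNPs are independent of (not in LD with) the tested SNP,
    so conditioning on them only removes `confirmed_var` from the residual variance.
    """
    tested_var = beta ** 2 * 2 * maf * (1 - maf)
    slt = wald_power(n, beta, maf, total_var - tested_var, alpha)
    mlt = wald_power(n, beta, maf, total_var - tested_var - confirmed_var, alpha)
    return slt, mlt


if __name__ == "__main__":
    # Hypothetical scenario: confirmed SNPs jointly explain 10% of phenotypic variance.
    slt, mlt = compare_slt_mlt(n=5000, beta=0.1, maf=0.3, total_var=1.0, confirmed_var=0.10)
    print(f"SLT power {slt:.3f}, MLT power {mlt:.3f}")
```

Setting `confirmed_var` to 0 makes the two powers identical, the expected special case when there are no confirmed effects to condition on.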

    SMN1 dosage analysis in spinal muscular atrophy from India

    BACKGROUND: Spinal muscular atrophy (SMA) is the second most common fatal autosomal recessive disorder after cystic fibrosis. Because of the high carrier frequency, the burden of this genetic disorder is very heavy in developing countries such as India. As there is no cure or effective treatment, genetic counseling is very important in disease management. SMN1 dosage analysis can be used to identify carriers before prenatal diagnosis is offered in the context of genetic counseling. METHODS: In the present study, we analyzed the carrier status of parents and siblings of proven SMA patients. In addition, SMN1 copy number was determined in suspected SMA patients and in parents of children with a clinical diagnosis of SMA. RESULTS: Twenty-nine DNA samples were analyzed by quantitative PCR to determine the number of SMN1 gene copies present, and 17 of these were found to have one SMN1 gene copy. The parents of confirmed SMA patients were found to be obligate carriers of the disease. Dosage analysis was useful in ruling out a clinical suspicion of SMA in four patients. In a family with a history of a deceased floppy infant and two abortions, both parents were found to be carriers of SMA, and prenatal diagnosis could be offered in future pregnancies. CONCLUSION: SMN1 copy number analysis is an important tool for identifying couples at risk of having a child affected with SMA, and it reduces unwarranted prenatal diagnosis for SMA. Dosage analysis is also useful for counseling in clinically suspected SMA with a negative diagnostic SMA test.
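
The abstract does not say how copy number was derived from the quantitative PCR data. One common approach is the comparative Ct (2^-ΔΔCt) method, in which SMN1 is normalised to a reference gene and compared with a calibrator sample of known copy number; the sketch below assumes that approach, a two-copy calibrator, and roughly 100% amplification efficiency. The Ct values and function names are hypothetical, not the study's protocol.

```python
from typing import Optional


def smn1_copy_number(ct_smn1: Optional[float], ct_ref: float,
                     calib_ct_smn1: float, calib_ct_ref: float,
                     calib_copies: int = 2) -> float:
    """Estimate SMN1 copies by the comparative Ct (2^-ddCt) method.

    ct_smn1 / ct_ref: threshold cycles for SMN1 and a hypothetical two-copy reference
    gene in the test sample; calib_*: the same for a calibrator assumed to carry
    `calib_copies` SMN1 copies. ct_smn1=None means no SMN1 amplification.
    """
    if ct_smn1 is None:
        return 0.0
    delta_sample = ct_smn1 - ct_ref
    delta_calib = calib_ct_smn1 - calib_ct_ref
    ratio = 2 ** -(delta_sample - delta_calib)  # relative SMN1 dosage vs calibrator
    return calib_copies * ratio


def interpret(copies: float) -> str:
    """Round to the nearest integer copy number and give a coarse label."""
    n = round(copies)
    if n == 0:
        return "0 copies: consistent with homozygous SMN1 deletion"
    if n == 1:
        return "1 copy: carrier"
    return f"{n} copies"


if __name__ == "__main__":
    # Hypothetical Ct values: the sample's SMN1 amplifies one cycle later than the calibrator's.
    copies = smn1_copy_number(ct_smn1=26.0, ct_ref=24.0, calib_ct_smn1=25.0, calib_ct_ref=24.0)
    print(f"{copies:.2f} -> {interpret(copies)}")
```

A sample whose SMN1-minus-reference ΔCt is one cycle higher than the calibrator's gets a ratio of 0.5 and is read as a single copy, the pattern reported for 17 of the 29 samples.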

    Routes for breaching and protecting genetic privacy

    We are entering the era of ubiquitous genetic information for research, clinical care, and personal curiosity. Sharing these datasets is vital for rapid progress in understanding the genetic basis of human diseases. However, one growing concern is the ability to protect the genetic privacy of the data originators. Here, we technically map threats to genetic privacy and discuss potential mitigation strategies for privacy-preserving dissemination of genetic data. Comment: Draft for comment.

    A Two-Stage Random Forest-Based Pathway Analysis Method

    Pathway analysis provides a powerful approach for identifying the joint effect on disease of genes grouped into biologically based pathways. Pathway analysis is also an attractive approach for the secondary analysis of genome-wide association study (GWAS) data, which may still yield new results from these valuable datasets. Most current pathway analysis methods focus on testing the cumulative main effects of genes in a pathway. However, for complex diseases, gene-gene interactions are expected to play a critical role in disease etiology. We extended a random forest-based method for pathway analysis by incorporating a two-stage design. We used simulations to verify that the proposed method has the correct type I error rate, and to show that it is more powerful than the original random forest-based pathway approach and the set-based test implemented in PLINK in the presence of gene-gene interactions. Finally, we applied the method to a breast cancer GWAS dataset and a lung cancer GWAS dataset and identified interesting pathways with implications for breast and lung cancers.

    An Open Access Database of Genome-wide Association Results

    BACKGROUND: The number of genome-wide association studies (GWAS) is growing rapidly, leading to the discovery and replication of many new disease loci. Combining results from multiple GWAS datasets may strengthen previous conclusions and suggest new disease loci, pathways or pleiotropic genes. However, no database or centralized resource currently exists that contains anywhere near the full scope of GWAS results. METHODS: We collected available results from 118 GWAS articles into a database of 56,411 significant SNP-phenotype associations and accompanying information, and we make this database freely available here. In doing so, we encountered a number of challenges to creating an open access database of GWAS results, which we describe here. Through preliminary analyses and characterization of available GWAS, we demonstrate the potential to gain new insights by querying a database across GWAS. RESULTS: Using a genomic bin-based density analysis to search for highly associated regions of the genome, we detected positive control loci (e.g., MHC loci) with high sensitivity. Likewise, an analysis of SNPs repeated across many GWAS identified replicated loci (e.g., APOE, LPL). At the same time, we identified novel, highly suggestive loci for a variety of traits that did not meet genome-wide significance thresholds in prior analyses, in some cases with strong support from the primary medical genetics literature (SLC16A7, CSMD1, OAS1), suggesting that these genes merit further study. Additional adjustment for linkage disequilibrium within most regions with a high density of GWAS associations did not materially alter our findings. Having a centralized database with standardized gene annotation also allowed us to examine the representation of functional gene categories (gene ontologies) containing one or more associations among top GWAS results. 
Genes relating to cell adhesion functions were highly over-represented among significant associations (p < 4.6 × 10⁻¹⁴), a finding that was not perturbed by a sensitivity analysis. CONCLUSION: We provide access to a full gene-annotated GWAS database that could be used for further querying, analyses or integration with other genomic information. We make a number of general observations. Of reported associated SNPs, 40% lie within the boundaries of a RefSeq gene and 68% are within 60 kb of one, indicating a bias toward gene-centricity in the findings. We found considerable heterogeneity in the information available from GWAS, suggesting that the wider community could benefit from standardization and centralization of results reporting.
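
The genomic bin-based density analysis can be sketched as counting significant associations in fixed-width windows along each chromosome and flagging windows with many hits. The 1 Mb bin width, the `min_count` cut-off and the example coordinates are assumptions for illustration only; the study's actual bin size and density criterion are not given in the abstract, and this sketch omits the linkage-disequilibrium adjustment it mentions.

```python
from collections import Counter


def bin_density(associations, bin_size=1_000_000):
    """Count significant SNP-phenotype associations per (chromosome, bin index)."""
    return Counter((chrom, pos // bin_size) for chrom, pos in associations)


def dense_bins(associations, bin_size=1_000_000, min_count=3):
    """Return (chromosome, start, end, count) for bins with at least `min_count` hits,
    densest first. No linkage-disequilibrium pruning is applied here."""
    counts = bin_density(associations, bin_size)
    hits = [(chrom, idx * bin_size, (idx + 1) * bin_size, n)
            for (chrom, idx), n in counts.items() if n >= min_count]
    return sorted(hits, key=lambda hit: hit[3], reverse=True)


if __name__ == "__main__":
    # Hypothetical (chromosome, base-pair position) pairs for illustration only.
    snps = [("6", 31_200_000), ("6", 31_550_000), ("6", 31_900_000),
            ("19", 45_400_000), ("11", 2_100_000)]
    for chrom, start, end, n in dense_bins(snps):
        print(f"chr{chrom}:{start}-{end}  {n} associations")
```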

    Prediction of susceptibility to major depression by a model of interactions of multiple functional genetic variants and environmental factors

    Major depressive disorder (MDD) is the most common psychiatric disorder and the second leading cause of disability overall. Even though a significant amount of the variance in the MDD phenotype is explained by inheritance, the specific genetic variants known to confer susceptibility to MDD explain only a small proportion of MDD risk. Moreover, genome-wide association studies have identified only two loci, each of small effect, that reach genome-wide significance. In this study, a group of Mexican-American patients with MDD and controls, recruited for a pharmacogenetic study, were genotyped for nonsynonymous single-nucleotide polymorphisms (nsSNPs), and the interactions of multiple functional genetic variants were explored with risk-classification tree analysis. The risk-classification tree analysis model and linkage disequilibrium blocks were used to replicate the exploratory findings in the database of Genotypes and Phenotypes (dbGaP) for major depression, and pathway analysis was performed to explore potential biological mechanisms using the branching events. In exploratory analyses, risk-classification tree analysis using 15 nsSNPs with a nominal association with MDD diagnosis identified multiple genotype clusters with increased MDD risk, as well as significant additive interactions in combinations of genotype variants that were significantly associated with MDD. The dbGaP data for major depression revealed a multidimensional dependent phenotype consisting of MDD plus significant modifiers (smoking, marital status, age, alcohol abuse/dependence and gender), which was then used for the association tree analysis. The reconstructed tree analysis of the dbGaP data showed robust reliability and replicated most of the genes involved in the branching process found in our exploratory analyses. 
Pathway analysis using all six major branching events (PSMD9, HSD3B1, BDNF, GHRHR, PDE6C and PDLIM5) was significant for positive regulation of cellular and biological processes relevant to growth and organ development. Our findings not only provide important insights into the biological pathways underlying innate susceptibility to MDD but also offer a predictive framework based on interactions of multiple functional genetic variants and environmental factors. These findings identify novel targets for therapeutics and for translation into preventive, clinical and personalized health care.
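
A risk-classification tree is built by recursive partitioning on genotypes. The abstract does not name the splitting criterion, so the sketch below assumes a CART-style Gini impurity and shows only how the root split would be chosen among SNPs coded as 0/1/2 minor-allele counts. The SNP names, genotypes and case-control labels are hypothetical, and environmental modifiers such as smoking or alcohol use could be added as further candidate split variables in the same way.

```python
def gini(labels: list[int]) -> float:
    """Gini impurity of a list of 0/1 case-control labels (1 = case)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def best_split(genotypes: dict[str, list[int]], status: list[int]):
    """Pick the SNP and genotype cut-off giving the lowest weighted Gini impurity.

    genotypes: hypothetical SNP name -> minor-allele counts (0, 1 or 2) per person.
    Returns (impurity, snp, threshold), splitting on genotype < threshold vs >= threshold,
    or None if no SNP separates the sample into two non-empty groups.
    """
    n = len(status)
    best = None
    for snp, calls in genotypes.items():
        # threshold 1: dominant-style split (0 vs 1/2); threshold 2: recessive-style (0/1 vs 2)
        for threshold in (1, 2):
            left = [s for g, s in zip(calls, status) if g < threshold]
            right = [s for g, s in zip(calls, status) if g >= threshold]
            if not left or not right:
                continue  # the split must separate the sample into two groups
            impurity = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or impurity < best[0]:
                best = (impurity, snp, threshold)
    return best


if __name__ == "__main__":
    status = [1, 1, 1, 0, 0, 0]
    genotypes = {"snpA": [2, 1, 1, 0, 0, 0], "snpB": [0, 1, 2, 0, 1, 2]}
    print(best_split(genotypes, status))  # (0.0, 'snpA', 1)
```

A full tree would apply `best_split` recursively within each child node until a stopping rule (for example a minimum node size or maximum depth) is met.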

    VKORC1 Common Variation and Bone Mineral Density in the Third National Health and Nutrition Examination Survey

    Osteoporosis, defined by low bone mineral density (BMD), is common among postmenopausal women. The distribution of BMD varies across populations and is shaped by both environmental and genetic factors. Because the candidate gene vitamin K epoxide reductase complex subunit 1 (VKORC1) generates vitamin K quinone, a cofactor for the gamma-carboxylation of bone-related proteins such as osteocalcin, we hypothesized that VKORC1 genetic variants may be associated with BMD and osteoporosis in the general population. To test this hypothesis, we genotyped six VKORC1 SNPs in 7,159 individuals from the Third National Health and Nutrition Examination Survey (NHANES III). NHANES III is a nationally representative sample linked to health and lifestyle variables, including BMD, which was measured by dual-energy X-ray absorptiometry (DEXA) in four regions of the proximal femur. In adjusted models stratified by race/ethnicity and sex, SNPs rs9923231 and rs9934438 were associated with increased BMD (p = 0.039 and 0.024, respectively), while rs8050894 was associated with decreased BMD (p = 0.016), among non-Hispanic black males (n = 619). VKORC1 rs2884737 was associated with decreased BMD among Mexican-American males (n = 795; p = 0.004). We then tested for associations between VKORC1 SNPs and osteoporosis, but the results did not mirror the associations observed between VKORC1 and BMD, possibly because of the small number of cases. This is the first report of VKORC1 common genetic variation associated with BMD, and one of the few reports that investigate the genetics of BMD and osteoporosis in diverse populations.