
    CNVineta: a data mining tool for large case–control copy number variation datasets

    Motivation: Copy number variation (CNV), a major contributor to human genetic variation, comprises genomic deletions and insertions of ≥ 1 kb. Yet the identification of CNVs from microarray data is still hampered by high false-negative and false-positive prediction rates due to the noisy nature of the raw data. Here, we present CNVineta, an R package for rapid data mining and visualization of CNVs in large case–control datasets genotyped with single nucleotide polymorphism oligonucleotide arrays. CNVineta is compatible with various established CNV prediction algorithms, can be used for genome-wide association analysis of rare and common CNVs, and enables rapid and serial display of log2 ratios of the raw data as well as B-allele frequencies for visual quality inspection. In summary, CNVineta aids in the interpretation of large-scale CNV datasets and the prioritization of target regions for follow-up experiments.

    Impact of diagnostic misclassification on estimation of genetic correlations using genome-wide genotypes

    Disorders that share genetic risk factors are often placed in closely related diagnostic categories and treated similarly. Until recently, evidence for shared genetic etiology derived from classical research strategies – coaggregation in family and twin studies. Accumulating sufficient numbers of families was often problematic. However, in the era of genome-wide genotyping, we can now directly estimate the degree of sharing of genetic risk factors between disorders. This strategy is practical even for very rare disorders, where it is infeasible to ascertain informative families. Importantly, the estimates of genetic correlations from genome-wide genotypes are derived using such distant relatives that contamination by shared environmental factors seems unlikely. However, any method that seeks to quantify the shared etiology of disorders assumes they can be distinguished diagnostically from one another without error. Here we investigate the impact of misdiagnosis on estimates of genetic correlation both from traditional family data and from genome-wide genotypes of case–control samples from unrelated individuals. Our analyses show similar results for levels of misdiagnosis in both types of data. In both scenarios, genetic variances and heritabilities tend to be slightly underestimated but genetic correlations are overestimated, sometimes substantially so. For example, two genetically distinct but equally heritable disorders, each with prevalence 1%, can generate false-positive estimates of genetic correlations of >0.2 in the presence of 10% reciprocal misdiagnosis. Strategies for minimizing the effects of misdiagnosis in cross-disorder genetic studies are discussed.

    BR-squared: a practical solution to the winner’s curse in genome-wide scans

    The detrimental effects of the winner’s curse, including overestimation of the genetic effects of associated variants and underestimation of sufficient sample sizes for replication studies, are well recognized in genome-wide association studies (GWAS). These effects can be expected to worsen as the field moves from GWAS into whole genome sequencing. To date, few studies have reported statistical adjustments to the naive estimates, due to the lack of suitable statistical methods and computational tools. We have developed an efficient genome-wide non-parametric method that explicitly accounts for the threshold, ranking, and allele frequency effects in whole genome scans. Here, we implement the method to provide bias-reduced estimates via bootstrap re-sampling (BR-squared) for association studies of both disease status and quantitative traits, and we report the results of applying BR-squared to GWAS of psoriasis and HbA1c. We observed a reduction of over 50% in the estimated genetic effect size for many associated SNPs. This translates into a greater than fourfold increase in sample size requirements for successful replication studies, which in part explains some of the apparent failures in replicating the original signals. Our analysis suggests that adjusting for the winner’s curse is critical for interpreting findings from whole genome scans and planning replication and meta-GWAS studies, as well as in attempts to translate findings into the clinical setting.
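
    The link between a halved effect size and a fourfold sample size can be illustrated with a minimal sketch. It is not the BR-squared method itself: it only simulates threshold selection on a single hypothetical variant and applies the common approximation that the required sample size scales with the inverse square of the effect size at fixed allele frequency, significance level, and power. All function names and numeric values are assumptions chosen for illustration.

```python
import random


def simulate_winners_curse(true_beta, se, z_threshold, n_scans, seed=0):
    """Mean estimated effect among simulated scans that pass the threshold.

    Each scan draws beta_hat ~ Normal(true_beta, se) and is "significant"
    when beta_hat / se exceeds z_threshold (one-sided, for simplicity).
    Returns None if no scan passes.
    """
    rng = random.Random(seed)
    selected = []
    for _ in range(n_scans):
        beta_hat = rng.gauss(true_beta, se)
        if beta_hat / se > z_threshold:
            selected.append(beta_hat)
    if not selected:
        return None
    return sum(selected) / len(selected)


def replication_sample_size_ratio(naive_beta, adjusted_beta):
    """Ratio of required sample sizes, assuming n is proportional to 1 / beta**2."""
    if adjusted_beta == 0:
        raise ValueError("adjusted_beta must be non-zero")
    return (naive_beta / adjusted_beta) ** 2


# Hypothetical variant: true effect 0.10, standard error 0.05, and a
# z threshold of 5.45 (roughly the usual genome-wide significance level).
mean_selected = simulate_winners_curse(0.10, 0.05, 5.45, 200_000)

# An effect estimate that shrinks by 50% after adjustment.
ratio = replication_sample_size_ratio(1.0, 0.5)
```

    Every estimate that passes the threshold must exceed 5.45 × 0.05, so the mean among selected scans is necessarily above the true effect of 0.10. Under the inverse-square assumption, a 50% reduction in effect size gives a ratio of exactly 4, and larger reductions give more than fourfold.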

    In search of causal variants: refining disease association signals using cross-population contrasts

    Background: Genome-wide association (GWA) using large numbers of single nucleotide polymorphisms (SNPs) is now a powerful, state-of-the-art approach to mapping human disease genes. When a GWA study detects association between a SNP and the disease, this signal usually represents association with a set of several highly correlated SNPs in strong linkage disequilibrium. The challenge we address is to distinguish among these correlated loci to highlight potential functional variants and prioritize them for follow-up. Results: We implemented a systematic method for testing association across diverse population samples having differing histories and LD patterns, using a logistic regression framework. The hypothesis is that important underlying biological mechanisms are shared across human populations, and we can filter correlated variants by testing for heterogeneity of genetic effects in different population samples. This approach formalizes the descriptive comparison of p-values that has typified similar cross-population fine-mapping studies to date. We applied this method to correlated SNPs in the cholinergic nicotinic receptor gene cluster CHRNA5-CHRNA3-CHRNB4, in a case-control study of cocaine dependence composed of 504 European-American and 583 African-American samples. Of the 10 SNPs genotyped in the r² ≥ 0.8 bin for rs16969968, three demonstrated significant cross-population heterogeneity and are filtered from priority follow-up; the remaining SNPs include rs16969968 (heterogeneity p = 0.75). Though the power to filter out rs16969968 is reduced due to the difference in allele frequency in the two groups, the results nevertheless focus attention on a smaller group of SNPs that includes the non-synonymous SNP rs16969968, which retains a similar effect size (odds ratio) across both population samples. Conclusion: Filtering out SNPs that demonstrate cross-population heterogeneity enriches for variants more likely to be important and causative. Our approach provides an important and effective tool to help interpret results from the many GWA studies now underway.
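
    The idea of filtering SNPs by cross-population heterogeneity can be sketched with a simple stand-in. The abstract describes a logistic regression framework; the code below instead uses a Wald z-test on the difference between two population-specific log odds ratios, which is a simpler, swapped-in technique and not the authors' model. The function name and all input values are hypothetical.

```python
import math


def heterogeneity_z_test(log_or_a, se_a, log_or_b, se_b):
    """Wald test for a difference between two independent log odds ratios.

    Returns (z, two_sided_p). A small p-value suggests the SNP's effect
    differs between the two population samples.
    """
    z = (log_or_a - log_or_b) / math.sqrt(se_a ** 2 + se_b ** 2)
    # Two-sided normal tail probability, written with the complementary
    # error function from the standard library.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p


# Hypothetical SNP with the same effect in both samples: no heterogeneity.
z_same, p_same = heterogeneity_z_test(0.25, 0.10, 0.25, 0.12)

# Hypothetical SNP with opposite effects: strong heterogeneity.
z_diff, p_diff = heterogeneity_z_test(0.30, 0.10, -0.30, 0.10)
```

    Under this sketch, a SNP like the second one would be filtered from priority follow-up, while a SNP like the first would be retained. In practice the same question is usually posed as a genotype-by-population interaction term in a joint logistic regression.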

    A two-stage meta-analysis identifies several new loci for Parkinson's Disease

    A previous genome-wide association (GWA) meta-analysis of 12,386 PD cases and 21,026 controls conducted by the International Parkinson's Disease Genomics Consortium (IPDGC) discovered or confirmed 11 Parkinson's disease (PD) loci. This first analysis of the two-stage IPDGC study focused on the set of loci that passed genome-wide significance in the first stage GWA scan. However, the second stage genotyping array, the ImmunoChip, included a larger set of 1,920 SNPs selected on the basis of the GWA analysis. Here, we analyzed this set of 1,920 SNPs, and we identified five additional PD risk loci (combined p < 5×10⁻¹⁰; PARK16/1q32, STX1B/16p11, FGF20/8p22, STBD1/4q21, and GPNMB/7p15). Two of these five loci have been suggested by previous association studies (PARK16/1q32, FGF20/8p22), and this study provides further support for these findings. Using a dataset of post-mortem brain samples assayed for gene expression (n = 399) and methylation (n = 292), we identified methylation and expression changes associated with PD risk variants in PARK16/1q32, GPNMB/7p15, and STX1B/16p11 loci, hence suggesting potential molecular mechanisms and candidate genes at these risk loci.

    Copy-Number Variation: The Balance between Gene Dosage and Expression in Drosophila melanogaster

    Copy-number variants (CNVs) reshape gene structure, modulate gene expression, and contribute to significant phenotypic variation. Previous studies have revealed CNV patterns in natural populations of Drosophila melanogaster and suggested that selection and mutational bias shape genomic patterns of CNV. Although previous CNV studies focused on heterogeneous strains, here, we established a number of second-chromosome substitution lines to uncover CNV characteristics when homozygous. The percentage of genes harboring CNVs is higher than found in previous studies. More CNVs are detected in homozygous than heterozygous substitution strains, suggesting the comparative genomic hybridization arrays underestimate CNV owing to heterozygous masking. We incorporated previous gene expression data collected from some of the same substitution lines to investigate relationships between CNV gene dosage and expression. Most genes present in CNVs show no evidence of increased or diminished transcription, and the fraction of such dosage-insensitive CNVs is greater in heterozygotes. More than 70% of the dosage-sensitive CNVs are recessive with undetectable effects on transcription in heterozygotes. A deficiency of singletons in recessive dosage-sensitive CNVs supports the hypothesis that most CNVs are subject to negative selection. On the other hand, relaxed purifying selection might account for the higher number of protein–protein interactions in dosage-insensitive CNVs than in dosage-sensitive CNVs. Dosage-sensitive CNVs that are upregulated and downregulated coincide with copy-number increases and decreases. Our results help clarify the relation between CNV dosage and gene expression in the D. melanogaster genome

    Common Variants at 10 Genomic Loci Influence Hemoglobin A1C Levels via Glycemic and Nonglycemic Pathways

    Objective: Glycated hemoglobin (HbA1c), used to monitor and diagnose diabetes, is influenced by average glycemia over a 2- to 3-month period. Genetic factors affecting expression, turnover, and abnormal glycation of hemoglobin could also be associated with increased levels of HbA1c. We aimed to identify such genetic factors and investigate the extent to which they influence diabetes classification based on HbA1c levels. Research design and methods: We studied associations with HbA1c in up to 46,368 nondiabetic adults of European descent from 23 genome-wide association studies (GWAS) and 8 cohorts with de novo genotyped single nucleotide polymorphisms (SNPs). We combined studies using inverse-variance meta-analysis and tested mediation by glycemia using conditional analyses. We estimated the global effect of HbA1c loci using a multilocus risk score, and used net reclassification to estimate genetic effects on diabetes screening. Results: Ten loci reached genome-wide significant association with HbA1c, including six new loci near FN3K (lead SNP/P value, rs1046896/P = 1.6 × 10⁻²⁶), HFE (rs1800562/P = 2.6 × 10⁻²⁰), TMPRSS6 (rs855791/P = 2.7 × 10⁻¹⁴), ANK1 (rs4737009/P = 6.1 × 10⁻¹²), SPTA1 (rs2779116/P = 2.8 × 10⁻⁹) and ATP11A/TUBGCP3 (rs7998202/P = 5.2 × 10⁻⁹), and four known HbA1c loci: HK1 (rs16926246/P = 3.1 × 10⁻⁵⁴), MTNR1B (rs1387153/P = 4.0 × 10⁻¹¹), GCK (rs1799884/P = 1.5 × 10⁻²⁰) and G6PC2/ABCB11 (rs552976/P = 8.2 × 10⁻¹⁸). We show that associations with HbA1c are partly a function of hyperglycemia associated with 3 of the 10 loci (GCK, G6PC2 and MTNR1B). The seven nonglycemic loci accounted for a 0.19 (% HbA1c) difference between the extreme 10% tails of the risk score, and would reclassify approximately 2% of a general white population screened for diabetes with HbA1c. Conclusions: GWAS identified 10 genetic loci reproducibly associated with HbA1c. Six are novel and seven map to loci where rarer variants cause hereditary anemias and iron storage disorders. Common variants at these loci likely influence HbA1c levels via erythrocyte biology, and confer a small but detectable reclassification of diabetes diagnosis by HbA1c. Diabetes 59: 3229-3239, 201
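
    The inverse-variance meta-analysis mentioned in the abstract can be sketched as follows. This is a generic fixed-effect version under the assumption that each study reports an effect estimate and its standard error; it is not the consortium's pipeline, and the function name and example values are hypothetical.

```python
import math


def inverse_variance_meta(betas, standard_errors):
    """Fixed-effect inverse-variance meta-analysis.

    Each study is weighted by 1 / se**2. Returns the pooled effect and its
    standard error.
    """
    if not betas or len(betas) != len(standard_errors):
        raise ValueError("need one standard error per effect estimate")
    weights = [1.0 / se ** 2 for se in standard_errors]
    total_weight = sum(weights)
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return pooled_beta, pooled_se


# Two hypothetical studies of equal precision: the pooled effect is their
# simple average, and the pooled standard error shrinks by sqrt(2).
pooled_beta, pooled_se = inverse_variance_meta([0.10, 0.30], [0.10, 0.10])
```

    More precise studies receive larger weights, so a large cohort with a small standard error dominates the pooled estimate. With unequal standard errors the pooled effect moves toward the more precise study rather than the simple average.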

    Gene set-based analysis of polymorphisms: finding pathways or biological processes associated to traits in genome-wide association studies

    Genome-wide association studies have become a popular strategy for finding associations between genes and traits of interest. Despite the high resolution available today for genotyping studies, their success in real applications has been limited by the testing strategy used. As an alternative to brute-force solutions involving very large cohorts, we propose Gene Set Analysis (GSA), a different analysis strategy based on testing the association of modules of functionally related genes. We show here how the Gene Set-based Analysis of Polymorphisms (GeSBAP), a simple implementation of the GSA strategy for the analysis of genome-wide association studies, provides a significant increase in testing power for this type of study. GeSBAP is freely available at http://bioinfo.cipf.es/gesbap