159 research outputs found

    Across-cohort QC analyses of GWAS summary statistics from complex traits.

    Genome-wide association studies (GWASs) have been successful in discovering SNP-trait associations for many quantitative traits and common diseases. Typically, the effect sizes of SNP alleles are very small, and this requires large genome-wide association meta-analyses (GWAMAs) to maximize statistical power. The trend towards ever-larger GWAMAs is likely to continue, yet dealing with summary statistics from hundreds of cohorts increases logistical and quality control problems, including unknown sample overlap, and these can lead to both false positive and false negative findings. In this study, we propose four metrics and visualization tools for GWAMA, using summary statistics from cohort-level GWASs. We propose methods to examine the concordance between demographic information and summary statistics, and methods to investigate sample overlap. (I) We use the population genetics Fst statistic to verify the genetic origin of each cohort and their geographic location, and demonstrate using GWAMA data from the GIANT Consortium that geographic locations of cohorts can be recovered and outlier cohorts can be detected. (II) We conduct principal component analysis based on reported allele frequencies, and are able to recover the ancestral information for each cohort. (III) We propose a new statistic that uses the reported allelic effect sizes and their standard errors to identify significant sample overlap or heterogeneity between pairs of cohorts. (IV) To quantify unknown sample overlap across all pairs of cohorts, we propose a method that uses randomly generated genetic predictors, requires no sharing of individual-level genotype data, and does not breach individual privacy.
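
    As an illustration of metric (I), the sketch below computes a simple Wright-style Fst between two cohorts from their reported per-SNP allele frequencies. The estimator (between-cohort variance of allele frequency divided by pooled p(1 - p), summed over SNPs, with equal cohort weights) and the function name `fst_two_cohorts` are assumptions made for this example; the abstract does not say which Fst estimator the authors used.

```python
def fst_two_cohorts(freqs_a, freqs_b):
    """Wright-style Fst for two cohorts (hypothetical helper).

    freqs_a, freqs_b: per-SNP frequencies of the same reference allele,
    in the same SNP order, as reported in each cohort's summary statistics.
    Cohorts are weighted equally (an assumption of this sketch).
    """
    if len(freqs_a) != len(freqs_b):
        raise ValueError("both cohorts must report the same SNPs")
    between = 0.0  # sum over SNPs of the between-cohort variance of p
    pooled = 0.0   # sum over SNPs of p_bar * (1 - p_bar)
    for p_a, p_b in zip(freqs_a, freqs_b):
        p_bar = (p_a + p_b) / 2.0
        between += (p_a - p_b) ** 2 / 4.0
        pooled += p_bar * (1.0 - p_bar)
    if pooled == 0.0:
        raise ValueError("all SNPs are monomorphic in the pooled sample")
    # Ratio of sums rather than mean of per-SNP ratios, which is more
    # stable when some SNPs have low minor allele frequency.
    return between / pooled
```

    Computing this value for every pair of cohorts gives a distance matrix in which cohorts of similar ancestry cluster together, so a cohort whose reported allele frequencies do not match its stated origin would stand out.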

    Novel genetic analysis for case-control genome-wide association studies: quantification of power and genomic prediction accuracy

    Genome-wide association studies (GWAS) are routinely conducted for both quantitative and binary (disease) traits. We present two analytical tools for use in the experimental design of GWAS. Firstly, we present power calculations quantifying power in a unified framework for a range of scenarios. In this context we consider the utility of quantitative scores (e.g. endophenotypes) that may be available on cases only or on both cases and controls. Secondly, we consider the accuracy of prediction of genetic risk from genome-wide SNPs and derive an expression for genomic prediction accuracy using a liability threshold model for disease traits in a case-control design. The expected values based on our derived equations for both power and prediction accuracy agree well with observed estimates from simulations.
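
    To make the idea of a design-stage power calculation concrete, here is a minimal sketch for the simplest scenario only: a single SNP tested against a quantitative trait in unrelated individuals. It uses the generic textbook normal approximation for a two-sided 1-df test and is not the unified framework derived in the paper. The function name `gwas_power`, the parameter `q2` (proportion of trait variance explained by the SNP) and the default genome-wide threshold of 5e-8 are assumptions for this example.

```python
from statistics import NormalDist


def gwas_power(n, q2, alpha=5e-8):
    """Approximate power to detect one SNP in a quantitative-trait GWAS.

    n:     number of unrelated individuals
    q2:    proportion of phenotypic variance explained by the SNP
    alpha: two-sided significance threshold
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0.0 <= q2 < 1.0:
        raise ValueError("q2 must lie in [0, 1)")
    std = NormalDist()
    # Non-centrality parameter of the 1-df association test.
    ncp = n * q2 / (1.0 - q2)
    shift = ncp ** 0.5
    z_crit = std.inv_cdf(1.0 - alpha / 2.0)
    # P(|Z + shift| > z_crit) for a standard normal Z.
    return std.cdf(shift - z_crit) + std.cdf(-shift - z_crit)
```

    For example, `gwas_power(50_000, 0.0005)` gives the approximate power of a 50,000-person study to detect a SNP explaining 0.05% of trait variance at the genome-wide threshold. With `q2 = 0` the function returns `alpha`, as expected under the null.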

    Impact of diagnostic misclassification on estimation of genetic correlations using genome-wide genotypes

    Disorders that share genetic risk factors often are placed in closely related diagnostic categories and treated similarly. Until recently, evidence for shared genetic etiology derived from classical research strategies – coaggregation in family and twin studies. Accumulating sufficient numbers of families was often problematic. However, in the era of genome-wide genotyping, we can now directly estimate the degree of sharing of genetic risk factors between disorders. This strategy is practical even for very rare disorders, where it is infeasible to ascertain informative families. Importantly, the estimates of genetic correlations from genome-wide genotypes are derived using such distant relatives that contamination by shared environmental factors seems unlikely. However, any method that seeks to quantify the shared etiology of disorders assumes they can be distinguished diagnostically from one another without error. Here we investigate the impact of misdiagnosis on estimates of genetic correlation both from traditional family data and from genome-wide genotypes of case–control samples from unrelated individuals. Our analyses show similar results for levels of misdiagnosis in both types of data. In both scenarios, genetic variances and heritabilities tend to be slightly underestimated but genetic correlations are overestimated, sometimes substantially so. For example, two genetically distinct but equally heritable disorders each with prevalence 1%, can generate false-positive estimates of genetic correlations of >0.2 in the presence of 10% reciprocal misdiagnosis. Strategies for minimizing the effects of misdiagnosis in cross-disorder genetic studies are discussed

    The Genetic Interpretation of Area under the ROC Curve in Genomic Profiling

    Genome-wide association studies in human populations have facilitated the creation of genomic profiles which combine the effects of many associated genetic variants to predict risk of disease. The area under the receiver operator characteristic (ROC) curve is a well established measure for determining the efficacy of tests in correctly classifying diseased and non-diseased individuals. We use quantitative genetics theory to provide insight into the genetic interpretation of the area under the ROC curve (AUC) when the test classifier is a predictor of genetic risk. Even when the proportion of genetic variance explained by the test is 100%, there is a maximum value for AUC that depends on the genetic epidemiology of the disease, i.e. either the sibling recurrence risk or heritability and disease prevalence. We derive an equation relating maximum AUC to heritability and disease prevalence. The expression can be reversed to calculate the proportion of genetic variance explained given AUC, disease prevalence, and heritability. We use published estimates of disease prevalence and sibling recurrence risk for 17 complex genetic diseases to calculate the proportion of genetic variance that a test must explain to achieve AUC = 0.75; this varied from 0.10 to 0.74. We provide a genetic interpretation of AUC for use with predictors of genetic risk based on genomic profiles. We provide a strategy to estimate proportion of genetic variance explained on the liability scale from estimates of AUC, disease prevalence, and heritability (or sibling recurrence risk) available as an online calculator
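
    The relationship between AUC, heritability and prevalence described above can be sketched under a liability threshold model with a normal approximation for the predictor's distribution in cases and in controls. The exact expression is not given in the abstract, so the form below is an assumption of this example, as are the function name `auc_liability` and its parameters: `prevalence` is the lifetime disease risk K and `r2` is the variance on the liability scale explained by the predictor. Setting `r2` equal to the heritability of liability gives the maximum AUC under these assumptions.

```python
from statistics import NormalDist


def auc_liability(prevalence, r2):
    """AUC of a genetic predictor under a liability threshold model (sketch).

    prevalence: disease prevalence K, strictly between 0 and 1
    r2:         liability-scale variance explained by the predictor,
                in (0, 1]; use the heritability to get the maximum AUC
    """
    if not 0.0 < prevalence < 1.0:
        raise ValueError("prevalence must lie strictly between 0 and 1")
    if not 0.0 < r2 <= 1.0:
        raise ValueError("r2 must lie in (0, 1]")
    std = NormalDist()
    t = std.inv_cdf(1.0 - prevalence)  # liability threshold
    z = std.pdf(t)                     # normal density at the threshold
    i = z / prevalence                 # mean liability of cases
    v = -z / (1.0 - prevalence)        # mean liability of controls
    # Variance of the predictor within cases and within controls,
    # each reduced by selection on liability.
    var_cases = r2 * (1.0 - r2 * i * (i - t))
    var_controls = r2 * (1.0 - r2 * v * (v - t))
    # Difference in predictor means, scaled by its combined spread.
    return std.cdf((i - v) * r2 / (var_cases + var_controls) ** 0.5)
```

    Calling `auc_liability(0.01, h2)` for a range of `h2` values shows the ceiling on AUC rising with heritability and approaching 0.5 as `r2` approaches zero. Inverting the function numerically, for instance by bisection on `r2`, gives the variance a test must explain to reach a target AUC.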

    The contribution of genetic variants to disease depends on the ruler

    Our understanding of the genetic basis of disease has evolved from descriptions of overall heritability or familiality to the identification of large numbers of risk loci. One can quantify the impact of such loci on disease using a plethora of measures, which can guide future research decisions. However, different measures can attribute varying degrees of importance to a variant. In this Analysis, we consider and contrast the most commonly used measures (specifically, the heritability of disease liability, approximate heritability, sibling recurrence risk, overall genetic variance using a logarithmic relative risk scale, the area under the receiver-operating curve for risk prediction and the population attributable fraction) and give guidelines for their use that should be explicitly considered when assessing the contribution of genetic variants to disease.

    Comprehensive genetic analysis of the human lipidome identifies loci associated with lipid homeostasis with links to coronary artery disease

    We integrated lipidomics and genomics to unravel the genetic architecture of lipid metabolism and identify genetic variants associated with lipid species putatively in the mechanistic pathway for coronary artery disease (CAD). We quantified 596 lipid species in serum from 4,492 individuals from the Busselton Health Study. The discovery GWAS identified 3,361 independent lipid-loci associations, involving 667 genomic regions (479 previously unreported), with validation in two independent cohorts. A meta-analysis revealed an additional 70 independent genomic regions associated with lipid species. We identified 134 lipid endophenotypes for CAD associated with 186 genomic loci. Associations between independent lipid-loci with coronary atherosclerosis were assessed in ~456,000 individuals from the UK Biobank. Of the 53 lipid-loci that showed evidence of association (P < 1 × 10⁻³), 43 loci were associated with at least one lipid endophenotype. These findings illustrate the value of integrative biology to investigate the aetiology of atherosclerosis and CAD, with implications for other complex diseases.

    The National Lung Matrix Trial: translating the biology of stratification in advanced non-small-cell lung cancer

    © The Author 2015. Background: The management of NSCLC has been transformed by stratified medicine. The National Lung Matrix Trial (NLMT) is a UK-wide study exploring the activity of rationally selected biomarker/targeted therapy combinations. Patients and methods: The Cancer Research UK (CRUK) Stratified Medicine Programme 2 is undertaking the large volume national molecular pre-screening which integrates with the NLMT. At study initiation, there are eight drugs being used to target 18 molecular cohorts. The aim is to determine whether there is sufficient signal of activity in any drug-biomarker combination to warrant further investigation. A Bayesian adaptive design that gives a more realistic approach to decision making and flexibility to make conclusions without fixing the sample size was chosen. The screening platform is an adaptable 28-gene Nextera next-generation sequencing platform designed by Illumina, covering the range of molecular abnormalities being targeted. The adaptive design allows new biomarker-drug combination cohorts to be incorporated by substantial amendment. The pre-clinical justification for each biomarker-drug combination has been rigorously assessed, creating molecular exclusion rules and a trumping strategy in patients harbouring concomitant actionable genetic abnormalities. Discrete routes of pathway activation or inactivation determined by cancer genome aberrations are treated as separate cohorts. Key translational analyses include the deep genomic analysis of pre- and post-treatment biopsies, the establishment of patient-derived xenograft models and longitudinal ctDNA collection, in order to define predictive biomarkers, mechanisms of resistance and early markers of response and relapse. Conclusion: The SMP2 platform will provide large scale genetic screening to inform entry into the NLMT, a trial explicitly aimed at discovering novel actionable cohorts in NSCLC.

    CNV-association meta-analysis in 191,161 European adults reveals new loci associated with anthropometric traits

    Funding Information: This research has been conducted using the UK Biobank Resource. This research has been conducted using the Danish National Biobank resource. The authors are grateful to the Raine Study participants and their families, and to the Raine Study research staff for cohort co-ordination and data collection. QIMR is grateful to the twins and their families for their generous participation in these studies. We would like to thank staff at the Queensland Institute of Medical Research: Anjali Henders, Dixie Statham, Lisa Bowdler, Ann Eldridge, and Marlene Grace for sample collection, processing and genotyping, Scott Gordon, Brian McEvoy, Belinda Cornes and Beben Benyamin for data QC and preparation, and David Smyth and Harry Beeby for IT support. HBCS Acknowledgements: We thank all study participants as well as everybody involved in the Helsinki Birth Cohort Study. Helsinki Birth Cohort Study has been supported by grants from the Academy of Finland, the Finnish Diabetes Research Society, Folkhälsan Research Foundation, Novo Nordisk Foundation, Finska Läkaresällskapet, Juho Vainio Foundation, Signe and Ane Gyllenberg Foundation, University of Helsinki, Ministry of Education, Ahokas Foundation, Emil Aaltonen Foundation. Finrisk study is grateful for the THL DNA laboratory for its skillful work to produce the DNA samples used in this study and thanks the Sanger Institute and FIMM genotyping facilities for genotyping the samples. We thank the MOLGENIS team and Genomics Coordination Center of the University Medical Center Groningen for software development and data management, in particular Marieke Bijlsma and Edith Adriaanse. This work was supported by the Leenards Foundation (to Z.K.), the Swiss National Science Foundation (31003A_169929 to Z.K., Sinergia grant CRSII33-133044 to AR), Simons Foundation (SFARI274424 to AR) and SystemsX.ch (51RTP0_151019 to Z.K.). A.R.W., H.Y. and T.M.F. are supported by the European Research Council grant: 323195:SZ-245. 
M.A.T., M.N.W. and An.M. are supported by the Wellcome Trust Institutional Strategic Support Award (WT097835MF). For full funding information of all participating cohorts see Supplementary Note 2. Publisher Copyright: © 2017 The Author(s).
    There are few examples of robust associations between rare copy number variants (CNVs) and complex continuous human traits. Here we present a large-scale CNV association meta-analysis on anthropometric traits in up to 191,161 adult samples from 26 cohorts. The study reveals five CNV associations at 1q21.1, 3q29, 7q11.23, 11p14.2, and 18q21.32 and confirms two known loci at 16p11.2 and 22q11.21, implicating at least one anthropometric trait. The discovered CNVs are recurrent and rare (0.01-0.2%), with large effects on height (> 2.4 cm), weight (> 5 kg), and body mass index (BMI) (> 3.5 kg/m²). Burden analysis shows a 0.41 cm decrease in height, a 0.003 increase in waist-to-hip ratio and an increase in BMI of 0.14 kg/m² for each Mb of total deletion burden (P = 2.5 × 10⁻¹⁰, 6.0 × 10⁻⁵, and 2.9 × 10⁻³). Our study provides evidence that the same genes (e.g., MC4R, FIBIN, and FMO5) harbor both common and rare variants affecting body size and that anthropometric traits share genetic loci with developmental and psychiatric disorders. Peer reviewed.