34 research outputs found

    Analysis of case-control association studies with known risk variants

    Motivation: The question of how to best use information from known associated variants when conducting disease association studies has yet to be answered. Some studies compute a marginal P-value for each single nucleotide polymorphism (SNP) independently, ignoring previously discovered variants. Other studies include known variants as covariates in logistic regression, but a weakness of this standard conditioning strategy is that it does not account for disease prevalence and non-random ascertainment, which can induce a correlation structure between candidate variants and known associated variants even if the variants lie on different chromosomes. Here, we propose a new conditioning approach, which is based in part on the classical technique of liability threshold modeling. Roughly, this method estimates model parameters for each known variant while accounting for the published disease prevalence from the epidemiological literature. Results: We show via simulation and application to empirical datasets that our approach outperforms both the no-conditioning strategy and the standard conditioning strategy, with a properly controlled false-positive rate. Furthermore, in multiple datasets involving diseases of low prevalence, standard conditioning produces a severe drop in test statistics, whereas our approach generally performs as well as or better than no conditioning. Our approach may substantially improve disease gene discovery for diseases with many known risk variants. Availability: LTSOFT software is available online at http://www.hsph.harvard.edu/faculty/alkes-price/software/ Contact: [email protected]; [email protected] Supplementary information: Supplementary data are available at Bioinformatics online.
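
The liability threshold idea mentioned in this abstract can be illustrated with a minimal sketch. It assumes the textbook form of the model (standard normal liability, with disease occurring above a threshold set by the prevalence) and is not the LTSOFT implementation; the function names `liability_threshold` and `mean_liability` are hypothetical.

```python
from statistics import NormalDist


def liability_threshold(prevalence: float) -> float:
    """Threshold t with P(liability > t) = prevalence for N(0, 1) liability."""
    return NormalDist().inv_cdf(1.0 - prevalence)


def mean_liability(prevalence: float) -> tuple[float, float]:
    """Expected liability of cases and of controls (truncated-normal means)."""
    t = liability_threshold(prevalence)
    density = NormalDist().pdf(t)
    case_mean = density / prevalence
    control_mean = -density / (1.0 - prevalence)
    return case_mean, control_mean


if __name__ == "__main__":
    # Illustrative prevalences only; they are not taken from the paper.
    for k in (0.01, 0.10):
        cases, controls = mean_liability(k)
        print(f"K={k:.2f}  t={liability_threshold(k):.3f}  "
              f"E[L|case]={cases:.3f}  E[L|control]={controls:.3f}")
```

In this sketch, a lower prevalence pushes the threshold up and makes cases more extreme on the liability scale, which is one way to see why prevalence matters when conditioning in an ascertained sample.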

    Multiethnic Genetic Association Studies Improve Power for Locus Discovery

    To date, genome-wide association studies have focused almost exclusively on populations of European ancestry. These studies continue with the advent of next-generation sequencing, designed to systematically catalog and test low-frequency variation for a role in disease. A complementary approach would be to focus further efforts on cohorts of multiple ethnicities. This leverages the idea that population genetic drift may have elevated some variants to higher allele frequency in different populations, boosting statistical power to detect an association. Based on empirical allele frequency distributions from eleven populations represented in HapMap Phase 3 and the 1000 Genomes Project, we simulate a range of genetic models to quantify the power of association studies in multiple ethnicities relative to studies that exclusively focus on samples of European ancestry. In each of these simulations, a first phase of GWAS in exclusively European samples is followed by a second GWAS phase in any of the other populations (including a multiethnic design). We find that nontrivial power gains can be achieved by conducting future whole-genome studies in worldwide populations, where, in particular, African populations contribute the largest relative power gains for low-frequency alleles (<5%) of moderate effect that suffer from low power in samples of European descent. Our results emphasize the importance of broadening genetic studies to worldwide populations to ensure efficient discovery of genetic loci contributing to phenotypic trait variability, especially for those traits for which large numbers of samples of European ancestry have already been collected and tested
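
The link between allele frequency and power described above can be sketched with a standard approximation. This is a simplified illustration, not the simulation framework used in the study: it assumes a standardized quantitative trait, an additive per-allele effect `beta` in standard-deviation units, and a 1-degree-of-freedom test, and the function name `gwas_power` and all numeric inputs are hypothetical.

```python
from statistics import NormalDist


def gwas_power(n: int, maf: float, beta: float, alpha: float = 5e-8) -> float:
    """Approximate power of a two-sided 1-df association test.

    Assumes the test statistic is normal with mean
    beta * sqrt(2 * n * maf * (1 - maf)) and unit variance.
    """
    std = NormalDist()
    shift = beta * (2.0 * n * maf * (1.0 - maf)) ** 0.5
    z_crit = std.inv_cdf(1.0 - alpha / 2.0)
    return std.cdf(shift - z_crit) + std.cdf(-shift - z_crit)


if __name__ == "__main__":
    # Same effect size and sample size, different allele frequencies.
    for maf in (0.02, 0.05, 0.20):
        print(f"MAF={maf:.2f}  power={gwas_power(10_000, maf, 0.1):.4f}")
```

Under these assumptions, a variant that has drifted to a higher frequency in one population is detected with more power there at the same sample size and effect size.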

    Informed Conditioning on Clinical Covariates Increases Power in Case-Control Association Studies

    Genetic case-control association studies often include data on clinical covariates, such as body mass index (BMI), smoking status, or age, that may modify the underlying genetic risk of case or control samples. For example, in type 2 diabetes, odds ratios for established variants estimated from low-BMI cases are larger than those estimated from high-BMI cases. An unanswered question is how to use this information to maximize statistical power in case-control studies that ascertain individuals on the basis of phenotype (case-control ascertainment) or phenotype and clinical covariates (case-control-covariate ascertainment). While current approaches improve power in studies with random ascertainment, they often lose power under case-control ascertainment and fail to capture available power increases under case-control-covariate ascertainment. We show that an informed conditioning approach, based on the liability threshold model with parameters informed by external epidemiological information, fully accounts for disease prevalence and non-random ascertainment of phenotype as well as covariates and provides a substantial increase in power while maintaining a properly controlled false-positive rate. Our method outperforms standard case-control association tests with or without covariates, tests of gene × covariate interaction, and previously proposed tests for dealing with covariates in ascertained data, with especially large improvements in the case of case-control-covariate ascertainment. We investigate empirical case-control studies of type 2 diabetes, prostate cancer, lung cancer, breast cancer, rheumatoid arthritis, age-related macular degeneration, and end-stage kidney disease over a total of 89,726 samples. In these datasets, informed conditioning outperforms logistic regression for 115 of the 157 known associated variants investigated (P-value = 1×10⁻⁹). The improvement varied across diseases, with a 16% median increase in χ² test statistics and a commensurate increase in power. This suggests that applying our method to existing and future association studies of these diseases may identify novel disease loci.

    Extremely low-coverage sequencing and imputation increases power for genome-wide association studies

    Genome-wide association studies (GWAS) have proven a powerful method to identify common genetic variants contributing to susceptibility to common diseases. Here we show that extremely low-coverage sequencing (0.1–0.5x) captures almost as much of the common (>5%) and low-frequency (1–5%) variation across the genome as SNP arrays. As an empirical demonstration, we show that genome-wide SNP genotypes can be inferred at a mean r² of 0.71 using off-target data (0.24x average coverage) in a whole-exome study of 909 samples. Using both simulated and real exome sequencing datasets, we show that association statistics obtained using ultra low-coverage sequencing data attain similar P-values at known associated variants as genotyping arrays, without an excess of false positives. Within the context of reductions in sample preparation and sequencing costs, funds invested in ultra low-coverage sequencing can yield several times the effective sample size of SNP-array GWAS, and a commensurate increase in statistical power.
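
The effective sample size argument can be made concrete with a short sketch. It relies on the common approximation that imperfectly inferred genotypes scale the effective sample size by r², which is an assumption here rather than a formula quoted from the paper. The budget, per-sample costs, and array r² in the usage section are hypothetical placeholders chosen only to show the arithmetic.

```python
def effective_sample_size(n_samples: int, r2: float) -> float:
    """Effective sample size when genotypes are inferred with accuracy r2."""
    return n_samples * r2


def samples_for_budget(budget: float, cost_per_sample: float) -> int:
    """Number of whole samples that a fixed budget can cover."""
    return int(budget // cost_per_sample)


if __name__ == "__main__":
    # The 909 samples and mean r2 of 0.71 are the figures reported above.
    print(effective_sample_size(909, 0.71))

    # Hypothetical placeholders: budget, costs, and array r2 are NOT from the paper.
    budget = 1_000_000.0
    n_array = samples_for_budget(budget, cost_per_sample=400.0)
    n_lowcov = samples_for_budget(budget, cost_per_sample=100.0)
    print(effective_sample_size(n_array, 0.95))
    print(effective_sample_size(n_lowcov, 0.71))
```

The design point is that a lower r² per sample can be outweighed by the larger number of samples a fixed budget buys, provided the per-sample cost falls enough.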

    3D Multi-Cell Simulation of Tumor Growth and Angiogenesis

    We present a 3D multi-cell simulation of a generic simplification of vascular tumor growth which can be easily extended and adapted to describe more specific vascular tumor types and host tissues. Initially, tumor cells proliferate as they take up the oxygen which the pre-existing vasculature supplies. The tumor grows exponentially. When the oxygen level drops below a threshold, the tumor cells become hypoxic and start secreting pro-angiogenic factors. At this stage, the tumor reaches a maximum diameter characteristic of an avascular tumor spheroid. The endothelial cells in the pre-existing vasculature respond to the pro-angiogenic factors both by chemotaxing towards higher concentrations of pro-angiogenic factors and by forming new blood vessels via angiogenesis. The tumor-induced vasculature increases the growth rate of the resulting vascularized solid tumor compared to an avascular tumor, allowing the tumor to grow beyond the spheroid in these linear-growth phases. First, in the linear-spherical phase of growth, the tumor remains spherical while its volume increases. Second, in the linear-cylindrical phase of growth the tumor elongates into a cylinder. Finally, in the linear-sheet phase of growth, tumor growth accelerates as the tumor changes from cylindrical to paddle-shaped. Substantial periods during which the tumor grows slowly or not at all separate the exponential from the linear-spherical and the linear-spherical from the linear-cylindrical growth phases. In contrast to other simulations in which avascular tumors remain spherical, our simulated avascular tumors form cylinders following the blood vessels, leading to a different distribution of hypoxic cells within the tumor. Our simulations cover time periods which are long enough to produce a range of biologically reasonable complex morphologies, allowing us to study how tumor-induced angiogenesis affects the growth rate, size and morphology of simulated tumors

    Schizophrenia Working Group of the Psychiatric Genomics Consortium, Discovery, Biology, and Risk of Inherited Variants in Breast Cancer (DRIVE) study

    Polygenic risk scores have shown great promise in predicting complex disease risk and will become more accurate as training sample sizes increase. The standard approach for calculating risk scores involves linkage disequilibrium (LD)-based marker pruning and applying a P-value threshold to association statistics, but this discards information and can reduce predictive accuracy. We introduce LDpred, a method that infers the posterior mean effect size of each marker by using a prior on effect sizes and LD information from an external reference panel. Theory and simulations show that LDpred outperforms the approach of pruning followed by thresholding, particularly at large sample sizes. Accordingly, predicted R² increased from 20.1% to 25.3% in a large schizophrenia dataset and from 9.8% to 12.0% in a large multiple sclerosis dataset. A similar relative improvement in accuracy was observed for three additional large disease datasets and for non-European schizophrenia samples. The advantage of LDpred over existing methods will grow as sample sizes increase.
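
The contrast between thresholding and posterior-mean effect sizes can be sketched in the simplest special case. This is not the LDpred software: it assumes unlinked markers (no LD), standardized genotypes and phenotype, and an infinitesimal Gaussian prior, under which the posterior mean is the marginal estimate scaled by h² / (h² + M/N). The function names and all input values are hypothetical.

```python
from statistics import NormalDist


def infinitesimal_shrinkage(beta_hat: list[float], n: int, m: int,
                            h2: float) -> list[float]:
    """Posterior mean effects under a Gaussian prior, assuming no LD.

    n is the training sample size, m the number of markers,
    h2 the heritability explained by the markers.
    """
    factor = h2 / (h2 + m / n)
    return [factor * b for b in beta_hat]


def threshold_effects(beta_hat: list[float], n: int,
                      p_threshold: float) -> list[float]:
    """Keep an effect only if its two-sided P-value passes the threshold.

    Assumes a standard error of about 1 / sqrt(n) for each estimate.
    """
    std = NormalDist()
    kept = []
    for b in beta_hat:
        z = b * n ** 0.5
        p_value = 2.0 * std.cdf(-abs(z))
        kept.append(b if p_value <= p_threshold else 0.0)
    return kept


if __name__ == "__main__":
    estimates = [0.05, 0.001, -0.02]  # hypothetical marginal estimates
    print(infinitesimal_shrinkage(estimates, n=10_000, m=10_000, h2=0.5))
    print(threshold_effects(estimates, n=10_000, p_threshold=0.05))
```

Thresholding sets weak effects to exactly zero and leaves the rest unshrunk, whereas the posterior mean keeps every marker but shrinks it, and the shrinkage factor approaches 1 as the sample size grows relative to the number of markers.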

    Determinants of penetrance and variable expressivity in monogenic metabolic conditions across 77,184 exomes

    Penetrance of variants in monogenic disease and the clinical utility of common polygenic variation have not been well explored on a large scale. Here, the authors use exome sequencing data from 77,184 individuals to generate penetrance estimates and assess the utility of polygenic variation in risk prediction of monogenic variants.

    Human Fertility, Molecular Genetics, and Natural Selection in Modern Societies

    Research on genetic influences on human fertility outcomes such as number of children ever born (NEB) or the age at first childbirth (AFB) has been solely based on twin and family designs that suffer from problematic assumptions and practical limitations. The current study exploits recent advances in the field of molecular genetics by applying genomic-relationship-matrix-based restricted maximum likelihood (GREML) methods to quantify, for the first time, the extent to which common genetic variants influence the NEB and the AFB of women. Using data from the UK and the Netherlands (N = 6,758), results show significant additive genetic effects on both traits, explaining 10% (SE = 5) of the variance in the NEB and 15% (SE = 4) in the AFB. We further find a significant negative genetic correlation between AFB and NEB in the pooled sample of –0.62 (SE = 0.27, p-value = 0.02). This finding implies that individuals with genetic predispositions for an earlier AFB had a reproductive advantage and that natural selection operated not only in historical, but also in contemporary populations. The observed postponement in the AFB across the past century in Europe contrasts with these findings, suggesting an evolutionary override by environmental effects and underscoring that evolutionary predictions in modern human societies are not straightforward. It emphasizes the necessity for an integrative research design from the fields of genetics and social sciences in order to understand and predict fertility outcomes. Finally, our results suggest that we may be able to find genetic variants associated with human fertility when conducting GWAS meta-analyses with sufficient sample size.
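
The reported genetic correlation, standard error, and p-value can be cross-checked with a short calculation. The sketch assumes a two-sided Wald test based on the normal distribution, which the abstract does not state explicitly; the function name `wald_p_value` is hypothetical.

```python
from statistics import NormalDist


def wald_p_value(estimate: float, standard_error: float) -> float:
    """Two-sided p-value for estimate / standard_error under N(0, 1)."""
    z = estimate / standard_error
    return 2.0 * NormalDist().cdf(-abs(z))


if __name__ == "__main__":
    # Genetic correlation between AFB and NEB as reported above.
    p = wald_p_value(-0.62, 0.27)
    print(f"z = {-0.62 / 0.27:.2f}, two-sided p = {p:.3f}")
```

Under that assumption the result rounds to the reported p-value of 0.02.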