162 research outputs found

    Development of risk prediction models for depression combining genetic and early life risk factors

    Background: Both genetic and early life risk factors play important roles in the pathogenesis and progression of adult depression. However, the interplay between these risk factors and their added value to risk prediction models have not been fully elucidated. Methods: Leveraging a meta-analysis of major depressive disorder genome-wide association studies (N = 45,591 cases and 97,674 controls), we developed and optimized a polygenic risk score for depression using LDpred in a model selection dataset from the UK Biobank (N = 130,092 European ancestry individuals). In a UK Biobank test dataset (N = 278,730 European ancestry individuals), we tested whether the polygenic risk score and early life risk factors were associated with each other and compared their associations with depression phenotypes. Finally, we conducted joint predictive modeling to combine this polygenic risk score with early life risk factors by stepwise regression, and assessed the model performance in identifying individuals at high risk of depression. Results: In the UK Biobank test dataset, the polygenic risk score for depression was moderately associated with multiple early life risk factors. For instance, a one standard deviation increase in the polygenic risk score was associated with 1.16-fold increased odds of frequent domestic violence (95% CI: 1.14–1.19) and 1.09-fold increased odds of not having access to medical care as a child (95% CI: 1.05–1.14). However, the polygenic risk score was more strongly associated with depression phenotypes than most early life risk factors. A joint predictive model integrating the polygenic risk score, early life risk factors, age and sex achieved an AUROC of 0.6766 for predicting strictly defined major depressive disorder, while a model without the polygenic risk score and a model without any early life risk factors had an AUROC of 0.6593 and 0.6318, respectively. Conclusion: We have developed a polygenic risk score to partly capture the genetic liability to depression. Although genetic and early life risk factors can be correlated, joint predictive models improved risk stratification despite limited improvement in magnitude, and may be explored as tools to better identify individuals at high risk of depression.

    The Empirical Power of Rare Variant Association Methods: Results from Sanger Sequencing in 1,998 Individuals

    The role of rare genetic variation in the etiology of complex disease remains unclear. However, the development of next-generation sequencing technologies offers the experimental opportunity to address this question. Several novel statistical methodologies have been recently proposed to assess the contribution of rare variation to complex disease etiology. Nevertheless, no empirical estimates comparing their relative power are available. We therefore assessed the parameters that influence their statistical power in 1,998 individuals Sanger-sequenced at seven genes by modeling different distributions of effect, proportions of causal variants, and direction of the associations (deleterious, protective, or both) in simulated continuous trait and case/control phenotypes. Our results demonstrate that the power of recently proposed statistical methods depends strongly on the underlying hypotheses concerning the relationship of phenotypes with each of these three factors. No method demonstrates consistently acceptable power despite this large sample size, and the performance of each method depends upon the underlying assumption of the relationship between rare variants and complex traits. Sensitivity analyses are therefore recommended to compare the stability of the results arising from different methods, and promising results should be replicated using the same method in an independent sample. These findings provide guidance in the analysis and interpretation of the role of rare base-pair variation in the etiology of complex traits and diseases.

    A modified score function estimator for multinomial logistic regression in small samples

    Logistic regression modelling of mixed binary and continuous covariates is common in practice, but conventional estimation methods may not be feasible or appropriate for small samples. It is well known that the usual maximum likelihood estimates (MLEs) of the log-odds-ratio parameters are biased in finite samples, and there is a non-zero probability that an MLE is infinite, i.e., does not exist. In this paper, we extend the approach proposed by Firth (Biometrika 80 (1993) 27) for bias reduction of MLEs in exponential family models to the multinomial logistic regression model, and consider general regression covariate types. The method is based on a suitable modification of the score function that removes first order bias. We apply the method in the analysis of two datasets: one is a study of disease prognosis and the other is a disease prevention trial. In a series of simulation studies in small samples, the modified-score estimates for binomial and trinomial logistic regressions had mean bias closer to zero and smaller mean squared error than other approaches. The modified-score estimates have properties that make them attractive for routine application in logistic regressions of binary and continuous covariates, including the advantage that they can be obtained in samples in which the MLEs are infinite.

    Data Integration in Genetics and Genomics: Methods and Challenges

    Due to rapid technological advances, various types of genomic and proteomic data with different sizes, formats, and structures have become available. Among them are gene expression, single nucleotide polymorphism, copy number variation, and protein-protein/gene-gene interactions. Each of these distinct data types provides a different, partly independent and complementary, view of the whole genome. However, understanding functions of genes, proteins, and other aspects of the genome requires more information than provided by each of the datasets. Integrating data from different sources is, therefore, an important part of current research in genomics and proteomics. Data integration also plays important roles in combining clinical, environmental, and demographic data with high-throughput genomic data. Nevertheless, the concept of data integration is not well defined in the literature and it may mean different things to different researchers. In this paper, we first propose a conceptual framework for integrating genetic, genomic, and proteomic data. The framework captures fundamental aspects of data integration and is developed taking the key steps in genetic, genomic, and proteomic data fusion. Secondly, we provide a review of some of the most commonly used current methods and approaches for combining genomic data with focus on the statistical aspects

    VEGF, FGF1, FGF2 and EGF gene polymorphisms and psoriatic arthritis

    BACKGROUND: Angiogenesis appears to be a first-order event in psoriatic arthritis (PsA). Among angiogenic factors, the cytokines vascular endothelial growth factor (VEGF), epidermal growth factor (EGF), and fibroblast growth factors 1 and 2 (FGF1 and FGF2) play a central role in the initiation of angiogenesis. Most of these cytokines have been shown to be upregulated in or associated with psoriasis, rheumatoid arthritis (RA) or ankylosing spondylitis (AS). As these diseases share common susceptibility associations with PsA, investigation of these angiogenic factors is warranted. METHODS: Two hundred and fifty-eight patients with PsA and 154 ethnically matched controls were genotyped using a Sequenom chip-based MALDI-TOF mass spectrometry platform. Four SNPs in the VEGF gene, three SNPs in the EGF gene and one SNP each in FGF1 and FGF2 genes were evaluated. Statistical analysis was performed using Fisher's exact test and the Cochran-Armitage trend test. Associations with haplotypes were estimated by using weighted logistic models, where the individual haplotype estimates were obtained using Phase v2.1. RESULTS: We have observed an increased frequency in the T allele of VEGF +936 (rs3025039) in control subjects when compared to our PsA patients [Fisher's exact p-value = 0.042; OR 0.653 (95% CI: 0.434, 0.982)]. Haplotyping of markers revealed no significant associations. CONCLUSION: The T allele of VEGF +936 may act as a protective allele in the development of PsA. Further studies regarding the role of pro-angiogenic markers in PsA are warranted.

    Integration of “omics” Data and Phenotypic Data Within a Unified Extensible Multimodal Framework

    Analysis of “omics” data is often a long and segmented process, encompassing multiple stages from initial data collection to processing, quality control and visualization. The cross-modal nature of recent genomic analyses renders this process challenging to both automate and standardize; consequently, users often resort to manual interventions that compromise data reliability and reproducibility. This in turn can produce multiple versions of datasets across storage systems. As a result, scientists can lose significant time and resources trying to execute and monitor their analytical workflows and encounter difficulties sharing versioned data. In 2015, the Ludmer Centre for Neuroinformatics and Mental Health at McGill University brought together expertise from the Douglas Mental Health University Institute, the Lady Davis Institute and the Montreal Neurological Institute (MNI) to form a genetics/epigenetics working group. The objectives of this working group are to: (i) design an automated and seamless process for (epi)genetic data that consolidates heterogeneous datasets into the LORIS open-source data platform; (ii) streamline data analysis; (iii) integrate results with provenance information; and (iv) facilitate structured and versioned sharing of pipelines for optimized reproducibility using high-performance computing (HPC) environments via the CBRAIN processing portal. 
This article outlines the resulting generalizable “omics” framework and its benefits, specifically, the ability to: (i) integrate multiple types of biological and multi-modal datasets (imaging, clinical, demographics and behavioral); (ii) automate the process of launching analysis pipelines on HPC platforms; (iii) remove the bioinformatic barriers that are inherent to this process; (iv) ensure standardization and transparent sharing of processing pipelines to improve computational consistency; (v) store results in a queryable web interface; (vi) offer visualization tools to better view the data; and (vii) provide the mechanisms to ensure usability and reproducibility. This framework for workflows facilitates brain research discovery by reducing human error through automation of analysis pipelines and seamless linking of multimodal data, allowing investigators to focus on research instead of data handling