173 research outputs found

Variable clustering in high-dimensional linear regression: The R package clere

Author: Biernacki Christophe
Canouil Mickael
Jacques Julien
Yengo Loic
Publication venue: The R Foundation
Publication date: 01/01/2016

Dimension reduction is one of the biggest challenges in high-dimensional regression models. We recently introduced a new methodology based on variable clustering as a means to reduce dimensionality. We present here the R package clere that implements some refinements of this methodology. An overview of the package functionalities as well as examples to run an analysis are described. Numerical experiments on real data were performed to illustrate the good predictive performance of our parsimonious method compared to standard dimension reduction approaches.

INRIA a CCSD electronic archive server

HAL

Hal-Diderot

University of Queensland eSpace

Early metabolic markers identify potential targets for the prevention of type 2 diabetes

Author: Aittokallio Tero
Balkau Beverley
Cobb Jeff
Froguel Philippe
Groop Leif
Kravic Jasmina
Peddinti Gopal
Tuomi Tiinamaija
Yengo Loic
Publication date: 11/05/2017

Aims/hypothesis The aims of this study were to evaluate systematically the predictive power of comprehensive metabolomics profiles in predicting the future risk of type 2 diabetes, and to identify a panel of the most predictive metabolic markers. Methods We applied an unbiased systems medicine approach to mine metabolite combinations that provide added value in predicting the future incidence of type 2 diabetes beyond known risk factors. We performed mass spectrometry-based targeted, as well as global untargeted, metabolomics, measuring a total of 568 metabolites, in a Finnish cohort of 543 nondiabetic individuals from the Botnia Prospective Study, which included 146 individuals who progressed to type 2 diabetes by the end of a 10 year follow-up period. Multivariate logistic regression was used to assess statistical associations, and regularised least-squares modelling was used to perform machine learning-based risk classification and marker selection. The predictive performance of the machine learning models and marker panels was evaluated using repeated nested cross-validation, and replicated in an independent French cohort of 1044 individuals including 231 participants who progressed to type 2 diabetes during a 9 year follow-up period in the DESIR (Data from an Epidemiological Study on the Insulin Resistance Syndrome) study. Results Nine metabolites were negatively associated (potentially protective) and 25 were positively associated with progression to type 2 diabetes. Machine learning models based on the entire metabolome predicted progression to type 2 diabetes (area under the receiver operating characteristic curve, AUC = 0.77) significantly better than the reference model based on clinical risk factors alone (AUC = 0.68; DeLong's p = 0.0009). 
The panel of metabolic markers selected by the machine learning-based feature selection also significantly improved the predictive performance over the reference model (AUC = 0.78; p = 0.00019; integrated discrimination improvement, IDI = 66.7%). This approach identified novel predictive biomarkers, such as alpha-tocopherol, bradykinin hydroxyproline, X-12063 and X-13435, which showed added value in predicting progression to type 2 diabetes when combined with known biomarkers such as glucose, mannose and alpha-hydroxybutyrate and routinely used clinical risk factors. Conclusions/interpretation This study provides a panel of novel metabolic markers for future efforts aimed at the prevention of type 2 diabetes.

Lund University Publications

Crossref

Spiral - Imperial College Digital Repository

VTT Research System

Helsingin yliopiston digitaalinen arkisto

University of Queensland eSpace

Genetic influence on within-person longitudinal change in anthropometric traits in the UK Biobank

Author: Goddard Michael
Hayes Ben J.
Keller Matthew C.
Kemper Kathryn E.
Sidorenko Julia
Visscher Peter M.
Wang Huanwei
Wray Naomi R.
Yengo Loic
Publication venue: Nature Research
Publication date: 01/05/2024

The causes of temporal fluctuations in adult traits are poorly understood. Here, we investigate the genetic determinants of within-person trait variability of 8 repeatedly measured anthropometric traits in 50,117 individuals from the UK Biobank. We found that within-person (non-directional) variability had a SNP-based heritability of 2–5% for height, sitting height, body mass index (BMI) and weight (P ≤ 2.4 × 10⁻³). We also analysed longitudinal trait change and show a loss of both average height and weight beyond about 70 years of age. A variant tracking the Alzheimer’s risk APOE-E4 allele (rs429358) was significantly associated with weight loss (β = −0.047 kg per yr, s.e. 0.007, P = 2.2 × 10⁻¹¹), and using 2-sample Mendelian Randomisation we detected a relationship consistent with causality between decreased lumbar spine bone mineral density and height loss (b_xy = 0.011, s.e. 0.003, P = 3.5 × 10⁻⁴). Finally, population-level variance quantitative trait loci (vQTL) were consistent with within-person variability for several traits, indicating an overlap between trait variability assessed at the population or individual level. Our findings help elucidate the genetic influence on trait-change within an individual and highlight disease risks associated with these changes.

Directory of Open Access Journals

Oxford University Research Archive

Estimation and implications of the genetic architecture of fasting and non-fasting blood glucose

Author: Lu Xueling
Pärna Katri
Qiao Zhen
Revez Joana A.
Sidorenko Julia
Snieder Harold
Visscher Peter M.
Wray Naomi R.
Xue Angli
Yengo Loic
Publication date: 19/12/2022

This upload includes the sample code that was used in the paper "Estimation and implications of the genetic architecture of fasting and non-fasting blood glucose", which has been accepted for publication in Nature Communications. Abstract: The genetic regulation of post-prandial glucose levels is poorly understood. Here, we characterise the genetic architecture of blood glucose variably measured within 0 and 24 hours of fasting in 368,000 European ancestry participants of the UK Biobank. We found a near-linear increase in the heritability of non-fasting glucose levels over time, which plateaus to its fasting state value after 5 hours post meal (h² = 11%; standard error: 1%). The genetic correlation between different fasting times is > 0.77, suggesting that the genetic control of glucose is largely constant across fasting durations. Accounting for heritability differences between fasting times leads to a ~16% improvement in the discovery of genetic variants associated with glucose. Newly detected variants improve the prediction of fasting glucose and type 2 diabetes in independent samples. Finally, we meta-analysed summary statistics from genome-wide association studies of random and fasting glucose (N = 518,615) and identified 156 independent SNPs explaining 3% of fasting glucose variance. Altogether, our study demonstrates the utility of random glucose measures to improve discovery of genetic variants associated with glucose homeostasis, even in fasting conditions.

Proceedings - University of Groningen

ARTS repository - University of Groningen

NEUROSURGERY ENTHUSIASTIC WOMEN SOCIETY

Dissertations of the University of Groningen

Leveraging functional genomic annotations and genome coverage to improve polygenic prediction of complex traits within and between ancestries

Author: Ani Alireza
Goddard Michael E.
Lin Tian
Liu Shouye
Nolte Ilja M.
Sidorenko Julia
Snieder Harold
Turley Patrick
Visscher Peter M.
Wang Rujia
Wang Ying
Wray Naomi R.
Yang Jian
Yengo Loic
Zeng Jian
Zheng Zhili
Publication venue: Nature Research
Publication date: 30/04/2024

We develop a method, SBayesRC, that integrates genome-wide association study (GWAS) summary statistics with functional genomic annotations to improve polygenic prediction of complex traits. Our method is scalable to whole-genome variant analysis and refines signals from functional annotations by allowing them to affect both causal variant probability and causal effect distribution. We analyze 50 complex traits and diseases using ∼7 million common single-nucleotide polymorphisms (SNPs) and 96 annotations. SBayesRC improves prediction accuracy by 14% in European ancestry and up to 34% in cross-ancestry prediction compared to the baseline method SBayesR, which does not use annotations, and outperforms other methods, including LDpred2, LDpred-funct, MegaPRS, PolyPred-S and PRS-CSx. Investigation of factors affecting prediction accuracy identifies a significant interaction between SNP density and annotation information, suggesting whole-genome sequence variants with annotations may further improve prediction. Functional partitioning analysis highlights a major contribution of evolutionary constrained regions to prediction accuracy and the largest per-SNP contribution from nonsynonymous SNPs.

Oxford University Research Archive

A saturated map of common genetic variants associated with human height

Common single-nucleotide polymorphisms (SNPs) are predicted to collectively explain 40-50% of phenotypic variation in human height, but identifying the specific variants and associated regions requires huge sample sizes(1). Here, using data from a genome-wide association study of 5.4 million individuals of diverse ancestries, we show that 12,111 independent SNPs that are significantly associated with height account for nearly all of the common SNP-based heritability. These SNPs are clustered within 7,209 non-overlapping genomic segments with a mean size of around 90 kb, covering about 21% of the genome. The density of independent associations varies across the genome and the regions of increased density are enriched for biologically relevant genes. In out-of-sample estimation and prediction, the 12,111 SNPs (or all SNPs in the HapMap 3 panel(2)) account for 40% (45%) of phenotypic variance in populations of European ancestry but only around 10-20% (14-24%) in populations of other ancestries. Effect sizes, associated regions and gene prioritization are similar across ancestries, indicating that reduced prediction accuracy is likely to be explained by linkage disequilibrium and differences in allele frequency within associated regions. Finally, we show that the relevant biological pathways are detectable with smaller sample sizes than are needed to implicate causal genes and variants. Overall, this study provides a comprehensive map of specific genomic regions that contain the vast majority of common height-associated variants. Although this map is saturated for populations of European ancestry, further research is needed to achieve equivalent saturation in other ancestries. A large genome-wide association study of more than 5 million individuals reveals that 12,111 single-nucleotide polymorphisms account for nearly all the heritability of height attributable to common genetic variants.

Helsingin yliopiston digitaalinen arkisto

Genome-wide association analyses identify 143 risk variants and putative regulatory mechanisms for type 2 diabetes

Author: Allan F. McRae
Angli Xue
Futao Zhang
Jian Yang
Jian Zeng
Julia Sidorenko
Kathryn E. Kemper
Loic Yengo
Luke R. Lloyd-Jones
Peter M. Visscher
Yang Wu
Yeda Wu
Zhihong Zhu
Zhili Zheng
Publication venue: Springer Science and Business Media LLC
Publication date: 28/10/2022

Type 2 diabetes (T2D) is a very common disease in humans. Here we conduct a meta-analysis of genome-wide association studies (GWAS) with ~16 million genetic variants in 62,892 T2D cases and 596,424 controls of European ancestry. We identify 139 common and 4 rare variants associated with T2D, 42 of which (39 common and 3 rare variants) are independent of the known variants. Integration of the gene expression data from blood (n = 14,115 and 2765) with the GWAS results identifies 33 putative functional genes for T2D, 3 of which were targeted by approved drugs. A further integration of DNA methylation (n = 1980) and epigenomic annotation data highlight 3 genes (CAMK1D, TP53INP1, and ATP5G1) with plausible regulatory mechanisms, whereby a genetic variant exerts an effect on T2D through epigenetic regulation of gene expression. Our study uncovers additional loci, proposes putative genetic regulatory mechanisms for T2D, and provides evidence of purifying selection for T2D-associated variants.

UTUPub

Association Between Population Density and Genetic Risk for Schizophrenia

Author: Boomsma Dorret I.
Colodro-Conde Lucía
Couvy-Duchesne Baptiste
Das Marjolijn
De Zeeuw Eveline L.
Gordon Scott
Kemper Kathryn E.
MacGregor Stuart
Martin Nicholas G.
McGrath John J.
Medland Sarah E.
Neale Rachel E.
Nivard Michel G.
Olsen Catherine M.
Rietschel Marcella
Streit Fabian
Trzaskowski Maciej
Whiteman David C.
Whitfield John B.
Yang Jian
Yengo Loic
Zheng Zhili
Publication venue: American Medical Association (AMA)
Publication date: 01/09/2018

VU Research Portal

Genome-wide analysis in over 1 million individuals of European ancestry yields improved polygenic risk scores for blood pressure traits

Author: Ani Alireza
Attia John R.
Caulfield Mark J.
Chasman Daniel I.
Conen David
Evangelou Evangelos
Giri Ayush
Goel Anuj
Goleva Slavina B.
Hellwege Jacklyn N.
Hwang Shih-Jen
Kamali Zoha
Keaton Jacob M.
Kooner Jaspal S.
Loos Ruth J. F.
Morris Andrew P.
Morrison Alanna C.
Traylor Matthew
Vaez Ahmad
van Duijn Cornelia M.
Watkins Hugh
Williams Ariel
Xie Tian
Yengo Loic
Young William J.
Zeng Jian
Zheng Zhili
Publication venue: Nature Research
Publication date: 30/04/2024

Hypertension affects more than one billion people worldwide. Here we identify 113 novel loci, reporting a total of 2,103 independent genetic signals (P < 5 × 10⁻⁸) from the largest single-stage blood pressure (BP) genome-wide association study to date (n = 1,028,980 European individuals). These associations explain more than 60% of single nucleotide polymorphism-based BP heritability. Comparing top versus bottom deciles of polygenic risk scores (PRSs) reveals clinically meaningful differences in BP (16.9 mmHg systolic BP, 95% CI, 15.5–18.2 mmHg, P = 2.22 × 10⁻¹²⁶) and more than a sevenfold higher odds of hypertension risk (odds ratio, 7.33; 95% CI, 5.54–9.70; P = 4.13 × 10⁻⁴⁴) in an independent dataset. Adding PRS into hypertension-prediction models increased the area under the receiver operating characteristic curve (AUROC) from 0.791 (95% CI, 0.781–0.801) to 0.826 (95% CI, 0.817–0.836, ∆AUROC, 0.035, P = 1.98 × 10⁻³⁴). We compare the 2,103 loci results in non-European ancestries and show significant PRS associations in a large African-American sample. Secondary analyses implicate 500 genes previously unreported for BP. Our study highlights the role of increasingly large genomic studies for precision health research.

Oxford University Research Archive

Risk prediction of late-onset Alzheimer’s disease implies an oligogenic architecture

Author: Armstrong Nicola J.
Australian Imaging Biomarkers and Lifestyle
Brodaty Henry
Collins Steven
Couvy-Duchesne Baptiste
Goate Alison M.
Huang Kuan-lin
Laws Simon M.
Lennon Kate
Li Qiao Xin
Marcora Edoardo
Marioni Riccardo E.
Mather Karen A.
McRae Allan F.
Porter Tenielle
Robertson Naomi R. Wray
Sachdev Perminder S.
Sidorenko Julia
Thai Christine
Thalamuthu Anbupalam
Trounson Brett
Ugarte Fernanda Yevenes
Visscher Peter M.
Volitakis Irene
Vovos Michael
Wright Margaret J.
Yang Jian
Yengo Loic
Zhang Qian
Publication venue: Edith Cowan University, Research Online, Perth, Western Australia
Publication date: 23/09/2020

Genetic association studies have identified 44 common genome-wide significant risk loci for late-onset Alzheimer’s disease (LOAD). However, LOAD genetic architecture and prediction are unclear. Here we estimate the optimal P-threshold (P_optimal) of a genetic risk score (GRS) for prediction of LOAD in three independent datasets comprising 676 cases and 35,675 family history proxy cases. We show that the discriminative ability of GRS in LOAD prediction is maximised when selecting a small number of SNPs. Both simulation results and direct estimation indicate that the number of causal common SNPs for LOAD may be less than 100, suggesting LOAD is more oligogenic than polygenic. The best GRS explains approximately 75% of SNP-heritability, and individuals in the top decile of GRS have ten-fold increased odds when compared to those in the bottom decile. In addition, 14 variants are identified that contribute to both LOAD risk and age at onset of LOAD.

Research Online @ ECU