14 research outputs found
Mendelian inheritance of trimodal CpG methylation sites suggests distal cis-acting genetic effects.
Environmentally influenced phenotypes, such as obesity and insulin resistance, can be transmitted over multiple generations. Epigenetic modifications, such as methylation of DNA cytosine-guanine (CpG) pairs, may be carriers of inherited information. At the population level, the methylation state of such "heritable" CpG sites is expected to follow a trimodal distribution, and their mode of inheritance should be Mendelian. Using the Illumina Infinium 450K DNA methylation array, we determined DNA CpG methylation in blood cells from a family cohort of 123 individuals of Arab ethnicity, including 18 elementary father-mother-child trios. We asked whether Mendelian inheritance of CpG methylation is observed and, most importantly, whether it is independent of any genetic signals. Using 40× whole genome sequencing, we therefore excluded all CpG sites with possibly confounding genetic variants (SNPs) within the binding regions of the Illumina probes. We identified a total of 955 CpG sites that displayed a trimodal distribution and confirmed trimodality in a study of 1805 unrelated Caucasians. Of the 955 CpG sites, 99.9% followed a strict Mendelian pattern of inheritance and, by design, had no SNP within +/-110 nucleotides of the CpG site. However, in 97% of these cases a distal cis-acting SNP within a +/-1 Mbp window was found that explained the observed CpG distribution, excluding the hypothesis of epigenetic inheritance for these clear-cut trimodal sites. Using power analysis, we showed that in 46% of all cases, the closest CpG-associated SNP was located more than 1000 bp from the CpG site. Our findings suggest that CpG methylation is maintained over larger genomic distances. Furthermore, nearly half of the SNPs associated with these trimodal sites were also associated with the expression of nearby genes (P = 4.08 × 10^-6), implying a regulatory effect of these trimodal CpG sites.
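The Mendelian check described in this abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: it assumes each trimodal methylation value is called as a genotype-like state (0, 1 or 2 methylated alleles), and the cut-offs 0.25 and 0.75 and the function names call_state and is_mendelian are hypothetical choices made only for illustration.

```python
def call_state(beta, low=0.25, high=0.75):
    """Map a methylation beta value (0..1) to a state of 0, 1 or 2.

    The cut-offs are illustrative assumptions, not values from the study.
    """
    if beta < low:
        return 0
    if beta > high:
        return 2
    return 1


def transmittable(state):
    """Alleles (0 = unmethylated, 1 = methylated) a parent in this state can pass on."""
    return {0: {0}, 1: {0, 1}, 2: {1}}[state]


def is_mendelian(father, mother, child):
    """True if the child's state can be formed from one allele of each parent."""
    return any(
        f + m == child
        for f in transmittable(father)
        for m in transmittable(mother)
    )


# One hypothetical father-mother-child trio, given as beta values.
trio = [call_state(b) for b in (0.52, 0.48, 0.95)]
print(trio, is_mendelian(*trio))
```

A site would count as strictly Mendelian under this sketch only if every trio in the cohort passes the check.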
Using trials of caloric restriction and bariatric surgery to explore the effects of body mass index on the circulating proteome
Thousands of proteins circulate in the bloodstream; identifying those which associate with weight and intervention-induced weight loss may help explain mechanisms of diseases associated with adiposity. We aimed to identify consistent protein signatures of weight loss across independent studies capturing changes in body mass index (BMI). We analysed proteomic data from studies implementing caloric restriction (Diabetes Remission Clinical trial) and bariatric surgery (By-Band-Sleeve), using SomaLogic and Olink Explore1536 technologies, respectively. Linear mixed models were used to estimate the effect of the interventions on circulating proteins. Twenty-three proteins were altered in a consistent direction after both bariatric surgery and caloric restriction, suggesting that these proteins are modulated by weight change, independent of intervention type. We also integrated Mendelian randomisation (MR) estimates of the effect of BMI on proteins measured by SomaLogic from a UK blood donor cohort as a third line of causal evidence. These MR estimates provided further corroborative evidence for a role of BMI in regulating the levels of six proteins including alcohol dehydrogenase-4, nogo receptor and interleukin-1 receptor antagonist protein. These results indicate the importance of triangulation in interrogating causal relationships; further study into the role of proteins modulated by weight in disease is now warranted
Epigenetic scores for the circulating proteome as tools for disease prediction
Protein biomarkers have been identified across many age-related morbidities. However, characterising epigenetic influences could further inform disease predictions. Here, we leverage epigenome-wide data to study links between the DNA methylation (DNAm) signatures of the circulating proteome and incident diseases. Using data from four cohorts, we trained and tested epigenetic scores (EpiScores) for 953 plasma proteins, identifying 109 scores that explained between 1% and 58% of the variance in protein levels after adjusting for known protein quantitative trait loci (pQTL) genetic effects. By projecting these EpiScores into an independent sample (Generation Scotland; n = 9537) and relating them to incident morbidities over a follow-up of 14 years, we uncovered 137 EpiScore-disease associations. These associations were largely independent of immune cell proportions, common lifestyle and health factors, and biological aging. Notably, we found that our diabetes-associated EpiScores highlighted previous top biomarker associations from proteome-wide assessments of diabetes. These EpiScores for protein levels can therefore be a valuable resource for disease prediction and risk stratification
Plasma Proteomics of Renal Function: A Transethnic Meta-Analysis and Mendelian Randomization Study.
BACKGROUND: Studies on the relationship between renal function and the human plasma proteome have identified several potential biomarkers. However, investigations have been conducted largely in European populations, and causality of the associations between plasma proteins and kidney function has never been addressed. METHODS: A cross-sectional study of 993 plasma proteins among 2882 participants in four studies of European and admixed ancestries (KORA, INTERVAL, HUNT, QMDiab) identified transethnic associations between eGFR/CKD and proteomic biomarkers. For the replicated associations, two-sample bidirectional Mendelian randomization (MR) was used to investigate potential causal relationships. Publicly available datasets and transcriptomic data from independent studies were used to examine the association between gene expression in kidney tissue and eGFR. RESULTS: In total, 57 plasma proteins were associated with eGFR, including one novel protein. Of these, 23 were additionally associated with CKD. The strongest inferred causal effect was the positive effect of eGFR on testican-2, in line with the known biological role of this protein and the expression of its protein-coding gene (SPOCK2) in renal tissue. We also observed suggestive evidence of an effect of melanoma inhibitory activity (MIA), carbonic anhydrase III, and cystatin-M on eGFR. CONCLUSIONS: In a discovery-replication setting, we identified 57 proteins transethnically associated with eGFR. The revealed causal relationships are an important stepping stone in establishing testican-2 as a clinically relevant physiological marker of kidney disease progression, and point to additional proteins warranting further investigation.
The KORA study was initiated and financed by the Helmholtz Zentrum München – German Research Center for Environmental Health, which is funded by the German Federal Ministry of Education and Research (BMBF) and by the State of Bavaria.
This work was also supported by the Biomedical Research Program at Weill Cornell Medicine in Qatar, a program funded by the Qatar Foundation. K.S. is supported by Qatar National Research Fund (QNRF) grant no. NPRPC11-0115-180010. The Nord-Trøndelag Health Study (The HUNT Study) is a collaboration between HUNT Research Centre (Faculty of Medicine, Norwegian University of Science and Technology NTNU), Nord-Trøndelag County Council, Central Norway Health Authority, and the Norwegian Institute of Public Health. The HUNT part of the project re-used protein data that was originally analysed and paid for by Somalogic Inc, CO, USA. Somalogic had no role in the design and conduct of the study; collection of phenotypic data, statistical analysis, and interpretation of the data; preparation, review, or approval of the manuscript; and decision to submit the manuscript for publication.
Professor John Danesh is funded by the National Institute for Health Research [Senior Investigator Award]. The views expressed are those of the authors and not necessarily those of the NHS, the NIHR or the Department of Health and Social Care.
RNA-sequencing experiments and kidney gene expression studies were supported by British Heart Foundation project grants [PG/17/35/33001 and PG/19/16/34270] and Kidney Research UK grants [ RP_017_20180302 and RP_013_20190305] to M.T.
The German Diabetes Center is funded by the German Federal Ministry of Health (Berlin, Germany), the Ministry of Culture and Science of the state North Rhine-Westphalia (Düsseldorf, Germany), and grants from the German Federal Ministry of Education and Research (Berlin, Germany) to the German Center for Diabetes Research e.V. (DZD)
Missing data estimation in fMRI dynamic causal modeling
Dynamic Causal Modeling (DCM) can be used to quantify cognitive function in individuals as effective connectivity. However, ambiguity among subjects in the number and location of discernible active regions prevents all candidate models from being compared in all subjects, preventing the use of DCM as an individual cognitive phenotyping tool. This paper proposes a solution to this problem by treating missing regions in the first-level analysis as missing data, and performing estimation of the time course associated with any missing region using one of four candidate methods: zero-filling, average-filling, noise-filling using a fixed stochastic process, or noise-filling using a stochastic process estimated by expectation-maximization. The effect of this estimation scheme was analyzed by treating it as a preprocessing step to DCM and observing the resulting effects on model evidence. Simulation studies show that estimation using expectation-maximization yields the highest classification accuracy using a simple loss function and the highest model evidence, relative to the other methods. This result held for various dataset sizes and varying numbers of model choices. In real data, application to Go/No-Go and Simon tasks allowed computation of signals from the missing nodes and the consequent computation of model evidence in all subjects, compared with 62 and 48 percent, respectively, if no preprocessing was performed. These results demonstrate the face validity of the preprocessing scheme and open the possibility of using single-subject DCM as an individual cognitive phenotyping tool.
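The three simpler filling strategies named in this abstract can be sketched as follows. This is an illustration under stated assumptions rather than the paper's implementation: the function name fill_missing_region is hypothetical, "average-filling" is assumed here to mean the mean across the subject's observed regions at each time point, the fixed stochastic process is assumed to be zero-mean Gaussian white noise, and the expectation-maximization variant is omitted.

```python
import random
from statistics import fmean


def fill_missing_region(observed, method="zero", noise_sd=1.0, seed=0):
    """Return a surrogate time course for a region that could not be identified.

    observed: list of equal-length time courses (lists of floats) from the
              regions that were found in this subject.
    method:   "zero", "average" or "noise" (assumed interpretations, see lead-in).
    """
    n_scans = len(observed[0])
    if method == "zero":
        return [0.0] * n_scans
    if method == "average":
        # Mean over the observed regions at each time point.
        return [fmean(region[t] for region in observed) for t in range(n_scans)]
    if method == "noise":
        # Fixed process: Gaussian white noise with a preset standard deviation.
        rng = random.Random(seed)
        return [rng.gauss(0.0, noise_sd) for _ in range(n_scans)]
    raise ValueError(f"unknown method: {method}")


# Two observed regions with three scans each (toy values).
observed = [[1.0, 2.0, 3.0], [3.0, 4.0, 5.0]]
print(fill_missing_region(observed, "average"))
```

The filled time course would then be passed to the DCM estimation in place of the missing region, so that every candidate model can be evaluated in every subject.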
Metabolic and proteomic signatures of type 2 diabetes subtypes in an Arab population.
Type 2 diabetes (T2D) has a heterogeneous etiology influencing its progression, treatment, and complications. A data-driven cluster analysis in European individuals with T2D previously identified four subtypes: severe insulin deficient (SIDD), severe insulin resistant (SIRD), mild obesity-related (MOD), and mild age-related (MARD) diabetes. Here, the clustering approach was applied to individuals with T2D from the Qatar Biobank and validated in an independent set. Cluster-specific signatures of circulating metabolites and proteins were established, revealing subtype-specific molecular mechanisms, including activation of the complement system with features of autoimmune diabetes and reduced 1,5-anhydroglucitol in SIDD, impaired insulin signaling in SIRD, and elevated leptin and fatty acid binding protein levels in MOD. The MARD cluster was the healthiest, with metabolomic and proteomic profiles most similar to the controls. We have translated the T2D subtypes to an Arab population and identified distinct molecular signatures to further our understanding of the etiology of these subtypes.
PopPAnTe: population and pedigree association testing for quantitative data.
Family-based designs, from twin studies to isolated populations with their complex genealogical data, are a valuable resource for genetic studies of heritable molecular biomarkers. Existing software for family-based studies has mainly focused on facilitating association between response phenotypes and genetic markers, and no user-friendly tools are at present available to straightforwardly extend association studies in related samples to large datasets of generic quantitative data, such as those generated by current -omics technologies. We developed PopPAnTe, a user-friendly Java program, which evaluates the association of quantitative data in related samples. Additionally, PopPAnTe implements data pre- and post-processing, region-based testing, and empirical assessment of associations. PopPAnTe is an integrated and flexible framework for pairwise association testing in related samples with a large number of predictors and response variables. It works with family data of any size and complexity or, when the genealogical information is unknown, with genetic similarity information between individuals, such as that inferred from genome-wide genetic data. It can therefore be particularly useful in facilitating usage of biobank data collections from population isolates when extensive genealogical information is missing.