25 research outputs found

    Contrastive Mixture of Posteriors for Counterfactual Inference, Data Integration and Fairness

    Learning meaningful representations of data that can address challenges such as batch effect correction, data integration and counterfactual inference is a central problem in many domains including computational biology. Adopting a Conditional VAE framework, we identify the mathematical principle that unites these challenges: learning a representation that is marginally independent of a condition variable. We therefore propose the Contrastive Mixture of Posteriors (CoMP) method that uses a novel misalignment penalty to enforce this independence. This penalty is defined in terms of mixtures of the variational posteriors themselves, unlike prior work which uses external discrepancy measures such as MMD to ensure independence in latent space. We show that CoMP has attractive theoretical properties compared to previous approaches, especially when there is complex global structure in latent space. We further demonstrate state-of-the-art performance on a number of real-world problems, including the challenging tasks of aligning human tumour samples with cancer cell-lines and performing counterfactual inference on single-cell RNA sequencing data. Incidentally, we find parallels with the fair representation learning literature, and demonstrate CoMP has competitive performance in learning fair yet expressive latent representations.
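
    The abstract above contrasts CoMP with earlier methods that penalise an external discrepancy measure such as MMD (maximum mean discrepancy). As a rough illustration of that baseline only, and not of the CoMP penalty itself, the following sketch estimates squared MMD between two sets of latent samples. The RBF kernel, the bandwidth value, the biased estimator and all function and variable names are assumptions chosen for illustration; none of them come from the paper.

```python
import numpy as np


def rbf_kernel(a, b, bandwidth=1.0):
    """Gaussian (RBF) kernel matrix between the rows of a and the rows of b."""
    sq_dists = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * bandwidth**2))


def mmd_squared(x, y, bandwidth=1.0):
    """Biased (V-statistic) estimate of squared MMD between samples x and y."""
    k_xx = rbf_kernel(x, x, bandwidth)
    k_yy = rbf_kernel(y, y, bandwidth)
    k_xy = rbf_kernel(x, y, bandwidth)
    return k_xx.mean() + k_yy.mean() - 2.0 * k_xy.mean()


# Hypothetical latent samples for three conditions (synthetic data).
rng = np.random.default_rng(0)
z_a = rng.normal(0.0, 1.0, size=(200, 2))
z_b = rng.normal(0.0, 1.0, size=(200, 2))  # same distribution as z_a
z_c = rng.normal(3.0, 1.0, size=(200, 2))  # shifted distribution

same = mmd_squared(z_a, z_b)
shifted = mmd_squared(z_a, z_c)
print(f"MMD^2 same distribution:    {same:.4f}")
print(f"MMD^2 shifted distribution: {shifted:.4f}")
```

    In a conditional VAE, a term of this kind would be added to the training loss so that latent codes from different conditions become hard to tell apart; the value is small when the two sample sets overlap and grows as they separate.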

    Towards Deep Cellular Phenotyping in Placental Histology

    The placenta is a complex organ, playing multiple roles during fetal development. Very little is known about the association between placental morphological abnormalities and fetal physiology. In this work, we present an open-source, computationally tractable deep learning pipeline to analyse placenta histology at the level of the cell. By utilising two deep Convolutional Neural Network architectures and transfer learning, we can robustly localise and classify placental cells within five classes with an accuracy of 89%. Furthermore, we learn deep embeddings encoding phenotypic knowledge that are capable of both stratifying five distinct cell populations and learning intraclass phenotypic variance. We envisage that the automation of this pipeline to population scale studies of placenta histology has the potential to improve our understanding of basic cellular placental biology and its variations, particularly its role in predicting adverse birth outcomes.
    Comment: Updated MRC funding material. Corrected typo that suggested ensembling and Inception accuracy were the same (updated to reflect the fact the ensemble model is 1% better than previously reported).

    Metabolomic profiling to dissect the role of visceral fat in cardiometabolic health

    OBJECTIVE: Abdominal obesity is associated with increased risk of type 2 diabetes (T2D) and cardiovascular disease. The aim of this study was to assess whether metabolomic markers of T2D and blood pressure (BP) act on these traits via visceral fat (VF) mass. METHODS: Metabolomic profiling of 280 fasting plasma metabolites was conducted on 2,401 women from TwinsUK. The overlap was assessed between published metabolites associated with T2D, insulin resistance, or BP and those that were identified to be associated with VF (after adjustment for covariates) measured by dual‐energy X‐ray absorptiometry. RESULTS: In addition to glucose, six metabolites were strongly associated with both VF mass and T2D: lactate and branched‐chain amino acids, all of them related to metabolism and the tricarboxylic acid cycle; on average, 38.5% of their association with insulin resistance was mediated by their association with VF mass. Five metabolites were associated with BP and VF mass including the inflammation‐associated peptide HWESASXX, the steroid hormone androstenedione, lactate, and palmitate. On average, 29% of their effect on BP was mediated by their association with VF mass. CONCLUSIONS: Little overlap was found between the metabolites associated with BP and those associated with insulin resistance via VF mass.

    Machine Learning based histology phenotyping to investigate the epidemiologic and genetic basis of adipocyte morphology and cardiometabolic traits

    Genetic studies have recently highlighted the importance of fat distribution, as well as overall adiposity, in the pathogenesis of obesity-associated diseases. Using a large study (n = 1,288) from 4 independent cohorts, we aimed to investigate the relationship between mean adipocyte area and obesity-related traits, and identify genetic factors associated with adipocyte cell size. To perform the first large-scale study of automatic adipocyte phenotyping using both histological and genetic data, we developed a deep learning-based method, the Adipocyte U-Net, to rapidly derive mean adipocyte area estimates from histology images. We validate our method using three state-of-the-art approaches: CellProfiler, Adiposoft and floating adipocytes fractions, all run blindly on two external cohorts. We observe high concordance between our method and the state-of-the-art approaches (Adipocyte U-Net vs. CellProfiler: R^2 (visceral) = 0.94, P < 2.2 × 10^-16; R^2 (subcutaneous) = 0.91, P < 2.2 × 10^-16), and faster run times (10,000 images: 6 min vs 3.5 h). We applied the Adipocyte U-Net to 4 cohorts with histology, genetic, and phenotypic data (total N = 820). After meta-analysis, we found that mean adipocyte area positively correlated with body mass index (BMI) (P_subq = 8.13 × 10^-69, β_subq = 0.45; P_visc = 2.5 × 10^-55, β_visc = 0.49; average R^2 across cohorts = 0.49) and that adipocytes in subcutaneous depots are larger than their visceral counterparts (P_meta = 9.8 × 10^-7). Lastly, we performed the largest GWAS and subsequent meta-analysis of mean adipocyte area and intra-individual adipocyte variation (N = 820). Despite having twice the number of samples of any similar study, we found no genome-wide significant associations, suggesting that larger sample sizes and a homogeneous collection of adipose tissue are likely needed to identify robust genetic associations.
    C.A.G received a pump priming grant from Novo Nordisk to carry out this work. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

    Smoking induces coordinated DNA methylation and gene expression changes in adipose tissue with consequences for metabolic health

    BACKGROUND: Tobacco smoking is a risk factor for multiple diseases, including cardiovascular disease and diabetes. Many smoking-associated signals have been detected in the blood methylome, but the extent to which these changes are widespread to metabolically relevant tissues, and impact gene expression or metabolic health, remains unclear. METHODS: We investigated smoking-associated DNA methylation and gene expression variation in adipose tissue biopsies from 542 healthy female twins. Replication, tissue specificity, and longitudinal stability of the smoking-associated effects were explored in additional adipose, blood, skin, and lung samples. We characterized the impact of adipose tissue smoking methylation and expression signals on metabolic disease risk phenotypes, including visceral fat. RESULTS: We identified 42 smoking-methylation and 42 smoking-expression signals, where five genes (AHRR, CYP1A1, CYP1B1, CYTL1, F2RL3) were both hypo-methylated and upregulated in current smokers. CYP1A1 gene expression achieved 95% prediction performance of current smoking status. We validated and replicated a proportion of the signals in additional primary tissue samples, identifying tissue-shared effects. Smoking leaves systemic imprints on DNA methylation after smoking cessation, with stronger but shorter-lived effects on gene expression. Metabolic disease risk traits such as visceral fat and android-to-gynoid ratio showed association with methylation at smoking markers with functional impacts on expression, such as CYP1A1, and at tissue-shared smoking signals, such as NOTCH1. At smoking-signals, BHLHE40 and AHRR DNA methylation and gene expression levels in current smokers were predictive of future gain in visceral fat upon smoking cessation. CONCLUSIONS: Our results provide the first comprehensive characterization of coordinated DNA methylation and gene expression markers of smoking in adipose tissue. The findings relate to human metabolic health and give insights into understanding the widespread health consequences of smoking outside of the lung.

    Fasting and time of day independently modulate circadian rhythm relevant gene expression in adipose and skin tissue

    BACKGROUND: Intermittent fasting and time-restricted diets are associated with lower risk biomarkers for cardio-metabolic disease. The shared mechanisms underpinning the similar physiological response to these events are not established, but circadian rhythm could be involved. Here we investigated the transcriptional response to fasting in a large cross-sectional study of adipose and skin tissue from healthy volunteers (N = 625) controlling for confounders of circadian rhythm: time of day and season. RESULTS: We identified 367 genes in adipose and 79 in skin whose expression levels were associated (FDR < 5%) with hours of fasting conditionally independent of time of day and season, with 19 genes common to both tissues. Among these genes, we replicated 38 in human, 157 in non-human studies, and 178 are novel associations. Fasting-responsive genes were enriched for regulation of and response to circadian rhythm. We identified 99 genes in adipose and 54 genes in skin whose expression was associated with time of day; these genes were also enriched for circadian rhythm processes. In genes associated with both exposures, the effect of time of day was stronger and in an opposite direction to that of hours fasted. We also investigated the relationship between fasting and genetic regulation of gene expression, including GxE eQTL analysis to identify personal responses to fasting. CONCLUSION: This study robustly implicates circadian rhythm genes in the response to hours fasting independently of time of day, seasonality, age and BMI. We identified tissue-shared and tissue-specific differences in the transcriptional response to fasting in a large sample of healthy volunteers.
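
    The abstract above reports genes at FDR < 5% but does not say which procedure was used to control the false discovery rate. As a minimal sketch, assuming the common Benjamini-Hochberg step-up procedure, the following shows how a list of per-gene p-values could be thresholded at that level. The function name and the example p-values are hypothetical and are not taken from the study.

```python
import numpy as np


def benjamini_hochberg(p_values, alpha=0.05):
    """Return a boolean mask of hypotheses rejected at FDR level alpha."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)                       # indices of p-values, ascending
    ranked = p[order]
    thresholds = alpha * np.arange(1, m + 1) / m
    below = ranked <= thresholds
    rejected = np.zeros(m, dtype=bool)
    if below.any():
        # Step-up rule: reject every hypothesis up to the largest rank
        # whose p-value falls under its threshold.
        k = np.max(np.nonzero(below)[0])
        rejected[order[: k + 1]] = True
    return rejected


# Hypothetical p-values for five genes (illustration only).
p_vals = [0.001, 0.008, 0.039, 0.041, 0.6]
mask = benjamini_hochberg(p_vals, alpha=0.05)
print(mask)
```

    With five tests the rank thresholds are 0.01, 0.02, 0.03, 0.04 and 0.05, so only the two smallest p-values pass here even though four of the five are below 0.05 individually.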