Contrastive Mixture of Posteriors for Counterfactual Inference, Data Integration and Fairness
Learning meaningful representations of data that can address challenges such
as batch effect correction, data integration and counterfactual inference is a
central problem in many domains including computational biology. Adopting a
Conditional VAE framework, we identify the mathematical principle that unites
these challenges: learning a representation that is marginally independent of a
condition variable. We therefore propose the Contrastive Mixture of Posteriors
(CoMP) method that uses a novel misalignment penalty to enforce this
independence. This penalty is defined in terms of mixtures of the variational
posteriors themselves, unlike prior work which uses external discrepancy
measures such as MMD to ensure independence in latent space. We show that CoMP
has attractive theoretical properties compared to previous approaches,
especially when there is complex global structure in latent space. We further
demonstrate state-of-the-art performance on a number of real-world problems,
including the challenging tasks of aligning human tumour samples with cancer
cell-lines and performing counterfactual inference on single-cell RNA
sequencing data. Incidentally, we find parallels with the fair representation
learning literature, and demonstrate CoMP has competitive performance in
learning fair yet expressive latent representations.
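
The abstract says the misalignment penalty is built from mixtures of the variational posteriors but does not give a formula. The sketch below is one plausible reading, not the paper's implementation: it assumes diagonal Gaussian posteriors and scores each latent sample by the log density of the mixture of posteriors from its own condition minus that of the mixture from the other conditions. The function names and the toy data are hypothetical.

```python
import math

def log_gauss(z, mean, std):
    """Log density of a diagonal Gaussian evaluated at point z."""
    return sum(
        -0.5 * math.log(2 * math.pi) - math.log(s) - 0.5 * ((a - m) / s) ** 2
        for a, m, s in zip(z, mean, std)
    )

def log_mean_exp(values):
    """Numerically stable log of the mean of exp(values)."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values) / len(values))

def misalignment_penalty(samples, means, stds, conditions):
    """Assumed penalty: mean over i of
    log(mixture of same-condition posteriors at z_i)
    minus log(mixture of other-condition posteriors at z_i)."""
    n = len(samples)
    terms = []
    for i, z in enumerate(samples):
        same = [log_gauss(z, means[j], stds[j])
                for j in range(n) if conditions[j] == conditions[i]]
        other = [log_gauss(z, means[j], stds[j])
                 for j in range(n) if conditions[j] != conditions[i]]
        if other:  # skip points with no contrasting condition in the batch
            terms.append(log_mean_exp(same) - log_mean_exp(other))
    return sum(terms) / len(terms) if terms else 0.0

# Toy batch: two conditions, one-dimensional latent space, unit variance.
conditions = [0, 0, 1, 1]
stds = [[1.0]] * 4
aligned_means = [[0.0], [1.0], [0.0], [1.0]]      # conditions overlap
separated_means = [[0.0], [1.0], [10.0], [11.0]]  # conditions far apart

# Posterior means stand in for samples to keep the example deterministic.
aligned = misalignment_penalty(aligned_means, aligned_means, stds, conditions)
separated = misalignment_penalty(separated_means, separated_means, stds, conditions)
```

Under these assumptions the penalty is zero when the two conditions occupy the same region of latent space and grows as they separate, which is the behaviour a term enforcing marginal independence from the condition would need.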
Towards Deep Cellular Phenotyping in Placental Histology
The placenta is a complex organ, playing multiple roles during fetal
development. Very little is known about the association between placental
morphological abnormalities and fetal physiology. In this work, we present an
open-source, computationally tractable deep learning pipeline to analyse
placenta histology at the level of the cell. By utilising two deep
Convolutional Neural Network architectures and transfer learning, we can
robustly localise and classify placental cells within five classes with an
accuracy of 89%. Furthermore, we learn deep embeddings encoding phenotypic
knowledge that are capable of both stratifying five distinct cell populations
and capturing intraclass phenotypic variance. We envisage that the automation of
this pipeline to population-scale studies of placenta histology has the
potential to improve our understanding of basic cellular placental biology and
its variations, particularly its role in predicting adverse birth outcomes.
Comment: Updated MRC funding material. Corrected typo that suggested
ensembling and Inception accuracy were the same (updated to reflect the fact
that the ensemble model is 1% better than previously reported).
Metabolomic profiling to dissect the role of visceral fat in cardiometabolic health
OBJECTIVE: Abdominal obesity is associated with increased risk of type 2 diabetes (T2D) and cardiovascular disease. The aim of this study was to assess whether metabolomic markers of T2D and blood pressure (BP) act on these traits via visceral fat (VF) mass. METHODS: Metabolomic profiling of 280 fasting plasma metabolites was conducted on 2,401 women from TwinsUK. The overlap was assessed between published metabolites associated with T2D, insulin resistance, or BP and those that were identified to be associated with VF (after adjustment for covariates) measured by dual-energy X-ray absorptiometry. RESULTS: In addition to glucose, six metabolites were strongly associated with both VF mass and T2D: lactate and branched-chain amino acids, all of them related to metabolism and the tricarboxylic acid cycle; on average, 38.5% of their association with insulin resistance was mediated by their association with VF mass. Five metabolites were associated with BP and VF mass including the inflammation-associated peptide HWESASXX, the steroid hormone androstenedione, lactate, and palmitate. On average, 29% of their effect on BP was mediated by their association with VF mass. CONCLUSIONS: Little overlap was found between the metabolites associated with BP and those associated with insulin resistance via VF mass.
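
The abstract reports the share of each association that is mediated by VF mass but does not say how that share was estimated. As a hedged illustration only, the sketch below uses the common "difference method" for linear models: the proportion mediated is (total effect - direct effect) / total effect, where the direct effect is the exposure coefficient after adjusting for the mediator. The function names and the synthetic numbers are hypothetical and are not taken from the study.

```python
def ols_slope(x, y):
    """Slope of y regressed on x alone (intercept included)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def ols_two_predictors(x, m, y):
    """Coefficients of x and m when y is regressed on both (intercept included)."""
    n = len(x)
    mx, mm, my = sum(x) / n, sum(m) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    smm = sum((b - mm) ** 2 for b in m)
    sxm = sum((a - mx) * (b - mm) for a, b in zip(x, m))
    sxy = sum((a - mx) * (c - my) for a, c in zip(x, y))
    smy = sum((b - mm) * (c - my) for b, c in zip(m, y))
    det = sxx * smm - sxm ** 2  # zero only if x and m are perfectly collinear
    coef_x = (smm * sxy - sxm * smy) / det
    coef_m = (sxx * smy - sxm * sxy) / det
    return coef_x, coef_m

def proportion_mediated(x, m, y):
    """Difference method: share of the total x -> y effect that runs through m."""
    total = ols_slope(x, y)
    direct, _ = ols_two_predictors(x, m, y)
    return (total - direct) / total

# Synthetic illustration: x = metabolite level, m = visceral fat mass,
# y = outcome built as an exact linear function of x and m.
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
m = [0.2, 0.9, 2.3, 2.7, 4.2, 4.9]
y = [0.5 * a + 0.5 * b for a, b in zip(x, m)]

share = proportion_mediated(x, m, y)
```

In this toy setting the result equals a / (1 + a), where a is the slope of the mediator on the exposure, because the outcome depends equally on both. Real analyses would also adjust for covariates and report uncertainty.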
An automatic entropy method to efficiently mask histology whole-slide images
Tissue segmentation of histology whole-slide images (WSI) remains a critical task in automated digital pathology workflows for both accurate disease diagnosis and deep phenotyping for research purposes. This is especially challenging when the tissue structure of biospecimens is relatively porous and heterogeneous, such as for atherosclerotic plaques. In this study, we developed a unique approach called 'EntropyMasker' based on image entropy to tackle the fore- and background segmentation (masking) task in histology WSI. We evaluated our method on 97 high-resolution WSI of human carotid atherosclerotic plaques in the Athero-Express Biobank Study, comprising hematoxylin and eosin and 8 other staining types. Using multiple benchmarking metrics, we compared our method with four widely used segmentation methods (Otsu's method, Adaptive mean, Adaptive Gaussian and slideMask) and observed that our method had the highest sensitivity and Jaccard similarity index. We envision that EntropyMasker will fill an important gap in WSI preprocessing and machine learning image analysis pipelines, and enable disease phenotyping beyond the field of atherosclerosis.
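
The abstract does not describe how EntropyMasker turns image entropy into a mask, so the following is a generic sketch of the idea rather than the published algorithm. It assumes a greyscale image stored as a list of rows, computes the Shannon entropy of each tile, and marks tiles above a fixed threshold as tissue. The tile size, the threshold and all function names are hypothetical. The two metrics named in the abstract, sensitivity and the Jaccard index, are included for comparison against a reference mask.

```python
import math
from collections import Counter

def tile_entropy(pixels):
    """Shannon entropy in bits of the grey-level histogram of a flat pixel list."""
    n = len(pixels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(pixels).values())

def entropy_mask(image, tile=4, threshold=1.0):
    """Label every pixel of a tile as foreground (1) if the tile entropy exceeds threshold."""
    rows, cols = len(image), len(image[0])
    mask = [[0] * cols for _ in range(rows)]
    for r0 in range(0, rows, tile):
        for c0 in range(0, cols, tile):
            rr = range(r0, min(r0 + tile, rows))
            cc = range(c0, min(c0 + tile, cols))
            block = [image[r][c] for r in rr for c in cc]
            if tile_entropy(block) > threshold:
                for r in rr:
                    for c in cc:
                        mask[r][c] = 1
    return mask

def jaccard(pred, truth):
    """Intersection over union of two binary masks."""
    pairs = [(p, t) for pr, tr in zip(pred, truth) for p, t in zip(pr, tr)]
    union = sum(1 for p, t in pairs if p or t)
    inter = sum(1 for p, t in pairs if p and t)
    return inter / union if union else 1.0

def sensitivity(pred, truth):
    """True positives divided by all reference foreground pixels."""
    pairs = [(p, t) for pr, tr in zip(pred, truth) for p, t in zip(pr, tr)]
    positives = sum(1 for _, t in pairs if t)
    hits = sum(1 for p, t in pairs if p and t)
    return hits / positives if positives else 1.0

# Toy 8x8 image: flat white background on the left, varied "tissue" on the right.
image = [[255 if c < 4 else (7 * r + 13 * c) % 256 for c in range(8)] for r in range(8)]
truth = [[0 if c < 4 else 1 for c in range(8)] for r in range(8)]
mask = entropy_mask(image, tile=4, threshold=1.0)
```

Flat background tiles have near-zero entropy while textured tissue has high entropy, which is why an entropy criterion can separate them even when the tissue is pale or porous and an intensity threshold struggles.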
Machine Learning based histology phenotyping to investigate the epidemiologic and genetic basis of adipocyte morphology and cardiometabolic traits
Genetic studies have recently highlighted the importance of fat distribution, as well as overall adiposity, in the pathogenesis of obesity-associated diseases. Using a large study (n = 1,288) from 4 independent cohorts, we aimed to investigate the relationship between mean adipocyte area and obesity-related traits, and identify genetic factors associated with adipocyte cell size. To perform the first large-scale study of automatic adipocyte phenotyping using both histological and genetic data, we developed a deep learning-based method, the Adipocyte U-Net, to rapidly derive mean adipocyte area estimates from histology images. We validate our method using three state-of-the-art approaches: CellProfiler, Adiposoft and floating adipocyte fractions, all run blindly on two external cohorts. We observe high concordance between our method and the state-of-the-art approaches (Adipocyte U-Net vs. CellProfiler: R² visceral = 0.94, P < 2.2 × 10^-16; R² subcutaneous = 0.91, P < 2.2 × 10^-16), and faster run times (10,000 images: 6 min vs 3.5 h). We applied the Adipocyte U-Net to 4 cohorts with histology, genetic, and phenotypic data (total N = 820). After meta-analysis, we found that mean adipocyte area positively correlated with body mass index (BMI) (P_subq = 8.13 × 10^-69, β_subq = 0.45; P_visc = 2.5 × 10^-55, β_visc = 0.49; average R² across cohorts = 0.49) and that adipocytes in subcutaneous depots are larger than their visceral counterparts (P_meta = 9.8 × 10^-7). Lastly, we performed the largest GWAS and subsequent meta-analysis of mean adipocyte area and intra-individual adipocyte variation (N = 820). Despite having twice as many samples as any similar study, we found no genome-wide significant associations, suggesting that larger sample sizes and a homogeneous collection of adipose tissue are likely needed to identify robust genetic associations.
C.A.G received a pump priming grant from Novo Nordisk to carry out this work. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Smoking induces coordinated DNA methylation and gene expression changes in adipose tissue with consequences for metabolic health
Abstract
Background
Tobacco smoking is a risk factor for multiple diseases, including cardiovascular disease and diabetes. Many smoking-associated signals have been detected in the blood methylome, but the extent to which these changes extend to metabolically relevant tissues and affect gene expression or metabolic health remains unclear.
Methods
We investigated smoking-associated DNA methylation and gene expression variation in adipose tissue biopsies from 542 healthy female twins. Replication, tissue specificity, and longitudinal stability of the smoking-associated effects were explored in additional adipose, blood, skin, and lung samples. We characterized the impact of adipose tissue smoking methylation and expression signals on metabolic disease risk phenotypes, including visceral fat.
Results
We identified 42 smoking-methylation and 42 smoking-expression signals, where five genes (AHRR, CYP1A1, CYP1B1, CYTL1, F2RL3) were both hypo-methylated and upregulated in current smokers. CYP1A1 gene expression achieved 95% prediction performance of current smoking status. We validated and replicated a proportion of the signals in additional primary tissue samples, identifying tissue-shared effects. Smoking leaves systemic imprints on DNA methylation after smoking cessation, with stronger but shorter-lived effects on gene expression. Metabolic disease risk traits such as visceral fat and android-to-gynoid ratio showed association with methylation at smoking markers with functional impacts on expression, such as CYP1A1, and at tissue-shared smoking signals, such as NOTCH1. At smoking-signals, BHLHE40 and AHRR DNA methylation and gene expression levels in current smokers were predictive of future gain in visceral fat upon smoking cessation.
Conclusions
Our results provide the first comprehensive characterization of coordinated DNA methylation and gene expression markers of smoking in adipose tissue. The findings relate to human metabolic health and give insights into the widespread health consequences of smoking outside the lung.
Fasting and time of day independently modulate circadian rhythm relevant gene expression in adipose and skin tissue
Abstract
Background
Intermittent fasting and time-restricted diets are associated with lower risk biomarkers for cardio-metabolic disease. The shared mechanisms underpinning the similar physiological response to these events are not established, but circadian rhythm could be involved. Here we investigated the transcriptional response to fasting in a large cross-sectional study of adipose and skin tissue from healthy volunteers (N = 625) controlling for confounders of circadian rhythm: time of day and season.
Results
We identified 367 genes in adipose and 79 in skin whose expression levels were associated (FDR < 5%) with hours of fasting conditionally independent of time of day and season, with 19 genes common to both tissues. Among these genes, we replicated 38 in human, 157 in non-human studies, and 178 are novel associations. Fasting-responsive genes were enriched for regulation of and response to circadian rhythm. We identified 99 genes in adipose and 54 genes in skin whose expression was associated with time of day; these genes were also enriched for circadian rhythm processes. In genes associated with both exposures, the effect of time of day was stronger than, and in the opposite direction to, that of hours fasted. We also investigated the relationship between fasting and genetic regulation of gene expression, including GxE eQTL analysis to identify personal responses to fasting.
Conclusion
This study robustly implicates circadian rhythm genes in the response to hours of fasting independently of time of day, seasonality, age and BMI. We identified tissue-shared and tissue-specific differences in the transcriptional response to fasting in a large sample of healthy volunteers.