1,283 research outputs found

    BayesCCE: a Bayesian framework for estimating cell-type composition from DNA methylation without the need for methylation reference.

    We introduce a Bayesian semi-supervised method for estimating cell counts from DNA methylation by leveraging easily obtainable prior knowledge on the cell-type composition distribution of the studied tissue. We show mathematically and empirically that alternative methods which attempt to infer cell counts without a methylation reference capture only linear combinations of cell counts rather than providing one component per cell type. Our approach allows the construction of components such that each component corresponds to a single cell type, and provides a new opportunity to investigate cell compositions in genomic studies of tissues for which this was not previously possible.
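
The "linear combinations" point can be illustrated with a short derivation. This is a sketch under a standard linear mixing assumption; the symbols $X$, $R$, $M$ and $A$ are our own notation and are not taken from the paper.

```latex
% Assumed model: n samples, m CpG sites, k cell types.
%   X : n x m observed bulk methylation
%   R : n x k cell-type proportions per sample
%   M : k x m cell-type-specific methylation levels
\[
  X \approx R M .
\]
% For any invertible k x k matrix A the fit is unchanged:
\[
  R M = (R A)\,(A^{-1} M) .
\]
% A reference-free factorization may therefore return
% \tilde{R} = R A, whose columns are linear combinations of the
% true proportions, unless extra information (such as a prior on
% the composition) constrains A.
```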

    A systematic assessment of cell type deconvolution algorithms for DNA methylation data

    We performed a systematic assessment of computational deconvolution methods, which play an important role in estimating cell type proportions from bulk methylation data. The proposed framework, methylDeConv (available as an R package), integrates several deconvolution methods for methylation profiles (Illumina HumanMethylation450 and MethylationEPIC arrays) and offers different cell-type-specific CpG selection options to construct an extended reference library that incorporates the main immune cell subsets, epithelial cells and cell-free DNAs. We compared the performance of different deconvolution algorithms via simulations and benchmark datasets, and further investigated the associations of the estimated cell type proportions with cancer therapy in breast cancer and with subtypes in melanoma methylation case studies. Our results indicated that deconvolution based on the extended reference library is critical for obtaining accurate estimates of cell proportions in non-blood tissues.
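
As a rough illustration of what reference-based deconvolution computes, the sketch below estimates cell type proportions by constrained least squares: it finds non-negative proportions that sum to one and best reconstruct a bulk methylation profile from a reference matrix. This is a generic formulation and is not the methylDeConv API or any specific method the package integrates; the reference values, the function names `project_to_simplex` and `deconvolve`, and the toy proportions are all hypothetical.

```python
def project_to_simplex(v):
    """Euclidean projection of v onto {p : p >= 0, sum(p) = 1}."""
    u = sorted(v, reverse=True)
    cumulative = 0.0
    theta = 0.0
    for i, ui in enumerate(u, start=1):
        cumulative += ui
        t = (cumulative - 1.0) / i
        if ui - t > 0:
            theta = t  # keep the threshold from the last valid index
    return [max(x - theta, 0.0) for x in v]


def deconvolve(reference, bulk, steps=2000):
    """Minimise ||reference @ p - bulk||^2 over the simplex
    by projected gradient descent (pure Python, no dependencies)."""
    n_sites = len(reference)
    k = len(reference[0])
    # Step size 1 / ||R||_F^2 is a safe bound on 1 / Lipschitz constant.
    step = 1.0 / sum(x * x for row in reference for x in row)
    p = [1.0 / k] * k  # start from uniform proportions
    for _ in range(steps):
        resid = [
            sum(reference[i][j] * p[j] for j in range(k)) - bulk[i]
            for i in range(n_sites)
        ]
        grad = [
            sum(reference[i][j] * resid[i] for i in range(n_sites))
            for j in range(k)
        ]
        p = project_to_simplex([p[j] - step * grad[j] for j in range(k)])
    return p


# Hypothetical reference: 6 CpG sites (rows) x 3 cell types (columns).
reference = [
    [0.9, 0.1, 0.1],
    [0.8, 0.2, 0.1],
    [0.1, 0.9, 0.2],
    [0.2, 0.8, 0.1],
    [0.1, 0.1, 0.9],
    [0.1, 0.2, 0.8],
]
true_p = [0.6, 0.3, 0.1]  # made-up mixture used to simulate a bulk sample
bulk = [sum(r * p for r, p in zip(row, true_p)) for row in reference]

estimated = deconvolve(reference, bulk)
print([round(x, 3) for x in estimated])
```

In this noise-free toy case the estimate recovers the simulated proportions. With real array data the accuracy depends heavily on which CpGs and cell types the reference contains, which is the point the abstract makes about non-blood tissues.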

    Bayesian reassessment of the epigenetic architecture of complex traits

    Linking epigenetic marks to clinical outcomes improves insight into molecular processes, disease prediction, and therapeutic target identification. Here, a statistical approach is presented to infer the epigenetic architecture of complex disease, determine the variation captured by epigenetic effects, and estimate phenotype-epigenetic probe associations jointly. Implicitly adjusting for probe correlations, data structure (cell-count or relatedness), and single-nucleotide polymorphism (SNP) marker effects improves association estimates. In 9,448 individuals, 75.7% (95% CI 71.70–79.3) of body mass index (BMI) variation and 45.6% (95% CI 37.3–51.9) of cigarette consumption variation was captured by whole blood methylation array data. Pathway-linked probes of blood cholesterol, lipid transport and sterol metabolism for BMI, and xenobiotic stimuli response for smoking, showed >1.5 times larger associations with >95% posterior inclusion probability. Prediction accuracy improved by 28.7% for BMI and 10.2% for smoking over a LASSO model, with age- and tissue-specificity, implying associations are a phenotypic consequence rather than causal.

    Extracting information from high-throughput gene expression data with pathway analysis and deconvolution

    Modern technologies allow for the collection of large biological datasets that can be utilised for diverse health-related applications. However, to extract useful information from such data, computational methods are needed. The field that develops and explores methods to analyse biological data is called bioinformatics. In this thesis I evaluate different bioinformatic methods and introduce novel ones related to processing gene expression data. Gene expression data reflects how active different genes are in a set of measured biological samples. These samples can be, for example, blood from human individuals, tissue samples from tumours and the corresponding healthy tissue, or brain samples from mice with different neural diseases. This thesis covers two topics, pathway analysis and deconvolution, related to downstream analysis of gene expression data. Notably, this summary does not repeat in detail the same points made in the original publications, but aims to provide a comprehensive overview of the current knowledge of the two wider topics. The original publications focus on comparing and evaluating the available methods as well as presenting new ones that cover some previously untouched features. While the terms 'pathway analysis' and 'deconvolution' have been used with alternative definitions in other fields, in the context of this thesis, pathway analysis refers to estimating the activity of pathways, i.e. the interaction networks the body uses to react to different signals, based on given gene expression data and structural information about the relevant pathways. I focus on different types of analysis methods and their varying goals, requirements, and underlying statistical approaches. In addition, the strengths and weaknesses of the concept of pathway analysis are briefly discussed. The first two original publications, I and II, empirically compare different types of pathway methods and introduce a novel one. In paper I, the tested methods are evaluated from different perspectives, and in paper II, a novel method is introduced and its performance demonstrated against alternative tools. Many biological samples contain a variety of cell types, and here deconvolution means computationally extracting cell type composition or cell type specific expression from bulk samples. The deconvolution sections of this thesis also focus on a general overview of the topic and the available computational methodology. As deconvolution is challenging, I discuss the factors affecting its accuracy as well as alternative wet lab approaches to obtain cell type specific information. The first original publication about deconvolution (publication III) introduces a novel method and evaluates it against the other available tools. The second (publication IV) focuses on identifying cell type specific differences between sample groups, which is a particularly difficult task.

    Statistical and integrative system-level analysis of DNA methylation data

    Epigenetics plays a key role in cellular development and function. Alterations to the epigenome are thought to capture and mediate the effects of genetic and environmental risk factors on complex disease. Currently, DNA methylation is the only epigenetic mark that can be measured reliably and genome-wide in large numbers of samples. This Review discusses some of the key statistical challenges and algorithms associated with drawing inferences from DNA methylation data, including cell-type heterogeneity, feature selection, reverse causation and system-level analyses that require integration with other data types such as gene expression, genotype, transcription factor binding and other epigenetic information.

    Epigenetic prediction of complex traits and mortality in a cohort of individuals with oropharyngeal cancer

    Background: DNA methylation (DNAm) variation is an established predictor for several traits. In the context of oropharyngeal cancer (OPC), where 5-year survival is ~65%, DNA methylation may act as a prognostic biomarker. We examined the accuracy of DNA methylation biomarkers of 4 complex exposure traits (alcohol consumption, body mass index [BMI], educational attainment and smoking status) in predicting all-cause mortality in people with OPC. Results: DNAm predictors of alcohol consumption, BMI, educational attainment and smoking status were applied to 364 individuals with OPC in the Head and Neck 5000 cohort (HN5000; 19.6% of total OPC cases in the study), followed up for a median of 3.9 years (inter-quartile range [IQR] 3.3 to 5.2 years; time-to-event: death or censoring). The proportion of phenotypic variance explained in each trait was as follows: 16.5% for alcohol consumption, 22.7% for BMI, 0.4% for educational attainment and 51.1% for smoking. We then assessed the relationship between each DNAm predictor and all-cause mortality using Cox proportional-hazard regression analysis. DNAm prediction of smoking was most consistently associated with mortality risk (hazard ratio [HR] 1.38 per standard deviation [SD] increase in smoking DNAm score; 95% confidence interval [CI] 1.04 to 1.83; P = 0.025, in a model adjusted for demographic, lifestyle, health and biological variables). Finally, we examined the accuracy of each DNAm predictor of mortality. DNAm predictors explained similar levels of variance in mortality to self-reported phenotypes. Receiver operating characteristic (ROC) curves for the DNAm predictors showed moderate discrimination for alcohol consumption (area under the curve [AUC] 0.63), BMI (AUC 0.61) and smoking (AUC 0.70) when predicting mortality. The DNAm predictor for education showed poor discrimination (AUC 0.57). Z tests comparing AUCs between self-reported phenotype ROC curves and DNAm score ROC curves did not show evidence of a difference between the two (alcohol consumption P = 0.41, BMI P = 0.62, educational attainment P = 0.49, smoking P = 0.19). Conclusions: In the context of a clinical cohort of individuals with OPC, DNAm predictors for smoking, alcohol consumption, educational attainment and BMI exhibit similar predictive values for all-cause mortality compared to self-reported data. These findings may have translational utility in prognostic model development, particularly where phenotypic data are not available.
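
For readers unfamiliar with the AUC values quoted above, the following minimal sketch shows how an area under the ROC curve can be computed from a score and a binary outcome, using the pairwise (Mann-Whitney) definition. The function name `auc` and the four example scores and outcomes are hypothetical and are not data from the study.

```python
def auc(scores, outcomes):
    """Probability that a randomly chosen case (outcome 1) has a higher
    score than a randomly chosen non-case (outcome 0); ties count 0.5."""
    cases = [s for s, y in zip(scores, outcomes) if y == 1]
    controls = [s for s, y in zip(scores, outcomes) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one non-case")
    wins = 0.0
    for c in cases:
        for d in controls:
            if c > d:
                wins += 1.0
            elif c == d:
                wins += 0.5
    return wins / (len(cases) * len(controls))


# Made-up DNAm-style scores and mortality indicators (1 = died).
scores = [0.1, 0.4, 0.35, 0.8]
died = [0, 0, 1, 1]
print(auc(scores, died))
```

An AUC of 0.5 corresponds to no discrimination and 1.0 to perfect ranking, which is why values of 0.61 to 0.70 are described as moderate and 0.57 as poor.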