TREEOME: A framework for epigenetic and transcriptomic data integration to explore regulatory interactions controlling transcription
Motivation: Predictive modelling of gene expression is a powerful framework
for the in silico exploration of transcriptional regulatory interactions
through the integration of high-throughput -omics data. A major limitation of
previous approaches is their inability to handle conditional and synergistic
interactions that emerge when collectively analysing genes subject to different
regulatory mechanisms. This limitation reduces overall predictive power and
thus the reliability of downstream biological inference.
Results: We introduce an analytical modelling framework (TREEOME: tree of
models of expression) that integrates epigenetic and transcriptomic data by
separating genes into putative regulatory classes. Current predictive modelling
approaches have found both DNA methylation and histone modification epigenetic
data to provide little or no improvement in accuracy of prediction of
transcript abundance despite, for example, distinct anti-correlation between
mRNA levels and promoter-localised DNA methylation. To improve on this, in
TREEOME we evaluate four possible methods of formulating gene-level DNA
methylation metrics, which provide a foundation for identifying gene-level
methylation events and subsequent differential analysis, whereas most previous
techniques operate at the level of individual CpG dinucleotides. We demonstrate
TREEOME by integrating gene-level DNA methylation (bisulfite-seq) and histone
modification (ChIP-seq) data to accurately predict genome-wide mRNA transcript
abundance (RNA-seq) for H1-hESC and GM12878 cell lines.
Availability: TREEOME is implemented using open-source software and made
available as a pre-configured bootable reference environment. All scripts and
data presented in this study are available online at
http://sourceforge.net/projects/budden2015treeome/.
Comment: 14 pages, 6 figures
Chromatin dysregulation and DNA methylation at transcription start sites associated with transcriptional repression in cancers.
Although promoter-associated CpG islands have been established as targets of DNA methylation changes in cancer, previous studies suggest that epigenetic dysregulation outside the promoter region may be more closely associated with transcriptional changes. Here we examine DNA methylation, chromatin marks, and transcriptional alterations to define the relationship between transcriptional modulation and spatial changes in chromatin structure. Using human papillomavirus-related oropharyngeal carcinoma as a model, we show aberrant enrichment of repressive H3K9me3 at the transcriptional start site (TSS) with methylation-associated, tumor-specific gene silencing. Further analysis identifies a hypermethylated subtype which shows a functional convergence on MYC targets and association with CREBBP/EP300 mutation. The tumor-specific shift to transcriptional repression associated with DNA methylation at TSSs was confirmed in multiple tumor types. Our data may show a common underlying epigenetic dysregulation in cancer associated with broad enrichment of repressive chromatin marks and aberrant DNA hypermethylation at TSSs in combination with MYC network activation.
Profile analysis and prediction of tissue-specific CpG island methylation classes
Background: The computational prediction of DNA methylation has become an important topic in recent years due to its role in the epigenetic control of normal and cancer-related processes. While previous prediction approaches focused merely on differences between methylated and unmethylated DNA sequences, recent experimental results have shown the presence of much more complex patterns of methylation across tissues and time in the human genome. These patterns are only partially described by a binary model of DNA methylation. In this work we propose a novel approach, based on profile analysis of tissue-specific methylation, that uncovers significant differences in the sequences of CpG islands (CGIs) that predispose them to a tissue-specific methylation pattern. Results: We defined CGI methylation profiles that not only separate constitutively methylated from unmethylated CGIs, but also identify CGIs showing a differential degree of methylation across tissues and cell types or a lack of methylation exclusively in sperm. These profiles are clearly distinguished by a number of CGI attributes including their evolutionary conservation, their significance, as well as the evolutionary evidence of prior methylation. Additionally, we assess profile functionality with respect to the different compartments of protein coding genes and their possible use in the prediction of DNA methylation. Conclusion: Our approach provides new insights into the biological features that determine if a CGI has a functional role in the epigenetic control of gene expression and the features associated with CGI methylation susceptibility. Moreover, we show that the ability to predict CGI methylation is based primarily on the quality of the biological information used and the relationships uncovered between different sources of knowledge. The strategy presented here is able to predict, besides the constitutively methylated and unmethylated classes, two more tissue-specific methylation classes while conserving the accuracy provided by leading binary methylation classification methods.
Computational methods in cancer gene networking
In the past few years, many high-throughput techniques have been developed and applied to biological studies. These techniques, such as “next generation” genome sequencing, ChIP-on-chip and microarrays, can be used to measure gene expression and gene regulatory elements on a genome-wide scale. Moreover, as these technologies become more affordable and accessible, they have become a driving force in modern biology. As a result, a huge amount of biological data has been produced, and an increasing number of such datasets is expected to be generated in the future. High-throughput data are more comprehensive and unbiased, but ‘real signals’ or biological insights, molecular mechanisms and biological principles are buried in the flood of data. In current biological studies, the bottleneck is no longer a lack of data, but the lack of ingenuity and computational means to extract biological insights and principles by integrating knowledge and high-throughput data.

Here I review the concepts and principles of network biology and the computational methods which can be applied to cancer research. Furthermore, I provide a practical guide for computational analysis of cancer gene networks.
Exploring Patterns of Epigenetic Information With Data Mining Techniques
Data mining, a part of the Knowledge Discovery in Databases (KDD) process, is the process of extracting patterns from large data sets by combining methods from statistics and artificial intelligence with database management. Analyses of epigenetic data have evolved towards genome-wide and high-throughput approaches, thus generating great amounts of data for which data mining is essential. Part of these data may contain patterns of epigenetic information which are mitotically and/or meiotically heritable, determining gene expression and cellular differentiation, as well as cellular fate. Epigenetic lesions and genetic mutations are acquired by individuals during their life and accumulate with ageing. Both defects, either together or individually, can result in loss of control over cell growth and, thus, cause cancer development. Data mining techniques could then be used to extract these patterns. This work reviews some of the most important applications of data mining to epigenetics.
Programa Iberoamericano de Ciencia y Tecnología para el Desarrollo; 209RT-0366. Galicia. Consellería de Economía e Industria; 10SIN105004PR. Instituto de Salud Carlos III; RD07/0067/000
HOME: A histogram based machine learning approach for effective identification of differentially methylated regions
Background
The development of whole genome bisulfite sequencing has made it possible to identify methylation differences at single base resolution throughout an entire genome. However, a persistent challenge in DNA methylome analysis is the accurate identification of differentially methylated regions (DMRs) between samples. Sensitive and specific identification of DMRs among different conditions requires accurate and efficient algorithms, and while various tools have been developed to tackle this problem, they frequently suffer from inaccurate DMR boundary identification and a high false positive rate.
Results
We present a novel Histogram Of MEthylation (HOME) based method that takes into account the inherent difference in the distribution of methylation levels between DMRs and non-DMRs to discriminate between the two using a Support Vector Machine. We show that the features generated by HOME are dataset-independent, such that a classifier trained on, for example, a mouse methylome training set of regions of differentially accessible chromatin can be applied to any other organism’s dataset and identify accurate DMRs. We demonstrate that DMRs identified by HOME exhibit higher association with biologically relevant genes, processes, and regulatory events compared to existing methods. Moreover, HOME provides additional functionalities lacking in most current DMR finders, such as DMR identification in non-CG context and time series analysis. HOME is freely available at https://github.com/ListerLab/HOME.
Conclusion
HOME produces more accurate DMRs than the current state-of-the-art methods on both simulated and biological datasets. The broad applicability of HOME to identify accurate DMRs in genomic data from any organism will have a significant impact upon expanding our knowledge of how DNA methylation dynamics affect cell development and differentiation.
This work was supported by the Australian Research Council (ARC) Centre of Excellence program in Plant Energy Biology (CE140100008). RL was supported by a Sylvia and Charles Viertel Senior Medical Research Fellowship, ARC Future Fellowship (FT120100862), and Howard Hughes Medical Institute International Research Scholarship (RL
Hidden genetic variation in LCA9-associated congenital blindness explained by 5′UTR mutations and copy-number variations of NMNAT1
Leber congenital amaurosis (LCA) is a severe autosomal-recessive retinal dystrophy leading to congenital blindness. A recently identified LCA gene is NMNAT1, located in the LCA9 locus. Although most mutations in blindness genes are coding variations, there is accumulating evidence for hidden noncoding defects or structural variations (SVs). The starting point of this study was an LCA9-associated consanguineous family in which no coding mutations were found in the LCA9 region. Exploring the untranslated regions of NMNAT1 revealed a novel homozygous 5'UTR variant, c.-70A>T. Moreover, an adjacent 5'UTR variant, c.-69C>T, was identified in a second consanguineous family displaying a similar phenotype. Both 5'UTR variants resulted in decreased NMNAT1 mRNA abundance in patients' lymphocytes, and caused decreased luciferase activity in human retinal pigment epithelial RPE-1 cells. Second, we unraveled pseudohomozygosity of a coding NMNAT1 mutation in two unrelated LCA patients by the identification of two distinct heterozygous partial NMNAT1 deletions. Molecular characterization of the breakpoint junctions revealed a complex Alu-rich genomic architecture. Our study uncovered hidden genetic variation in NMNAT1-associated LCA and emphasized a shift from coding to noncoding regulatory mutations and repeat-mediated SVs in the molecular pathogenesis of heterogeneous recessive disorders such as hereditary blindness.
Prediagnostic breast milk DNA methylation alterations in women who develop breast cancer
Prior candidate gene studies have shown that tumor suppressor DNA methylation in breast milk is associated with a history of breast biopsy, an established risk factor for breast cancer. To further establish the utility of breast milk as a tissue-specific biospecimen for investigations of breast carcinogenesis, we measured genome-wide DNA methylation in breast milk from women with and without a diagnosis of breast cancer in two independent cohorts. DNA methylation was assessed using Illumina HumanMethylation450k in 87 breast milk samples. Through an epigenome-wide association study we explored CpG sites associated with a breast cancer diagnosis in the prospectively collected milk samples from the breast that would develop cancer compared with women without a diagnosis of breast cancer, using linear mixed effects models adjusted for history of breast biopsy, age, RefFreeCellMix cell estimates, time of delivery, and array chip, with subject as a random effect. We identified 58 differentially methylated CpG sites associated with a subsequent breast cancer diagnosis (q-value <0.05). Nearly all CpG sites associated with a breast cancer diagnosis were hypomethylated in cases compared with controls and were enriched for CpG islands. In addition, inferred repeat element methylation was lower in breast milk DNA from cases compared to controls, and cases exhibited increased estimated epigenetic mitotic tick rate as well as DNA methylation age compared with controls. Breast milk has utility as a biospecimen for prospective assessment of disease risk, for understanding the underlying molecular basis of breast cancer risk factors and improving primary and secondary prevention of breast cancer.