29 research outputs found

    Machine learning for epigenetics and future medical applications

    Understanding epigenetic processes holds immense promise for medical applications. Advances in Machine Learning (ML) are critical to realizing this promise. Previous studies used epigenetic data sets associated with the germline transmission of epigenetic transgenerational inheritance of disease and novel ML approaches to predict genome-wide locations of critical epimutations. A combination of Active Learning (ACL) and Imbalanced Class Learning (ICL) was used to address past problems with ML, to develop a more efficient feature selection process, and to address the imbalance problem in all genomic data sets. The results suggest the power of this novel ML approach and our ability to predict epigenetic phenomena and associated disease. The current approach requires extensive computation of features over the genome. A promising new approach is to introduce Deep Learning (DL) for the generation and simultaneous computation of novel genomic features tuned to the classification task. This approach can be used with any genomic or biological data set applied to medicine. The application of molecular epigenetic data in advanced machine learning analysis to medicine is the focus of this review.

    Imbalanced Class Learning in Epigenetics

    In machine learning, a balanced dataset is one of the important criteria for high classification accuracy. Datasets with a large ratio between majority and minority classes hinder learning with any classifier. A difference in magnitude in the number of instances between the classes of the target concept results in an imbalanced class distribution. Such datasets arise in biological data, sensor data, medical diagnostics, or any other domain where labeling instances of the minority class is time-consuming or costly, or where the data are not easily available. The current study investigates a number of imbalanced class algorithms for handling the imbalanced class distribution present in epigenetic datasets. Epigenetic (DNA methylation) datasets inherently contain few differentially DNA methylated regions (DMRs) and a higher number of non-DMR sites. For this class imbalance problem, a number of algorithms are compared, including the TAN+AdaBoost algorithm. Experiments performed on four epigenetic datasets and several known datasets show that an imbalanced class learner on an imbalanced dataset can achieve accuracy similar to that of a regular learner on a balanced dataset.
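
    The class imbalance described in this abstract can be illustrated with a minimal sketch. It is not one of the algorithms compared in the study (such as TAN+AdaBoost); it shows only a generic baseline, random undersampling of the majority class. The function names, the "DMR"/"non-DMR" labels, and the 5:95 split are hypothetical and chosen for illustration.

```python
import random
from collections import Counter


def imbalance_ratio(labels):
    """Return the ratio of the largest class size to the smallest class size."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


def random_undersample(samples, labels, seed=0):
    """Balance classes by randomly discarding majority-class samples.

    Generic baseline for illustration only; not the method used in the study.
    Returns a shuffled list of (sample, label) pairs with equal class sizes.
    """
    rng = random.Random(seed)
    by_class = {}
    for sample, label in zip(samples, labels):
        by_class.setdefault(label, []).append(sample)
    # Every class is reduced to the size of the smallest class.
    target = min(len(group) for group in by_class.values())
    balanced = []
    for label, group in by_class.items():
        for sample in rng.sample(group, target):
            balanced.append((sample, label))
    rng.shuffle(balanced)
    return balanced


# Hypothetical toy data: 5 DMR sites among 95 non-DMR sites.
toy_labels = ["DMR"] * 5 + ["non-DMR"] * 95
toy_samples = list(range(100))
balanced_pairs = random_undersample(toy_samples, toy_labels)
```

    Undersampling discards most of the majority-class data, which is one reason dedicated imbalanced class learners are of interest for datasets of this kind.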

    Genome-Wide Locations of Potential Epimutations Associated with Environmentally Induced Epigenetic Transgenerational Inheritance of Disease Using a Sequential Machine Learning Prediction Approach

    Environmentally induced epigenetic transgenerational inheritance of disease and phenotypic variation involves germline transmitted epimutations. The primary epimutations identified involve altered differential DNA methylation regions (DMRs). Different environmental toxicants have been shown to promote exposure (i.e., toxicant) specific signatures of germline epimutations. Analysis of genomic features associated with these epimutations identified low-density CpG regions (<3 CpG / 100 bp) termed CpG deserts and a number of unique DNA sequence motifs. The rat genome was annotated for these and additional relevant features. The objective of the current study was to use a machine learning computational approach to predict all potential epimutations in the genome. A number of previously identified sperm epimutations were used as training sets. A novel machine learning approach using a sequential combination of Active Learning and Imbalance Class Learner analysis was developed. The transgenerational sperm epimutation analysis identified approximately 50K individual sites with a 1 kb mean size and 3,233 regions that had a minimum of three adjacent sites with a mean size of 3.5 kb. A select number of the most relevant genomic features were identified with the low density CpG deserts being a critical genomic feature of the features selected. A similar independent analysis with transgenerational somatic cell epimutation training sets identified a smaller number of 1,503 regions of genome-wide predicted sites and differences in genomic feature contributions. The predicted genome-wide germline (sperm) epimutations were found to be distinct from the predicted somatic cell epimutations. Validation of the genome-wide germline predicted sites used two recently identified transgenerational sperm epimutation signature sets from the pesticides dichlorodiphenyltrichloroethane (DDT) and methoxychlor (MXC) exposure lineage F3 generation. Analysis of this positive validation data set showed a 100% prediction accuracy for all the DDT-MXC sperm epimutations. Observations further elucidate the genomic features associated with transgenerational germline epimutations and identify a genome-wide set of potential epimutations that can be used to facilitate identification of epigenetic diagnostics for ancestral environmental exposures and disease susceptibility.
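
    Two quantities in this abstract lend themselves to a small worked sketch: the CpG desert threshold (<3 CpG / 100 bp) and the grouping of predicted sites into regions with a minimum of three adjacent sites. The code below is an illustration under stated assumptions, not the study's pipeline: the function names are hypothetical, and "adjacent" is assumed here to mean consecutive window indices along a chromosome.

```python
def cpg_per_100bp(seq):
    """Return the number of CpG dinucleotides per 100 bp of sequence."""
    seq = seq.upper()
    if not seq:
        return 0.0
    # "CG" cannot overlap itself, so str.count finds every CpG dinucleotide.
    return seq.count("CG") * 100.0 / len(seq)


def is_cpg_desert(seq, threshold=3.0):
    """True if CpG density is below the <3 CpG / 100 bp threshold from the abstract."""
    return cpg_per_100bp(seq) < threshold


def group_adjacent_sites(window_indices, min_sites=3):
    """Group predicted sites into regions of at least `min_sites` adjacent windows.

    Assumption: each site is identified by an integer window index on one
    chromosome, and adjacency means consecutive indices.
    Returns a list of (first_index, last_index) tuples.
    """
    regions = []
    run = []
    for idx in sorted(set(window_indices)):
        if run and idx == run[-1] + 1:
            run.append(idx)
        else:
            if len(run) >= min_sites:
                regions.append((run[0], run[-1]))
            run = [idx]
    if len(run) >= min_sites:
        regions.append((run[0], run[-1]))
    return regions


# Hypothetical inputs: a 100 bp sequence with 2 CpGs, and a few predicted windows.
low_cpg_seq = "CG" * 2 + "A" * 96
predicted_windows = [1, 2, 3, 7, 8, 10, 11, 12, 13]
predicted_regions = group_adjacent_sites(predicted_windows)
```

    With these toy inputs, only the runs of three or more consecutive windows are kept as regions; the isolated pair of windows 7 and 8 is dropped.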

    Environmentally induced epigenetic transgenerational inheritance of sperm epimutations promote genetic mutations

    A variety of environmental factors have been shown to induce the epigenetic transgenerational inheritance of disease and phenotypic variation. This involves the germline transmission of epigenetic information between generations. Exposure specific transgenerational sperm epimutations have been previously observed. The current study was designed to investigate the potential role genetic mutations have in the process, using copy number variations (CNV). In the first (F1) generation following exposure, negligible CNV were identified; however, in the transgenerational F3 generation, a significant increase in CNV was observed in the sperm. The genome-wide locations of differential DNA methylation regions (epimutations) and genetic mutations (CNV) were investigated. Observations suggest that the environmental induction of the epigenetic transgenerational inheritance of sperm epimutations promotes genome instability, such that genetic CNV mutations are acquired in later generations. A combination of epigenetics and genetics is suggested to be involved in the transgenerational phenotypes. The ability of environmental factors to promote epigenetic inheritance that subsequently promotes genetic mutations is a significant advance in our understanding of how the environment impacts disease and evolution.

    Chromosomal plot of somatic cell dataset SG shows the predicted 3+ sites and the clusters.

    Potential predicted DMR sites (1,503) when SG is used as the training set to predict on the rest of the genome. The X-axis shows each of the 21 chromosomes, while the Y-axis shows the length of the chromosome with predicted potential DMR locations. Red lines at the bottom mark potential DMR sites, and clusters (44) are shown as blue boxes at the top of each chromosome.

    Predictive power of repeat elements accuracy based on genomic location of 1k, 5k, 100k from the DMR.

    (A) Combined average when each group of repeat elements is used for prediction for the DHVPP dataset. (B) Combined average when each group of repeat elements is used for prediction for the SG dataset. Shows combined repeat elements in the 100k, 5k and 1k upstream and downstream regions.

    Environmentally induced epigenetic transgenerational inheritance of altered SRY genomic binding during gonadal sex determination

    A critical transcription factor required for mammalian male sex determination is sex determining region on the Y chromosome (SRY). The expression of SRY in precursor Sertoli cells is one of the initial events in testis development. This study was designed to determine the impact of environmentally induced epigenetic transgenerational inheritance on SRY binding during gonadal sex determination in the male. Gestating females (F0 generation) were exposed to the agricultural fungicide vinclozolin or vehicle control (dimethyl sulfoxide) during gonadal sex determination, which promoted the transgenerational inheritance of differential DNA methylation in sperm of the F3 generation (great grand-offspring). The fetal gonads in F3 generation males were used to identify potential alterations in SRY binding sites in the developing Sertoli cells. Chromatin immunoprecipitation with an SRY antibody followed by genome-wide promoter tiling array (ChIP-Chip) was used to identify alterations in SRY binding. A total of 81 adjacent oligonucleotide sites and 173 single oligo SRY binding sites were identified to be altered transgenerationally in the Sertoli cell vinclozolin lineage F3 generation males. Observations demonstrate that the majority of the previously identified normal SRY binding sites were not altered and that the altered SRY binding sites were novel additional sites. The chromosomal locations, gene associations and potentially modified cellular pathways were investigated. In summary, environmentally induced epigenetic transgenerational inheritance of germline epimutations appears to alter the cellular differentiation and development of the precursor Sertoli cell SRY binding during gonadal sex determination, which may influence the developmental origins of the adult onset testis disease observed.

    Genomic chromosome locations of predicted DMR and overlap between germ cell and somatic cell predicted sites.

    (A) Germ cell DHVPP and somatic cell SG predicted number of (+3) sites in each chromosome. (B) Germ cell DHVPP and somatic cell SG predicted number of single sites in each chromosome. (C) Overlap between predicted DMR (sites) from the two different datasets. (D) Overlap between predicted DMR (sites) from the two different datasets.

    Genomic Clustering of differential DNA methylated regions (epimutations) associated with the epigenetic transgenerational inheritance of disease and phenotypic variation

    A variety of environmental factors have been shown to promote the epigenetic transgenerational inheritance of disease and phenotypic variation in numerous species. Exposure to environmental factors such as toxicants can promote epigenetic changes (epimutations) involving alterations in DNA methylation to produce specific differential DNA methylation regions (DMRs). The germline (e.g. sperm) transmission of epimutations is associated with epigenetic transgenerational inheritance phenomena. The current study was designed to determine the genomic locations of environmentally induced transgenerational DMRs and assess their potential clustering. The exposure specific DMRs (epimutations) from a number of different studies were used. The clustering approach identified areas of the genome that have a statistically significant over-representation of epimutations. The location of DMR clusters was compared to the gene clusters of differentially expressed genes found in tissues and cells associated with the transgenerational inheritance of disease. Such gene clusters, termed epigenetic control regions (ECRs), have been previously suggested to regulate gene expression in regions spanning up to 2-5 million bases. DMR clusters were often found to associate with inherent gene clusters within the genome. The current study used a number of epigenetic datasets from previous studies to identify novel DMR clusters across the genome. Observations suggest these clustered DMRs within an ECR may be susceptible to epigenetic reprogramming and dramatically influence genome activity.

    CpG density plot showing number of predicted DMR sites correlated with CpG density.

    (A) CpG density from the potential predicted germ cell DMR sites (3,234) when DHVPP is used as the training set to predict genome-wide. (B) CpG density from the potential predicted somatic cell DMR sites (1,502) when SG is used as the training set to predict genome-wide. The X-axis shows the average number of CpGs per 100 bases, while the Y-axis shows the number of sites.