4 research outputs found

    An explainable artificial intelligence approach for decoding the enhancer histone modifications code and identification of novel enhancers in Drosophila

    Background: Enhancers are non-coding regions of the genome that control the activity of target genes. Recent efforts to identify active enhancers experimentally and in silico have proven effective. While these tools can predict the locations of enhancers with a high degree of accuracy, the mechanisms underpinning the activity of enhancers are often unclear. Results: Using machine learning (ML) and a rule-based explainable artificial intelligence (XAI) model, we demonstrate that we can predict the location of known enhancers in Drosophila with a high degree of accuracy. Most importantly, we use the rules of the XAI model to provide insight into the underlying combinatorial histone modifications code of enhancers. In addition, we identified a large set of putative enhancers that display the same epigenetic signature as enhancers identified experimentally. These putative enhancers are enriched in nascent and divergent transcription and have 3D contacts with promoters of transcribed genes. However, they display only intermediary enrichment of mediator and cohesin complexes compared to previously characterised active enhancers. We also found that 10–15% of the predicted enhancers display similar characteristics to super enhancers observed in other species. Conclusions: Here, we applied an explainable AI model to predict enhancers with high accuracy. Most importantly, we identified that different combinations of epigenetic marks characterise different groups of enhancers. Finally, we discovered a large set of putative enhancers which display similar characteristics to previously characterised active enhancers.

    Typing myalgic encephalomyelitis by infection at onset: A DecodeME study [version 4; peer review: 2 approved]

    Background: People with myalgic encephalomyelitis / chronic fatigue syndrome (ME/CFS) experience core symptoms of post-exertional malaise, unrefreshing sleep, and cognitive impairment. Although people with ME/CFS make up 0.2–0.4% of the population, no laboratory test is available for their diagnosis, no effective therapy exists for their treatment, and no scientific breakthrough regarding pathogenesis has been made. It remains unknown, despite decades of small-scale studies, whether individuals experience different types of ME/CFS separated by onset-type, sex or age. Methods: DecodeME is a large population-based study of ME/CFS that recruited 17,074 participants in the first 3 months following full launch. Detailed questionnaire responses from UK-based participants who all reported being diagnosed with ME/CFS by a health professional provided an unparalleled opportunity to investigate, using logistic regression, whether ME/CFS severity or onset type is significantly associated with sex, age, illness duration, comorbid conditions or symptoms. Results: The well-established sex-bias among ME/CFS patients is evident in the initial DecodeME cohort: 83.5% of participants were female. What was not known previously was that females tend to have more comorbidities than males. Moreover, being female, being older and being over 10 years from ME/CFS onset are significantly associated with greater severity. Five different ME/CFS onset types were examined in the self-reported data: those with ME/CFS onset (i) after glandular fever (infectious mononucleosis); (ii) after COVID-19 infection; (iii) after other infections; (iv) without an infection at onset; and (v) where the occurrence of an infection at or preceding onset is not known. Among other findings, ME/CFS onset with unknown infection status was significantly associated with active fibromyalgia. Conclusions: DecodeME participants differ in symptoms, comorbid conditions and/or illness severity when stratified by their sex-at-birth and/or infection around the time of ME/CFS onset.

    An explainable artificial intelligence approach for decoding the enhancer histone modification code and identification of novel enhancers

    Enhancers are non-coding regions of the genome that control the activity of target genes. Recent work to identify active enhancers experimentally and in silico has proven effective. While these methods can predict the locations of enhancers with a high degree of accuracy, the mechanisms underpinning enhancer activity are still not well understood. Here, an explainable artificial intelligence (XAI) model is trained using a combination of genetic algorithms and type-2 fuzzy logic systems. These XAI models use natural language variables and IF-THEN rules to attempt to identify active enhancers, creating a fully transparent classification model. This allows the relationships between the epigenetic features included in the model to be studied. These models are first trained in Drosophila cell lines using histone modifications as inputs and STARR-seq labelling to classify enhancers. The generated XAI models are shown to generalise to previously unseen cell types and perform at a level comparable with a traditional neural network. Many putative enhancers are identified that display the same epigenetic features as the enhancers identified by STARR-seq. These putative enhancers are found to display intermediary enrichment of Mediator and cohesin complexes, but to be bidirectionally transcribed and to make 3D contacts with the promoters of expressed genes. The rules underpinning these classifications are characterised and studied to help determine the underlying epigenetic code at these enhancers. Additional XAI models are then trained in human cell lines with additional features such as DNA accessibility and DNA methylation. Again, the XAI models are shown to perform similarly well to neural networks and identify many previously unidentified enhancers. In humans, these putative enhancers are shown to be enriched in motifs for transcription factors known to be involved in response pathways to environmental stress, suggesting that the model is identifying putative enhancers in these cell lines.

    DecodeME: Community recruitment for a large genetics study of myalgic encephalomyelitis / chronic fatigue syndrome

    BACKGROUND: Myalgic encephalomyelitis / chronic fatigue syndrome (ME/CFS) is a common, long-term condition characterised by post-exertional malaise, often with fatigue that is not significantly relieved by rest. ME/CFS has no confirmed diagnostic test or effective treatment and we lack knowledge of its causes. Identification of genes and cellular processes whose disruption adds to ME/CFS risk is a necessary first step towards development of effective therapy. METHODS: Here we describe DecodeME, an ongoing study co-produced by people with lived experience of ME/CFS and scientists. Together we designed the study, obtained funding, and are now recruiting up to 25,000 people in the UK with a clinical diagnosis of ME/CFS. Those eligible for the study are at least 16 years old, pass international study criteria, and lack any alternative diagnoses that can result in chronic fatigue. These will include 5,000 people whose ME/CFS diagnosis was a consequence of SARS-CoV-2 infection. Questionnaires are completed online or on paper. Participants’ saliva DNA samples are acquired by post, which improves participation by more severely-affected individuals. Digital marketing and social media approaches resulted in 29,000 people with ME/CFS in the UK pre-registering their interest in participating. We will perform a genome-wide association study, comparing participants’ genotypes with those from UK Biobank as controls. This should generate hypotheses regarding the genes, mechanisms and cell types contributing to ME/CFS disease aetiology. DISCUSSION: The DecodeME study has been reviewed and given a favourable opinion by the North West – Liverpool Central Research Ethics Committee (21/NW/0169). Relevant documents will be available online (www.decodeme.org.uk). Genetic data will be disseminated as associated variants and genomic intervals, and as summary statistics. Results will be reported on the DecodeME website and via open access publications.