An explainable artificial intelligence approach for decoding the enhancer histone modification code and identification of novel enhancers

Abstract

Enhancers are non-coding regions of the genome that control the activity of target genes. Recent work to identify active enhancers experimentally and in silico has proven effective. While these methods can predict the locations of enhancers with a high degree of accuracy, the mechanisms underpinning enhancer activity are still not well understood. Here, an explainable artificial intelligence (XAI) model is trained using a combination of genetic algorithms and type-2 fuzzy logic systems. These XAI models use natural language variables and IF-THEN rules to identify active enhancers, creating a fully transparent classification model. This allows the relationships between the epigenetic features included in the model to be studied. The models are first trained in Drosophila cell lines using histone modifications as inputs and STARR-seq labelling to classify enhancers. The generated XAI models are shown to generalise to previously unseen cell types and to perform at a level comparable to that of a traditional neural network. Many putative enhancers are identified that display the same epigenetic features as the enhancers identified by STARR-seq. These putative enhancers are found to display intermediary enrichment of Mediator and cohesin complexes, but to be bidirectionally transcribed and to make 3D contacts with the promoters of expressed genes. The rules underpinning these classifications are characterised and studied to help determine the underlying epigenetic code at these enhancers. Further XAI models are then trained in human cell lines with additional features such as DNA accessibility and DNA methylation. Again, the XAI models are shown to perform comparably to neural networks and to identify many previously unidentified enhancers.
In humans, these putative enhancers are shown to be enriched in motifs for transcription factors known to be involved in response pathways to environmental stress, suggesting that the model is identifying putative enhancers in these cell lines.
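
As an illustration of the kind of rule such a model evaluates, the following is a minimal sketch of a single IF-THEN rule over interval type-2 fuzzy sets. It is not the authors' implementation: the histone marks chosen (H3K27ac, H3K4me1), the triangular membership shapes, the `lower_scale` value, the min t-norm, and all function and class names are assumptions made for the example, and the genetic algorithm that would tune the rules is not shown.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class IntervalType2Set:
    """Triangular interval type-2 fuzzy set (hypothetical shape).

    The upper membership function is a triangle of height 1; the lower
    membership function is the same triangle scaled by `lower_scale`.
    The gap between the two represents uncertainty about the set itself.
    """
    left: float
    peak: float
    right: float
    lower_scale: float = 0.8  # assumed value, chosen only for illustration

    def _triangle(self, x: float) -> float:
        # Standard triangular membership: 0 outside (left, right), 1 at peak.
        if x <= self.left or x >= self.right:
            return 0.0
        if x <= self.peak:
            return (x - self.left) / (self.peak - self.left)
        return (self.right - x) / (self.right - self.peak)

    def membership(self, x: float) -> Tuple[float, float]:
        """Return the (lower, upper) membership grades of x."""
        upper = self._triangle(x)
        return (self.lower_scale * upper, upper)


# Linguistic term for signals normalised to [0, 1] (assumed normalisation).
HIGH = IntervalType2Set(left=0.0, peak=1.0, right=2.0)

# Hypothetical rule: IF H3K27ac is HIGH AND H3K4me1 is HIGH THEN enhancer.
RULE: Dict[str, IntervalType2Set] = {"H3K27ac": HIGH, "H3K4me1": HIGH}


def firing_interval(rule: Dict[str, IntervalType2Set],
                    inputs: Dict[str, float]) -> Tuple[float, float]:
    """Combine antecedents with the min t-norm, separately for the
    lower and upper membership grades, giving a firing interval."""
    grades = [fuzzy_set.membership(inputs[name])
              for name, fuzzy_set in rule.items()]
    lower = min(g[0] for g in grades)
    upper = min(g[1] for g in grades)
    return (lower, upper)


if __name__ == "__main__":
    region = {"H3K27ac": 1.0, "H3K4me1": 0.5}  # made-up normalised signals
    print(firing_interval(RULE, region))
```

Because each rule is a readable statement over named features, the firing interval for any genomic region can be traced back to the individual marks that produced it, which is the sense in which such a classifier is transparent.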
