Predicting gene expression in the human malaria parasite Plasmodium falciparum using histone modification, nucleosome positioning, and 3D localization features.
Empirical evidence suggests that the malaria parasite Plasmodium falciparum employs a broad range of mechanisms to regulate gene transcription throughout the organism's complex life cycle. To better understand this regulatory machinery, we assembled a rich collection of genomic and epigenomic data sets, including information about transcription factor (TF) binding motifs, patterns of covalent histone modifications, nucleosome occupancy, GC content, and global 3D genome architecture. We used these data to train machine learning models to discriminate between high-expression and low-expression genes, focusing on three distinct stages of the red blood cell phase of the Plasmodium life cycle. Our results highlight the importance of histone modifications and 3D chromatin architecture in Plasmodium transcriptional regulation and suggest that AP2 transcription factors may play a limited regulatory role, perhaps operating in conjunction with epigenetic factors
Deep Learning for Genomics: A Concise Overview
Advancements in genomic research such as high-throughput sequencing
techniques have driven modern genomic studies into "big data" disciplines. This
data explosion is constantly challenging conventional methods used in genomics.
In parallel with the urgent demand for robust algorithms, deep learning has
succeeded in a variety of fields such as vision, speech, and text processing.
Yet genomics entails unique challenges to deep learning since we are expecting
from deep learning a superhuman intelligence that explores beyond our knowledge
to interpret the genome. A powerful deep learning model should rely on
insightful utilization of task-specific knowledge. In this paper, we briefly
discuss the strengths of different deep learning models from a genomic
perspective so as to fit each particular task with a proper deep architecture,
and remark on practical considerations of developing modern deep learning
architectures for genomics. We also provide a concise review of deep learning
applications in various aspects of genomic research, as well as pointing out
potential opportunities and obstacles for future genomics applications.
Comment: Invited chapter for Springer Book: Handbook of Deep Learning
Application
Learning the Regulatory Code of Gene Expression
Data-driven machine learning is the method of choice for predicting molecular phenotypes from nucleotide sequence, modeling gene expression events including protein-DNA binding, chromatin states as well as mRNA and protein levels. Deep neural networks automatically learn informative sequence representations and interpreting them enables us to improve our understanding of the regulatory code governing gene expression. Here, we review the latest developments that apply shallow or deep learning to quantify molecular phenotypes and decode the cis-regulatory grammar from prokaryotic and eukaryotic sequencing data. Our approach is to build from the ground up, first focusing on the initiating protein-DNA interactions, then specific coding and non-coding regions, and finally on advances that combine multiple parts of the gene and mRNA regulatory structures, achieving unprecedented performance. We thus provide a quantitative view of gene expression regulation from nucleotide sequence, concluding with an information-centric overview of the central dogma of molecular biology
Biologically Interpretable, Integrative Deep Learning for Cancer Survival Analysis
Identifying complex biological processes associated with patients' survival time at the cellular and molecular level is critical not only for developing new treatments for patients but also for accurate survival prediction. However, highly nonlinear, high-dimension, low-sample-size (HDLSS) data cause computational challenges in survival analysis. We developed a novel family of pathway-based, sparse deep neural networks (PASNet) for cancer survival analysis. The PASNet family is a biologically interpretable neural network model in which nodes in the network correspond to specific genes and pathways, while capturing nonlinear and hierarchical effects of biological pathways associated with certain clinical outcomes. Furthermore, integration of heterogeneous types of biological data from biospecimens holds promise for improving survival prediction and personalized therapies in cancer. Specifically, the integration of genomic data and histopathological images enhances survival predictions and personalized treatments in cancer studies, while providing an in-depth understanding of genetic mechanisms and phenotypic patterns of cancer. Two proposed models will be introduced for integrating multi-omics data and pathological images, respectively. Each model in the PASNet family was evaluated against current cutting-edge models on The Cancer Genome Atlas (TCGA) cancer data. In extensive experiments, the PASNet family outperformed the benchmarking methods, and its performance was statistically assessed. More importantly, the PASNet family showed the capability to interpret a multi-layered biological system. A number of studies in the GBM literature supported the biological interpretation of the proposed models. The open-source software of the PASNet family in PyTorch is publicly available at https://github.com/DataX-JieHao
DeepH&M: Estimating single-CpG hydroxymethylation and methylation levels from enrichment and restriction enzyme sequencing methods
Increased appreciation of 5-hydroxymethylcytosine (5hmC) as a stable epigenetic mark, which defines cell identity and disease progress, has engendered a need for cost-effective, but high-resolution, 5hmC mapping technology. Current enrichment-based technologies provide cheap but low-resolution and relative enrichment of 5hmC levels, while single-base resolution methods can be prohibitively expensive to scale up to large experiments. To address this problem, we developed a deep learning-based method, DeepH&M, which integrates enrichment and restriction enzyme sequencing methods to simultaneously estimate absolute hydroxymethylation and methylation levels at single-CpG resolution. Using 7-week-old mouse cerebellum data for training the DeepH&M model, we demonstrated that the 5hmC and 5mC levels predicted by DeepH&M were in high concordance with whole-genome bisulfite-based approaches. The DeepH&M model can be applied to 7-week-old frontal cortex and 79-week-old cerebellum, revealing the robust generalizability of this method to other tissues from various biological time points
PACS: Prediction and analysis of cancer subtypes from multi-omics data based on a multi-head attention mechanism model
Due to the high heterogeneity and clinical characteristics of cancer, there
are significant differences in multi-omic data and clinical characteristics
among different cancer subtypes. Therefore, accurate classification of cancer
subtypes can help doctors choose the most appropriate treatment options,
improve treatment outcomes, and provide more accurate patient survival
predictions. In this study, we propose a supervised multi-head attention
mechanism model (SMA) to classify cancer subtypes successfully. The attention
mechanism and feature sharing module of the SMA model can successfully learn
the global and local feature information of multi-omics data. Second, it
enriches the parameters of the model by deeply fusing multi-head attention
encoders from Siamese through the fusion module. Validated by extensive
experiments, the SMA model achieves the highest accuracy, macro F1, and
weighted F1 in cancer subtype classification on simulated,
single-cell, and cancer multi-omics datasets compared to AE, CNN, and GNN-based
models. Therefore, we contribute to future research on multiomics data using
our attention-based approach.
Comment: Submitted to BIBM202
Artificial intelligence used in genome analysis studies
Next Generation Sequencing (NGS) or deep sequencing technology enables parallel reading of multiple individual DNA fragments, thereby enabling the identification of millions of base pairs in several hours. Recent research has clearly shown that machine learning technologies can efficiently analyse large sets of genomic data and help to identify novel gene functions and regulation regions. A deep artificial neural network consists of a group of artificial neurons that mimic the properties of living neurons. These mathematical models, termed Artificial Neural Networks (ANN), can be used to solve artificial intelligence engineering problems in several different technological fields (e.g., biology, genomics, proteomics, and metabolomics). In practical terms, neural networks are non-linear statistical structures that are organized as modelling tools and are used to simulate complex genomic relationships between inputs and outputs. To date, Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNN) have been demonstrated to be the best tools for improving performance in problem solving tasks within the genomic field
Clinical epigenetics settings for cancer and cardiovascular diseases: real-life applications of network medicine at the bedside
Despite impressive efforts invested in epigenetic research in the last 50 years, clinical applications are still lacking. Only a few university hospital centers currently use epigenetic biomarkers at the bedside. Moreover, the overall concept of precision medicine is not widely recognized in routine medical practice and the reductionist approach remains predominant in treating patients affected by major diseases such as cancer and cardiovascular diseases. By its very nature, epigenetics is integrative of genetic networks. The study of epigenetic biomarkers has led to the identification of numerous drugs with an increasingly significant role in clinical therapy, especially for cancer patients. Here, we provide an overview of clinical epigenetics within the context of network analysis. We illustrate achievements to date and discuss how we can move from traditional medicine into the era of network medicine (NM), where pathway-informed molecular diagnostics will allow treatment selection following the paradigm of precision medicine