1,421 research outputs found

    Graph Representation Learning in Biomedicine

    Full text link
    Biomedical networks are universal descriptors of systems of interacting elements, from protein interactions to disease networks, all the way to healthcare systems and scientific knowledge. With the remarkable success of representation learning in providing powerful predictions and insights, we have witnessed a rapid expansion of representation learning techniques into modeling, analyzing, and learning with such networks. In this review, we put forward an observation that long-standing principles of networks in biology and medicine -- while often unspoken in machine learning research -- can provide the conceptual grounding for representation learning, explain its current successes and limitations, and inform future advances. We synthesize a spectrum of algorithmic approaches that, at their core, leverage graph topology to embed networks into compact vector spaces, and capture the breadth of ways in which representation learning is proving useful. Areas of profound impact include identifying variants underlying complex traits, disentangling behaviors of single cells and their effects on health, assisting in diagnosis and treatment of patients, and developing safe and effective medicines

    Multivariate Information Fusion With Fast Kernel Learning to Kernel Ridge Regression in Predicting LncRNA-Protein Interactions

    Get PDF
    Long non-coding RNAs (lncRNAs) constitute a large class of transcribed RNA molecules. They have a characteristic length of more than 200 nucleotides which do not encode proteins. They play an important role in regulating gene expression by interacting with the homologous RNA-binding proteins. Due to the laborious and time-consuming nature of wet experimental methods, more researchers should pay great attention to computational approaches for the prediction of lncRNA-protein interaction (LPI). An in-depth literature review in the state-of-the-art in silico investigations, leads to the conclusion that there is still room for improving the accuracy and velocity. This paper propose a novel method for identifying LPI by employing Kernel Ridge Regression, based on Fast Kernel Learning (LPI-FKLKRR). This approach, uses four distinct similarity measures for lncRNA and protein space, respectively. It is remarkable, that we extract Gene Ontology (GO) with proteins, in order to improve the quality of information in protein space. The process of heterogeneous kernels integration, applies Fast Kernel Learning (FastKL) to deal with weight optimization. The extrapolation model is obtained by gaining the ultimate prediction associations, after using Kernel Ridge Regression (KRR). Experimental outcomes show that the ability of modeling with LPI-FKLKRR has extraordinary performance compared with LPI prediction schemes. On benchmark dataset, it has been observed that the best Area Under Precision Recall Curve (AUPR) of 0.6950 is obtained by our proposed model LPI-FKLKRR, which outperforms the integrated LPLNP (AUPR: 0.4584), RWR (AUPR: 0.2827), CF (AUPR: 0.2357), LPIHN (AUPR: 0.2299), and LPBNI (AUPR: 0.3302). Also, combined with the experimental results of a case study on a novel dataset, it is anticipated that LPI-FKLKRR will be a useful tool for LPI prediction

    Neighborhood based computational approaches for the prediction of lncRNA-disease associations

    Get PDF
    Motivation: Long non-coding RNAs (lncRNAs) are a class of molecules involved in important biological processes. Extensive efforts have been provided to get deeper understanding of disease mechanisms at the lncRNA level, guiding towards the detection of biomarkers for disease diagnosis, treatment, prognosis and prevention. Unfortunately, due to costs and time complexity, the number of possible disease-related lncRNAs verified by traditional biological experiments is very limited. Computational approaches for the prediction of disease-lncRNA associations allow to identify the most promising candidates to be verified in laboratory, reducing costs and time consuming. Results: We propose novel approaches for the prediction of lncRNA-disease associations, all sharing the idea of exploring associations among lncRNAs, other intermediate molecules (e.g., miRNAs) and diseases, suitably represented by tripartite graphs. Indeed, while only a few lncRNA-disease associations are still known, plenty of interactions between lncRNAs and other molecules, as well as associations of the latters with diseases, are available. A first approach presented here, NGH, relies on neighborhood analysis performed on a tripartite graph, built upon lncRNAs, miRNAs and diseases. A second approach (CF) relies on collaborative filtering; a third approach (NGH-CF) is obtained boosting NGH by collaborative filtering. The proposed approaches have been validated on both synthetic and real data, and compared against other methods from the literature. It results that neighborhood analysis allows to outperform competitors, and when it is combined with collaborative filtering the prediction accuracy further improves, scoring a value of AUC equal to 0966

    Network Approaches to the Study of Genomic Variation in Cancer

    Get PDF
    Advances in genomic sequencing technologies opened the door for a wider study of cancer etiology. By analyzing datasets with thousands of exomes (or genomes), researchers gained a better understanding of the genomic alterations that confer a selective advantage towards cancerous growth. A predominant narrative in the field has been based on a dichotomy of alterations that confer a strong selective advantage, called cancer drivers, and the bulk of other alterations assumed to have a neutral effect, called passengers. Yet, a series of studies questioned this narrative and assigned potential roles to passengers, be it in terms of facilitating tumorigenesis or countering the effect of drivers. Consequently, the passenger mutational landscape received a higher level of attention in attempt to prioritize the possible effects of its alterations and to identify new therapeutic targets. In this dissertation, we introduce interpretable network approaches to the study of genomic variation in cancer. We rely on two types of networks, namely functional biological networks and artificial neural nets. In the first chapter, we describe a propagation method that prioritizes 230 infrequently mutated genes with respect to their potential contribution to cancer development. In the second chapter, we further transcend the driver-passenger dichotomy and demonstrate a gradient of cancer relevance across human genes. In the last two chapters, we present methods that simplify neural network models to render them more interpretable with a focus on functional genomic applications in cancer and beyond

    Explainable deep learning models for biological sequence classification

    Get PDF
    Biological sequences - DNA, RNA and proteins - orchestrate the behavior of all living cells and trying to understand the mechanisms that govern and regulate the interactions among these molecules has motivated biological research for many years. The introduction of experimental protocols that analyze such interactions on a genome- or transcriptome-wide scale has also established the usage of machine learning in our field to make sense of the vast amounts of generated data. Recently, deep learning, a branch of machine learning based on artificial neural networks, and especially convolutional neural networks (CNNs) were shown to deliver promising results for predictive tasks and automated feature extraction. However, the resulting models are often very complex and thus make model application and interpretation hard, but the possibility to interpret which features a model has learned from the data is crucial to understand and to explain new biological mechanisms. This work therefore presents pysster, our open source software library that enables researchers to more easily train, apply and interpret CNNs on biological sequence data. We evaluate and implement different feature interpretation and visualization strategies and show that the flexibility of CNNs allows for the integration of additional data beyond pure sequences to improve the biological feature interpretability. We demonstrate this by building, among others, predictive models for transcription factor and RNA-binding protein binding sites and by supplementing these models with structural information in the form of DNA shape and RNA secondary structure. Features learned by models are then visualized as sequence and structure motifs together with information about motif locations and motif co-occurrence. By further analyzing an artificial data set containing implanted motifs we also illustrate how the hierarchical feature extraction process in a multi-layer deep neural network operates. Finally, we present a larger biological application by predicting RNA-binding of proteins for transcripts for which experimental protein-RNA interaction data is not yet available. Here, the comprehensive interpretation options of CNNs made us aware of potential technical bias in the experimental eCLIP data (enhanced crosslinking and immunoprecipitation) that were used as a basis for the models. This allowed for subsequent tuning of the models and data to get more meaningful predictions in practice

    LPI-IBNRA: Long Non-coding RNA-Protein Interaction Prediction Based on Improved Bipartite Network Recommender Algorithm

    Get PDF
    According to the latest research, lncRNAs (long non-coding RNAs) play a broad and important role in various biological processes by interacting with proteins. However, identifying whether proteins interact with a specific lncRNA through biological experimental methods is difficult, costly, and time-consuming. Thus, many bioinformatics computational methods have been proposed to predict lncRNA-protein interactions. In this paper, we proposed a novel approach called Long non-coding RNA-Protein Interaction Prediction based on Improved Bipartite Network Recommender Algorithm (LPI-IBNRA). In the proposed method, we implemented a two-round resource allocation and eliminated the second-order correlations appropriately on the bipartite network. Experimental results illustrate that LPI-IBNRA outperforms five previous methods, with the AUC values of 0.8932 in leave-one-out cross validation (LOOCV) and 0.8819 ± 0.0052 in 10-fold cross validation, respectively. In addition, case studies on four lncRNAs were carried out to show the predictive power of LPI-IBNRA

    To Transformers and Beyond: Large Language Models for the Genome

    Full text link
    In the rapidly evolving landscape of genomics, deep learning has emerged as a useful tool for tackling complex computational challenges. This review focuses on the transformative role of Large Language Models (LLMs), which are mostly based on the transformer architecture, in genomics. Building on the foundation of traditional convolutional neural networks and recurrent neural networks, we explore both the strengths and limitations of transformers and other LLMs for genomics. Additionally, we contemplate the future of genomic modeling beyond the transformer architecture based on current trends in research. The paper aims to serve as a guide for computational biologists and computer scientists interested in LLMs for genomic data. We hope the paper can also serve as an educational introduction and discussion for biologists to a fundamental shift in how we will be analyzing genomic data in the future
    • …
    corecore