Leveraging Attention Mechanism to Unlock Gene and Protein Attributes

Abstract

Advancing personalized medicine depends on effectively integrating and interpreting the vast, heterogeneous landscape of biological data, from genomic sequences and transcriptomics to the insights embedded in the scientific literature. Current machine learning models often focus on a single data modality, which limits their capacity to capture the multifaceted nature of biological systems. We address this gap by developing three attention-based machine learning models that integrate diverse data modalities. First, DeepVul is a multi-task model that leverages cancer transcriptome data to predict genes critical for cancer cell survival and their corresponding drugs. Next, LitGene refines gene representations by integrating textual information from the scientific literature. Finally, Protein2Text is a large language model that translates protein sequences into natural language descriptions, making complex biochemical data accessible and interpretable. Together, these models demonstrate a comprehensive approach to integrating various data modalities, offering an alternative view of biological systems and paving the way for truly personalized medicine for everyone.
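The abstract does not state which attention formulation DeepVul, LitGene, or Protein2Text uses. As a generic illustration only, assuming the standard scaled dot-product attention of Transformer models, softmax(QK^T / sqrt(d_k)) V, the following sketch shows how each input element (for example, a gene or amino-acid token) is re-expressed as a weighted combination of all elements. The function names and the toy two-dimensional "token" embeddings are hypothetical and are not taken from the models described above.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    # Plain matrix product: (n x k) @ (k x m) -> (n x m)
    return [
        [sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def transpose(a: Matrix) -> Matrix:
    return [list(row) for row in zip(*a)]


def softmax(row: List[float]) -> List[float]:
    # Subtract the row maximum for numerical stability before exponentiating
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(q: Matrix, k: Matrix, v: Matrix) -> Tuple[Matrix, Matrix]:
    """Generic attention sketch (hypothetical helper, not the authors' code)."""
    d_k = len(k[0])
    # Similarity of every query to every key, scaled by sqrt(d_k)
    scores = matmul(q, transpose(k))
    scaled = [[s / math.sqrt(d_k) for s in row] for row in scores]
    # Each row of weights is a probability distribution over the keys
    weights = [softmax(row) for row in scaled]
    # Output rows are weighted averages of the value vectors
    return matmul(weights, v), weights


# Hypothetical toy example: self-attention over three 2-D token embeddings
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
output, weights = scaled_dot_product_attention(tokens, tokens, tokens)
```

Because each row of `weights` sums to one, the attention weights can be read as how strongly one element attends to each of the others, which is the property that makes attention-based models comparatively interpretable.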

University of New Mexico Digital Repository

Last updated on 18/11/2025

This paper was published in the University of New Mexico Digital Repository.
