A Novel Deep Neural Network Technique for Drug–Target Interaction
The authors acknowledge the financial support of the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES), Finance Code 001, and the support of the High-Performance Computing Center at UFRN (NPAD/UFRN).
Drug discovery (DD) is a time-consuming and expensive process. Thus, the industry
employs strategies such as drug repositioning and drug repurposing, which allow the application of
already approved drugs to treat a different disease, as occurred in the first months of 2020, during the
COVID-19 pandemic. The prediction of drug–target interactions (DTIs) is an essential part of the DD process
because it can accelerate the process and reduce the required costs. In silico DTI prediction has used
approaches based on molecular docking simulations as well as similarity-based and network- and
graph-based ones. This paper presents MPS2IT-DTI, a DTI prediction model obtained from research
conducted in the following steps: the definition of a new method for encoding molecule and protein
sequences onto images; the definition of a deep-learning approach based on a convolutional neural
network to create a new method for DTI prediction. Results obtained with the
Davis and KIBA datasets show that MPS2IT-DTI is viable compared to other state-of-the-art (SOTA)
approaches in terms of performance and complexity of the neural network model. With the Davis
dataset, we obtained 0.876 for the concordance index and 0.276 for the MSE; with the KIBA dataset,
we obtained 0.836 and 0.226 for the concordance index and the MSE, respectively. Moreover, the
MPS2IT-DTI model represents molecule and protein sequences as images, instead of treating them as
an NLP task and, as such, does not employ an embedding layer, which is present in other models.
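The abstract names the two key steps (encoding sequences onto images, then a CNN) without giving details; the sketch below is a minimal illustration of that pattern, assuming a simple character-to-pixel-intensity encoding. The vocabularies, the 32x32 grid size, and the network architecture are hypothetical choices, not the published MPS2IT-DTI configuration.

```python
# A minimal sketch, assuming a simple pixel encoding: character-indexed
# molecule and protein sequences are mapped onto fixed-size 2-D "images"
# (no embedding layer) and a small CNN regresses binding affinity.
# Vocabularies, grid size, and architecture are illustrative assumptions.
import numpy as np
import torch
import torch.nn as nn

SMILES_VOCAB = {c: i + 1 for i, c in enumerate("#()+-.=@[]\\/123456789BCFHINOPSclnosr")}
AA_VOCAB = {c: i + 1 for i, c in enumerate("ACDEFGHIKLMNPQRSTVWY")}

def sequence_to_image(seq, vocab, side=32):
    """Encode a sequence as a side x side grayscale image with values in [0, 1]."""
    pixels = np.zeros(side * side, dtype=np.float32)
    for i, ch in enumerate(seq[: side * side]):
        pixels[i] = vocab.get(ch, 0) / len(vocab)  # unknown characters -> 0
    return pixels.reshape(side, side)

class TinyDTICNN(nn.Module):
    """Two-channel CNN: channel 0 = molecule image, channel 1 = protein image."""
    def __init__(self, side=32):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(2, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.head = nn.Sequential(nn.Flatten(), nn.Linear(32 * (side // 4) ** 2, 1))

    def forward(self, x):
        return self.head(self.features(x)).squeeze(-1)  # predicted affinity

smiles = "CC(=O)Oc1ccccc1C(=O)O"               # aspirin
protein = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"  # toy protein fragment
pair = np.stack([sequence_to_image(smiles, SMILES_VOCAB),
                 sequence_to_image(protein, AA_VOCAB)])
print(TinyDTICNN()(torch.from_numpy(pair)[None]))  # untrained prediction, shape (1,)
```

Because characters are written directly as pixel intensities, no trainable embedding layer is needed, matching the property highlighted above.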
Do Large Scale Molecular Language Representations Capture Important Structural Information?
Predicting the chemical properties of a molecule is of great importance in
many applications, including drug discovery and material design. Machine
learning based molecular property prediction holds the promise of enabling
accurate predictions at much lower computational cost than, for example,
Density Functional Theory (DFT) calculations. Various
representation learning methods in a supervised setting, including the features
extracted using graph neural nets, have emerged for such tasks. However, the
vast chemical space and the limited availability of labels make supervised
learning challenging, calling for learning a general-purpose molecular
representation. Recently, pre-trained transformer-based language models on
large unlabeled corpora have produced state-of-the-art results in many
downstream natural language processing tasks. Inspired by this development, we
present molecular embeddings obtained by training an efficient transformer
encoder model, MoLFormer. This model employs a linear attention mechanism
coupled with highly parallelized training on SMILES sequences of 1.1 billion
unlabeled molecules from the PubChem and ZINC datasets. Experiments show that
the learned molecular representation outperforms supervised and unsupervised
graph neural net baselines on several regression and classification tasks from
10 benchmark datasets, while performing competitively on others. Further
analyses, specifically through the lens of attention, demonstrate that
MoLFormer indeed learns a molecule's local and global structural aspects. These
results provide encouraging evidence that large-scale molecular language models
can capture sufficient structural information to be able to predict diverse
molecular properties, including quantum-chemical properties.
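As a rough illustration of this pipeline, the sketch below tokenizes SMILES strings, runs them through a small transformer encoder, and mean-pools the token states into fixed-size molecular embeddings that a downstream property model could consume. The toy character-hash tokenizer, the tiny model, and the standard quadratic attention are simplifications; MoLFormer itself uses a SMILES tokenizer, linear attention, and pretraining on roughly 1.1 billion molecules.

```python
# Illustrative MoLFormer-style embedding extraction (not the released model):
# tokenize SMILES, encode with a transformer, mean-pool into one vector.
import torch
import torch.nn as nn

class SmilesEncoder(nn.Module):
    def __init__(self, vocab_size=64, d_model=128, nhead=4, num_layers=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model, padding_idx=0)
        layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers)

    def forward(self, tokens):  # tokens: (batch, seq_len), 0 marks padding
        pad = tokens.eq(0)
        h = self.encoder(self.embed(tokens), src_key_padding_mask=pad)
        h = h.masked_fill(pad.unsqueeze(-1), 0.0)
        return h.sum(dim=1) / (~pad).sum(dim=1, keepdim=True)  # mean over real tokens

def tokenize(smiles, max_len=64):
    """Toy character-hash tokenizer; a real model uses a learned SMILES vocabulary."""
    ids = [ord(c) % 63 + 1 for c in smiles[:max_len]]  # 1..63, 0 reserved for pad
    return torch.tensor(ids + [0] * (max_len - len(ids)))

batch = torch.stack([tokenize("CCO"), tokenize("c1ccccc1O")])  # ethanol, phenol
embeddings = SmilesEncoder()(batch)
print(embeddings.shape)  # torch.Size([2, 128]) -- one vector per molecule
```

Such fixed-size vectors can then be fed to any regressor or classifier for benchmark tasks like those mentioned above.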
Difficulty in learning chirality for Transformer fed with SMILES
Recent years have seen development of descriptor generation based on
representation learning of extremely diverse molecules, especially those that
apply natural language processing (NLP) models to SMILES, a literal
representation of molecular structure. However, little research has been done
on how these models understand chemical structure. To address this, we
investigated the relationship between the learning progress of SMILES and
chemical structure using a representative NLP model, the Transformer. The
results suggest that while the Transformer learns partial structures of
molecules quickly, it requires extended training to understand overall
structures. Consistently, the accuracy of molecular property predictions using
descriptors generated from models at different learning steps was similar from
the beginning to the end of training. Furthermore, we found that the
Transformer requires particularly long training to learn chirality and
sometimes stagnates with low translation accuracy due to misunderstanding of
enantiomers. These findings are expected to deepen understanding of NLP models
in chemistry.
Comment: 20 pages, 6 figures
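The chirality difficulty is easy to see at the token level: two enantiomers differ by a single SMILES token (@ vs. @@) and collapse to the same string once stereochemistry is stripped. The short RDKit snippet below, which is independent of the paper's Transformer setup, demonstrates this with the two enantiomers of alanine.

```python
# Why chirality is hard for SMILES-fed sequence models: enantiomers differ
# by one token and become identical once stereochemistry is removed.
from rdkit import Chem

mol_a = Chem.MolFromSmiles("C[C@H](N)C(=O)O")   # one alanine enantiomer
mol_b = Chem.MolFromSmiles("C[C@@H](N)C(=O)O")  # its mirror image

# Isomeric SMILES differ only in the chiral tag ...
print(Chem.MolToSmiles(mol_a))
print(Chem.MolToSmiles(mol_b))

# ... while chirality-stripped SMILES are identical, so a model must attend
# to that single tag to tell enantiomers apart.
print(Chem.MolToSmiles(mol_a, isomericSmiles=False) ==
      Chem.MolToSmiles(mol_b, isomericSmiles=False))  # True

# CIP labels (R/S) that RDKit assigns to the stereocenter of each enantiomer
print(Chem.FindMolChiralCenters(mol_a), Chem.FindMolChiralCenters(mol_b))
```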