3 research outputs found
Neural-based approaches to overcome feature selection and applicability domain in drug-related property prediction
In the fields of pharmaceutical research and biomedical sciences, QSAR modeling is an established approach during drug discovery for prediction of biological activity of drug candidates. Yet, QSAR modeling poses a series of open challenges. First, chemical compounds are represented on a high-dimensional space and thus feature selection is typically applied, although this task entails a challenging combinatorial problem with potential loss of information. Second, the definition of the applicability domain of a QSAR model is a desirable aspect to determine the reliability of predictions on unseen chemicals, which is often difficult to assess due to the extent of the chemical space. Finally, interpretability of these models is also a critical issue for drug designers. The purpose of this work is to thoroughly assess the application of neural-based methods and recent advances deep learning for QSAR modeling. We hypothesize that neural-based methods can overcome the need to perform a descriptor selection phase. We developed three QSAR models based on neural networks for prediction of relevant chemical and biomedical properties that, in the absence of any feature selection step, can outperform the state-of-the-art models for such properties. We also implemented an embedded applicability domain technique based on network output probabilities that proved to be effective; its application improved the predictive performance of the model. Finally, we proposed the use of a post hoc feature analysis technique based on an aggregation of network weights, which enabled effective detection of relevant features in the model.Fil: Sabando, MarÃa Virginia. Consejo Nacional de Investigaciones CientÃficas y Técnicas. Centro CientÃfico Tecnológico Conicet - BahÃa Blanca. Instituto de Ciencias e IngenierÃa de la Computación. Universidad Nacional del Sur. Departamento de Ciencias e IngenierÃa de la Computación. Instituto de Ciencias e IngenierÃa de la Computación; ArgentinaFil: Ponzoni, Ignacio. Consejo Nacional de Investigaciones CientÃficas y Técnicas. Centro CientÃfico Tecnológico Conicet - BahÃa Blanca. Instituto de Ciencias e IngenierÃa de la Computación. Universidad Nacional del Sur. Departamento de Ciencias e IngenierÃa de la Computación. Instituto de Ciencias e IngenierÃa de la Computación; ArgentinaFil: Soto, Axel Juan. Consejo Nacional de Investigaciones CientÃficas y Técnicas. Centro CientÃfico Tecnológico Conicet - BahÃa Blanca. Instituto de Ciencias e IngenierÃa de la Computación. Universidad Nacional del Sur. Departamento de Ciencias e IngenierÃa de la Computación. Instituto de Ciencias e IngenierÃa de la Computación; Argentin
Using Molecular Embeddings in QSAR Modeling: Does it Make a Difference?
With the consolidation of deep learning in drug discovery, several novel
algorithms for learning molecular representations have been proposed. Despite
the interest of the community in developing new methods for learning molecular
embeddings and their theoretical benefits, comparing molecular embeddings with
each other and with traditional representations is not straightforward, which
in turn hinders the process of choosing a suitable representation for QSAR
modeling. A reason behind this issue is the difficulty of conducting a fair and
thorough comparison of the different existing embedding approaches, which
requires numerous experiments on various datasets and training scenarios. To
close this gap, we reviewed the literature on methods for molecular embeddings
and reproduced three unsupervised and two supervised molecular embedding
techniques recently proposed in the literature. We compared these five methods
concerning their performance in QSAR scenarios using different classification
and regression datasets. We also compared these representations to traditional
molecular representations, namely molecular descriptors and fingerprints. As
opposed to the expected outcome, our experimental setup consisting of over
25,000 trained models and statistical tests revealed that the predictive
performance using molecular embeddings did not significantly surpass that of
traditional representations. While supervised embeddings yielded competitive
results compared to those using traditional molecular representations,
unsupervised embeddings tended to perform worse than traditional
representations. Our results highlight the need for conducting a careful
comparison and analysis of the different embedding techniques prior to using
them in drug design tasks, and motivate a discussion about the potential of
molecular embeddings in computer-aided drug design