Recent Advances in Variational Autoencoders With Representation Learning for Biomedical Informatics: A Survey
Variational autoencoders (VAEs) are deep latent space generative models that have been immensely successful in multiple exciting applications in biomedical informatics such as molecular design, protein design, medical image classification and segmentation, integrated multi-omics data analyses, and large-scale biological sequence analyses, among others. The fundamental idea in VAEs is to learn the distribution of data in such a way that new meaningful data with more intra-class variations can be generated from the encoded distribution. The ability of VAEs to synthesize new data with more representation variance at state-of-the-art levels provides hope that the chronic scarcity of labeled data in the biomedical field can be resolved. Furthermore, VAEs have made nonlinear latent variable models tractable for modeling complex distributions. This has allowed for efficient extraction of relevant biomedical information from learned features for biological data sets, referred to as unsupervised feature representation learning. In this article, we review the various recent advancements in the development and application of VAEs for biomedical informatics. We discuss challenges and future opportunities for biomedical research with respect to VAEs.
https://doi.org/10.1109/ACCESS.2020.304830
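The core mechanics the abstract describes, encoding data to a latent distribution and sampling new points from it, rest on the Gaussian reparameterization trick and a closed-form KL term. A minimal numerical sketch of those two pieces (an illustration only, not any specific model from the survey; function names are hypothetical):

```python
import numpy as np

def reparameterize(mu, log_var, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, I).

    Writing the sample this way keeps it differentiable with respect
    to mu and log_var, which is what makes the VAE objective trainable.
    """
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

def kl_to_standard_normal(mu, log_var):
    """Closed-form KL( N(mu, sigma^2) || N(0, I) ), summed over dimensions."""
    return 0.5 * np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var)

rng = np.random.default_rng(0)
mu = np.zeros(4)
log_var = np.zeros(4)  # sigma = 1, i.e. already the prior
z = reparameterize(mu, log_var, rng)
print(z.shape)                              # (4,)
print(kl_to_standard_normal(mu, log_var))   # 0.0
```

When the encoder output matches the standard-normal prior, the KL penalty vanishes, which is why the term acts as a regularizer pulling the latent distribution toward the prior.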
Protein-Ligand Binding Affinity Directed Multi-Objective Drug Design Based on Fragment Representation Methods
Drug discovery is a challenging process with a vast molecular space to be explored and numerous pharmacological properties to be appropriately considered. Among various drug design protocols, fragment-based drug design is an effective way of constraining the search space and better utilizing biologically active compounds. Motivated by fragment-based drug search for a given protein target and the emergence of artificial intelligence (AI) approaches in this field, this work advances the field of in silico drug design by (1) integrating a graph fragmentation-based deep generative model with a deep evolutionary learning process for large-scale multi-objective molecular optimization, and (2) applying protein-ligand binding affinity scores together with other desired physicochemical properties as objectives. Our experiments show that the proposed method can generate novel molecules with improved property values and binding affinities.
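Multi-objective selection of the kind described above, trading binding affinity off against physicochemical properties, is commonly driven by Pareto dominance: a candidate survives only if no other candidate beats it on every objective. A minimal sketch (molecule names and scores are illustrative, not taken from the paper):

```python
def dominates(a, b):
    """True if score vector a is at least as good as b on every objective
    (higher is better) and strictly better on at least one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def pareto_front(candidates):
    """Return the non-dominated subset of (name, scores) pairs."""
    return [
        (name, scores)
        for name, scores in candidates
        if not any(dominates(other, scores) for _, other in candidates)
    ]

# Hypothetical molecules scored on (binding affinity, drug-likeness)
mols = [("m1", (0.9, 0.4)), ("m2", (0.6, 0.8)), ("m3", (0.5, 0.3))]
print(pareto_front(mols))  # m3 is dominated by both m1 and m2
```

An evolutionary loop like the one in the paper would repeatedly mutate or recombine candidates and keep (a diverse subset of) the front, rather than collapsing the objectives into a single weighted score.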
Application of Generative Models on Modeling Biological Molecules
The last decade has been the stage for many groundbreaking Artificial Intelligence technologies, such as revolutionary language models: generative models capable of synthesizing surprisingly unique data. Such novelty also brings public concerns, primarily due to state-of-the-art models' "black box" nature. One of the domains that has quickly adopted the generative deep learning paradigm is drug discovery, which, from a pharmaceutical industry point of view, is an extremely expensive and time-consuming process. However, the inner workings of such models are not inherently understandable by humans, causing hesitation to fully trust their results. The concept of disentanglement is one of the fundamental requirements for explaining generative models, determining the extent to which steerability and navigation can be achieved in the latent space. Unfortunately, the application potential of interpretability approaches has some limitations depending on the availability of generative latent factors. This work aims to shed some light on the synthesized latent spaces of state-of-the-art molecular generative models: a couple of basic assumptions made about the latent space characteristics are analyzed, and potential pitfalls related to domain, architecture, and molecule representation preferences are addressed. The degree to which steerability in the latent space is achieved is quantified by implementing a novel interpretability approach, providing the basis for the comparison of alternative model configurations. The experiments further revealed that modeling decisions have a direct impact on achievable interpretability, albeit limited by the intricacies of the medicinal chemistry domain.
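The steerability discussed above, navigating the latent space along meaningful directions, is often probed with linear traversals between latent codes: decode each intermediate point and check whether a single property varies smoothly. A schematic sketch of the traversal itself (the paper's actual interpretability metric is not reproduced here):

```python
import numpy as np

def traverse(z_start, z_end, steps):
    """Linearly interpolate between two latent codes.

    Feeding each row to a decoder and inspecting how molecular
    properties change is one simple probe of disentanglement.
    """
    ts = np.linspace(0.0, 1.0, steps)
    return np.stack([(1 - t) * z_start + t * z_end for t in ts])

z_a = np.zeros(3)
z_b = np.array([1.0, 2.0, 3.0])
path = traverse(z_a, z_b, steps=5)
print(path.shape)   # (5, 3)
print(path[2])      # midpoint: [0.5 1.  1.5]
```

If the representation is disentangled, a traversal along a single latent axis should alter one generative factor while leaving the others fixed; entangled spaces change several at once.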
Neural Embeddings for Dimensionality Reduction of Complex Topology Feature Spaces
This study focuses on the central role of neural embeddings and their design and optimization in the context of Artificial Intelligence (AI), particularly in Deep Learning and Explainable AI (XAI). It explores how neural embeddings of data characterized by complex topology are crucial for addressing challenges in data dimensionality reduction and network prediction analysis.

In this thesis, two independent but connected investigations examine the effect of the neural encodings a network generates for its target task in the case of graph-structured data.

The first project involved the study, design, and analysis of neural embeddings of synthetic polymers through the development of two Graph Variational Autoencoder networks. The goal is to generate new polymers that incorporate additional compound-specific structural information, such as stoichiometry and chain architecture. The results were analyzed through several evaluation metrics that compare the two models and highlight the weaknesses and strengths of both approaches. A qualitative investigation of the latent space showed that the neural embeddings encode different information depending on the decoder model trained for generation, confirming and justifying the results obtained.

In the second work, a graph neural network capable of predicting the bioactivity of molecules toward specific proteins was developed, employing neural embeddings to condense the chemical information of the input data. A hierarchical XAI methodology was then devised to identify the molecular moieties relevant to the prediction, helping to clarify the model's decision-making process. The results obtained through explainability contribute to a deeper understanding of the data and the underlying problem.

Through these studies, the importance of neural embedding design and optimization for data and features with complex topology is highlighted, showing how deep neural networks, after properly conducted training, embed all the information needed for the target task in an encoded representation.
Machine Learning Methods for Modeling Synthesizable Molecules
The search for new molecules often involves cycles of design-make-test-analyze steps, where new molecules are designed, synthesized in a lab, tested, and then analyzed to inform what is to be designed next. This thesis proposes new machine learning (ML) methods to augment chemists in the design and make steps of this process, focusing on the tasks of (a) how to use ML to predict chemical reaction outcomes, and (b) how to build generative models to search for new molecules. We take a common approach to both tasks, building our ML models around existing powerful tools and abstractions from the field of chemistry, and in doing so, show that the tasks we tackle are intrinsically linked.
Reaction prediction is important for validating synthesis plans before carrying them out. Many previous ML approaches to reaction prediction have treated reactions as either a black box translation or a single graph edit operation. Instead, we propose a model (ELECTRO) that predicts the reaction products through modeling a sequence of electron movements. We show how modeling electron movements in this way has the benefit of being easy for chemists to interpret, and also is a natural format in which to incorporate the constraints of chemistry, such as balanced atom counts before and after a reaction. We show that our model achieves excellent performance on an important subset of chemical reactions and recovers a basic knowledge of chemistry without explicit supervision.
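One chemistry constraint named above, balanced atom counts before and after a reaction, is easy to make concrete. A toy conservation check over simple molecular formulas (an illustration of the constraint only, not part of the ELECTRO model; the parser ignores parentheses and charges):

```python
import re
from collections import Counter

def atom_counts(formula):
    """Count atoms in a simple formula like 'C2H6O' (no parentheses)."""
    counts = Counter()
    for elem, num in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        if elem:  # findall also yields an empty trailing match
            counts[elem] += int(num) if num else 1
    return counts

def is_balanced(reactants, products):
    """True if the total atom counts match on both sides of a reaction."""
    total = lambda side: sum((atom_counts(f) for f in side), Counter())
    return total(reactants) == total(products)

# Esterification-style toy example: acetic acid + ethanol -> ester + water
print(is_balanced(["C2H4O2", "C2H6O"], ["C4H8O2", "H2O"]))  # True
```

A model that predicts explicit electron movements satisfies this kind of constraint by construction, whereas a free-form sequence model has to learn it from data.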
In designing new models to search for molecules with particular properties, it is important that the models describe not only what molecule to make, but also crucially how to make it. These instructions form a synthesis plan, describing how easy-to-obtain building blocks can be combined together to form more complex molecules of interest through chemical reactions. Inspired by this real-world process, we develop two machine learning approaches that incorporate reactions into the virtual generation of new molecules. We show that aligning our model with the real-world process allows us to better link up the design and make steps involved in molecule search, and permits chemists to examine the practicability of both the final molecules we suggest and their synthetic routes. Molecule search is inherently an extrapolation task, and we show that by building our methods around the inductive biases of modeling reactions, we can generalize to new chemical spaces, suggesting molecules that not only perform well, but are synthesizable too.
EPSR
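A synthesis plan as described above, building blocks combined through reactions into a target, is naturally a tree whose leaves are purchasable starting materials. A minimal data-structure sketch (class and field names are hypothetical, not from the thesis):

```python
from dataclasses import dataclass, field

@dataclass
class SynthesisStep:
    """A molecule together with the reactants (if any) that produce it."""
    molecule: str                                   # e.g. a SMILES string
    reactants: list = field(default_factory=list)   # child SynthesisStep nodes

    def building_blocks(self):
        """Leaves of the tree: purchasable starting materials."""
        if not self.reactants:
            return [self.molecule]
        return [b for r in self.reactants for b in r.building_blocks()]

plan = SynthesisStep(
    "target",
    [SynthesisStep("intermediate", [SynthesisStep("bb1"), SynthesisStep("bb2")]),
     SynthesisStep("bb3")],
)
print(plan.building_blocks())  # ['bb1', 'bb2', 'bb3']
```

Generating molecules jointly with such a tree, rather than as a bare string or graph, is what lets a chemist inspect the practicability of the proposed route as well as the final product.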