Advancing Biomedicine with Graph Representation Learning: Recent Progress, Challenges, and Future Directions
Graph representation learning (GRL) has emerged as a pivotal field that has contributed significantly to breakthroughs in various domains, including biomedicine. The objective of this survey is to review the latest advancements in GRL methods and their applications in the biomedical field. We also highlight key challenges currently faced by GRL and outline potential directions for future research.
Comment: Accepted by the 2023 IMIA Yearbook of Medical Informatics
Protein-Ligand Binding Affinity Directed Multi-Objective Drug Design Based on Fragment Representation Methods
Drug discovery is a challenging process with a vast molecular space to be explored and numerous pharmacological properties to be appropriately considered. Among various drug design protocols, fragment-based drug design is an effective way of constraining the search space and better utilizing biologically active compounds. Motivated by fragment-based drug search for a given protein target and the emergence of artificial intelligence (AI) approaches in this field, this work advances the field of in silico drug design by (1) integrating a graph fragmentation-based deep generative model with a deep evolutionary learning process for large-scale multi-objective molecular optimization, and (2) applying protein-ligand binding affinity scores together with other desired physicochemical properties as objectives. Our experiments show that the proposed method can generate novel molecules with improved property values and binding affinities.
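The evolutionary multi-objective search described above can be caricatured with a toy weighted-sum loop. The scorers below are illustrative stand-ins, not the paper's docking or physicochemical models, and a "molecule" is reduced to a tuple of fragment ids:

```python
import random

# Hypothetical scorers: in a real pipeline these would be a predicted
# binding affinity and a drug-likeness measure. Here they only score a
# "molecule" represented as a tuple of integer fragment ids.
def affinity_score(mol):       # higher is better (illustrative proxy)
    return sum(f % 7 for f in mol) / (7.0 * len(mol))

def property_score(mol):       # e.g. a drug-likeness proxy on size
    return 1.0 - abs(len(mol) - 5) / 5.0

def fitness(mol, w_aff=0.6, w_prop=0.4):
    # Weighted-sum scalarization of the two objectives.
    return w_aff * affinity_score(mol) + w_prop * property_score(mol)

def evolve(pop, n_gen=20, seed=0):
    rng = random.Random(seed)
    for _ in range(n_gen):
        # Keep the fitter half, then refill the population by mutating
        # each survivor: swap one fragment for a random new one.
        pop.sort(key=fitness, reverse=True)
        survivors = pop[: len(pop) // 2]
        children = []
        for mol in survivors:
            child = list(mol)
            child[rng.randrange(len(child))] = rng.randrange(100)
            children.append(tuple(child))
        pop = survivors + children
    return max(pop, key=fitness)

rng = random.Random(1)
population = [tuple(rng.randrange(100) for _ in range(5)) for _ in range(40)]
best = evolve(population)
```

Weighted sums are only one scalarization; Pareto-ranking selection (as in deep evolutionary learning) keeps molecules that trade the objectives off differently.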
Learn Biologically Meaningful Representation with Transfer Learning
Machine learning has made significant contributions to bioinformatics and computational biology. In particular, supervised learning approaches have been widely used in solving problems such as biomarker identification and drug response prediction. However, because of the limited availability of comprehensively labeled and clean data, constructing predictive models in supervised settings is not always desirable or possible, especially when using data-hungry, red-hot learning paradigms such as deep learning methods. Hence, there is an urgent need to develop new approaches that can leverage more readily available unlabeled data to drive successful machine learning applications in this area.
In my dissertation, I focused on exploring and designing deep-learning-based unsupervised representation learning methods. A consistent scheme of these methods is that they construct a low-dimensional space from unlabeled raw datasets and then leverage the learned low-dimensional embedding, explicitly or implicitly, for diverse downstream supervised tasks. Although progress has been made in recent years, most deep learning applications in biomedical studies are still in their infancy. It remains a challenging task to fully extract the biologically meaningful information from a biomedical dataset, such as multi-omics data, to support predictive modeling for practical tasks of interest. To improve the biological relevance of learned representations, innovative approaches that can better integrate multi-omics data and utilize their specific characteristics and natural "annotations" are needed.
Hence, we proposed two approaches: the Cross-LEvel Information Transmission (CLEIT) network and the Coherent Cell-line Tissue Deconfounding Autoencoder (CODE-AE). Specifically, CLEIT aims to leverage the hierarchical relationships among omics data at different levels to drive biologically meaningful representation learning, and CODE-AE learns biologically meaningful representations by explicitly deconfounding confounding factors such as data source origins. As the benchmark results showed, these two methods improve knowledge transfer between multi-omics data and between in vitro and in vivo samples, respectively, and significantly boost performance in the drug response prediction task. Thus, they are potentially powerful tools for precision medicine and drug discovery.
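CODE-AE learns its deconfounding with an autoencoder; as a much simpler illustration of the underlying idea — removing data-source effects so that shared biology remains — one can center each source's features. This is only the linear, mean-shift special case, and the variable names are illustrative:

```python
import numpy as np

def center_per_source(X, sources):
    """Crude 'deconfounding': subtract each data source's mean profile
    so that source of origin (e.g. cell line vs. tissue) no longer
    shifts the features. A learned autoencoder can remove nonlinear
    source effects; this sketch only removes per-source mean shifts."""
    Xc = X.astype(float).copy()
    for s in np.unique(sources):
        mask = sources == s
        Xc[mask] -= Xc[mask].mean(axis=0)
    return Xc

rng = np.random.default_rng(0)
signal = rng.normal(size=(200, 10))        # shared biological signal
batch = np.repeat([0, 1], 100)             # two data sources
X = signal + 5.0 * batch[:, None]          # source 1 carries a shift
Xc = center_per_source(X, batch)           # shift removed per source
```

After centering, a downstream drug-response model sees features whose first-order source effect is gone, at the cost of also discarding any real biology aligned with the source means.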
Graph Priors, Optimal Transport, and Deep Learning in Biomedical Discovery
Recent advances in biomedical data collection allow the assembly of massive datasets measuring thousands of features in thousands to millions of individual cells. These data have the potential to advance our understanding of biological mechanisms at a previously impossible resolution. However, there are few methods for understanding data of this scale and type. While neural networks have made tremendous progress on supervised learning problems, much work remains to make them useful for discovery in data whose supervision is more difficult to represent. The flexibility and expressiveness of neural networks is sometimes a hindrance in these less supervised domains, as is the case when extracting knowledge from biomedical data. One type of prior knowledge that is more common in biological data comes in the form of geometric constraints. In this thesis, we aim to leverage this geometric knowledge to create scalable and interpretable models for understanding such data. Encoding geometric priors into neural network and graph models allows us to characterize the models' solutions as they relate to the fields of graph signal processing and optimal transport. These links allow us to understand and interpret this data type. We divide this work into three sections. The first borrows concepts from graph signal processing to construct more interpretable and performant neural networks by constraining and structuring the architecture. The second borrows from the theory of optimal transport to perform anomaly detection and trajectory inference efficiently and with theoretical guarantees. The third examines how to compare distributions over an underlying manifold, which can be used to understand how different perturbations or conditions relate; for this, we design an efficient approximation of optimal transport based on diffusion over a joint cell graph. Together, these works utilize our prior understanding of the data geometry to create more useful models of the data. We apply these methods to molecular graphs, images, single-cell sequencing, and health record data.
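The diffusion-based approximation of optimal transport mentioned above can be sketched in a few lines: two distributions on a shared graph are smoothed by powers of a lazy random-walk operator and compared in L1 at several scales. This is only a rough caricature of the idea, not the thesis's exact algorithm; the scales and operator choice here are assumptions:

```python
import numpy as np

def diffusion_distance(A, p, q, scales=(1, 2, 4, 8)):
    """Multiscale diffusion distance between distributions p, q on a
    graph with adjacency A: diffuse both with a lazy random walk and
    accumulate L1 differences at a few diffusion scales."""
    deg = A.sum(axis=1)
    P = 0.5 * (np.eye(len(deg)) + A / deg[:, None])  # lazy random walk
    dist, Pp, Pq, steps = 0.0, p.copy(), q.copy(), 0
    for t in scales:
        while steps < t:                 # diffuse up to scale t
            Pp, Pq = Pp @ P, Pq @ P
            steps += 1
        dist += np.abs(Pp - Pq).sum()
    return dist

# Path graph 0-1-2-3-4: point masses at opposite ends should be
# farther apart than masses at adjacent nodes.
A = np.diag(np.ones(4), 1) + np.diag(np.ones(4), -1)
e = np.eye(5)
far = diffusion_distance(A, e[0], e[4])
near = diffusion_distance(A, e[0], e[1])
```

Unlike exact optimal transport, each evaluation costs only a few sparse matrix-vector products, which is what makes graph-diffusion approximations attractive for millions of cells.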
Artificial Intelligence for In Silico Clinical Trials: A Review
A clinical trial is an essential step in drug development, but it is often costly and time-consuming. In silico trials are clinical trials conducted digitally through simulation and modeling as an alternative to traditional clinical trials. AI-enabled in silico trials can increase the case group size by creating virtual cohorts as controls; they also enable automation and optimization of trial design and can predict the trial success rate. This article systematically reviews papers under three main topics: clinical simulation, individualized predictive modeling, and computer-aided trial design. We focus on how machine learning (ML) may be applied in these applications. In particular, we present the machine learning problem formulation and available data sources for each task. We end by discussing the challenges and opportunities of AI for in silico trials in real-world applications.
A Comprehensive Survey on Graph Neural Networks
Deep learning has revolutionized many machine learning tasks in recent years, ranging from image classification and video processing to speech recognition and natural language understanding. The data in these tasks are typically represented in Euclidean space. However, there is an increasing number of applications where data are generated from non-Euclidean domains and are represented as graphs with complex relationships and interdependency between objects. The complexity of graph data has imposed significant challenges on existing machine learning algorithms. Recently, many studies on extending deep learning approaches to graph data have emerged. In this article, we provide a comprehensive overview of graph neural networks (GNNs) in data mining and machine learning fields. We propose a new taxonomy to divide the state-of-the-art GNNs into four categories, namely, recurrent GNNs, convolutional GNNs, graph autoencoders, and spatial-temporal GNNs. We further discuss the applications of GNNs across various domains and summarize the open-source codes, benchmark data sets, and model evaluation of GNNs. Finally, we propose potential research directions in this rapidly growing field.
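As a concrete instance of the convolutional GNN category, a single graph-convolutional layer can be written in a few lines of NumPy. This is a sketch of the standard symmetric-normalization propagation rule, not tied to any particular library implementation:

```python
import numpy as np

def gcn_layer(A, H, W):
    """One GCN-style layer: H' = ReLU(D^-1/2 (A + I) D^-1/2 H W).
    Each node's features are averaged with its neighbours' (after
    adding self-loops and symmetric degree normalization), then
    linearly transformed and passed through a ReLU."""
    A_hat = A + np.eye(A.shape[0])               # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    A_norm = d_inv_sqrt[:, None] * A_hat * d_inv_sqrt[None, :]
    return np.maximum(A_norm @ H @ W, 0.0)       # ReLU activation

# A 4-node cycle graph with 3-dimensional node features,
# projected down to 2 dimensions by the layer.
A = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]], dtype=float)
rng = np.random.default_rng(0)
H = rng.normal(size=(4, 3))   # node features
W = rng.normal(size=(3, 2))   # learned weight matrix
H1 = gcn_layer(A, H, W)
```

Stacking such layers grows each node's receptive field by one hop per layer; recurrent, autoencoder, and spatial-temporal GNNs in the survey's taxonomy vary the propagation and training scheme around this same neighborhood-aggregation core.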