28 research outputs found

    Predicting potential drugs and drug-drug interactions for drug repositioning

    The purpose of drug repositioning is to predict novel treatments for existing drugs. It saves time and reduces cost in drug discovery, especially in preclinical procedures. The central challenge in drug repositioning is to identify plausible candidate drugs supported by strong evidence. Recently, benefiting from various types of data and computational strategies, many methods have been proposed to predict potential drugs. Signature-based methods use signatures to describe a specific disease condition and match them against drug-induced transcriptomic profiles. For a disease signature, a list of potential drugs is produced based on matching scores. In many studies, the top drugs on the list are identified as potential drugs and verified in various ways. However, existing methods have a few limitations: (1) For many diseases, especially cancers, tissue samples are often heterogeneous and multiple subtypes are involved; it is challenging to identify a signature from such a group of profiles. (2) Genes are treated as independent elements in many methods, even though they may associate with each other under the given condition. (3) Disease signatures cannot identify potential drugs for personalized treatments. To address those limitations, I propose three strategies in this dissertation. (1) I employ clustering methods to identify sub-signatures from the heterogeneous dataset, then use a weighting strategy to concatenate them. (2) I utilize human protein complex (HPC) information to reflect the dependencies among genes and identify an HPC signature that describes a specific type of cancer. (3) I use the HPC strategy to identify signatures for drugs, then predict a list of potential drugs for each patient. Beyond predicting potential drugs directly, further indications are essential to deepen understanding in drug repositioning studies.
The interactions between biological and biomedical entities, such as drug-drug interactions (DDIs) and drug-target interactions (DTIs), help study mechanisms behind the repurposed drugs. Machine learning (ML), and especially deep learning (DL), provides the leading methods for predicting those interactions. Network strategies, such as constructing a network from interactions and studying its topological properties, are commonly combined with other methods to make predictions. However, the interactions may have different functions, and merging them in a single network may introduce biases. To address this, I construct two networks for two types of DDIs and employ a graph convolutional network (GCN) model to concatenate them. In this dissertation, the first chapter introduces background information, the objectives of the studies, and the structure of the dissertation. After that, a comprehensive review is provided in Chapter 2: biological databases, methods and applications in drug repositioning studies, and evaluation metrics are discussed, and I summarize three application scenarios. The first method, proposed in Chapter 3, considers the issue of identifying a cancer gene signature and predicting potential drugs. The k-means clustering method is used to identify highly reliable gene signatures. The identified signature is used to match drug profiles and identify potential drugs for the given disease. The second method, proposed in Chapter 4, uses human protein complex (HPC) information to identify a protein complex signature, instead of a gene signature. This strategy improves prediction accuracy in the cancer experiments. Chapter 5 introduces the signature-based method in personalized cancer medicine. Under the HPC strategy, the profiles of a given drug are used to identify a drug signature. Each patient's profile is matched with the drug signatures, yielding a different list of potential drugs for each patient.
Chapter 6 proposes a multi-kernel graph convolutional network to predict DDIs. This method constructs two DDI kernels and concatenates them in the GCN model. It achieves higher performance in predicting DDIs than three state-of-the-art methods. In summary, this dissertation has proposed several computational algorithms for drug repositioning. Experimental results show that the proposed methods achieve strong performance.
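The signature-matching idea at the core of this abstract can be sketched as follows. This is an illustrative reversal-scoring scheme, not the dissertation's exact algorithm; all gene and drug names are hypothetical. A drug is a repositioning candidate if its induced expression profile reverses the disease signature, i.e. it down-regulates the disease's up-genes and up-regulates its down-genes.

```python
# Illustrative signature-based drug matching (a sketch, not the method in
# the dissertation): rank drugs by how strongly their induced profiles
# reverse a disease signature of up- and down-regulated genes.

def reversal_score(drug_profile, up_genes, down_genes):
    """Lower (more negative) scores indicate stronger reversal."""
    up = [drug_profile[g] for g in up_genes if g in drug_profile]
    down = [drug_profile[g] for g in down_genes if g in drug_profile]
    if not up or not down:
        return 0.0
    return sum(up) / len(up) - sum(down) / len(down)

def rank_drugs(drug_profiles, up_genes, down_genes):
    """Return drug names sorted from strongest to weakest reversal."""
    scores = {name: reversal_score(p, up_genes, down_genes)
              for name, p in drug_profiles.items()}
    return sorted(scores, key=scores.get)

# Toy example with hypothetical gene and drug names:
disease_up = ["MYC", "CCND1"]        # over-expressed in the disease
disease_down = ["TP53", "CDKN1A"]    # under-expressed in the disease
profiles = {
    "drugA": {"MYC": -2.0, "CCND1": -1.5, "TP53": 1.0, "CDKN1A": 0.8},
    "drugB": {"MYC": 1.2, "CCND1": 0.9, "TP53": -0.5, "CDKN1A": -0.7},
}
print(rank_drugs(profiles, disease_up, disease_down))  # drugA ranks first
```

The HPC and sub-signature strategies described above change how the signature itself is built, but the matching step follows this same score-and-rank pattern.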

    Deep Learning for Embedding and Integrating Multimodal Biomedical Data

    Biomedical data is being generated at extremely high throughput and high dimension by technologies in areas ranging from single-cell genomics, proteomics, and transcriptomics (cytometry, single-cell RNA and ATAC sequencing) to neuroscience and cognition (fMRI and PET) to pharmaceuticals (drug perturbations and interactions). These new and emerging technologies and the datasets they create give an unprecedented view into the workings of their respective biological entities. However, there is a large gap between the information contained in these datasets and the insights that current machine learning methods can extract from them. This is especially the case when multiple technologies can measure the same underlying biological entity or system. When the same system is analyzed separately through the different views gathered by different data modalities, patterns are left unobserved if they only emerge from the multi-dimensional joint representation of all of the modalities together. Through an interdisciplinary approach that emphasizes active collaboration with data domain experts, my research has developed models for data integration, extracting important insights through the joint analysis of varied data sources. In this thesis, I discuss models that address this task of multi-modal data integration, especially generative adversarial networks (GANs) and autoencoders (AEs). My research has focused on using both of these models in a generative way for concrete problems in cutting-edge scientific applications, rather than focusing exclusively on the generation of high-resolution natural images. The research in this thesis is united around ideas of building models that can extract new knowledge from scientific data inaccessible to currently existing methods.
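The integration idea can be illustrated in miniature: embed two modalities that measure the same underlying state into one shared latent space. The toy below uses a linear autoencoder trained by gradient descent on synthetic data; the actual thesis uses deep GANs and AEs, so every detail here (dimensions, learning rate, data) is an assumption for illustration only.

```python
import numpy as np

# Toy multimodal embedding: two synthetic modalities generated from the same
# hidden state are concatenated and compressed by a linear autoencoder into
# a shared 2-D latent space. Illustrative sketch, not the thesis models.

rng = np.random.default_rng(0)
latent_true = rng.normal(size=(200, 2))            # shared biological state
mod_a = latent_true @ rng.normal(size=(2, 10))     # modality A: 10 features
mod_b = latent_true @ rng.normal(size=(2, 6))      # modality B: 6 features
x = np.hstack([mod_a, mod_b])                      # joint view, 16 features

d, k = x.shape[1], 2
W_enc = rng.normal(scale=0.1, size=(d, k))         # encoder weights
W_dec = rng.normal(scale=0.1, size=(k, d))         # decoder weights
lr = 1e-3
losses = []
for _ in range(500):
    z = x @ W_enc                                  # shared embedding
    x_hat = z @ W_dec                              # reconstruction
    err = x_hat - x
    losses.append(float(np.mean(err ** 2)))
    # gradient descent on mean-squared reconstruction error
    W_dec -= lr * (z.T @ err / len(x))
    W_enc -= lr * (x.T @ (err @ W_dec.T) / len(x))

print(losses[0], losses[-1])                       # loss falls with training
```

The point of the sketch is structural: a single latent code `z` summarizes both modalities jointly, which is where cross-modal patterns invisible to per-modality analysis can surface.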

    Applications of Artificial Intelligence & Machine Learning in Cancer Immunology

    The treatment of cancer has long relied upon the use of non-specific and toxic chemotherapies and radiation that target quickly dividing cells. As a result, many patients experience the severe side effects associated with these therapies, including vomiting, nausea, fatigue, and alopecia. Additionally, these therapies fail to provide durable and lasting responses in most cases of metastatic disease. The immune system has long been thought to play an important role in preventing cancer through immune surveillance: the idea that the immune system is poised with the means to detect cancer early on and eliminate malignant cells. However, as evidenced by aggressive disease, cancer is able to evade immune recognition and ultimately become very advanced. In recent years, immunotherapy has changed the treatment paradigm for several types of cancer. Of note, checkpoint blockade inhibitors have provided durable and lasting responses for a minority of patients with metastatic disease. While these advances in therapy have provided hope where there was none in cases of aggressive disease, there is still much work to be done to expand the benefits of immunotherapy from a small subset of patients to the whole. In an effort to understand why certain patients respond to immunotherapy while others do not, there has been an effort to collect as much data as possible through a variety of high-throughput ‘big data’ assays, including whole exome sequencing, single-cell assays, and T-cell receptor sequencing. In this doctoral work, we develop a variety of machine learning and artificial intelligence methods to parse this data and unveil concepts that have helped us understand the prerequisites for a successful immune response to eliminate cancer. Of note, we develop a collection of deep learning algorithms to understand the interaction of peptide-MHC and T-cell receptor that is ultimately responsible for successful recognition of tumor by the immune system. Committee: Dr. Drew M. Pardoll (advisor), Dr. Alexander S. Baras, Dr. Steven Salzberg.
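Sequence-based models of TCR and peptide-MHC recognition need amino-acid sequences as numeric tensors before any deep learning can happen. A common first step is one-hot encoding with padding to a fixed length; the sketch below is a generic preprocessing idiom, not the specific encoding used in this work, and the peptide and length are illustrative.

```python
# Generic one-hot encoding of peptide sequences for neural network input
# (an illustrative preprocessing step, not this dissertation's encoder).

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"          # the 20 standard residues
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}

def one_hot_encode(seq, max_len=15):
    """Encode a peptide as a max_len x 20 matrix of 0/1 rows.
    Positions beyond the sequence length stay all-zero (padding)."""
    matrix = [[0] * len(AMINO_ACIDS) for _ in range(max_len)]
    for pos, aa in enumerate(seq[:max_len]):
        matrix[pos][AA_INDEX[aa]] = 1
    return matrix

encoded = one_hot_encode("SIINFEKL")           # a classic model peptide
print(len(encoded), sum(map(sum, encoded)))    # 15 rows, 8 non-zero rows
```

From here, a convolutional or attention-based model can consume paired peptide and TCR matrices to predict binding, which is the shape of problem the abstract describes.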

    Deep Risk Prediction and Embedding of Patient Data: Application to Acute Gastrointestinal Bleeding

    Acute gastrointestinal bleeding is a common and costly condition, accounting for over 2.2 million hospital days and 19.2 billion dollars of medical charges annually. Risk stratification is a critical part of initial assessment of patients with acute gastrointestinal bleeding. Although all national and international guidelines recommend the use of risk-assessment scoring systems, they are not commonly used in practice, have sub-optimal performance, may be applied incorrectly, and are not easily updated. With the advent of widespread electronic health record adoption, longitudinal clinical data captured during the clinical encounter is now available. However, this data is often noisy, sparse, and heterogeneous. Unsupervised machine learning algorithms may be able to identify structure within electronic health record data while accounting for key issues with the data generation process: measurements missing-not-at-random and information captured in unstructured clinical note text. Deep learning tools can create electronic health record-based models that perform better than clinical risk scores for gastrointestinal bleeding and are well-suited for learning from new data. Furthermore, these models can be used to predict risk trajectories over time, leveraging the longitudinal nature of the electronic health record. The foundation of creating relevant tools is the definition of a relevant outcome measure; in acute gastrointestinal bleeding, a composite outcome of red blood cell transfusion, hemostatic intervention, and all-cause 30-day mortality is a relevant, actionable outcome that reflects the need for hospital-based intervention. However, epidemiological trends may affect the relevance and effectiveness of the outcome measure when applied across multiple settings and patient populations. 
Understanding the trends in practice, potential areas of disparities, and the value proposition for using risk stratification in patients presenting to the Emergency Department with acute gastrointestinal bleeding is important in understanding how to best implement a robust, generalizable risk stratification tool. Key findings include a decrease in the rate of red blood cell transfusion since 2014 and disparities in access to upper endoscopy for patients with upper gastrointestinal bleeding by race/ethnicity across urban and rural hospitals. Projected accumulated savings of consistent implementation of risk stratification tools for upper gastrointestinal bleeding total approximately $1 billion 5 years after implementation. Most current risk scores were designed for use based on the location of the bleeding source: upper or lower gastrointestinal tract. However, the location of the bleeding source is not always clear at presentation. I develop and validate electronic health record-based deep learning and machine learning tools for patients presenting with symptoms of acute gastrointestinal bleeding (e.g., hematemesis, melena, hematochezia), which are more relevant and useful in clinical practice. I show that they outperform leading clinical risk scores for upper and lower gastrointestinal bleeding, the Glasgow-Blatchford Score and the Oakland score. While the best performing gradient boosted decision tree model has overall performance equivalent to the fully connected feedforward neural network model, at the very low risk threshold of 99% sensitivity the deep learning model identifies more very low risk patients. Using another deep learning model that can model longitudinal risk, the long short-term memory recurrent neural network, the need for transfusion of red blood cells can be predicted at every 4-hour interval in the first 24 hours of intensive care unit stay for high-risk patients with acute gastrointestinal bleeding.
Finally, for implementation it is important to find patients with symptoms of acute gastrointestinal bleeding in real time and characterize patients by risk using available data in the electronic health record. A decision rule-based electronic health record phenotype has positive predictive value equivalent to deep learning and natural language processing-based models, and after live implementation appears to have increased the use of the Acute Gastrointestinal Bleeding Clinical Care pathway. Patients with acute gastrointestinal bleeding but with other groups of disease concepts can be differentiated by directly mapping unstructured clinical text to a common ontology and treating the vector of concepts as signals on a knowledge graph; these patients can be differentiated using unbalanced diffusion earth mover’s distances on the graph. For electronic health record data with data missing not at random, MURAL, an unsupervised random forest-based method, handles data with missing values and generates visualizations that characterize patients with gastrointestinal bleeding. This thesis forms a basis for understanding the potential for machine learning and deep learning tools to characterize risk for patients with acute gastrointestinal bleeding. In the future, these tools may be critical in implementing integrated risk assessment to keep low-risk patients out of the hospital and guide resuscitation and timely endoscopic procedures for patients at higher risk of clinical decompensation.
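The 99% sensitivity operating point mentioned above is a concrete, reproducible step: pick the highest risk threshold that still catches 99% of patients who had the outcome, then count how many patients fall below it and can be triaged as very low risk. The sketch below uses synthetic scores and outcomes, not the study's model or cohort.

```python
import numpy as np

# Choosing a risk threshold at a fixed 99% sensitivity (generic sketch with
# synthetic data; not the study's model, cohort, or score distribution).

def very_low_risk_threshold(scores, outcomes, sensitivity=0.99):
    """Largest threshold that keeps at least the target sensitivity."""
    pos = np.sort(scores[outcomes == 1])
    # we may miss at most (1 - sensitivity) of patients with the outcome
    n_miss = int(np.floor((1 - sensitivity) * len(pos)))
    return pos[n_miss]

rng = np.random.default_rng(1)
outcomes = (rng.random(2000) < 0.2).astype(int)      # ~20% composite outcome
# a weakly informative synthetic risk score, higher for true positives
scores = rng.normal(loc=outcomes * 1.5, scale=1.0)

thr = very_low_risk_threshold(scores, outcomes)
n_low = int(np.sum(scores < thr))                    # triaged very low risk
sens = float(np.mean(scores[outcomes == 1] >= thr))  # achieved sensitivity
print(f"threshold={thr:.2f}, very-low-risk patients={n_low}, sensitivity={sens:.3f}")
```

Comparing models at this fixed-sensitivity point, as the abstract does, rewards the model that pushes more truly low-risk patients below the threshold without missing additional bleeds.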

    Cell Nuclear Morphology Analysis Using 3D Shape Modeling, Machine Learning and Visual Analytics

    Quantitative analysis of morphological changes in a cell nucleus is important for the understanding of nuclear architecture and its relationship with cell differentiation, development, proliferation, and disease. Changes in nuclear form are associated with reorganization of chromatin architecture related to altered functional properties such as gene regulation and expression. Understanding these processes through quantitative analysis of morphological changes is important not only for investigating nuclear organization, but also has clinical implications, for example, in the detection and treatment of pathological conditions such as cancer. While efforts have been made to characterize nuclear shapes in two or pseudo-three dimensions, several studies have demonstrated that three-dimensional (3D) representations provide better nuclear shape description, in part due to the high variability of nuclear morphologies. 3D shape descriptors that permit robust morphological analysis and facilitate human interpretation are still under active investigation. A few methods have been proposed to classify nuclear morphologies in 3D; however, there is a lack of publicly available 3D data for the evaluation and comparison of such algorithms. There is a compelling need for robust 3D nuclear morphometric techniques to carry out population-wide analyses. In this work, we address a number of these existing limitations. First, we present the largest publicly available 3D microscopy imaging dataset to date for cell nuclear morphology analysis and classification. We provide a detailed description of the image analysis protocol, from segmentation to baseline evaluation of a number of popular classification algorithms using 2D and 3D voxel-based morphometric measures. We propose a specific cross-validation scheme that accounts for possible batch effects in the data.
Second, we propose a new technique that combines mathematical modeling, machine learning, and interpretation of morphometric characteristics of cell nuclei and nucleoli in 3D. Employing robust and smooth surface reconstruction methods to accurately approximate the 3D object boundary enables the establishment of homologies between different biological shapes. Then, we compute geometric morphological measures characterizing the form of cell nuclei and nucleoli. We combine these methods into a highly parallel computational pipeline workflow for automated morphological analysis of thousands of nuclei and nucleoli in 3D. We also describe the use of visual analytics and deep learning techniques for the analysis of nuclear morphology data. Third, we evaluate the proposed methods for 3D surface morphometric analysis of our data. We improve the performance of morphological classification between epithelial and mesenchymal human prostate cancer cells compared to previously reported results, due to the more accurate shape representation and the use of combined nuclear and nucleolar morphometry. We confirm previously reported relevant morphological characteristics, and also report new features that can provide insight into the underlying biological mechanisms of prostate cancer pathology. We also assess nuclear morphology changes associated with chromatin remodeling in drug-induced cellular reprogramming. We compute temporal trajectories reflecting morphological differences in astroglial cell sub-populations administered two different treatments versus controls. We describe specific changes in nuclear morphology that are characteristic of chromatin re-organization under each treatment, which previously had only been tentatively hypothesized in the literature. Our approach demonstrates high classification performance on each of three different cell lines and reports the most salient morphometric characteristics.
We conclude with a discussion of the potential impact of method development in nuclear morphology analysis on clinical decision-making and fundamental investigation of 3D nuclear architecture. We consider some open problems and future trends in this field.
PhD, Bioinformatics, University of Michigan, Horace H. Rackham School of Graduate Studies. https://deepblue.lib.umich.edu/bitstream/2027.42/147598/1/akalinin_1.pd
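Voxel-based morphometric measures of the kind used as a baseline above can be computed directly from a segmented binary mask. The sketch below derives volume, a face-counting surface area, and sphericity for a synthetic digital ball; it is a crude illustration of voxel morphometry, whereas the work itself uses smooth surface reconstruction for more accurate boundaries.

```python
import numpy as np

# Baseline voxel-based 3D morphometry on a binary mask (illustrative only).
# Surface area is approximated by counting exposed voxel faces, which
# overestimates the smooth surface area; the thesis's reconstruction-based
# approach avoids exactly this kind of discretization error.

def voxel_morphometry(mask):
    volume = int(mask.sum())
    faces = 0
    padded = np.pad(mask, 1)                  # zero border so edges count
    for axis in range(3):
        diff = np.abs(np.diff(padded.astype(int), axis=axis))
        faces += int(diff.sum())              # each 0/1 transition = a face
    # sphericity: 1.0 for a perfect smooth sphere, lower for rough shapes
    sphericity = (np.pi ** (1 / 3)) * (6 * volume) ** (2 / 3) / faces
    return volume, faces, sphericity

# Synthetic "nucleus": a digital ball of radius 10 in a 25^3 grid
z, y, x = np.ogrid[-12:13, -12:13, -12:13]
ball = (x**2 + y**2 + z**2 <= 10**2)

vol, area, sph = voxel_morphometry(ball)
print(vol, area, round(sph, 2))
```

Even for a perfect ball the voxel-face area inflates the denominator, so the measured sphericity sits well below 1.0; this is the kind of artifact that motivates smooth surface reconstruction before morphometry.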

    Learning Discriminative Representations for Gigapixel Images

    Digital images of tumor tissue are important diagnostic and prognostic tools for pathologists. Recent advancement in digital pathology has led to an abundance of digitized histopathology slides, called whole-slide images. Computational analysis of whole-slide images is a challenging task as they are generally gigapixel files, often one or more gigabytes in size. However, these computational methods provide a unique opportunity to improve the objectivity and accuracy of diagnostic interpretations in histopathology. Recently, deep learning has been successful in characterizing images for vision-based applications in multiple domains. But its applications are relatively less explored in the histopathology domain, mostly due to the following two challenges. Firstly, there is difficulty in scaling deep learning methods for processing large gigapixel histopathology images. Secondly, there is a lack of diversified and labeled datasets due to privacy constraints as well as workflow and technical challenges in the healthcare sector. The main goal of this dissertation is to explore and develop deep models to learn discriminative representations of whole-slide images while overcoming the existing challenges. A three-staged approach was considered in this research. In the first stage, a framework called Yottixel is proposed. It represents a whole-slide image as a set of multiple representative patches, called a mosaic. The mosaic enables convenient processing and compact representation of an entire high-resolution whole-slide image. Yottixel allows faster retrieval of similar whole-slide images within large archives of digital histopathology images. Such retrieval technology enables pathologists to tap into past diagnostic data on demand. Yottixel is validated on the largest public archive of whole-slide images (The Cancer Genome Atlas), achieving promising results.
Yottixel is an unsupervised method, which limits its performance on specific tasks, especially when a labeled (or partially labeled) dataset is available. In the second stage, multi-instance learning (MIL) is used to enhance cancer subtype prediction through weakly-supervised training. Three MIL methods have been proposed, each improving upon the previous one. The first is based on memory-based models, the second uses attention-based models, and the third uses graph neural networks. All three methods are incorporated in Yottixel to classify entire whole-slide images with no pixel-level annotations. Access to large-scale and diversified datasets is a primary driver of the advancement and adoption of machine learning technologies. However, healthcare has many restrictive rules around data sharing, limiting research and model development. In the final stage, a federated learning scheme called ProxyFL is developed that enables collaborative training of Yottixel among multiple healthcare organizations without centralization of the sensitive medical data. The combined research across all three stages of the Ph.D. has resulted in the development of a holistic and practical framework for learning discriminative and compact representations of whole-slide images in digital pathology.
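The mosaic idea behind Yottixel can be illustrated with a generic clustering step: cluster patch feature vectors from a slide and keep one representative patch per cluster, yielding a compact proxy for the gigapixel file. The k-means below is a plain, self-contained implementation on synthetic features, not Yottixel's actual pipeline, and the cluster count and feature dimensions are assumptions.

```python
import numpy as np

# Sketch of mosaic selection: cluster patch features and keep the patch
# nearest each centroid as the slide's compact representation.
# Generic k-means on synthetic data; not Yottixel's exact method.

def mosaic_indices(features, n_clusters=4, n_iter=25, seed=0):
    rng = np.random.default_rng(seed)
    centroids = features[rng.choice(len(features), n_clusters, replace=False)]
    for _ in range(n_iter):
        # assign each patch to its nearest centroid, then recompute centroids
        d = np.linalg.norm(features[:, None] - centroids[None], axis=2)
        labels = d.argmin(axis=1)
        for k in range(n_clusters):
            if np.any(labels == k):
                centroids[k] = features[labels == k].mean(axis=0)
    # the mosaic: index of the patch closest to each final centroid
    d = np.linalg.norm(features[:, None] - centroids[None], axis=2)
    return sorted({int(d[:, k].argmin()) for k in range(n_clusters)})

rng = np.random.default_rng(3)
# 200 synthetic patch-feature vectors drawn from 4 tissue "types"
patch_features = np.vstack([rng.normal(loc=c, size=(50, 8))
                            for c in (0.0, 3.0, 6.0, 9.0)])
mosaic = mosaic_indices(patch_features)
print(mosaic)   # a handful of representative patch indices
```

Retrieval then compares mosaics instead of full slides, which is what makes search over large archives tractable.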

    Causal inference and interpretable machine learning for personalised medicine

    In this thesis, we discuss the importance of causal knowledge in healthcare for tailoring treatments to a patient's needs. We propose three different causal models for reasoning about the effects of medical interventions on patients with HIV and sepsis, based on observational data. Both application areas are challenging as a result of patient heterogeneity and the existence of confounding that influences patient outcomes. Our first contribution is a treatment policy mixture model that combines nonparametric, kernel-based learning with model-based reinforcement learning to reason about a series of treatments and their effects. These methods each have their own strengths: non-parametric methods can accurately predict treatment effects where there are overlapping patient instances or where data is abundant; model-based reinforcement learning generalises better in outlier situations by learning a belief state representation of confounding. The overall policy mixture model learns a partition of the space of heterogeneous patients such that we can personalise treatments accordingly. Our second contribution incorporates knowledge from kernel-based reasoning directly into a reinforcement learning model by learning a combined belief state representation. In doing so, we can use the model to simulate counterfactual scenarios to reason about what would happen to a patient if we intervened in a particular way, and how their specific outcomes would change. As a result, we may tailor therapies according to patient-specific scenarios. Our third contribution is a reformulation of the information bottleneck problem for learning an interpretable, low-dimensional representation of confounding for medical decision-making. The approach uses the relevance of information to perform a sufficient reduction of confounding.
Based on this reduction, we learn equivalence classes among groups of patients, such that we may transfer knowledge to patients with incomplete covariate information at test time. By conditioning on the sufficient statistic, we can accurately infer treatment effects at both the population and subgroup level. Our final contribution is the development of a novel regularisation strategy that can be applied to deep machine learning models to enforce clinical interpretability. We specifically train deep time-series models such that their predictions have high accuracy while being closely modelled by small decision trees that can be audited easily by medical experts. Broadly, our tree-based explanations can be used to provide additional context in scenarios where reasoning about treatment effects may otherwise be difficult. Importantly, each of the models we present is an attempt to bring about more understanding in medical applications to inform better decision-making overall.
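The core problem the thesis addresses, confounding in observational treatment data, can be shown in a few lines. Below, sicker patients are treated more often, so a naive comparison of treated versus untreated outcomes understates the true treatment effect; adjusting with inverse probability weighting recovers it. This is a textbook IPW illustration on synthetic data, not the reinforcement-learning or information-bottleneck methods of the thesis, and the propensity model is assumed known.

```python
import numpy as np

# Confounding demo: illness severity drives both treatment assignment and
# outcome. Naive comparison is biased; inverse probability weighting (IPW)
# with the known propensity recovers the true effect of +1.0.
# Purely illustrative synthetic data.

rng = np.random.default_rng(7)
n = 50_000
severity = rng.random(n)                        # confounder: illness severity
p_treat = 0.2 + 0.6 * severity                  # sicker patients treated more
treated = (rng.random(n) < p_treat).astype(int)
# true treatment effect is +1.0, but severity independently hurts outcomes
outcome = 1.0 * treated - 2.0 * severity + rng.normal(scale=0.5, size=n)

naive = outcome[treated == 1].mean() - outcome[treated == 0].mean()

w = treated / p_treat + (1 - treated) / (1 - p_treat)   # IPW weights
ipw = (np.sum(w * treated * outcome) / np.sum(w * treated)
       - np.sum(w * (1 - treated) * outcome) / np.sum(w * (1 - treated)))

print(f"naive={naive:.2f}, ipw={ipw:.2f}")      # ipw is close to +1.0
```

The thesis's contributions can be read as richer versions of this adjustment: learning low-dimensional representations of confounding when the true propensity and confounders are not handed to us.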

    Seventh Biennial Report: June 2003 - March 2005

    No full text available.