316 research outputs found

    Machine Learning Models for Deciphering Regulatory Mechanisms and Morphological Variations in Cancer

    Get PDF
    The exponential growth of multi-omics biological datasets is resulting in an emerging paradigm shift in fundamental biological research. In recent years, imaging and transcriptomics datasets are increasingly incorporated into biological studies, pushing biology further into the domain of data-intensive-sciences. New approaches and tools from statistics, computer science, and data engineering are profoundly influencing biological research. Harnessing this ever-growing deluge of multi-omics biological data requires the development of novel and creative computational approaches. In parallel, fundamental research in data sciences and Artificial Intelligence (AI) has advanced tremendously, allowing the scientific community to generate a massive amount of knowledge from data. Advances in Deep Learning (DL), in particular, are transforming many branches of engineering, science, and technology. Several of these methodologies have already been adapted for harnessing biological datasets; however, there is still a need to further adapt and tailor these techniques to new and emerging technologies. In this dissertation, we present computational algorithms and tools that we have developed to study gene-regulation and cellular morphology in cancer. The models and platforms that we have developed are general and widely applicable to several problems relating to dysregulation of gene expression in diseases. Our pipelines and software packages are disseminated in public repositories for larger scientific community use. This dissertation is organized in three main projects. In the first project, we present Causal Inference Engine (CIE), an integrated platform for the identification and interpretation of active regulators of transcriptional response. The platform offers visualization tools and pathway enrichment analysis to map predicted regulators to Reactome pathways. We provide a parallelized R-package for fast and flexible directional enrichment analysis to run the inference on custom regulatory networks. Next, we designed and developed MODEX, a fully automated text-mining system to extract and annotate causal regulatory interaction between Transcription Factors (TFs) and genes from the biomedical literature. MODEX uses putative TF-gene interactions derived from high-throughput ChIP-Seq or other experiments and seeks to collect evidence and meta-data in the biomedical literature to validate and annotate the interactions. MODEX is a complementary platform to CIE that provides auxiliary information on CIE inferred interactions by mining the literature. In the second project, we present a Convolutional Neural Network (CNN) classifier to perform a pan-cancer analysis of tumor morphology, and predict mutations in key genes. The main challenges were to determine morphological features underlying a genetic status and assess whether these features were common in other cancer types. We trained an Inception-v3 based model to predict TP53 mutation in five cancer types with the highest rate of TP53 mutations. We also performed a cross-classification analysis to assess shared morphological features across multiple cancer types. Further, we applied a similar methodology to classify HER2 status in breast cancer and predict response to treatment in HER2 positive samples. For this study, our training slides were manually annotated by expert pathologists to highlight Regions of Interest (ROIs) associated with HER2+/- tumor microenvironment. Our results indicated that there are strong morphological features associated with each tumor type. Moreover, our predictions highly agree with manual annotations in the test set, indicating the feasibility of our approach in devising an image-based diagnostic tool for HER2 status and treatment response prediction. We have validated our model using samples from an independent cohort, which demonstrates the generalizability of our approach. Finally, in the third project, we present an approach to use spatial transcriptomics data to predict spatially-resolved active gene regulatory mechanisms in tissues. Using spatial transcriptomics, we identified tissue regions with differentially expressed genes and applied our CIE methodology to predict active TFs that can potentially regulate the marker genes in the region. This project bridged the gap between inference of active regulators using molecular data and morphological studies using images. The results demonstrate a significant local pattern in TF activity across the tissue, indicating differential spatial-regulation in tissues. The results suggest that the integrative analysis of spatial transcriptomics data with CIE can capture discriminant features and identify localized TF-target links in the tissue

    Applications of Artificial Intelligence in Medicine Practice

    Get PDF
    This book focuses on a variety of interdisciplinary perspectives concerning the theory and application of artificial intelligence (AI) in medicine, medically oriented human biology, and healthcare. The list of topics includes the application of AI in biomedicine and clinical medicine, machine learning-based decision support, robotic surgery, data analytics and mining, laboratory information systems, and usage of AI in medical education. Special attention is given to the practical aspect of a study. Hence, the inclusion of a clinical assessment of the usefulness and potential impact of the submitted work is strongly highlighted

    Label2label: Training a neural network to selectively restore cellular structures in fluorescence microscopy

    Get PDF
    Immunofluorescence (IF) microscopy is routinely used to visualise the spatial distribution of proteins that dictates their cellular function. However, unspecific antibody binding often results in high cytosolic background signals, decreasing the image contrast of a target structure. Recently, convolutional neural networks (CNNs) were successfully employed for image restoration in IF microscopy, but current methods cannot correct for those background signals. We report a new method that trains a CNN to reduce unspecific signals in IF images; we name this method label2label (L2L). In L2L, a CNN is trained with image pairs of two non-identical labels that target the same cellular structure. We show that after L2L training a network predicts images with significantly increased contrast of a target structure, which is further improved after implementing a multi-scale structural similarity loss function. Here, our results suggest that sample differences in the training data decrease hallucination effects that are observed with other methods. We further assess the performance of a cycle generative adversarial network, and show that a CNN can be trained to separate structures in superposed IF images of two targets

    Learning cell representations in temporal and 3D contexts

    Get PDF
    Cell morphology and its changes under different circumstances is one of the primary ways by which we can understand biology. Computational tools for characterization and analysis, therefore, play a critical role in advancing studies involving cell morphology. In this thesis, I explored the use of representation learning and self-supervised methods to analyze nuclear texture in fluorescence imaging across different contexts and scales. To analyze the cell cycle using 2D temporal imaging data, as well as DNA damage in 3D imaging data, I employed a simple model based on the VAE-GAN architecture. Through the VAE-GAN model, I constructed manifolds in which the latent representations of the data can be grouped and clustered based on textural similarities without the need for exhaustive training annotations. I used these representations, as well as manually engineered features, to perform various analyses both at the single cell and tissue levels. The application on the cell cycle data revealed that common tasks such as cell cycle staging and cell cycle time estimation can be done even with minimal fluorescence information and user annotation. On the other hand, the texture classes derived to characterize DNA damage in 3D histology images unveiled differences between control and treated tissue regions. Lastly, by aggregating cell-level information to characterize local cell neighborhoods, interactions between DNA-damaged cells and immune cells can be quantified and some tissue microstructures can be identified. The results presented in this thesis demonstrated the utility of the representations learned through my approach in supporting biological inquiries involving temporal and 3D spatial data. The quantitative measurements computed using the presented methods have the potential to aid not only similar experiments on the cell cycle and DNA damage but also in exploratory studies in 3D histology

    Artificial intelligence in histopathology image analysis for cancer precision medicine

    Get PDF
    In recent years, there have been rapid advancements in the field of computational pathology. This has been enabled through the adoption of digital pathology workflows that generate digital images of histopathological slides, the publication of large data sets of these images and improvements in computing infrastructure. Objectives in computational pathology can be subdivided into two categories, first the automation of routine workflows that would otherwise be performed by pathologists and second the addition of novel capabilities. This thesis focuses on the development, application, and evaluation of methods in this second category, specifically the prediction of gene expression from pathology images and the registration of pathology images among each other. In Study I, we developed a computationally efficient cluster-based technique to perform transcriptome-wide predictions of gene expression in prostate cancer from H&E-stained whole-slide-images (WSIs). The suggested method outperforms several baseline methods and is non-inferior to single-gene CNN predictions, while reducing the computational cost with a factor of approximately 300. We included 15,586 transcripts that encode proteins in the analysis and predicted their expression with different modelling approaches from the WSIs. In a cross-validation, 6,618 of these predictions were significantly associated with the RNA-seq expression estimates with FDR-adjusted p-values <0.001. Upon validation of these 6,618 expression predictions in a held-out test set, the association could be confirmed for 5,419 (81.9%). Furthermore, we demonstrated that it is feasible to predict the prognostic cell-cycle progression score with a Spearman correlation to the RNA-seq score of 0.527 [0.357, 0.665]. The objective of Study II is the investigation of attention layers in the context of multiple-instance-learning for regression tasks, exemplified by a simulation study and gene expression prediction. We find that for gene expression prediction, the compared methods are not distinguishable regarding their performance, which indicates that attention mechanisms may not be superior to weakly supervised learning in this context. Study III describes the results of the ACROBAT 2022 WSI registration challenge, which we organised in conjunction with the MICCAI 2022 conference. Participating teams were ranked on the median 90th percentile of distances between registered and annotated target landmarks. Median 90th percentiles for eight teams that were eligible for ranking in the test set consisting of 303 WSI pairs ranged from 60.1 µm to 15,938.0 µm. The best performing method therefore has a score slightly below the median 90th percentile of distances between first and second annotator of 67.0 µm. Study IV describes the data set that we published to facilitate the ACROBAT challenge. The data set is available publicly through the Swedish National Data Service SND and consists of 4,212 WSIs from 1,153 breast cancer patients. Study V is an example of the application of WSI registration for computational pathology. In this study, we investigate the possibility to register invasive cancer annotations from H&E to KI67 WSIs and then subsequently train cancer detection models. To this end, we compare the performance of models optimised with registered annotations to the performance of models that were optimised with annotations generated for the KI67 WSIs. The data set consists of 272 female breast cancer cases, including an internal test set of 54 cases. We find that in this test set, the performance of both models is not distinguishable regarding performance, while there are small differences in model calibration

    Deep learning integrates histopathology and proteogenomics at a pan-cancer level

    Get PDF
    We introduce a pioneering approach that integrates pathology imaging with transcriptomics and proteomics to identify predictive histology features associated with critical clinical outcomes in cancer. We utilize 2,755 H&E-stained histopathological slides from 657 patients across 6 cancer types from CPTAC. Our models effectively recapitulate distinctions readily made by human pathologists: tumor vs. normal (AUROC = 0.995) and tissue-of-origin (AUROC = 0.979). We further investigate predictive power on tasks not normally performed from H&E alone, including TP53 prediction and pathologic stage. Importantly, we describe predictive morphologies not previously utilized in a clinical setting. The incorporation of transcriptomics and proteomics identifies pathway-level signatures and cellular processes driving predictive histology features. Model generalizability and interpretability is confirmed using TCGA. We propose a classification system for these tasks, and suggest potential clinical applications for this integrated human and machine learning approach. A publicly available web-based platform implements these models

    Multimodal Biomedical Data Visualization: Enhancing Network, Clinical, and Image Data Depiction

    Get PDF
    In this dissertation, we present visual analytics tools for several biomedical applications. Our research spans three types of biomedical data: reaction networks, longitudinal multidimensional clinical data, and biomedical images. For each data type, we present intuitive visual representations and efficient data exploration methods to facilitate visual knowledge discovery. Rule-based simulation has been used for studying complex protein interactions. In a rule-based model, the relationships of interacting proteins can be represented as a network. Nevertheless, understanding and validating the intended behaviors in large network models are ineffective and error prone. We have developed a tool that first shows a network overview with concise visual representations and then shows relevant rule-specific details on demand. This strategy significantly improves visualization comprehensibility and disentangles the complex protein-protein relationships by showing them selectively alongside the global context of the network. Next, we present a tool for analyzing longitudinal multidimensional clinical datasets, that we developed for understanding Parkinson's disease progression. Detecting patterns involving multiple time-varying variables is especially challenging for clinical data. Conventional computational techniques, such as cluster analysis and dimension reduction, do not always generate interpretable, actionable results. Using our tool, users can select and compare patient subgroups by filtering patients with multiple symptoms simultaneously and interactively. Unlike conventional visualizations that use local features, many targets in biomedical images are characterized by high-level features. We present our research characterizing such high-level features through multiscale texture segmentation and deep-learning strategies. First, we present an efficient hierarchical texture segmentation approach that scales up well to gigapixel images to colorize electron microscopy (EM) images. This enhances visual comprehensibility of gigapixel EM images across a wide range of scales. Second, we use convolutional neural networks (CNNs) to automatically derive high-level features that distinguish cell states in live-cell imagery and voxel types in 3D EM volumes. In addition, we present a CNN-based 3D segmentation method for biomedical volume datasets with limited training samples. We use factorized convolutions and feature-level augmentations to improve model generalization and avoid overfitting

    Cell Nuclear Morphology Analysis Using 3D Shape Modeling, Machine Learning and Visual Analytics

    Full text link
    Quantitative analysis of morphological changes in a cell nucleus is important for the understanding of nuclear architecture and its relationship with cell differentiation, development, proliferation, and disease. Changes in the nuclear form are associated with reorganization of chromatin architecture related to altered functional properties such as gene regulation and expression. Understanding these processes through quantitative analysis of morphological changes is important not only for investigating nuclear organization, but also has clinical implications, for example, in detection and treatment of pathological conditions such as cancer. While efforts have been made to characterize nuclear shapes in two or pseudo-three dimensions, several studies have demonstrated that three dimensional (3D) representations provide better nuclear shape description, in part due to the high variability of nuclear morphologies. 3D shape descriptors that permit robust morphological analysis and facilitate human interpretation are still under active investigation. A few methods have been proposed to classify nuclear morphologies in 3D, however, there is a lack of publicly available 3D data for the evaluation and comparison of such algorithms. There is a compelling need for robust 3D nuclear morphometric techniques to carry out population-wide analyses. In this work, we address a number of these existing limitations. First, we present a largest publicly available, to-date, 3D microscopy imaging dataset for cell nuclear morphology analysis and classification. We provide a detailed description of the image analysis protocol, from segmentation to baseline evaluation of a number of popular classification algorithms using 2D and 3D voxel-based morphometric measures. We proposed a specific cross-validation scheme that accounts for possible batch effects in data. Second, we propose a new technique that combines mathematical modeling, machine learning, and interpretation of morphometric characteristics of cell nuclei and nucleoli in 3D. Employing robust and smooth surface reconstruction methods to accurately approximate 3D object boundary enables the establishment of homologies between different biological shapes. Then, we compute geometric morphological measures characterizing the form of cell nuclei and nucleoli. We combine these methods into a highly parallel computational pipeline workflow for automated morphological analysis of thousands of nuclei and nucleoli in 3D. We also describe the use of visual analytics and deep learning techniques for the analysis of nuclear morphology data. Third, we evaluate proposed methods for 3D surface morphometric analysis of our data. We improved the performance of morphological classification between epithelial vs mesenchymal human prostate cancer cells compared to the previously reported results due to the more accurate shape representation and the use of combined nuclear and nucleolar morphometry. We confirmed previously reported relevant morphological characteristics, and also reported new features that can provide insight in the underlying biological mechanisms of pathology of prostate cancer. We also assessed nuclear morphology changes associated with chromatin remodeling in drug-induced cellular reprogramming. We computed temporal trajectories reflecting morphological differences in astroglial cell sub-populations administered with 2 different treatments vs controls. We described specific changes in nuclear morphology that are characteristic of chromatin re-organization under each treatment, which previously has been only tentatively hypothesized in literature. Our approach demonstrated high classification performance on each of 3 different cell lines and reported the most salient morphometric characteristics. We conclude with the discussion of the potential impact of method development in nuclear morphology analysis on clinical decision-making and fundamental investigation of 3D nuclear architecture. We consider some open problems and future trends in this field.PHDBioinformaticsUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttps://deepblue.lib.umich.edu/bitstream/2027.42/147598/1/akalinin_1.pd

    Longitudinal Brain Tumor Tracking, Tumor Grading, and Patient Survival Prediction Using MRI

    Get PDF
    This work aims to develop novel methods for brain tumor classification, longitudinal brain tumor tracking, and patient survival prediction. Consequently, this dissertation proposes three tasks. First, we develop a framework for brain tumor segmentation prediction in longitudinal multimodal magnetic resonance imaging (mMRI) scans, comprising two methods: feature fusion and joint label fusion (JLF). The first method fuses stochastic multi-resolution texture features with tumor cell density features, in order to obtain tumor segmentation predictions in follow-up scans from a baseline pre-operative timepoint. The second method utilizes JLF to combine segmentation labels obtained from (i) the stochastic texture feature-based and Random Forest (RF)-based tumor segmentation method; and (ii) another state-of-the-art tumor growth and segmentation method known as boosted Glioma Image Segmentation and Registration (GLISTRboost, or GB). With the advantages of feature fusion and label fusion, we achieve state-of-the-art brain tumor segmentation prediction. Second, we propose a deep neural network (DNN) learning-based method for brain tumor type and subtype grading using phenotypic and genotypic data, following the World Health Organization (WHO) criteria. In addition, the classification method integrates a cellularity feature which is derived from the morphology of a pathology image to improve classification performance. The proposed method achieves state-of-the-art performance for tumor grading following the new CNS tumor grading criteria. Finally, we investigate brain tumor volume segmentation, tumor subtype classification, and overall patient survival prediction, and then we propose a new context- aware deep learning method, known as the Context Aware Convolutional Neural Network (CANet). Using the proposed method, we participated in the Multimodal Brain Tumor Segmentation Challenge 2019 (BraTS 2019) for brain tumor volume segmentation and overall survival prediction tasks. In addition, we also participated in the Radiology-Pathology Challenge 2019 (CPM-RadPath 2019) for Brain Tumor Subtype Classification, organized by the Medical Image Computing & Computer Assisted Intervention (MICCAI) Society. The online evaluation results show that the proposed methods offer competitive performance from their use of state-of-the-art methods in tumor volume segmentation, promising performance on overall survival prediction, and state-of-the-art performance on tumor subtype classification. Moreover, our result was ranked second place in the testing phase of the CPM-RadPath 2019
    corecore