2 research outputs found

    Machine Learning Models for Deciphering Regulatory Mechanisms and Morphological Variations in Cancer

    The exponential growth of multi-omics biological datasets is driving a paradigm shift in fundamental biological research. In recent years, imaging and transcriptomics datasets have been increasingly incorporated into biological studies, pushing biology further into the domain of data-intensive sciences. New approaches and tools from statistics, computer science, and data engineering are profoundly influencing biological research. Harnessing this ever-growing deluge of multi-omics biological data requires the development of novel and creative computational approaches. In parallel, fundamental research in data science and Artificial Intelligence (AI) has advanced tremendously, allowing the scientific community to generate a massive amount of knowledge from data. Advances in Deep Learning (DL), in particular, are transforming many branches of engineering, science, and technology. Several of these methodologies have already been adapted for harnessing biological datasets; however, there is still a need to further adapt and tailor these techniques to new and emerging technologies. In this dissertation, we present computational algorithms and tools that we have developed to study gene regulation and cellular morphology in cancer. The models and platforms that we have developed are general and widely applicable to several problems relating to dysregulation of gene expression in diseases. Our pipelines and software packages are available in public repositories for use by the wider scientific community. This dissertation is organized into three main projects. In the first project, we present Causal Inference Engine (CIE), an integrated platform for the identification and interpretation of active regulators of transcriptional response. The platform offers visualization tools and pathway enrichment analysis to map predicted regulators to Reactome pathways. 
We provide a parallelized R package for fast and flexible directional enrichment analysis to run the inference on custom regulatory networks. Next, we designed and developed MODEX, a fully automated text-mining system to extract and annotate causal regulatory interactions between Transcription Factors (TFs) and genes from the biomedical literature. MODEX uses putative TF-gene interactions derived from high-throughput ChIP-Seq or other experiments and seeks to collect evidence and metadata in the biomedical literature to validate and annotate the interactions. MODEX is a complementary platform to CIE that provides auxiliary information on CIE-inferred interactions by mining the literature. In the second project, we present a Convolutional Neural Network (CNN) classifier to perform a pan-cancer analysis of tumor morphology and predict mutations in key genes. The main challenges were to determine the morphological features underlying a genetic status and to assess whether these features are shared with other cancer types. We trained an Inception-v3 based model to predict TP53 mutation in the five cancer types with the highest rate of TP53 mutations. We also performed a cross-classification analysis to assess shared morphological features across multiple cancer types. Further, we applied a similar methodology to classify HER2 status in breast cancer and predict response to treatment in HER2-positive samples. For this study, our training slides were manually annotated by expert pathologists to highlight Regions of Interest (ROIs) associated with HER2+/- tumor microenvironment. Our results indicated that there are strong morphological features associated with each tumor type. Moreover, our predictions agree closely with manual annotations in the test set, indicating the feasibility of our approach in devising an image-based diagnostic tool for HER2 status and treatment response prediction. 
We have validated our model using samples from an independent cohort, which demonstrates the generalizability of our approach. Finally, in the third project, we present an approach that uses spatial transcriptomics data to predict spatially resolved active gene regulatory mechanisms in tissues. Using spatial transcriptomics, we identified tissue regions with differentially expressed genes and applied our CIE methodology to predict active TFs that can potentially regulate the marker genes in each region. This project bridges the gap between inference of active regulators using molecular data and morphological studies using images. The results demonstrate a significant local pattern in TF activity across the tissue, indicating differential spatial regulation in tissues. The results suggest that the integrative analysis of spatial transcriptomics data with CIE can capture discriminant features and identify localized TF-target links in the tissue.

    Deep Learning Enabled Semantic Communication Systems

    Over the past decades, communication systems have primarily focused on how to accurately and effectively transmit symbols (measured in bits) from the transmitter to the receiver. Recently, various new applications have appeared, such as autonomous transportation, consumer robotics, environmental monitoring, and tele-health. The interconnection of these applications will generate a staggering amount of data, on the order of zettabytes, and will require massive connectivity over limited spectrum resources at lower latency, which poses critical challenges to conventional communication systems. Semantic communication has been proposed to overcome these challenges by extracting the meaning of data and filtering out useless, irrelevant, and nonessential information; it is expected to be robust to poor channel conditions and to reduce the size of transmitted data. Although semantic communication was proposed decades ago, its application to wireless communication remains limited. Deep learning (DL) based neural networks can effectively extract semantic information and can be optimized in an end-to-end (E2E) manner. These inherent characteristics of DL are well suited to semantic communication, which motivates us to explore DL-enabled semantic communication. Inspired by the above, this thesis focuses on exploring semantic communication theory and designing semantic communication systems. First, a basic DL based semantic communication system, named DeepSC, is proposed for text transmission. In addition, DL based multi-user semantic communication systems are investigated for transmitting single-modal data and multimodal data, respectively, in which intelligent tasks are performed at the receiver directly. Moreover, a semantic communication system with a memory module, named Mem-DeepSC, is designed to support both memoryless and memory intelligent tasks. 
Finally, a lite distributed semantic communication system based on DL, named L-DeepSC, is proposed with low complexity, where the data transmission from the Internet-of-Things (IoT) devices to the cloud/edge works at the semantic level to improve transmission efficiency. Compared to conventional communication systems, the various proposed DeepSC systems transmit less data, which reduces transmission latency; have lower complexity, which suits capacity-constrained devices; are more robust to multi-user interference and channel noise; and perform better on various intelligent tasks.