FRAMEWORK FOR THE EVALUATION OF PERTURBATIONS IN THE SYSTEMS BIOLOGY LANDSCAPE AND INTER-SAMPLE SIMILARITY FROM TRANSCRIPTOMIC DATASETS — A DIGITAL TWIN PERSPECTIVE

Abstract

One approach to interrogating the complexities of human systems in their well-regulated and dysregulated states is through the use of digital twins. Digital twins are virtual representations of physical systems that describe an individual's state of health, an object fundamentally related to precision medicine. A key element for building a functional digital twin type for a disease or predicting the therapeutic efficacy of a potential treatment is harmonized, machine-parsable domain knowledge. Hypothesis-driven investigations are the gold standard for representing subsystems, but their results capture only limited knowledge of the full biosystem. Multi-omics data is one rich source of knowledge for characterizing disease- and therapy-induced shifts across the systems biology landscape. However, systematic biases in and between the data types limit the functionality of big multi-omics data. In this dissertation, the generation of and results from transcriptomic analysis pipelines are assessed in their biological context and with respect to their usability for applications such as digital twins. The latter is achieved by assessing the adherence of the workflows to the FAIR principles (Findability, Accessibility, Interoperability, and Reusability) and the extent to which they connect to the broader systems biology landscape. The first two specific aims of this work emphasize the transcriptomic shifts induced by atypical teratoid rhabdoid tumors (ATRT) relative to the normal brain and those induced by treatment of tumor models with 4SC-202 across disease states including medulloblastoma, ATRT, triple-negative breast cancer, osteosarcoma, and pancreatic cancer. These are problem-driven workflows, tightly connected to biological hypotheses that contribute to disease- and therapy-specific domain knowledge.
In contrast, the third specific aim introduces a domain-agnostic approach for developing transcriptomic pipelines to harmonize bulk RNA-sequencing datasets. This framework does not directly contribute to a given biological domain, but instead provides a generalized approach for integrating large RNA-sequencing datasets and assessing the resultant representation for biological meaningfulness. This harmonization framework may also have utility in assessing the clinical relevance of in vitro biomodels. Collectively, this work presents and assesses the efficacy of multiple transcriptomic workflows within their biological context and their broader applicability to machine learning.
