
    A comparison of curated gene sets versus transcriptomics-derived gene signatures for detecting pathway activation in immune cells

    Background: Despite the significant contribution of transcriptomics to the fields of biological and biomedical research, interpreting long lists of significantly differentially expressed genes remains a challenging step in the analysis process. Gene set enrichment analysis is a standard approach for summarizing differentially expressed genes into pathways or other gene groupings. Here, we explore an alternative approach to utilizing gene sets from curated databases. We examine the method of deriving custom gene sets which may be relevant to a given experiment using reference data sets from previous transcriptomics studies. We call these data-derived gene sets "gene signatures" for the biological process tested in the previous study. We focus on the feasibility of this approach in analyzing immune-related processes, which are complex in nature but play an important role in medical research. Results: We evaluate several statistical approaches to detecting the activity of a gene signature in a target data set. We compare the performance of the data-derived gene signature approach with comparable GO term gene sets across all of the statistical tests. A total of 61 differential expression comparisons generated from 26 transcriptome experiments were included in the analysis. These experiments covered eight immunological processes in eight types of leukocytes. The data-derived signatures were used to detect the presence of immunological processes in the test data with modest accuracy (AUC = 0.67). The performance for GO and literature-based gene sets was worse (AUC = 0.59). Both approaches were plagued by poor specificity. Conclusions: When investigators seek to test specific hypotheses, the data-derived signature approach can perform as well as, if not better than, standard gene-set-based approaches for immunological signatures. 
Furthermore, the data-derived signatures can be generated in cases where well-defined gene sets are lacking from pathway databases, and they also offer the opportunity to define signatures in a cell-type-specific manner. However, neither the data-derived signatures nor standard gene sets can be demonstrated to reliably provide negative predictions for negative cases. We conclude that the data-derived signature approach is a useful and sometimes necessary tool, but analysts should be wary of false positives. © 2020 The Author(s)
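The AUC values quoted in this abstract summarize how well a signature score separates comparisons where the process is present from those where it is absent. As a rough illustration only, the sketch below computes AUC through its rank-based (Mann-Whitney) definition. The function name `auc` and the toy scores are assumptions made for this example; the abstract does not describe how the authors computed their values.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    scores: signature activity scores, higher means "process present".
    labels: 1 for true positive cases, 0 for negative cases.
    Both inputs are hypothetical; this is not the authors' pipeline.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive case ranked above negative case
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


# Toy example: two positive and two negative comparisons.
print(auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]))
```

An AUC of 0.5 corresponds to chance-level ranking, which is why values of 0.59 and 0.67 are described as modest.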

    An ancestral molecular response to nanomaterial particulates

    The varied transcriptomic response to nanoparticles has hampered the understanding of their mechanism of action. Here, by performing a meta-analysis of a large collection of transcriptomics data from various engineered nanoparticle exposure studies, we identify common patterns of gene regulation that impact the transcriptomic response. The analysis identifies deregulation of immune functions as a prominent response across different exposure studies. In the promoter regions of these genes, we identify a set of binding sites for C2H2 zinc finger transcription factors, which are involved in cell stress responses, protein misfolding, chromatin remodelling and immunomodulation. The model can be used to explain the outcomes of the mechanism of action and is observed across a range of species, indicating that this is a conserved part of the innate immune system.

    Comprehensive characterization of the prostate tumor microenvironment identifies CXCR4/CXCL12 crosstalk as a novel antiangiogenic therapeutic target in prostate cancer

    Background: Crosstalk between neoplastic and stromal cells fosters prostate cancer (PCa) progression and dissemination. Insight into cell-to-cell communication networks provides new therapeutic avenues to mold processes that contribute to PCa tumor microenvironment (TME) alterations. Here we performed a detailed characterization of PCa tumor endothelial cells (TEC) to delineate intercellular crosstalk between TEC and the PCa TME. Methods: TEC isolated from 67 fresh radical prostatectomy (RP) specimens underwent multi-omic ex vivo characterization as well as orthogonal validation of both TEC functions and key markers by immunohistochemistry (IHC) and immunofluorescence (IF). To identify cell-cell interaction targets in TEC, we performed single-cell RNA sequencing (scRNA-seq) in four PCa patients who underwent an RP to catalogue cellular TME composition. Targets were cross-validated using IHC, publicly available datasets, cell culture experiments and a PCa xenograft mouse model. Results: Compared to adjacent normal endothelial cells (NEC), bulk RNA-seq analysis revealed upregulation of genes associated with tumor vasculature, collagen modification and extracellular matrix remodeling in TEC. PTGIR, PLAC9, CXCL12 and VDR were identified as TEC markers and confirmed by IF and IHC in an independent patient cohort. By scRNA-seq we identified 27 cell (sub)types, including endothelial cells (EC) with arterial, venous and immature signatures, as well as angiogenic tip EC. A focused molecular analysis revealed that arterial TEC displayed the highest CXCL12 mRNA expression levels when compared to all other TME cell (sub)populations and showed a negative prognostic role. Receptor-ligand interaction analysis predicted interactions between arterial TEC-derived CXCL12 and its cognate receptor CXCR4 on angiogenic tip EC. 
CXCL12 was validated in vitro and in vivo as an actionable TEC target: the CXCR4 inhibitor AMD3100 reduced vessel number and density in murine PCa and inhibited TEC proliferation and migration in vitro. Conclusions: Overall, our comprehensive analysis identified novel PCa TEC targets and highlights CXCR4/CXCL12 interaction as a potential novel target to interfere with tumor angiogenesis in PCa.

    Integration of stemness gene signatures reveals core functional modules of stem cells and potential novel stemness genes

    Stem cells encompass a variety of different cell types which converge on the dual capacity to self-renew and differentiate into one or more lineages. These characteristic features are key for the involvement of stem cells in crucial biological processes such as development and ageing. To decipher their underlying genetic substrate, it is important to identify so-called stemness genes that are common to different stem cell types and are consistently identified across different studies. In this meta-analysis, 21 individual stemness signatures for humans and another 21 for mice, obtained from a variety of stem cell types and experimental techniques, were compared. Although we observed biological and experimental variability, a highly significant overlap between gene signatures was identified. This enabled us to define integrated stemness signatures (ISSs) composed of genes frequently occurring among individual stemness signatures. Such integrated signatures help to exclude false positives that can compromise individual studies and can provide a more robust basis for investigation. To gain further insights into the relevance of ISSs, their genes were functionally annotated and connected within a molecular interaction network. Most importantly, the present analysis points to the potential roles of several less well-studied genes in stemness and thus provides promising candidates for further experimental validation.
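The idea of an integrated signature, genes that recur across many individual signatures, can be sketched as a simple frequency count. The example below is an assumption-laden illustration: the function name `integrated_signature`, the `min_count` threshold and the placeholder gene labels are all hypothetical, and the abstract does not state which recurrence cutoff or statistical test the authors used.

```python
from collections import Counter


def integrated_signature(signatures, min_count):
    """Return genes that occur in at least `min_count` signatures.

    signatures: iterable of gene collections, one per study (hypothetical input).
    min_count: recurrence threshold, an assumed parameter for this sketch.
    """
    # Convert each signature to a set so a gene counts once per study.
    counts = Counter(gene for sig in signatures for gene in set(sig))
    return {gene for gene, n in counts.items() if n >= min_count}


# Placeholder gene labels, not real study data.
studies = [
    ["geneA", "geneB", "geneC"],
    ["geneA", "geneC", "geneD"],
    ["geneA", "geneE"],
]
print(sorted(integrated_signature(studies, min_count=2)))
```

Requiring recurrence in several studies is what filters out genes that appear in only one signature, which is the sense in which integration helps exclude study-specific false positives.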

    Doctor of Philosophy

    Despite advances in therapies, next-generation sequencing, and our knowledge, breast cancer claims hundreds of thousands of lives around the world every year. Because of the heterogeneity of the disease, the available therapy options work for only a fraction of the population. It is still overwhelmingly challenging to match a patient with the appropriate available therapy for the optimal outcome. This dissertation work focuses on using biomedical informatics approaches to develop pathway-based biomarkers that predict personalized drug response in breast cancer, and on assessing the feasibility of integrating such biomarkers into current electronic health records to better implement genomics-based personalized medicine. The uncontrolled proliferation in breast cancer is frequently driven by the HER2/PI3K/AKT/mTOR pathway. In this pathway, the AKT node plays an important role in controlling signal transduction. In normal breast cells, the proliferation of cells is tightly maintained at a stable rate via AKT. However, in cancer, the balance is disrupted by amplification of upstream growth factor receptors (GFR) such as HER2 and IGF1R and/or deleterious mutations in PTEN and PIK3CA. Overexpression of AKT increases proliferation and decreases apoptosis and autophagy, leading to cancer. Often these known amplifications and the mutation status associated with disease progression are used as biomarkers for selecting targeted therapies. However, known or unknown downstream mutations and activations in the pathways, as well as crosstalk between the pathways, can make the targeted therapies ineffective. For example, one third of HER2-amplified breast cancer patients do not respond to HER2-targeting therapies such as trastuzumab, possibly due to downstream PTEN loss or mutation, or PIK3CA mutations. 
To identify pathway aberration with better sensitivity and specificity, I first developed gene-expression-based pathway biomarkers that can identify the deregulation status of the pathway in the sample of interest. Second, I developed drug response prediction models primarily based on pathway activity, breast cancer subtype, proteomics and mutation data. Third, I assessed the feasibility of including gene expression or transcriptomics data in current electronic health records so that such biomarkers can be implemented in routine clinical care.

    Uncovering a neurological protein signature for severe COVID-19

    Coronavirus disease 2019 (COVID-19), caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), has sparked a global pandemic with severe complications and a high morbidity rate. Neurological symptoms in COVID-19 patients and neurological sequelae after COVID-19 recovery have been extensively reported. Yet the neurological molecular signature and the signaling pathways that are affected in the central nervous system (CNS) of patients with severe COVID-19 remain unknown and need to be identified. Plasma samples from 49 severe COVID-19 patients, 50 mild COVID-19 patients, and 40 healthy controls were subjected to Olink proteomics analysis of 184 CNS-enriched proteins. By using a multi-approach bioinformatics analysis, we identified a 34-protein neurological signature for COVID-19 severity and unveiled dysregulated neurological pathways in severe cases. This new neurological protein signature for severe COVID-19 was validated in different independent cohorts using blood and postmortem brain samples and was shown to correlate with neurological diseases and pharmacological drugs. The signature could potentially aid the development of prognostic and diagnostic tools for neurological complications in post-COVID-19 convalescent patients with long-term neurological sequelae.

    Immunomic and transcriptomic profiling of the immune response to gluten exposure in celiac disease

    Celiac disease (CD) is an immune-mediated gastrointestinal disease that is precipitated by ingestion of dietary gluten, a protein found in wheat, barley and rye. The development of CD almost always requires genetic predisposition with the patients carrying either the DQ2 or DQ8 HLA-DQ alleles. As these genetic alleles are prevalent also in the healthy population, the main factors involved in CD pathogenesis, gluten and the genetic association, are necessary but not sufficient for CD development. Thus, the initial milieu of factors that lead to CD pathogenesis is still not completely understood, with current research suggesting other yet unidentified factors. The complexity of CD is also mirrored in its manifestation, with CD patients having symptoms ranging from gastrointestinal disorders to system-wide extra-intestinal manifestations, including skin and neurological conditions. It can also be asymptomatic. Consequently, CD misdiagnosis or late diagnosis is prevalent. Thus, further investigation of the immune response in CD is needed to better shed light on the intricacies of the cell and molecular changes involved, as well as to provide better diagnostic and therapeutic options. The advent and application of high throughput sequencing at the beginning of the last decade provided the opportunity to study the immune response in CD on an unprecedented scale. Particularly, with immune repertoire sequencing (RepSeq) and genome wide RNA sequencing (RNAseq), as well as the development of bioinformatics analysis methods, we considered the possibility of investigating the effect of gluten exposure in CD at a systemic level. The aim of this work was then to utilize RepSeq and RNAseq to characterize the global immunological and transcriptomic changes that occur in CD during in vivo gluten exposure. We also aimed to develop new computational methodologies for mining immune repertoire datasets to identify gluten associated T-cell receptor (TCR) clonotypes. 
At the beginning of our first study, which examined the global immune repertoire, only a few groundbreaking studies had reported immunogenic gluten peptide-specific T-cell receptors in CD patients. However, as these studies used tetramer assays that allowed investigation of only a handful of gluten peptides at a time, they largely ignored the repertoire-wide immune response dynamics and the repertoire of T-cell receptors induced by gluten that may target not just gluten peptides but other antigens possibly relevant in CD. With study I, some of these shortcomings were addressed by using RepSeq to study the gluten-exposed global repertoire in the blood and gut of CD patients in an unbiased manner. The study showed that gluten exposure leads to increased TCR sharing in both blood and gut between unrelated CD patients, suggesting that the public component of the TCR immune response is important in CD. In addition, we identified particular TCR clonotypes that were induced by gluten exposure through the bioinformatics pipeline developed for differential abundance analysis in this study. The identified gluten-induced TCR clonotypes included novel as well as previously reported gluten-peptide-binding TCRs, indicating that in spite of the immense diversity of the total immune repertoire, it was possible to utilize RepSeq to identify CD-relevant clonotypes computationally without necessarily knowing their targeted antigen. In study I, the limited sharing of TCRs across individuals meant that TCR abundances could be compared only within an individual (across different time points) or across individuals using the small set of public TCRs that were seen in multiple individuals. In study II, a comprehensive bioinformatic method that allows direct population-level comparison of RepSeq datasets in two conditions for the identification of both public and private condition-associated TCRs was developed. 
The method relies on the assumption that private TCRs that are specific to an antigen, for example a gluten peptide, are likely to have high sequence similarity to public TCRs targeting the same antigen and thus could be detected by proxy. It also assumes that the immune sequence components needed to mount an immune response to an antigen are likely to be shared across individuals, at least to a degree that may prove useful for computational identification. By dissecting the immune repertoire into clusters of TCRs with similar kmer composition and finding clusters that are shared across individuals, the method facilitates the comparison of clonal abundances between condition groups and the identification of condition-relevant TCRs. The method was applied to CD RepSeq datasets from study I and successfully identified gluten-induced clonotypes, with TRBV-gene usage and positional amino acid usage patterns similar to known gluten-specific clonotypes. Overall, development of the method and its application to CD demonstrated that direct cross-individual comparison of immune repertoires for identification of disease-relevant TCRs was possible, paving the way for direct investigation of the TCR immune response at the population level, without necessarily knowing all the antigens targeted in an autoimmune disease like CD. In the final study, RNAseq was utilized to investigate the genome-wide transcriptional changes in the PBMC of CD patients, which showed that a short 3-day gluten exposure was enough to induce a distinct transcriptional profile in patients. Importantly, this study identified genes with persistently altered expression and biological pathways with persistently perturbed regulation in CD patient PBMC, regardless of a long period of treatment with a gluten-free diet. This study also suggested new candidate genes for the known CD-linked and/or CD-associated genetic loci 19p13.11 and 21q22.3. 
In conclusion, this thesis developed new bioinformatic methods for the analysis of high-throughput TCR immune repertoire datasets and the identification of condition-relevant clonotypes, and applied the method to CD patient immune repertoires to identify gluten-induced clonotypes. The thesis also provides several new insights into the global immune and transcriptional signatures associated with gluten exposure in CD. The methods and findings in this thesis have potential future use in CD disease stratification, diagnosis, therapy, and monitoring.
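The abstract above describes grouping TCRs by similar kmer composition. A minimal sketch of that general idea follows: each sequence is turned into a table of overlapping k-mer counts, and two sequences are compared by the cosine similarity of those tables. Everything here is an assumption for illustration, including the choice of k = 3, the use of cosine similarity, the function names and the made-up sequences; the thesis's actual clustering procedure is not specified in the abstract.

```python
import math
from collections import Counter


def kmer_profile(seq, k=3):
    """Count overlapping k-mers in a sequence (k=3 is an assumed default)."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def cosine_similarity(a, b):
    """Cosine similarity between two k-mer count profiles."""
    dot = sum(a[x] * b[x] for x in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # sequence shorter than k has no k-mers
    return dot / (norm_a * norm_b)


# Made-up amino acid strings, not real clonotypes.
s1 = kmer_profile("CASSLGQAYEQYF")
s2 = kmer_profile("CASSLGQGYEQYF")
print(round(cosine_similarity(s1, s2), 3))
```

Sequences whose profiles score above some chosen similarity threshold could then be placed in the same cluster, which is one way a private TCR might be linked to a similar public TCR.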

    Evaluation of dormancy states in cancer and associated therapeutic opportunities

    Tumour mass dormancy and cancer cell quiescence represent the two facets of cancer dormancy and play key roles in cancer development and progression. Quiescence describes the reversible proliferative arrest of individual cancer cells, which has been observed to contribute to resistance to chemotherapy and other treatments targeting cycling cells. In contrast, tumour mass dormancy describes the state of no net tumour growth, which can arise due to inadequate tumour vascularisation or an anti-tumour immune response, during which tumours can acquire additional mutations and establish a microenvironment permissive for growth. Currently, both dormancy states remain poorly characterised. This thesis presents computational frameworks for evaluating the two states and comprehensively profiles their abundance and associated genomic and cellular features across 31 solid cancers from the Cancer Genome Atlas. Using machine learning approaches, I demonstrate that cancer cell quiescence preferentially arises in less mutated tumours with intact TP53 and DNA damage repair pathways. I also highlight novel genomic dependencies, such as CEP89 amplification, which drive an impairment of quiescence. Similarly, mutations within CASP8 and HRAS oncogenes are shown to be enriched and positively selected in samples with tumour mass dormancy. I also highlight an association between APOBEC mutagenesis and both dormancy states. Moreover, tumour mass dormancy is shown to be associated with infiltration by macrophages and cytotoxic and regulatory T cells but decreased infiltration by Th17 cells. Lastly, using single-cell data, I demonstrate that quiescence underlies resistance to a wide range of therapies, including treatments targeting cell cycle regulation, proliferative kinase signalling and epigenetic regulation. 
Ultimately, this analysis sheds light on the underlying biology of cancer dormancy states, potentially highlighting vulnerabilities that can be targeted in the clinic. It also provides a transcriptional signature of therapy-tolerant quiescent cells that could be explored further in the clinic to monitor patient therapy response.

    Machine Learning Models for Deciphering Regulatory Mechanisms and Morphological Variations in Cancer

    The exponential growth of multi-omics biological datasets is driving a paradigm shift in fundamental biological research. In recent years, imaging and transcriptomics datasets have been increasingly incorporated into biological studies, pushing biology further into the domain of data-intensive sciences. New approaches and tools from statistics, computer science, and data engineering are profoundly influencing biological research. Harnessing this ever-growing deluge of multi-omics biological data requires the development of novel and creative computational approaches. In parallel, fundamental research in data sciences and Artificial Intelligence (AI) has advanced tremendously, allowing the scientific community to generate a massive amount of knowledge from data. Advances in Deep Learning (DL), in particular, are transforming many branches of engineering, science, and technology. Several of these methodologies have already been adapted for harnessing biological datasets; however, there is still a need to further adapt and tailor these techniques to new and emerging technologies. In this dissertation, we present computational algorithms and tools that we have developed to study gene regulation and cellular morphology in cancer. The models and platforms that we have developed are general and widely applicable to several problems relating to dysregulation of gene expression in diseases. Our pipelines and software packages are disseminated in public repositories for use by the larger scientific community. This dissertation is organized into three main projects. In the first project, we present Causal Inference Engine (CIE), an integrated platform for the identification and interpretation of active regulators of transcriptional response. The platform offers visualization tools and pathway enrichment analysis to map predicted regulators to Reactome pathways. 
We provide a parallelized R-package for fast and flexible directional enrichment analysis to run the inference on custom regulatory networks. Next, we designed and developed MODEX, a fully automated text-mining system to extract and annotate causal regulatory interaction between Transcription Factors (TFs) and genes from the biomedical literature. MODEX uses putative TF-gene interactions derived from high-throughput ChIP-Seq or other experiments and seeks to collect evidence and meta-data in the biomedical literature to validate and annotate the interactions. MODEX is a complementary platform to CIE that provides auxiliary information on CIE inferred interactions by mining the literature. In the second project, we present a Convolutional Neural Network (CNN) classifier to perform a pan-cancer analysis of tumor morphology, and predict mutations in key genes. The main challenges were to determine morphological features underlying a genetic status and assess whether these features were common in other cancer types. We trained an Inception-v3 based model to predict TP53 mutation in five cancer types with the highest rate of TP53 mutations. We also performed a cross-classification analysis to assess shared morphological features across multiple cancer types. Further, we applied a similar methodology to classify HER2 status in breast cancer and predict response to treatment in HER2 positive samples. For this study, our training slides were manually annotated by expert pathologists to highlight Regions of Interest (ROIs) associated with HER2+/- tumor microenvironment. Our results indicated that there are strong morphological features associated with each tumor type. Moreover, our predictions highly agree with manual annotations in the test set, indicating the feasibility of our approach in devising an image-based diagnostic tool for HER2 status and treatment response prediction. 
We have validated our model using samples from an independent cohort, which demonstrates the generalizability of our approach. Finally, in the third project, we present an approach to use spatial transcriptomics data to predict spatially resolved active gene regulatory mechanisms in tissues. Using spatial transcriptomics, we identified tissue regions with differentially expressed genes and applied our CIE methodology to predict active TFs that can potentially regulate the marker genes in the region. This project bridged the gap between inference of active regulators using molecular data and morphological studies using images. The results demonstrate a significant local pattern in TF activity across the tissue, indicating differential spatial regulation in tissues. The results suggest that the integrative analysis of spatial transcriptomics data with CIE can capture discriminant features and identify localized TF-target links in the tissue.

    Leveraging single-cell genomics to uncover clinical and preclinical responses to cancer immunotherapy

    Immune checkpoint inhibitors (ICIs) provide durable clinical responses in about 20% of cancer patients, but have been largely ineffective for non-immunogenic cancers that lack intratumoral T cells. Most tumors have somatic mutations that encode mutant proteins that are tumor-specific and not expressed on normal cells (termed neoantigens). Cancers with the highest mutational burdens, such as melanoma, are more likely to respond to single-agent ICIs. However, most cancers, including pancreatic ductal adenocarcinoma (PDAC), have lower mutational loads, resulting in fewer T cells infiltrating the tumor. Studies have previously demonstrated that an allogeneic GM-CSF-based vaccine enhances T cell infiltration into human pancreatic cancer. Recent work with Panc02 cells, which express around 60 neoantigens similar to human PDAC, showed that PancVAX, a neoantigen-targeted vaccine, cleared tumors in Panc02-bearing mice when paired with immune modulators. These data suggest that cancer vaccines targeting tumor neoantigens induce neoepitope-specific T cells, which can be further activated by ICIs, leading to tumor rejection. Currently, the impact of ICIs and neoantigen-targeted vaccines on immune cell expression states and the underlying mechanism of therapeutic response remain poorly defined. Comprehensive characterization of responding immune cells, particularly T cells, will be critical in understanding mechanisms of response and providing a rationale for combinatorial therapies. In this work, we develop innovative computational methods and analysis pipelines to analyze the tumor-immune microenvironment at single-cell resolution. 
We establish an algorithm to quantify differential heterogeneity in single-cell RNA-seq data, demonstrate the use of non-negative matrix factorization and transfer learning algorithms to identify previously unknown and conserved ICI responses between species, and develop a novel algorithm to physicochemically compare single-cell T cell receptor sequences. We leverage these methods in various contexts to yield new insight into the biological mechanisms underlying positive immunotherapeutic responses in diverse tumor types, including PDAC.