    Deregulation upon DNA damage revealed by joint analysis of context-specific perturbation data

    Background: Deregulation between two different cell populations manifests itself in changing gene expression patterns and changing regulatory interactions. Accumulating knowledge about biological networks creates an opportunity to study these changes in their cellular context. Results: We analyze re-wiring of regulatory networks based on cell population-specific perturbation data and knowledge about signaling pathways and their target genes. We quantify deregulation by merging the regulatory signal from the two cell populations into one score. This joint approach, called JODA, proves advantageous over separate analysis of the cell populations and over analysis without incorporation of prior knowledge. JODA is implemented and freely available in the Bioconductor package 'joda'. Conclusions: Using JODA, we show widespread re-wiring of gene regulatory networks upon neocarzinostatin-induced DNA damage in human cells. We recover 645 deregulated genes in thirteen functional clusters that carry out the rich program of response to damage. We find that the clusters contain many previously characterized neocarzinostatin target genes. We investigate the connectivity between those genes, explaining their cooperation in performing common functions. We review the genes with the most extreme deregulation scores, reporting their involvement in the response to DNA damage. Finally, we investigate the indirect impact of the ATM pathway on the deregulated genes and build a hypothetical hierarchy of direct regulation. These results show that JODA is a step toward a systems-level, mechanistic understanding of changes in gene regulation between different cell populations.

    Machine Learning Models for Deciphering Regulatory Mechanisms and Morphological Variations in Cancer

    The exponential growth of multi-omics biological datasets is driving a paradigm shift in fundamental biological research. In recent years, imaging and transcriptomics datasets have been increasingly incorporated into biological studies, pushing biology further into the domain of data-intensive sciences. New approaches and tools from statistics, computer science, and data engineering are profoundly influencing biological research. Harnessing this ever-growing deluge of multi-omics data requires novel and creative computational approaches. In parallel, fundamental research in data science and Artificial Intelligence (AI) has advanced tremendously, allowing the scientific community to generate a massive amount of knowledge from data. Advances in Deep Learning (DL), in particular, are transforming many branches of engineering, science, and technology. Several of these methodologies have already been adapted to biological datasets; however, they still need to be further adapted and tailored to new and emerging technologies. In this dissertation, we present computational algorithms and tools that we developed to study gene regulation and cellular morphology in cancer. The models and platforms we developed are general and applicable to several problems related to dysregulation of gene expression in disease. Our pipelines and software packages are disseminated in public repositories for use by the wider scientific community. This dissertation is organized into three main projects. In the first project, we present the Causal Inference Engine (CIE), an integrated platform for identifying and interpreting active regulators of transcriptional responses. The platform offers visualization tools and pathway enrichment analysis to map predicted regulators to Reactome pathways.
We provide a parallelized R package for fast and flexible directional enrichment analysis to run the inference on custom regulatory networks. Next, we designed and developed MODEX, a fully automated text-mining system to extract and annotate causal regulatory interactions between Transcription Factors (TFs) and genes from the biomedical literature. MODEX takes putative TF-gene interactions derived from high-throughput ChIP-Seq or other experiments and collects evidence and metadata from the biomedical literature to validate and annotate them. MODEX complements CIE by mining the literature for auxiliary information on CIE-inferred interactions. In the second project, we present a Convolutional Neural Network (CNN) classifier to perform a pan-cancer analysis of tumor morphology and predict mutations in key genes. The main challenges were to determine the morphological features underlying a given genetic status and to assess whether these features are shared by other cancer types. We trained an Inception-v3-based model to predict TP53 mutation in the five cancer types with the highest rates of TP53 mutation. We also performed a cross-classification analysis to assess shared morphological features across multiple cancer types. Further, we applied a similar methodology to classify HER2 status in breast cancer and predict response to treatment in HER2-positive samples. For this study, expert pathologists manually annotated our training slides to highlight Regions of Interest (ROIs) associated with the HER2+/- tumor microenvironment. Our results indicated strong morphological features associated with each tumor type. Moreover, our predictions agree closely with manual annotations in the test set, indicating the feasibility of our approach for devising an image-based diagnostic tool for HER2 status and treatment response prediction.
We validated our model on samples from an independent cohort, which demonstrates the generalizability of our approach. Finally, in the third project, we present an approach that uses spatial transcriptomics data to predict spatially resolved active gene regulatory mechanisms in tissues. Using spatial transcriptomics, we identified tissue regions with differentially expressed genes and applied our CIE methodology to predict active TFs that can potentially regulate the marker genes in each region. This project bridges the gap between inference of active regulators from molecular data and morphological studies using images. The results demonstrate a significant local pattern in TF activity across the tissue, indicating differential spatial regulation in tissues. The results suggest that integrative analysis of spatial transcriptomics data with CIE can capture discriminant features and identify localized TF-target links in the tissue.
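The directional enrichment idea behind CIE-style regulator inference can be illustrated with a small sketch. This is not the CIE package's implementation: it assumes a signed regulatory network (each target gene mapped to +1 for activation or -1 for repression), a signed differential-expression call per gene, and a one-sided binomial test with p = 0.5 on the differentially expressed targets only. The function names and the toy data are hypothetical.

```python
from math import comb


def binom_sf(k: int, n: int, p: float = 0.5) -> float:
    """Upper tail P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def directional_enrichment(targets: dict, de: dict) -> dict:
    """Score one regulator against signed differential expression (hypothetical helper).

    targets: gene -> +1 if the regulator activates it, -1 if it represses it
    de:      gene -> +1 (up-regulated) or -1 (down-regulated); missing or 0 means not DE
    """
    # The sign product is +1 when the observed change matches an *activated* regulator.
    agreement = [sign * de[g] for g, sign in targets.items() if de.get(g, 0) != 0]
    n = len(agreement)
    k_up = sum(1 for a in agreement if a > 0)
    k_down = n - k_up  # targets consistent with an *inhibited* regulator
    return {
        "n_de_targets": n,
        "consistent_up": k_up,
        "p_activated": binom_sf(k_up, n),
        "p_inhibited": binom_sf(k_down, n),
    }


tf_targets = {"A": 1, "B": 1, "C": -1, "D": 1, "E": -1}  # hypothetical signed targets
observed = {"A": 1, "B": 1, "C": -1, "D": 1, "F": 1}     # hypothetical DE calls
print(directional_enrichment(tf_targets, observed))
```

In this toy case all four differentially expressed targets move in the direction expected if the regulator is activated, so p_activated is 0.5^4 = 0.0625. A real analysis would score many regulators, account for non-DE targets (for example with a hypergeometric background), and correct for multiple testing.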

    Statistical Methods in Integrative Genomics

    Statistical methods in integrative genomics aim to answer important biological questions by jointly analyzing multiple types of genomic data (vertical integration) or aggregating the same type of data across multiple studies (horizontal integration). In this article, we introduce different types of genomic data and data resources, and then review statistical methods of integrative genomics, with emphasis on the motivation and rationale of these methods. We conclude with some summary points and future research directions.
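Horizontal integration often reduces to combining the same test across independent studies. As one illustration (an assumed choice, not a method attributed to this review), Stouffer's weighted Z-method converts each study's one-sided p-value to a z-score and pools them. Weighting by the square root of each study's sample size is a common but optional convention; the p-values and sample sizes below are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def stouffer(p_values, weights=None):
    """Combine one-sided p-values from independent studies (Stouffer's Z-method).

    Returns (combined_z, combined_one_sided_p).
    """
    if weights is None:
        weights = [1.0] * len(p_values)
    std_normal = NormalDist()
    # Convert each p-value to the z-score that leaves p in the upper tail.
    z_scores = [std_normal.inv_cdf(1.0 - p) for p in p_values]
    z = sum(w * zi for w, zi in zip(weights, z_scores)) / sqrt(sum(w * w for w in weights))
    return z, 1.0 - std_normal.cdf(z)


# Three hypothetical studies of the same gene, each only marginally significant.
z, p = stouffer([0.05, 0.04, 0.06], weights=[sqrt(40), sqrt(25), sqrt(60)])
print(f"combined z = {z:.2f}, p = {p:.4f}")
```

Pooling several weak but concordant signals yields a smaller combined p-value than any single study, which is the main motivation for horizontal integration; the method assumes the studies are independent and test the same one-sided hypothesis.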

    Perturbation Detection Through Modeling of Gene Expression on a Latent Biological Pathway Network: A Bayesian hierarchical approach

    Cellular response to a perturbation is the result of a dynamic system of biological variables linked in a complex network. A major challenge in drug and disease studies is identifying the key factors of a biological network that are essential in determining the cell's fate. Here, our goal is to identify perturbed pathways from high-throughput gene expression data. We develop a three-level hierarchical model, where (i) the first level captures the relationship between gene expression and biological pathways using confirmatory factor analysis, (ii) the second level models the behavior within an underlying network of pathways induced by an unknown perturbation using a conditional autoregressive model, and (iii) the third level is a spike-and-slab prior on the perturbations. We then identify perturbations through posterior-based variable selection. We illustrate our approach using gene transcription drug perturbation profiles from the DREAM7 drug sensitivity prediction challenge data set. Our proposed method identified regulatory pathways that are known to play a causative role and that were not readily resolved using gene set enrichment analysis or exploratory factor models. We also present simulation results assessing the performance of this model relative to a network-free variant, and its robustness to inaccuracies in biological databases.
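The third level of the model, a spike-and-slab prior, is what turns posterior inference into variable selection. The sketch below strips the model down to a single perturbation effect observed with Gaussian noise, which is an assumption: the paper's model also has factor-analysis and conditional autoregressive levels and would need posterior sampling rather than a closed form. With a point-mass spike at zero and a normal slab, the posterior inclusion probability can be computed directly; selecting effects whose probability exceeds 0.5 is one common convention, assumed here. All parameter values are hypothetical.

```python
from math import exp, pi, sqrt


def normal_pdf(x: float, var: float) -> float:
    """Density of N(0, var) at x."""
    return exp(-x * x / (2.0 * var)) / sqrt(2.0 * pi * var)


def inclusion_probability(y: float, sigma2: float, tau2: float, prior_pi: float) -> float:
    """Posterior P(delta != 0 | y) for a toy spike-and-slab model.

    y | delta ~ N(delta, sigma2)
    delta     ~ prior_pi * N(0, tau2) + (1 - prior_pi) * point mass at 0
    """
    slab = prior_pi * normal_pdf(y, sigma2 + tau2)    # marginal likelihood if perturbed
    spike = (1.0 - prior_pi) * normal_pdf(y, sigma2)  # marginal likelihood if not perturbed
    return slab / (slab + spike)


for y in (0.0, 1.0, 2.0, 4.0):
    pip = inclusion_probability(y, sigma2=1.0, tau2=4.0, prior_pi=0.2)
    print(y, round(pip, 3), "selected" if pip > 0.5 else "not selected")
```

Small observed effects are pulled toward the spike, while large ones are assigned to the slab, so the inclusion probability rises with |y|. This is the mechanism by which the model separates perturbed from unperturbed pathways.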

    Hypergraph models for discovering co-regulatory genetic interactions

    Ph.D. thesis, Seoul National University Graduate School, Interdisciplinary Program in Bioinformatics, February 2014. Advisor: 장병탁 (Byoung-Tak Zhang). A comprehensive understanding of biological systems requires the analysis of higher-order interactions among many genomic factors. Various genomic factors cooperate to affect biological processes including cancer occurrence, progression, and metastasis. However, the complexity of genomic interactions presents a major barrier to identifying their co-regulatory roles and functional effects. Thus, this dissertation addresses the problem of analyzing complex relationships among many genomic factors in biological processes including cancers. We propose a hypergraph approach for modeling, learning, and extracting: explicitly modeling higher-order genomic interactions, efficiently learning based on evolutionary methods, and effectively extracting biological knowledge from the model. A hypergraph model is a higher-order graphical model that explicitly represents complex relationships among many variables in high-dimensional data. This property makes the proposed model suitable for analyzing biological and medical phenomena characterized by higher-order interactions among various genomic factors. This dissertation proposes advanced hypergraph-based models, improving both the learning methods and the model structures, to analyze large-scale biological data with a focus on identifying co-regulatory genomic interactions at the genome-wide level. We introduce an evolutionary approach based on information-theoretic criteria into the learning mechanisms to efficiently search the huge problem space of higher-order interactions between factors. This evolutionary learning is explained from the perspective of a sequential Bayesian sampling framework. Also, a hierarchy is introduced into the hypergraph model for modeling hierarchical genomic relationships.
This hierarchical structure allows the hypergraph model to explicitly represent gene regulatory circuits as functional blocks or groups across the epigenetic, transcriptional, and post-transcriptional levels of regulation. Moreover, the proposed graph-analysis method can capture global structures of biological systems, such as genomic modules and regulatory networks, by analyzing the learned model structures. The proposed model is applied to cancer genomics, a major topic in current biology and medicine. We show that our model performs competitively with or better than state-of-the-art models on multiple cancer genomic datasets. Furthermore, the proposed model can discover new or hidden patterns as candidate gene regulatory circuits, such as gene modules, miRNA-mRNA networks, and multiple genomic interactions, associated with specific cancers. The results of these analyses provide crucial evidence that can pave the way for identifying unknown functions in cancer systems. The proposed hypergraph model will contribute to elucidating core regulatory mechanisms and to a comprehensive understanding of biological processes including cancers.
    Contents
    Abstract
    1 Introduction
      1.1 Background and Motivation
      1.2 Problems to be Addressed
      1.3 The Proposed Approach and its Contribution
      1.4 Organization of the Dissertation
    2 Related Work
      2.1 Analysis of Co-Regulatory Genomic Interactions from Omics Data
      2.2 Probabilistic Graphical Models for Biological Problems
        2.2.1 Bayesian Networks
        2.2.2 Markov Random Fields
        2.2.3 Hidden Markov Models
      2.3 Higher-order Graphical Models for Biological Problems
        2.3.1 Higher-Order Models
        2.3.2 Hypergraphs
    3 Hypergraph Classifiers for Identifying Prognostic Modules in Cancer
      3.1 Overview
      3.2 Analyzing Gene Modules for Cancer Prognosis Prediction
      3.3 Hypergraph Classifiers for Identifying Cancer Gene Modules
        3.3.1 Hypergraph Classifiers
        3.3.2 Bayesian Evolutionary Algorithm
        3.3.3 Bayesian Evolutionary Learning for Hypergraph Classifiers
      3.4 Predicting Cancer Clinical Outcomes Based on Gene Modules
        3.4.1 Data and Experimental Settings
        3.4.2 Prediction Performance
        3.4.3 Model Analysis
        3.4.4 Identification of Prognostic Gene Modules
      3.5 Summary
    4 Hypergraph-based Models for Constructing Higher-Order miRNA-mRNA Interaction Networks in Cancer
      4.1 Overview
      4.2 Analyzing Relationships between miRNAs and mRNAs from Heterogeneous Data
      4.3 Hypergraph-based Models for Identifying miRNA-mRNA Interactions
        4.3.1 Hypergraph-based Models
        4.3.2 Learning Hypergraph-based Models
        4.3.3 Building Interaction Networks from Hypergraphs
      4.4 Constructing miRNA-mRNA Interaction Networks Based on Higher-Order Relationships
        4.4.1 Data and Experimental Settings
        4.4.2 Classification Performance
        4.4.3 Model Evaluation
        4.4.4 Constructed Higher-Order miRNA-mRNA Interaction Networks in Prostate Cancer
        4.4.5 Functional Analysis of the Constructed Interaction Networks
      4.5 Summary
    5 Hierarchical Hypergraphs for Identifying Higher-Order Genomic Interactions in Multilevel Regulation
      5.1 Overview
      5.2 Analyzing Epigenetic and Genetic Interactions from Multiple Genomic Data
      5.3 Hierarchical Hypergraphs for Identifying Epigenetic and Genetic Interactions
        5.3.1 Hierarchical Hypergraphs
        5.3.2 Learning Hierarchical Hypergraphs
      5.4 Identifying Higher-Order Genomic Interactions in Multilevel Regulation
        5.4.1 Data and Experimental Settings
        5.4.2 Identified Higher-Order miRNA-mRNA Interactions Induced by DNA Methylation in Ovarian Cancer
      5.5 Summary
    6 Concluding Remarks
      6.1 Summary of the Dissertation
      6.2 Directions for Further Research
    Bibliography
    Abstract (in Korean)
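A hypergraph classifier of the kind described in this dissertation can be sketched in a few lines. This simplified illustration rests on stated assumptions: features are already discretized, hyperedges of a fixed order are sampled at random from the training examples instead of being learned with the Bayesian evolutionary algorithm, and every hyperedge has equal weight. Each hyperedge links several (gene, state) vertices to a class label, and a new sample is classified by a vote among the hyperedges it fully matches. The function names, labels, and data are hypothetical.

```python
import random
from collections import Counter


def sample_hyperedges(X, y, order=2, per_example=10, seed=0):
    """Build a hypergraph: each hyperedge is (frozenset of (feature, value), label)."""
    rng = random.Random(seed)
    edges = []
    for xs, label in zip(X, y):
        for _ in range(per_example):
            idx = rng.sample(range(len(xs)), order)  # a random higher-order feature subset
            edges.append((frozenset((i, xs[i]) for i in idx), label))
    return edges


def predict(edges, x):
    """Majority vote among hyperedges whose vertices all match sample x."""
    votes = Counter(label for vertices, label in edges
                    if all(x[i] == v for i, v in vertices))
    return votes.most_common(1)[0][0] if votes else None


# Toy discretized expression profiles (1 = high, 0 = low) for four genes.
X = [[1, 1, 0, 0], [1, 1, 0, 1], [0, 0, 1, 1], [0, 0, 1, 0]]
y = ["poor", "poor", "good", "good"]
model = sample_hyperedges(X, y, order=2)
print(predict(model, [1, 1, 0, 0]), predict(model, [0, 0, 1, 1]))
```

Because a hyperedge only votes when all of its vertices match, the classifier responds to combinations of gene states rather than to single genes, which is the higher-order property the abstract emphasizes. Frequently matching hyperedges can then be inspected as candidate gene modules.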

    Identification of phenotype-specific networks from paired gene expression-cell shape imaging data

    The morphology of breast cancer cells is often used as an indicator of tumor severity and prognosis. Additionally, morphology can be used to identify more fine-grained, molecular developments within a cancer cell, such as transcriptomic changes and signaling pathway activity. Delineating the interface between morphology and signaling is important for understanding the mechanical cues that a cell processes in order to undergo epithelial-to-mesenchymal transition and consequently metastasize. However, the exact regulatory systems that define these changes remain poorly characterized. In this study, we used a network-systems approach to integrate imaging data and RNA-seq expression data. Our workflow enabled the discovery of unbiased, context-specific gene expression signatures and cell signaling subnetworks relevant to the regulation of cell shape, rather than focusing on previously known, but not always representative, pathways. By constructing a cell-shape signaling network from shape-correlated gene expression modules and their upstream regulators, we found central roles for developmental pathways such as WNT and Notch, as well as evidence for fine control of NF-κB signaling by numerous kinase and transcriptional regulators. Further analysis of our network implicates a gene expression module enriched in the RAP1 signaling pathway as a mediator between the sensing of mechanical stimuli and the regulation of NF-κB activity, with specific relevance to cell shape in breast cancer.
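One simple way to link expression modules to a cell-shape measurement, sketched here under assumptions and not as the authors' workflow, is to summarize each module as the per-sample mean of its genes' z-scored expression and correlate that score with a shape feature across samples. The gene names, module names, and the "elongation" feature below are hypothetical.

```python
from math import sqrt
from statistics import mean, pstdev


def zscore(values):
    # Assumes the gene is not constant across samples (filter such genes first).
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


def pearson(a, b):
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    return cov / sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))


def module_score(expr, genes):
    """Per-sample module activity: mean of the member genes' z-scored expression."""
    per_gene = [zscore(expr[g]) for g in genes]
    return [mean(sample) for sample in zip(*per_gene)]


def shape_correlations(expr, modules, shape):
    """Pearson correlation between each module score and a shape feature."""
    return {name: pearson(module_score(expr, genes), shape) for name, genes in modules.items()}


expr = {"G1": [1, 2, 3, 4], "G2": [2, 4, 6, 8], "G3": [4, 3, 2, 1]}  # genes x samples
modules = {"M1": ["G1", "G2"], "M2": ["G3"]}
elongation = [0.1, 0.2, 0.3, 0.4]  # hypothetical per-sample shape feature
print(shape_correlations(expr, modules, elongation))
```

Z-scoring before averaging keeps highly expressed genes from dominating a module's score. Modules with strong positive or negative correlations would be the "shape-correlated" candidates whose upstream regulators are then sought.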

    Computational approaches for finding high-dimensional multi-omics relationships using biological prior knowledge

    Ph.D. thesis, Seoul National University Graduate School, College of Engineering, Department of Computer Science and Engineering, August 2021. Advisor: 김선 (Sun Kim). Understanding how cells function and respond to external stimuli is one of the most important questions in biology and medicine. Thanks to advances in instrumentation, scientists can routinely measure many cellular events in a single biological experiment. Notable examples are multi-omics data: genome sequencing, gene expression quantification, and identification of epigenetic events that regulate gene expression. To better understand cellular mechanisms, it is essential to identify regulatory relationships between multi-omics regulators and genes. However, regulatory relationships are very complex, and it is infeasible to validate all condition-specific relationships experimentally. Thus, there is an urgent need for efficient computational methods that extract relationships from different types of high-dimensional omics data. One way to handle such high-dimensional data is to incorporate external biological knowledge, such as relationships between omics and gene functions curated in various databases. In my doctoral study, I developed three computational approaches to identify regulatory relationships from multi-omics data using biological prior knowledge. The first study proposes a method to predict one-to-m relationships between miRNAs and genes. The computational challenge of miRNA target prediction is that there are many candidate targets, and the balance between false positives and false negatives needs to be controlled. This challenge is addressed by using literature knowledge to determine the association between miRNA-gene pairs and a given context. In this study, I developed ContextMMIA to predict miRNA-gene relationships from miRNA and gene expression data.
ContextMMIA computes scores for miRNA-gene relationships based on statistical significance and literature relevance, and prioritizes the relationships by these scores. In experiments on breast cancer data with different prognoses, ContextMMIA predicted differentially activated miRNA-gene relationships in invasive breast cancer. Experimentally verified miRNA-gene relationships were predicted with high priority, and the corresponding genes are known to be involved in breast cancer-related pathways. The second study proposes a method to predict n-to-one relationships between regulators and genes in drug response. The computational challenge of drug response prediction is how to integrate multi-omics data for 20,000 genes to determine drug response mediator genes. This challenge is addressed by using low-dimensional embedding methods, literature knowledge of drug-gene associations, and gene-gene interaction knowledge. For this problem, I developed DRIM to predict drug response relationships from multi-omics data and drug-induced time-series gene expression data. DRIM uses an autoencoder, tensor decomposition, and drug-gene associations to determine n-to-one relationships from multi-omics data. Then, regulatory relationships of mediator genes are determined using gene-gene interaction knowledge and cross-correlation of drug-induced time-series gene expression data. In experiments on breast cancer cell line data, DRIM extracted mediator genes relevant to drug response, as well as regulatory relationships of genes involved in the PI3K-Akt pathway targeted by lapatinib. In addition, DRIM revealed distinct patterns of relationships in breast cancer cell lines with different lapatinib resistance. The third study proposes a method to predict n-to-m relationships between regulators and genes.
To predict n-to-m relationships, this study formulated an objective function that measures the deviation between observed gene expression values and the values estimated from gene regulatory networks. The computational challenge in minimizing this objective function is navigating a search space of relationships that grows exponentially with the number of regulators and genes. This challenge is addressed by iterative local optimization guided by regulator-gene interaction knowledge. In this study, I developed a two-step iterative method based on reinforcement learning (RL) to predict n-to-m relationships from regulator and gene expression data. The first step (G-step) explores n-to-one gene-oriented relationships, selecting regulators with an RL-based heuristic to add edges to the network. The second step (R-step) explores one-to-m regulator-oriented relationships, stochastically selecting genes to remove edges from the network. In experiments on breast cancer cell line data, the proposed method constructed breast cancer subtype-specific networks from regulator and gene expression profiles, with more accurate gene expression estimation than previous combinatorial optimization methods. Moreover, the regulatory relationships in these networks were associated with breast cancer subtypes. In summary, in this thesis, I proposed computational methods for predicting one-to-m, n-to-one, and n-to-m relationships between multi-omics regulators and genes using external domain knowledge.
The proposed methods are expected to deepen our knowledge of cellular mechanisms by revealing gene regulatory interactions in ever-increasing molecular biology data, such as The Cancer Genome Atlas and the Cancer Cell Line Encyclopedia.
    Contents
    Chapter 1 Introduction
      1.1 Biological background
        1.1.1 Multi-omics analysis
        1.1.2 Multi-omics relationships indicating cell state
        1.1.3 Biological prior knowledge
      1.2 Research problems for the multi-omics relationship
      1.3 Computational challenges and approaches in exploring the multi-omics relationship
      1.4 Outline of the thesis
    Chapter 2 Literature-based condition-specific miRNA-mRNA target prediction
      2.1 Computational Problem & Evaluation criterion
      2.2 Related works
      2.3 Motivation
      2.4 Methods
        2.4.1 Identifying genes and miRNAs based on the user-provided context
        2.4.2 Omics Score
        2.4.3 Context Score
        2.4.4 Confidence Score
      2.5 Results
        2.5.1 Pathway analysis
        2.5.2 Reproducibility of validated targets in humans
        2.5.3 Sensitivity tests when different keywords are used
      2.6 Summary
    Chapter 3 DRIM: A web-based system for investigating drug response at the molecular level by condition-specific multi-omics data integration
      3.1 Computational Problem & Evaluation criterion
      3.2 Related works
      3.3 Motivation
      3.4 Methods
        3.4.1 Step 1: Input
        3.4.2 Step 2: Identifying perturbed sub-pathway with time-series
        3.4.3 Step 3: Embedding multi-omics for selecting potential mediator genes
        3.4.4 Step 4: Construct TF-regulatory time-bounded network and identify regulatory path
        3.4.5 Step 5: Analysis result on the web
      3.5 Case study: Comparative analysis of breast cancer cell lines that have different sensitivity with lapatinib
        3.5.1 Multi-omics analysis result before drug treatment
        3.5.2 Time-series gene expression analysis after drug treatment
      3.6 Summary
    Chapter 4 Combinatorial modeling and optimization using iterative RL search for inferring sample-specific regulatory network
      4.1 Computational Problem & Evaluation criterion
      4.2 Related works
      4.3 Motivation
      4.4 Methods
        4.4.1 Formulating an objective function
        4.4.2 Overview of an iterative search method
        4.4.3 G-step for exploring n-to-one gene-oriented relationship
        4.4.4 R-step for exploring one-to-m regulator-oriented relationship
      4.5 Results
        4.5.1 Cancer cell line data
        4.5.2 Hyperparameters
        4.5.3 Quantitative evaluation
        4.5.4 Qualitative evaluation
      4.6 Summary
    Chapter 5 Conclusions
    Abstract (in Korean)
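The cross-correlation step mentioned for DRIM can be illustrated with a lagged Pearson correlation between two drug-induced time series: if a candidate regulator's profile best matches the target's profile shifted later in time, that supports the regulator acting upstream. This is a sketch under assumptions, not DRIM's procedure: time points are taken to be equally spaced, only non-negative lags (regulator leading) are scanned, and the gene-gene interaction knowledge DRIM also uses is omitted. The function names and series are hypothetical.

```python
from math import sqrt


def pearson(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    denom = sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))
    return cov / denom if denom else 0.0  # a constant series carries no signal


def best_lag(regulator, target, max_lag):
    """Return (lag, r) maximizing corr(regulator[t], target[t + lag]) for 0 <= lag <= max_lag.

    Keep max_lag small enough that at least three time points overlap.
    """
    scores = []
    for lag in range(max_lag + 1):
        a = regulator[: len(regulator) - lag]  # regulator at t
        b = target[lag:]                       # target at t + lag
        scores.append((lag, pearson(a, b)))
    return max(scores, key=lambda s: s[1])


regulator = [0, 1, 3, 2, 5, 4, 6]  # hypothetical expression at time points t0..t6
target = [9, 9, 0, 1, 3, 2, 5]     # follows the regulator two time points later
print(best_lag(regulator, target, max_lag=3))
```

Here the target reproduces the regulator's profile two time points later, so the best lag is 2 with a correlation of 1. With short drug-response time courses, few overlapping points remain at large lags, so such correlations are noisy and are best combined with prior interaction knowledge.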

    Drug repurposing using biological networks

    Drug repositioning is a strategy to identify new uses for existing, approved, or investigational drugs that are outside the scope of their original medical indication. Drug repurposing is based on the fact that one drug can act on multiple targets or that two diseases can share molecular similarities, among other factors. Currently, thanks to the rapid advancement of high-performance technologies, a massive amount of biological and biomedical data is being generated. This enables computational methods and models based on biological networks to open new possibilities for drug repurposing. Here, we therefore provide an in-depth review of the main applications of drug repositioning that have been carried out using biological network models. The goal of this review is to show the usefulness of these computational methods for predicting associations and finding candidate drugs for repositioning to new indications in certain diseases.
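Many network-based repurposing methods score a drug by how close its targets sit to a disease's genes in an interaction network. A minimal sketch of one such measure, the average distance from each disease gene to its nearest drug target, is shown here. It assumes an unweighted, undirected network stored as an adjacency dict, and it omits the comparison against randomized target sets that published approaches typically use to judge significance. The gene names are hypothetical.

```python
from collections import deque


def distances_from(graph, sources):
    """Multi-source BFS: hop distance from the nearest source to every reachable node."""
    dist = {s: 0 for s in sources if s in graph}
    queue = deque(dist)
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def closest_distance(graph, drug_targets, disease_genes):
    """Mean, over reachable disease genes, of the distance to the nearest drug target."""
    dist = distances_from(graph, drug_targets)
    reachable = [dist[g] for g in disease_genes if g in dist]
    return sum(reachable) / len(reachable) if reachable else float("inf")


# Hypothetical protein interaction network: a simple chain A-B-C-D-E.
ppi = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C", "E"], "E": ["D"]}
disease = ["C", "E"]
print(closest_distance(ppi, ["A"], disease), closest_distance(ppi, ["D"], disease))
```

A drug targeting D sits closer to this disease module than one targeting A, so it would be ranked as the stronger repurposing candidate. In practice the raw distance is converted to a z-score against degree-matched random target sets before drawing conclusions.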