
    Computational approaches for finding high-dimensional multi-omics relationships using biological prior knowledge

    Doctoral thesis (Ph.D.) -- Seoul National University Graduate School: College of Engineering, Department of Computer Science and Engineering, 2021.8. Sun Kim.

Understanding how cells function and respond to external stimuli is one of the most important questions in biology and medicine. Thanks to advances in instrumentation, scientists can routinely measure events within cells in a single biological experiment. Notable examples are multi-omics data: genome sequencing, quantification of gene expression, and identification of epigenetic events that regulate gene expression. To better understand cellular mechanisms, it is essential to identify regulatory relationships between multi-omics regulators and genes. However, these regulatory relationships are very complex, and it is infeasible to validate all condition-specific relationships experimentally. Thus, there is an urgent need for efficient computational methods that extract relationships from different types of high-dimensional omics data.

One way to handle such high-dimensional data is to incorporate external biological knowledge, such as relationships between omics and functions of genes curated in various databases. In my doctoral study, I developed three computational approaches that identify regulatory relationships from multi-omics data by utilizing biological prior knowledge.

The first study proposes a method to predict one-to-m relationships between a miRNA and genes. The computational challenge of miRNA target prediction is that there are many candidate miRNA targets and that the balance between false positives and false negatives must be controlled. This challenge is addressed by using literature knowledge to determine the association between a miRNA-gene pair and a given context. In this study, I developed ContextMMIA to predict miRNA-gene relationships from miRNA and gene expression data. ContextMMIA computes a score for each miRNA-gene relationship based on statistical significance and literature relevance, and prioritizes the relationships by score. In experiments on breast cancer data with different prognoses, ContextMMIA predicted differentially activated miRNA-gene relationships in invasive breast cancer. Experimentally verified miRNA-gene relationships were predicted with high priority, and the genes involved are known to participate in breast cancer-related pathways.

The second study proposes a method to predict n-to-one relationships between regulators and a gene in drug response. The computational challenge of drug response prediction is how to integrate multi-omics data for 20,000 genes to determine drug response mediator genes. This challenge is addressed by using low-dimensional embedding methods, literature knowledge of drug-gene associations, and gene-gene interaction knowledge. For this problem, I developed DRIM to predict drug response relationships from multi-omics data and drug-induced time-series gene expression data. DRIM uses an autoencoder, tensor decomposition, and drug-gene associations to determine n-to-one relationships from multi-omics data. Regulatory relationships of the mediator genes are then determined from gene-gene interaction knowledge and the cross-correlation of drug-induced time-series gene expression data. In experiments on breast cancer cell line data, DRIM extracted mediator genes relevant to drug response and regulatory relationships of genes involved in the PI3K-Akt pathway targeted by lapatinib. In addition, DRIM revealed distinct patterns of relationships in breast cancer cell lines with different lapatinib resistance.

The third study proposes a method to predict n-to-m relationships between regulators and genes. To predict n-to-m relationships, this study formulates an objective function that measures the deviation between observed gene expression values and gene expression values estimated from gene regulatory networks. The computational challenge of minimizing the objective function is that the search space of relationships grows exponentially with the number of regulators and genes. This challenge is addressed by iterative local optimization with regulator-gene interaction knowledge. In this study, I developed a two-step iterative RL-based method to predict n-to-m relationships from regulator and gene expression data. The first step is an n-to-one gene-oriented step that selects regulators with a reinforcement learning based heuristic in order to add edges to the network. The second step is a one-to-m regulator-oriented step that stochastically selects genes in order to remove edges from the network. In experiments on breast cancer cell line data, the proposed method constructed breast cancer subtype-specific networks from the regulator and gene expression profiles, with more accurate gene expression estimation than previous combinatorial optimization methods.
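
The objective function and two-step search of the third study can be illustrated with a minimal Python sketch. It is not the thesis implementation: the least-squares estimate of each gene from its regulators, the per-edge penalty, the greedy choice that stands in for the reinforcement learning based heuristic, and all names (`objective`, `g_step`, `r_step`, `infer_network`, `prior`) are assumptions made for illustration.

```python
import numpy as np


def objective(reg_expr, gene_expr, edges, penalty=0.05):
    """Mean squared deviation between observed and network-estimated expression.

    Each gene is estimated by least squares from its selected regulators.
    The per-edge penalty is an assumption, added so that removing edges can pay off.
    """
    n_samples, n_genes = gene_expr.shape
    total = 0.0
    for g in range(n_genes):
        regs = sorted(r for r, target in edges if target == g)
        # Intercept column plus one column per selected regulator.
        design = np.column_stack([np.ones(n_samples)] + [reg_expr[:, r] for r in regs])
        coef, *_ = np.linalg.lstsq(design, gene_expr[:, g], rcond=None)
        total += np.mean((gene_expr[:, g] - design @ coef) ** 2)
    return total / n_genes + penalty * len(edges)


def g_step(reg_expr, gene_expr, edges, prior):
    """Gene-oriented (n-to-one) step: per gene, add the best candidate regulator.

    A greedy choice is used here as a stand-in for the RL-based heuristic.
    """
    for g in range(gene_expr.shape[1]):
        best_edge = None
        best_val = objective(reg_expr, gene_expr, edges)
        for r, target in sorted(prior - edges):
            if target != g:
                continue
            val = objective(reg_expr, gene_expr, edges | {(r, g)})
            if val < best_val:
                best_edge, best_val = (r, g), val
        if best_edge is not None:
            edges = edges | {best_edge}
    return edges


def r_step(reg_expr, gene_expr, edges, rng):
    """Regulator-oriented (one-to-m) step: randomly pick one edge of one regulator
    and remove it if the objective does not get worse."""
    if not edges:
        return edges
    regulators = sorted({r for r, _ in edges})
    r = regulators[rng.integers(len(regulators))]
    targets = sorted(g for rr, g in edges if rr == r)
    g = targets[rng.integers(len(targets))]
    trial = edges - {(r, g)}
    if objective(reg_expr, gene_expr, trial) <= objective(reg_expr, gene_expr, edges):
        return trial
    return edges


def infer_network(reg_expr, gene_expr, prior, n_iter=10, seed=0):
    """Alternate the two steps for a fixed number of iterations."""
    rng = np.random.default_rng(seed)
    edges = set()
    for _ in range(n_iter):
        edges = g_step(reg_expr, gene_expr, edges, prior)
        edges = r_step(reg_expr, gene_expr, edges, rng)
    return edges


if __name__ == "__main__":
    # Synthetic data: gene 0 follows regulator 0, gene 1 follows regulator 1.
    data_rng = np.random.default_rng(1)
    reg = data_rng.normal(size=(50, 3))
    genes = np.column_stack([2.0 * reg[:, 0], -1.5 * reg[:, 1]])
    genes = genes + 0.1 * data_rng.normal(size=genes.shape)
    candidate_edges = {(r, g) for r in range(3) for g in range(2)}
    print(sorted(infer_network(reg, genes, candidate_edges)))
```

Here `prior` plays the role of the regulator-gene interaction knowledge: only edges listed in it are ever considered, which is what keeps the search space manageable in this sketch.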
The regulatory relationships in the inferred subtype-specific networks were also associated with breast cancer subtypes.

In summary, this thesis proposes computational methods for predicting one-to-m, n-to-one, and n-to-m relationships between multi-omics regulators and genes by utilizing external domain knowledge. By analyzing the ever-increasing body of molecular biology data, such as The Cancer Genome Atlas and the Cancer Cell Line Encyclopedia, the proposed methods are expected to deepen our understanding of cellular mechanisms through gene regulatory interactions.

Chapter 1 Introduction
  1.1 Biological background
    1.1.1 Multi-omics analysis
    1.1.2 Multi-omics relationships indicating cell state
    1.1.3 Biological prior knowledge
  1.2 Research problems for the multi-omics relationship
  1.3 Computational challenges and approaches in the exploring multi-omics relationship
  1.4 Outline of the thesis
Chapter 2 Literature-based condition-specific miRNA-mRNA target prediction
  2.1 Computational Problem & Evaluation criterion
  2.2 Related works
  2.3 Motivation
  2.4 Methods
    2.4.1 Identifying genes and miRNAs based on the user-provided context
    2.4.2 Omics Score
    2.4.3 Context Score
    2.4.4 Confidence Score
  2.5 Results
    2.5.1 Pathway analysis
    2.5.2 Reproducibility of validated targets in humans
    2.5.3 Sensitivity tests when different keywords are used
  2.6 Summary
Chapter 3 DRIM: A web-based system for investigating drug response at the molecular level by condition-specific multi-omics data integration
  3.1 Computational Problem & Evaluation criterion
  3.2 Related works
  3.3 Motivation
  3.4 Methods
    3.4.1 Step 1: Input
    3.4.2 Step 2: Identifying perturbed sub-pathway with time-series
    3.4.3 Step 3: Embedding multi-omics for selecting potential mediator genes
    3.4.4 Step 4: Construct TF-regulatory time-bounded network and identify regulatory path
    3.4.5 Step 5: Analysis result on the web
  3.5 Case study: Comparative analysis of breast cancer cell lines that have different sensitivity with lapatinib
    3.5.1 Multi-omics analysis result before drug treatment
    3.5.2 Time-series gene expression analysis after drug treatment
  3.6 Summary
Chapter 4 Combinatorial modeling and optimization using iterative RL search for inferring sample-specific regulatory network
  4.1 Computational Problem & Evaluation criterion
  4.2 Related works
  4.3 Motivation
  4.4 Methods
    4.4.1 Formulating an objective function
    4.4.2 Overview of an iterative search method
    4.4.3 G-step for exploring n-to-one gene-oriented relationship
    4.4.4 R-step for exploring one-to-m regulator-oriented relationship
  4.5 Results
    4.5.1 Cancer cell line data
    4.5.2 Hyperparameters
    4.5.3 Quantitative evaluation
    4.5.4 Qualitative evaluation
  4.6 Summary
Chapter 5 Conclusions
Abstract in Korean
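
The outline lists an omics score, a context score, and a confidence score for ContextMMIA (Sections 2.4.2 to 2.4.4), and the abstract states that relationships are ranked by statistical significance and literature relevance. A minimal Python sketch of that kind of ranking follows; the formulas, the multiplication of the two scores, and every name and number in it are assumptions for illustration, not the definitions used in the thesis.

```python
import math
from typing import List, NamedTuple


class Candidate(NamedTuple):
    """One miRNA-gene pair with hypothetical inputs for the two scores."""
    mirna: str
    gene: str
    p_value: float      # significance of the expression-based association
    context_hits: int   # literature records that also match the context keyword
    total_hits: int     # all literature records retrieved for the pair


def omics_score(p_value: float, cap: float = 10.0) -> float:
    """Scale -log10(p) into [0, 1]; the cap is an assumed choice."""
    return min(-math.log10(max(p_value, 1e-300)), cap) / cap


def context_score(context_hits: int, total_hits: int) -> float:
    """Fraction of literature records matching the context (0 if none exist)."""
    return context_hits / total_hits if total_hits > 0 else 0.0


def confidence_score(c: Candidate) -> float:
    """Combine both scores by multiplication (an assumption of this sketch)."""
    return omics_score(c.p_value) * context_score(c.context_hits, c.total_hits)


def prioritize(candidates: List[Candidate]) -> List[Candidate]:
    """Return candidates sorted from highest to lowest confidence."""
    return sorted(candidates, key=confidence_score, reverse=True)


if __name__ == "__main__":
    # Placeholder identifiers and made-up numbers, not data from the thesis.
    pairs = [
        Candidate("miR-A", "GENE1", 1e-6, 8, 10),
        Candidate("miR-B", "GENE2", 1e-8, 1, 10),
        Candidate("miR-C", "GENE3", 0.5, 9, 10),
    ]
    for c in prioritize(pairs):
        print(c.mirna, c.gene, round(confidence_score(c), 3))
```

Multiplying the two scores means a pair ranks highly only when it is supported both by the expression data and by the literature for the chosen context; a weighted sum would be an equally plausible alternative.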