
    Computationally Linking Chemical Exposure to Molecular Effects with Complex Data: Comparing Methods to Disentangle Chemical Drivers in Environmental Mixtures and Knowledge-based Deep Learning for Predictions in Environmental Toxicology

    Chemical exposures affect the environment and may lead to adverse outcomes in the organisms living there. Omics-based approaches, such as standardised microarray experiments, have expanded the toolbox for monitoring the distribution of chemicals and assessing their risk to organisms in the environment. The resulting complex data have extended the scope of toxicological knowledge bases and the published literature. A plethora of computational approaches drawing on systems biology and data integration have been applied in environmental toxicology. Still, the complexity of the environmental and biological systems reflected in these data makes exposure-related effects difficult to investigate. This thesis aimed to computationally link chemical exposure to biological effects at the molecular level, drawing on sources of complex environmental data. The first study used data from an omics-based exposure study of mixture effects in a freshwater environment. We compared three data-driven analyses for their suitability in disentangling the biological effects of chemical mixture exposures and for their reliability in attributing potentially adverse outcomes to chemical drivers, using toxicological databases at the gene and pathway levels. Differential gene expression analysis and a network inference approach produced toxicologically meaningful results and uncovered the effects of individual chemicals, both alone and in combination. We developed an integrative computational strategy to harvest exposure-related gene associations from environmental samples containing mixtures of compounds at low concentrations. The applied approaches allowed a more systematic hazard assessment of chemicals using correlation-based compound groups. The dissertation's second achievement is a step toward data-driven hypothesis generation for molecular exposure effects, using an approach that combined text mining and deep learning.
The study was entirely data-driven and used state-of-the-art artificial intelligence methods. We used literature-based relational data and curated toxicological knowledge to predict chemical-biomolecule interactions, implementing a word embedding neural network followed by a feed-forward network. Data augmentation and recurrent neural networks improved training on curated toxicological knowledge. The trained models reached accuracies of up to 94% on unseen test data from the knowledge base used. However, we could not reliably confirm known chemical-gene interactions across the selected data sources. Still, the predictive models might derive previously unknown information from toxicological knowledge sources such as literature, databases, or omics-based exposure studies, and might therefore help generate hypotheses about exposure-related molecular effects. Both achievements of this dissertation might support the prioritisation of chemicals for testing and a more informed selection of chemicals for monitoring in future exposure studies.
Table of Contents:
Abstract
Acknowledgements
Prelude
1 Introduction
1.1 An overview of environmental toxicology
1.1.1 Environmental toxicology
1.1.2 Chemicals in the environment
1.1.3 Systems biological perspectives in environmental toxicology
1.2 Computational toxicology
1.2.1 Omics-based approaches
1.2.2 Linking chemical exposure to transcriptional effects
1.2.3 Up-scaling from the gene level to higher biological organisation levels
1.2.4 Biomedical literature-based discovery
1.2.5 Deep learning with knowledge representation
1.3 Research question and approaches
2 Methods and Data
2.1 Linking environmentally relevant mixture exposures to transcriptional effects
2.1.1 Exposure and microarray data
2.1.2 Preprocessing
2.1.3 Differential gene expression
2.1.4 Association rule mining
2.1.5 Weighted gene correlation network analysis
2.1.6 Method comparison
2.2 Predicting exposure-related effects on a molecular level
2.2.1 Input
2.2.2 Input preparation
2.2.3 Deep learning models
2.2.4 Toxicogenomic application
3 Method comparison to link complex stream water exposures to effects on the transcriptional level
3.1 Background and motivation
3.1.1 Workflow
3.2 Results
3.2.1 Data preprocessing
3.2.2 Differential gene expression analysis
3.2.3 Association rule mining
3.2.4 Network inference
3.2.5 Method comparison
3.2.6 Application case of method integration
3.3 Discussion
3.4 Conclusion
4 Deep learning prediction of chemical-biomolecule interactions
4.1 Motivation
4.1.1 Workflow
4.2 Results
4.2.1 Input preparation
4.2.2 Model selection
4.2.3 Model comparison
4.2.4 Toxicogenomic application
4.2.5 Horizontal augmentation without tail-padding
4.2.6 Four-class problem formulation
4.2.7 Training with CTD data
4.3 Discussion
4.3.1 Transferring biomedical knowledge towards toxicology
4.3.2 Deep learning with biomedical knowledge representation
4.3.3 Data integration
4.4 Conclusion
5 Conclusion and Future perspectives
5.1 Conclusion
5.1.1 Investigating complex mixtures in the environment
5.1.2 Complex knowledge from literature and curated databases predict chemical-biomolecule interactions
5.1.3 Linking chemical exposure to biological effects by integrating CTD
5.2 Future perspectives
S1 Supplement Chapter 1
S1.1 Example of an estrogen bioassay
S1.2 Types of mode of action
S1.3 The dogma of molecular biology
S1.4 Transcriptomics
S2 Supplement Chapter 3
S3 Supplement Chapter 4
S3.1 Hyperparameter tuning results
S3.2 Functional enrichment with predicted chemical-gene interactions and CTD reference pathway genesets
S3.3 Reduction of learning rate in a model with large word embedding vectors
S3.4 Horizontal augmentation without tail-padding
S3.5 Four-relationship classification
S3.6 Interpreting loss observations for SemMedDB trained models
List of Abbreviations
List of Figures
List of Tables
Bibliography
Curriculum scientiae
Declaration of Independent Work (Selbständigkeitserklärung)
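The abstract credits correlation-based compound groups with making chemical hazard assessment more systematic. The sketch below shows one way such groups could be formed. It assumes each compound is described by its measured concentrations across sampling sites, and that two compounds belong to the same group when the Pearson correlation of their profiles reaches a threshold (0.8 here, an arbitrary choice). Groups are then the connected components of that thresholded graph. The grouping rule, the threshold, and all names and data are hypothetical; this is not the thesis's actual procedure.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if one is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def correlation_groups(profiles, threshold=0.8):
    """Group compounds whose profiles correlate positively at r >= threshold.

    profiles maps a compound name to its concentrations across samples.
    Groups are the connected components of the thresholded correlation
    graph, found with a small union-find structure.
    """
    parent = {c: c for c in profiles}

    def find(c):
        while parent[c] != c:
            parent[c] = parent[parent[c]]  # path halving keeps trees shallow
            c = parent[c]
        return c

    for a, b in combinations(profiles, 2):
        if pearson(profiles[a], profiles[b]) >= threshold:
            parent[find(a)] = find(b)  # merge the two groups

    groups = {}
    for c in profiles:
        groups.setdefault(find(c), []).append(c)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical concentrations (arbitrary units) at five sampling sites.
profiles = {
    "compound_A": [0.1, 0.4, 0.9, 1.2, 2.0],
    "compound_B": [0.2, 0.5, 1.0, 1.1, 2.1],
    "compound_C": [1.5, 0.3, 0.8, 0.2, 0.6],
    "compound_D": [3.0, 0.5, 1.7, 0.4, 1.1],
}
print(correlation_groups(profiles))
```

With these made-up profiles, compounds A and B co-vary strongly, as do C and D, while the cross-pairs are negatively correlated, so the sketch returns two groups. Using connected components means a group can contain compounds that are linked only through a chain of correlated partners; a stricter rule (for example, requiring every pair within a group to correlate) would be an equally reasonable design choice.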

    Relation Prediction over Biomedical Knowledge Bases for Drug Repositioning

    Identifying new potential treatment options for medical conditions that cause human disease burden is a central task of biomedical research. Since not all candidate drugs can be tested in animal and clinical trials, in vitro approaches are attempted first to identify promising candidates. Likewise, identifying other essential relations (e.g., causation, prevention) between biomedical entities is critical to understanding biomedical processes. Hence, it is crucial to develop automated relation prediction systems that can yield plausible biomedical relations and expedite the discovery process. In this dissertation, we demonstrate three approaches that use existing biomedical knowledge bases to predict treatment relations between biomedical entities for the drug repositioning task. In the computer science literature, these approaches are broadly labeled link prediction or knowledge base completion. First, we investigate the predictive power of graph paths connecting entities in the publicly available biomedical knowledge base SemMedDB, whose entities and relations together constitute a large knowledge graph. To that end, we build logistic regression models using semantic graph pattern features extracted from SemMedDB to predict treatment and causative relations in the Unified Medical Language System (UMLS) Metathesaurus. Second, we study matrix and tensor factorization algorithms for predicting drug repositioning pairs in repoDB, a general-purpose gold standard database of approved and failed drug-disease indications. The idea is to predict repoDB pairs by approximating the input matrix or tensor, in which the value of a cell indicates whether a relation exists in the SemMedDB and UMLS knowledge bases. The essential goal is to predict the test pairs, which correspond to blank cells in the input matrix or tensor, from the biomedical context shared with the existing non-blank cells.
Our final approach uses graph convolutional neural networks, in which entities and relation types are embedded in a vector space that incorporates neighborhood information. We minimize an objective function that guides the model toward concept and relation embeddings in which distance scores for positive relation pairs are lower than those for negative ones. Overall, our results demonstrate that recent link prediction methods applied to automatically curated, and hence imprecise, knowledge bases can nevertheless yield highly accurate drug candidate predictions when both the methods and the datasets are appropriately configured.
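The final approach trains embeddings so that positive relation pairs receive lower distance scores than negative ones. A minimal sketch of such an objective follows. It assumes a TransE-style translational distance ||h + r - t|| and a hinge (margin ranking) loss, neither of which the abstract specifies, and it leaves out the graph convolutional encoder that would produce the embeddings. Entity names, the relation name, and the vectors are hypothetical.

```python
from math import sqrt


def l2_distance(h, r, t):
    """TransE-style distance ||h + r - t||; lower means more plausible."""
    return sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))


def margin_ranking_loss(pos_triples, neg_triples, emb, rel, margin=1.0):
    """Mean hinge loss max(0, margin + d(pos) - d(neg)) over paired triples.

    Each triple is (head, relation, tail). The loss is zero once every
    positive triple is closer than its paired negative by at least `margin`.
    """
    total = 0.0
    for (h, r, t), (hn, rn, tn) in zip(pos_triples, neg_triples):
        d_pos = l2_distance(emb[h], rel[r], emb[t])
        d_neg = l2_distance(emb[hn], rel[rn], emb[tn])
        total += max(0.0, margin + d_pos - d_neg)
    return total / len(pos_triples)


# Hypothetical 2-d embeddings; in the dissertation these would come from a
# graph convolutional encoder, which this sketch omits.
emb = {
    "drug_x": [0.0, 0.0],
    "disease_y": [1.0, 0.0],
    "disease_z": [0.0, 3.0],
}
rel = {"TREATS": [1.0, 0.0]}

pos = [("drug_x", "TREATS", "disease_y")]
neg = [("drug_x", "TREATS", "disease_z")]  # negative made by corrupting the tail
print(margin_ranking_loss(pos, neg, emb, rel))
```

Here the positive triple has distance 0 and the corrupted one has distance sqrt(10), so with a margin of 1.0 the loss is already zero; raising the margin above sqrt(10) would make it positive again and push training to separate the pairs further. In practice negatives are sampled by corrupting heads or tails of known triples, and the gradient of this loss would be backpropagated into the encoder.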

    Knowledge-based Biomedical Data Science 2019

    Knowledge-based biomedical data science (KBDS) involves the design and implementation of computer systems that act as if they knew about biomedicine. Such systems depend on knowledge formally represented in computer systems, often in the form of knowledge graphs. Here we survey the past year's progress in systems that use formally represented knowledge to address data science problems in both clinical and biological domains, as well as in approaches for creating knowledge graphs. Major themes include the relationships between knowledge graphs and machine learning, the use of natural language processing, and the expansion of knowledge-based approaches to novel domains, such as Traditional Chinese Medicine and biodiversity.
Comment: Manuscript 43 pages with 3 tables; supplemental material 43 pages with 3 tables

    Comparing Attributional and Relational Similarity as a Means to Identify Clinically Relevant Drug-gene Relationships

    In emerging domains such as precision oncology, knowledge extracted from explicit assertions may be insufficient to identify relationships of interest. One solution to this problem is to draw inferences on the basis of similarity. Computational methods have been developed to estimate the semantic similarity and relatedness between terms and relationships distributed across literature corpora such as Medline abstracts and other forms of human-readable text. Most research on distributional similarity has focused on attributional similarity, which estimates the similarity between entities from the contexts in which they occur across a large corpus. A relatively under-researched area is relational similarity, in which the similarity between pairs of entities is estimated from the contexts in which these pairs occur together. While it seems intuitive that models capturing the structure of relationships between entities might facilitate the identification of biologically important relationships, the relative utility of attributional and relational models for this purpose has not yet been compared. In this research, I compare the performance of a range of relational and attributional similarity methods on the task of identifying drugs that may be therapeutically useful in the context of particular aberrant genes, as identified by a team of human experts. My hypothesis is that relational similarity will be more useful than attributional similarity for identifying biological relationships that may answer clinical questions (such as "which drugs INHIBIT gene x?") in rapidly evolving domains. My results show that models based on relational similarity outperformed models based on attributional similarity on this task.
As the methods explained in this research can be applied to identify any kind of relationship for which cue pairs exist, my results suggest that relational similarity may be a suitable approach for other biomedical problems. Furthermore, I found models based on neural word embeddings (NWE) particularly useful for this task, given that they outperformed Random Indexing-based models and required significantly less computational effort to create. NWE (such as those produced by the popular word2vec tool) are a relatively recent development in distributional semantics and are considered by many to be the state of the art in semantic language modeling. However, their application to identifying biologically important relationships from Medline in general, and in precision oncology specifically, has not been well studied. The results of this research can guide the design and implementation of biomedical question answering and other relationship extraction applications for precision medicine, precision oncology, and similar domains in which novel knowledge emerges rapidly. The methods developed and evaluated in this project can help NLP applications provide more accurate results by leveraging corpus-based methods that are scalable and robust by design.
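The difference between the two kinds of similarity can be illustrated with toy vectors. This sketch assumes one common way of modeling relational similarity with word embeddings: a term pair is represented by the offset vector between its two terms, and pairs are compared by the cosine of their offsets. The dissertation compares a range of relational and attributional methods and may represent pairs differently. All terms, vectors, and the cue relation are hypothetical.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity of two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))


def attributional(vecs, a, b):
    """Attributional similarity: compare the two terms' own vectors."""
    return cosine(vecs[a], vecs[b])


def pair_offset(vecs, a, b):
    """Represent the pair (a, b) by the offset vector from a to b."""
    return [y - x for x, y in zip(vecs[a], vecs[b])]


def relational(vecs, cue_pair, candidate_pair):
    """Relational similarity: compare the offset vectors of two term pairs."""
    return cosine(pair_offset(vecs, *cue_pair), pair_offset(vecs, *candidate_pair))


# Hypothetical 3-d term vectors; a real model would learn them from Medline.
vecs = {
    "drug_a": [1.0, 0.0, 0.0],
    "gene_a": [1.0, 1.0, 0.0],
    "drug_b": [0.0, 0.0, 1.0],
    "drug_c": [0.0, 1.0, 0.0],
    "gene_b": [0.0, 1.0, 1.0],
}
cue = ("drug_a", "gene_a")  # a known cue pair, e.g. drug_a INHIBITS gene_a
for drug in ("drug_b", "drug_c"):
    print(drug,
          round(attributional(vecs, drug, "gene_b"), 3),
          round(relational(vecs, cue, (drug, "gene_b")), 3))
```

In this toy setup both candidate drugs are equally similar to gene_b attributionally (0.707 each), so attributional similarity cannot separate them. Only the pair (drug_b, gene_b) has the same offset as the cue pair, giving it a relational score of 1.0 versus 0.0 for drug_c, which mirrors the intuition that relational models capture how two entities relate rather than how alike they are.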

    Literature Based Discovery (LBD): Towards Hypothesis Generation and Knowledge Discovery in Biomedical Text Mining

    Biomedical knowledge is growing at an astounding pace, and most of this knowledge is represented in scientific publications. Text mining tools and methods are automatic approaches for extracting hidden patterns and trends from this semi-structured and unstructured data. In biomedical text mining, Literature Based Discovery (LBD) is the process of automatically discovering novel associations between medical terms that are otherwise mentioned in disjoint literature sets. LBD approaches have proven successful in reducing the time needed to discover potential associations hidden in the vast amount of scientific literature. The process focuses on creating concept profiles for medical terms, such as a disease or symptom, and connecting them with drugs and treatments based on the statistical significance of the shared profiles. This knowledge discovery approach, introduced in 1989, still remains a core task in text mining. Currently, the two approaches based on the ABC principle, open discovery and closed discovery, are the most widely explored in LBD. This review starts with a general introduction to text mining, followed by biomedical text mining, and introduces literature resources such as MEDLINE, UMLS, MeSH, and SemMedDB. This is followed by a brief introduction to the core ABC principle and its two associated approaches, open discovery and closed discovery. The review also discusses deep learning applications in LBD, covering the role of transformer models and neural network-based LBD models and their future prospects. Finally, it reviews key biomedical discoveries generated through LBD approaches and concludes with the current limitations and future directions of LBD.
Comment: 43 pages, 5 figures, 4 tables
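Open discovery under the ABC principle starts from a term A, collects intermediate terms B linked to A, and proposes terms C that are linked to some B but never directly to A. The minimal sketch below assumes that two terms are "linked" when they co-occur in the same document and ranks C candidates by how many B terms connect them; real LBD systems typically work on MeSH or UMLS concepts and use statistical measures instead of raw counts. The toy documents are hypothetical, loosely modeled on Swanson's fish oil and Raynaud's disease example.

```python
from collections import Counter


def cooccurrence(documents):
    """Map each term to the set of terms it co-occurs with in some document."""
    links = {}
    for terms in documents:
        for t in terms:
            links.setdefault(t, set()).update(terms - {t})
    return links


def open_discovery(documents, a_term):
    """ABC open discovery: rank C terms linked to A only through B terms.

    B terms co-occur with A; C terms co-occur with some B term but never
    directly with A. Candidates are ranked by how many B terms connect them.
    """
    links = cooccurrence(documents)
    b_terms = links.get(a_term, set())
    scores = Counter()
    for b in b_terms:
        for c in links.get(b, set()):
            if c != a_term and c not in b_terms:
                scores[c] += 1
    return scores.most_common()


# Hypothetical documents, each reduced to the set of concepts it mentions.
documents = [
    {"fish oil", "blood viscosity"},
    {"fish oil", "platelet aggregation"},
    {"blood viscosity", "Raynaud's disease"},
    {"platelet aggregation", "Raynaud's disease"},
    {"platelet aggregation", "aspirin"},
]
print(open_discovery(documents, "fish oil"))
```

Raynaud's disease is reached through two B terms and aspirin through one, so Raynaud's disease ranks first even though it never co-occurs with fish oil. Closed discovery reverses the question: given both A and C, the shared B terms are simply the intersection of their link sets.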

    KindMed: Knowledge-Induced Medicine Prescribing Network for Medication Recommendation

    The extensive adoption of electronic health records (EHRs) offers opportunities to use them in various clinical analyses. We could gain more comprehensive insights by enriching an EHR cohort with external knowledge (e.g., standardized medical ontologies and rich semantics curated on the web), as it reveals a spectrum of informative relations between observed medical codes. This paper proposes a novel Knowledge-Induced Medicine Prescribing Network (KindMed) framework that recommends medicines by inducing knowledge from a myriad of external medical sources into the EHR cohort and rendering it as medical knowledge graphs (KGs). On top of relation-aware graph representation learning, which yields an adequate embedding of such KGs, we leverage hierarchical sequence learning to discover and fuse clinical and medicine temporal dynamics across patients' historical admissions to encourage personalized recommendations. To predict safe, precise, and personalized medicines, we devise an attentive prescribing mechanism that accounts for and associates three essential aspects: a summary of joint historical medical records, clinical condition progression, and the patient's current clinical state. We demonstrated the effectiveness of KindMed on augmented real-world EHR cohorts, where it achieved leading performance against competing graph-driven baselines.
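The attentive prescribing step combines three patient aspects before recommending medicines. The sketch below illustrates the general idea of attention-based fusion followed by multi-label scoring, under assumptions the abstract does not state: dot-product attention, the current clinical state used as the query, and an independent sigmoid score per medicine. It is not KindMed's actual mechanism, and every vector, weight, and medicine name is hypothetical.

```python
from math import exp


def softmax(xs):
    """Numerically stable softmax."""
    m = max(xs)
    exps = [exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attentive_fusion(query, aspects):
    """Weight aspect vectors by dot-product attention and sum them."""
    scores = [sum(q * a for q, a in zip(query, vec)) for vec in aspects]
    weights = softmax(scores)
    fused = [sum(w * vec[i] for w, vec in zip(weights, aspects))
             for i in range(len(query))]
    return weights, fused


def medication_scores(fused, med_weights):
    """Independent sigmoid probability per medicine (multi-label output)."""
    return {med: 1.0 / (1.0 + exp(-sum(f * w for f, w in zip(fused, ws))))
            for med, ws in med_weights.items()}


# Hypothetical 2-d summaries of the three aspects named in the abstract.
history = [1.0, 0.0]       # summary of joint historical medical records
progression = [0.0, 1.0]   # clinical condition progression
current = [1.0, 1.0]       # current clinical state (also used as the query)

weights, fused = attentive_fusion(current, [history, progression, current])
print([round(w, 3) for w in weights])
print(medication_scores(fused, {"med_1": [1.0, 1.0], "med_2": [-1.0, -1.0]}))
```

Because the query is the current state, that aspect attends most strongly to itself and receives the largest weight, while history and progression share the remainder equally. Scoring each medicine with its own sigmoid, rather than a softmax over all medicines, reflects that a prescription usually contains several drugs at once.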

    Learning Medical Concept and Patient Representations with Deep Neural Networks and Applications to Healthcare Problems (딥 뉴럴 네트워크를 활용한 의학 개념 및 환자 표현 학습과 의료 문제에의 응용)

    Ph.D. dissertation, Graduate School of Seoul National University: College of Engineering, Department of Electrical and Computer Engineering, August 2022. Advisor: 정교민 (Kyomin Jung).
Using the sample cohort DB, a nationwide health insurance dataset, this dissertation proposes deep neural network-based methods for learning medical concept and patient representations and for solving healthcare problems. First, we proposed a recurrent neural network model that learns patient representations from sequential medical records and personal profile information and predicts the likelihood of future disease diagnoses. Introducing a structure that efficiently fuses heterogeneous patient information yielded a large performance gain, and representing the medical codes in patients' records as distributed representations improved performance further. This showed that the distributed representations of medical codes carry important temporal information, so the follow-up study introduced a graph structure to strengthen this temporal information. We built a graph from the similarities between the distributed representations of medical codes and from statistical information, and used a graph neural network to obtain medical code representation vectors enriched with temporal and statistical information. Using these vectors, we proposed a model that detects potential adverse drug reaction signals for marketed drugs and showed that it can predict even cases not present in existing adverse reaction databases. Finally, to overcome a limitation of medical records, namely that key information is sparse relative to their volume, we supplemented them with prior medical knowledge from a knowledge graph. We extracted only the part of the knowledge graph covering a patient's medical records to build a personalized knowledge graph and obtained its representation vector with a graph neural network. Ultimately, we combined the patient representation summarizing sequential medical records with the representation summarizing personalized medical knowledge to predict future diseases and diagnoses.
This dissertation proposes deep neural network-based methods for learning medical concept and patient representations from medical claims data to solve two healthcare tasks: clinical outcome prediction and post-marketing adverse drug reaction (ADR) signal detection. First, we propose SAF-RNN, a recurrent neural network (RNN)-based model that learns a deep patient representation from clinical sequences and patient characteristics. The model fuses different types of patient records using feature-based gating and self-attention. We demonstrate that it effectively extracts high-level associations between the two heterogeneous record types, achieving state-of-the-art performance in predicting the risk of cardiovascular disease (CVD). Second, based on the observation that distributed medical code embeddings capture the temporal proximity between medical codes, we introduce a graph structure to enhance the code embeddings with this temporal information. We construct a graph from the distributed code embeddings and statistical information from the claims data, and then propose graph neural network (GNN)-based representation learning for post-marketing ADR detection. Our model shows competitive performance and provides valid ADR candidates. Finally, rather than using patient records alone, we use a knowledge graph to augment the patient representation with prior medical knowledge. Using SAF-RNN and GNN, the deep patient representation is learned from both the clinical sequences and the personalized medical knowledge.
This representation is then used to predict clinical outcomes, namely the next diagnosis and CVD risk, achieving state-of-the-art performance.
Table of Contents:
1 Introduction
2 Background
2.1 Medical Concept Embedding
2.2 Encoding Sequential Information in Clinical Records
3 Deep Patient Representation with Heterogeneous Information
3.1 Related Work
3.2 Problem Statement
3.3 Method
3.3.1 RNN-based Disease Prediction Model
3.3.2 Self-Attentive Fusion (SAF) Encoder
3.4 Dataset and Experimental Setup
3.4.1 Dataset
3.4.2 Experimental Design
3.4.3 Implementation Details
3.5 Experimental Results
3.5.1 Evaluation of CVD Prediction
3.5.2 Sensitivity Analysis
3.5.3 Ablation Studies
3.6 Further Investigation
3.6.1 Case Study: Patient-Centered Analysis
3.6.2 Data-Driven CVD Risk Factors
3.7 Conclusion
4 Graph-Enhanced Medical Concept Embedding
4.1 Related Work
4.2 Problem Statement
4.3 Method
4.3.1 Code Embedding Learning with Skip-gram Model
4.3.2 Drug-disease Graph Construction
4.3.3 A GNN-based Method for Learning Graph Structure
4.4 Dataset and Experimental Setup
4.4.1 Dataset
4.4.2 Experimental Design
4.4.3 Implementation Details
4.5 Experimental Results
4.5.1 Evaluation of ADR Detection
4.5.2 Newly-Described ADR Candidates
4.6 Conclusion
5 Knowledge-Augmented Deep Patient Representation
5.1 Related Work
5.1.1 Incorporating Prior Medical Knowledge for Clinical Outcome Prediction
5.1.2 Inductive KGC based on Subgraph Learning
5.2 Method
5.2.1 Extracting Personalized KG
5.2.2 KA-SAF: Knowledge-Augmented Self-Attentive Fusion Encoder
5.2.3 KGC as a Pre-training Task
5.2.4 Subgraph Infomax: SGI
5.3 Dataset and Experimental Setup
5.3.1 Clinical Outcome Prediction
5.3.2 Next Diagnosis Prediction
5.4 Experimental Results
5.4.1 Cardiovascular Disease Prediction
5.4.2 Next Diagnosis Prediction
5.4.3 KGC on SemMed KG
5.5 Conclusion
6 Conclusion
Abstract (In Korean)
Acknowledgement
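The knowledge-augmented model builds a personalized knowledge graph by extracting only the part of a larger KG that covers a patient's records. The abstract does not state the extraction rule, so the sketch below assumes a common choice in subgraph-based methods: collect every node within k hops of the patient's codes (ignoring edge direction) and keep the induced subgraph, that is, all triples whose endpoints were both reached. All entities and relation names are hypothetical.

```python
from collections import deque


def personalized_subgraph(triples, patient_codes, hops=1):
    """Extract the part of a KG within `hops` of a patient's codes.

    triples: list of (head, relation, tail). Edges are traversed in both
    directions. Returns the triples whose head and tail are both in the
    reached node set (the induced subgraph).
    """
    neighbors = {}
    for h, _, t in triples:
        neighbors.setdefault(h, set()).add(t)
        neighbors.setdefault(t, set()).add(h)

    # Breadth-first search from all patient codes at once.
    reached = set(patient_codes)
    frontier = deque((code, 0) for code in patient_codes)
    while frontier:
        node, depth = frontier.popleft()
        if depth == hops:
            continue  # do not expand beyond the hop limit
        for nb in neighbors.get(node, ()):
            if nb not in reached:
                reached.add(nb)
                frontier.append((nb, depth + 1))
    return [(h, r, t) for h, r, t in triples if h in reached and t in reached]


# Hypothetical knowledge graph and patient record.
kg = [
    ("metformin", "TREATS", "type_2_diabetes"),
    ("type_2_diabetes", "RISK_FACTOR_FOR", "cardiovascular_disease"),
    ("statin", "TREATS", "hyperlipidemia"),
    ("hyperlipidemia", "RISK_FACTOR_FOR", "cardiovascular_disease"),
    ("aspirin", "PREVENTS", "cardiovascular_disease"),
]
patient = ["metformin", "type_2_diabetes"]
print(personalized_subgraph(kg, patient, hops=1))
```

With one hop, the patient's subgraph contains only the two triples touching their own codes; with two hops it also pulls in hyperlipidemia and aspirin through cardiovascular disease. The hop limit controls the trade-off the abstract alludes to: enough prior knowledge to compensate for sparse records without flooding the graph neural network with irrelevant context.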