822 research outputs found

    ๋”ฅ ๋‰ด๋Ÿด ๋„คํŠธ์›Œํฌ๋ฅผ ํ™œ์šฉํ•œ ์˜ํ•™ ๊ฐœ๋… ๋ฐ ํ™˜์ž ํ‘œํ˜„ ํ•™์Šต๊ณผ ์˜๋ฃŒ ๋ฌธ์ œ์—์˜ ์‘์šฉ

    ํ•™์œ„๋…ผ๋ฌธ(๋ฐ•์‚ฌ) -- ์„œ์šธ๋Œ€ํ•™๊ต๋Œ€ํ•™์› : ๊ณต๊ณผ๋Œ€ํ•™ ์ „๊ธฐยท์ •๋ณด๊ณตํ•™๋ถ€, 2022. 8. ์ •๊ต๋ฏผ.๋ณธ ํ•™์œ„ ๋…ผ๋ฌธ์€ ์ „๊ตญ๋ฏผ ์˜๋ฃŒ ๋ณดํ—˜๋ฐ์ดํ„ฐ์ธ ํ‘œ๋ณธ์ฝ”ํ˜ธํŠธDB๋ฅผ ํ™œ์šฉํ•˜์—ฌ ๋”ฅ ๋‰ด๋Ÿด ๋„คํŠธ์›Œํฌ ๊ธฐ๋ฐ˜์˜ ์˜ํ•™ ๊ฐœ๋… ๋ฐ ํ™˜์ž ํ‘œํ˜„ ํ•™์Šต ๋ฐฉ๋ฒ•๊ณผ ์˜๋ฃŒ ๋ฌธ์ œ ํ•ด๊ฒฐ ๋ฐฉ๋ฒ•์„ ์ œ์•ˆํ•œ๋‹ค. ๋จผ์ € ์ˆœ์ฐจ์ ์ธ ํ™˜์ž ์˜๋ฃŒ ๊ธฐ๋ก๊ณผ ๊ฐœ์ธ ํ”„๋กœํŒŒ์ผ ์ •๋ณด๋ฅผ ๊ธฐ๋ฐ˜์œผ๋กœ ํ™˜์ž ํ‘œํ˜„์„ ํ•™์Šตํ•˜๊ณ  ํ–ฅํ›„ ์งˆ๋ณ‘ ์ง„๋‹จ ๊ฐ€๋Šฅ์„ฑ์„ ์˜ˆ์ธกํ•˜๋Š” ์žฌ๊ท€์‹ ๊ฒฝ๋ง ๋ชจ๋ธ์„ ์ œ์•ˆํ•˜์˜€๋‹ค. ์šฐ๋ฆฌ๋Š” ๋‹ค์–‘ํ•œ ์„ฑ๊ฒฉ์˜ ํ™˜์ž ์ •๋ณด๋ฅผ ํšจ์œจ์ ์œผ๋กœ ํ˜ผํ•ฉํ•˜๋Š” ๊ตฌ์กฐ๋ฅผ ๋„์ž…ํ•˜์—ฌ ํฐ ์„ฑ๋Šฅ ํ–ฅ์ƒ์„ ์–ป์—ˆ๋‹ค. ๋˜ํ•œ ํ™˜์ž์˜ ์˜๋ฃŒ ๊ธฐ๋ก์„ ์ด๋ฃจ๋Š” ์˜๋ฃŒ ์ฝ”๋“œ๋“ค์„ ๋ถ„์‚ฐ ํ‘œํ˜„์œผ๋กœ ๋‚˜ํƒ€๋‚ด ์ถ”๊ฐ€ ์„ฑ๋Šฅ ๊ฐœ์„ ์„ ์ด๋ฃจ์—ˆ๋‹ค. ์ด๋ฅผ ํ†ตํ•ด ์˜๋ฃŒ ์ฝ”๋“œ์˜ ๋ถ„์‚ฐ ํ‘œํ˜„์ด ์ค‘์š”ํ•œ ์‹œ๊ฐ„์  ์ •๋ณด๋ฅผ ๋‹ด๊ณ  ์žˆ์Œ์„ ํ™•์ธํ•˜์˜€๊ณ , ์ด์–ด์ง€๋Š” ์—ฐ๊ตฌ์—์„œ๋Š” ์ด๋Ÿฌํ•œ ์‹œ๊ฐ„์  ์ •๋ณด๊ฐ€ ๊ฐ•ํ™”๋  ์ˆ˜ ์žˆ๋„๋ก ๊ทธ๋ž˜ํ”„ ๊ตฌ์กฐ๋ฅผ ๋„์ž…ํ•˜์˜€๋‹ค. ์šฐ๋ฆฌ๋Š” ์˜๋ฃŒ ์ฝ”๋“œ์˜ ๋ถ„์‚ฐ ํ‘œํ˜„ ๊ฐ„์˜ ์œ ์‚ฌ๋„์™€ ํ†ต๊ณ„์  ์ •๋ณด๋ฅผ ๊ฐ€์ง€๊ณ  ๊ทธ๋ž˜ํ”„๋ฅผ ๊ตฌ์ถ•ํ•˜์˜€๊ณ  ๊ทธ๋ž˜ํ”„ ๋‰ด๋Ÿด ๋„คํŠธ์›Œํฌ๋ฅผ ํ™œ์šฉ, ์‹œ๊ฐ„/ํ†ต๊ณ„์  ์ •๋ณด๊ฐ€ ๊ฐ•ํ™”๋œ ์˜๋ฃŒ ์ฝ”๋“œ์˜ ํ‘œํ˜„ ๋ฒกํ„ฐ๋ฅผ ์–ป์—ˆ๋‹ค. ํš๋“ํ•œ ์˜๋ฃŒ ์ฝ”๋“œ ๋ฒกํ„ฐ๋ฅผ ํ†ตํ•ด ์‹œํŒ ์•ฝ๋ฌผ์˜ ์ž ์žฌ์ ์ธ ๋ถ€์ž‘์šฉ ์‹ ํ˜ธ๋ฅผ ํƒ์ง€ํ•˜๋Š” ๋ชจ๋ธ์„ ์ œ์•ˆํ•œ ๊ฒฐ๊ณผ, ๊ธฐ์กด์˜ ๋ถ€์ž‘์šฉ ๋ฐ์ดํ„ฐ๋ฒ ์ด์Šค์— ์กด์žฌํ•˜์ง€ ์•Š๋Š” ์‚ฌ๋ก€๊นŒ์ง€๋„ ์˜ˆ์ธกํ•  ์ˆ˜ ์žˆ์Œ์„ ๋ณด์˜€๋‹ค. ๋งˆ์ง€๋ง‰์œผ๋กœ ๋ถ„๋Ÿ‰์— ๋น„ํ•ด ์ฃผ์š” ์ •๋ณด๊ฐ€ ํฌ์†Œํ•˜๋‹ค๋Š” ์˜๋ฃŒ ๊ธฐ๋ก์˜ ํ•œ๊ณ„๋ฅผ ๊ทน๋ณตํ•˜๊ธฐ ์œ„ํ•ด ์ง€์‹๊ทธ๋ž˜ํ”„๋ฅผ ํ™œ์šฉํ•˜์—ฌ ์‚ฌ์ „ ์˜ํ•™ ์ง€์‹์„ ๋ณด๊ฐ•ํ•˜์˜€๋‹ค. ์ด๋•Œ ํ™˜์ž์˜ ์˜๋ฃŒ ๊ธฐ๋ก์„ ๊ตฌ์„ฑํ•˜๋Š” ์ง€์‹๊ทธ๋ž˜ํ”„์˜ ๋ถ€๋ถ„๋งŒ์„ ์ถ”์ถœํ•˜์—ฌ ๊ฐœ์ธํ™”๋œ ์ง€์‹๊ทธ๋ž˜ํ”„๋ฅผ ๋งŒ๋“ค๊ณ  ๊ทธ๋ž˜ํ”„ ๋‰ด๋Ÿด ๋„คํŠธ์›Œํฌ๋ฅผ ํ†ตํ•ด ๊ทธ๋ž˜ํ”„์˜ ํ‘œํ˜„ ๋ฒกํ„ฐ๋ฅผ ํš๋“ํ•˜์˜€๋‹ค. ์ตœ์ข…์ ์œผ๋กœ ์ˆœ์ฐจ์ ์ธ ์˜๋ฃŒ ๊ธฐ๋ก์„ ํ•จ์ถ•ํ•œ ํ™˜์ž ํ‘œํ˜„๊ณผ ๋”๋ถˆ์–ด ๊ฐœ์ธํ™”๋œ ์˜ํ•™ ์ง€์‹์„ ํ•จ์ถ•ํ•œ ํ‘œํ˜„์„ ํ•จ๊ป˜ ์‚ฌ์šฉํ•˜์—ฌ ํ–ฅํ›„ ์งˆ๋ณ‘ ๋ฐ ์ง„๋‹จ ์˜ˆ์ธก ๋ฌธ์ œ์— ํ™œ์šฉํ•˜์˜€๋‹ค.This dissertation proposes a deep neural network-based medical concept and patient representation learning methods using medical claims data to solve two healthcare tasks, i.e., clinical outcome prediction and post-marketing adverse drug reaction (ADR) signal detection. First, we propose SAF-RNN, a Recurrent Neural Network (RNN)-based model that learns a deep patient representation based on the clinical sequences and patient characteristics. Our proposed model fuses different types of patient records using feature-based gating and self-attention. We demonstrate that high-level associations between two heterogeneous records are effectively extracted by our model, thus achieving state-of-the-art performances for predicting the risk probability of cardiovascular disease. Secondly, based on the observation that the distributed medical code embeddings represent temporal proximity between the medical codes, we introduce a graph structure to enhance the code embeddings with such temporal information. We construct a graph using the distributed code embeddings and the statistical information from the claims data. We then propose the Graph Neural Network(GNN)-based representation learning for post-marketing ADR detection. Our model shows competitive performances and provides valid ADR candidates. 
    Finally, rather than using patient records alone, we utilize a knowledge graph to augment the patient representation with prior medical knowledge. Using SAF-RNN and GNN, the deep patient representation is learned from the clinical sequences and the personalized medical knowledge. It is then used to predict clinical outcomes, i.e., next diagnosis prediction and CVD risk prediction, resulting in state-of-the-art performance.

    Table of contents:
    1 Introduction
    2 Background
        2.1 Medical Concept Embedding
        2.2 Encoding Sequential Information in Clinical Records
    3 Deep Patient Representation with Heterogeneous Information
        3.1 Related Work
        3.2 Problem Statement
        3.3 Method
            3.3.1 RNN-based Disease Prediction Model
            3.3.2 Self-Attentive Fusion (SAF) Encoder
        3.4 Dataset and Experimental Setup
            3.4.1 Dataset
            3.4.2 Experimental Design
            3.4.3 Implementation Details
        3.5 Experimental Results
            3.5.1 Evaluation of CVD Prediction
            3.5.2 Sensitivity Analysis
            3.5.3 Ablation Studies
        3.6 Further Investigation
            3.6.1 Case Study: Patient-Centered Analysis
            3.6.2 Data-Driven CVD Risk Factors
        3.7 Conclusion
    4 Graph-Enhanced Medical Concept Embedding
        4.1 Related Work
        4.2 Problem Statement
        4.3 Method
            4.3.1 Code Embedding Learning with Skip-gram Model
            4.3.2 Drug-disease Graph Construction
            4.3.3 A GNN-based Method for Learning Graph Structure
        4.4 Dataset and Experimental Setup
            4.4.1 Dataset
            4.4.2 Experimental Design
            4.4.3 Implementation Details
        4.5 Experimental Results
            4.5.1 Evaluation of ADR Detection
            4.5.2 Newly-Described ADR Candidates
        4.6 Conclusion
    5 Knowledge-Augmented Deep Patient Representation
        5.1 Related Work
            5.1.1 Incorporating Prior Medical Knowledge for Clinical Outcome Prediction
            5.1.2 Inductive KGC based on Subgraph Learning
        5.2 Method
            5.2.1 Extracting Personalized KG
            5.2.2 KA-SAF: Knowledge-Augmented Self-Attentive Fusion Encoder
            5.2.3 KGC as a Pre-training Task
            5.2.4 Subgraph Infomax: SGI
        5.3 Dataset and Experimental Setup
            5.3.1 Clinical Outcome Prediction
            5.3.2 Next Diagnosis Prediction
        5.4 Experimental Results
            5.4.1 Cardiovascular Disease Prediction
            5.4.2 Next Diagnosis Prediction
            5.4.3 KGC on SemMed KG
        5.5 Conclusion
    6 Conclusion
    Abstract (In Korean)
    Acknowledgement
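    The fusion idea described in this abstract (gating patient characteristics into an RNN-encoded clinical sequence and applying self-attention over the fused states) can be sketched roughly as follows. This is not the dissertation's SAF-RNN implementation; the module layout, dimensions, and toy data are assumptions made only for illustration.

```python
# Hedged sketch of feature-gated fusion plus self-attention over a clinical sequence.
# Not the dissertation's SAF-RNN; names, dimensions, and data are illustrative only.
import torch
import torch.nn as nn

class FusionSketch(nn.Module):
    def __init__(self, code_dim=64, profile_dim=8, hidden=64, n_heads=4):
        super().__init__()
        self.rnn = nn.GRU(code_dim, hidden, batch_first=True)      # encode visit sequence
        self.gate = nn.Linear(profile_dim, hidden)                 # feature-based gate from profile
        self.attn = nn.MultiheadAttention(hidden, n_heads, batch_first=True)
        self.out = nn.Linear(hidden, 1)                            # risk score (e.g., CVD)

    def forward(self, visit_seq, profile):
        h, _ = self.rnn(visit_seq)                 # (batch, time, hidden)
        g = torch.sigmoid(self.gate(profile))      # (batch, hidden) gate from patient characteristics
        h = h * g.unsqueeze(1)                     # modulate sequence states by the profile gate
        fused, _ = self.attn(h, h, h)              # self-attention over the fused states
        patient_repr = fused.mean(dim=1)           # pooled deep patient representation
        return torch.sigmoid(self.out(patient_repr))

# Toy usage: 2 patients, 5 visits, 64-dim visit embeddings, 8 profile features.
model = FusionSketch()
risk = model(torch.randn(2, 5, 64), torch.randn(2, 8))
print(risk.shape)  # torch.Size([2, 1])
```

    The gate-then-attend ordering here is one plausible reading of "feature-based gating and self-attention"; the dissertation's actual encoder may combine the two records differently.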

    Three Essays on Enhancing Clinical Trial Subject Recruitment Using Natural Language Processing and Text Mining

    Patient recruitment and enrollment are critical factors for a successful clinical trial; however, recruitment tends to be the most common problem in most clinical trials. The success of a clinical trial depends on efficiently recruiting suitable patients to conduct the trial. Every clinical trial has a protocol, which describes what will be done in the study and how it will be conducted. The protocol also ensures the safety of the trial subjects and the integrity of the data collected. The eligibility criteria section of a clinical trial protocol is important because it specifies the conditions that participants have to satisfy. Since clinical trial eligibility criteria are usually written in free text, they are not computer interpretable. To automate the analysis of the eligibility criteria, it is therefore necessary to transform those criteria into a computer-interpretable format. The unstructured format of eligibility criteria additionally creates search efficiency issues, so searching for and selecting appropriate clinical trials for a patient from a relatively large number of available trials is a complex task. A few attempts have been made to automate the matching process between patients and clinical trials. However, those attempts have neither fully integrated the entire matching process nor exploited state-of-the-art Natural Language Processing (NLP) techniques that may improve matching performance. Given the importance of patient recruitment in clinical trial research, the objective of this research is to automate the matching process using NLP and text mining techniques and, thereby, improve the efficiency and effectiveness of the recruitment process. This dissertation research, which comprises three essays, investigates the issues of clinical trial subject recruitment using state-of-the-art NLP and text mining techniques. Essay 1: Building a Domain-Specific Lexicon for Clinical Trial Subject Eligibility Analysis. Essay 2: Clustering Clinical Trials Using Semantic-Based Feature Expansion. Essay 3: An Automatic Matching Process of Clinical Trial Subject Recruitment. In essay 1, I develop a domain-specific lexicon for n-gram Named Entity Recognition (NER) in the breast cancer domain. The domain-specific dictionary is used for the selection and reduction of n-gram features in the clustering of essay 2. The dictionary was evaluated by comparing it with the Systematized Nomenclature of Medicine -- Clinical Terms (SNOMED CT); the results showed that it adds a significant number of new terms, which is very useful for effective natural language processing. In essay 2, I explore the clustering of similar clinical trials using the domain-specific lexicon and term expansion with synonyms from the Unified Medical Language System (UMLS). I generate word n-gram features and modify the features with the domain-specific dictionary matching process. To resolve semantic ambiguity, a semantic-based feature expansion technique using UMLS is applied. A hierarchical agglomerative clustering algorithm is used to generate clinical trial clusters. The focus is on summarization of clinical trial information in order to enhance trial search efficiency. Finally, in essay 3, I investigate an automatic matching process between clinical trial clusters and patient medical records. Patient records collected from a prior study were used to test the approach; they were pre-processed by tokenization and lemmatization.
    The pre-processed patient information was then further enriched by matching it against the breast cancer custom dictionary described in essay 1 and by semantic feature expansion using the UMLS Metathesaurus. Finally, I matched each patient record with the clinical trial clusters to select the best-matched cluster(s), and then with the trials within those clusters. The matching results were evaluated by an internal expert as well as an external medical expert.
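    A minimal sketch of the clustering-and-matching step described above, assuming scikit-learn: plain TF-IDF word n-grams stand in for the essays' lexicon-filtered and UMLS-expanded features, and the toy eligibility snippets and patient text are invented for illustration.

```python
# Minimal sketch: cluster trial eligibility text, then match a patient record to the
# nearest cluster. Stand-in for the essays' pipeline; raw n-grams replace the
# dictionary-matched, UMLS-expanded features used in the dissertation.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import AgglomerativeClustering
from sklearn.metrics.pairwise import cosine_similarity

trial_criteria = [  # toy eligibility snippets (assumed data)
    "female patients with stage II breast cancer, HER2 positive",
    "HER2 positive breast cancer, no prior chemotherapy",
    "metastatic breast cancer with bone involvement",
    "early stage breast cancer, hormone receptor positive",
]

vec = TfidfVectorizer(ngram_range=(1, 2), stop_words="english")
X = vec.fit_transform(trial_criteria).toarray()

# Hierarchical agglomerative clustering of the trials (2 clusters for the toy data).
labels = AgglomerativeClustering(n_clusters=2).fit_predict(X)

# Match a pre-processed patient record to the most similar cluster centroid.
patient = vec.transform(["patient with her2 positive breast cancer"]).toarray()
centroids = np.vstack([X[labels == c].mean(axis=0) for c in sorted(set(labels))])
best = int(np.argmax(cosine_similarity(patient, centroids)))
print("best cluster:", best, "trials:", [i for i, l in enumerate(labels) if l == best])
```

    Swapping the raw n-gram features for dictionary-matched, UMLS-expanded features would follow the essays' approach more closely.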

    Automated Injection of Curated Knowledge Into Real-Time Clinical Systems: CDS Architecture for the 21st Century

    Clinical Decision Support (CDS) is primarily associated with alerts, reminders, order entry, rule-based invocation, diagnostic aids, and on-demand information retrieval. While valuable, these foci have been in production use for decades and do not provide a broader, interoperable means of plugging structured clinical knowledge into live electronic health record (EHR) ecosystems for the purpose of orchestrating the user experiences of patients and clinicians. To date, the gap between knowledge representation and user-facing EHR integration has been considered an “implementation concern” requiring unscalable manual human effort and governance coordination. Drafting a questionnaire engineered to meet the HL7 CDS Knowledge Artifact specification, for example, carries no reasonable expectation that it may be imported and deployed into a live system without significant burdens. A dramatic reduction of the time and effort gap in the research and application cycle could be revolutionary. Doing so, however, requires both a floor-to-ceiling precoordination of functional boundaries in the knowledge management lifecycle and a formalization of the human processes by which this occurs. This research introduces ARTAKA: Architecture for Real-Time Application of Knowledge Artifacts, a concrete floor-to-ceiling technological blueprint for both provider health IT (HIT) and vendor organizations to incrementally introduce value into existing systems dynamically. This is made possible by the service-ization of curated knowledge artifacts, which are then injected into a highly scalable backend infrastructure through automated orchestration via public marketplaces. Supplementary examples of client app integration are also provided. Compilation of knowledge into platform-specific form is left flexible, insofar as implementations comply with ARTAKA’s Context Event Service (CES) communication and Health Services Platform (HSP) Marketplace service packaging standards. Towards the goal of interoperable human processes, ARTAKA’s treatment of knowledge artifacts as a specialized form of software allows knowledge engineering to operate as a type of software engineering practice. Thus, nearly a century of software development processes, tools, policies, and lessons offer immediate benefit, in some cases with remarkable parity. Analyses of experimentation are provided, with guidelines on how selected aspects of software development life cycles (SDLCs) apply to knowledge artifact development in an ARTAKA environment. Portions of this culminating document have been initiated with Standards Developing Organizations (SDOs) with the intent of ultimately producing normative standards, as have active relationships with other bodies.
    Doctoral Dissertation, Biomedical Informatics, 201
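    Purely as an illustration of the service-ization idea (a curated knowledge artifact deployed as a service and invoked by context events), the sketch below defines a toy artifact descriptor and event bus. The field names, topic names, and handler interface are assumptions and do not reflect the actual ARTAKA, CES, or HSP Marketplace specifications.

```python
# Purely illustrative sketch of treating a curated knowledge artifact as a deployable
# service: a descriptor plus a context-event handler. Field names, topics, and the
# handler interface are assumptions, not the ARTAKA/CES/HSP Marketplace specifications.
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class KnowledgeArtifact:
    artifact_id: str
    version: str
    triggers: List[str]                   # context events that should invoke the artifact
    evaluate: Callable[[dict], dict]      # compiled, platform-specific logic

class ContextEventBus:
    """Toy stand-in for a context-event service: routes events to subscribed artifacts."""
    def __init__(self):
        self.subscriptions: Dict[str, List[KnowledgeArtifact]] = {}

    def deploy(self, artifact: KnowledgeArtifact):
        for topic in artifact.triggers:
            self.subscriptions.setdefault(topic, []).append(artifact)

    def publish(self, topic: str, context: dict) -> List[dict]:
        return [a.evaluate(context) for a in self.subscriptions.get(topic, [])]

# Example: a questionnaire-style artifact invoked when a patient chart is opened.
phq2 = KnowledgeArtifact(
    artifact_id="example.phq2.screen", version="1.0.0",
    triggers=["patient-chart-open"],
    evaluate=lambda ctx: {"show_questionnaire": "PHQ-2"} if ctx.get("age", 0) >= 18 else {},
)
bus = ContextEventBus()
bus.deploy(phq2)
print(bus.publish("patient-chart-open", {"age": 42}))
```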

    The Human Phenotype Ontology in 2024: phenotypes around the world

    © The Author(s) 2023. Published by Oxford University Press on behalf of Nucleic Acids Research. The Human Phenotype Ontology (HPO) is a widely used resource that comprehensively organizes and defines the phenotypic features of human disease, enabling computational inference and supporting genomic and phenotypic analyses through semantic similarity and machine learning algorithms. The HPO has widespread applications in clinical diagnostics and translational research, including genomic diagnostics, gene-disease discovery, and cohort analytics. In recent years, groups around the world have developed translations of the HPO from English to other languages, and the HPO browser has been internationalized, allowing users to view HPO term labels and in many cases synonyms and definitions in ten languages in addition to English. Since our last report, a total of 2239 new HPO terms and 49235 new HPO annotations were developed, many in collaboration with external groups in the fields of psychiatry, arthrogryposis, immunology and cardiology. The Medical Action Ontology (MAxO) is a new effort to model treatments and other measures taken for clinical management. Finally, the HPO consortium is contributing to efforts to integrate the HPO and the GA4GH Phenopacket Schema into electronic health records (EHRs) with the goal of more standardized and computable integration of rare disease data in EHRs.
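    As a toy illustration of the kind of semantic-similarity computation the HPO supports, the sketch below compares two phenotype profiles by the overlap of their ancestor closures. The hand-written ancestor sets are a tiny, incomplete fragment assumed for the example; real analyses would load the full ontology (e.g., hp.obo) and typically use information-content measures such as Resnik similarity.

```python
# Toy ancestor-based semantic similarity between phenotype profiles.
# The ancestor closures below are a tiny, incomplete, hand-written fragment assumed
# for illustration; real analyses would traverse the full HPO graph.
ancestors = {  # term -> itself plus its ancestors (abbreviated chains)
    "HP:0001627": {"HP:0001627", "HP:0030680", "HP:0000118"},                # Abnormal heart morphology
    "HP:0001631": {"HP:0001631", "HP:0001627", "HP:0030680", "HP:0000118"},  # Atrial septal defect
    "HP:0001250": {"HP:0001250", "HP:0012638", "HP:0000118"},                # Seizure
}

def profile_similarity(profile_a, profile_b):
    """Jaccard similarity of the ancestor closures of two phenotype term sets."""
    close_a = set().union(*(ancestors[t] for t in profile_a))
    close_b = set().union(*(ancestors[t] for t in profile_b))
    return len(close_a & close_b) / len(close_a | close_b)

patient = ["HP:0001631", "HP:0001250"]   # patient phenotype profile
disease = ["HP:0001627"]                 # disease annotation profile
print(f"similarity: {profile_similarity(patient, disease):.2f}")
```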


    A Two-Level Information Modelling Translation Methodology and Framework to Achieve Semantic Interoperability in Constrained GeoObservational Sensor Systems

    As geographical observational data capture, storage and sharing technologies such as in situ remote monitoring systems and spatial data infrastructures evolve, the vision of a Digital Earth, first articulated by Al Gore in 1998, is getting ever closer. However, there are still many challenges and open research questions. For example, data quality, provenance and heterogeneity remain an issue due to the complexity of geo-spatial data and information representation. Observational data are often inadequately semantically enriched by geo-observational information systems or spatial data infrastructures, and so they often do not fully capture the true meaning of the associated datasets. Furthermore, the data models underpinning these information systems are typically too rigid in their data representation to allow for the ever-changing and evolving nature of geo-spatial domain concepts. This impoverished approach to observational data representation reduces the ability of multi-disciplinary practitioners to share information in an interoperable and computable way. The health domain experiences similar challenges in representing complex and evolving domain information concepts. Within any complex domain (such as Earth system science or health) two categories or levels of domain concepts exist: those that remain stable over a long period of time, and those that are prone to change as the domain knowledge evolves and new discoveries are made. Health informaticians have developed a sophisticated two-level modelling systems design approach for electronic health documentation over many years and, with the use of archetypes, have shown how data, information, and knowledge interoperability among heterogeneous systems can be achieved. This research investigates whether two-level modelling can be translated from the health domain to the geo-spatial domain and applied to observing scenarios to achieve semantic interoperability within and between spatial data infrastructures, beyond what is possible with current state-of-the-art approaches. A detailed review of state-of-the-art spatial data infrastructures (SDIs), geo-spatial standards and the two-level modelling methodology was performed. A cross-domain translation methodology was developed, and a proof-of-concept geo-spatial two-level modelling framework was defined and implemented. The Open Geospatial Consortium's (OGC) Observations & Measurements (O&M) standard was re-profiled to aid investigation of the two-level information modelling approach. An evaluation of the method was undertaken using two specific use-case scenarios. Information modelling was performed using the two-level method to show how existing historical ocean observing datasets can be expressed semantically and harmonized. Also, the flexibility of the approach was investigated by applying the method to an air quality monitoring scenario using a technologically constrained monitoring sensor system. This work has demonstrated that two-level modelling can be translated to the geospatial domain and then further developed to be used within a constrained technological sensor system, using traditional wireless sensor networks, semantic web technologies and Internet of Things based technologies.
    Domain-specific evaluation results show that two-level modelling presents a viable approach to achieving semantic interoperability between constrained geo-observational sensor systems and spatial data infrastructures for ocean observing and city-based air quality observing scenarios. This has been demonstrated through the re-purposing of selected, existing geospatial data models and standards. However, it was found that re-using existing standards requires careful ontological analysis per domain concept, and so caution is recommended in assuming the wider applicability of the approach. While the benefits of adopting a two-level information modelling approach to geospatial information modelling are potentially great, it was found that translation to a new domain is complex. The complexity of the approach was found to be a barrier to adoption, especially in commercial projects where standards implementation is low on implementation roadmaps and the perceived benefits of standards adherence are low. Arising from this work, a novel set of base software components, methods and fundamental geo-archetypes has been developed. However, during this work it was not possible to form the required rich community of supporters to fully validate the geo-archetypes. Therefore, the findings of this work are not exhaustive, and the archetype models produced are only indicative. The findings can be used as the basis to encourage further investigation and uptake of two-level modelling within the Earth system science and geo-spatial domains. Ultimately, the outcome of this work is a recommendation for further development and evaluation of the approach, building on the positive results thus far and on the base software artefacts developed to support it.
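    A minimal sketch of the two-level modelling idea discussed above: a small, stable reference model (level one) is constrained at runtime by an archetype (level two) that can evolve independently. The class and constraint names are illustrative assumptions, not actual openEHR archetypes or O&M geo-archetypes.

```python
# Minimal sketch of two-level modelling: a stable reference model (level 1) and a
# swappable archetype of domain constraints (level 2) validated at runtime.
# Names and rules are illustrative, not actual openEHR or O&M geo-archetype content.
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class Observation:                 # level 1: generic, stable reference model
    observed_property: str
    value: float
    unit: str

@dataclass
class Archetype:                   # level 2: domain constraints that evolve independently
    name: str
    constraints: Dict[str, Callable[[Observation], bool]]

    def validate(self, obs: Observation) -> List[str]:
        """Return the names of any constraints the observation violates."""
        return [rule for rule, ok in self.constraints.items() if not ok(obs)]

# An air-quality archetype constraining the generic Observation model.
pm25_archetype = Archetype(
    name="urban-air-quality.pm25",
    constraints={
        "property is PM2.5": lambda o: o.observed_property == "PM2.5",
        "unit is ug/m3": lambda o: o.unit == "ug/m3",
        "value in plausible range": lambda o: 0.0 <= o.value <= 1000.0,
    },
)

obs = Observation(observed_property="PM2.5", value=12.5, unit="ug/m3")
print(pm25_archetype.validate(obs) or "valid")   # empty violation list -> "valid"
```

    New domain concepts (an ocean-observing archetype, say) would be expressed as additional Archetype instances without touching the reference model, which is the interoperability property the thesis investigates.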
    • โ€ฆ
    corecore