12 research outputs found

    Extracting diagnostic knowledge from MedLine Plus: a comparison between MetaMap and cTAKES Approaches

    The development of diagnostic decision support systems (DDSS) requires a reliable and consistent knowledge base about diseases and their symptoms, signs, and diagnostic tests. Physicians are typically the source of this knowledge, but it is not always possible to obtain all the desired information from them. Other valuable sources are medical books and articles describing the diagnosis of diseases, but again, extracting this information is a hard and time-consuming task. In this paper we present the results of our research, in which we used Web scraping, natural language processing techniques, a variety of publicly available sources of diagnostic knowledge, and two widely known medical concept identifiers, MetaMap and cTAKES, to extract diagnostic criteria for infectious diseases from MedLine Plus articles. A performance comparison of MetaMap and cTAKES is also presented.
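
    The pipeline described above combines Web scraping with medical concept identification. A minimal sketch of that flow in Python is shown below, assuming the requests and BeautifulSoup libraries; the toy lexicon and CUIs are illustrative stand-ins for MetaMap or cTAKES, which in practice run as separate services whose output would be parsed instead.

        import sys
        import requests
        from bs4 import BeautifulSoup

        # Tiny stand-in lexicon; a real run would query MetaMap or cTAKES against UMLS.
        # Terms and CUIs are shown for illustration only.
        TOY_LEXICON = {
            "fever": "C0015967",
            "headache": "C0018681",
            "rash": "C0015230",
        }

        def fetch_article_text(url: str) -> str:
            """Download an article page and return its visible paragraph text."""
            html = requests.get(url, timeout=30).text
            soup = BeautifulSoup(html, "html.parser")
            return "\n".join(p.get_text(" ", strip=True) for p in soup.find_all("p"))

        def extract_concepts(text: str) -> list[tuple[str, str]]:
            """Dictionary lookup standing in for MetaMap/cTAKES concept identification."""
            lowered = text.lower()
            return [(term, cui) for term, cui in TOY_LEXICON.items() if term in lowered]

        if __name__ == "__main__":
            # Pass the URL of a MedlinePlus encyclopedia article on the command line.
            print(extract_concepts(fetch_article_text(sys.argv[1])))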

    Disease Name Extraction from Clinical Text Using Conditional Random Fields

    The aim of the research in this thesis was to extract disease and disorder names from clinical texts. We used Conditional Random Fields (CRF) as the main method to label diseases and disorders in clinical sentences, and tools such as MetaMap and the Stanford CoreNLP toolkit to extract crucial features. The MetaMap tool was used to identify names of diseases/disorders that already appear in the UMLS Metathesaurus. Other important features, such as lemmatized versions of words and POS tags, were extracted with the Stanford CoreNLP toolkit, and further features, including the semantic types of words, were taken directly from the UMLS Metathesaurus. We participated in Task 7 of the SemEval 2014 competition and used its provided data to train and evaluate our system. The training data contained 199 clinical texts, the development data 99, and the test data 133; these included discharge summaries and echocardiogram, radiology, and ECG reports. We obtained competitive results on the disease/disorder name extraction task and found through an ablation study that, while all features contributed, MetaMap matches, POS tags, and the previous and next words were the most effective features.
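
    As a rough illustration of the setup described above, the sketch below builds BIO-style CRF training examples with the kinds of features reported as most effective (MetaMap matches, POS tags, neighbouring words), using the sklearn-crfsuite package. The toy sentence and its precomputed annotations are assumptions for illustration; the thesis derives them from MetaMap, Stanford CoreNLP, and the UMLS Metathesaurus.

        import sklearn_crfsuite

        def token_features(sent, i):
            """Feature dictionary for one token, including its left and right neighbours."""
            tok = sent[i]
            return {
                "word.lower": tok["word"].lower(),
                "lemma": tok["lemma"],
                "pos": tok["pos"],
                "metamap_match": str(tok["metamap"]),  # found in the UMLS Metathesaurus?
                "prev_word": sent[i - 1]["word"].lower() if i > 0 else "<BOS>",
                "next_word": sent[i + 1]["word"].lower() if i < len(sent) - 1 else "<EOS>",
            }

        # One toy sentence; labels use the BIO scheme (B/I mark a disease/disorder mention).
        sent = [
            {"word": "Patient", "lemma": "patient", "pos": "NN", "metamap": False},
            {"word": "has", "lemma": "have", "pos": "VBZ", "metamap": False},
            {"word": "atrial", "lemma": "atrial", "pos": "JJ", "metamap": True},
            {"word": "fibrillation", "lemma": "fibrillation", "pos": "NN", "metamap": True},
        ]
        labels = ["O", "O", "B", "I"]

        X = [[token_features(sent, i) for i in range(len(sent))]]
        y = [labels]

        crf = sklearn_crfsuite.CRF(algorithm="lbfgs", max_iterations=50)
        crf.fit(X, y)
        print(crf.predict(X))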

    Improving Broad-Coverage Medical Entity Linking with Semantic Type Prediction and Large-Scale Datasets

    Medical entity linking is the task of identifying and standardizing medical concepts referred to in an unstructured text. Most existing methods adopt a three-step approach of (1) detecting mentions, (2) generating a list of candidate concepts, and finally (3) picking the best concept among them. In this paper, we address the problem of overgeneration of candidate concepts in the candidate generation module, the most under-studied component of medical entity linking. For this, we present MedType, a fully modular system that prunes out irrelevant candidate concepts based on the predicted semantic type of an entity mention. We incorporate MedType into five off-the-shelf toolkits for medical entity linking and demonstrate that it consistently improves entity linking performance across several benchmark datasets. To address the dearth of annotated training data for medical entity linking, we present WikiMed and PubMedDS, two large-scale medical entity linking datasets, and demonstrate that pre-training MedType on these datasets further improves entity linking performance. We make our source code and datasets publicly available for medical entity linking research.
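
    The core pruning idea can be summarised in a few lines. The sketch below is a simplified stand-in, not the actual MedType implementation: the type predictor is a stub, and the candidate list (including its CUIs and semantic groups) is illustrative.

        def predict_semantic_type(mention: str, context: str) -> str:
            """Stub: return the predicted semantic group for a mention.
            The real system uses a trained neural classifier."""
            return "Disorders"  # placeholder prediction

        def prune_candidates(mention, context, candidates):
            """Keep only candidate concepts whose semantic group matches the prediction."""
            predicted = predict_semantic_type(mention, context)
            return [c for c in candidates if c["semantic_type"] == predicted]

        # Illustrative candidate list; CUIs and groups are for demonstration only.
        candidates = [
            {"cui": "C0004238", "name": "Atrial fibrillation", "semantic_type": "Disorders"},
            {"cui": "C0000001", "name": "Atrial rhythm", "semantic_type": "Physiology"},
        ]
        print(prune_candidates("atrial fibrillation", "Patient has atrial fibrillation.", candidates))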

    A semantic-driven framework for IT support of clinical laboratory standards

    The clinical laboratory plays a critical role in the delivery of care within the healthcare system by providing services that support accurate and timely diagnosis of diseases. The clinical laboratory relies on standard operating procedures (SOPs) to provide information and guidance on laboratory procedures. To ensure an excellent standard of clinical laboratory services, SOPs need to be of high quality, and practitioners need easy access to the information contained within them. However, we argue in this thesis that there is a lack of standardization within clinical laboratory SOPs, and that machines and human practitioners have difficulty accessing or using the content of SOPs. This thesis proposes a solution to challenges regarding the representation and use of SOPs in clinical laboratories (see Chapter 1). The research work in this thesis is based on up-to-date technological, theoretical, and empirical approaches (see Chapter 2), and external researchers have already utilized the outcome of this research for various purposes (see Chapter 5). In this thesis, we present the SmartSOP framework, a semantic-driven framework that supports the representation of clinical laboratory procedure concepts in a standardised format for use within software applications. The SmartSOP framework consists of three main components: the Ontology for Clinical Laboratory SOP (OCL-SOP), a translation engine that converts free-text SOPs into a standardised format, and a mobile application that provides lab practitioners with easy access to SOPs (see Chapters 3 and 4). We used the design science approach for the execution of this research work.
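
    As a loose illustration of what the translation engine does, the sketch below maps free-text SOP steps to structured records keyed by ontology class. The class names and keyword lexicon are hypothetical and are not taken from OCL-SOP.

        # Hypothetical keyword-to-class lexicon; a real engine would use NLP and the ontology.
        STEP_LEXICON = {
            "centrifuge": "CentrifugationStep",
            "pipette": "PipettingStep",
            "incubate": "IncubationStep",
        }

        def translate_step(text: str) -> dict:
            """Return a structured record for one SOP step (ontology class plus raw text)."""
            lowered = text.lower()
            for keyword, onto_class in STEP_LEXICON.items():
                if keyword in lowered:
                    return {"class": onto_class, "text": text}
            return {"class": "GenericStep", "text": text}

        sop = [
            "Centrifuge the sample at 3000 rpm for 10 minutes.",
            "Incubate at 37 °C for 30 minutes.",
        ]
        print([translate_step(step) for step in sop])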

    Improving broad-coverage medical entity linking with semantic type prediction and large-scale datasets

    Objectives: Biomedical natural language processing tools are increasingly being applied for broad-coverage information extraction: extracting medical information of all types from a scientific document or a clinical note. In such broad-coverage settings, linking mentions of medical concepts to standardized vocabularies requires choosing the best candidate concepts from large inventories covering dozens of types. This study presents a novel semantic type prediction module for biomedical NLP pipelines and two automatically constructed, large-scale datasets with broad coverage of semantic types. Methods: We experiment with five off-the-shelf biomedical NLP toolkits on four benchmark datasets for medical information extraction from scientific literature and clinical notes. All toolkits adopt a staged approach of mention detection followed by two stages of medical entity linking: (1) generating a list of candidate concepts, and (2) picking the best concept among them. We introduce a semantic type prediction module that alleviates the problem of overgeneration of candidate concepts by filtering out irrelevant candidates based on the predicted semantic type of a mention. We present MedType, a fully modular semantic type prediction model which we integrate into the existing NLP toolkits. To address the dearth of broad-coverage training data for medical information extraction, we further present WikiMed and PubMedDS, two large-scale datasets for medical entity linking. Results: Semantic type filtering improves medical entity linking performance across all toolkits and datasets, often by several percentage points of F1. Further, pretraining MedType on our novel datasets achieves state-of-the-art performance for semantic type prediction in biomedical text. Conclusions: Semantic type prediction is a key part of building accurate NLP pipelines for broad-coverage information extraction from biomedical text. We make our source code and novel datasets publicly available to foster reproducible research.
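
    The staged design described in the Methods can be pictured as the skeleton below, with the semantic type filter slotted between candidate generation and ranking. Every function is a stub standing in for an off-the-shelf toolkit component or for MedType itself.

        def detect_mentions(text):
            """Stage 1: return mention spans found in the text (stub)."""
            return []

        def generate_candidates(mention):
            """Stage 2a: return candidate concepts for a mention (stub)."""
            return []

        def filter_by_semantic_type(mention, text, candidates):
            """MedType-style pruning: drop candidates whose semantic type does not
            match the type predicted for the mention (stub)."""
            return candidates

        def rank_candidates(mention, candidates):
            """Stage 2b: pick the best remaining concept (stub)."""
            return candidates[0] if candidates else None

        def link_entities(text):
            """Run the full staged pipeline over one document."""
            links = []
            for mention in detect_mentions(text):
                candidates = generate_candidates(mention)
                candidates = filter_by_semantic_type(mention, text, candidates)
                links.append((mention, rank_candidates(mention, candidates)))
            return links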

    Transforming epilepsy research: A systematic review on natural language processing applications

    Despite improved ancillary investigations in epilepsy care, patients' narratives remain indispensable for diagnosis and treatment monitoring. This wealth of information is typically stored in electronic health records and accumulated in medical journals in an unstructured manner, thereby restricting its full utilization in clinical decision-making. To this end, clinical researchers increasingly apply natural language processing (NLP), a branch of artificial intelligence, as it removes ambiguity, derives context, and assigns standardized meaning to free-narrative clinical texts. This systematic review presents an overview of current NLP applications in epilepsy and discusses the opportunities and drawbacks of NLP alongside its future implications. We searched the PubMed and Embase databases with a “natural language processing” and “epilepsy” query (March 4, 2022) and included original research articles describing the application of NLP techniques for textual analysis in epilepsy. Twenty-six studies were included. Fifty-eight percent of these studies used NLP to classify clinical records into predefined categories, improving patient identification and treatment decisions. Other applications of NLP included structured clinical information retrieval from electronic health records, scientific papers, and online posts of patients. Challenges and opportunities of NLP applications for enhancing epilepsy care and research are discussed. The field could further benefit from NLP by replicating successes from other healthcare domains, such as NLP-aided quality evaluation for clinical decision-making, outcome prediction, and clinical record summarization.
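
    For readers unfamiliar with the record-classification use case that most included studies pursued, the sketch below shows a minimal supervised text classifier (TF-IDF features plus logistic regression in scikit-learn). The two notes and their labels are toy data, not drawn from any study in the review.

        from sklearn.feature_extraction.text import TfidfVectorizer
        from sklearn.linear_model import LogisticRegression
        from sklearn.pipeline import make_pipeline

        # Toy corpus: one epilepsy-related note and one unrelated note.
        notes = [
            "Patient reports recurrent tonic-clonic seizures despite medication.",
            "Routine follow-up for hypertension, no neurological complaints.",
        ]
        labels = ["epilepsy", "non-epilepsy"]

        clf = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
        clf.fit(notes, labels)
        print(clf.predict(["Seizure frequency has increased over the last month."]))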

    Methods and Techniques for Clinical Text Modeling and Analytics

    Nowadays, a large portion of clinical data exists only as free text. The wide adoption of Electronic Health Records (EHRs) has greatly increased access to clinical documents, which provides both challenges and opportunities for clinical Natural Language Processing (NLP) researchers. Given free-text clinical notes as input, an ideal system for clinical text understanding should be able to support clinical decisions. At the corpus level, the system should recommend similar notes based on disease or patient types, and provide medication recommendations, or any other type of recommendation, based on patients' symptoms and similar medical cases. At the document level, it should return a list of important clinical concepts. Moreover, the system should be able to make diagnostic inferences over clinical concepts and output a diagnosis. Unfortunately, such a system has not been studied systematically in prior work. This study focuses on developing and applying methods and techniques for different aspects of clinical text understanding, at both the corpus and the document level. We address two major research questions. First, how can we model the underlying relationships among clinical notes at the corpus level? Document clustering methods can group clinical notes into meaningful clusters, which can help physicians and patients understand the medical conditions and diseases described in clinical notes. We use Nonnegative Matrix Factorization (NMF) and multi-view NMF to cluster clinical notes based on extracted medical concepts. The clustering results reveal latent patterns among clinical notes, and our method provides a feasible way to visualize a corpus of clinical documents. Based on the extracted concepts, we further build a symptom-medication (Symp-Med) graph to model symptom-medication relations in the clinical notes corpus, and we develop two Symp-Med matching algorithms to predict and recommend medications for patients based on their symptoms. Second, how can we integrate structured knowledge with unstructured text to improve results on clinical NLP tasks? On the one hand, unstructured clinical text contains a wealth of information about medical conditions; on the other hand, structured Knowledge Bases (KBs) are frequently used to support clinical NLP tasks. We propose graph-regularized word embedding models to integrate knowledge from both KBs and free text. We evaluate our models on standard datasets and biomedical NLP tasks, and the results show encouraging improvements on both. We further apply the graph-regularized word embedding models in a novel approach that automatically infers the most probable diagnosis from a given clinical narrative. Ph.D., Information Studies -- Drexel University, 201
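
    A minimal sketch of the corpus-level clustering step is shown below: notes are vectorised with TF-IDF and factorised with NMF, and each note is assigned to its strongest component. The four toy notes are illustrative, and the thesis additionally uses concept-based features and multi-view NMF, which this sketch does not reproduce.

        from sklearn.feature_extraction.text import TfidfVectorizer
        from sklearn.decomposition import NMF

        # Toy clinical notes: two cardiac, two diabetes-related.
        notes = [
            "shortness of breath and chest pain, started on aspirin",
            "chest pain radiating to left arm, troponin elevated",
            "poorly controlled diabetes, metformin dose increased",
            "elevated HbA1c, insulin therapy initiated",
        ]

        tfidf = TfidfVectorizer(stop_words="english").fit_transform(notes)
        model = NMF(n_components=2, init="nndsvd", random_state=0)
        W = model.fit_transform(tfidf)   # note-to-component weights
        clusters = W.argmax(axis=1)      # assign each note to its strongest component
        print(list(clusters))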

    Semantic annotation techniques aimed at improving access to and interpretation of clinical information

    Today, health systems count among their priorities the prevention of disease, the increase of life expectancy, the improvement of quality of life, and the reduction of admissions to emergency services. Meeting these challenges requires adapting current information systems, since the fragmentation of patient information across different locations and formats greatly hinders its proper access and processing. Healthcare information systems must therefore be able, first, to exchange data among all the units that make them up and, second, to interpret the information contained in the data they exchange, both in the correct context and within a reasonable time. To this end, this doctoral thesis proposes semantically annotating the various collections of clinical information using the most appropriate terminologies. To test our hypothesis, we focus on two resources: the patient's electronic health record (EHR), which is now regarded as a key element in the efficient, high-quality delivery of healthcare services and provides access to patient information, and clinical practice guidelines, which are an important source of knowledge about evidence-based diagnostic and therapeutic recommendations. The thesis demonstrates the feasibility of developing automated techniques to semantically annotate, on the one hand, the clinical models that formalize EHRs and provide an architecture for EHR communication and data exchange and, on the other, the texts of textual clinical guidelines, and therefore the knowledge that clinicians will consult, with high precision and reliability.
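
    As a small illustration of the kind of annotation the thesis automates, the sketch below attaches a terminology binding to an element of a clinical model. The element name and the SNOMED CT code shown are assumptions for illustration, not output of the thesis tools.

        from dataclasses import dataclass, field

        @dataclass
        class ModelElement:
            name: str                                     # element of an EHR clinical model
            bindings: dict = field(default_factory=dict)  # terminology name -> code info

            def annotate(self, terminology: str, code: str, label: str) -> None:
                """Attach a terminology binding to this model element."""
                self.bindings[terminology] = {"code": code, "label": label}

        bp = ModelElement("systolic_blood_pressure")
        # SNOMED CT code shown for illustration.
        bp.annotate("SNOMED CT", "271649006", "Systolic blood pressure")
        print(bp)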