12 research outputs found
Extracting diagnostic knowledge from MedlinePlus: a comparison between MetaMap and cTAKES Approaches
The development of diagnostic decision support systems (DDSS) requires a reliable and consistent knowledge base about diseases and their symptoms, signs, and diagnostic tests. Physicians are typically the source of this knowledge, but it is not always possible to obtain all the desired information from them. Medical books and articles describing the diagnosis of diseases are other valuable sources, but extracting this information from them is likewise a hard and time-consuming task. In this paper we present the results of our research, in which we used web scraping, natural language processing techniques, a variety of publicly available sources of diagnostic knowledge, and two widely known medical concept identifiers, MetaMap and cTAKES, to extract diagnostic criteria for infectious diseases from MedlinePlus articles. A performance comparison of MetaMap and cTAKES is also presented.
Disease Name Extraction from Clinical Text Using Conditional Random Fields
The aim of the research done in this thesis was to extract disease and disorder names from clinical texts. We utilized Conditional Random Fields (CRF) as the main method to label diseases and disorders in clinical sentences. We used other tools, such as MetaMap and the Stanford CoreNLP tool, to extract crucial features. The MetaMap tool was used to identify names of diseases/disorders that are already in the UMLS Metathesaurus. Other important features, such as lemmatized versions of words and POS tags, were extracted using the Stanford CoreNLP tool. Further features, including the semantic types of words, were extracted directly from the UMLS Metathesaurus. We participated in the SemEval 2014 competition's Task 7 and used its provided data to train and evaluate our system. The training data contained 199 clinical texts, the development data contained 99 clinical texts, and the test data contained 133 clinical texts; these included discharge summaries and echocardiogram, radiology, and ECG reports. We obtained competitive results on the disease/disorder name extraction task. We found through an ablation study that while all features contributed, MetaMap matches, POS tags, and the previous and next words were the most effective features.
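The per-token feature set described above can be sketched as follows. This is a minimal illustration, not the thesis's actual implementation: the function name, data layout, and the toy sentence are made up, and the MetaMap/POS inputs are assumed to have been produced by the external tools the abstract names.

```python
def token_features(tokens, pos_tags, metamap_hits, i):
    """Build a CRF feature dict for the token at position i.

    tokens       -- word strings in the sentence
    pos_tags     -- one POS tag per token (e.g. from Stanford CoreNLP)
    metamap_hits -- set of token indices MetaMap matched to a UMLS
                    disease/disorder concept
    """
    return {
        "word": tokens[i].lower(),
        "pos": pos_tags[i],
        # per the ablation study, the MetaMap match was among the
        # most effective features
        "metamap_match": i in metamap_hits,
        "prev_word": tokens[i - 1].lower() if i > 0 else "<BOS>",
        "next_word": tokens[i + 1].lower() if i < len(tokens) - 1 else "<EOS>",
    }

# Toy example: suppose MetaMap flagged "chest pain" (indices 2 and 3).
sent = ["Patient", "denies", "chest", "pain", "."]
pos = ["NN", "VBZ", "NN", "NN", "."]
hits = {2, 3}
print(token_features(sent, pos, hits, 2))
```

A CRF trainer would then consume one such feature dict per token, with BIO-style disease/disorder labels as the target sequence.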
Improving Broad-Coverage Medical Entity Linking with Semantic Type Prediction and Large-Scale Datasets
Medical entity linking is the task of identifying and standardizing medical
concepts referred to in an unstructured text. Most of the existing methods
adopt a three-step approach of (1) detecting mentions, (2) generating a list of
candidate concepts, and finally (3) picking the best concept among them. In
this paper, we probe into alleviating the problem of overgeneration of
candidate concepts in the candidate generation module, the most under-studied
component of medical entity linking. For this, we present MedType, a fully
modular system that prunes out irrelevant candidate concepts based on the
predicted semantic type of an entity mention. We incorporate MedType into five
off-the-shelf toolkits for medical entity linking and demonstrate that it
consistently improves entity linking performance across several benchmark
datasets. To address the dearth of annotated training data for medical entity
linking, we present WikiMed and PubMedDS, two large-scale medical entity
linking datasets, and demonstrate that pre-training MedType on these datasets
further improves entity linking performance. We make our source code and
datasets publicly available for medical entity linking research.
A semantic-driven framework for IT support of clinical laboratory standards
The clinical laboratory plays a critical role in the delivery of care within the healthcare system by providing services that support accurate and timely diagnosis of diseases. The clinical laboratory relies on standard operating procedures (SOP) to provide information and guidance on the laboratory procedures. To ensure an excellent standard of clinical laboratory services, SOPs need to be of high quality, and practitioners need to have easy access to information contained within the SOPs. However, we argue in this thesis that there is a lack of standardization within clinical laboratory SOPs, and machines and human practitioners have difficulties accessing or using the content of SOPs.
This thesis proposes a solution to challenges regarding the representation and use of SOPs in clinical laboratories (see Chapter 1). The research work in this thesis is based on the most up-to-date technological, theoretical, and empirical approaches (see Chapter 2). Additionally, external researchers have already utilized the outcome of this research for various purposes (see Chapter 5). In this thesis, we present the SmartSOP framework, a semantic-driven framework that supports the representation of clinical laboratory procedure concepts in a standardised format for use within software applications. The SmartSOP framework consists of three main components: the Ontology for Clinical Laboratory SOP (OCL-SOP), a translation engine that converts free-text SOPs into a standardised format, and a mobile application that provides lab practitioners with easy access to SOPs (see Chapters 3 and 4). We used the design science approach for the execution of this research work.
Improving broad-coverage medical entity linking with semantic type prediction and large-scale datasets
Objectives
Biomedical natural language processing tools are increasingly being applied for broad-coverage information extraction—extracting medical information of all types in a scientific document or a clinical note. In such broad-coverage settings, linking mentions of medical concepts to standardized vocabularies requires choosing the best candidate concepts from large inventories covering dozens of types. This study presents a novel semantic type prediction module for biomedical NLP pipelines and two automatically-constructed, large-scale datasets with broad coverage of semantic types.
Methods
We experiment with five off-the-shelf biomedical NLP toolkits on four benchmark datasets for medical information extraction from scientific literature and clinical notes. All toolkits adopt a staged approach of mention detection followed by two stages of medical entity linking: (1) generating a list of candidate concepts, and (2) picking the best concept among them. We introduce a semantic type prediction module to alleviate the problem of overgeneration of candidate concepts by filtering out irrelevant candidate concepts based on the predicted semantic type of a mention. We present MedType, a fully modular semantic type prediction model which we integrate into the existing NLP toolkits. To address the dearth of broad-coverage training data for medical information extraction, we further present WikiMed and PubMedDS, two large-scale datasets for medical entity linking.
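The semantic-type filtering step described in the Methods can be sketched as below. This is an illustrative simplification, not MedType's actual code: the function name and the toy candidate records (CUIs, names, coarse type labels) are assumptions for the example, and the predicted type would in practice come from the MedType model.

```python
def filter_candidates(candidates, predicted_type):
    """Prune candidate concepts whose semantic type disagrees with the
    type predicted for the entity mention, before disambiguation."""
    return [c for c in candidates if c["semantic_type"] == predicted_type]

# Toy candidate list for the mention "diabetes" (values are illustrative).
candidates = [
    {"cui": "C0011849", "name": "Diabetes Mellitus", "semantic_type": "Disease"},
    {"cui": "C0032821", "name": "Potassium", "semantic_type": "Chemical"},
    {"cui": "C0011854", "name": "Type 1 Diabetes", "semantic_type": "Disease"},
]

# Suppose the type predictor outputs "Disease" for this mention:
pruned = filter_candidates(candidates, "Disease")
print([c["cui"] for c in pruned])  # prints ['C0011849', 'C0011854']
```

The disambiguation stage then only has to rank the surviving candidates, which is how the filter alleviates candidate overgeneration.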
Results
Semantic type filtering improves medical entity linking performance across all toolkits and datasets, often by several percentage points of F-1. Further, pretraining MedType on our novel datasets achieves state-of-the-art performance for semantic type prediction in biomedical text.
Conclusions
Semantic type prediction is a key part of building accurate NLP pipelines for broad-coverage information extraction from biomedical text. We make our source code and novel datasets publicly available to foster reproducible research.
Transforming epilepsy research: A systematic review on natural language processing applications
Despite improved ancillary investigations in epilepsy care, patients' narratives remain indispensable for diagnosis and treatment monitoring. This wealth of information is typically stored in electronic health records and accumulated in medical journals in an unstructured manner, thereby restricting its complete utilization in clinical decision-making. To this end, clinical researchers increasingly apply natural language processing (NLP)—a branch of artificial intelligence—as it removes ambiguity, derives context, and imbues standardized meaning from free-narrative clinical texts. This systematic review presents an overview of the current NLP applications in epilepsy and discusses the opportunities and drawbacks of NLP alongside its future implications. We searched the PubMed and Embase databases with a “natural language processing” and “epilepsy” query (March 4, 2022) and included original research articles describing the application of NLP techniques for textual analysis in epilepsy. Twenty-six studies were included. Fifty-eight percent of these studies used NLP to classify clinical records into predefined categories, improving patient identification and treatment decisions. Other applications of NLP included the structured retrieval of clinical information from electronic health records, scientific papers, and online posts by patients. Challenges and opportunities of NLP applications for enhancing epilepsy care and research are discussed. The field could further benefit from NLP by replicating successes in other health care domains, such as NLP-aided quality evaluation for clinical decision-making, outcome prediction, and clinical record summarization.
Methods and Techniques for Clinical Text Modeling and Analytics
Nowadays, a large portion of clinical data exists only in free text. The wide adoption of Electronic Health Records (EHRs) has enabled increased access to clinical documents, which presents both challenges and opportunities for clinical Natural Language Processing (NLP) researchers. Given free-text clinical notes as input, an ideal system for clinical text understanding should have the ability to support clinical decisions. At the corpus level, the system should recommend similar notes based on disease or patient types, and provide medication recommendations, or any other type of recommendation, based on patients' symptoms and similar medical cases. At the document level, it should return a list of important clinical concepts. Moreover, the system should be able to make diagnostic inferences over clinical concepts and output a diagnosis. Unfortunately, current work has not systematically studied such a system. This study focuses on developing and applying methods and techniques for different aspects of the system for clinical text understanding, at both the corpus and document level. We deal with two major research questions. First, we explore the question of how to model the underlying relationships in clinical notes at the corpus level. Document clustering methods can group clinical notes into meaningful clusters, which can assist physicians and patients in understanding medical conditions and diseases from clinical notes. We use Nonnegative Matrix Factorization (NMF) and Multi-view NMF to cluster clinical notes based on extracted medical concepts. The clustering results display latent patterns existing among clinical notes. Our method provides a feasible way to visualize a corpus of clinical documents. Based on extracted concepts, we further build a symptom-medication (Symp-Med) graph to model the Symp-Med relations in a clinical notes corpus. We develop two Symp-Med matching algorithms to predict and recommend medications for patients based on their symptoms.
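The NMF-based clustering of notes over extracted concepts can be sketched as follows. This is a minimal single-view sketch (the Multi-view NMF variant is not shown), using the classic multiplicative-update rules rather than the thesis's actual code; the tiny note-by-concept count matrix is fabricated for illustration.

```python
import numpy as np

def nmf(V, k, iters=200, eps=1e-9):
    """Factor V ~= W @ H with nonnegative factors via Lee-Seung
    multiplicative updates (Frobenius objective)."""
    rng = np.random.default_rng(0)
    n, m = V.shape
    W = rng.random((n, k)) + 0.1
    H = rng.random((k, m)) + 0.1
    for _ in range(iters):
        H *= (W.T @ V) / (W.T @ W @ H + eps)  # update topic-concept loadings
        W *= (V @ H.T) / (W @ H @ H.T + eps)  # update note-topic weights
    return W, H

# Rows: clinical notes; columns: extracted medical concepts (toy counts,
# e.g. notes 0-1 mention cardiac concepts, notes 2-3 respiratory ones).
V = np.array([
    [3, 2, 0, 0],
    [2, 3, 0, 0],
    [0, 0, 4, 1],
    [0, 0, 1, 3],
], dtype=float)

W, _ = nmf(V, k=2)
clusters = W.argmax(axis=1)  # each note's dominant latent topic
print(clusters)
```

Each row of W gives a note's weights over the latent topics, so taking the argmax yields a soft-to-hard cluster assignment that groups the cardiac notes apart from the respiratory ones.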
Second, we address the question of how to integrate structured knowledge with unstructured text to improve results on clinical NLP tasks. On the one hand, unstructured clinical text contains a great deal of information about medical conditions. On the other hand, structured Knowledge Bases (KBs) are frequently used to support clinical NLP tasks. We propose graph-regularized word embedding models to integrate knowledge from both KBs and free text. We evaluate our models on standard datasets and biomedical NLP tasks, and the results show encouraging improvements on both datasets. We further apply the graph-regularized word embedding models and present a novel approach to automatically infer the most probable diagnosis from a given clinical narrative.
Ph.D., Information Studies -- Drexel University, 201
Semantic annotation techniques aimed at improving the access to and interpretation of clinical information
Today, health systems count among their priorities the prevention of disease, the increase of life expectancy, the improvement of quality of life, and the reduction of admissions to emergency services. To meet these challenges, current information systems must be adapted, since the fragmentation of patient information across different locations and formats greatly hinders its proper access and processing. Healthcare information systems must therefore be capable, first, of exchanging data among all the units that compose them and, second, of interpreting the information present in the data they exchange, both in the correct context and within a reasonable time. To that end, this doctoral thesis proposes semantically annotating the various collections of clinical information using the most appropriate terminologies. To demonstrate our hypothesis, we focus on two resources: the patient's electronic health record (EHR), which today is considered a key element in the efficient, high-quality delivery of healthcare services and provides access to patient information, and clinical practice guidelines, which constitute an important source of knowledge on evidence-based diagnostic and therapeutic recommendations. The thesis demonstrates that it is feasible to develop automated techniques for semantically annotating, on the one hand, the clinical models that formalize EHRs and provide an architecture for the communication and exchange of EHR data and, on the other hand, the texts that make up textual clinical guidelines, and therefore the knowledge that clinicians will consult, with high precision and reliability.