167 research outputs found
Extracting diagnostic knowledge from MedLine Plus: a comparison between MetaMap and cTAKES Approaches
The development of diagnostic decision support systems (DDSS) requires having a reliable and
consistent knowledge base about diseases and their symptoms, signs and diagnostic tests. Physicians are
typically the source of this knowledge, but it is not always possible to obtain all the desired information from
them. Other valuable sources are medical books and articles describing the diagnosis of diseases, but again, extracting this
information is a hard and time-consuming task. In this paper we present the results of our research, in which we have used
Web scraping, natural language processing techniques, a variety of publicly available sources of diagnostic knowledge
and two widely known medical concept identifiers, MetaMap and cTAKES, to extract diagnostic criteria for infectious
diseases from MedLine Plus articles. A performance comparison of MetaMap and cTAKES is also presented.
Clinical narrative analytics challenges
Precision medicine, or evidence-based medicine, relies on
the extraction of knowledge from medical records to provide individuals
with the appropriate treatment at the appropriate moment according to
the patient's features. Despite efforts to use clinical narratives for
clinical decision support, many challenges remain today,
such as multilinguality, diversity of terms and formats across services,
acronyms, and negation, to name but a few. The same problems arise
when analyzing narratives in the literature, whose analysis would
provide physicians and researchers with highlights. In this talk we will
discuss challenges, solutions and open problems, and will review several
frameworks and tools that perform NLP over free text to
extract medical entities by means of a Named Entity Recognition process.
We will also describe a framework we have developed to extract and validate
medical terms. In particular, we present two use cases: (i) extraction of
medical entities from a set of texts describing infectious diseases, provided
by MedlinePlus, and (ii) identification of stroke scales in clinical
narratives written in Spanish.
NOBLE - Flexible concept recognition for large-scale biomedical natural language processing
Background: Natural language processing (NLP) applications are increasingly important in biomedical data analysis, knowledge engineering, and decision support. Concept recognition is an important component task for NLP pipelines, and can be either general-purpose or domain-specific. We describe a novel, flexible, and general-purpose concept recognition component for NLP pipelines, and compare its speed and accuracy against five commonly used alternatives on both a biological and clinical corpus. NOBLE Coder implements a general algorithm for matching terms to concepts from an arbitrary vocabulary set. The system's matching options can be configured individually or in combination to yield specific system behavior for a variety of NLP tasks. The software is open source, freely available, and easily integrated into UIMA or GATE. We benchmarked speed and accuracy of the system against the CRAFT and ShARe corpora as reference standards and compared it to MMTx, MGrep, Concept Mapper, cTAKES Dictionary Lookup Annotator, and cTAKES Fast Dictionary Lookup Annotator.
Results: We describe key advantages of the NOBLE Coder system and associated tools, including its greedy algorithm, configurable matching strategies, and multiple terminology input formats. These features provide unique functionality when compared with existing alternatives, including state-of-the-art systems. On two benchmarking tasks, NOBLE's performance exceeded commonly used alternatives, performing almost as well as the most advanced systems. Error analysis revealed differences in error profiles among systems.
Conclusion: NOBLE Coder is comparable to other widely used concept recognition systems in terms of accuracy and speed. Advantages of NOBLE Coder include its interactive terminology builder tool, ease of configuration, and adaptability to various domains and tasks. NOBLE provides a term-to-concept matching system suitable for general concept recognition in biomedical NLP pipelines.
Disease Name Extraction from Clinical Text Using Conditional Random Fields
The aim of the research done in this thesis was to extract disease and disorder names from clinical texts. We utilized Conditional Random Fields (CRF) as the main method to label diseases and disorders in clinical sentences. We used other tools, such as MetaMap and the Stanford Core NLP tool, to extract some crucial features. The MetaMap tool was used to identify names of diseases/disorders that are already in the UMLS Metathesaurus. Other important features, such as lemmatized versions of words and POS tags, were extracted using the Stanford Core NLP tool. Further features were extracted directly from the UMLS Metathesaurus, including semantic types of words. We participated in the SemEval 2014 competition's Task 7 and used its provided data to train and evaluate our system. The training data contained 199 clinical texts, the development data contained 99 clinical texts, and the test data contained 133 clinical texts; these included discharge summaries and echocardiogram, radiology, and ECG reports. We obtained competitive results on the disease/disorder name extraction task. We found through an ablation study that while all features contributed, MetaMap matches, POS tags, and previous and next words were the most effective features.
The Impact of Automatic Pre-annotation in Clinical Note Data Element Extraction - the CLEAN Tool
Objective. Annotation is expensive but essential for clinical note review and
clinical natural language processing (cNLP). However, the extent to which
computer-generated pre-annotation is beneficial to human annotation is still an
open question. Our study introduces CLEAN (CLinical note rEview and
ANnotation), a pre-annotation-based cNLP annotation system to improve clinical
note annotation of data elements, and comprehensively compares CLEAN with the
widely-used annotation system Brat Rapid Annotation Tool (BRAT).
Materials and Methods. CLEAN includes an ensemble pipeline (CLEAN-EP) with a
newly developed annotation tool (CLEAN-AT). A domain expert and a novice
user/annotator participated in a comparative usability test by tagging 87 data
elements related to Congestive Heart Failure (CHF) and Kawasaki Disease (KD)
cohorts in 84 public notes.
Results. CLEAN achieved a higher note-level F1-score (0.896) than BRAT (0.820),
with a significant difference in correctness (P-value < 0.001), the most
related factor being system/software (P-value < 0.001). No significant
difference (P-value 0.188) in annotation time was observed between CLEAN (7.262
minutes/note) and BRAT (8.286 minutes/note). The difference was mostly
associated with note length (P-value < 0.001) and system/software (P-value
0.013). The expert reported CLEAN to be useful/satisfactory, while the novice
reported slight improvements.
Discussion. CLEAN improves the correctness of annotation and increases
usefulness/satisfaction with the same level of efficiency. Limitations include
the untested impact of the pre-annotation correctness rate, small sample size,
a small number of users, and a gold standard with limited validation.
Conclusion. CLEAN with pre-annotation can be beneficial for an expert to deal
with complex annotation tasks involving numerous and diverse target data
elements.
Improving Broad-Coverage Medical Entity Linking with Semantic Type Prediction and Large-Scale Datasets
Medical entity linking is the task of identifying and standardizing medical
concepts referred to in an unstructured text. Most of the existing methods
adopt a three-step approach of (1) detecting mentions, (2) generating a list of
candidate concepts, and finally (3) picking the best concept among them. In
this paper, we probe into alleviating the problem of overgeneration of
candidate concepts in the candidate generation module, the most under-studied
component of medical entity linking. For this, we present MedType, a fully
modular system that prunes out irrelevant candidate concepts based on the
predicted semantic type of an entity mention. We incorporate MedType into five
off-the-shelf toolkits for medical entity linking and demonstrate that it
consistently improves entity linking performance across several benchmark
datasets. To address the dearth of annotated training data for medical entity
linking, we present WikiMed and PubMedDS, two large-scale medical entity
linking datasets, and demonstrate that pre-training MedType on these datasets
further improves entity linking performance. We make our source code and
datasets publicly available for medical entity linking research.