Automation of a problem list using natural language processing

Author: AR Aronson
AT McCray
C Friedman
CA Knirsch
CA Sneiderman
CD Manning
D Zingmond
DL Ranum
E Bayegan
E Chi
G Hripcsak
G Paterson
G Shadow
GF Cooper
H Bludau
H Goldberg
H Wasserman
H Xu
HJ Scherpbier
Institute of Medicine (U.S.)
International Organization for Standardization
J Nivre
J Starmer
J Zelingher
JC Reichert
JEF Friedl
JR Campbell
JS Elkins
JW Hales
K Heitmann
K Thompson
L Christensen
LL Weed
LT Kohn
LW Wright
M Fiszman
M Weeber
ML Muller
MS Donaldson
MS Tuttle
N Sager
NL Jain
P Haug
P Nadkerni
P Spyns
Peter J Haug
PF Brennan
PG Mutalik
PJ Haug
PL Elkin
Q Zou
RH Dolin
S Meystre
SB Koehler
SC Kleene
SJ Wang
SM Huff
Stephane Meystre
T Payne
TC Rindflesch
W Pratt
WW Chapman
Y Huang
Publication venue: BioMed Central
Publication date: 01/01/2005

BACKGROUND: The medical problem list is an important part of the electronic medical record under development at our institution. To serve its intended functions, the problem list has to be as accurate and timely as possible. However, the current problem list is usually incomplete and inaccurate, and is often not used at all. To address this issue, we are building an environment in which the problem list can be maintained easily and effectively.
METHODS: For this project, 80 medical problems were selected for their frequency in the clinical domain where the system will be evaluated (cardiovascular). We have developed an Automated Problem List system composed of two main components: a background application and a foreground application. The background application uses Natural Language Processing (NLP) to detect the 80 targeted problems in the many free-text electronic documents available in our electronic medical record, harvesting them as potential problem list entries. These proposed medical problems drive the foreground application, which is designed for managing the problem list; within it, the extracted problems are proposed to physicians for addition to the official problem list.
RESULTS: The set of 80 targeted medical problems selected for this project covered about 5% of all possible diagnoses coded in ICD-9-CM in our study population (cardiovascular adult inpatients), but about 64% of all instances of these coded diagnoses. The system contains algorithms that detect, in turn, document sections, then sentences within these sections, and finally potential problems within the sentences. The initial evaluation of the section and sentence detection algorithms showed a sensitivity and positive predictive value of 100% for section detection, and a sensitivity of 89% and a positive predictive value of 94% for sentence detection.
CONCLUSION: The overall aim of our project is to automate the process of creating and maintaining a problem list for hospitalized patients, and thereby help ensure the timeliness, accuracy, and completeness of this information.
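The three-stage detection described in the RESULTS (sections, then sentences, then problems) can be sketched as follows. This is a minimal, hypothetical illustration, not the authors' implementation: the abstract does not say how each stage works, so the all-caps heading rule (`SECTION_RE`), the punctuation-based sentence splitter (`SENTENCE_RE`), the three-entry `TARGET_PROBLEMS` dictionary, and every function name are assumptions made for the example. The `sensitivity_ppv` helper shows how sensitivity and positive predictive value are computed from true-positive, false-negative, and false-positive counts.

```python
import re
from dataclasses import dataclass

# Hypothetical stand-in for the 80 targeted problems: each canonical problem
# maps to a few surface forms. The real vocabulary is not given in the abstract.
TARGET_PROBLEMS = {
    "atrial fibrillation": ["atrial fibrillation", "afib"],
    "hypertension": ["hypertension", "htn", "high blood pressure"],
    "congestive heart failure": ["congestive heart failure", "chf"],
}

# Assumed heading convention: an all-caps line ending in a colon.
SECTION_RE = re.compile(r"^([A-Z][A-Z /]+):\s*(.*)$")
# Naive sentence boundary: ., ! or ? followed by whitespace and a capital letter.
SENTENCE_RE = re.compile(r"(?<=[.!?])\s+(?=[A-Z])")


@dataclass
class Candidate:
    problem: str   # canonical problem name proposed for the problem list
    section: str   # heading of the section it was found in
    sentence: str  # sentence that triggered the match


def detect_sections(text: str) -> list[tuple[str, str]]:
    """Stage 1 (hypothetical): split a note into (heading, body) pairs."""
    sections: list[tuple[str, str]] = []
    heading, body = "UNLABELED", []
    for raw in text.splitlines():
        line = raw.strip()
        match = SECTION_RE.match(line)
        if match:
            if body:
                sections.append((heading, " ".join(body)))
            heading = match.group(1)
            body = [match.group(2)] if match.group(2) else []
        elif line:
            body.append(line)
    if body:
        sections.append((heading, " ".join(body)))
    return sections


def detect_sentences(body: str) -> list[str]:
    """Stage 2 (hypothetical): split a section body into sentences."""
    return [s.strip() for s in SENTENCE_RE.split(body) if s.strip()]


def detect_problems(sentence: str) -> list[str]:
    """Stage 3 (hypothetical): targeted problems mentioned in a sentence; no negation handling."""
    lowered = sentence.lower()
    return [
        problem
        for problem, forms in TARGET_PROBLEMS.items()
        if any(re.search(rf"\b{re.escape(form)}\b", lowered) for form in forms)
    ]


def harvest(note: str) -> list[Candidate]:
    """Run the three stages and collect candidate problem list entries."""
    return [
        Candidate(problem, heading, sentence)
        for heading, body in detect_sections(note)
        for sentence in detect_sentences(body)
        for problem in detect_problems(sentence)
    ]


def sensitivity_ppv(tp: int, fn: int, fp: int) -> tuple[float, float]:
    """Sensitivity = TP / (TP + FN); positive predictive value = TP / (TP + FP)."""
    return tp / (tp + fn), tp / (tp + fp)


EXAMPLE_NOTE = """HISTORY OF PRESENT ILLNESS:
72-year-old man with HTN and new onset afib. He denies chest pain.
ASSESSMENT:
Congestive heart failure, compensated."""

for cand in harvest(EXAMPLE_NOTE):
    print(cand.problem, "|", cand.section)
# atrial fibrillation | HISTORY OF PRESENT ILLNESS
# hypertension | HISTORY OF PRESENT ILLNESS
# congestive heart failure | ASSESSMENT
```

This sketch performs no negation or context detection, so a sentence such as "No evidence of CHF." would still produce a congestive heart failure candidate; the naive heading rule would also mistake a line like "BP: 140/90" for a section heading.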

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Enhancing clinical concept extraction with distributional semantics

Author: Cohen Trevor
Gonzalez Graciela
Jonnalagadda Siddhartha
Wu Stephen
Publication venue: Elsevier Inc.
Publication date: 01/02/2012

Abstract

Extracting concepts (such as drugs, symptoms, and diagnoses) from clinical narratives is a basic enabling technology for unlocking the knowledge these narratives contain and for supporting more advanced reasoning applications such as diagnosis explanation, disease progression modeling, and intelligent analysis of treatment effectiveness. The recent release of annotated training sets of de-identified clinical narratives has contributed to the development and refinement of concept extraction methods. However, because annotation is labor-intensive, training data are necessarily limited in the concepts and concept patterns they cover, which limits the performance of supervised machine learning applications trained on them. This paper proposes an approach that reduces this limitation by combining supervised machine learning with empirical learning of semantic relatedness from the distribution of the relevant words in additional unannotated text.

The approach uses a sequential discriminative classifier (Conditional Random Fields) to extract mentions of medical problems, treatments, and tests from clinical narratives. It uses all Medline abstracts indexed with the publication type "clinical trials" to estimate the relatedness between words in the i2b2/VA training and testing corpora. In addition to traditional features such as dictionary matching, pattern matching, and part-of-speech tags, we also used as features the words that appear in contexts similar to that of the word in question, that is, words whose vector representations are similar as measured by the commonly used cosine metric, where the vectors are derived using methods of distributional semantics. To the best of our knowledge, this is the first effort to explore distributional semantics (semantics derived empirically from unannotated text, often using vector space models) for a sequence classification task such as concept extraction. For this reason, we first experimented with different sliding-window models and selected the parameters that led to the best performance in a preliminary sequence labeling task.

The evaluation of this approach on the i2b2/VA concept extraction corpus showed that incorporating features based on the distribution of words across a large unannotated corpus significantly aids concept extraction. Compared with a supervised-only baseline, the micro-averaged F-score for exact match increased from 80.3% to 82.3%, and the micro-averaged F-score for inexact match increased from 89.7% to 91.3%. These improvements are highly significant according to the bootstrap resampling method, and also when considered against the performance of other systems. Thus, distributional semantic features significantly improve the performance of concept extraction from clinical narratives by exploiting word distribution information obtained from unannotated data.
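The distributional features described in this abstract can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not specify the vector model beyond "sliding window models", so this example swaps in plain co-occurrence counts within a +/- `window` token window, uses a four-sentence hypothetical corpus in place of the Medline abstracts, and invents the function names (`cooccurrence_vectors`, `nearest_neighbors`, `token_features`). Each word's nearest neighbors by cosine similarity are emitted as extra per-token features, in the dictionary form that CRF toolkits accepting feature dictionaries (for example, sklearn-crfsuite) can consume.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy corpus standing in for the unannotated Medline
# "clinical trials" abstracts; each sentence is already tokenized.
CORPUS = [
    "patients with hypertension received lisinopril daily".split(),
    "patients with diabetes received metformin daily".split(),
    "patients with hypertension received amlodipine daily".split(),
    "patients with diabetes received insulin daily".split(),
]


def cooccurrence_vectors(sentences: list[list[str]], window: int = 2) -> dict[str, Counter]:
    """Build a context vector per word by counting neighbors within +/- `window` tokens."""
    vectors: dict[str, Counter] = defaultdict(Counter)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(u: Counter, v: Counter) -> float:
    """Cosine similarity between two sparse count vectors (missing terms count as 0)."""
    dot = sum(count * v[term] for term, count in u.items())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def nearest_neighbors(word: str, vectors: dict[str, Counter], k: int = 2) -> list[str]:
    """Return up to k words whose context vectors are most cosine-similar to `word`'s."""
    if word not in vectors:
        return []
    scored = [
        (cosine(vectors[word], vec), other)
        for other, vec in vectors.items()
        if other != word
    ]
    # Highest similarity first; ties broken alphabetically for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [other for score, other in scored[:k] if score > 0]


def token_features(tokens: list[str], i: int, vectors: dict[str, Counter], k: int = 2) -> dict:
    """Per-token feature dict: two simple surface features plus distributional neighbors."""
    word = tokens[i].lower()
    features = {"word": word, "is_title": tokens[i].istitle()}
    for rank, neighbor in enumerate(nearest_neighbors(word, vectors, k)):
        features[f"neighbor_{rank}"] = neighbor
    return features


vectors = cooccurrence_vectors(CORPUS, window=2)
print(nearest_neighbors("lisinopril", vectors, k=1))  # ['amlodipine']
print(token_features(["Hypertension", "is", "controlled"], 0, vectors, k=1))
```

Because "lisinopril" and "amlodipine" occur in identical contexts in the toy corpus, each is the other's nearest neighbor. One way to read the abstract's intuition is that a word rarely seen in the annotated data can still share neighbor features with related words that were annotated.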

Elsevier - Publisher Connector

PubMed Central