
    ContextD: An algorithm to identify contextual properties of medical terms in a Dutch clinical corpus

    Background: In order to extract meaningful information from electronic medical records, such as signs and symptoms, diagnoses, and treatments, it is important to take into account the contextual properties of the identified information: negation, temporality, and experiencer. Most work on automatic identification of these contextual properties has been done on English clinical text. This study presents ContextD, an adaptation of the English ConText algorithm to the Dutch language, and a Dutch clinical corpus. Results: The ContextD algorithm used 41 unique triggers to identify the contextual properties in the clinical corpus. For the negation property, the algorithm obtained F-scores from 87% to 93% across the different document types. For the experiencer property, the F-score was 99% to 100%. For the historical and hypothetical values of the temporality property, F-scores ranged from 26% to 54% and from 13% to 44%, respectively. Conclusions: ContextD showed good performance in identifying negation and experiencer property values across all Dutch clinical document types. Accurate identification of the temporality property proved to be difficult and requires further work. The anonymized and annotated Dutch clinical corpus can serve as a useful resource for further algorithm development.
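    The ConText-style trigger approach can be sketched roughly as follows. This is a minimal illustration, not ContextD itself: the Dutch trigger words and the fixed five-token scope window are assumptions for the example, whereas the actual algorithm uses 41 triggers and termination rules.

```python
# Minimal sketch of a ConText-style negation check (illustrative only).
# Triggers and the fixed scope window are assumed, not ContextD's real rules.
NEGATION_TRIGGERS = {"geen", "niet", "zonder"}  # assumed Dutch examples
SCOPE_WINDOW = 5  # tokens after a trigger; an assumption for illustration

def detect_negated(tokens, concept_indices):
    """Return the concept token indices that fall inside a negation scope."""
    negated = set()
    for i, tok in enumerate(tokens):
        if tok.lower() in NEGATION_TRIGGERS:
            # Scope extends a fixed window of tokens past the trigger.
            scope = range(i + 1, min(i + 1 + SCOPE_WINDOW, len(tokens)))
            negated.update(idx for idx in concept_indices if idx in scope)
    return negated

tokens = "patient heeft geen koorts".split()
print(detect_negated(tokens, {3}))  # → {3}: "koorts" (fever) is negated
```

    A real implementation would also handle pseudo-triggers and scope-terminating conjunctions, which this sketch omits.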

    Improving sensitivity of machine learning methods for automated case identification from free-text electronic medical records

    Background: Distinguishing cases from non-cases in free-text electronic medical records is an important initial step in observational epidemiological studies, but manual record validation is time-consuming and cumbersome. We compared different approaches to developing an automatic case identification system with high sensitivity to assist manual annotators. Methods: We used four different machine-learning algorithms to build case identification systems for two data sets, one comprising hepatobiliary disease patients, the other acute renal failure patients. To improve the sensitivity of the systems, we varied the imbalance ratio between positive and negative cases using under- and over-sampling techniques, and applied cost-sensitive learning with various misclassification costs. Results: For the hepatobiliary data set, we obtained a high sensitivity of 0.95 (on a par with manual annotators, compared to 0.91 for a baseline classifier) with a specificity of 0.56. For the acute renal failure data set, sensitivity increased from 0.69 to 0.89, with a specificity of 0.59. Performance differences between the various machine-learning algorithms were not large. Classifiers performed best when trained on data sets with an imbalance ratio below 10. Conclusions: We were able to achieve high sensitivity with moderate specificity for automatic case identification on two data sets of electronic medical records. Such a highly sensitive case identification system can be used as a pre-filter to significantly reduce the burden of manual record validation.
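    The imbalance-ratio adjustment described above can be illustrated with a simple random-undersampling sketch. The function name, the 10:1 cap (motivated by the finding that ratios below 10 worked best), and the toy data are assumptions for illustration; the study also used over-sampling and cost-sensitive learning, which are not shown here.

```python
import random

def undersample(records, labels, max_ratio=10, seed=0):
    """Randomly drop majority-class (negative) records so that the
    negative:positive imbalance ratio is at most max_ratio."""
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    keep_neg = rng.sample(neg, min(len(neg), max_ratio * len(pos)))
    keep = sorted(pos + keep_neg)
    return [records[i] for i in keep], [labels[i] for i in keep]

# Toy data: 20 cases among 1000 records (imbalance ratio 49:1).
records = [f"record {i}" for i in range(1000)]
labels = [1] * 20 + [0] * 980
Xb, yb = undersample(records, labels)  # ratio capped at 10:1
print(sum(yb), len(yb) - sum(yb))      # → 20 200
```

    All positives are kept, so sensitivity-oriented training sees every case while the majority class is thinned.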

    Text Mining to Support Knowledge Discovery from Electronic Health Records

    The use of electronic health records (EHRs) has grown rapidly in the last decade. EHRs are no longer used only for storing information for clinical purposes; their secondary use in healthcare research has increased rapidly as well. The data in EHRs are recorded in a structured manner as much as possible; however, many EHRs also contain large amounts of unstructured free-text. Both structured and unstructured clinical data present several challenges to researchers, since the data are not primarily collected for research purposes. Issues with structured data include missing data, noise, and inconsistency. Unstructured free-text is even more challenging to use, since it often has no fixed format and may vary from clinician to clinician and from database to database. Text and data mining techniques are increasingly being used to effectively and efficiently process large EHRs for research purposes.

    NEGATION TRIGGERS AND THEIR SCOPE

    Recent interest in negation has resulted in a variety of different annotation schemes for different application tasks, several vetted in shared task competitions. Current negation detection systems are trained and tested for a specific application task within a particular domain. A robust, general negation detection module that can be added to any text processing pipeline is still missing. In this work we propose a linguistically motivated trigger and scope approach for negation detection in general. The system, NEGATOR, introduces two baseline modules: a scope module to identify the syntactic scope of different negation triggers, and a variety of trigger lists evaluated for that purpose, ranging from minimal to extensive. The scope module consists of a set of specialized transformation rules that determine the scope of a negation trigger using dependency graphs from parser output. NEGATOR is evaluated on corpora from different genres with different annotation schemes to establish general usefulness and robustness. The NEGATOR system also participated in two shared task competitions that address specific issues related to negation. Both tasks presented an opportunity to demonstrate that the NEGATOR system can be easily adapted and extended to meet specific task requirements. The parallel, comparative evaluations suggest that NEGATOR is indeed a robust baseline system that is domain and task independent.
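    A dependency-based scope rule of the kind the abstract describes can be sketched as follows. This is a simplified stand-in, not NEGATOR's actual transformation rules: the single rule shown (scope = the trigger's head plus its other descendants) and the hand-built parse are assumptions for illustration.

```python
# Sketch of one trigger-and-scope rule over a dependency graph
# (a simplified illustration, not NEGATOR's actual rule set).
from collections import defaultdict

def scope_of_trigger(heads, trigger_idx):
    """Scope = the trigger's head word plus all of its descendants,
    excluding the trigger itself."""
    children = defaultdict(list)
    for i, h in enumerate(heads):
        if h >= 0:
            children[h].append(i)
    head = heads[trigger_idx]
    scope, stack = {head}, [head]
    while stack:  # depth-first walk over the head's subtree
        node = stack.pop()
        for child in children[node]:
            if child != trigger_idx:
                scope.add(child)
                stack.append(child)
    return sorted(scope)

# Hand-built parse of "the drug did not reduce the symptoms":
# heads[i] is the index of token i's parent; the root ("reduce") has head -1.
tokens = ["the", "drug", "did", "not", "reduce", "the", "symptoms"]
heads  = [1, 4, 4, 4, -1, 6, 4]
print([tokens[i] for i in scope_of_trigger(heads, 3)])
# → ['the', 'drug', 'did', 'reduce', 'the', 'symptoms']
```

    In practice the dependency graph would come from a parser, and different triggers (determiners, adverbs, verbs) would select different transformation rules.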