
    ContextD: An algorithm to identify contextual properties of medical terms in a Dutch clinical corpus

    Background: In order to extract meaningful information from electronic medical records, such as signs and symptoms, diagnoses, and treatments, it is important to take into account the contextual properties of the identified information: negation, temporality, and experiencer. Most work on automatic identification of these contextual properties has been done on English clinical text. This study presents ContextD, an adaptation of the English ConText algorithm to the Dutch language, and a Dutch clinical corpus. Results: The ContextD algorithm used 41 unique triggers to identify the contextual properties in the clinical corpus. For the negation property, the algorithm obtained F-scores from 87% to 93% across the different document types. For the experiencer property, the F-score was 99% to 100%. For the historical and hypothetical values of the temporality property, F-scores ranged from 26% to 54% and from 13% to 44%, respectively. Conclusions: ContextD showed good performance in identifying negation and experiencer property values across all Dutch clinical document types. Accurate identification of the temporality property proved to be difficult and requires further work. The anonymized and annotated Dutch clinical corpus can serve as a useful resource for further algorithm development.
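    The ConText-style trigger approach can be sketched roughly as follows. This is a minimal illustration, not ContextD itself: the Dutch trigger words and the fixed five-token scope window are assumptions for the example, whereas the actual algorithm uses 41 triggers and termination rules.

```python
# Minimal sketch of a ConText-style negation check (illustrative only).
# Triggers and the fixed scope window are assumed, not ContextD's real rules.
NEGATION_TRIGGERS = {"geen", "niet", "zonder"}  # assumed Dutch examples
SCOPE_WINDOW = 5  # tokens after a trigger; an assumption for illustration

def detect_negated(tokens, concept_indices):
    """Return the concept token indices that fall inside a negation scope."""
    negated = set()
    for i, tok in enumerate(tokens):
        if tok.lower() in NEGATION_TRIGGERS:
            # Scope extends a fixed window of tokens past the trigger.
            scope = range(i + 1, min(i + 1 + SCOPE_WINDOW, len(tokens)))
            negated.update(idx for idx in concept_indices if idx in scope)
    return negated

tokens = "patient heeft geen koorts".split()
print(detect_negated(tokens, {3}))  # → {3}: "koorts" (fever) is negated
```

    A real implementation would also handle pseudo-triggers and scope-terminating conjunctions, which this sketch omits.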

    Improving sensitivity of machine learning methods for automated case identification from free-text electronic medical records

    Background: Distinguishing cases from non-cases in free-text electronic medical records is an important initial step in observational epidemiological studies, but manual record validation is time-consuming and cumbersome. We compared different approaches to developing an automatic case identification system with high sensitivity to assist manual annotators. Methods: We used four different machine-learning algorithms to build case identification systems for two data sets, one comprising hepatobiliary disease patients, the other acute renal failure patients. To improve the sensitivity of the systems, we varied the imbalance ratio between positive and negative cases using under- and over-sampling techniques, and applied cost-sensitive learning with various misclassification costs. Results: For the hepatobiliary data set, we obtained a high sensitivity of 0.95 (on a par with manual annotators, compared to 0.91 for a baseline classifier) with a specificity of 0.56. For the acute renal failure data set, sensitivity increased from 0.69 to 0.89, with a specificity of 0.59. Performance differences between the various machine-learning algorithms were not large. Classifiers performed best when trained on data sets with an imbalance ratio below 10. Conclusions: We were able to achieve high sensitivity with moderate specificity for automatic case identification on two data sets of electronic medical records. Such a highly sensitive case identification system can be used as a pre-filter to significantly reduce the burden of manual record validation.
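    The imbalance-ratio adjustment described above can be illustrated with a simple random-undersampling sketch. The function name, the 10:1 cap (motivated by the finding that ratios below 10 worked best), and the toy data are assumptions for illustration; the study also used over-sampling and cost-sensitive learning, which are not shown here.

```python
import random

def undersample(records, labels, max_ratio=10, seed=0):
    """Randomly drop majority-class (negative) records so that the
    negative:positive imbalance ratio is at most max_ratio."""
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    keep_neg = rng.sample(neg, min(len(neg), max_ratio * len(pos)))
    keep = sorted(pos + keep_neg)
    return [records[i] for i in keep], [labels[i] for i in keep]

# Toy data: 20 cases among 1000 records (imbalance ratio 49:1).
records = [f"record {i}" for i in range(1000)]
labels = [1] * 20 + [0] * 980
Xb, yb = undersample(records, labels)  # ratio capped at 10:1
print(sum(yb), len(yb) - sum(yb))      # → 20 200
```

    All positives are kept, so sensitivity-oriented training sees every case while the majority class is thinned.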

    Text Mining to Support Knowledge Discovery from Electronic Health Records

    The use of electronic health records (EHRs) has grown rapidly in the last decade. EHRs are no longer used only for storing information for clinical purposes; their secondary use in healthcare research has increased rapidly as well. The data in EHRs are recorded in a structured manner as much as possible; however, many EHRs also contain large amounts of unstructured free-text. Both structured and unstructured clinical data present several challenges to researchers, since the data are not primarily collected for research purposes. Issues with structured data include missing data, noise, and inconsistency. Unstructured free-text is even more challenging to use, since it often has no fixed format and may vary from clinician to clinician and from database to database. Text and data mining techniques are increasingly being used to effectively and efficiently process large EHRs for research purposes.

    NEGATION TRIGGERS AND THEIR SCOPE

    Recent interest in negation has resulted in a variety of different annotation schemes for different application tasks, several vetted in shared task competitions. Current negation detection systems are trained and tested for a specific application task within a particular domain. A robust, general negation detection module that can be added to any text processing pipeline is still missing. In this work we propose a linguistically motivated trigger and scope approach for negation detection in general. The system, NEGATOR, introduces two baseline modules: a scope module to identify the syntactic scope of different negation triggers, and a variety of trigger lists evaluated for that purpose, ranging from minimal to extensive. The scope module consists of a set of specialized transformation rules that determine the scope of a negation trigger using dependency graphs from parser output. NEGATOR is evaluated on corpora from different genres with different annotation schemes to establish general usefulness and robustness. The NEGATOR system also participated in two shared task competitions that address specific issues related to negation. Both tasks presented an opportunity to demonstrate that the NEGATOR system can be easily adapted and extended to meet specific task requirements. The parallel, comparative evaluations suggest that NEGATOR is indeed a robust baseline system that is domain and task independent.
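    A dependency-based scope rule of the kind the abstract describes can be sketched as follows. This is a simplified stand-in, not NEGATOR's actual transformation rules: the single rule shown (scope = the trigger's head plus its other descendants) and the hand-built parse are assumptions for illustration.

```python
# Sketch of one trigger-and-scope rule over a dependency graph
# (a simplified illustration, not NEGATOR's actual rule set).
from collections import defaultdict

def scope_of_trigger(heads, trigger_idx):
    """Scope = the trigger's head word plus all of its descendants,
    excluding the trigger itself."""
    children = defaultdict(list)
    for i, h in enumerate(heads):
        if h >= 0:
            children[h].append(i)
    head = heads[trigger_idx]
    scope, stack = {head}, [head]
    while stack:  # depth-first walk over the head's subtree
        node = stack.pop()
        for child in children[node]:
            if child != trigger_idx:
                scope.add(child)
                stack.append(child)
    return sorted(scope)

# Hand-built parse of "the drug did not reduce the symptoms":
# heads[i] is the index of token i's parent; the root ("reduce") has head -1.
tokens = ["the", "drug", "did", "not", "reduce", "the", "symptoms"]
heads  = [1, 4, 4, 4, -1, 6, 4]
print([tokens[i] for i in scope_of_trigger(heads, 3)])
# → ['the', 'drug', 'did', 'reduce', 'the', 'symptoms']
```

    In practice the dependency graph would come from a parser, and different triggers (determiners, adverbs, verbs) would select different transformation rules.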