23 research outputs found

Recommended from our members

Negation’s Not Solved: Generalizability Versus Optimizability in Clinical Natural Language Processing

Authors: Carrell David; Clark Cheryl; Coarr Matt; Halgrim Scott; Masanz James; Miller Timothy; Wu Stephen
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/01/2014

A review of published work in clinical natural language processing (NLP) may suggest that the negation detection task has been “solved.” This work proposes that an optimizable solution does not equal a generalizable solution. We introduce a new machine learning-based Polarity Module for detecting negation in clinical text, and extensively compare its performance across domains. Using four manually annotated corpora of clinical text, we show that negation detection performance suffers when there is no in-domain development (for manual methods) or training data (for machine learning-based methods). Various factors (e.g., annotation guidelines, named entity characteristics, the amount of data, and lexical and syntactic context) play a role in making generalizability difficult, but none completely explains the phenomenon. Furthermore, generalizability remains challenging because it is unclear whether to use a single source for accurate data, combine all sources into a single model, or apply domain adaptation methods. The most reliable means to improve negation detection is to manually annotate in-domain training data (or, perhaps, manually modify rules); this is a strategy for optimizing performance, rather than generalizing it. These results suggest a direction for future work in domain-adaptive and task-adaptive methods for clinical NLP.

Harvard University - DASH

Directory of Open Access Journals

PubMed Central

A common type system for clinical natural language processing

Authors: Becker Lee; Chapman Wendy W; Chen Pei; Chute Christopher G; Dligach Dmitriy; Kaggal Vinod C; Liu Hongfang; Masanz James J; Savova Guergana Kirilova; Wu Stephen T
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 03/01/2013

Background: One challenge in reusing clinical data stored in electronic medical records is that these data are heterogeneous. Clinical Natural Language Processing (NLP) plays an important role in transforming information in clinical text to a standard representation that is comparable and interoperable. Information may be processed and shared when a type system specifies the allowable data structures. Therefore, we aim to define a common type system for clinical NLP that enables interoperability between structured and unstructured data generated in different clinical settings. Results: We describe a common type system for clinical NLP that has an end target of deep semantics based on Clinical Element Models (CEMs), thus interoperating with structured data and accommodating diverse NLP approaches. The type system has been implemented in UIMA (Unstructured Information Management Architecture) and is fully functional in a popular open-source clinical NLP system, cTAKES (clinical Text Analysis and Knowledge Extraction System) versions 2.0 and later. Conclusions: We have created a type system that targets deep semantics, thereby allowing NLP systems to encapsulate knowledge from text and share it alongside heterogeneous clinical data sources. Rather than surface semantics that are typically the end product of NLP algorithms, CEM-based semantics explicitly build in deep clinical semantics as the point of interoperability with more structured data types.

Harvard University - DASH

Springer - Publisher Connector

University of Melbourne Institutional Repository