Automated Coding of Under-Studied Medical Concept Domains: Linking
Physical Activity Reports to the International Classification of Functioning,
Disability, and Health
Linking clinical narratives to standardized vocabularies and coding systems
is a key component of unlocking the information in medical text for analysis.
However, many domains of medical concepts lack well-developed terminologies
that can support effective coding of medical text. We present a framework for
developing natural language processing (NLP) technologies for automated coding
of under-studied types of medical information, and demonstrate its
applicability via a case study on physical mobility function. Mobility is a
component of many health measures, from post-acute care and surgical outcomes
to chronic frailty and disability, and is coded in the International
Classification of Functioning, Disability, and Health (ICF). However, mobility
and other types of functional activity remain under-studied in medical
informatics, and neither the ICF nor commonly-used medical terminologies
capture functional status terminology in practice. We investigated two
data-driven paradigms, classification and candidate selection, to link
narrative observations of mobility to standardized ICF codes, using a dataset
of clinical narratives from physical therapy encounters. Recent advances in
language modeling and word embedding were used as features for established
machine learning models and a novel deep learning approach, achieving a macro
F-1 score of 84% on linking mobility activity reports to ICF codes. Both
classification and candidate selection approaches present distinct strengths
for automated coding in under-studied domains, and we highlight that the
combination of (i) a small annotated data set; (ii) expert definitions of codes
of interest; and (iii) a representative text corpus is sufficient to produce
high-performing automated coding systems. This study has implications for the
ongoing growth of NLP tools for a variety of specialized applications in
clinical care and research.Comment: Updated final version, published in Frontiers in Digital Health,
https://doi.org/10.3389/fdgth.2021.620828. 34 pages (23 text + 11
references); 9 figures, 2 table