1,474 research outputs found

    The Benefits of Word Embeddings Features for Active Learning in Clinical Information Extraction

    Get PDF
    This study investigates the use of unsupervised word embeddings and sequence features for sample representation in an active learning framework built to extract clinical concepts from clinical free text. The objective is to further reduce the manual annotation effort while achieving higher effectiveness compared to a set of baseline features. Unsupervised features are derived from skip-gram word embeddings and a sequence representation approach. The comparative performance of unsupervised features and baseline hand-crafted features in an active learning framework are investigated using a wide range of selection criteria including least confidence, information diversity, information density and diversity, and domain knowledge informativeness. Two clinical datasets are used for evaluation: the i2b2/VA 2010 NLP challenge and the ShARe/CLEF 2013 eHealth Evaluation Lab. Our results demonstrate significant improvements in terms of effectiveness as well as annotation effort savings across both datasets. Using unsupervised features along with baseline features for sample representation lead to further savings of up to 9% and 10% of the token and concept annotation rates, respectively

    A recurrent neural network architecture for biomedical event trigger classification

    Get PDF
    A “biomedical event” is a broad term used to describe the roles and interactions between entities (such as proteins, genes and cells) in a biological system. The task of biomedical event extraction aims at identifying and extracting these events from unstructured texts. An important component in the early stage of the task is biomedical trigger classification which involves identifying and classifying words/phrases that indicate an event. In this thesis, we present our work on biomedical trigger classification developed using the multi-level event extraction dataset. We restrict the scope of our classification to 19 biomedical event types grouped under four broad categories - Anatomical, Molecular, General and Planned. While most of the existing approaches are based on traditional machine learning algorithms which require extensive feature engineering, our model relies on neural networks to implicitly learn important features directly from the text. We use natural language processing techniques to transform the text into vectorized inputs that can be used in a neural network architecture. As per our knowledge, this is the first time neural attention strategies are being explored in the area of biomedical trigger classification. Our best results were obtained from an ensemble of 50 models which produced a micro F-score of 79.82%, an improvement of 1.3% over the previous best score

    Automated Coding of Under-Studied Medical Concept Domains: Linking Physical Activity Reports to the International Classification of Functioning, Disability, and Health

    Get PDF
    Linking clinical narratives to standardized vocabularies and coding systems is a key component of unlocking the information in medical text for analysis. However, many domains of medical concepts lack well-developed terminologies that can support effective coding of medical text. We present a framework for developing natural language processing (NLP) technologies for automated coding of under-studied types of medical information, and demonstrate its applicability via a case study on physical mobility function. Mobility is a component of many health measures, from post-acute care and surgical outcomes to chronic frailty and disability, and is coded in the International Classification of Functioning, Disability, and Health (ICF). However, mobility and other types of functional activity remain under-studied in medical informatics, and neither the ICF nor commonly-used medical terminologies capture functional status terminology in practice. We investigated two data-driven paradigms, classification and candidate selection, to link narrative observations of mobility to standardized ICF codes, using a dataset of clinical narratives from physical therapy encounters. Recent advances in language modeling and word embedding were used as features for established machine learning models and a novel deep learning approach, achieving a macro F-1 score of 84% on linking mobility activity reports to ICF codes. Both classification and candidate selection approaches present distinct strengths for automated coding in under-studied domains, and we highlight that the combination of (i) a small annotated data set; (ii) expert definitions of codes of interest; and (iii) a representative text corpus is sufficient to produce high-performing automated coding systems. This study has implications for the ongoing growth of NLP tools for a variety of specialized applications in clinical care and research.Comment: Updated final version, published in Frontiers in Digital Health, https://doi.org/10.3389/fdgth.2021.620828. 34 pages (23 text + 11 references); 9 figures, 2 table
    • …
    corecore