    Named Entity Recognition in Electronic Health Records Using Transfer Learning Bootstrapped Neural Networks

    Neural networks (NNs) have become the state of the art in many machine learning applications, especially in image and sound processing [1]. The same can be said, although to a lesser extent [2,3], of natural language processing (NLP) tasks such as named entity recognition. However, the success of NNs remains dependent on the availability of large labelled datasets, which is a significant hurdle in many important applications. One such case is electronic health records (EHRs), which are arguably the largest source of medical data, most of which lies hidden in natural text [4,5]. Data access is difficult due to data privacy concerns, and therefore annotated datasets are scarce. With scarce data, NNs will likely not be able to extract this hidden information with practical accuracy. In our study, we develop an approach that solves these problems for named entity recognition, obtaining a 94.6 F1 score in the I2B2 2009 Medical Extraction Challenge [6], 4.3 points above the architecture that won the competition. Beyond the official I2B2 challenge, we further achieve 82.4 F1 on extracting relationships between medical terms. To reach this state-of-the-art accuracy, our approach applies transfer learning to leverage datasets annotated for other I2B2 tasks, and designs and trains embeddings that specifically benefit from such transfer.
    Comment: 11 pages, 4 figures, 8 tables
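    The F1 figures above are entity-level scores. As a rough illustration of how such a score is computed, here is a minimal Python sketch assuming exact span-and-label matching; the official i2b2 2009 evaluation applies its own matching rules, and the tuple format and example annotations below are hypothetical.

```python
def entity_f1(gold, predicted):
    """Exact-match entity-level precision, recall and F1.

    gold, predicted: sets of (doc_id, start, end, label) tuples (hypothetical format).
    An entity counts as correct only if span and label both match exactly.
    """
    tp = len(gold & predicted)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Hypothetical annotations for one note: two of three predictions match the gold spans.
gold = {("note1", 0, 7, "medication"), ("note1", 12, 17, "dosage"), ("note1", 20, 29, "frequency")}
pred = {("note1", 0, 7, "medication"), ("note1", 12, 17, "dosage"), ("note1", 30, 35, "route")}
p, r, f = entity_f1(gold, pred)
print(f"P={p:.3f} R={r:.3f} F1={f:.3f}")
```

    Relaxed (overlap-based) matching would credit partially correct spans and generally yields higher scores than this strict variant.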

    Multitask prediction of organ dysfunction in the intensive care unit using sequential subnetwork routing.

    OBJECTIVE: Multitask learning (MTL) using electronic health records allows concurrent prediction of multiple endpoints. MTL has shown promise in improving model performance and training efficiency; however, it often suffers from negative transfer, that is, impaired learning when tasks are not appropriately selected. We introduce a sequential subnetwork routing (SeqSNR) architecture that uses soft parameter sharing to find related tasks and encourage cross-learning between them. MATERIALS AND METHODS: Using the MIMIC-III (Medical Information Mart for Intensive Care-III) dataset, we train deep neural network models to predict the onset of 6 endpoints, including specific organ dysfunctions and general clinical outcomes: acute kidney injury, continuous renal replacement therapy, mechanical ventilation, vasoactive medications, mortality, and length of stay. We compare single-task (ST) models with naive multitask and SeqSNR models in terms of discriminative performance and label efficiency. RESULTS: SeqSNR showed a modest yet statistically significant performance boost across 4 of 6 tasks compared with ST and naive multitasking. When the size of the training dataset was reduced for a given task (label efficiency), SeqSNR outperformed ST in all cases, with an average boost in area under the precision-recall curve of 2.1%, 2.9%, and 2.1% for tasks using 1%, 5%, and 10% of labels, respectively. CONCLUSIONS: The SeqSNR architecture shows superior label efficiency compared with ST and naive multitasking, suggesting utility in scenarios in which endpoint labels are difficult to ascertain.
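    The label-efficiency comparison rests on two ingredients: training on a random fraction of the available labels and scoring with the area under the precision-recall curve. A minimal Python sketch of both follows, assuming non-interpolated average precision as the AUPRC estimate and simple random subsampling; the function names and toy data are hypothetical, and this is not the SeqSNR authors' evaluation code.

```python
import random

def average_precision(labels, scores):
    """Non-interpolated average precision, a common estimate of AUPRC.

    labels: 0/1 ground truth; scores: model scores (higher = more likely positive).
    Tied scores keep their input order (a simplification).
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("average precision is undefined without positives")
    hits, total = 0, 0.0
    for k, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / k  # precision at the rank where each positive is recovered
    return total / n_pos

def subsample_labels(indices, fraction, seed=0):
    """Keep a random fraction of labelled training examples (at least one)."""
    rng = random.Random(seed)
    k = max(1, int(len(indices) * fraction))
    return rng.sample(indices, k)

# Toy example: positives ranked 1st, 3rd and 6th.
labels = [1, 0, 1, 0, 0, 1]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]
print(round(average_precision(labels, scores), 3))
print(len(subsample_labels(list(range(1000)), 0.05)))
```

    In practice a library routine such as scikit-learn's `average_precision_score` would normally be used instead of a hand-rolled version.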

    Maximizing the use of social and behavioural information from secondary care mental health electronic health records

    Purpose: The contribution of social and behavioural factors to the development of mental health conditions and to treatment effectiveness is widely supported, yet population-level data sources on social and behavioural determinants of mental health are weak. Filling these data gaps will be crucial to accelerating precision medicine. Some have suggested broader use of electronic health records (EHRs) as a source of non-clinical determinants, although social and behavioural information is not systematically collected in EHRs internationally. Objective: In this commentary, we highlight the nature and quality of key available structured and unstructured social and behavioural data, using a case example of value counts from secondary care mental health data available in the UK from the UK Clinical Record Interactive Search (CRIS) database; we also highlight the methodological challenges in using such data, and possible solutions and opportunities involving natural language processing (NLP) of unstructured EHR text. Conclusions: Most structured non-clinical data fields within secondary care mental health EHR data have too much missing data for adequate use. The utility of other non-clinical fields reported semi-consistently (e.g., ethnicity and marital status) depends entirely on treating them appropriately in analyses, which means quantifying the many reasons behind missingness with selection biases in mind. Advances in NLP offer new opportunities to exploit unstructured text from secondary care EHR data, particularly because clinical notes and attachments are available for large numbers of patients and are more routinely completed by clinicians.
    Tackling ways to re-use, harmonize, and improve our existing and future secondary care mental health data, leveraging advanced analytics such as NLP, is worth the effort to fill the data gap on social and behavioural contributors to mental health conditions, and will be necessary to cover all of the domains needed to inform personalized interventions.
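    The value-count and missingness summaries described above can be sketched in a few lines of Python. This is a minimal illustration only: the field names and missing-value codes below are hypothetical placeholders, not the actual CRIS schema.

```python
from collections import Counter

# Hypothetical codes treated as "missing"; a real extract would define its own.
MISSING = {None, "", "Not known", "Not stated"}

def missingness_report(records, fields):
    """Return the fraction of records with a missing value for each field.

    A field absent from a record is treated as missing (record.get returns None).
    """
    n = len(records)
    report = {}
    for field in fields:
        missing = sum(1 for record in records if record.get(field) in MISSING)
        report[field] = missing / n if n else float("nan")
    return report

def value_counts(records, field):
    """Count each recorded value of a field, including missing-value codes."""
    return Counter(record.get(field) for record in records)

# Hypothetical structured non-clinical fields for four patients.
records = [
    {"ethnicity": "White British", "marital_status": "Single", "employment": None},
    {"ethnicity": "Not stated", "marital_status": "Married", "employment": ""},
    {"ethnicity": "Black African", "marital_status": None, "employment": None},
    {"ethnicity": "White British", "marital_status": "Single", "employment": "Employed"},
]
print(missingness_report(records, ["ethnicity", "marital_status", "employment"]))
print(value_counts(records, "ethnicity"))
```

    Keeping the missing-value codes in the counts, rather than silently dropping them, is what makes the reasons behind missingness visible for later analysis.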

    Limitations of Transformers on Clinical Text Classification

    Bidirectional Encoder Representations from Transformers (BERT) and BERT-based approaches are the current state of the art in many natural language processing (NLP) tasks; however, their application to document classification on long clinical texts is limited. In this work, we introduce four methods to scale BERT, which by default can only handle input sequences up to approximately 400 words long, to perform document classification on clinical texts several thousand words long. We compare these methods against two much simpler architectures, a word-level convolutional neural network and a hierarchical self-attention network, and show that BERT often cannot beat these simpler baselines when classifying MIMIC-III discharge summaries and SEER cancer pathology reports. In our analysis, we show that two key components of BERT, pretraining and WordPiece tokenization, may actually be inhibiting BERT's performance on clinical text classification tasks where the input document is several thousand words long and where correctly identifying labels may depend more on identifying a few key words or phrases than on understanding the contextual meaning of sequences of text.
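    For context on why long documents are a problem: BERT-base accepts at most 512 WordPiece tokens, including the [CLS] and [SEP] markers. One common workaround, shown here as a minimal Python sketch and not necessarily one of the four methods in the paper, is to split a document into overlapping windows, classify each window, and max-pool the per-window logits. The window size, stride, and function names are assumptions.

```python
def chunk_tokens(token_ids, max_len=510, stride=255):
    """Split a long token sequence into overlapping windows.

    max_len=510 leaves room for [CLS] and [SEP] within BERT's 512-token limit.
    Consecutive windows start `stride` tokens apart, so they overlap by max_len - stride.
    """
    if max_len <= 0 or stride <= 0:
        raise ValueError("max_len and stride must be positive")
    chunks = []
    start = 0
    while True:
        chunks.append(token_ids[start:start + max_len])
        if start + max_len >= len(token_ids):
            break  # the last window reaches the end of the document
        start += stride
    return chunks

def max_pool_logits(chunk_logits):
    """Document-level logits: element-wise max over per-window logits."""
    return [max(column) for column in zip(*chunk_logits)]

# A 1,200-token document yields four windows; the last one is shorter.
doc = list(range(1200))
print([len(c) for c in chunk_tokens(doc)])
print(max_pool_logits([[0.1, 2.0], [1.5, -0.3]]))
```

    Max pooling suits the paper's observation that labels often hinge on a few key phrases: a single window containing a decisive phrase can drive the document-level score.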

    Validation of UK Biobank data for mental health outcomes : a pilot study using secondary care electronic health records

    The study was funded by the MRC Pathfinder Grant (MC_PC_17215); the National Institute for Health Research's (NIHR) Oxford Health Biomedical Research Centre (BRC-1215-20005) and the Virtual Brain Cloud from the European Commission (grant no. H2020SC1-DTH-2018-1). This work was supported by the UK Clinical Record Interactive Search (UK-CRIS) system funded by the National Institute for Health Research (NIHR) and the Medical Research Council, with the University of Oxford, using data and systems of the NIHR Oxford Health Biomedical Research Centre (BRC-1215-20005).
    UK Biobank (UKB) is widely employed to investigate mental health disorders and related exposures; however, its applicability and relevance in a clinical setting, and the assumptions required, have not been sufficiently and systematically investigated. Here, we present the first validation study using secondary care mental health data with linkage to UKB from Oxford Clinical Record Interactive Search (CRIS), focusing on comparison of demographic information, diagnostic outcome, medication record and cognitive test results, with missing data and the implied bias from both resources depicted. We applied a natural language processing model to extract information embedded in unstructured text from clinical notes and attachments. Using a contingency table, we compared the demographic information recorded in UKB and CRIS. We calculated the positive predictive value (PPV, the proportion of cases flagged as positive that are true positives) for mental health diagnoses and relevant medication. Amongst the cohort of 854 subjects, the PPV for any mental health diagnosis was 41.6%, and the PPVs for dementia, depression, bipolar disorder and schizophrenia were 59.5%, 12.5%, 50.0% and 52.6%, respectively. Self-reported medication records in UKB had an overall PPV of 47.0%, and the prevalence of medicines frequently prescribed for each typical mental health disorder differed considerably from the information provided by CRIS.
    UKB is highly multimodal but has limited follow-up records, whereas CRIS offers a longitudinal, high-resolution clinical picture with more than ten years of observations. Linking both datasets will reduce self-report bias and synergistically combine diverse modalities into a unified resource to facilitate more robust research in mental health.
    Peer reviewed
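    The PPV reported above is the ratio TP / (TP + FP): of the subjects a source flags as positive, the share that are confirmed. A minimal Python sketch follows; the counts in the usage line are hypothetical and are not taken from the study.

```python
def ppv(true_positives, false_positives):
    """Positive predictive value: TP / (TP + FP)."""
    flagged = true_positives + false_positives
    if flagged == 0:
        raise ValueError("PPV is undefined when nothing is flagged positive")
    return true_positives / flagged

# Hypothetical: 40 subjects flagged with a diagnosis in UKB, 25 of them confirmed in CRIS.
print(f"{ppv(25, 15):.1%}")
```

    Note that PPV says nothing about cases the first source missed entirely; that would require sensitivity, TP / (TP + FN).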

    Clinical information extraction for preterm birth risk prediction

    This paper contributes to the pursuit of leveraging unstructured medical notes for structured clinical decision making. In particular, we present a pipeline for clinical information extraction from medical notes related to preterm birth, and discuss its main challenges as well as its potential for clinical practice. A large collection of medical notes, created by staff during hospitalizations of patients who were at risk of delivering preterm, was gathered and analyzed. Based on an annotated collection of notes, we trained and evaluated information extraction components to discover clinical entities such as symptoms, events, anatomical sites and procedures, as well as attributes linked to these clinical entities. In a retrospective study, we show that these entities, in combination with structured information from electronic health records, are highly informative for clinical decision support models trained to predict whether delivery is likely to occur within specific time windows.
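    The prediction target above can be framed as one binary label per time window. A minimal Python sketch, assuming hypothetical 7- and 14-day windows measured from the time a note was written (the windows used in the paper are not stated in this abstract):

```python
from datetime import datetime, timedelta

def window_labels(note_time, delivery_time, windows_days=(7, 14)):
    """Binary label per window: does delivery fall within d days after the note?

    windows_days is a hypothetical choice. A delivery recorded before the
    note time gives a negative delta and is labelled False for every window.
    """
    delta = delivery_time - note_time
    return {d: timedelta(0) <= delta <= timedelta(days=d) for d in windows_days}

# Delivery about 9 days after the note: outside 7 days, inside 14 days.
note = datetime(2020, 3, 1, 9, 0)
delivery = datetime(2020, 3, 10, 14, 30)
print(window_labels(note, delivery))
```

    Each window then becomes a separate binary classification target, with features from the extracted entities combined with structured EHR fields.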