
    Comparing High Dimensional Word Embeddings Trained on Medical Text to Bag-of-Words For Predicting Medical Codes

    Word embeddings are a useful tool for extracting knowledge from the free-form text contained in electronic health records, but it has become commonplace to train such word embeddings on data that do not accurately reflect how language is used in a healthcare context. We use prediction of medical codes as an example application to compare the accuracy of word embeddings trained on health corpora to those trained on more general collections of text. We show that both an increase in embedding dimensionality and an increase in the volume of health-related training data improve prediction accuracy. We also present a comparison with the traditional bag-of-words feature representation, demonstrating that in many cases this conceptually simple method for representing text yields higher accuracy than word embeddings.
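    As a rough illustration of the two feature representations compared in this abstract, the sketch below (not the paper's code) fits a bag-of-words classifier and an averaged word-embedding classifier on a pair of invented notes; the notes, labels, toy embedding table, and dimensionality are placeholders.

```python
# Minimal sketch: bag-of-words features vs. averaged word-embedding features
# for a medical-code classifier. All data and the embedding table are invented.
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

notes = ["patient admitted with chest pain", "chronic kidney disease follow up"]
codes = [0, 1]  # hypothetical medical-code labels

# Bag-of-words: one sparse count column per vocabulary term.
bow = CountVectorizer().fit_transform(notes)
bow_clf = LogisticRegression().fit(bow, codes)

# Word embeddings: average per-token vectors into one dense document vector.
dim = 4  # real embeddings would be far higher dimensional
toy_vectors = {w: np.random.rand(dim) for note in notes for w in note.split()}

def embed(note):
    vecs = [toy_vectors[w] for w in note.split() if w in toy_vectors]
    return np.mean(vecs, axis=0) if vecs else np.zeros(dim)

emb = np.vstack([embed(n) for n in notes])
emb_clf = LogisticRegression().fit(emb, codes)
```

    In practice the embedding table would come from vectors trained on either a health corpus or a more general collection of text, which is the comparison the abstract describes.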

    Prediction of ICU Readmission Using Clinical Notes

    Unplanned readmissions to the ICU result in higher medical costs and an increased likelihood of adverse events, extended hospital stays, and mortality. Machine learning models can leverage the large amount of data stored in electronic health records to predict these cases and provide physicians with more information about patient risk at the time of ICU discharge. Most prior work in this area has focused on developing models using only the structured data found in electronic health records, neglecting the large amount of unstructured information stored in clinical notes. This work applies deep learning techniques to these notes to predict ICU readmission and develops models that outperform prior work focused only on structured data.
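    For orientation only, the sketch below shows the general shape of a note-based readmission classifier in PyTorch; the vocabulary, notes, and labels are invented, and the actual architecture and data used in the work are not specified here.

```python
# Minimal sketch: a tiny classifier over tokenized notes predicting
# readmission (1) vs. no readmission (0). All inputs are placeholders.
import torch
import torch.nn as nn

vocab = {"<unk>": 0, "patient": 1, "stable": 2, "sepsis": 3, "discharge": 4}

def encode(note):
    return torch.tensor([vocab.get(tok, 0) for tok in note.lower().split()])

class NoteClassifier(nn.Module):
    def __init__(self, vocab_size, embed_dim=16):
        super().__init__()
        self.embedding = nn.EmbeddingBag(vocab_size, embed_dim)  # mean-pools token vectors
        self.out = nn.Linear(embed_dim, 1)

    def forward(self, tokens, offsets):
        return self.out(self.embedding(tokens, offsets)).squeeze(-1)

notes = ["patient stable at discharge", "sepsis noted before discharge"]
labels = torch.tensor([0.0, 1.0])
encoded = [encode(n) for n in notes]
tokens = torch.cat(encoded)
offsets = torch.tensor([0] + [len(e) for e in encoded[:-1]]).cumsum(0)

model = NoteClassifier(len(vocab))
loss = nn.BCEWithLogitsLoss()(model(tokens, offsets), labels)
loss.backward()  # a real training loop would follow
```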

    Toward More Predictive Models by Leveraging Multimodal Data

    Data is often composed of structured and unstructured components. Both forms contain information that machine learning models can exploit to improve their prediction performance on a task. However, integrating features from both forms is a difficult task, all the more so for models that operate under time constraints. Time-constrained models are machine learning models whose inputs must preserve time causality, such as when predicting a future event from past data. Most previous work lacks a dedicated pipeline that generalizes to different tasks and domains, especially under time constraints. In this work, we present a systematic, domain-agnostic pipeline for integrating features from structured and unstructured data while maintaining time causality when building models. We focus on the healthcare and consumer market domains, performing experiments, preprocessing data, and building models to demonstrate the generalizability of the pipeline. More specifically, we focus on the task of identifying patients who are at risk of an imminent ICU admission. We use our pipeline to solve this task and show how augmenting unstructured data with structured data improves model performance. We found that by combining structured and unstructured data we can obtain a performance improvement of up to 8.5
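    The sketch below illustrates the general idea of fusing structured and unstructured features while maintaining time causality; the column names, prediction-time cutoff, and the random stand-in for a note embedding are assumptions, not the pipeline described in the work.

```python
# Minimal sketch: combine structured measurements with note-derived features,
# using only data recorded before the prediction time (time causality).
import numpy as np
import pandas as pd

prediction_time = pd.Timestamp("2023-01-02 08:00")  # hypothetical cutoff

vitals = pd.DataFrame({
    "charttime": pd.to_datetime(["2023-01-01 06:00", "2023-01-02 09:00"]),
    "heart_rate": [88, 120],
})
notes = pd.DataFrame({
    "charttime": pd.to_datetime(["2023-01-01 07:00", "2023-01-02 10:00"]),
    "text": ["no acute distress", "worsening respiratory status"],
})

# Enforce time causality: drop anything charted at or after the prediction time.
vitals = vitals[vitals["charttime"] < prediction_time]
notes = notes[notes["charttime"] < prediction_time]

structured = np.array([vitals["heart_rate"].mean()])   # e.g. aggregated vitals
unstructured = np.random.rand(8)                       # stand-in for a note embedding
features = np.concatenate([structured, unstructured])  # fused feature vector
print(features.shape)  # input to a downstream classifier, e.g. ICU-admission risk
```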