    Automatic Prediction of Recurrence of Major Cardiovascular Events: A Text Mining Study Using Chest X-Ray Reports

    Background and Objective. Electronic health records (EHRs) contain free-text information on symptoms, diagnosis, treatment, and prognosis of diseases. However, this potential goldmine of health information cannot be easily accessed and used unless proper text mining techniques are applied. The aim of this project was to develop and evaluate a text mining pipeline in a multimodal learning architecture to demonstrate the value of medical text classification in chest radiograph reports for cardiovascular risk prediction. We sought to assess the integration of various text representation approaches and clinical structured data with state-of-the-art deep learning methods in the process of medical text mining. Methods. We used EHR data of patients included in the Second Manifestations of ARTerial disease (SMART) study. We propose a deep learning-based multimodal architecture for our text mining pipeline that integrates neural text representation with preprocessed clinical predictors for the prediction of recurrence of major cardiovascular events in cardiovascular patients. Text preprocessing, including cleaning and stemming, was first applied to filter out the unwanted texts from X-ray radiology reports. Thereafter, text representation methods were used to numerically represent unstructured radiology reports with vectors. Subsequently, these text representation methods were added to prediction models to assess their clinical relevance. In this step, we applied logistic regression, support vector machine (SVM), multilayer perceptron neural network, convolutional neural network, long short-term memory (LSTM), and bidirectional LSTM deep neural network (BiLSTM). Results. We performed various experiments to evaluate the added value of the text in the prediction of major cardiovascular events. The two main scenarios were the integration of radiology reports (1) with classical clinical predictors and (2) with only age and sex in the case of unavailable clinical predictors. 
In total, data of 5603 patients were used with 5-fold cross-validation to train the models. In the first scenario, the multimodal BiLSTM (MI-BiLSTM) model achieved an area under the curve (AUC) of 84.7%, a misclassification rate of 14.3%, and an F1 score of 83.8%. In this scenario, the SVM model, trained on clinical variables and a bag-of-words representation, achieved the lowest misclassification rate of 12.2%. In the case of unavailable clinical predictors, the MI-BiLSTM model trained on radiology reports and demographic (age and sex) variables reached an AUC, F1 score, and misclassification rate of 74.5%, 70.8%, and 20.4%, respectively. Conclusions. Using the case study of routine care chest X-ray radiology reports, we demonstrated the clinical relevance of integrating text features and classical predictors in our text mining pipeline for cardiovascular risk prediction. The MI-BiLSTM model with word embedding representation appeared to perform well when trained on text data integrated with the clinical variables from the SMART study. Our results mined from chest X-ray reports showed that models using text data in addition to laboratory values outperform those using only known clinical predictors.
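
The abstract above mentions a bag-of-words representation combined with clinical variables, and reports misclassification rate and F1 score. The following is a minimal sketch of those ideas only, not the authors' pipeline: the function names, the toy reports, and the two clinical values (age, sex) are all invented for illustration, and no stemming or model training is shown.

```python
import re
from collections import Counter


def tokenize(report):
    """Lowercase a report and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", report.lower())


def build_vocabulary(reports):
    """Sorted list of every distinct token seen in the given reports."""
    return sorted({tok for report in reports for tok in tokenize(report)})


def bag_of_words(report, vocabulary):
    """Count how often each vocabulary term occurs in one report."""
    counts = Counter(tokenize(report))
    return [counts[term] for term in vocabulary]


def multimodal_features(report, clinical, vocabulary):
    """Concatenate text counts with structured predictors (assumed numeric)."""
    return bag_of_words(report, vocabulary) + list(clinical)


def misclassification_rate(y_true, y_pred):
    """Fraction of predictions that differ from the true label."""
    wrong = sum(t != p for t, p in zip(y_true, y_pred))
    return wrong / len(y_true)


def f1_score(y_true, y_pred):
    """F1 for the positive class (label 1): 2*TP / (2*TP + FP + FN)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Toy data, invented for illustration only.
reports = ["Enlarged heart, no effusion.", "Normal heart size."]
vocab = build_vocabulary(reports)
features = multimodal_features(reports[0], [63.0, 1.0], vocab)  # age, sex

y_true = [1, 0, 1, 0]
y_pred = [1, 1, 1, 0]
error = misclassification_rate(y_true, y_pred)
f1 = f1_score(y_true, y_pred)
```

A feature vector built this way could be passed to any of the classifiers the abstract lists; the concatenation step is the simplest possible reading of "multimodal" and the neural models in the study may combine the inputs differently.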

    Text mining as a tool for assessment of informational quality of electronic mammographic reports

    OBJECTIVE: To investigate the use of text mining for evaluating the informational quality of electronic mammographic reports, taking adherence to the BI-RADS® lexicon as the quality parameter. MATERIALS AND METHODS: A total of 22,247 mammography reports from January 2000 to June 2006 were extracted from the radiology information system database of Hospital das Clínicas da Faculdade de Medicina de Ribeirão Preto, SP, Brazil. Two experiments were undertaken: experiment 1 checked for the most correct use of the lexicon terms (specificity of the mining method), and experiment 2 looked for any attempt to use or allude to the lexicon (sensitivity of the mining method). 
RESULTS: Experiment 1: between 11% and 61% of reports included lexicon terms in their conclusion, randomly distributed over time from 2001 onward. Experiment 2: between 44% and 100% of reports referred to the lexicon in some way in their conclusion. CONCLUSION: The results indicate good potential for applying text mining tools to assess the quality of information in electronic mammography reports. Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP); (FAEPA) Fundação de Apoio ao Ensino Pesquisa e Assistênci
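
The two experiments above contrast a strict match on lexicon terms with a loose match on any allusion to the lexicon. A rough sketch of that contrast follows. The abstract does not give the actual search patterns, so both regular expressions, the function name, and the sample conclusions are assumptions made up for illustration.

```python
import re

# Hypothetical strict pattern: the exact spelling "BI-RADS" followed by an
# assessment category digit from 0 to 6.
STRICT = re.compile(r"\bBI-RADS\s*(?:category\s*)?[0-6]\b")

# Hypothetical loose pattern: any spelling variant such as "birads",
# "Bi Rads", or "BI-RADS", regardless of case.
LOOSE = re.compile(r"bi\W*rads", re.IGNORECASE)


def adherence_rate(conclusions, pattern):
    """Fraction of report conclusions in which the pattern occurs."""
    hits = sum(1 for text in conclusions if pattern.search(text))
    return hits / len(conclusions)


# Invented sample conclusions, not taken from the study.
conclusions = [
    "BI-RADS category 2: benign finding.",
    "Birads two, benign.",
    "No suspicious findings.",
]

strict_rate = adherence_rate(conclusions, STRICT)
loose_rate = adherence_rate(conclusions, LOOSE)
```

On this toy sample the loose pattern matches more conclusions than the strict one, which mirrors the direction of the reported ranges (11% to 61% versus 44% to 100%) without reproducing them.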

    Classification of Radiology Reports Using Neural Attention Models

    The electronic health record (EHR) contains a large amount of multi-dimensional and unstructured clinical data of significant operational and research value. Unlike previous studies, our approach uses a double-annotated dataset and moves away from opaque "black-box" models toward comprehensible deep learning models. In this paper, we present a novel neural attention mechanism that not only classifies clinically important findings but also exposes the terms behind each classification. Specifically, convolutional neural networks (CNN) with attention analysis are used to classify radiology head computed tomography reports based on five categories that radiologists would account for in assessing acute and communicable findings in daily practice. The experiments show that our CNN attention models outperform non-neural models, especially when trained on a larger dataset. Our attention analysis demonstrates the intuition behind the classifier's decision by generating a heatmap that highlights the attended terms used by the CNN model; this is valuable when downstream medical decisions are to be made by human experts or when the classifier output is to be used in cohort construction, such as for epidemiological studies.
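
The heatmap of attended terms described above can be pictured as per-token weights that sum to one. The sketch below only shows how raw scores might be normalized with a softmax and ranked. It is not the paper's CNN attention mechanism, and the tokens, scores, and function names are hypothetical.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def top_attended(tokens, scores, k=2):
    """Return the k (token, weight) pairs with the largest attention weight."""
    weights = softmax(scores)
    ranked = sorted(zip(tokens, weights), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Invented example: a report fragment with made-up raw attention scores.
tokens = ["acute", "hemorrhage", "in", "left", "frontal", "lobe"]
scores = [2.0, 3.0, -1.0, 0.0, 0.5, 0.5]

weights = softmax(scores)
top = top_attended(tokens, scores)
```

Rendering each token with a color intensity proportional to its weight would give a heatmap of the kind the abstract describes.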