
    Automatic Prediction of Recurrence of Major Cardiovascular Events: A Text Mining Study Using Chest X-Ray Reports

    Background and Objective. Electronic health records (EHRs) contain free-text information on symptoms, diagnosis, treatment, and prognosis of diseases. However, this potential goldmine of health information cannot be easily accessed and used unless proper text mining techniques are applied. The aim of this project was to develop and evaluate a text mining pipeline in a multimodal learning architecture to demonstrate the value of medical text classification in chest radiograph reports for cardiovascular risk prediction. We sought to assess the integration of various text representation approaches and clinical structured data with state-of-the-art deep learning methods in the process of medical text mining. Methods. We used EHR data of patients included in the Second Manifestations of ARTerial disease (SMART) study. We propose a deep learning-based multimodal architecture for our text mining pipeline that integrates neural text representation with preprocessed clinical predictors for the prediction of recurrence of major cardiovascular events in cardiovascular patients. Text preprocessing, including cleaning and stemming, was first applied to filter out the unwanted texts from X-ray radiology reports. Thereafter, text representation methods were used to numerically represent unstructured radiology reports with vectors. Subsequently, these text representation methods were added to prediction models to assess their clinical relevance. In this step, we applied logistic regression, support vector machine (SVM), multilayer perceptron neural network, convolutional neural network, long short-term memory (LSTM), and bidirectional LSTM deep neural network (BiLSTM). Results. We performed various experiments to evaluate the added value of the text in the prediction of major cardiovascular events. The two main scenarios were the integration of radiology reports (1) with classical clinical predictors and (2) with only age and sex in the case of unavailable clinical predictors. 
In total, data from 5603 patients were used with 5-fold cross-validation to train the models. In the first scenario, the multimodal BiLSTM (MI-BiLSTM) model achieved an area under the curve (AUC) of 84.7%, a misclassification rate of 14.3%, and an F1 score of 83.8%. In this scenario, the SVM model, trained on clinical variables and a bag-of-words representation, achieved the lowest misclassification rate of 12.2%. In the case of unavailable clinical predictors, the MI-BiLSTM model trained on radiology reports and demographic (age and sex) variables reached an AUC, F1 score, and misclassification rate of 74.5%, 70.8%, and 20.4%, respectively. Conclusions. Using the case study of routine care chest X-ray radiology reports, we demonstrated the clinical relevance of integrating text features and classical predictors in our text mining pipeline for cardiovascular risk prediction. The MI-BiLSTM model with word embedding representation appeared to have a desirable performance when trained on text data integrated with the clinical variables from the SMART study. Our results mined from chest X-ray reports showed that models using text data in addition to laboratory values outperform those using only known clinical predictors.
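
The abstract above describes representing reports as bag-of-words vectors, combining them with structured predictors, and reporting misclassification rate and F1 score. A minimal standard-library sketch of those three ideas follows. It is not the authors' pipeline: the function names, the whitespace tokenisation, the toy reports, the toy age/sex values, and the binary (positive-class) definition of F1 are all assumptions made for illustration.

```python
from collections import Counter


def bag_of_words(reports):
    """Build a sorted vocabulary and one count vector per report.

    Tokenisation is plain lowercase whitespace splitting (an assumption;
    the study also applied cleaning and stemming).
    """
    vocab = sorted({tok for r in reports for tok in r.lower().split()})
    index = {tok: i for i, tok in enumerate(vocab)}
    vectors = []
    for r in reports:
        vec = [0] * len(vocab)
        for tok, n in Counter(r.lower().split()).items():
            vec[index[tok]] = n
        vectors.append(vec)
    return vocab, vectors


def combine(text_vectors, structured_rows):
    """Concatenate each text vector with its structured predictors."""
    return [t + list(s) for t, s in zip(text_vectors, structured_rows)]


def misclassification_rate(y_true, y_pred):
    """Fraction of predictions that differ from the true label."""
    wrong = sum(1 for t, p in zip(y_true, y_pred) if t != p)
    return wrong / len(y_true)


def f1_score(y_true, y_pred, positive=1):
    """Binary F1 for the positive class (harmonic mean of precision and recall)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy inputs, invented for illustration only (not SMART study data).
reports = ["enlarged heart no effusion", "normal heart normal lungs"]
structured = [(63, 1), (48, 0)]  # hypothetical (age, sex) pairs

vocab, text_vectors = bag_of_words(reports)
features = combine(text_vectors, structured)

# Toy labels and predictions, also invented.
y_true = [1, 0, 1, 1, 0]
y_pred = [1, 0, 0, 1, 1]
error = misclassification_rate(y_true, y_pred)
f1 = f1_score(y_true, y_pred)
```

Each row of `features` could then be passed to any classifier; the abstract's scenario (2) corresponds to using only age and sex as the structured columns.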

    Creating HIV risk profiles for men in South Africa: A latent class approach using cross-sectional survey data

    Introduction: Engaging at‐risk men in HIV prevention programs and services is a current priority, yet there are few effective ways to identify which men are at highest risk or how to best reach them. In this study we generated multi‐factor profiles of HIV acquisition/transmission risk for men in Durban, South Africa, to help inform targeted programming and service delivery. Methods: Data come from surveys with 947 men ages 20 to 40 conducted in two informal settlements from May to September 2017. Using latent class analysis (LCA), which detects a small set of underlying groups based on multiple dimensions, we identified classes based on nine HIV risk factors and socio‐demographic characteristics. We then compared HIV service use between the classes. Results: We identified four latent classes, with good model fit statistics. The older high‐risk class (20% of the sample; mean age 36) were more likely married/cohabiting and employed, with multiple sexual partners, substantial age‐disparity with partners (eight years younger on‐average), transactional relationships (including more resource‐intensive forms like paying for partner’s rent), and hazardous drinking. The younger high‐risk class (24%; mean age 27) were likely unmarried and employed, with the highest probability of multiple partners in the last year (including 42% with 5+ partners), transactional relationships (less resource‐intensive, e.g., clothes/transportation), hazardous drinking, and inequitable gender views. The younger moderate‐risk class (36%; mean age 23) were most likely unmarried, unemployed technical college/university students/graduates. They had a relatively high probability of multiple partners and transactional relationships (less resource‐intensive), and moderate hazardous drinking. Finally, the older low‐risk class (20%; mean age 29) were more likely married/cohabiting, employed, and highly gender‐equitable, with few partners and limited transactional relationships. 
Circumcision (status) was higher among the younger moderate‐risk class than either high‐risk class (p < 0.001). HIV testing and treatment literacy score were suboptimal and did not differ across classes. Conclusions: Distinct HIV risk profiles among men were identified. Interventions should focus on reaching the highest‐risk profiles who, despite their elevated risk, were less or no more likely than the lower‐risk to use HIV services. By enabling a more synergistic understanding of subgroups, LCA has potential to enable more strategic, data‐driven programming and evaluation.
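
Latent class analysis, as used in the abstract above, assigns each respondent to a class through posterior membership probabilities. The sketch below shows that assignment step only, under the usual local-independence assumption and with binary indicators. The function name, the two-class setup, and all probabilities are hypothetical and are not estimates from the study; fitting the class parameters themselves (typically by expectation-maximisation) is not shown.

```python
def class_posteriors(priors, item_probs, responses):
    """Posterior class membership for one respondent via Bayes' rule.

    priors[c]        : proportion of the population in class c
    item_probs[c][j] : P(indicator j = 1 | class c)
    responses[j]     : observed 0/1 value of indicator j
    Indicators are assumed conditionally independent given the class.
    """
    joint = []
    for prior, probs in zip(priors, item_probs):
        likelihood = 1.0
        for p, x in zip(probs, responses):
            likelihood *= p if x == 1 else (1.0 - p)
        joint.append(prior * likelihood)
    total = sum(joint)
    return [j / total for j in joint]


# Hypothetical two-class model with two binary risk indicators.
priors = [0.5, 0.5]
item_probs = [
    [0.8, 0.8],  # class 0: both indicators likely present
    [0.2, 0.2],  # class 1: both indicators unlikely
]
posterior = class_posteriors(priors, item_probs, responses=[1, 1])
assigned_class = max(range(len(posterior)), key=posterior.__getitem__)
```

A respondent is usually assigned to the class with the highest posterior probability, which is what `assigned_class` does here.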

    Nation Binding: How Public Service Broadcasting Mitigates Political Selective Exposure

    Recent research suggests that more and more citizens select news and information that is congruent with their existing political preferences. This increase in political selective exposure (PSE) has allegedly led to an increase in polarization. The vast majority of studies stem from the US case with a particular media and political system. We contend that there are good reasons to believe PSE is less prevalent in other systems. We test this using latent profile analysis with national survey data from the Netherlands (n = 2,833). We identify four types of media use profiles and indeed only find partial evidence of PSE. In particular, we find that public broadcasting news cross-cuts all cleavages. This research note offers an important antidote to what is considered a universal phenomenon. We do find, however, a relatively large segment of citizens opting out of news consumption despite the readily available news in today’s media landscape.