665 research outputs found

    Risk assessment for progression of Diabetic Nephropathy based on patient history analysis

    Diabetic nephropathy (DN) is one of the most common complications in patients with diabetes. It is a chronic disease that progressively damages the kidneys and can lead to renal failure. Digitalisation has enabled hospitals to store patient information in electronic health records (EHRs), and applying Machine Learning (ML) algorithms to these data can support prediction of the risk of progression in these patients, leading to better disease management. The main objective of this work is to build a predictive model that takes advantage of the patient history available in EHRs. The work used the largest dataset of Portuguese patients with DN, followed for 22 years by the Associação Protetora dos Diabéticos de Portugal (APDP). A longitudinal approach was developed in the data pre-processing stage, allowing the data to serve as input to sixteen distinct ML algorithms. After evaluation and analysis of the results, the Light Gradient Boosting Machine was identified as the best model, showing good predictive performance. This conclusion was supported not only by several classification metrics computed on training, test and validation data, but also by assessing performance for each stage of the disease. In addition, the models were analysed using feature-ranking plots and statistical analysis. As a complement, interpretability of the results is presented through the SHAP method, and the model is deployed using Gradio and the Hugging Face servers. By integrating ML techniques, an interpretation method and a Web application that provides access to the model, this study offers a potentially effective approach to anticipating the progression of DN, enabling healthcare professionals to make informed decisions for personalised care and disease management
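    As a rough illustration of the modelling and interpretation steps this abstract describes, the sketch below trains a LightGBM classifier and explains it with SHAP. It is not the authors' code: the file name, column names and the staging target are assumptions made for the example.

```python
# Hedged sketch: LightGBM on a preprocessed longitudinal feature table, explained
# with SHAP. File/column names ("dn_longitudinal_features.csv", "ckd_stage") are
# illustrative assumptions, not taken from the paper.
import lightgbm as lgb
import pandas as pd
import shap
from sklearn.model_selection import train_test_split

df = pd.read_csv("dn_longitudinal_features.csv")          # hypothetical EHR extract
X, y = df.drop(columns=["ckd_stage"]), df["ckd_stage"]    # assumed target: disease stage

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

model = lgb.LGBMClassifier(n_estimators=300, learning_rate=0.05)
model.fit(X_tr, y_tr)

# Per-feature contributions for each prediction (the interpretability step).
explainer = shap.TreeExplainer(model)
shap.summary_plot(explainer.shap_values(X_te), X_te)
```

    Deployment as described (Gradio on the Hugging Face servers) would then wrap the trained model's prediction function in a gradio.Interface and publish it as a Space; that step is omitted here.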

    Owning Attention: Applying Human Factors Principles to Support Clinical Decision Support

    In the best examples, clinical decision support (CDS) systems guide clinician decision-making and actions, prevent errors, improve quality, reduce costs, save time, and promote the use of evidence-based recommendations. However, the potential of CDS is limited by problems associated with improper design, implementation, and local customization. Despite an emphasis on electronic health record usability, little progress has been made in protecting end users from inadequately designed workflows and unnecessary interruptions. Intelligent and personalized design creates an opportunity to tailor CDS not just at the patient level but also to the disease condition, provider experience, and resources available at the health-system level. This chapter leverages the Five Rights of CDS framework to demonstrate the application of human factors engineering principles and emerging trends to optimize data analytics, usability, workflow, and design

    Utilizing Temporal Information in The EHR for Developing a Novel Continuous Prediction Model

    Type 2 diabetes mellitus (T2DM) is a nationally prevalent chronic condition that incurs both direct and indirect healthcare costs, yet previous clinical research shows it is preventable. Many prediction models have been based on risk factors identified in clinical trials. One of the major tasks of a T2DM prediction model is to estimate risk in order to select patients for further testing with HbA1c or fasting plasma glucose, because nationwide screening is not cost-effective. Those models had substantial limitations related to data quality, such as missing values. In this dissertation, I tested conventional models based on the most widely used risk factors to predict the probability of developing T2DM. The average AUC was 0.5, which implies that the conventional model cannot be used to screen for T2DM risk. Based on this result, I implemented three types of temporal representation for building the T2DM prediction model: non-temporal, interval-temporal, and continuous-temporal. The continuous-temporal representation, which was based on deep learning methods, had the best performance, implying that deep learning can overcome the data-quality issues and achieve better performance. The dissertation also contributes a continuous risk-output model based on the seq2seq model; it generates a monotonically increasing function for a given patient to predict the future probability of developing T2DM. The model is workable but still has many limitations to overcome. Finally, the dissertation highlights some risk factors that are underestimated and merit further research towards revising the current T2DM screening guideline. These results are still preliminary and will require collaboration with epidemiologists and experts in other fields to verify the findings. In the future, the methods used to build the T2DM prediction model can also be applied to prediction models for other chronic conditions
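    One way to obtain the monotonically increasing risk function mentioned above is to have the decoder emit non-negative increments and accumulate them over the prediction horizon. The following sketch illustrates that idea only; layer sizes, the output transform, and the head itself are assumptions, not the dissertation's architecture.

```python
# Hedged sketch: a monotone risk curve built from non-negative (softplus)
# increments and a cumulative sum, squashed into (0, 1). Shapes are assumptions.
import torch
import torch.nn as nn

class MonotonicRiskHead(nn.Module):
    def __init__(self, hidden_dim: int, horizon: int):
        super().__init__()
        self.proj = nn.Linear(hidden_dim, horizon)

    def forward(self, h):                                     # h: (batch, hidden_dim) encoder state
        increments = nn.functional.softplus(self.proj(h))     # non-negative steps
        cumulative = torch.cumsum(increments, dim=-1)         # non-decreasing in time
        return 1.0 - torch.exp(-cumulative)                   # risk in (0, 1), increasing

head = MonotonicRiskHead(hidden_dim=64, horizon=10)
risk_curve = head(torch.randn(2, 64))                         # 2 patients, 10 future steps
assert torch.all(risk_curve[:, 1:] >= risk_curve[:, :-1])     # monotonicity holds by construction
```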

    A novel kernel based approach to arbitrary length symbolic data with application to type 2 diabetes risk

    Predictive modeling of clinical data is fraught with challenges arising from the manner in which events are recorded. Patients typically fall ill at irregular intervals and experience dissimilar intervention trajectories. This results in irregularly sampled, uneven-length data, which poses a problem for standard multivariate tools. The alternative of feature extraction into equal-length vectors via methods like Bag-of-Words (BoW) potentially discards useful information. We propose an approach based on a kernel framework in which the data are maintained in their native form: discrete sequences of symbols. Kernel functions derived from the edit distance between pairs of sequences can then be used with support vector machines (SVMs) to classify the data. Our method is evaluated on the task of identifying patients likely to develop type 2 diabetes following an earlier episode of elevated blood pressure of 130/80 mmHg. Kernels combined via multiple kernel learning (MKL) achieved an F1-score of 0.96, outperforming an SVM (0.63), logistic regression (0.63), a Long Short-Term Memory network (0.61), and a Multi-Layer Perceptron (0.54) applied to a BoW representation of the data; MKL also achieved an F1-score of 0.97 on an external dataset. The proposed approach is consequently able to overcome limitations associated with feature-based classification in the context of clinical data
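    The kernel-plus-SVM construction lends itself to a short sketch: compute pairwise edit distances between symbol sequences, turn them into similarities, and feed the precomputed Gram matrix to an SVM. The toy sequences, labels and gamma value below are invented, and a real implementation would need to check that the resulting kernel matrix is positive semi-definite.

```python
# Hedged sketch: edit-distance-derived kernel over symbolic event sequences,
# classified with an SVM on a precomputed Gram matrix. Toy data throughout.
import numpy as np
from sklearn.svm import SVC

def edit_distance(a, b):
    # Classic dynamic-programming Levenshtein distance between two symbol sequences.
    d = np.zeros((len(a) + 1, len(b) + 1), dtype=int)
    d[:, 0] = np.arange(len(a) + 1)
    d[0, :] = np.arange(len(b) + 1)
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i, j] = min(d[i - 1, j] + 1, d[i, j - 1] + 1, d[i - 1, j - 1] + cost)
    return d[len(a), len(b)]

def gram_matrix(seqs_a, seqs_b, gamma=0.1):
    # Turn distances into similarities with an RBF-style transform (illustrative choice).
    return np.array([[np.exp(-gamma * edit_distance(a, b)) for b in seqs_b] for a in seqs_a])

train_seqs = [["dx:htn", "rx:ace"], ["dx:htn", "lab:hba1c_high", "dx:t2dm"], ["rx:statin"]]
labels = [0, 1, 0]                                   # toy labels: developed T2DM or not

K_train = gram_matrix(train_seqs, train_seqs)
clf = SVC(kernel="precomputed").fit(K_train, labels)

test_seqs = [["dx:htn", "lab:hba1c_high"]]
print(clf.predict(gram_matrix(test_seqs, train_seqs)))
```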

    A Knowledge Distillation Ensemble Framework for Predicting Short and Long-term Hospitalisation Outcomes from Electronic Health Records Data

    The ability to perform accurate prognosis of patients is crucial for proactive clinical decision making, informed resource management and personalised care. Existing outcome prediction models suffer from low recall of infrequent positive outcomes. We present a highly scalable and robust machine learning framework to automatically predict adversity, represented by mortality and ICU admission, from time-series vital signs and laboratory results obtained within the first 24 hours of hospital admission. The stacked platform comprises two components: a) an unsupervised LSTM Autoencoder that learns an optimal representation of the time series and uses it to differentiate the less frequent patterns that conclude with an adverse event from the majority patterns that do not, and b) a gradient boosting model, which relies on the constructed representation to refine prediction, incorporating static features of demographics, admission details and clinical summaries. The model is used to assess a patient's risk of adversity over time and provides visual justifications of its prediction based on the patient's static features and dynamic signals. Results of three case studies for predicting mortality and ICU admission show that the model outperforms existing outcome prediction models, achieving a PR-AUC of 0.891 (95% CI: 0.878-0.969) in predicting mortality in ICU and general ward settings and 0.908 (95% CI: 0.870-0.935) in predicting ICU admission
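    A compact sketch of the two-component stack, under assumed shapes and toy data (not the authors' implementation): an LSTM autoencoder encodes the first-24-hour time series into a fixed-length representation, which is then concatenated with static features and passed to a gradient boosting classifier.

```python
# Hedged sketch of the stacked design: (a) unsupervised LSTM autoencoder for the
# time-series representation, (b) gradient boosting on representation + statics.
import numpy as np
import torch
import torch.nn as nn
from sklearn.ensemble import GradientBoostingClassifier

class LSTMAutoencoder(nn.Module):
    def __init__(self, n_features, latent_dim):
        super().__init__()
        self.encoder = nn.LSTM(n_features, latent_dim, batch_first=True)
        self.decoder = nn.LSTM(latent_dim, n_features, batch_first=True)

    def encode(self, x):                             # x: (batch, time, n_features)
        _, (h, _) = self.encoder(x)
        return h[-1]                                 # (batch, latent_dim) representation

    def forward(self, x):
        z = self.encode(x).unsqueeze(1).repeat(1, x.size(1), 1)
        recon, _ = self.decoder(z)
        return recon                                 # trained to reconstruct x (unsupervised)

ae = LSTMAutoencoder(n_features=12, latent_dim=32)   # assume the autoencoder is trained first
vitals = torch.randn(100, 24, 12)                    # toy: 100 patients, 24 hourly steps
static = np.random.randn(100, 5)                     # toy static features (demographics, etc.)
y = np.random.randint(0, 2, 100)                     # toy adverse-outcome labels

with torch.no_grad():
    rep = ae.encode(vitals).numpy()
gbm = GradientBoostingClassifier().fit(np.hstack([rep, static]), y)
```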

    Machine Learning Framework for Real-World Electronic Health Records Regarding Missingness, Interpretability, and Fairness

    Machine learning (ML) and deep learning (DL) techniques have shown promising results in healthcare applications using Electronic Health Record (EHR) data. However, their adoption in real-world healthcare settings is hindered by three major challenges. First, real-world EHR data typically contain numerous missing values. Second, traditional ML/DL models are typically black boxes, whereas interpretability is required for real-world healthcare applications. Finally, differences in data distributions may lead to unfairness and performance disparities, particularly in subpopulations. This dissertation proposes methods to address these missing-data, interpretability, and fairness issues. The first method is an ensemble prediction framework for EHR data with high missing rates that builds on multiple feature subsets with lower missing rates. The second integrates medical knowledge graphs and a double attention mechanism with the long short-term memory (LSTM) model to provide knowledge-based model interpretation. The third is an LSTM variant that integrates medical knowledge graphs and additional time-aware gates to handle multivariate temporal missingness alongside interpretability concerns. Finally, a transformer-based model is proposed to learn unbiased and fair representations of diverse subpopulations using domain classifiers and three attention mechanisms
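    For the first method, one plausible reading of training on multiple subsets with lower missing rates is sketched below: fit one model per feature subset whose columns are reasonably complete, then average whichever per-subset predictions are available for each patient. The subset mechanics and the base learner are assumptions for illustration, not the dissertation's design.

```python
# Hedged sketch: ensemble over feature subsets with lower missing rates.
# Subset choice and averaging rule are illustrative assumptions.
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

def train_subset_ensemble(df, target, subsets):
    models = []
    for cols in subsets:
        sub = df[cols + [target]].dropna()           # rows complete for this subset's columns
        m = LogisticRegression(max_iter=1000).fit(sub[cols], sub[target])
        models.append((cols, m))
    return models

def predict_ensemble(models, df):
    preds = []
    for cols, m in models:
        mask = df[cols].notna().all(axis=1)          # score only patients with these columns observed
        p = pd.Series(np.nan, index=df.index)
        p[mask] = m.predict_proba(df.loc[mask, cols])[:, 1]
        preds.append(p)
    return pd.concat(preds, axis=1).mean(axis=1, skipna=True)   # average available predictions

# Example usage (toy column groups; real subsets would come from a missing-rate analysis):
# models = train_subset_ensemble(ehr_df, "outcome", [["age", "bmi"], ["age", "hba1c", "sbp"]])
# risk = predict_ensemble(models, ehr_df)
```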

    Using Case-Level Context to Classify Cancer Pathology Reports

    Individual electronic health records (EHRs) and clinical reports are often part of a larger sequence; for example, a single patient may generate multiple reports over the trajectory of a disease. In applications such as cancer pathology reporting, it is necessary not only to extract information from individual reports, but also to capture aggregate information about the entire cancer case based on case-level context from all reports in the sequence. In this paper, we introduce a simple modular add-on for capturing case-level context that is designed to be compatible with most existing deep learning architectures for text classification on individual reports. We test our approach on a corpus of 431,433 cancer pathology reports and show that incorporating case-level context significantly boosts classification accuracy across six classification tasks: site, subsite, laterality, histology, behavior, and grade. We expect that, with minimal modifications, our add-on can be applied to a wide range of other clinical text-based tasks
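    The add-on idea can be sketched as a small module that sits between a per-report encoder and the final classifier, contextualising each report embedding over the whole case. The recurrent layer, dimensions and class count below are illustrative assumptions rather than the paper's exact design.

```python
# Hedged sketch: case-level context module on top of an existing per-report encoder.
import torch
import torch.nn as nn

class CaseLevelContext(nn.Module):
    def __init__(self, emb_dim, n_classes):
        super().__init__()
        # Bidirectional GRU lets each report's representation see the rest of the case.
        self.context = nn.GRU(emb_dim, emb_dim, batch_first=True, bidirectional=True)
        self.classify = nn.Linear(2 * emb_dim, n_classes)

    def forward(self, report_embs):                  # (cases, n_reports, emb_dim) from the base model
        contextualised, _ = self.context(report_embs)
        return self.classify(contextualised)         # one label per report, informed by the whole case

addon = CaseLevelContext(emb_dim=256, n_classes=70)  # class count assumed for illustration
logits = addon(torch.randn(4, 6, 256))               # 4 cases, 6 reports each
print(logits.shape)                                  # torch.Size([4, 6, 70])
```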

    Landmark Models for Optimizing the Use of Repeated Measurements of Risk Factors in Electronic Health Records to Predict Future Disease Risk.

    The benefits of using electronic health records (EHRs) for disease risk screening and personalized health-care decisions are being increasingly recognized. Here we present a computationally feasible statistical approach with which to address the methodological challenges involved in utilizing historical repeat measures of multiple risk factors recorded in EHRs to systematically identify patients at high risk of future disease. The approach is principally based on a 2-stage dynamic landmark model. The first stage estimates current risk factor values from all available historical repeat risk factor measurements via landmark-age-specific multivariate linear mixed-effects models with correlated random intercepts, which account for sporadically recorded repeat measures, unobserved data, and measurement errors. The second stage predicts future disease risk from a sex-stratified Cox proportional hazards model, with estimated current risk factor values from the first stage. We exemplify these methods by developing and validating a dynamic 10-year cardiovascular disease risk prediction model using primary-care EHRs for age, diabetes status, hypertension treatment, smoking status, systolic blood pressure, total cholesterol, and high-density lipoprotein cholesterol in 41,373 persons from 10 primary-care practices in England and Wales contributing to The Health Improvement Network (1997-2016). Using cross-validation, the model was well-calibrated (Brier score = 0.041, 95% confidence interval: 0.039, 0.042) and had good discrimination (C-index = 0.768, 95% confidence interval: 0.759, 0.777)
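    A minimal sketch of the two-stage idea, with invented file and column names: stage 1 smooths repeat risk-factor measurements with a mixed-effects model to estimate current values, and stage 2 fits a sex-stratified Cox model on those estimates. Real landmark models are landmark-age-specific and multivariate; this compresses the idea to a single risk factor for illustration.

```python
# Hedged sketch of the 2-stage landmark approach. "repeat_measures.csv",
# "cvd_outcomes.csv" and their columns are hypothetical.
import pandas as pd
import statsmodels.formula.api as smf
from lifelines import CoxPHFitter

repeats = pd.read_csv("repeat_measures.csv")         # patient_id, age, sbp (repeat visits)

# Stage 1: random-intercept mixed model for systolic blood pressure; the fitted
# value at each patient's most recent visit stands in for the "current" value.
m1 = smf.mixedlm("sbp ~ age", repeats, groups=repeats["patient_id"]).fit()
repeats["sbp_hat"] = m1.fittedvalues
current = repeats.sort_values("age").groupby("patient_id").tail(1)

# Stage 2: sex-stratified Cox proportional hazards model on the estimated values.
outcomes = pd.read_csv("cvd_outcomes.csv")           # patient_id, time, event, sex, diabetes
df = current[["patient_id", "sbp_hat"]].merge(outcomes, on="patient_id")
cph = CoxPHFitter()
cph.fit(df.drop(columns=["patient_id"]), duration_col="time", event_col="event", strata=["sex"])
cph.print_summary()
```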