249 research outputs found

    Unsupervised learning methods for identifying and evaluating disease clusters in electronic health records

    Introduction: Clustering algorithms are a class of algorithms that can discover groups of observations in complex data and are often used to identify subtypes of heterogeneous diseases in electronic health records (EHR). Evaluating clustering experiments for biological and clinical significance is a vital but challenging task due to the lack of consensus on best practices. As a result, the translation of findings from clustering experiments to clinical practice is limited. Aim: The aim of this thesis was to investigate and evaluate approaches that enable the evaluation of clustering experiments using EHR. Methods: We conducted a scoping review of clustering studies in EHR to identify common evaluation approaches. We systematically investigated the performance of the identified approaches using a cohort of Alzheimer's Disease (AD) patients as an exemplar, comparing four different clustering methods (K-means, Kernel K-means, Affinity Propagation and Latent Class Analysis). Using the same population, we developed and evaluated a method (MCHAMMER) that tested whether clusterable structures exist in EHR. To develop this method we tested several cluster validation indices and methods of generating null data to see which were best at discovering clusters. To enable the robust benchmarking of evaluation approaches, we created a tool that generated synthetic EHR data containing known cluster labels across a range of clustering scenarios. Results: Across 67 EHR clustering studies, the most popular internal evaluation approach was comparing cluster results across multiple algorithms (30% of studies). We examined this approach by conducting a clustering experiment on a population of 10,065 AD patients with 21 demographic, symptom and comorbidity features. K-means found 5 clusters, Kernel K-means found 2, Affinity Propagation found 5 and Latent Class Analysis found 6. 
K-means 4 was found to have the best clustering solution, with the highest silhouette score (0.19), and was more predictive of outcomes. The five clusters found were: typical AD (n=2026), non-typical AD (n=1640), a cardiovascular disease cluster (n=686), a cancer cluster (n=1710) and a cluster of mental health issues, smoking and early disease onset (n=1528), which has been found in previous research as well as in the results of other clustering methods. We created a synthetic data generation tool that generates realistic EHR clusters which can vary in separation and in the number of noise variables, altering the difficulty of the clustering problem. We found that decreasing cluster separation significantly increased clustering difficulty, whereas adding noise variables increased difficulty but not significantly. To develop the tool for assessing cluster existence, we tested different methods of null dataset generation and different cluster validation indices. The best performing null dataset method was the min-max method, and the best performing indices were the Calinski-Harabasz index (accuracy 94%), the Davies-Bouldin index (97%), the silhouette score (93%) and the BWC index (90%). We further found that when clusters were identified using the Calinski-Harabasz index, they were more likely to have significantly different outcomes between clusters. Lastly, we repeated the initial clustering experiment, comparing 10 different pre-processing methods. The three best performing methods were RBF kernel (2 clusters), MCA (4 clusters) and MCA and PCA (6 clusters). The MCA approach gave the best results, with the highest silhouette score (0.23) and meaningful clusters, producing 4 clusters: heart and circulatory (n=1379), early onset mental health (n=1761), male cluster with memory loss (n=1823) and female with more problem (n=2244). Conclusion: We have developed and tested a series of methods and tools to enable the evaluation of EHR clustering experiments. 
We developed and proposed a novel cluster evaluation metric and provided a tool for benchmarking evaluation approaches in synthetic but realistic EHR
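
The idea of testing whether clusterable structure exists by comparing a validation index on the real data against null datasets can be sketched as follows. This is a minimal illustration, not the MCHAMMER implementation: it assumes that the "min-max" null method means sampling each feature uniformly between its observed minimum and maximum, it uses a bare-bones K-means and the silhouette score, and all function names (`silhouette`, `kmeans`, `minmax_null`, `null_test`, `two_blobs`) are hypothetical.

```python
import random
from math import dist


def silhouette(points, labels):
    """Mean silhouette coefficient (Euclidean distance); 0.0 if fewer than 2 clusters."""
    clusters = {}
    for i, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(i)
    if len(clusters) < 2:
        return 0.0
    scores = []
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:  # singleton cluster: silhouette is defined as 0
            scores.append(0.0)
            continue
        # a: mean distance to the point's own cluster
        a = sum(dist(points[i], points[j]) for j in own) / len(own)
        # b: smallest mean distance to any other cluster
        b = min(
            sum(dist(points[i], points[j]) for j in idx) / len(idx)
            for other, idx in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


def kmeans(points, k, rng, iters=20):
    """Very small K-means (Lloyd's algorithm) returning one label per point."""
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: dist(p, centers[c])) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels


def minmax_null(points, rng):
    """Assumed min-max null: each feature drawn uniformly within its observed range."""
    lows = [min(col) for col in zip(*points)]
    highs = [max(col) for col in zip(*points)]
    return [
        tuple(rng.uniform(lo, hi) for lo, hi in zip(lows, highs)) for _ in points
    ]


def null_test(points, k, n_null=20, seed=0):
    """Return the observed silhouette and an empirical p-value against null datasets."""
    rng = random.Random(seed)
    observed = silhouette(points, kmeans(points, k, rng))
    null_scores = []
    for _ in range(n_null):
        null = minmax_null(points, rng)
        null_scores.append(silhouette(null, kmeans(null, k, rng)))
    p_value = (1 + sum(s >= observed for s in null_scores)) / (n_null + 1)
    return observed, p_value


def two_blobs(rng, n_per_blob=30):
    """Toy data: two well-separated Gaussian blobs in two dimensions."""
    blob_a = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n_per_blob)]
    blob_b = [(rng.gauss(10, 1), rng.gauss(10, 1)) for _ in range(n_per_blob)]
    return blob_a + blob_b
```

Calling `null_test(two_blobs(random.Random(1)), k=2)` returns the observed score and an empirical p-value; a small p-value suggests the index is higher than would be expected from structureless data with the same feature ranges.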

    Processing of Electronic Health Records using Deep Learning: A review

    The availability of large amounts of clinical data is opening up new research avenues in a number of fields. An exciting field in this respect is healthcare, where secondary use of healthcare data is beginning to revolutionize healthcare. Besides the availability of Big Data, both medical data from healthcare institutions (such as EMR data) and data generated from health and wellbeing devices (such as personal trackers), a significant contribution to this trend is also being made by recent advances in machine learning, specifically deep learning algorithms.

    Social and behavioral determinants of health in the era of artificial intelligence with electronic health records: A scoping review

    Background: There is growing evidence that social and behavioral determinants of health (SBDH) have a substantial effect on a wide range of health outcomes. Electronic health records (EHRs) have been widely employed to conduct observational studies in the age of artificial intelligence (AI). However, there has been little research into how to make the most of SBDH information from EHRs. Methods: A systematic search was conducted in six databases to find relevant, recently published peer-reviewed publications. Relevance was determined by screening and evaluating the articles. Based on the selected studies, a methodological analysis of AI algorithms leveraging SBDH information in EHR data was provided. Results: Our synthesis was driven by an analysis of SBDH categories, the relationship between SBDH and healthcare-related statuses, and several NLP approaches for extracting SBDH from clinical literature. Discussion: The associations between SBDH and health outcomes are complicated and diverse; several pathways may be involved. Using Natural Language Processing (NLP) technology to support the extraction of SBDH and other clinical ideas simplifies the identification and extraction of essential concepts from clinical data, efficiently unlocks unstructured data, and aids in the resolution of unstructured data-related issues. Conclusion: Despite known associations between SBDH and disease, SBDH factors are rarely investigated as interventions to improve patient outcomes. Gaining knowledge about SBDH and how SBDH data can be collected from EHRs using NLP approaches and predictive models improves the chances of influencing health policy change for patient wellness, ultimately promoting health and health equity. Keywords: Social and Behavioral Determinants of Health, Artificial Intelligence, Electronic Health Records, Natural Language Processing, Predictive Model. Comment: 32 pages, 5 figures

    Improved Alzheimer’s disease detection by MRI using multimodal machine learning algorithms

    Dementia is one of the major medical problems challenging public health systems around the world, and it generally occurs in older adults (age > 60). There are currently no drugs that cure the disease, and it can directly impair memory and diminish the capacity to perform daily activities. Health experts and computing scientists have been researching this issue for the last twenty years. Nevertheless, there is an immediate need to find the characteristics that enable the identification of dementia. The motivation behind the work presented in this thesis is to propose a sophisticated supervised machine learning model for the prediction and classification of AD in older people. To that end, we conducted different experiments on open-access brain imaging data, including demographic MRI data from 373 scan sessions of 150 patients. In the first two studies, we applied single ML models, support vector machines and pruned decision trees, to predict dementia on the same dataset. In the first experiment, with SVM, we achieved 70% prediction accuracy for late-stage dementia, with a precision (classification of true dementia subjects) of 75%. In the second experiment, with J48 pruned decision trees, accuracy improved to 88.73% and precision to 92.4%. To extend this work, we employed multi-model approaches rather than single models. In the comparative analysis, we applied the feature reduction technique principal component analysis, which identifies the highly correlated features in the dataset that are closely associated with dementia type. 
By applying three models (KNN, LR and SVM) simultaneously, it was possible to identify an ideal model for the classification of dementia subjects. Compared with support vectors, the KNN and LR models classified AD subjects with 97.6% and 98.3% accuracy respectively, which is higher than in the previous experiments. However, because of the severity of AD in older adults, it is essential not to miss true AD positives. To improve the classification of true AD subjects among all subjects, we introduced three independent experiments. In this work, we incorporated two new models, Naïve Bayes and Artificial Neural Networks, alongside support vectors and KNN. In the first experiment, models were developed independently with manual feature selection. The outcome suggested that KNN 3 is the optimal model, with 91.32% classification accuracy. In the second experiment, the same models were tested with a limited set of highly correlated features. SVM produced a high classification accuracy of 96.12% and NB produced a 98.21% classification rate of true AD subjects. Finally, in the third experiment, we combined these four models into a new hybrid model. The hybrid model's performance was validated with an AU-ROC value of 0.991 (reported as 99.1% classification accuracy). All these experimental results suggest that the ensemble modelling approach with wrapping is an optimal solution for the classification of AD subjects
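
AU-ROC and classification accuracy measure different things, so a value of 0.991 for one does not by itself imply 99.1% for the other. The sketch below illustrates the distinction on made-up scores (not data from the thesis); the function names `roc_auc` and `accuracy` and the 0.5 decision threshold are assumptions for illustration only.

```python
def roc_auc(y_true, scores):
    """AU-ROC as the probability that a random positive outranks a random negative."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    # Count positive/negative pairs ranked correctly; ties count as half.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def accuracy(y_true, scores, threshold=0.5):
    """Fraction of correct predictions after thresholding the scores."""
    preds = [int(s >= threshold) for s in scores]
    return sum(p == y for p, y in zip(preds, y_true)) / len(y_true)


# Hypothetical scores: every positive ranks above every negative,
# but all scores fall below the 0.5 decision threshold.
y_true = [0, 0, 0, 1, 1, 1]
scores = [0.10, 0.20, 0.30, 0.35, 0.40, 0.45]

auc = roc_auc(y_true, scores)
acc = accuracy(y_true, scores)
```

Here the ranking is perfect while the thresholded predictions are not, because AU-ROC depends only on the ordering of the scores whereas accuracy depends on the chosen threshold.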

    Quantifying cognitive and mortality outcomes in older patients following acute illness using epidemiological and machine learning approaches

    Introduction: Cognitive and functional decompensation during acute illness in older people are poorly understood. It remains unclear how delirium, an acute confusional state reflective of cognitive decompensation, is contextualised by baseline premorbid cognition and relates to long-term adverse outcomes. High-dimensional machine learning offers a novel, feasible and enticing approach for stratifying acute illness in older people, improving treatment consistency while optimising future research design. Methods: First, longitudinal associations were analysed from the Delirium and Population Health Informatics Cohort (DELPHIC) study, a prospective cohort of Camden residents aged ≥70 years, with cognitive and functional ascertainment at baseline and 2-year follow-up, and daily assessments during incident hospitalisation. Second, using routine clinical data from UCLH, I constructed an extreme gradient-boosted trees model predicting 600-day mortality for unselected acute admissions of oldest-old patients, with mechanistic inferences. Third, hierarchical agglomerative clustering was performed to demonstrate structure within DELPHIC participants, with predictive implications for survival and length of stay. Results: i. Delirium is associated with increased rates of cognitive decline and mortality risk, in a dose-dependent manner, with an interaction between baseline cognition and delirium exposure. Those with highest delirium exposure but also best premorbid cognition have the “most to lose”. ii. High-dimensional multimodal machine learning models can predict mortality in oldest-old populations with 0.874 accuracy. The anterior cingulate and angular gyri, and extracranial soft tissue, are the highest contributory intracranial and extracranial features respectively. iii. Clinically useful acute illness subtypes in older people can be described using longitudinal clinical, functional, and biochemical features. 
Conclusions: Interactions between baseline cognition and delirium exposure during acute illness in older patients result in divergent long-term adverse outcomes. Supervised machine learning can robustly predict mortality in oldest-old patients, producing a valuable prognostication tool using routinely collected data, ready for clinical deployment. Preliminary findings suggest possible discernible subtypes within acute illness in older people

    Predictive analytics applied to Alzheimer’s disease : a data visualisation framework for understanding current research and future challenges

    Dissertation presented as a partial requirement for obtaining a master’s degree in Information Management, with a specialisation in Business Intelligence and Knowledge Management. Big Data is nowadays regarded as a tool for improving the healthcare sector in many areas: on the economic side, by searching for operational efficiency gaps, and in personalised treatment, by selecting the best drug for the patient, for instance. Data science can play a key role in identifying diseases at an early stage, or even before there are any signs of them, tracking their progress, quickly identifying the efficacy of treatments and suggesting alternatives. The preventive side of healthcare can therefore be enhanced with state-of-the-art predictive big data analytics and machine learning methods that integrate the available complex, heterogeneous, yet sparse data from multiple sources, towards better identification of disease and pathology patterns. These methods can be applied to neurodegenerative disorders that are challenging to diagnose; identifying the patterns that trigger those disorders can make it possible to identify more risk factors and biomarkers in every human being. With that, we can improve the effectiveness of medical interventions, helping people to stay healthy and active for longer. This work reviews the state of the science of predictive big data analytics as applied to the early diagnosis of Alzheimer’s Disease. It does so by searching and summarising the scientific articles published in reputable online sources, bringing together information that is spread across the World Wide Web, with the goal of enhancing knowledge management and collaboration practices on the topic. 
Furthermore, an interactive data visualisation tool to better manage and identify the scientific articles was developed, delivering a holistic visual overview of the developments in the important field of Alzheimer’s Disease diagnosis.

    Artificial intelligence for dementia research methods optimization

    Artificial intelligence (AI) and machine learning (ML) approaches are increasingly being used in dementia research. However, several methodological challenges exist that may limit the insights we can obtain from high-dimensional data and our ability to translate these findings into improved patient outcomes. To improve reproducibility and replicability, researchers should make their well-documented code and modeling pipelines openly available. Data should also be shared where appropriate. To enhance the acceptability of models and AI-enabled systems to users, researchers should prioritize interpretable methods that provide insights into how decisions are generated. Models should be developed using multiple, diverse datasets to improve robustness, generalizability, and reduce potentially harmful bias. To improve clarity and reproducibility, researchers should adhere to reporting guidelines that are co-produced with multiple stakeholders. If these methodological challenges are overcome, AI and ML hold enormous promise for changing the landscape of dementia research and care. HIGHLIGHTS: Machine learning (ML) can improve diagnosis, prevention, and management of dementia. Inadequate reporting of ML procedures affects reproduction/replication of results. ML models built on unrepresentative datasets do not generalize to new datasets. Obligatory metrics for certain model structures and use cases have not been defined. Interpretability and trust in ML predictions are barriers to clinical translation