Unsupervised learning methods for identifying and evaluating disease clusters in electronic health records
Introduction
Clustering algorithms discover groups of observations in complex data and are often used to identify subtypes of heterogeneous diseases in electronic health records (EHR). Evaluating clustering experiments for biological and clinical significance is a vital but challenging task, owing to the lack of consensus on best practices. As a result, the translation of findings from clustering experiments into clinical practice is limited.
Aim
The aim of this thesis was to investigate and evaluate approaches that enable the evaluation of
clustering experiments using EHR.
Methods
We conducted a scoping review of clustering studies in EHR to identify common evaluation approaches. We systematically investigated the performance of the identified approaches using a cohort of Alzheimer's Disease (AD) patients as an exemplar, comparing four clustering methods (K-means, Kernel K-means, Affinity Propagation and Latent Class Analysis). Using the same population, we developed and evaluated a method (MCHAMMER) that tests whether clusterable structures exist in EHR. To develop this method we tested several cluster validation indices and methods of generating null data to determine which best detect clusters. To enable robust benchmarking of evaluation approaches, we created a tool that generates synthetic EHR data containing known cluster labels across a range of clustering scenarios.
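The comparison of several clustering algorithms on one cohort, scored by a validation index, follows a standard pattern. A minimal illustrative sketch (not the thesis code) in scikit-learn, with synthetic data standing in for the non-public AD cohort and silhouette score as the index:

```python
# Sketch: compare clustering algorithms on one dataset by silhouette score.
# Synthetic blobs stand in for the (non-public) AD cohort features.
from sklearn.cluster import AffinityPropagation, KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

X, _ = make_blobs(n_samples=300, centers=4, cluster_std=1.5, random_state=0)

results = {}
for name, model in [
    ("k-means (k=4)", KMeans(n_clusters=4, n_init=10, random_state=0)),
    ("affinity propagation", AffinityPropagation(random_state=0)),
]:
    labels = model.fit_predict(X)
    if len(set(labels)) > 1:  # silhouette is undefined for a single cluster
        results[name] = silhouette_score(X, labels)

best = max(results, key=results.get)
print(best, round(results[best], 3))
```

The same loop extends to Kernel K-means or Latent Class Analysis given implementations with a `fit_predict`-style interface.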
Results
Across 67 EHR clustering studies, the most popular evaluation approach was comparing cluster results across multiple algorithms (30% of studies). We examined this approach by conducting a clustering experiment on a population of 10,065 AD patients with 21 demographic, symptom and comorbidity features. K-means found 5 clusters, Kernel K-means found 2, Affinity Propagation found 5 and Latent Class Analysis found 6. K-means was found to have the best clustering solution, with the highest silhouette score (0.19), and was more predictive of outcomes. The five clusters found were: typical AD (n=2026), non-typical AD (n=1640), a cardiovascular disease cluster (n=686), a cancer cluster (n=1710) and a cluster of mental health issues, smoking and early disease onset (n=1528), which has been found in previous research as well as in the results of the other clustering methods. We created a synthetic data generation tool which allows the generation of realistic EHR clusters that can vary in separation and number of noise variables to alter the difficulty of the clustering problem. We found that decreasing cluster separation increased clustering difficulty significantly, whereas adding noise variables increased difficulty but not significantly. To develop the tool for assessing cluster existence, we tested different methods of null dataset generation and cluster validation indices; the best performing null dataset method was the min-max method, and the best performing indices were the Calinski-Harabasz index, which had an accuracy of 94%, the Davies-Bouldin index (97%), silhouette score (93%) and the BWC index (90%). We further found that clusters identified using the Calinski-Harabasz index were more likely to have significantly different outcomes between clusters. Lastly, we repeated the initial clustering experiment, comparing 10 different pre-processing methods. The three best performing methods were RBF kernel (2 clusters), MCA (4 clusters) and MCA with PCA (6 clusters). The MCA approach gave the best results, with the highest silhouette score (0.23) and meaningful clusters, producing 4 clusters: heart and circulatory (n=1379), early-onset mental health (n=1761), a male cluster with memory loss (n=1823) and a female cluster with more health problems (n=2244).
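The min-max null approach to testing cluster existence can be sketched as follows. This is an illustrative reconstruction in the spirit of MCHAMMER, not its implementation: it assumes the null data are drawn uniformly within each feature's observed range and that a clusterable structure shows as an observed index well above the null distribution.

```python
# Sketch: null-data cluster-existence test (min-max null, Calinski-Harabasz).
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import calinski_harabasz_score

def ch_after_kmeans(X, k, seed=0):
    """Cluster with k-means, then score the partition."""
    labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(X)
    return calinski_harabasz_score(X, labels)

def minmax_null(X, rng):
    """Uniform sampling within each feature's observed [min, max] range."""
    return rng.uniform(X.min(axis=0), X.max(axis=0), size=X.shape)

rng = np.random.default_rng(0)
X, _ = make_blobs(n_samples=200, centers=3, random_state=1)

observed = ch_after_kmeans(X, k=3)
null_scores = [ch_after_kmeans(minmax_null(X, rng), k=3, seed=i) for i in range(20)]
# Empirical p-value analogue: fraction of null datasets scoring at least as high
p_like = float(np.mean([s >= observed for s in null_scores]))
print(round(observed, 1), round(float(np.mean(null_scores)), 1), p_like)
```

Swapping `calinski_harabasz_score` for the Davies-Bouldin or silhouette scores reproduces the index comparison described in the results.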
Conclusion
We have developed and tested a series of methods and tools to enable the evaluation of EHR clustering experiments. We developed and proposed a novel cluster evaluation metric and provided a tool for benchmarking evaluation approaches in synthetic but realistic EHR data.
Processing of Electronic Health Records using Deep Learning: A review
The availability of large amounts of clinical data is opening up new research avenues in a number of fields. An exciting field in this respect is healthcare, where secondary use of healthcare data is beginning to revolutionize care. Beyond the availability of Big Data itself, both medical data from healthcare institutions (such as EMR data) and data generated by health and wellbeing devices (such as personal trackers), a significant contribution to this trend is also being made by recent advances in machine learning, specifically deep learning algorithms.
Artificial intelligence approaches to predicting and detecting cognitive decline in older adults: A conceptual review.
Preserving cognition and mental capacity is critical to aging with autonomy. Early detection of pathological cognitive decline facilitates the greatest impact of restorative or preventative treatments. Artificial Intelligence (AI) in healthcare is the use of computational algorithms that mimic human cognitive functions to analyze complex medical data. AI technologies like machine learning (ML) support the integration of biological, psychological, and social factors when approaching diagnosis, prognosis, and treatment of disease. This paper serves to acquaint clinicians and other stakeholders with the use, benefits, and limitations of AI for predicting, diagnosing, and classifying mild and major neurocognitive impairments, by providing a conceptual overview of this topic with emphasis on the features explored and AI techniques employed. We present studies that fell into six categories of features used for these purposes: (1) sociodemographics; (2) clinical and psychometric assessments; (3) neuroimaging and neurophysiology; (4) electronic health records and claims; (5) novel assessments (e.g., sensors for digital data); and (6) genomics/other omics. For each category we provide examples of AI approaches, including supervised and unsupervised ML, deep learning, and natural language processing. AI technology, still nascent in healthcare, has great potential to transform the way we diagnose and treat patients with neurocognitive disorders
Social and behavioral determinants of health in the era of artificial intelligence with electronic health records: A scoping review
Background: There is growing evidence that social and behavioral determinants of health (SBDH) have a substantial effect on a wide range of health outcomes.
Electronic health records (EHRs) have been widely employed to conduct
observational studies in the age of artificial intelligence (AI). However,
there has been little research into how to make the most of SBDH information
from EHRs. Methods: A systematic search was conducted in six databases to find
relevant peer-reviewed publications that had recently been published. Relevance
was determined by screening and evaluating the articles. Based on selected
relevant studies, a methodological analysis of AI algorithms leveraging SBDH
information in EHR data was provided. Results: Our synthesis was driven by an
analysis of SBDH categories, the relationship between SBDH and
healthcare-related statuses, and several NLP approaches for extracting SBDH
from clinical text. Discussion: The associations between SBDH and health
outcomes are complicated and diverse; several pathways may be involved. Using
Natural Language Processing (NLP) technology to support the extraction of SBDH
and other clinical ideas simplifies the identification and extraction of
essential concepts from clinical data, efficiently unlocks unstructured data,
and aids in the resolution of unstructured data-related issues. Conclusion:
Despite known associations between SBDH and disease, SBDH factors are rarely
investigated as interventions to improve patient outcomes. Gaining knowledge
about SBDH and how SBDH data can be collected from EHRs using NLP approaches
and predictive models improves the chances of influencing health policy change
for patient wellness, and ultimately promoting health and health equity.
Keywords: Social and Behavioral Determinants of Health, Artificial
Intelligence, Electronic Health Records, Natural Language Processing,
Predictive Models
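As a toy illustration of the kind of NLP-based SBDH extraction this review surveys, a minimal rule-based extractor for smoking status might look like the following. The patterns are illustrative assumptions, not a validated lexicon; published systems use trained NLP models.

```python
# Toy sketch: rule-based SBDH extraction from clinical notes (smoking status).
# The regex patterns are illustrative only, not a validated clinical lexicon.
import re

PATTERNS = [
    ("never_smoker", re.compile(r"\b(never smoked|non-?smoker|denies smoking)\b", re.I)),
    ("former_smoker", re.compile(r"\b(former smoker|quit smoking|ex-?smoker)\b", re.I)),
    ("current_smoker", re.compile(r"\b(current smoker|active smoker)\b", re.I)),
]

def smoking_status(note: str) -> str:
    """Return the first matching smoking-status label, or 'unknown'."""
    for label, pattern in PATTERNS:
        if pattern.search(note):
            return label
    return "unknown"

print(smoking_status("Patient is a former smoker, quit smoking in 2015."))
print(smoking_status("Denies smoking; drinks socially."))
```

In practice such rules serve only as a baseline against the supervised and deep-learning extractors the review discusses.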
Improved Alzheimer’s disease detection by MRI using multimodal machine learning algorithms
Dementia is one of the major public health challenges worldwide, and it generally occurs in older adults (age > 60). There are currently no effective drugs to cure the disease, which progressively impairs memory and diminishes the capacity to perform day-to-day activities. Health experts and computing scientists have been researching this problem for the past twenty years. There remains an immediate need to identify the characteristics that can support the detection of dementia.
The motivation behind the work presented in this thesis is to propose sophisticated supervised machine learning models for the prediction and classification of AD in older people. To that end, we conducted experiments on open-access brain imaging data, including demographic and MRI data from 373 scan sessions of 150 patients. In the first two studies, we applied single ML models, support vector machines (SVM) and pruned decision trees, to the prediction of dementia on the same dataset. In the first experiment, with SVM, we achieved 70% prediction accuracy for late-stage dementia, with a precision of 75% for classifying true dementia subjects. In the second experiment, with J48 pruned decision trees, the accuracy improved to 88.73% and the precision for true dementia cases reached 92.4%.
To extend this work, we moved from single models to multi-model approaches. In a comparative machine learning study, we applied the feature reduction technique principal component analysis (PCA), which identifies the highly correlated features in the dataset that are closely associated with dementia type. By applying three models (KNN, LR and SVM) simultaneously, it was possible to identify an ideal model for the classification of dementia subjects. Compared with support vectors, the KNN and LR models classified AD subjects with 97.6% and 98.3% accuracy respectively, values considerably higher than in the previous experiments.
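The PCA-plus-multi-model comparison described above follows a standard pattern. A minimal sketch in scikit-learn, with a public dataset standing in for the thesis's MRI features and the number of components chosen arbitrarily:

```python
# Sketch: PCA feature reduction, then KNN, LR and SVM compared by held-out
# accuracy. Public data stands in for the thesis's demographic/MRI features.
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0, stratify=y)

accuracies = {}
for name, clf in [("KNN", KNeighborsClassifier()),
                  ("LR", LogisticRegression(max_iter=5000)),
                  ("SVM", SVC())]:
    # Scale, reduce to principal components, then fit the classifier
    pipe = make_pipeline(StandardScaler(), PCA(n_components=10), clf)
    accuracies[name] = pipe.fit(X_tr, y_tr).score(X_te, y_te)

print({k: round(v, 3) for k, v in accuracies.items()})
```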
However, given the severity of AD in older adults, it is essential not to miss true AD positives. To improve the classification of true AD subjects among all subjects, we enhanced model accuracy through three independent experiments, incorporating two further models, Naïve Bayes (NB) and Artificial Neural Networks, alongside support vectors and KNN. In the first experiment, the models were developed independently with manual feature selection; the outcome suggested that KNN was the optimal model, with 91.32% classification accuracy. In the second experiment, the same models were tested with a limited set of highly correlated features: SVM produced a high classification accuracy of 96.12%, and NB a 98.21% classification rate for true AD subjects. Ultimately, in the third experiment, we combined these four models into a new hybrid model. Hybrid model performance was validated by an AUC-ROC of 0.991 (i.e., 99.1%). These experimental results suggest that an ensemble modelling approach with wrapping is an optimal solution for the classification of AD subjects.
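The hybrid model can be read as an ensemble over the four base learners. A hedged sketch, assuming soft voting as the combination rule and using public data in place of the MRI dataset:

```python
# Sketch: a hybrid (soft-voting) ensemble over SVM, KNN, naive Bayes and a
# neural network, validated by AUC-ROC. The combination rule is an assumption;
# public data stands in for the thesis's MRI features.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import VotingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0, stratify=y)

hybrid = make_pipeline(StandardScaler(), VotingClassifier(
    estimators=[("svm", SVC(probability=True)),
                ("knn", KNeighborsClassifier()),
                ("nb", GaussianNB()),
                ("ann", MLPClassifier(max_iter=2000, random_state=0))],
    voting="soft"))  # average predicted probabilities across models

hybrid.fit(X_tr, y_tr)
auc = roc_auc_score(y_te, hybrid.predict_proba(X_te)[:, 1])
print(round(auc, 3))
```

Soft voting averages class probabilities, so each base estimator must expose `predict_proba` (hence `probability=True` on the SVM).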
Quantifying cognitive and mortality outcomes in older patients following acute illness using epidemiological and machine learning approaches
Introduction:
Cognitive and functional decompensation during acute illness in older people are poorly understood. It remains unclear how delirium, an acute confusional state reflective of cognitive decompensation, is contextualised by baseline premorbid cognition and relates to long-term adverse outcomes. High-dimensional machine learning offers a novel, feasible and enticing approach for stratifying acute illness in older people, improving treatment consistency while optimising future research design.
Methods:
Longitudinal associations were analysed in the Delirium and Population Health Informatics Cohort (DELPHIC) study, a prospective cohort of people aged ≥70 years resident in Camden, with cognitive and functional ascertainment at baseline and 2-year follow-up, and daily assessments during incident hospitalisation. Second, using routine clinical data from UCLH, I constructed an extreme gradient-boosted trees model predicting 600-day mortality for unselected acute admissions of oldest-old patients, with mechanistic inferences. Third, hierarchical agglomerative clustering was performed to demonstrate structure within DELPHIC participants, with predictive implications for survival and length of stay.
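The mortality model in the second analysis can be sketched with scikit-learn's gradient boosting standing in for the extreme gradient boosting (XGBoost) used in the thesis, on synthetic data in place of the UCLH records:

```python
# Sketch: gradient-boosted trees predicting a binary mortality outcome from
# routine clinical features. Synthetic stand-in: rows are admissions, columns
# are clinical variables, y is 600-day mortality (1 = died).
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, n_informative=8,
                           weights=[0.7], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0, stratify=y)

model = GradientBoostingClassifier(random_state=0).fit(X_tr, y_tr)
auc = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
print(round(auc, 3))
```

For the mechanistic inferences mentioned above, a fitted tree ensemble like this is typically probed with feature attribution methods (e.g., SHAP values) rather than read directly.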
Results:
i. Delirium is associated with increased rates of cognitive decline and mortality risk, in a dose-dependent manner, with an interaction between baseline cognition and delirium exposure. Those with highest delirium exposure but also best premorbid cognition have the “most to lose”.
ii. High-dimensional multimodal machine learning models can predict mortality in oldest-old populations with 0.874 accuracy. The anterior cingulate and angular gyri, and extracranial soft tissue, are the highest contributory intracranial and extracranial features respectively.
iii. Clinically useful acute illness subtypes in older people can be described using longitudinal clinical, functional, and biochemical features.
Conclusions:
Interactions between baseline cognition and delirium exposure during acute illness in older patients result in divergent long-term adverse outcomes. Supervised machine learning can robustly predict mortality in oldest-old patients, producing a valuable prognostication tool using routinely collected data, ready for clinical deployment. Preliminary findings suggest possible discernible subtypes within acute illness in older people.
Predictive analytics applied to Alzheimer’s disease : a data visualisation framework for understanding current research and future challenges
Dissertation as a partial requirement for obtaining a master's degree in Information Management, with a specialisation in Business Intelligence and Knowledge Management. Big Data is nowadays regarded as a tool for improving the healthcare sector in many areas, such as on its economic side, by searching for operational efficiency gaps, and in personalised treatment, by selecting the best drug for each patient, for instance. Data science can play a key role in identifying diseases at an early stage, or even before there are signs of them, tracking their progress, quickly identifying the efficacy of treatments and suggesting alternatives. The preventive side of healthcare can therefore be enhanced with state-of-the-art predictive big data analytics and machine learning methods, integrating the available complex, heterogeneous, yet sparse data from multiple sources towards better identification of disease and pathology patterns. This can be applied to neurodegenerative disorders, which remain challenging to diagnose; identifying the patterns that trigger these disorders may make it possible to identify more risk factors and biomarkers in every human being. With that, we can improve the effectiveness of medical interventions, helping people to stay healthy and active for longer. In this work, a review of the state of science on predictive big data analytics is conducted, concerning its application to the early diagnosis of Alzheimer's Disease. It is done by searching and summarising the scientific articles published in reputable online sources, bringing together information that is spread across the world wide web, with the goal of enhancing knowledge management and collaboration practices on the topic.
Furthermore, an interactive data visualisation tool to better manage and identify the scientific articles was developed, delivering a holistic visual overview of the developments in the important field of Alzheimer's Disease diagnosis.
Artificial intelligence for dementia research methods optimization
Artificial intelligence (AI) and machine learning (ML) approaches are increasingly being used in dementia research. However, several methodological challenges exist that may limit the insights we can obtain from high-dimensional data and our ability to translate these findings into improved patient outcomes. To improve reproducibility and replicability, researchers should make their well-documented code and modeling pipelines openly available. Data should also be shared where appropriate. To enhance the acceptability of models and AI-enabled systems to users, researchers should prioritize interpretable methods that provide insights into how decisions are generated. Models should be developed using multiple, diverse datasets to improve robustness and generalizability and to reduce potentially harmful bias. To improve clarity and reproducibility, researchers should adhere to reporting guidelines that are co-produced with multiple stakeholders. If these methodological challenges are overcome, AI and ML hold enormous promise for changing the landscape of dementia research and care.
HIGHLIGHTS:
Machine learning (ML) can improve diagnosis, prevention, and management of dementia.
Inadequate reporting of ML procedures affects reproduction/replication of results.
ML models built on unrepresentative datasets do not generalize to new datasets.
Obligatory metrics for certain model structures and use cases have not been defined.
Interpretability and trust in ML predictions are barriers to clinical translation.
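One common model-agnostic route to the interpretability this article calls for is permutation importance, which ranks features by how much shuffling each one degrades a fitted model's held-out score. A minimal sketch with illustrative data, not a dementia dataset:

```python
# Sketch: permutation importance as a model-agnostic interpretability check.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

data = load_breast_cancer()
X_tr, X_te, y_tr, y_te = train_test_split(data.data, data.target, random_state=0)

model = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)
result = permutation_importance(model, X_te, y_te, n_repeats=10, random_state=0)

# Report the three features whose permutation hurts held-out accuracy most
top = sorted(zip(data.feature_names, result.importances_mean),
             key=lambda t: t[1], reverse=True)[:3]
for name, score in top:
    print(name, round(score, 3))
```

Because it only needs predictions and a score, the same check applies unchanged to any of the model families discussed above.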