Synthetic Observational Health Data with GANs: from slow adoption to a boom in medical research and ultimately digital twins?
After being collected for patient care, Observational Health Data (OHD) can
further benefit patient well-being by sustaining the development of health
informatics and medical research. Vast potential is unexploited because of the
fiercely private nature of patient-related data and regulations to protect it.
Generative Adversarial Networks (GANs) have recently emerged as a
groundbreaking way to learn generative models that produce realistic synthetic
data. They have revolutionized practices in multiple domains such as
self-driving cars, fraud detection, digital twin simulations in industrial
sectors, and medical imaging.
The digital twin concept could readily apply to modelling and quantifying
disease progression. In addition, GANs possess many capabilities relevant to
common problems in healthcare: lack of data, class imbalance, rare diseases,
and preserving privacy. Unlocking open access to privacy-preserving OHD could
be transformative for scientific research. In the midst of COVID-19, the
healthcare system is facing unprecedented challenges, many of which are
data-related for the reasons stated above.
Considering these facts, publications concerning GANs applied to OHD seemed to
be severely lacking. To uncover the reasons for this slow adoption, we broadly
reviewed the published literature on the subject. Our findings show that the
properties of OHD were initially challenging for the existing GAN algorithms
(unlike medical imaging, for which state-of-the-art models were directly
transferable) and the evaluation of synthetic data lacked clear metrics.
We find more publications on the subject than expected, starting slowly in
2017 and appearing at an increasing rate since then. The difficulties of OHD remain, and
we discuss issues relating to evaluation, consistency, benchmarking, data
modelling, and reproducibility.
Comment: 31 pages (10 in previous version), not including references and glossary, 51 in total. Inclusion of a large number of recent publications and expansion of the discussion accordingly.
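The adversarial setup this abstract refers to can be sketched on a toy one-dimensional "health measurement". Everything below is a hypothetical illustration (a linear generator, a logistic-regression discriminator, hand-derived gradients); real OHD GANs use deep networks and handle mixed-type tabular records, but the alternating generator/discriminator update is the same idea.

```python
import numpy as np

rng = np.random.default_rng(0)

# Real data: a 1-D "health measurement" the generator must imitate.
REAL_MU, REAL_SD = 3.0, 0.5
def sample_real(n):
    return rng.normal(REAL_MU, REAL_SD, n)

def sigmoid(t):
    t = np.clip(t, -30.0, 30.0)  # avoid overflow in exp
    return 1.0 / (1.0 + np.exp(-t))

# Generator G(z) = a*z + b and discriminator D(x) = sigmoid(w*x + c).
a, b = 1.0, 0.0          # generator parameters
w, c = 0.1, 0.0          # discriminator parameters
lr, batch = 0.05, 64

for step in range(2000):
    # Discriminator update: ascend log D(real) + log(1 - D(fake)).
    x_real = sample_real(batch)
    z = rng.normal(size=batch)
    x_fake = a * z + b
    d_real, d_fake = sigmoid(w * x_real + c), sigmoid(w * x_fake + c)
    w += lr * (np.mean((1 - d_real) * x_real) - np.mean(d_fake * x_fake))
    c += lr * (np.mean(1 - d_real) - np.mean(d_fake))
    # Generator update: ascend the non-saturating objective log D(fake).
    z = rng.normal(size=batch)
    x_fake = a * z + b
    d_fake = sigmoid(w * x_fake + c)
    a += lr * np.mean((1 - d_fake) * w * z)
    b += lr * np.mean((1 - d_fake) * w)

samples = a * rng.normal(size=5000) + b
print(round(float(samples.mean()), 2))
```

With a monotone logistic discriminator the generator mainly matches the mean (variance matching is weak, a tiny illustration of the evaluation difficulties the review discusses).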
The effect of Google drive distance and duration in residential property in Sydney, Australia
© 2016 by World Scientific Publishing Co. Pte. Ltd. Accurately predicting the market value of a residential property without inspection by a professional valuer could benefit a variety of organisations and people. Building an Automated Valuation Model is worthwhile only if it is adequately accurate. This paper examined 47 machine learning models (linear and non-linear), fitted on 1,967 records of units from 19 suburbs of Sydney, Australia. The main aim of this paper is to compare the performance of these techniques on this data set and to investigate the effect of spatial information on valuation accuracy. The results demonstrated that the tree-based models eXtreme Gradient Boosting Linear, eXtreme Gradient Boosting Tree and Random Forest, respectively, have the best performance among the tested techniques, and that spatial information such as drive distance and duration to the CBD increases predictive model performance significantly.
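As a rough illustration of the comparison this paper describes, the sketch below fits linear and tree-based regressors on synthetic unit records, with and without a hypothetical drive-distance feature. All values and the price formula are invented; only the experimental shape (same data, models compared with and without spatial information) mirrors the paper.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(42)
n = 500
area = rng.uniform(40, 150, n)       # unit floor area, m^2 (synthetic)
beds = rng.integers(1, 5, n)         # bedrooms (synthetic)
drive_km = rng.uniform(1, 40, n)     # hypothetical drive distance to the CBD
# Toy price: closer to the CBD -> more expensive, with a nonlinear decay.
price = (5000 * area + 30000 * beds
         + 400000 * np.exp(-drive_km / 10) + rng.normal(0, 20000, n))

X_base = np.column_stack([area, beds])
X_spatial = np.column_stack([area, beds, drive_km])

for name, model in [("linear", LinearRegression()),
                    ("random forest", RandomForestRegressor(random_state=0)),
                    ("gradient boosting", GradientBoostingRegressor(random_state=0))]:
    r2_base = cross_val_score(model, X_base, price, cv=5).mean()
    r2_spatial = cross_val_score(model, X_spatial, price, cv=5).mean()
    print(f"{name}: R^2 without spatial {r2_base:.3f}, with spatial {r2_spatial:.3f}")
```

On data generated this way, adding the spatial feature lifts cross-validated R² for every model, which is the qualitative effect the paper reports.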
Predicting visual function from the measurements of retinal nerve fiber layer structure
Purpose: To develop and validate a method for predicting visual function from retinal nerve fibre layer (RNFL) structure in glaucoma.
Methods: RNFL thickness (RNFLT) measurements from GDxVCC scanning laser polarimetry (SLP) and visual field (VF) sensitivity from standard automated perimetry were made available from 535 eyes from three centres. In a training dataset, structure-function relationships were characterized using linear regression and a type of neural network: Radial Basis Function customised under a Bayesian framework (BRBF). These two models were used in a test dataset to 1) predict sensitivity values at individual VF locations from RNFLT measurements and 2) predict the spatial relationship between VF locations and positions at a peripapillary RNFLT measurement annulus. Predicted spatial relationships were compared with a published anatomical structure-function map.
Results: Compared with linear regression, BRBF yielded a nearly two-fold improvement (P<0.001; paired t-test) in performance of predicting VF sensitivity in the test dataset (mean absolute prediction error of 2.9dB (standard deviation (SD) 3.7dB) versus 4.9dB (SD 4.0dB)). The predicted spatial structure-function relationship accorded better (P<0.001; paired t-test) with anatomical prior knowledge when the BRBF was compared with the linear regression (median absolute angular difference of 15° versus 62°).
Conclusions: The BRBF generates clinically useful relationships that relate topographical maps of RNFL measurement to VF locations and allows the VF sensitivity to be predicted from structural measurements. This method may allow clinicians to evaluate structural and functional measures in the same domain. It could also be generalized to use other structural measures.
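scikit-learn has no Bayesian RBF network, so the sketch below uses kernel ridge regression with an RBF kernel as a non-Bayesian stand-in, compared against linear regression on synthetic thickness/sensitivity data with a floor effect. The sector counts, units and thresholds are invented; the point is only why a nonlinear structure-function model can outperform a linear one.

```python
import numpy as np
from sklearn.kernel_ridge import KernelRidge
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(7)
n = 600
# Hypothetical RNFL thickness in four peripapillary sectors (synthetic values).
rnflt = rng.uniform(20, 120, (n, 4))
# Toy structure-function link with a floor effect: sensitivity is only lost
# once mean thickness drops below a threshold, which a straight line fits poorly.
sensitivity = (0.5 * np.clip(rnflt.mean(axis=1) - 60, 0, None)
               + rng.normal(0, 0.5, n))

Xtr, Xte, ytr, yte = train_test_split(rnflt, sensitivity, random_state=0)
linear = LinearRegression().fit(Xtr, ytr)
rbf = KernelRidge(kernel="rbf", alpha=0.5, gamma=1e-3).fit(Xtr, ytr)

mae_lin = np.abs(linear.predict(Xte) - yte).mean()
mae_rbf = np.abs(rbf.predict(Xte) - yte).mean()
print(f"MAE: linear {mae_lin:.2f} dB, RBF {mae_rbf:.2f} dB")
```

The RBF model's lower mean absolute error on the held-out set echoes, in miniature, the roughly two-fold improvement the study reports for the BRBF.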
Improved Alzheimer’s disease detection by MRI using multimodal machine learning algorithms
Dementia is one of the major medical problems challenging the public health
sector around the world, and it generally occurs in older adults (age > 60).
There are currently no drugs that cure this disease, which progressively
impairs individual memory and diminishes the capacity to perform daily
activities. Health experts and computing scientists have been researching this
issue for the past twenty years. Consequently, there is an immediate need to
identify the characteristics that can support the detection of dementia.
The motive behind the work presented in this thesis is to propose
sophisticated supervised machine learning models for the prediction and
classification of AD in elderly people. To that end, we conducted several
experiments on open-access brain imaging data, comprising demographic and MRI
data from 373 scan sessions of 150 patients. In the first two works, we applied
single ML models, support vector machines (SVM) and pruned decision trees, to
the prediction of dementia on the same dataset. In the first experiment, with
SVM, we achieved 70% prediction accuracy for late-stage dementia, with a
precision of 75% for true dementia subjects. In the second experiment, with J48
pruned decision trees, the accuracy improved to 88.73%, and true dementia cases
were classified with 92.4% precision.
To enhance this work, we moved from single models to multi-model approaches.
In a comparative machine learning study, we applied the feature reduction
technique principal component analysis (PCA), which identifies the highly
correlated features in the dataset that are closely associated with dementia
type. By applying three models (KNN, LR, and SVM) simultaneously, it was
possible to identify an ideal model for the classification of dementia
subjects. Compared with support vector machines, the KNN and LR models
classified AD subjects with 97.6% and 98.3% accuracy respectively, values
relatively higher than in the previous experiments.
However, because of the severity of AD in older adults, it is essential not to
miss true AD positives. To improve the classification of true AD subjects among
all subjects, we enhanced the models through three further independent
experiments, incorporating two new models, Naïve Bayes (NB) and Artificial
Neural Networks, alongside SVM and KNN. In the first experiment, the models
were developed independently with manual feature selection; the outcome
suggested that KNN is the optimal model, with 91.32% classification accuracy.
In the second experiment, the same models were tested with a limited set of
highly correlated features: SVM produced a high classification accuracy of
96.12%, and NB a 98.21% classification rate for true AD subjects. Ultimately,
in the third experiment, we combined these four models into a hybrid model,
whose performance was validated with an AUC-ROC of 0.991. All these
experimental results suggest that an ensemble modelling approach with wrapping
is an optimal solution for the classification of AD subjects.
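The PCA-plus-multi-model workflow described above can be sketched as follows. The data are synthetic stand-ins (the real study used 373 scan sessions; here `make_classification` fakes a matrix of comparable size), and a soft-voting ensemble stands in for the thesis's hybrid model, which may differ in its combination rule.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Stand-in data: 373 "scan sessions", 10 features, demented vs non-demented.
X, y = make_classification(n_samples=373, n_features=10, n_informative=5,
                           random_state=0)
Xtr, Xte, ytr, yte = train_test_split(X, y, stratify=y, random_state=0)

def pca_model(clf):
    # Scale, reduce to 5 principal components, then classify.
    return make_pipeline(StandardScaler(), PCA(n_components=5), clf)

models = {"KNN": pca_model(KNeighborsClassifier()),
          "LR": pca_model(LogisticRegression(max_iter=1000)),
          "SVM": pca_model(SVC(probability=True))}
for name, m in models.items():
    print(name, round(m.fit(Xtr, ytr).score(Xte, yte), 3))

# "Hybrid" model: soft-voting ensemble over the three base pipelines.
hybrid = VotingClassifier([(k, v) for k, v in models.items()], voting="soft")
hybrid_acc = hybrid.fit(Xtr, ytr).score(Xte, yte)
print("hybrid", round(hybrid_acc, 3))
```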
Machine Learning for Diabetes and Mortality Risk Prediction From Electronic Health Records
Data science can provide invaluable tools to better exploit healthcare data to improve patient outcomes and increase cost-effectiveness. Today, electronic health records (EHR) systems provide a fascinating array of data that data science applications can use to revolutionise the healthcare industry. Utilising EHR data to improve the early diagnosis of a variety of medical conditions/events is a rapidly developing area that, if successful, can help to improve healthcare services across the board. Specifically, as Type-2 Diabetes Mellitus (T2DM) represents one of the most serious threats to health across the globe, analysing the huge volumes of data provided by EHR systems to investigate approaches for accurately predicting the onset of T2DM at an early stage, and medical events such as in-hospital mortality, are two of the most important challenges data science currently faces. The present thesis addresses these challenges by examining the research gaps in the existing literature, pinpointing the uninvestigated areas, and proposing novel machine learning models that address the difficulties inherent in EHR data.
To achieve these aims, the present thesis firstly introduces a unique and large EHR dataset collected from Saudi Arabia. Then we investigate the use of state-of-the-art machine learning predictive models that exploit this dataset for diabetes diagnosis and the early identification of patients with pre-diabetes by predicting the blood levels of one of the main indicators of diabetes and pre-diabetes: elevated Glycated Haemoglobin (HbA1c) levels. A novel collaborative denoising autoencoder (Col-DAE) framework is adopted to predict diabetic (high) HbA1c levels. We also employ several machine learning approaches (random forest, logistic regression, support vector machine, and multilayer perceptron) for the identification of patients with pre-diabetes (elevated HbA1c levels). The models employed demonstrate that a patient's risk of diabetes/pre-diabetes can be reliably predicted from EHR records.
We then extend this work to include pioneering adoption of recent technologies to investigate the outcomes of the predictive models employed by using recent explainable methods. This work also investigates the effect of using longitudinal data and more of the features available in the EHR systems on the performance and features ranking of the employed machine learning models for predicting elevated HbA1c levels in non-diabetic patients. This work demonstrates that longitudinal data and available EHR features can improve the performance of the machine learning models and can affect the relative order of importance of the features.
Secondly, we develop a machine learning model for the early and accurate prediction of in-hospital mortality events for such patients utilising EHR data. This work investigates a novel application of the Stacked Denoising Autoencoder (SDA) to predict in-hospital patient mortality risk. In doing so, we demonstrate how our approach uniquely overcomes the issues associated with imbalanced datasets to which existing solutions are subject. The proposed model, using clinical patient data on a variety of health conditions and without intensive feature engineering, is demonstrated to achieve robust and promising results using EHR patient data recorded during the first 24 hours after admission.
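The denoising-autoencoder idea underlying both Col-DAE and the SDA (corrupt the input, train a bottlenecked network to reconstruct the clean signal) can be sketched with a linear-activation MLPRegressor on synthetic correlated features. This is not the thesis's architecture, only the principle; the feature matrix and noise levels are invented.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(1)
# Stand-in "EHR" feature matrix: 8 correlated measurements driven by
# 2 latent factors (all values hypothetical).
n, d = 1000, 8
latent = rng.normal(size=(n, 2))
W = rng.normal(size=(2, d))
X = latent @ W + 0.1 * rng.normal(size=(n, d))

# Denoising autoencoder idea: corrupt the input, then train a network with a
# narrow bottleneck to reconstruct the clean signal from the corrupted copy.
X_noisy = X + 0.5 * rng.normal(size=X.shape)
dae = MLPRegressor(hidden_layer_sizes=(2,), activation="identity",
                   solver="lbfgs", max_iter=2000, random_state=0)
dae.fit(X_noisy, X)

recon = dae.predict(X_noisy)
err_dae = np.mean((recon - X) ** 2)   # error after denoising
err_raw = np.mean((X_noisy - X) ** 2) # error of the corrupted input itself
print(f"reconstruction MSE {err_dae:.3f} vs noisy-input MSE {err_raw:.3f}")
```

Because the bottleneck forces the network onto the low-dimensional structure shared across features, reconstruction error drops well below the raw corruption level, the property that makes such models useful on noisy clinical records.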
Developing an ML pipeline for asthma and COPD: The case of a Dutch primary care service
A complex combination of clinical, demographic and lifestyle parameters determines the correct diagnosis and the most effective treatment for asthma and Chronic Obstructive Pulmonary Disease (COPD) patients. Artificial Intelligence techniques help clinicians devise the correct diagnosis and design the most suitable clinical pathway accordingly, tailored to the specific patient's condition. For machine learning (ML) approaches, the availability of real-world patient clinical data to train and evaluate the ML pipeline intended to assist clinicians in their daily practice is crucial. However, it is common practice to exploit either synthetic data sets or heavily preprocessed collections obtained by cleaning and merging different data sources. In this paper, we describe an automated ML pipeline designed for a real-world data set of patients from a Dutch primary care service, and provide a performance comparison of different prediction models for (i) assessing various clinical parameters, (ii) designing interventions, and (iii) defining the diagnosis.
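A hedged sketch of what such an automated pipeline can look like in scikit-learn: imputation, scaling and one-hot encoding are bundled with the classifier, and model selection runs over the whole pipeline. The column names and toy records below are invented stand-ins, not the Dutch primary care data set.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical primary-care records: names and values are illustrative only.
df = pd.DataFrame({
    "age": [34, 61, 47, 55, 29, 70, 44, 58] * 10,
    "fev1_fvc": [0.82, 0.61, 0.75, 0.66, 0.85, 0.58, 0.70, 0.64] * 10,
    "smoker": ["no", "yes", "no", "yes", "no", "yes", "yes", "no"] * 10,
    "diagnosis": ["asthma", "copd", "asthma", "copd",
                  "asthma", "copd", "copd", "asthma"] * 10,
})
X, y = df.drop(columns="diagnosis"), df["diagnosis"]

# Preprocessing is part of the pipeline, so it is refit inside every CV fold.
prep = ColumnTransformer([
    ("num", Pipeline([("impute", SimpleImputer()),
                      ("scale", StandardScaler())]), ["age", "fev1_fvc"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["smoker"]),
])
pipe = Pipeline([("prep", prep),
                 ("clf", RandomForestClassifier(random_state=0))])

# Lightweight automated model selection over the whole pipeline.
search = GridSearchCV(pipe, {"clf__n_estimators": [50, 100]}, cv=3)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 2))
```

Keeping preprocessing inside the pipeline is what makes this approach viable on raw, real-world records instead of pre-cleaned collections.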
A Bayesian graph embedding model for link-based classification problems
In recent years, the analysis of human interaction data has led to the rapid development of graph embedding methods. For link-based classification problems, topological information typically enters machine learning tasks in the form of embedded vectors or convolution kernels. This paper introduces a Bayesian graph embedding model for such problems, integrating network reconstruction, link prediction, and behavior prediction into a unified framework. Unlike existing graph embedding methods, this model does not embed the topology of nodes or links into a low-dimensional space; instead, it sorts the probabilities of upcoming links and fuses node topology and data-domain information via this sorting. The new model integrates supervised transaction predictors with unsupervised link prediction models, summarizing local and global topological information. Experimental results on a financial trading dataset and a retweet network dataset demonstrate that the proposed feature fusion model outperforms the benchmarked machine learning algorithms in precision, recall, and F1-measure. The proposed learning structure makes a fundamental methodological contribution and can be extended to various link-based classification problems in different fields.
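The ranking step, scoring and sorting candidate links rather than embedding nodes, can be illustrated with the simplest possible link-prediction score, common neighbours, on a toy graph. The paper's model fuses far more information (Bayesian probabilities, data-domain features); this only shows the sort-by-link-score idea, with a hypothetical six-node network.

```python
from itertools import combinations

# Toy undirected network (hypothetical user IDs).
edges = {(1, 2), (1, 3), (2, 3), (2, 4), (3, 4), (4, 5), (5, 6), (4, 6)}
nodes = {u for e in edges for u in e}
adj = {u: set() for u in nodes}
for u, v in edges:
    adj[u].add(v)
    adj[v].add(u)

# Score every absent pair by its number of common neighbours, then rank:
# the candidate links most embedded in shared neighbourhoods come first.
candidates = [p for p in combinations(sorted(nodes), 2)
              if p not in edges and (p[1], p[0]) not in edges]
scores = {(u, v): len(adj[u] & adj[v]) for u, v in candidates}
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking[0], scores[ranking[0]])  # → (1, 4) 2
```

Here the pair (1, 4) tops the ranking because nodes 1 and 4 share two neighbours; a downstream classifier would consume such a ranking rather than raw embeddings.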