Comparison of Different Machine Learning and Self-Learning Methods for Predicting Obesity on Generalized and Gender-Segregated Data
Obesity is a global health concern with long-term implications. Our research applies several machine learning models to an obesity dataset in order to predict obesity and support efforts to reduce it: Random Forest, XGBT (Extreme Gradient Boosting), Decision Tree, k-Nearest Neighbors, Support Vector Machine, Linear Regression, Naïve Bayes, and a Multilayer Perceptron neural network. The models are evaluated on recall, accuracy, F1-score, and precision. The findings reveal the performance of the algorithms on generalised and gender-segregated data, providing insights concerning feature selection and early obesity identification. This research aims to present a comparative study of obesity prediction for gender-neutral and gender-specific datasets.
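The four evaluation metrics named in this abstract, and the idea of scoring a model separately on each gender, can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function names `binary_metrics` and `metrics_by_group`, the label encoding (1 = obese, 0 = not obese) and the toy data are all assumptions made for the example.

```python
from collections import Counter, defaultdict


def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 for a single positive class."""
    counts = Counter()
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            counts["tp" if truth == positive else "fp"] += 1
        else:
            counts["fn" if truth == positive else "tn"] += 1
    tp, fp, fn, tn = (counts[k] for k in ("tp", "fp", "fn", "tn"))
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    accuracy = (tp + tn) / total if total else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def metrics_by_group(y_true, y_pred, groups):
    """Compute the same metrics separately for each group label."""
    buckets = defaultdict(lambda: ([], []))
    for truth, pred, group in zip(y_true, y_pred, groups):
        buckets[group][0].append(truth)
        buckets[group][1].append(pred)
    return {g: binary_metrics(t, p) for g, (t, p) in buckets.items()}


# Hypothetical labels and predictions, invented purely for illustration.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]
gender = ["F", "F", "M", "M", "F", "M", "M", "F"]

overall = binary_metrics(y_true, y_pred)
by_group = metrics_by_group(y_true, y_pred, gender)
```

Comparing `overall` with the per-group entries in `by_group` shows how a single pooled score can hide differences between subgroups, which is the kind of comparison a generalised versus gender-segregated evaluation makes visible.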
Identifying undetected dementia in UK primary care patients: a retrospective case-control study comparing machine-learning and standard epidemiological approaches
Background
Identifying dementia early, using real-world data, is a public health challenge. As only two-thirds of people with dementia now ultimately receive a formal diagnosis in United Kingdom health systems and many receive it late in the disease process, there is ample room for improvement. The policy of the UK government and National Health Service (NHS) is to increase rates of timely dementia diagnosis. We used data from general practice (GP) patient records to create a machine-learning model to identify patients who have or who are developing dementia, but whose condition is currently undetected by the GP.
Methods
We used electronic patient records from Clinical Practice Research Datalink (CPRD). Using a case-control design, we selected patients aged >65y with a diagnosis of dementia (cases) and matched them 1:1 by sex and age to patients with no evidence of dementia (controls). We developed a list of 70 clinical entities related to the onset of dementia and recorded in the 5 years before diagnosis. After creating binary features, we trialled machine learning classifiers to discriminate between cases and controls (logistic regression, naïve Bayes, support vector machines, random forest and neural networks). We examined the most important features contributing to discrimination.
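The 1:1 matching step described in the Methods can be illustrated with a short sketch. It assumes exact matching on sex and on age in whole years, which the abstract does not specify; the record layout (`id`, `sex`, `age`), the function name `match_one_to_one` and the sample records are hypothetical.

```python
from collections import defaultdict


def match_one_to_one(cases, controls):
    """Pair each case with one unused control of the same sex and age.

    Assumption: exact matching on (sex, age in years). Each control is
    used at most once; cases with no available control are returned
    separately so they can be excluded or handled differently.
    """
    pool = defaultdict(list)
    for control in controls:
        pool[(control["sex"], control["age"])].append(control)

    pairs, unmatched = [], []
    for case in cases:
        candidates = pool[(case["sex"], case["age"])]
        if candidates:
            # Take the first remaining control so the result is deterministic.
            pairs.append((case["id"], candidates.pop(0)["id"]))
        else:
            unmatched.append(case["id"])
    return pairs, unmatched


# Hypothetical records, invented for illustration only.
cases = [
    {"id": "p1", "sex": "F", "age": 70},
    {"id": "p2", "sex": "F", "age": 70},
    {"id": "p3", "sex": "M", "age": 82},
    {"id": "p4", "sex": "M", "age": 66},
]
controls = [
    {"id": "c1", "sex": "F", "age": 70},
    {"id": "c2", "sex": "M", "age": 82},
    {"id": "c3", "sex": "F", "age": 70},
    {"id": "c4", "sex": "F", "age": 91},
]

pairs, unmatched = match_one_to_one(cases, controls)
```

A real study would more likely match within an age band and draw controls at random; the greedy, first-available choice here is only meant to keep the example easy to follow.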
Results
The final analysis included data on 93,120 patients, with a median age of 82.6 years; 64.8% were female. The naïve Bayes model performed least well. The logistic regression, support vector machine, neural network and random forest performed very similarly with an AUROC of 0.74. The top features retained in the logistic regression model were disorientation and wandering, behaviour change, schizophrenia, self-neglect, and difficulty managing.
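For readers unfamiliar with the AUROC figure quoted above, the following sketch computes it directly from its definition: the probability that a randomly chosen case receives a higher score than a randomly chosen control, with ties counted as one half. The function name `auroc` and the toy scores are assumptions for illustration and are unrelated to the study's data.

```python
def auroc(y_true, scores):
    """Area under the ROC curve via pairwise comparison.

    Counts, over all (positive, negative) pairs, how often the positive
    example has the higher score; ties contribute 0.5. This is quadratic
    in the number of examples, which is fine for a small illustration.
    """
    positives = [s for y, s in zip(y_true, scores) if y == 1]
    negatives = [s for y, s in zip(y_true, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUROC needs at least one positive and one negative")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical labels (1 = case, 0 = control) and model scores.
labels = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.4, 0.8, 0.3, 0.6, 0.2, 0.7, 0.4]
example_auc = auroc(labels, scores)
```

An AUROC of 0.5 corresponds to a model that ranks cases and controls no better than chance, and 1.0 to perfect separation, so a value of 0.74 sits between the two.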
Conclusions
Our model could aid GPs or health service planners with the early detection of dementia. Future work could improve the model by exploring the longitudinal nature of patient data and modelling decline in function over time.
Review of Wearable Devices and Data Collection Considerations for Connected Health
Wearable sensor technology has gradually extended its usability into a wide range of well-known applications. Wearable sensors can typically assess and quantify the wearer’s physiology and are commonly employed for human activity detection and quantified self-assessment. Wearable sensors are increasingly utilised to monitor patient health, rapidly assist with disease diagnosis, and help predict and often improve patient outcomes. Clinicians use various self-report questionnaires and well-known tests to report patient symptoms and assess their functional ability. These assessments are time-consuming and costly and depend on subjective patient recall. Moreover, measurements may not accurately reflect the patient’s functional ability whilst at home. Wearable sensors can be used to detect and quantify specific movements in different applications. The volume of data collected by wearable sensors during long-term assessment of ambulatory movement can become immense. This paper discusses current techniques used to track and record various human body movements, as well as techniques used to measure activity and sleep from long-term data collected by wearable technology devices.
Machine learning in the social and health sciences
The uptake of machine learning (ML) approaches in the social and health
sciences has been rather slow, and research using ML for social and health
research questions remains fragmented. This may be due to the separate
development of research in the computational/data versus social and health
sciences as well as a lack of accessible overviews and adequate training in ML
techniques for non-data-science researchers. This paper provides a meta-mapping
of research questions in the social and health sciences to appropriate ML
approaches, by incorporating the necessary requirements of statistical analysis
in these disciplines. We map the established classification into description,
prediction, and causal inference to common research goals, such as estimating
prevalence of adverse health or social outcomes, predicting the risk of an
event, and identifying risk factors or causes of adverse outcomes. This
meta-mapping aims at overcoming disciplinary barriers and starting a fluid
dialogue between researchers from the social and health sciences and
methodologically trained researchers. Such mapping may also help to fully
exploit the benefits of ML while considering domain-specific aspects relevant
to the social and health sciences, and hopefully contribute to the acceleration
of the uptake of ML applications to advance both basic and applied social and
health sciences research.
Supervised Learning Models for the Preliminary Detection of COVID-19 in Patients Using Demographic and Epidemiological Parameters
The World Health Organization labelled the new COVID-19 outbreak a public health crisis of worldwide concern on 30 January 2020, and it was declared a global pandemic in March 2020. It has had catastrophic consequences for the world economy and the well-being of people and has put a tremendous strain on already-strained healthcare systems globally, particularly in underdeveloped countries. Over 11 billion vaccine doses have already been administered worldwide, and the benefits of these vaccinations will take some time to appear. Today, the only practical approach to diagnosing COVID-19 is through the RT-PCR and RAT tests, which have sometimes been known to give unreliable results. Timely diagnosis and implementation of precautionary measures will likely improve survival outcomes and decrease fatality rates. In this study, we propose an innovative way to predict COVID-19 with the help of alternative non-clinical methods such as supervised machine learning models to identify the patients at risk based on their characteristic parameters and underlying comorbidities. Medical records of patients from Mexico admitted between 23 January 2020 and 26 March 2022 were chosen for this purpose. Among several supervised machine learning approaches tested, the XGBoost model achieved the best results with an accuracy of 92%. It is an easy, non-invasive, inexpensive, instant and accurate way of forecasting those at risk of contracting the virus. However, it is too early to conclude that this method can be used as an alternative in the clinical diagnosis of coronavirus cases.
Data extraction methods for systematic review (semi)automation: A living systematic review [version 1; peer review: awaiting peer review]
Background: The reliable and usable (semi)automation of data
extraction can support the field of systematic review by reducing the
workload required to gather information about the conduct and
results of the included studies. This living systematic review examines
published approaches for data extraction from reports of clinical
studies.
Methods: We systematically and continually search MEDLINE,
Institute of Electrical and Electronics Engineers (IEEE), arXiv, and the
dblp computer science bibliography databases. Full text screening and
data extraction are conducted within an open-source living systematic
review application created for the purpose of this review. This
iteration of the living review includes publications up to a cut-off date
of 22 April 2020.
Results: In total, 53 publications are included in this version of our
review. Of these, 41 (77%) of the publications addressed extraction of
data from abstracts, while 14 (26%) used full texts. A total of 48 (90%)
publications developed and evaluated classifiers that used
randomised controlled trials as the main target texts. Over 30 entities
were extracted, with PICOs (population, intervention, comparator,
outcome) being the most frequently extracted. A description of their
datasets was provided by 49 publications (94%), but only seven (13%)
made the data publicly available. Code was made available by 10 (19%)
publications, and five (9%) implemented publicly available tools.
Conclusions: This living systematic review presents an overview of
(semi)automated data-extraction literature of interest to different
types of systematic review. We identified a broad evidence base of
publications describing data extraction for interventional reviews and
a small number of publications extracting epidemiological or diagnostic accuracy data. The lack of publicly available gold-standard
data for evaluation, and lack of application thereof, makes it difficult
to draw conclusions on which is the best-performing system for each
data extraction target. With this living review we aim to review the
literature continually.
Machine Learning for Diabetes and Mortality Risk Prediction From Electronic Health Records
Data science can provide invaluable tools to better exploit healthcare data to improve patient outcomes and increase cost-effectiveness. Today, electronic health records (EHR) systems provide a fascinating array of data that data science applications can use to revolutionise the healthcare industry. Utilising EHR data to improve the early diagnosis of a variety of medical conditions/events is a rapidly developing area that, if successful, can help to improve healthcare services across the board. Specifically, as Type-2 Diabetes Mellitus (T2DM) represents one of the most serious threats to health across the globe, analysing the huge volumes of data provided by EHR systems to investigate approaches for the early and accurate prediction of the onset of T2DM, and of medical events such as in-hospital mortality, are two of the most important challenges data science currently faces. The present thesis addresses these challenges by examining the research gaps in the existing literature, pinpointing the un-investigated areas, and proposing a novel machine learning modelling approach given the difficulties inherent in EHR data.
To achieve these aims, the present thesis firstly introduces a unique and large EHR dataset collected from Saudi Arabia. Then we investigate the use of state-of-the-art machine learning predictive models that exploit this dataset for diabetes diagnosis and the early identification of patients with pre-diabetes by predicting the blood levels of one of the main indicators of diabetes and pre-diabetes: elevated Glycated Haemoglobin (HbA1c) levels. A novel collaborative denoising autoencoder (Col-DAE) framework is adopted to predict diabetic (high) HbA1c levels. We also employ several machine learning approaches (random forest, logistic regression, support vector machine, and multilayer perceptron) for the identification of patients with pre-diabetes (elevated HbA1c levels). The models employed demonstrate that a patient's risk of diabetes/pre-diabetes can be reliably predicted from EHR records.
We then extend this work to include the pioneering adoption of recent explainability methods to investigate the outcomes of the predictive models employed. This work also investigates the effect of using longitudinal data and more of the features available in EHR systems on the performance and feature ranking of the employed machine learning models for predicting elevated HbA1c levels in non-diabetic patients. This work demonstrates that longitudinal data and available EHR features can improve the performance of the machine learning models and can affect the relative order of importance of the features.
Secondly, we develop a machine learning model for the early and accurate prediction of all in-hospital mortality events for such patients utilising EHR data. This work investigates a novel application of the Stacked Denoising Autoencoder (SDA) to predict in-hospital patient mortality risk. In doing so, we demonstrate how our approach uniquely overcomes the issues associated with imbalanced datasets to which existing solutions are subject. The proposed model, using clinical patient data on a variety of health conditions and without intensive feature engineering, is demonstrated to achieve robust and promising results using EHR patient data recorded during the first 24 hours after admission.
A review of arthritis diagnosis techniques in artificial intelligence era: Current trends and research challenges
Deep learning, a branch of artificial intelligence, has achieved unprecedented performance in several domains, including medicine, where it assists with the efficient diagnosis of diseases, the prediction of disease progression and pre-screening for physicians. Due to these significant breakthroughs, deep learning is now being used for the diagnosis of arthritis, a chronic disease affecting people from young to old age. This paper provides a survey of recent and the most representative deep learning techniques (published between 2018 and 2020) for the diagnosis of osteoarthritis and rheumatoid arthritis. The paper also reviews traditional machine learning methods (published 2015 onward) and their application to the diagnosis of these diseases. The paper identifies open problems and research gaps. We believe that deep learning can assist general practitioners and consultants to predict the course of the disease, make treatment propositions and appraise their potential benefits.