
    Application and Extension of Weighted Quantile Sum Regression for the Development of a Clinical Risk Prediction Tool

    In clinical settings, the diagnosis of medical conditions is often aided by measurement of various serum biomarkers through the use of laboratory tests. These biomarkers provide information about different aspects of a patient’s health and the function of different organ systems. In this dissertation, we develop and validate a weighted composite index that aggregates the information from a variety of health biomarkers covering multiple organ systems. The index can be used for predicting all-cause mortality and could also be used as a holistic measure of overall physiological health status. We refer to it as the Health Status Metric (HSM). Validation analysis shows that the HSM is predictive of long-term mortality risk and exhibits a robust association with concurrent chronic conditions, recent hospital utilization, and self-rated health. We develop the HSM using Weighted Quantile Sum (WQS) regression (Gennings et al., 2013; Carrico, 2013), a novel penalized regression technique that imposes nonnegativity and unit-sum constraints on the coefficients used to weight index components. In this dissertation, we develop a number of extensions to the WQS regression technique and apply them to the construction of the HSM. We introduce a new guided approach for the standardization of index components which accounts for potential nonlinear relationships with the outcome of interest. An extended version of the WQS that accommodates interaction effects among index components is also developed and implemented. In addition, we demonstrate that ensemble learning methods borrowed from the field of machine learning can be used to improve the predictive power of the WQS index. Specifically, we show that the use of techniques such as weighted bagging, the random subspace method and stacked generalization in conjunction with the WQS model can produce an index with substantially enhanced predictive accuracy. Finally, practical applications of the HSM are explored. A comparative study is performed to evaluate the feasibility and effectiveness of a number of ‘real-time’ imputation strategies in potential software applications for computing the HSM. In addition, the efficacy of the HSM as a predictor of hospital readmission is assessed in a cohort of emergency department patients.
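
    As a rough illustration of the core WQS idea described above, the sketch below quantile-scores a set of simulated components and estimates index weights under nonnegativity and unit-sum constraints. The data, component count, and optimizer are illustrative assumptions; the published WQS procedure additionally uses bootstrap ensembles and a training/validation split.

```python
# Minimal sketch of the core WQS idea: components are quantile-scored, then a
# nonnegative, unit-sum weight vector is estimated so that the weighted index
# best predicts the outcome. Illustrative only, not the published procedure.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))                   # 5 hypothetical biomarkers
y = 1.5 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=500)

# Quantile-score each component into quartiles (0..3), as in WQS
Q = np.column_stack([np.digitize(x, np.quantile(x, [0.25, 0.5, 0.75])) for x in X.T])

def loss(params):
    w, beta0, beta1 = params[:5], params[5], params[6]
    index = Q @ w                                # weighted quantile sum index
    resid = y - (beta0 + beta1 * index)
    return np.mean(resid ** 2)

cons = [{"type": "eq", "fun": lambda p: np.sum(p[:5]) - 1.0}]   # unit-sum weights
bnds = [(0, 1)] * 5 + [(None, None), (None, None)]              # nonnegative weights
start = np.r_[np.full(5, 0.2), 0.0, 1.0]
fit = minimize(loss, start, bounds=bnds, constraints=cons)
print("estimated weights:", np.round(fit.x[:5], 3))
```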

    An intelligent recommender system based on short-term disease risk prediction for patients with chronic diseases in a telehealth environment

    Clinical decisions are usually made based on practitioners' experience, with limited support from data-centric analytic processes drawing on medical databases. This often leads to undesirable biases, human errors and high medical costs, affecting the quality of services provided to patients. Recently, the use of intelligent technologies in clinical decision making in the telehealth environment has begun to play a vital role in improving the quality of patients' lives and reducing the costs and workload involved in their daily healthcare. In the telehealth environment, patients suffering from chronic diseases such as heart disease or diabetes have to take various medical tests, such as measuring blood pressure, blood sugar and blood oxygen. This practice adversely affects the overall convenience and quality of their everyday living. In this PhD thesis, an effective recommender system is proposed, utilizing a set of innovative disease risk prediction algorithms and models for short-term disease risk prediction, to provide chronic disease patients with appropriate recommendations regarding the need to take a medical test on the coming day. The time series medical data obtained for each chronic disease patient are partitioned into consecutive sliding windows and analyzed in both the time domain and the frequency domain. The time-domain data can be used for analysis directly, without any further conversion. For analysis in the frequency domain, the Fast Fourier Transform (FFT) and the Dual-Tree Complex Wavelet Transform (DTCWT) are applied to convert the data into the frequency domain and extract frequency information. In the time domain, four innovative predictive approaches are proposed to study the time series data and produce recommendations: a Basic Heuristic Algorithm (BHA), a Regression-Based Algorithm (RBA), a Hybrid Algorithm (HA), and a structural graph-based method (SG). In the frequency domain, three predictive classifiers, an Artificial Neural Network, a Least Squares Support Vector Machine, and Naïve Bayes, are used to produce the recommendations. An ensemble machine learning model combines all the predictive models and algorithms from both domains to produce the final recommendation. Two real-life telehealth datasets collected from chronic disease patients (i.e., heart disease and diabetes patients) are utilized for a comprehensive experimental evaluation in this study. The results show that the proposed system is effective in analysing time series medical data and providing accurate and reliable (very low risk) recommendations to patients suffering from chronic diseases such as heart disease and diabetes. This research will help provide high-quality, evidence-based intelligent decision support to chronic disease patients, significantly reducing the workload associated with medical checkups that would otherwise have to be conducted every day in a telehealth environment.
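
    The sliding-window analysis described above can be illustrated with a short sketch: a univariate series is partitioned into consecutive windows, and simple time-domain and FFT-based frequency-domain features are extracted from each. The window length, features, and simulated readings are assumptions, not the thesis implementation.

```python
# Illustrative sketch (not the thesis code): partition a univariate time series
# into consecutive sliding windows and extract simple time- and frequency-domain
# features (the latter via FFT).
import numpy as np

def window_features(series, window=7):
    feats = []
    for start in range(0, len(series) - window + 1, window):
        w = np.asarray(series[start:start + window], dtype=float)
        spectrum = np.abs(np.fft.rfft(w))            # frequency-domain magnitudes
        feats.append({
            "mean": w.mean(),                        # time-domain summaries
            "std": w.std(),
            "dominant_freq_bin": int(spectrum[1:].argmax()) + 1,
            "spectral_energy": float((spectrum ** 2).sum()),
        })
    return feats

# Example: simulated daily blood-pressure readings
readings = 120 + 5 * np.sin(np.arange(70) / 3) + np.random.default_rng(1).normal(0, 2, 70)
print(window_features(readings)[0])
```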

    Deep Learning in Cardiology

    The medical field is creating large amounts of data that physicians are unable to decipher and use efficiently. Moreover, rule-based expert systems are inefficient in solving complicated medical tasks or in creating insights from big data. Deep learning has emerged as a more accurate and effective technology for a wide range of medical problems such as diagnosis, prediction and intervention. Deep learning is a representation learning method that consists of layers that transform the data non-linearly, thus revealing hierarchical relationships and structures. In this review we survey deep learning application papers that use structured data, signal and imaging modalities from cardiology. We discuss the advantages and limitations of applying deep learning in cardiology that also apply in medicine in general, while proposing certain directions as the most viable for clinical use.
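
    A minimal sketch of the "stack of non-linear transformations" view of deep learning is given below, using PyTorch; the architecture and input sizes are illustrative and not drawn from any paper surveyed in the review.

```python
# Minimal, illustrative stack of non-linear transformations (PyTorch);
# sizes and task are assumptions, not taken from the review.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(12, 64), nn.ReLU(),   # first learned representation
    nn.Linear(64, 32), nn.ReLU(),   # higher-level representation
    nn.Linear(32, 1), nn.Sigmoid()  # e.g. probability of a cardiac condition
)
x = torch.randn(8, 12)              # 8 patients, 12 structured features
print(model(x).shape)               # torch.Size([8, 1])
```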

    Estimating selected disaggregated socio-economic indicators using small area estimation techniques

    In 2015, the United Nations (UN) set up 17 Sustainable Development Goals (SDGs) to be achieved by 2030 (General Assembly, 2015). The goals encompass indicators of various socio-economic characteristics (General Assembly, 2015). To reach them, there is a need to measure the indicators reliably, especially at disaggregated levels. National Statistical Institutes (NSIs) collect data on various socio-economic indicators by conducting censuses or sample surveys. Although a census provides data on the entire population, it is only carried out every 10 years in most countries and requires enormous financial resources. Sample surveys, on the other hand, are commonly used because they are cheaper and quicker to conduct (Sarndal et al., 2003; Cochran, 2007). They are, therefore, essential sources of data on a country’s key socio-economic indicators, which are necessary for policy-making, allocating resources, and determining necessary interventions. Surveys are mostly designed for the national level and for specific planned areas or domains. The drawback, therefore, is that sample surveys are not adequate for data disaggregation due to small sample sizes (Rao and Molina, 2015). In this thesis, geographical divisions are called areas, while other sub-divisions such as age-sex-ethnicity groups are called domains, in line with Pfeffermann (2013) and Rao and Molina (2015). One solution for obtaining reliable estimates at disaggregated levels is to use small area estimation (SAE) techniques. SAE increases the precision of survey estimates by combining the survey data with another source of data, for example a previous census, administrative data, or other passively recorded data such as the mobile phone data used in Schmid et al. (2017). The results obtained using the survey data only are called direct estimates, while those obtained using SAE models are called model-based estimates. The auxiliary data are covariates related to the response variable of interest (Rao and Molina, 2015). According to Rao and Molina (2015), an area or domain is regarded as small if its sample size is inadequate for estimation with the desired accuracy. The field of SAE has grown substantially over the years, mainly due to demand from governments and the private sector. Currently, it is possible to estimate several linear and non-linear target statistics, such as the mean and the Gini coefficient (Gini, 1912), respectively. This thesis contributes to the wide literature on SAE by presenting three important applications using Kenyan data sources. Chapter 1 is an application to estimate poverty and inequality in Kenya. The Empirical Best Predictor (EBP) of Molina and Rao (2010) and the M-quantile model of Chambers and Tzavidis (2006) are used. Four indicators are estimated: the mean, the Head Count Ratio, the Poverty Gap, and the Gini coefficient. Three transformations, the logarithmic, the log-shift, and the Box-Cox, are explored to mitigate the requirement of normality for the model errors. The M-quantile model is used as a robust alternative to the EBP. The mean squared errors are estimated using bootstrap procedures. Chapter 2 is an application to estimate health insurance coverage in Kenyan counties using a binary M-quantile SAE model (Chambers et al., 2016) for women and men aged 15 to 49 years. This has the advantage that we avoid specifying the distribution of the random effects, and distributional robustness is achieved automatically. 
The MSE is estimated using an analytical approach based on Taylor series linearization. Chapter 3 presents the estimation of overweight prevalence at the county level in Kenya. In this application, the Fay-Herriot model (Fay and Herriot, 1979) is explored with an arcsine square-root transformation to stabilize the variance and meet the assumption of normality. To transform back to the original scale, we use a bias-corrected back-transformation. For this model, the design variance is smoothed using Generalized Variance Functions, as in Pratesi (2016, Chapter 11). The mean squared error is estimated using a bootstrap procedure. In summary, this thesis contributes to the vast literature on small area estimation from an applied perspective by: (a) presenting, for the first time, regionally disaggregated SAE results for selected indicators for Kenya; (b) combining data sources to improve the estimation of the selected disaggregated socio-economic indicators; (c) exploring data-driven transformations to mitigate the assumption of normality in linear and linear mixed-effects models; (d) presenting a robust approach to small area estimation based on the M-quantile model; and (e) estimating the mean squared error to assess uncertainty using bootstrap procedures.
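
    A rough sketch of the arcsine square-root transformation combined with Fay-Herriot-style shrinkage is shown below. The simulated data, the moment-type variance estimator, and the naive back-transformation are simplifications and assumptions, not the bias-corrected procedure used in the thesis.

```python
# Illustrative sketch: arcsine square-root transform of direct proportions,
# a basic Fay-Herriot shrinkage predictor, and a naive back-transformation.
import numpy as np

rng = np.random.default_rng(2)
m = 47                                        # e.g. number of counties
p_direct = rng.uniform(0.1, 0.5, m)           # direct survey proportions
psi = rng.uniform(0.001, 0.01, m)             # smoothed design variances

z = np.arcsin(np.sqrt(p_direct))              # variance-stabilizing transform
X = np.column_stack([np.ones(m), rng.normal(size=m)])   # intercept + one covariate

beta, *_ = np.linalg.lstsq(X, z, rcond=None)  # OLS fit of the linking model
resid = z - X @ beta
sigma2_u = max((resid @ resid - psi.sum()) / (m - X.shape[1]), 0.0)  # simple moment estimator

gamma = sigma2_u / (sigma2_u + psi)           # area-specific shrinkage factors
z_fh = gamma * z + (1 - gamma) * (X @ beta)   # Fay-Herriot predictions on the transformed scale
p_fh = np.sin(z_fh) ** 2                      # naive (not bias-corrected) back-transformation
print(np.round(p_fh[:5], 3))
```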

    Machine Learning Classification of Females Susceptibility to Visceral Fat Associated Diseases

    The problem of classifying subjects into risk categories is a common challenge in medical research. Machine Learning (ML) methods are widely used in the areas of risk prediction and classification. The primary objective of these algorithms is to predict dichotomous responses (e.g. healthy/at risk) based on several features. Like statistical inference models, ML models are subject to the common problem of class imbalance: they are biased toward the majority class, which increases the false-negative rate. In this paper, we built and evaluated eighteen ML models classifying approximately 4300 female participants from the UK Biobank into three categorical risk statuses based on discretised visceral adipose tissue values from magnetic resonance imaging. We also examined the effect of sampling techniques on classification modelling when dealing with class imbalance. Results showed that the use of sampling techniques had a significant impact: they not only improved the prediction of patients' risk status but also increased the information contained within each variable. Based on domain experts' criteria, the three best models for classification were finally identified. These encouraging results will guide further development of classification models for predicting visceral adipose tissue without the need for a costly scan.
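
    The following hedged sketch shows how a resampling step can be combined with a classifier for an imbalanced multi-class risk problem. SMOTE is used here as one common choice; the sampling techniques, models, and synthetic data are assumptions rather than the paper's actual configuration.

```python
# Illustrative resampling-plus-classifier pipeline for an imbalanced
# three-class problem; requires scikit-learn and imbalanced-learn.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import balanced_accuracy_score
from imblearn.over_sampling import SMOTE

# Synthetic stand-in for the three imbalanced risk categories
X, y = make_classification(n_samples=4300, n_classes=3, n_informative=6,
                           weights=[0.7, 0.2, 0.1], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

X_res, y_res = SMOTE(random_state=0).fit_resample(X_tr, y_tr)   # balance the training classes
clf = RandomForestClassifier(random_state=0).fit(X_res, y_res)
print("balanced accuracy:", balanced_accuracy_score(y_te, clf.predict(X_te)))
```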

    An introduction to explainable artificial intelligence with LIME and SHAP

    Bachelor's thesis in Mathematics, Facultat de Matemàtiques, Universitat de Barcelona, Year: 2022, Advisors: Albert Clapés i Sintes and Sergio Escalera Guerrero. Artificial intelligence (AI) and more specifically machine learning (ML) have shown their potential by approaching or even exceeding human levels of accuracy for a variety of real-world problems. However, the highest accuracy on large modern datasets is often achieved by complex models that even experts struggle to interpret, creating a tradeoff between accuracy and interpretability. These models are known for being opaque "black boxes", which is especially problematic in industries like healthcare. Therefore, understanding the reasons behind predictions is crucial in establishing trust, which is fundamental if one plans to take action based on a prediction, or when deciding whether or not to implement a new model. This is where explainable artificial intelligence (XAI) comes in, helping humans comprehend and trust the results and output produced by a machine learning model. This project is organised in three chapters with the aim of introducing the reader to the field of explainable artificial intelligence. Machine learning and some related concepts are introduced in the first chapter. The second chapter focuses on the theory of the random forest model in detail. Finally, in the third chapter, the theory behind two contemporary and influential XAI methods, LIME and SHAP, is formalised. Additionally, a public diabetes tabular dataset is used to illustrate an application of these two methods in the medical sector. The project concludes with a discussion of possible future work.
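
    As a small illustration of the kind of post-hoc explanation the project formalises, the sketch below explains a tree-based tabular model with SHAP. Scikit-learn's built-in diabetes data is used purely as a stand-in, and the project's own LIME and SHAP experiments may differ in setup.

```python
# Minimal SHAP example on a tree-based tabular model; illustrative stand-in
# for the project's diabetes application, not its actual experiments.
import shap
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor

data = load_diabetes()
model = RandomForestRegressor(random_state=0).fit(data.data, data.target)

explainer = shap.TreeExplainer(model)                 # model-specific explainer for trees
shap_values = explainer.shap_values(data.data[:50])   # local attributions per feature
print(shap_values.shape)                              # (50, n_features)
```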

    Multiple Imputation Ensembles (MIE) for dealing with missing data

    Missing data is a significant issue in many real-world datasets, yet robust methods for dealing with it appropriately are lacking. In this paper, we propose a robust approach to dealing with missing data in classification problems: Multiple Imputation Ensembles (MIE). Our method integrates two approaches, multiple imputation and ensemble methods, and compares two types of ensembles: bagging and stacking. We also propose a robust experimental set-up using 20 benchmark datasets from the UCI machine learning repository. For each dataset, we introduce increasing amounts of data Missing Completely at Random. Firstly, we use a number of single/multiple imputation methods to recover the missing values and then ensemble a number of different classifiers built on the imputed data. We assess the quality of the imputation by using dissimilarity measures. We also evaluate the MIE performance by comparing classification accuracy on the complete and imputed data. Furthermore, we use the accuracy of simple imputation as a benchmark for comparison. We find that our proposed approach combining multiple imputation with ensemble techniques outperforms the others, particularly as the amount of missing data increases.
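
    A hedged sketch of the multiple-imputation-plus-ensemble idea follows: the data are imputed several times, a classifier is fitted on each imputed copy, and predictions are combined by majority vote. The imputer, classifier, missingness level, and combination rule are illustrative, not the exact MIE configuration evaluated in the paper.

```python
# Illustrative multiple-imputation ensemble: several imputed copies, one
# classifier per copy, majority-vote combination.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
rng = np.random.default_rng(0)
X[rng.random(X.shape) < 0.2] = np.nan             # inject 20% MCAR missingness
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

preds = []
for seed in range(5):                              # 5 imputed copies of the data
    imp = IterativeImputer(sample_posterior=True, random_state=seed)
    clf = DecisionTreeClassifier(random_state=seed)
    clf.fit(imp.fit_transform(X_tr), y_tr)
    preds.append(clf.predict(imp.transform(X_te)))

vote = (np.mean(preds, axis=0) >= 0.5).astype(int)  # majority vote across copies
print("ensemble accuracy:", (vote == y_te).mean())
```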

    Functional Regression

    Functional data analysis (FDA) involves the analysis of data whose ideal units of observation are functions defined on some continuous domain, where the observed data consist of a sample of functions taken from some population, sampled on a discrete grid. Ramsay and Silverman's 1997 textbook sparked the development of this field, which has accelerated in the past 10 years to become one of the fastest growing areas of statistics, fueled by the growing number of applications yielding this type of data. One unique characteristic of FDA is the need to combine information both across and within functions, which Ramsay and Silverman called replication and regularization, respectively. This article focuses on functional regression, the area of FDA that has received the most attention in applications and methodological development. First comes an introduction to basis functions, key building blocks for regularization in functional regression methods, followed by an overview of functional regression methods, split into three types: [1] functional predictor regression (scalar-on-function), [2] functional response regression (function-on-scalar) and [3] function-on-function regression. For each, the role of replication and regularization is discussed and the methodological development described in a roughly chronological manner, at times deviating from the historical timeline to group together similar methods. The primary focus is on modeling and methodology, highlighting the modeling structures that have been developed and the various regularization approaches employed. At the end is a brief discussion describing potential areas of future development in this field.
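
    The basis-expansion idea underlying scalar-on-function regression can be sketched as follows: the coefficient function is represented on a small Fourier basis, which turns the functional model into an ordinary regression on basis scores. The simulated curves, basis choice, grid, and absence of smoothing penalties are assumptions and simplifications relative to the methods reviewed in the article.

```python
# Illustrative scalar-on-function regression via basis expansion:
# y_i = ∫ X_i(t) beta(t) dt + error, with beta(t) on a Fourier basis.
import numpy as np

rng = np.random.default_rng(3)
t = np.linspace(0, 1, 100)                           # common observation grid
n, K = 200, 5                                        # number of curves, basis functions

# Fourier basis evaluated on the grid (regularization building blocks)
basis = np.column_stack([np.ones_like(t)] +
                        [f(2 * np.pi * (k + 1) * t) for k in range(K // 2)
                         for f in (np.sin, np.cos)])[:, :K]

X_curves = rng.normal(size=(n, len(t)))              # observed functional predictors
beta_true = np.sin(2 * np.pi * t)                    # true coefficient function
y = X_curves @ beta_true / len(t) + rng.normal(0, 0.05, n)

# Project curves onto the basis: scores[i, k] ≈ ∫ X_i(t) b_k(t) dt
scores = X_curves @ basis / len(t)
coef, *_ = np.linalg.lstsq(scores, y, rcond=None)    # ordinary least squares on the scores
beta_hat = basis @ coef                               # estimated coefficient function beta(t)
print("estimated basis coefficients:", np.round(coef, 2))
```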

    Machine Learning of Lifestyle Data for Diabetes

    Self-Monitoring of Blood Glucose (SMBG) for Type-2 Diabetes (T2D) remains highly challenging for both patients and doctors due to the complexities of diabetic lifestyle data logging and insufficient short-term, personalized recommendations and advice. Recent mobile diabetes management systems have been proven clinically effective in facilitating self-management. However, most such systems have poor usability and are limited in data analytic functionality. These two challenges are interconnected: easier data recording yields better data for analytic algorithms, while irrelevant or inaccurate data input introduces errors and noise. The output of data analysis, in the form of potentially valuable patterns or knowledge, can in turn be an incentive for users to contribute more data. We believe that the incorporation of machine learning technologies in mobile diabetes management can tackle these challenges simultaneously. In this thesis, we propose, build, and evaluate an intelligent mobile diabetes management system, called GlucoGuide, for T2D patients. GlucoGuide conveniently aggregates a variety of lifestyle data collected via mobile devices, analyzes the data with machine learning models, and outputs recommendations. The most complicated part of SMBG is diet management. GlucoGuide aims to address this crucial issue using classification models and camera-based automatic data logging. The proposed model classifies each food item into three recommendation classes using its nutrient and textual features. Empirical studies show that the food classification task is effective. A lifestyle-data-driven recommendation framework in GlucoGuide can output short-term, personalized recommendations of lifestyle changes to help patients stabilize their blood glucose level. To evaluate the performance and clinical effectiveness of this framework, we conducted a three-month clinical trial on human subjects, in collaboration with Dr. Petrella (MD). Due to the high cost and complexity of trials on humans, a small but representative subject group was involved. Two standard laboratory blood tests for diabetes were used before and after the trial. The results are quite remarkable: generally speaking, GlucoGuide turned early diabetic patients pre-diabetic, and pre-diabetic patients non-diabetic, in only three months, depending on their pre-trial diabetic condition. This clinical dataset has also been expanded and enhanced to generate scientifically controlled artificial datasets, which can be used for a variety of machine learning studies in our ongoing and future research. GlucoGuide is now a university spin-off, allowing us to collect practical diabetic lifestyle data at scale and to make a potential impact on diabetes treatment and management.
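
    A hypothetical sketch of a food-item classifier combining nutrient values with textual features, in the spirit of the three-class recommendation model described above, is given below; the column names, example foods, and model choice are assumptions and not the actual GlucoGuide implementation.

```python
# Hypothetical food classifier: TF-IDF textual features plus scaled nutrient
# features, feeding a three-class model. Column names and data are invented.
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

foods = pd.DataFrame({
    "description": ["grilled chicken salad", "chocolate glazed donut",
                    "brown rice with beans", "sugar-free yogurt"],
    "sugar_g": [3.0, 24.0, 2.0, 5.0],
    "carbs_g": [10.0, 31.0, 45.0, 12.0],
    "fat_g": [7.0, 14.0, 3.0, 2.0],
    "label": ["recommended", "avoid", "moderate", "recommended"],
})

model = Pipeline([
    ("features", ColumnTransformer([
        ("text", TfidfVectorizer(), "description"),               # textual features
        ("nutrients", StandardScaler(), ["sugar_g", "carbs_g", "fat_g"]),
    ])),
    ("clf", LogisticRegression(max_iter=1000)),                    # three-class classifier
])
model.fit(foods.drop(columns="label"), foods["label"])
print(model.predict(pd.DataFrame({"description": ["fried rice"], "sugar_g": [4.0],
                                  "carbs_g": [50.0], "fat_g": [12.0]})))
```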