
    Sudden Cardiac Arrest Prediction through Heart Rate Variability Analysis

    The rising popularity of wearable technologies (e.g., the Apple Watch and Microsoft Band) has opened the door to Internet of Things solutions for healthcare. One of the most pressing healthcare problems today is the poor survival rate of out-of-hospital sudden cardiac arrest (9.5% of roughly 360,000 cases in the USA in 2013). Heart-rate-derived features have been shown to give an early indicator of sudden cardiac arrest, and providing an early warning has the potential to save many lives. Many of these new wearable devices can deliver this warning through their heart rate sensors. This thesis introduces a prospective dataset of physical-activity heart rates collected via the Microsoft Band, indicative of the heart rates that would be observed in the proposed Internet of Things solution. This dataset is combined with public heart rate datasets to yield a dataset larger than many of those used in related work and more representative of out-of-hospital heart rates. The thesis also introduces the use of LogitBoost as a classifier for sudden cardiac arrest prediction. Using this technique, a five-minute warning of sudden cardiac arrest is provided with 96.36% accuracy and an F-score of 0.9375, improving on existing solutions that rely solely on in-hospital data.
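    The classifier named in the abstract, LogitBoost, fits an additive logistic model by stagewise Newton steps on a weighted working response. The sketch below is a minimal two-class LogitBoost in Python with regression stumps as weak learners; it illustrates the algorithm itself, not the thesis implementation, and the feature matrix `X` (e.g., per-window heart-rate-variability features) and labels `y` are assumed to be precomputed.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def logitboost_fit(X, y, n_rounds=100, eps=1e-5):
    """Two-class LogitBoost (Friedman, Hastie & Tibshirani, 2000); y in {0, 1}."""
    F = np.zeros(len(y))                         # additive score, initialised to 0
    learners = []
    for _ in range(n_rounds):
        p = 1.0 / (1.0 + np.exp(-2.0 * F))       # current class-1 probabilities
        w = np.clip(p * (1.0 - p), eps, None)    # Newton weights
        z = np.clip((y - p) / w, -4.0, 4.0)      # working response, clipped for stability
        stump = DecisionTreeRegressor(max_depth=1)
        stump.fit(X, z, sample_weight=w)         # weighted least-squares fit
        learners.append(stump)
        F += 0.5 * stump.predict(X)              # half Newton step
    return learners

def logitboost_predict(learners, X):
    F = sum(0.5 * s.predict(X) for s in learners)
    return (F > 0).astype(int)                   # classify by the sign of F
```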

    Studies of Boosted Decision Trees for MiniBooNE Particle Identification

    Boosted decision trees are applied to particle identification in the MiniBooNE neutrino oscillation experiment operated at Fermi National Accelerator Laboratory (Fermilab). Numerous attempts are made to tune the boosted decision trees, to compare the performance of various boosting algorithms, and to select input variables for optimal performance. (28 pages, 22 figures; submitted to Nucl. Instrum. Meth.)

    Boosted Classification Trees and Class Probability/Quantile Estimation

    The standard by which binary classifiers are usually judged, misclassification error, assumes equal costs of misclassifying the two classes or, equivalently, classifying at the 1/2 quantile of the conditional class probability function P[y = 1|x]. Boosted classification trees are known to perform quite well for such problems. In this article we consider the use of standard, off-the-shelf boosting for two more general problems: 1) classification with unequal costs or, equivalently, classification at quantiles other than 1/2, and 2) estimation of the conditional class probability function P[y = 1|x]. We first examine whether the latter problem, estimation of P[y = 1|x], can be solved with LogitBoost, and with AdaBoost when combined with a natural link function. The answer is negative: both approaches are often ineffective because they overfit P[y = 1|x] even though they perform well as classifiers. A major negative finding of this article is this disconnect between class probability estimation and classification. Next we consider the practice of over/under-sampling of the two classes. We present an algorithm that uses AdaBoost in conjunction with Over/Under-Sampling and Jittering of the data ("JOUS-Boost"). This algorithm is simple yet successful, and it preserves boosting's relative protection against overfitting for arbitrary misclassification costs and, equivalently, arbitrary quantile boundaries. We then use collections of classifiers obtained from a grid of quantiles to form estimators of class probabilities. These class probability estimates compare favorably to those obtained by a variety of methods across both simulated and real data sets.
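    As a rough illustration of the resampling step behind JOUS-Boost, the sketch below over-samples the positive class (with jitter to break ties among duplicates) so that classifying at 1/2 on the resampled data approximates classifying at quantile q on the original data, then hands the result to off-the-shelf AdaBoost. The replication rule and jitter scale are illustrative assumptions, not the paper's exact recipe.

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier

def jous_resample(X, y, q, jitter=0.01, seed=0):
    """Over-sample class 1 so the 1/2 threshold on the new data roughly matches
    the quantile-q threshold on the original data (y in {0, 1}).
    Handles q <= 1/2; for q > 1/2, over-sample class 0 symmetrically."""
    rng = np.random.default_rng(seed)
    X1, X0 = X[y == 1], X[y == 0]
    k = max(1, int(round((1 - q) / q)))          # replication factor for class 1
    X1_rep = np.repeat(X1, k, axis=0)
    X1_rep += jitter * rng.standard_normal(X1_rep.shape)  # jitter breaks ties
    X_new = np.vstack([X1_rep, X0])
    y_new = np.concatenate([np.ones(len(X1_rep), dtype=int),
                            np.zeros(len(X0), dtype=int)])
    return X_new, y_new

# One AdaBoost classifier per quantile; a grid of quantiles traces out P[y = 1|x]:
# Xq, yq = jous_resample(X, y, q=0.25)
# clf_q = AdaBoostClassifier(n_estimators=200).fit(Xq, yq)
```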

    Heart Disease Prediction Using Stacking Model With Balancing Techniques and Dimensionality Reduction

    Heart disease is a serious worldwide health issue with wide-reaching effects. Since heart disease is one of the leading causes of mortality worldwide, early detection is crucial. Emerging technologies such as Machine Learning (ML) are being actively adopted by the biomedical, healthcare, and health-prediction industries. This research proposes PaRSEL, a new stacking model that combines four classifiers at the base layer, Passive Aggressive Classifier (PAC), Ridge Classifier (RC), Stochastic Gradient Descent Classifier (SGDC), and eXtreme Gradient Boosting (XGBoost), with LogitBoost deployed for the final predictions at the meta layer. Imbalanced and irrelevant features in the data increase the complexity of classification models, so dimensionality reduction and data balancing are considered very important for lowering costs and increasing model accuracy. In PaRSEL, three dimensionality reduction techniques, Recursive Feature Elimination (RFE), Linear Discriminant Analysis (LDA), and Factor Analysis (FA), are used to reduce dimensionality and select the most relevant features for the diagnosis of heart disease. Furthermore, eight balancing techniques, Proximity Weighted Random Affine Shadowsampling (ProWRAS), Localized Randomized Affine Shadowsampling (LoRAS), Random Over Sampling (ROS), Adaptive Synthetic sampling (ADASYN), Synthetic Minority Oversampling Technique (SMOTE), Borderline SMOTE (B-SMOTE), Majority Weighted Minority Oversampling Technique (MWMOTE), and Random Walk Oversampling (RWOS), are used to deal with the imbalanced nature of the dataset. The performance of PaRSEL is compared with standalone classifiers using different performance measures: accuracy, F1-score, precision, recall, and AUC-ROC score. The proposed model achieves 97% accuracy, an 80% F1-score, precision above 90%, 67% recall, and a 98% AUC-ROC score, showing that PaRSEL outperforms the standalone classifiers for heart disease prediction. Additionally, SHapley Additive exPlanations (SHAP) are applied to the proposed model to help understand its internal workings, illustrating how much influence each base classifier has on the final prediction outcome.
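    A stack of this shape is straightforward to assemble with scikit-learn and xgboost. The sketch below is a PaRSEL-like configuration, not the authors' code; in particular, scikit-learn ships no LogitBoost, so the meta layer substitutes GradientBoostingClassifier (log-loss gradient boosting, a close relative) as a stand-in.

```python
from sklearn.ensemble import GradientBoostingClassifier, StackingClassifier
from sklearn.linear_model import (PassiveAggressiveClassifier, RidgeClassifier,
                                  SGDClassifier)
from xgboost import XGBClassifier  # assumes the xgboost package is installed

base_layer = [
    ("pac", PassiveAggressiveClassifier(max_iter=1000)),
    ("rc", RidgeClassifier()),
    ("sgdc", SGDClassifier(loss="log_loss", max_iter=1000)),
    ("xgb", XGBClassifier(n_estimators=200, eval_metric="logloss")),
]

parsel_like = StackingClassifier(
    estimators=base_layer,
    # Stand-in meta learner: log-loss gradient boosting, not true LogitBoost.
    final_estimator=GradientBoostingClassifier(n_estimators=100),
    cv=5,  # out-of-fold base predictions feed the meta layer
)
# parsel_like.fit(X_train, y_train); parsel_like.predict(X_test)
```

The paper's balancing and dimensionality-reduction steps would run before `fit`, on the training folds only, to avoid leaking synthetic samples into evaluation.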

    Automated reliability assessment for spectroscopic redshift measurements

    We present a new approach to automating spectroscopic redshift reliability assessment, based on machine learning (ML) and characteristics of the redshift probability density function (PDF). We propose to rephrase spectroscopic redshift estimation in a Bayesian framework, in order to incorporate all sources of information and uncertainty related to the redshift estimation process and to produce a redshift posterior PDF that serves as the starting point for ML algorithms to provide an automated assessment of redshift reliability. As a use case, public data from the VIMOS VLT Deep Survey is exploited to present and test this new methodology. We first tried to reproduce the existing reliability flags using supervised classification to describe different types of redshift PDFs but, due to the subjective definition of these flags, soon opted for a new homogeneous partitioning of the data into distinct clusters via unsupervised classification. After assessing the accuracy of the new clusters via resubstitution and test predictions, unlabelled data from preliminary mock simulations for the Euclid space mission are projected into this mapping to predict their redshift reliability labels.
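    To make the pipeline's shape concrete, the sketch below summarises each redshift posterior PDF by a handful of descriptive features and partitions the resulting feature vectors with k-means. The feature set (dispersion, entropy, number of modes, mode dominance) is an illustrative guess at PDF descriptors, not the paper's exact ones, and `posterior_pdfs` and `z_grid` are assumed inputs.

```python
import numpy as np
from scipy.integrate import trapezoid
from scipy.signal import find_peaks
from sklearn.cluster import KMeans

def pdf_features(z_grid, pdf):
    """Descriptive features of one redshift posterior PDF sampled on z_grid."""
    pdf = pdf / trapezoid(pdf, z_grid)                        # normalise to unit area
    mean = trapezoid(z_grid * pdf, z_grid)
    sigma = np.sqrt(trapezoid((z_grid - mean) ** 2 * pdf, z_grid))
    entropy = -trapezoid(pdf * np.log(pdf + 1e-12), z_grid)   # peakedness proxy
    n_modes = len(find_peaks(pdf, height=0.05 * pdf.max())[0])
    dominance = pdf.max() / (pdf.mean() + 1e-12)              # strength of best mode
    return [mean, sigma, entropy, n_modes, dominance]

# feats = np.array([pdf_features(z_grid, p) for p in posterior_pdfs])
# labels = KMeans(n_clusters=5, n_init=10).fit_predict(feats)  # reliability clusters
```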

    Risk Analytics in Econometrics

    This thesis addresses the framework of risk analytics as a compendium of four main pillars: (i) big data, (ii) intensive programming, (iii) advanced analytics and machine learning, and (iv) risk analysis. Under the latter mainstay, this PhD dissertation reviews potential hazards known as "extreme events" that can negatively impact the wellbeing of people, the profitability of firms, or the economic stability of a country, but which have been underestimated or incorrectly treated by traditional modelling techniques. The objective of this thesis is to develop econometric and machine learning algorithms that improve the predictive capacity for those extreme events and improve comprehension of the phenomena, in contrast to some modern advanced methods that are black boxes in terms of interpretation. The thesis presents seven chapters that contribute methodologically to the existing literature by building techniques that transform the valuable insights of big data into more accurate predictions that support decisions under risk and increase robustness for more reliable and realistic results. The thesis focuses on extreme events that are encoded as a binary variable, commonly known as class-imbalanced data or rare events in binary response, in other words, data whose classes are not equally distributed. The research tackles real case studies in the field of risk and insurance, where it is highly important to specify the level of claims of an event in order to foresee its impact and to provide personalized treatment. After the introduction in Chapter 1, Chapter 2 proposes a weighting mechanism, incorporated into the weighted likelihood estimation of a generalized linear model, to improve predictive performance in the highest and lowest deciles of prediction. Chapter 3 proposes two different weighting procedures for a logistic regression model with complex survey data or specifically designed sampling data; the objective is to control the randomness of the data and give the estimated model more sensitivity. Chapter 4 provides a rigorous review of trials with modern and classical predictive methods to uncover and discuss the efficiency of certain methods over others, and to show which gaps in the machine learning literature can be addressed efficiently, and how. Chapter 5 proposes a novel boosting-based method that outperforms certain existing methods in predictive accuracy while recovering some interpretability of the model under imbalanced data. Chapter 6 develops another boosting-based algorithm that improves the predictive capacity for rare events and can be approximated as a generalized linear model in terms of interpretation. Finally, Chapter 7 presents the conclusions and final remarks. The thesis highlights the importance of developing alternative modelling algorithms that reduce uncertainty, especially when potential limitations prevent knowing all the prior factors that influence the presence of a rare event or imbalanced-data phenomenon. It merges two important approaches in the predictive modelling literature: "econometrics" and "machine learning". All in all, this thesis contributes to enhancing the methodology of empirical analysis as practised so far in many experimental and non-experimental sciences.
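    As a toy illustration of the weighted likelihood idea in the spirit of Chapter 2, the sketch below fits a binary GLM with per-observation weights. The inverse-class-frequency rule is an assumption chosen for brevity, not the dissertation's weighting mechanism.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def inverse_frequency_weights(y):
    """Upweight the rare class so both classes contribute equally to the likelihood."""
    y = np.asarray(y)
    freq = np.bincount(y) / len(y)   # empirical class frequencies
    return 1.0 / freq[y]             # one weight per observation

# w = inverse_frequency_weights(y_train)
# glm = LogisticRegression(max_iter=1000).fit(X_train, y_train, sample_weight=w)
```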

    Discriminative latent variable models for visual recognition

    Visual recognition is a central problem in computer vision, with numerous potential applications in many different fields, such as robotics, human-computer interaction, and entertainment. In this dissertation, we propose two discriminative latent variable models for handling challenging visual recognition problems. In particular, we use latent variables to capture and model various kinds of prior knowledge in the training data. In the first model, we address the problem of recognizing human actions from still images. We jointly consider both poses and actions in a unified framework, and treat human poses as latent variables; the learning of this model follows the framework of latent SVM. Secondly, we propose another latent variable model to address the problem of automated tag learning on YouTube videos. In particular, we address the semantic variations (sub-tags) among videos that share the same tag. In the model, each video is assumed to be associated with a sub-tag label, and we treat this sub-tag label as latent information. This model is trained using a latent learning framework based on LogitBoost, which jointly considers both the latent sub-tag label and the tag label. Moreover, we propose a novel discriminative latent learning framework, kernel latent SVM, which combines the benefits of latent SVM and kernel methods. The framework of kernel latent SVM is general enough to be applied to many visual recognition tasks, and it is able to handle complex latent variables with interdependent structures using composite kernels.
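    Both latent SVM and the LogitBoost-based tag-learning model share an alternating training shape: infer each example's best latent label under the current model, then refit with those labels fixed. The sketch below is a schematic of that loop with hypothetical `score` and `fit` callables; it shows the general pattern, not the dissertation's training procedure.

```python
import numpy as np

def train_latent(X, y, score, fit, n_sub, n_iters=10, seed=0):
    """score(model, x, h) -> float ranks latent label h for example x;
    fit(X, y, h) -> model retrains with latent labels h held fixed."""
    h = np.random.default_rng(seed).integers(n_sub, size=len(X))  # random init
    model = fit(X, y, h)
    for _ in range(n_iters):
        # Inference step: pick the sub-tag that best explains each example.
        h = np.array([max(range(n_sub), key=lambda k: score(model, x, k))
                      for x in X])
        # Learning step: refit (e.g., LogitBoost rounds) with h fixed.
        model = fit(X, y, h)
    return model, h
```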