    Prediction with Confidence Based on a Random Forest Classifier

    Prediction of Fatigue on Rotating-Shift Workers

    Rotating shifts have become prevalent in many industries, leading to growing concern about the impact of fatigue on workers' performance and safety. Thus, it is useful to develop a method to predict the fatigue of workers with rotating shifts. This thesis aims to contribute to the development of such a method by building data-driven models to predict the level of fatigue. We use a random forest classifier and a random forest regressor to build two fatigue prediction models. A third model is built by combining the random forest classifier and regressor. Two imbalanced datasets from different groups of workers in the same industry are used. We explore two strategies to deal with imbalanced datasets: random over-sampling and class weights. We select features using random forest feature importance and find that a set of 19 features, selected from 38 original features, gives the best performance. We obtain good prediction accuracy on both datasets. The combined model reaches a mean absolute error of 0.93 and 0.83 on the two datasets, on a 9-level scale of fatigue. In the region of high fatigue, which is of particular interest in real work, our model can predict with 85% confidence on average that the true level falls within ±1 of the prediction. We conclude that fatigue can be predicted with high confidence, based on a dataset of sleep patterns, work schedules and demographic data. Future work will focus on model generalization to datasets from different industries or geographical areas, and on the discovery of other sets of features that give better prediction.
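
As an illustration of two ideas named in this abstract, the following minimal sketch shows random over-sampling of minority classes and the two evaluation measures mentioned (mean absolute error, and the share of predictions within one level of the true value). It is not the thesis code: the function names, the toy labels, and the choice to balance every class up to the size of the largest one are assumptions made for this example only.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen minority-class rows until every class
    has as many rows as the largest class (an assumed balancing target)."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between true and predicted levels."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def within_tolerance_rate(y_true, y_pred, tol=1):
    """Fraction of predictions that fall within +/- tol of the true level."""
    hits = sum(1 for t, p in zip(y_true, y_pred) if abs(t - p) <= tol)
    return hits / len(y_true)


if __name__ == "__main__":
    # Hypothetical fatigue levels on a 9-level scale (toy data, not from the thesis).
    y_true = [1, 5, 9, 7]
    y_pred = [2, 5, 7, 7]
    print("MAE:", mean_absolute_error(y_true, y_pred))
    print("within +/-1:", within_tolerance_rate(y_true, y_pred))

    rows = [[0.1], [0.2], [0.3], [0.9]]
    labels = [1, 1, 1, 9]
    _, balanced = random_oversample(rows, labels)
    print("class counts after over-sampling:", Counter(balanced))
```

Over-sampling of this kind should be applied to the training split only; duplicating rows before splitting would leak copies of the same sample into the evaluation data.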

    Dissimilarity-based representation for radiomics applications

    Radiomics refers to the analysis of the large number of quantitative tumor features extracted from medical images to find useful predictive, diagnostic or prognostic information. Many recent studies have shown that radiomics can offer a lot of useful information that physicians cannot extract from the medical images themselves, and that it can be associated with other information such as gene or protein data. However, most classification studies in radiomics report the use of feature selection methods without identifying the machine learning challenges behind radiomics. In this paper, we first show that the radiomics problem should be viewed as a high-dimensional, low-sample-size, multi-view learning problem; we then compare different solutions proposed in multi-view learning for classifying radiomics data. Our experiments, conducted on several real-world multi-view datasets, show that intermediate integration methods work significantly better than the filter and embedded feature selection methods commonly used in radiomics.
    Comment: conference, 6 pages, 2 figures

    The BSM-AI project: SUSY-AI - Generalizing LHC limits on Supersymmetry with Machine Learning

    A key research question at the Large Hadron Collider (LHC) is the test of models of new physics. Testing whether a particular parameter set of such a model is excluded by LHC data is a challenge: it requires the time-consuming generation of scattering events, the simulation of the detector response, the event reconstruction, cross section calculations and analysis code to test against several hundred signal regions defined by the ATLAS and CMS experiments. In the BSM-AI project we attack this challenge with a new approach. Machine learning tools are taught to predict, within a fraction of a millisecond and directly from the model parameters, whether a model is excluded or not. A first example is SUSY-AI, trained on the phenomenological supersymmetric standard model (pMSSM). About 300,000 pMSSM model sets, each tested with 200 signal regions by ATLAS, have been used to train and validate SUSY-AI. The code is currently able to reproduce the ATLAS exclusion regions in 19 dimensions with an accuracy of at least 93 percent. It has been validated further within the constrained MSSM and a minimal natural supersymmetric model, again showing high accuracy. SUSY-AI and its future BSM derivatives will help to solve the problem of recasting LHC results for any model of new physics. SUSY-AI can be downloaded at http://susyai.hepforge.org/. An on-line interface to the program for quick testing purposes can be found at http://www.susy-ai.org/

    Machine Learning Techniques for Stellar Light Curve Classification

    We apply machine learning techniques in an attempt to predict and classify stellar properties from noisy and sparse time series data. We preprocessed over 94 GB of Kepler light curves from MAST to classify according to ten distinct physical properties using both representation learning and feature engineering approaches. Studies using machine learning in the field have primarily been done on simulated data, making our study one of the first to use real light curve data for machine learning approaches. We tuned our data using previous work with simulated data as a template and achieved mixed results between the two approaches. Representation learning using a Long Short-Term Memory (LSTM) Recurrent Neural Network (RNN) produced no successful predictions, but our work with feature engineering was successful for both classification and regression. In particular, we were able to achieve values for stellar density, stellar radius, and effective temperature with low error (~2-4%), and good accuracy (~75%) for classifying the number of transits for a given star. The results show promise for improvement for both approaches upon using larger datasets with a larger minority class. This work has the potential to provide a foundation for future tools and techniques to aid in the analysis of astrophysical data.
    Comment: Accepted to The Astronomical Journal