Prediction of Fatigue on Rotating-Shift Workers
Rotating shifts have become prevalent in many industries, leading to growing concern about the impact of fatigue on workers' performance and safety. It is therefore useful to develop a method for predicting the fatigue of workers on rotating shifts. This thesis contributes to the development of such a method by building data-driven models to predict the level of fatigue.
We use a random forest classifier and a random forest regressor to build two fatigue prediction models. A third model combines the random forest classifier and regressor. Two imbalanced datasets from different groups of workers in the same industry are used. We explore two strategies for dealing with the imbalance: random over-sampling and class weights.
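The two imbalance strategies named above can be illustrated with a minimal standard-library sketch. It is not the thesis's implementation: the function names, the toy rows, and the weighting formula (the common "balanced" heuristic, n_samples / (n_classes * class_count)) are assumptions chosen for illustration.

```python
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen minority-class rows until every class
    has as many rows as the largest class."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        rows = [x for x, lab in zip(X, y) if lab == label]
        for _ in range(target - n):
            X_out.append(rng.choice(rows))
            y_out.append(label)
    return X_out, y_out


def balanced_class_weights(y):
    """Weight each class by n_samples / (n_classes * class_count),
    so that rare classes count more during training."""
    counts = Counter(y)
    n, k = len(y), len(counts)
    return {label: n / (k * c) for label, c in counts.items()}


# Hypothetical toy data: fatigue level 2 is common, level 8 is rare.
X = [[7.5, 0], [6.0, 1], [8.0, 0], [7.0, 0], [4.0, 1]]
y = [2, 2, 2, 2, 8]

X_bal, y_bal = random_oversample(X, y)
weights = balanced_class_weights(y)
```

Over-sampling changes the training set itself, whereas class weights leave the data untouched and rescale each sample's contribution to the training loss; either output would then be passed to whichever classifier is being trained.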
We select features using random forest feature importance and find that a set of 19 features, selected from the 38 original features, gives the best performance.
We obtain good prediction accuracy on both datasets. The combined model reaches a mean absolute error of 0.93 and 0.83 on the two datasets, on a 9-level fatigue scale. In the high-fatigue range, which is of particular interest in real work settings, our model predicts with an average of 85% confidence that the true level falls within ±1 of the prediction.
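The two reported metrics can be computed as in the sketch below. The function names, the toy labels, and the choice to define the high-fatigue range by the predicted level (via the hypothetical `min_level` parameter) are assumptions; the abstract does not say how that range is delimited.

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between true and predicted levels."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def within_tolerance_rate(y_true, y_pred, tol=1, min_level=None):
    """Fraction of predictions whose true level lies within +/- tol.

    If min_level is given, only predictions at or above that level are
    counted (an assumed definition of the high-fatigue range).
    """
    pairs = [
        (t, p)
        for t, p in zip(y_true, y_pred)
        if min_level is None or p >= min_level
    ]
    if not pairs:
        return None
    return sum(abs(t - p) <= tol for t, p in pairs) / len(pairs)


# Hypothetical fatigue levels on a 1-9 scale.
y_true = [3, 5, 7, 8, 9, 2]
y_pred = [4, 5, 9, 8, 8, 2]

mae = mean_absolute_error(y_true, y_pred)
rate_all = within_tolerance_rate(y_true, y_pred)
rate_high = within_tolerance_rate(y_true, y_pred, min_level=7)
```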
We conclude that fatigue can be predicted with high confidence from a dataset of sleep patterns, work schedules and demographic data. Future work will focus on generalizing the model to datasets from different industries or geographical areas, and on discovering other feature sets that give better prediction.
DNA methylation-based classification of central nervous system tumours.
Accurate pathological diagnosis is crucial for optimal management of patients with cancer. For the approximately 100 known tumour types of the central nervous system, standardization of the diagnostic process has been shown to be particularly challenging, with substantial inter-observer variability in the histopathological diagnosis of many tumour types. Here we present a comprehensive approach for the DNA methylation-based classification of central nervous system tumours across all entities and age groups, and demonstrate its application in a routine diagnostic setting. We show that the availability of this method may have a substantial impact on diagnostic precision compared to standard methods, resulting in a change of diagnosis in up to 12% of prospective cases. For broader accessibility, we have designed a free online classifier tool, the use of which does not require any additional onsite data processing. Our results provide a blueprint for the generation of machine-learning-based tumour classifiers across other cancer entities, with the potential to fundamentally transform tumour pathology.
Dissimilarity-based representation for radiomics applications
Radiomics is a term which refers to the analysis of the large amount of
quantitative tumor features extracted from medical images to find useful
predictive, diagnostic or prognostic information. Many recent studies have
proved that radiomics can offer a lot of useful information that physicians
cannot extract from the medical images and can be associated with other
information like gene or protein data. However, most of the classification
studies in radiomics report the use of feature selection methods without
identifying the machine learning challenges behind radiomics. In this paper, we
first show that the radiomics problem should be viewed as a high-dimensional,
low-sample-size, multi-view learning problem, then we compare different
solutions proposed in multi-view learning for classifying radiomics data. Our
experiments, conducted on several real-world multi-view datasets, show that the
intermediate integration methods work significantly better than filter and
embedded feature selection methods commonly used in radiomics.
Comment: conference, 6 pages, 2 figure
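One way to picture intermediate integration with a dissimilarity-based representation is sketched below: each view is turned into a sample-by-sample dissimilarity matrix, and the matrices are averaged into a single joint representation. This is an illustration only. The Euclidean distance, the max-normalization, the plain averaging, and all names are assumptions, not the measure or merging rule used in the paper.

```python
import math


def dissimilarity_matrix(view):
    """Pairwise Euclidean distances between samples in one view,
    scaled to [0, 1] so that views with different units are comparable."""
    n = len(view)
    d = [[math.dist(view[i], view[j]) for j in range(n)] for i in range(n)]
    peak = max(max(row) for row in d) or 1.0  # avoid dividing by zero
    return [[v / peak for v in row] for row in d]


def joint_dissimilarity(views):
    """Average the per-view dissimilarity matrices element-wise."""
    mats = [dissimilarity_matrix(v) for v in views]
    n = len(mats[0])
    return [
        [sum(m[i][j] for m in mats) / len(mats) for j in range(n)]
        for i in range(n)
    ]


# Two hypothetical views describing the same three samples.
view_a = [[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]]
view_b = [[1.0], [1.0], [3.0]]

joint = joint_dissimilarity([view_a, view_b])
```

Each row of `joint` describes one sample by its dissimilarity to every other sample, so the representation has as many columns as there are samples, regardless of how many raw features each view contains. That property is what makes it attractive in high-dimensional, low-sample-size settings.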
The BSM-AI project: SUSY-AI - Generalizing LHC limits on Supersymmetry with Machine Learning
A key research question at the Large Hadron Collider (LHC) is the test of
models of new physics. Testing if a particular parameter set of such a model is
excluded by LHC data is a challenge: it requires the time-consuming generation
of scattering events, the simulation of the detector response, the event
reconstruction, cross section calculations and analysis code to test against
several hundred signal regions defined by the ATLAS and CMS experiments. In the
BSM-AI project we attack this challenge with a new approach. Machine learning
tools are taught to predict within a fraction of a millisecond if a model is
excluded or not directly from the model parameters. A first example is SUSY-AI,
trained on the phenomenological supersymmetric standard model (pMSSM). About
300,000 pMSSM model sets - each tested with 200 signal regions by ATLAS - have
been used to train and validate SUSY-AI. The code is currently able to
reproduce the ATLAS exclusion regions in 19 dimensions with an accuracy of at
least 93 percent. It has been validated further within the constrained MSSM and
a minimal natural supersymmetric model, again showing high accuracy. SUSY-AI
and its future BSM derivatives will help to solve the problem of recasting LHC
results for any model of new physics.
SUSY-AI can be downloaded at http://susyai.hepforge.org/. An on-line
interface to the program for quick testing purposes can be found at
http://www.susy-ai.org/
Machine Learning Techniques for Stellar Light Curve Classification
We apply machine learning techniques in an attempt to predict and classify
stellar properties from noisy and sparse time series data. We preprocessed over
94 GB of Kepler light curves from MAST to classify according to ten distinct
physical properties using both representation learning and feature engineering
approaches. Studies using machine learning in the field have been primarily
done on simulated data, making our study one of the first to use real light
curve data for machine learning approaches. We tuned our data using previous
work with simulated data as a template and achieved mixed results between the
two approaches. Representation learning using a Long Short-Term Memory (LSTM)
Recurrent Neural Network (RNN) produced no successful predictions, but our work
with feature engineering was successful for both classification and regression.
In particular, we were able to predict stellar density, stellar
radius, and effective temperature with low error (~2-4%) and to classify
the number of transits for a given star with good accuracy (~75%). The results
show promise for improvement for both approaches upon using larger datasets
with a larger minority class. This work has the potential to provide a
foundation for future tools and techniques to aid in the analysis of
astrophysical data.
Comment: Accepted to The Astronomical Journa
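The feature-engineering route can be pictured as reducing each light curve to a handful of summary statistics before any model sees it. The sketch below is illustrative only: the feature set, the names, and the toy flux values are assumptions, not the features used in the study.

```python
import statistics


def light_curve_features(flux):
    """Summarize one light curve (a list of flux values, with None
    marking missing cadences) as a small dictionary of features."""
    clean = [f for f in flux if f is not None]  # drop gaps in sparse data
    median = statistics.median(clean)
    return {
        "mean": statistics.fmean(clean),
        "std": statistics.pstdev(clean),
        "amplitude": max(clean) - min(clean),
        # Fractional depth of the deepest dip relative to the median.
        "max_depth": (median - min(clean)) / median,
        "n_points": len(clean),
    }


# Hypothetical normalized flux with two shallow dips and one gap.
flux = [1.0, 1.0, 0.98, 1.0, None, 1.0, 0.98, 1.0]
features = light_curve_features(flux)
```

A fixed-length feature dictionary like this can be fed to any standard classifier or regressor, which is one reason hand-built features remain usable on noisy, sparse series where sequence models struggle.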
Identification of the expressome by machine learning on omics data.
Accurate annotation of plant genomes remains complex due to the presence of many pseudogenes arising from whole-genome duplication-generated redundancy or the capture and movement of gene fragments by transposable elements. Machine learning on genome-wide epigenetic marks, informed by transcriptomic and proteomic training data, could be used to improve annotations through classification of all putative protein-coding genes as either constitutively silent or able to be expressed. Expressed genes were subclassified as able to express both mRNAs and proteins or only RNAs, and CG gene body methylation was associated only with the former subclass. More than 60,000 protein-coding genes have been annotated in the reference genome of maize inbred B73. About two-thirds of these genes are transcribed and are designated the filtered gene set (FGS). Classification of genes by our trained random forest algorithm was accurate and relied only on histone modifications or DNA methylation patterns within the gene body; promoter methylation was unimportant. Other inbred lines are known to transcribe significantly different sets of genes, indicating that the FGS is specific to B73. We accurately classified the sets of transcribed genes in additional inbred lines, arising from inbred-specific DNA methylation patterns. This approach highlights the potential of using chromatin information to improve annotations of functional genes.