Prediction with Confidence Based on a Random Forest Classifier

Author: A. Gammerman
J.F. Timms
L. Breiman
V. Vovk
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2010

Prediction of Fatigue on Rotating-Shift Workers

Author: Tran Anh Tuan
Publication date: 01/07/2019

Rotating shifts have become prevalent in many industries, leading to growing concern about the impact of fatigue on workers' performance and safety. It is therefore useful to develop a method to predict the fatigue of workers on rotating shifts. This thesis contributes to the development of such a method by building data-driven models to predict the level of fatigue. We use a random forest classifier and a random forest regressor to build two fatigue prediction models. A third model combines the random forest classifier and regressor. Two imbalanced datasets from different groups of workers in the same industry are used. We explore two strategies for dealing with imbalanced datasets: random over-sampling and class weights. We select features using random forest feature importance and find that a set of 19 features, selected from the 38 original features, gives the best performance. We obtain good prediction accuracy on both datasets. The combined model reaches a mean absolute error of 0.93 and 0.83 on the two datasets, on a 9-level fatigue scale. In the high-fatigue range, which is of particular interest in real work, our model predicts with 85% average confidence that the true level falls within ±1 of the prediction. We conclude that fatigue can be predicted with high confidence from a dataset of sleep patterns, work schedules and demographic data. Future work will focus on generalizing the model to datasets from different industries or geographical areas, and on discovering other feature sets that give better predictions.
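
As a rough illustration of the two imbalance strategies named in this abstract, the sketch below implements random over-sampling and inverse-frequency class weights in plain Python. The function names, the toy data, and the weighting formula n_samples / (n_classes * class_count) are assumptions made for illustration; the abstract does not say which implementation or weighting scheme the thesis used.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen minority-class rows until every class
    has as many rows as the largest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, count in counts.items():
        # Rows of this class in the ORIGINAL data (assumed pool to draw from).
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - count):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


def balanced_class_weights(labels):
    """Weight each class inversely to its frequency (assumed formula):
    n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * c) for cls, c in counts.items()}


# Hypothetical toy data: one feature per row, fatigue levels as labels,
# with level 8 (high fatigue) under-represented.
rows = [[7.5], [6.0], [8.0], [5.5], [4.0], [3.5]]
labels = [2, 2, 2, 2, 8, 8]

resampled_rows, resampled_labels = random_oversample(rows, labels)
weights = balanced_class_weights(labels)
```

Over-sampling changes the training set itself, while class weights leave the data untouched and instead scale each sample's contribution during training; which of the two works better is an empirical question that the abstract says the thesis explores.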

Concordia University Research Repository

DNA methylation-based classification of central nervous system tumours.

Author: Aronica Eleonora
Becker Albert
Benner Axel
Beschorner Rudi
Bewerunge-Hudler Melanie
Bjerkvig Rolf
Braczynski Anne K
Brehmer Stefanie
Brück Wolfgang
Calaminus Gabriele
Capper David
Chavez Lukas
Coras Roland
Cryan Jane
Deckert Martina
Dohmen Hildegard
Driever Pablo Hernáiz
Engel Nils W
Farrell Michael
Fischer Roger
Fleischhack Gudrun
Frank Stephan
Frühwald Michael C
Garvalov Boyan K
Geisenberger Christoph
Giangaspero Felice
Gnekow Astrid
Gottardo Nicholas G
Haberler Christine
Hans Volkmar
Hansford Jordan R
Harter Patrick N
Hench Jürgen
Heppner Frank
Hewer Ekkehard
Hofer Silvia
Hovestadt Volker
Huang Kristin
Hänggi Daniel
Hölsken Annett
Jones Chris
Jones David TW
Jouvet Anne
Kannan Kasthuri
Keohane Catherine
Ketter Ralf
Khatib Ziad
Koch Arend
Koelsche Christian
Kohlhof Patricia
Kramm Christof M
Kratz Annekathrin
Kristensen Bjarne W
Kulozik Andreas
Lechner Matt
Lindenberg Kerstin
Lohmann Dietmar
Lopes Beatriz
Mawrin Christian
Milde Till
Monoranu Camelia-Maria
Mueller Wolf
Mühleisen Helmut
Müller Hermann L
Olar Adriana
Pages Melanie
Pajtler Kristian W
Perry Arie
Plate Karl H
Pohl Ute
Preusser Matthias
Prinz Marco
Reuss David E
Rodriguez Fausto J
Rozsnoki Stephanie
Rushing Elisabeth
Rutkowski Stefan
Sahm Felix
Scheurlen Wolfram
Schick Matthias
Schittenhelm Jens
Schrimpf Daniel
Schweizer Leonille
Seiz-Rosenhagen Marcel
Selt Florian
Serrano Jonathan
Sill Martin
Staszewski Ori
Stichel Damian
Sturm Dominik
Temming Petra
Tippelt Stephan
Tsirigos Aristotelis
Varlet Pascale
von Hoff Katja
Wani Khalida
Wefers Annika K
Witt Hendrik
Witt Olaf
Zapatka Marc
Publication venue: eScholarship, University of California
Publication date: 01/03/2018

Accurate pathological diagnosis is crucial for optimal management of patients with cancer. For the approximately 100 known tumour types of the central nervous system, standardization of the diagnostic process has been shown to be particularly challenging, with substantial inter-observer variability in the histopathological diagnosis of many tumour types. Here we present a comprehensive approach for the DNA methylation-based classification of central nervous system tumours across all entities and age groups, and demonstrate its application in a routine diagnostic setting. We show that the availability of this method may have a substantial impact on diagnostic precision compared to standard methods, resulting in a change of diagnosis in up to 12% of prospective cases. For broader accessibility, we have designed a free online classifier tool, the use of which does not require any additional onsite data processing. Our results provide a blueprint for the generation of machine-learning-based tumour classifiers across other cancer entities, with the potential to fundamentally transform tumour pathology.

eScholarship - University of California

Dissimilarity-based representation for radiomics applications

Author: Bernard Simon
Cao Hongliu
Heutte Laurent
Sabourin Robert
Publication date: 12/03/2018

Radiomics refers to the analysis of the large number of quantitative tumor features extracted from medical images to find useful predictive, diagnostic or prognostic information. Many recent studies have shown that radiomics can offer a lot of useful information that physicians cannot extract from the medical images, and that this information can be associated with other data such as gene or protein data. However, most classification studies in radiomics report the use of feature selection methods without identifying the machine learning challenges behind radiomics. In this paper, we first show that the radiomics problem should be viewed as a high-dimensional, low-sample-size, multi-view learning problem; we then compare different solutions proposed in multi-view learning for classifying radiomics data. Our experiments, conducted on several real-world multi-view datasets, show that intermediate integration methods work significantly better than the filter and embedded feature selection methods commonly used in radiomics. Comment: conference, 6 pages, 2 figures
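
To make the idea of intermediate integration more concrete, here is a minimal sketch of one possible form of it: each view is turned into a sample-by-sample dissimilarity matrix, and the matrices are averaged into a single joint representation. Euclidean distance is used purely as a stand-in, and the function names and toy data are hypothetical; the abstract does not state which dissimilarity measure or integration scheme the paper evaluates.

```python
import math


def dissimilarity_matrix(view):
    """Pairwise Euclidean distances between the samples of one view."""
    return [[math.dist(a, b) for b in view] for a in view]


def average_matrices(matrices):
    """Element-wise mean of equally sized square matrices."""
    n = len(matrices[0])
    return [
        [sum(m[i][j] for m in matrices) / len(matrices) for j in range(n)]
        for i in range(n)
    ]


# Hypothetical data: 3 samples described by two views of different width.
view_a = [[0.0, 0.0], [3.0, 4.0], [0.0, 0.0]]
view_b = [[1.0], [1.0], [3.0]]

# Joint representation: one row per sample, one column per reference sample,
# regardless of how many features each original view had.
joint = average_matrices([dissimilarity_matrix(v) for v in (view_a, view_b)])
```

The appeal of this kind of representation in a high-dimensional, low-sample-size setting is that its width is the number of samples and not the number of features. In practice the views would likely need to be brought to comparable scales before averaging, which this sketch omits.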

arXiv.org e-Print Archive

HAL - Normandie Université

The BSM-AI project: SUSY-AI - Generalizing LHC limits on Supersymmetry with Machine Learning

Author: Caron Sascha
de Austri Roberto Ruiz
Kim Jong Soo
Rolbiecki Krzysztof
Stienen Bob
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2017

A key research question at the Large Hadron Collider (LHC) is the test of models of new physics. Testing whether a particular parameter set of such a model is excluded by LHC data is a challenge: it requires the time-consuming generation of scattering events, simulation of the detector response, event reconstruction, cross-section calculations and analysis code to test against several hundred signal regions defined by the ATLAS and CMS experiments. In the BSM-AI project we attack this challenge with a new approach. Machine learning tools are taught to predict, within a fraction of a millisecond and directly from the model parameters, whether a model is excluded. A first example is SUSY-AI, trained on the phenomenological supersymmetric standard model (pMSSM). About 300,000 pMSSM model sets, each tested against 200 signal regions by ATLAS, have been used to train and validate SUSY-AI. The code is currently able to reproduce the ATLAS exclusion regions in 19 dimensions with an accuracy of at least 93 percent. It has been validated further within the constrained MSSM and a minimal natural supersymmetric model, again showing high accuracy. SUSY-AI and its future BSM derivatives will help to solve the problem of recasting LHC results for any model of new physics. SUSY-AI can be downloaded at http://susyai.hepforge.org/. An online interface to the program for quick testing purposes can be found at http://www.susy-ai.org/

arXiv.org e-Print Archive

EDP Sciences OAI-PMH repository (1.2.0)

Directory of Open Access Journals

Radboud Repository

Machine Learning Techniques for Stellar Light Curve Classification

Author: Hinners Trisha
Tat Kevin
Thorp Rachel
Publication venue: American Astronomical Society
Publication date: 26/04/2018

We apply machine learning techniques in an attempt to predict and classify stellar properties from noisy and sparse time series data. We preprocessed over 94 GB of Kepler light curves from MAST to classify according to ten distinct physical properties using both representation learning and feature engineering approaches. Studies using machine learning in the field have primarily been done on simulated data, making our study one of the first to use real light curve data for machine learning approaches. We tuned our data using previous work with simulated data as a template and achieved mixed results between the two approaches. Representation learning using a Long Short-Term Memory (LSTM) Recurrent Neural Network (RNN) produced no successful predictions, but our work with feature engineering was successful for both classification and regression. In particular, we were able to achieve values for stellar density, stellar radius, and effective temperature with low error (~2-4%) and good accuracy (~75%) for classifying the number of transits for a given star. The results show promise for improvement in both approaches when using larger datasets with a larger minority class. This work has the potential to provide a foundation for future tools and techniques to aid in the analysis of astrophysical data. Comment: Accepted to The Astronomical Journal
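
As a hedged illustration of what feature engineering on a sparse, noisy light curve might look like, the sketch below reduces one flux series to a few summary statistics while skipping missing cadences. The particular features, the function name, and the toy flux values are assumptions chosen for illustration; the abstract does not list the features the authors actually engineered.

```python
import math
import statistics


def light_curve_features(flux):
    """Summary statistics of one flux series; NaN marks a missing cadence."""
    observed = [f for f in flux if not math.isnan(f)]
    return {
        "mean": statistics.fmean(observed),
        "std": statistics.pstdev(observed),       # population standard deviation
        "amplitude": max(observed) - min(observed),
        "missing_fraction": 1 - len(observed) / len(flux),
    }


# Hypothetical normalised flux with one gap and one small dip.
flux = [1.0, 1.0, float("nan"), 0.98, 1.0, 1.02]
features = light_curve_features(flux)
```

A fixed-length feature vector of this kind can be passed to any standard classifier or regressor, which is one reason feature engineering can be easier to get working on gappy data than sequence models that consume the raw series.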

arXiv.org e-Print Archive

Caltech Authors

Identification of the expressome by machine learning on omics data.

Author: Briggs Steven P
Noshay Jaclyn
Sartor Ryan C
Springer Nathan M
Publication venue: eScholarship, University of California
Publication date: 01/09/2019

Accurate annotation of plant genomes remains complex due to the presence of many pseudogenes arising from redundancy generated by whole-genome duplication or from the capture and movement of gene fragments by transposable elements. Machine learning on genome-wide epigenetic marks, informed by transcriptomic and proteomic training data, could be used to improve annotations through classification of all putative protein-coding genes as either constitutively silent or able to be expressed. Expressed genes were subclassified as able to express both mRNAs and proteins or only RNAs, and CG gene body methylation was associated only with the former subclass. More than 60,000 protein-coding genes have been annotated in the reference genome of maize inbred B73. About two-thirds of these genes are transcribed and are designated the filtered gene set (FGS). Classification of genes by our trained random forest algorithm was accurate and relied only on histone modifications or DNA methylation patterns within the gene body; promoter methylation was unimportant. Other inbred lines are known to transcribe significantly different sets of genes, indicating that the FGS is specific to B73. We accurately classified the sets of transcribed genes in additional inbred lines, arising from inbred-specific DNA methylation patterns. This approach highlights the potential of using chromatin information to improve annotations of functional genes.

eScholarship - University of California