Modeling Binary Time Series Using Gaussian Processes with Application to Predicting Sleep States
Motivated by the problem of predicting sleep states, we develop a mixed
effects model for binary time series with a stochastic component represented by
a Gaussian process. The fixed component captures the effects of covariates on
the binary-valued response. The Gaussian process captures the residual
variations in the binary response that are not explained by covariates and past
realizations. We develop a frequentist modeling framework that provides
efficient inference and more accurate predictions. Results demonstrate
improved prediction rates over existing approaches such as logistic
regression, generalized additive mixed models, models for ordinal data,
gradient boosting, decision trees, and random forests. Using our proposed model,
we show that previous sleep state and heart rates are significant predictors
for future sleep states. Simulation studies also show that our proposed method
is promising and robust. To handle computational complexity, we utilize Laplace
approximation, golden section search and successive parabolic interpolation.
With this paper, we also submit an R-package (HIBITS) that implements the
proposed procedure. Comment: Journal of Classification (2018)
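The abstract mentions golden section search as one of the tools used to keep inference tractable. A minimal sketch of that classical one-dimensional optimizer (a generic illustration, not the HIBITS implementation) looks like this:

```python
import math

def golden_section_search(f, a, b, tol=1e-6):
    """Minimize a unimodal function f on [a, b] by golden section search.

    Generic sketch of the classical algorithm named in the abstract;
    the HIBITS package's actual routine may differ.
    """
    inv_phi = (math.sqrt(5) - 1) / 2  # 1/phi, about 0.618
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    while abs(b - a) > tol:
        if f(c) < f(d):
            # Minimum lies in [a, d]; reuse c as the new right probe.
            b, d = d, c
            c = b - inv_phi * (b - a)
        else:
            # Minimum lies in [c, b]; reuse d as the new left probe.
            a, c = c, d
            d = a + inv_phi * (b - a)
    return (a + b) / 2

# Example: minimize (x - 2)^2 on [0, 5]
x_min = golden_section_search(lambda x: (x - 2) ** 2, 0, 5)
```

Because each iteration reuses one of the two interior probe points, the method needs only one new function evaluation per step, which matters when each evaluation involves a Laplace-approximated likelihood.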
Tree Boosting Data Competitions with XGBoost
The objective of this Master's thesis is to explain how to approach a supervised predictive-learning problem and to illustrate the process with a statistical/machine-learning algorithm, tree boosting. A review of tree methodology traces its evolution from Classification and Regression Trees through bagging and random forests to modern tree boosting. The methodology is explained following the XGBoost implementation, which has achieved state-of-the-art results in several data competitions. A framework for applied predictive modelling is presented with its core concepts: objective function, regularization term, overfitting, hyperparameter tuning, k-fold cross-validation, and feature engineering. All of these concepts are illustrated with a real dataset of videogame churn used in a datathon competition.
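The k-fold cross-validation scheme central to the thesis's tuning framework can be sketched in plain Python (a generic illustration of the concept, not the thesis code; `train_fn` and `score_fn` are hypothetical callables standing in for any model and metric):

```python
import random

def k_fold_indices(n, k, seed=0):
    """Split indices 0..n-1 into k disjoint, roughly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cross_validate(train_fn, score_fn, data, k=5):
    """Average validation score over k folds.

    train_fn(train_rows) -> model; score_fn(model, valid_rows) -> float.
    Each fold serves once as the held-out validation set.
    """
    folds = k_fold_indices(len(data), k)
    scores = []
    for i in range(k):
        valid = [data[j] for j in folds[i]]
        train = [data[j] for f in folds[:i] + folds[i + 1:] for j in f]
        scores.append(score_fn(train_fn(train), valid))
    return sum(scores) / k
```

In a tuning loop, `cross_validate` would be called once per candidate hyperparameter setting, and the setting with the best average score kept; the held-out folds give an estimate of generalization error that guards against overfitting.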
RandomBoost: Simplified Multi-class Boosting through Randomization
We propose a novel boosting approach to multi-class classification problems,
in which multiple classes are distinguished by a set of random projection
matrices in essence. The approach uses random projections to alleviate the
proliferation of binary classifiers typically required to perform multi-class
classification. The result is a multi-class classifier with a single
vector-valued parameter, irrespective of the number of classes involved. Two
variants of this approach are proposed. The first method randomly projects the
original data into new spaces, while the second method randomly projects the
outputs of learned weak classifiers. These methods are not only conceptually
simple but also effective and easy to implement. A series of experiments on
synthetic, machine learning and visual recognition data sets demonstrate that
our proposed methods compare favorably to existing multi-class boosting
algorithms in terms of both the convergence rate and classification accuracy. Comment: 15 pages
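The data-projection step of the first variant can be sketched with a random Gaussian matrix (a generic illustration of random projection; the paper's exact construction of its projection matrices may differ):

```python
import numpy as np

def random_projection(X, out_dim, seed=0):
    """Project the rows of X into out_dim dimensions with a random
    Gaussian matrix. Sketch of the generic technique; RandomBoost's
    actual projection matrices may be constructed differently.
    """
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    # Scaling by 1/sqrt(out_dim) approximately preserves pairwise
    # distances (Johnson-Lindenstrauss flavor of random projection).
    P = rng.standard_normal((d, out_dim)) / np.sqrt(out_dim)
    return X @ P
```

The appeal for multi-class boosting is that the projection is data-independent and cheap to draw, so many candidate projections can be generated without training extra binary classifiers.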
Boosting for high-dimensional linear models
We prove that boosting with the squared error loss, L2Boosting, is
consistent for very high-dimensional linear models, where the number of
predictor variables is allowed to grow essentially as fast as exp(sample
size), assuming that the true underlying regression function is sparse in
terms of the ℓ1-norm of the regression coefficients. In the language of
signal processing, this means consistency for de-noising using a strongly
overcomplete dictionary if the underlying signal is sparse in terms of the
ℓ1-norm. We also propose here an AIC-based method for tuning,
namely for choosing the number of boosting iterations. This makes L2Boosting
computationally attractive since it is not required to run the algorithm
multiple times for cross-validation as commonly used so far. We demonstrate
L2Boosting for simulated data, in particular where the predictor dimension
is large in comparison to sample size, and for a difficult tumor-classification
problem with gene expression microarray data. Comment: Published at
http://dx.doi.org/10.1214/009053606000000092 in the Annals of Statistics
(http://www.imstat.org/aos/) by the Institute of Mathematical Statistics
(http://www.imstat.org)
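Componentwise boosting with the squared error loss can be sketched in a few lines of NumPy (a generic illustration under the assumption of standardized predictors; the stopping rule here is a fixed iteration count rather than the paper's information-criterion-based tuning):

```python
import numpy as np

def l2_boost(X, y, n_iter=50, nu=0.1):
    """Componentwise squared-error-loss boosting for linear models.

    At each step, fit the current residuals by simple least squares on
    the single best predictor and take a shrunken step of size nu.
    Generic sketch, not the paper's implementation; columns of X are
    assumed roughly standardized, and stopping is a fixed iteration
    count instead of the AIC-style criterion the paper proposes.
    """
    n, p = X.shape
    beta = np.zeros(p)
    intercept = y.mean()
    resid = y - intercept
    for _ in range(n_iter):
        # Least-squares coefficient of each predictor vs. the residuals.
        coefs = X.T @ resid / (X ** 2).sum(axis=0)
        # Pick the predictor whose one-variable fit leaves the smallest
        # residual sum of squares.
        sse = ((resid[:, None] - X * coefs) ** 2).sum(axis=0)
        j = int(np.argmin(sse))
        beta[j] += nu * coefs[j]
        resid -= nu * coefs[j] * X[:, j]
    return intercept, beta
```

Because each iteration updates only one coordinate, the fitted coefficient vector stays sparse for early stopping times, which is why the procedure remains sensible even when the number of predictors far exceeds the sample size.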
Predicting time to graduation at a large enrollment American university
The time it takes a student to graduate with a university degree is shaped
by a variety of factors, such as their background, their academic performance
at university, and their integration into the social communities of the
university they attend. Different universities have different populations,
student services, instruction styles, and degree programs; however, they all collect
institutional data. This study presents data for 160,933 students attending a
large American research university. The data includes performance, enrollment,
demographics, and preparation features. Discrete time hazard models for the
time-to-graduation are presented in the context of Tinto's Theory of Drop Out.
Additionally, a novel machine learning method, gradient boosted trees, is
applied and compared to the typical maximum likelihood method. We demonstrate
that enrollment factors (such as changing a major) lead to greater increases in
model predictive performance of when a student graduates than performance
factors (such as grades) or preparation factors (such as high school GPA). Comment: 28 pages, 11 figures
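Discrete-time hazard models are typically fit on data expanded into "person-period" form: one row per student per enrolled term, with a binary event indicator for the graduation term. A minimal sketch of that expansion (field names here are illustrative, not the study's actual schema):

```python
def person_period_rows(students):
    """Expand student records into person-period rows for a
    discrete-time hazard model.

    Each input record is (terms_enrolled, graduated, covariates);
    each output row is (term, covariates, event), where event = 1
    only in the term the student graduated. Illustrative schema,
    not the study's institutional data format.
    """
    rows = []
    for terms, graduated, cov in students:
        for t in range(1, terms + 1):
            event = 1 if (graduated and t == terms) else 0
            rows.append((t, cov, event))
    return rows
```

Once the data are in this form, the hazard of graduating in term t can be estimated with an ordinary binary classifier on the rows, which is what makes both maximum likelihood logistic models and gradient boosted trees directly comparable on the same expanded dataset.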