Search CORE

An update on statistical boosting in biomedicine

Author: Gefeller Olaf
Hepp Tobias
Hofner Benjamin
Mayr Andreas
Schmid Matthias
Waldmann Elisabeth
Publication date: 01/01/2017

Statistical boosting algorithms have triggered a lot of research during the last decade. They combine a powerful machine-learning approach with classical statistical modelling, offering various practical advantages such as automated variable selection and implicit regularization of effect estimates. They are extremely flexible, as the underlying base-learners (regression functions defining the type of effect for the explanatory variables) can be combined with any kind of loss function (target function to be optimized, defining the type of regression setting). In this review article, we highlight the most recent methodological developments in statistical boosting regarding variable selection, functional regression and advanced time-to-event modelling. Additionally, we provide a short overview of relevant applications of statistical boosting in biomedicine.

arXiv.org e-Print Archive

Crossref

Directory of Open Access Journals

Open Access LMU
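
As a rough illustration of how a base-learner and a loss function fit together, here is a minimal sketch of component-wise gradient boosting with the squared-error (L2) loss and one simple linear base-learner per covariate. It is a toy under stated assumptions, not the implementation discussed in the review or in any boosting package. The function name `componentwise_l2_boost`, the step length `nu` and the fixed iteration count `n_iter` are illustrative choices. Only the best-fitting base-learner is updated in each iteration, so covariates that are never selected keep a coefficient of exactly zero (variable selection), and stopping early shrinks the remaining estimates (implicit regularization).

```python
def componentwise_l2_boost(X, y, n_iter=100, nu=0.1):
    """Toy component-wise gradient boosting with squared-error (L2) loss.

    Illustrative assumptions: one simple linear base-learner per covariate,
    a fixed step length `nu`, and a fixed number of iterations `n_iter`.
    X is a list of rows (lists of floats); y is a list of floats.
    Returns (intercept, coefficients) on the original covariate scale.
    """
    n, p = len(X), len(X[0])
    # Centre each covariate so the linear base-learners need no intercept.
    means = [sum(row[j] for row in X) / n for j in range(p)]
    cols = [[row[j] - means[j] for row in X] for j in range(p)]
    offset = sum(y) / n  # start from the mean of the response
    fit = [offset] * n
    coef = [0.0] * p
    for _ in range(n_iter):
        # Negative gradient of the L2 loss = current residuals.
        u = [yi - fi for yi, fi in zip(y, fit)]
        best_j, best_b, best_rss = None, 0.0, float("inf")
        for j, xj in enumerate(cols):
            sxx = sum(v * v for v in xj)
            if sxx == 0:
                continue  # constant covariate: nothing to fit
            b = sum(a * r for a, r in zip(xj, u)) / sxx  # least-squares slope
            rss = sum((r - b * a) ** 2 for a, r in zip(xj, u))
            if rss < best_rss:
                best_j, best_b, best_rss = j, b, rss
        if best_j is None:
            break
        # Update only the best-fitting component, shrunk by nu.
        coef[best_j] += nu * best_b
        fit = [f + nu * best_b * a for f, a in zip(fit, cols[best_j])]
    # Undo the centring to express the model on the original scale.
    intercept = offset - sum(c * m for c, m in zip(coef, means))
    return intercept, coef
```

With a response such as `y = 2*x0 + 1` and an irrelevant second covariate, this sketch leaves the second coefficient at zero and moves the first towards 2. In practice the number of iterations would be tuned, for example by cross-validation, rather than fixed in advance.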

Forecasting Player Behavioral Data and Simulating in-Game Events

Author: A Natekin
AJ Fox
C Bauckhage
Colin Chen
DH Ackley
G Ridgeway
G Schwarz
G Zhang
GE Box
GE Hinton
H Akaike
JG Cragg
JG De Gooijer
JH Friedman
KD Lawrence
L Deng
L Dwyer
M Gilliland
M Längkvist
MS El-Nasr
N Srivastava
NE Breslow
PH Eilers
PJ Brockwell
RJ Hyndman
S Asmussen
S Hochreiter
S Makridakis
SN Wood
T Hastie
T Zhang
TJ Hastie
Y Bengio
Publication venue: Springer Science and Business Media LLC
Publication date: 05/10/2017

Understanding player behavior is fundamental in game data science. Video games evolve as players interact with the game, so being able to foresee player experience would help to ensure successful game development. In particular, game developers need to evaluate the impact of in-game events beforehand. Simulation optimization of these events is crucial to increase player engagement and maximize monetization. We present an experimental analysis of several methods to forecast game-related variables, with two main aims: to obtain accurate predictions of in-app purchases and playtime in an operational production environment, and to perform simulations of in-game events in order to maximize sales and playtime. Our ultimate purpose is to take a step towards the data-driven development of games. The results suggest that, even though traditional approaches such as ARIMA still perform better, the outcomes of state-of-the-art techniques like deep learning are promising. Deep learning emerges as a well-suited general model that could be used to forecast a variety of time series with different dynamic behaviors.

arXiv.org e-Print Archive

Crossref
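
ARIMA is the traditional baseline mentioned above. A full ARIMA fit is beyond a short example, so the sketch below uses its simplest special case as a stand-in: an AR(1) model, i.e. ARIMA(1,0,0), written as y_t = c + phi * y_{t-1} + e_t, fitted by conditional least squares and iterated forward for multi-step forecasts. The functions `fit_ar1` and `forecast_ar1` are hypothetical helpers, not the authors' pipeline; a real analysis would normally rely on a dedicated time-series library with differencing and order selection.

```python
def fit_ar1(series):
    """Fit y_t = c + phi * y_{t-1} + e_t by conditional least squares.

    Illustrative AR(1) (= ARIMA(1,0,0)) stand-in; not a full ARIMA fit.
    """
    if len(series) < 3:
        raise ValueError("need at least three observations")
    x, y = series[:-1], series[1:]  # lagged values and their successors
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    if sxx == 0:
        raise ValueError("series is constant; phi is not identifiable")
    phi = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    c = my - phi * mx
    return c, phi


def forecast_ar1(series, steps):
    """Iterate the fitted recursion forward to produce multi-step forecasts."""
    c, phi = fit_ar1(series)
    last, out = series[-1], []
    for _ in range(steps):
        last = c + phi * last  # each forecast feeds the next step
        out.append(last)
    return out
```

For example, `forecast_ar1(daily_playtime, 7)` would return a week of point forecasts for a hypothetical list `daily_playtime`; the model captures only first-order autocorrelation, which is one reason richer models are compared in the study.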

Building more accurate decision trees with the additive tree.

Author: Diffenderfer Eric S
Eaton Eric
Friedman Jerome H
Gennatas Efstathios D
Jensen Shane T
Luna José Marcio
Simone Charles B
Solberg Timothy D
Ungar Lyle H
Valdes Gilmer
Publication venue: eScholarship, University of California
Publication date: 01/10/2019

The expansion of machine learning to high-stakes application domains such as medicine, finance, and criminal justice, where making informed decisions requires a clear understanding of the model, has increased the interest in interpretable machine learning. The widely used Classification and Regression Trees (CART) have played a major role in health sciences, due to their simple and intuitive explanation of predictions. Ensemble methods like gradient boosting can improve the accuracy of decision trees, but at the expense of the interpretability of the generated model. Additive models, such as those produced by gradient boosting, and full interaction models, such as CART, have been investigated largely in isolation. We show that these models exist along a spectrum, revealing previously unseen connections between these approaches. This paper introduces a rigorous formalization of the additive tree, an empirically validated learning technique for creating a single decision tree, and shows that this method can produce models equivalent to CART or gradient boosted stumps at the extremes by varying a single parameter. Although the additive tree is designed primarily to provide both the model interpretability and predictive performance needed for high-stakes applications like medicine, it can also produce decision trees represented by hybrid models between CART and boosted stumps that can outperform either of these approaches.

eScholarship - University of California
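
The additive tree itself is not reproduced here. To illustrate one extreme of the spectrum described above, the following sketch implements plain gradient boosting of decision stumps (single-split trees) under squared-error loss: each stump is fitted to the current residuals and added with a shrinkage factor `nu`. The names `fit_stump`, `boost_stumps` and `predict`, and their defaults, are hypothetical choices for this sketch.

```python
def fit_stump(X, r):
    """Best single split (feature, threshold, left value, right value)
    minimising the squared error of the residuals r; None if no split exists."""
    best, best_sse = None, float("inf")
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # threshold halfway between adjacent values
            left = [ri for row, ri in zip(X, r) if row[j] <= t]
            right = [ri for row, ri in zip(X, r) if row[j] > t]
            lv, rv = sum(left) / len(left), sum(right) / len(right)
            sse = sum((v - lv) ** 2 for v in left) + sum((v - rv) ** 2 for v in right)
            if sse < best_sse:
                best, best_sse = (j, t, lv, rv), sse
    return best


def boost_stumps(X, y, n_iter=100, nu=0.1):
    """Toy gradient boosting of stumps under squared-error loss (hypothetical helper)."""
    offset = sum(y) / len(y)
    fit = [offset] * len(y)
    stumps = []
    for _ in range(n_iter):
        # For squared error the negative gradient is the residual vector.
        r = [yi - fi for yi, fi in zip(y, fit)]
        stump = fit_stump(X, r)
        if stump is None:
            break  # every covariate is constant
        j, t, lv, rv = stump
        stumps.append(stump)
        fit = [fi + nu * (lv if row[j] <= t else rv) for fi, row in zip(fit, X)]
    return offset, nu, stumps


def predict(model, x):
    """Sum the shrunken stump outputs on top of the offset."""
    offset, nu, stumps = model
    return offset + nu * sum(lv if x[j] <= t else rv for j, t, lv, rv in stumps)
```

Because every stump splits on a single variable, the fitted model stays additive in the covariates, which is what separates boosted stumps from the full-interaction CART extreme.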

Variation in Spatial Predictions Among Species Distribution Modeling Methods

Author: Alexandra D. Syphard
Janet Franklin

Prediction maps produced by species distribution models (SDMs) influence decision-making in resource management and the designation of land in conservation planning. Many studies have compared the prediction accuracy of different SDM modeling methods, but few have quantified the similarity among prediction maps. There has also been little systematic exploration of how the relative importance of different predictor variables varies among model types. Our objective was to expand the evaluation of SDM performance for 45 plant species in southern California to better understand how map predictions vary among model types, and to explain what factors may affect spatial correspondence, including the selection and relative importance of different environmental variables. Four types of models were tested: generalized linear models (GLMs), generalized additive models (GAMs), classification trees, and Random Forests (RFs). Correlation among maps was highest between GLMs and GAMs and lowest between classification trees and GAMs or GLMs. Correlation between RFs and GAMs was the same as between RFs and classification trees. Spatial correspondence among maps was influenced the most by model prediction accuracy (AUC) and species prevalence; map correspondence was highest when accuracy was high and prevalence was intermediate. Species functional type and the selection of climate variables also influenced map correspondence. For most (but not all) species, climate variables were more important than terrain or soil in predicting their distributions. Environmental variable selection varied according to modeling method, but the largest differences were between RFs and GLMs or GAMs. Although prediction accuracy was equal for GLMs, GAMs, and RFs, the differences in spatial predictions suggest that it may be important to evaluate the results of more than one model to estimate a range of spatial uncertainty before making planning decisions based on map outputs. This may be particularly important if models have low accuracy or if species prevalence is not intermediate.

Research Papers in Economics
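
Correspondence between two prediction maps can be quantified in several ways. The sketch below assumes a simple Pearson correlation between two maps stored as equally sized grids of predicted values, with no-data cells already removed; the abstract does not say exactly which statistic was used, and `map_correlation` is a hypothetical helper.

```python
import math


def map_correlation(map_a, map_b):
    """Pearson correlation between two prediction maps (assumed statistic).

    Each map is a 2-D grid (list of rows) of predicted values on the same
    cells; no-data cells are assumed to have been removed beforehand.
    """
    a = [v for row in map_a for v in row]  # flatten grids cell by cell
    b = [v for row in map_b for v in row]
    if len(a) != len(b):
        raise ValueError("maps must have the same number of cells")
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        raise ValueError("a constant map has undefined correlation")
    return cov / math.sqrt(va * vb)
```

Computing this value for every pair of model types (for example a GLM map against a GAM map for the same species) gives a matrix of map correspondences of the kind summarized in the abstract.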