
    An update on statistical boosting in biomedicine

    Statistical boosting algorithms have triggered a lot of research during the last decade. They combine a powerful machine-learning approach with classical statistical modelling, offering various practical advantages like automated variable selection and implicit regularization of effect estimates. They are extremely flexible, as the underlying base-learners (regression functions defining the type of effect for the explanatory variables) can be combined with any kind of loss function (target function to be optimized, defining the type of regression setting). In this review article, we highlight the most recent methodological developments in statistical boosting regarding variable selection, functional regression and advanced time-to-event modelling. Additionally, we provide a short overview of relevant applications of statistical boosting in biomedicine.
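To make the interplay of base-learners and loss function concrete, here is a minimal sketch of one common instance: component-wise boosting with linear base-learners and squared-error loss. It is an illustration under stated assumptions, not the method of the review; the function name `componentwise_l2_boost`, the step length `nu`, and the toy data are hypothetical. In each step, only the single best-fitting variable is updated, which is what produces variable selection when the algorithm is stopped early, and the small step length shrinks the effect estimates.

```python
def componentwise_l2_boost(X, y, n_steps=200, nu=0.1):
    """Component-wise L2 boosting with one linear base-learner per column.

    X is a list of rows, y a list of responses. Hypothetical helper written
    for illustration only; returns (intercept, coefficients).
    """
    n, p = len(y), len(X[0])
    # Center each column so the base-learners need no intercept of their own.
    means = [sum(row[j] for row in X) / n for j in range(p)]
    Xc = [[row[j] - means[j] for j in range(p)] for row in X]
    sum_sq = [sum(Xc[i][j] ** 2 for i in range(n)) for j in range(p)]

    offset = sum(y) / n
    coef = [0.0] * p
    # For squared-error loss the negative gradient is the residual vector.
    resid = [yi - offset for yi in y]

    for _ in range(n_steps):
        best_j, best_beta, best_gain = None, 0.0, 0.0
        for j in range(p):
            if sum_sq[j] == 0.0:
                continue  # constant column carries no information
            dot = sum(Xc[i][j] * resid[i] for i in range(n))
            gain = dot * dot / sum_sq[j]  # drop in residual sum of squares
            if gain > best_gain:
                best_j, best_beta, best_gain = j, dot / sum_sq[j], gain
        if best_j is None:
            break  # residuals are already zero
        # Update only the selected component, shrunken by the step length.
        coef[best_j] += nu * best_beta
        resid = [resid[i] - nu * best_beta * Xc[i][best_j] for i in range(n)]

    intercept = offset - sum(coef[j] * means[j] for j in range(p))
    return intercept, coef


# Toy data (made up): y depends on the first column only.
X_toy = [[float(i), float((i * 7) % 5)] for i in range(20)]
y_toy = [3.0 + 2.0 * row[0] for row in X_toy]
intercept, coef = componentwise_l2_boost(X_toy, y_toy)
print(intercept, coef)
```

Swapping the residual computation for the negative gradient of another loss, or the linear fit for a spline or tree base-learner, gives other regression settings within the same loop.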

    Forecasting Player Behavioral Data and Simulating in-Game Events

    Understanding player behavior is fundamental in game data science. Video games evolve as players interact with the game, so being able to foresee player experience would help to ensure successful game development. In particular, game developers need to evaluate beforehand the impact of in-game events. Simulation optimization of these events is crucial to increase player engagement and maximize monetization. We present an experimental analysis of several methods to forecast game-related variables, with two main aims: to obtain accurate predictions of in-app purchases and playtime in an operational production environment, and to perform simulations of in-game events in order to maximize sales and playtime. Our ultimate purpose is to take a step towards the data-driven development of games. The results suggest that, even though the performance of traditional approaches such as ARIMA is still better, the outcomes of state-of-the-art techniques like deep learning are promising. Deep learning emerges as a well-suited general model that could be used to forecast a variety of time series with different dynamic behaviors.

    Variation in Spatial Predictions Among Species Distribution Modeling Methods

    Get PDF
    Prediction maps produced by species distribution models (SDMs) influence decision-making in resource management or designation of land in conservation planning. Many studies have compared the prediction accuracy of different SDM modeling methods, but few have quantified the similarity among prediction maps. There has also been little systematic exploration of how the relative importance of different predictor variables varies among model types. Our objective was to expand the evaluation of SDM performance for 45 plant species in southern California to better understand how map predictions vary among model types, and to explain what factors may affect spatial correspondence, including the selection and relative importance of different environmental variables. Four types of models were tested. Correlation among maps was highest between generalized linear models (GLMs) and generalized additive models (GAMs) and lowest between classification trees and GAMs or GLMs. Correlation between Random Forests (RFs) and GAMs was the same as between RFs and classification trees. Spatial correspondence among maps was influenced the most by model prediction accuracy (AUC) and species prevalence; map correspondence was highest when accuracy was high and prevalence was intermediate. Species functional type and the selection of climate variables also influenced map correspondence. For most (but not all) species, climate variables were more important than terrain or soil in predicting their distributions. Environmental variable selection varied according to modeling method, but the largest differences were between RFs and GLMs or GAMs. Although prediction accuracy was equal for GLMs, GAMs, and RFs, the differences in spatial predictions suggest that it may be important to evaluate the results of more than one model to estimate a range of spatial uncertainty before making planning decisions based on map outputs. This may be particularly important if models have low accuracy or if species prevalence is not intermediate.
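One way to quantify similarity among prediction maps is a cell-by-cell correlation between each pair of maps. The sketch below assumes Pearson correlation over gridded predicted probabilities; the abstract does not state which correlation measure the study used, and the names `pearson`, `map_correlations`, and the toy grids are hypothetical values made up for illustration, not results from the study.

```python
from itertools import combinations
from math import sqrt


def pearson(a, b):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


def map_correlations(maps):
    """Correlate every pair of prediction maps, cell by cell.

    `maps` maps a model label to a 2-D grid (list of rows) of predicted
    values; all grids are assumed to share the same shape and cell order.
    """
    flat = {name: [v for row in grid for v in row] for name, grid in maps.items()}
    return {(a, b): pearson(flat[a], flat[b]) for a, b in combinations(flat, 2)}


# Toy 2x2 grids (made up): model_b agrees with model_a, model_c reverses it.
toy_maps = {
    "model_a": [[0.1, 0.2], [0.3, 0.4]],
    "model_b": [[0.2, 0.4], [0.6, 0.8]],
    "model_c": [[0.4, 0.3], [0.2, 0.1]],
}
for pair, r in map_correlations(toy_maps).items():
    print(pair, round(r, 3))
```

A low pairwise value flags model pairs whose maps disagree spatially even when their accuracy scores are similar, which is the situation the abstract warns about.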