686 research outputs found

    Boosting the concordance index for survival data - a unified framework to derive and evaluate biomarker combinations

    The development of molecular signatures for the prediction of time-to-event outcomes is a methodologically challenging task in bioinformatics and biostatistics. Although there are numerous approaches for the derivation of marker combinations and their evaluation, the underlying methodology often suffers from the problem that different optimization criteria are mixed during the feature selection, estimation and evaluation steps. This might result in marker combinations that are suboptimal with respect to the evaluation criterion of interest. To address this issue, we propose a unified framework to derive and evaluate biomarker combinations. Our approach is based on the concordance index for time-to-event data, which is a non-parametric measure to quantify the discriminatory power of a prediction rule. Specifically, we propose a component-wise boosting algorithm that results in linear biomarker combinations that are optimal with respect to a smoothed version of the concordance index. We investigate the performance of our algorithm in a large-scale simulation study and in two molecular data sets for the prediction of survival in breast cancer patients. Our numerical results show that the new approach is not only methodologically sound but can also lead to a higher discriminatory power than traditional approaches for the derivation of gene signatures. Comment: revised manuscript - added simulation study, additional result
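To make the idea of a "smoothed version of the concordance index" concrete, here is a minimal Python sketch. It is not the authors' estimator: it assumes a plain, unweighted pairwise concordance index (no censoring weights), assumes that a higher marker value means higher risk, and replaces the indicator function with a sigmoid whose bandwidth `sigma` is a hypothetical tuning parameter. All function names are illustrative.

```python
import math


def _sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def comparable_pairs(times, events):
    """Return pairs (i, j) where subject i had an observed event before time j."""
    pairs = []
    for i in range(len(times)):
        if not events[i]:
            continue  # a censored subject cannot be the earlier member of a pair
        for j in range(len(times)):
            if times[i] < times[j]:
                pairs.append((i, j))
    return pairs


def concordance_index(times, events, risk):
    """Share of comparable pairs in which the earlier event has the higher risk."""
    pairs = comparable_pairs(times, events)
    score = 0.0
    for i, j in pairs:
        if risk[i] > risk[j]:
            score += 1.0
        elif risk[i] == risk[j]:
            score += 0.5  # ties in the marker count as half concordant
    return score / len(pairs)


def smoothed_concordance_index(times, events, risk, sigma=0.1):
    """Same as above, with the indicator replaced by a sigmoid (differentiable)."""
    pairs = comparable_pairs(times, events)
    total = sum(_sigmoid((risk[i] - risk[j]) / sigma) for i, j in pairs)
    return total / len(pairs)


if __name__ == "__main__":
    times = [2, 4, 6, 8]
    events = [1, 1, 0, 1]          # 1 = event observed, 0 = censored
    risk = [0.9, 0.7, 0.8, 0.1]    # hypothetical marker values
    print(concordance_index(times, events, risk))
    print(smoothed_concordance_index(times, events, risk, sigma=0.1))
```

Because the sigmoid is differentiable in the marker values, the smoothed criterion can serve as an objective for gradient-based procedures such as boosting, whereas the indicator-based index cannot. As `sigma` shrinks, the smoothed value approaches the unsmoothed one.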

    Gradient boosting in Markov-switching generalized additive models for location, scale and shape

    We propose a novel class of flexible latent-state time series regression models which we call Markov-switching generalized additive models for location, scale and shape. In contrast to conventional Markov-switching regression models, the presented methodology allows us to model different state-dependent parameters of the response distribution - not only the mean, but also variance, skewness and kurtosis parameters - as potentially smooth functions of a given set of explanatory variables. In addition, the set of possible distributions that can be specified for the response is not limited to the exponential family but also includes, for instance, a variety of Box-Cox-transformed, zero-inflated and mixture distributions. We propose an estimation approach based on the EM algorithm, where we use the gradient boosting framework to prevent overfitting while simultaneously performing variable selection. The feasibility of the suggested approach is assessed in simulation experiments and illustrated in a real-data setting, where we model the conditional distribution of the daily average price of energy in Spain over time.

    Model-based Boosting in R: A Hands-on Tutorial Using the R Package mboost

    We provide a detailed hands-on tutorial for the R add-on package mboost. The package implements boosting for optimizing general risk functions utilizing component-wise (penalized) least squares estimates as base-learners for fitting various kinds of generalized linear and generalized additive models to potentially high-dimensional data. We give a theoretical background and demonstrate how mboost can be used to fit interpretable models of different complexity. As an example, we use mboost throughout the tutorial to predict body fat based on anthropometric measurements.
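The component-wise least squares idea mentioned in this abstract can be illustrated with a short pure-Python sketch. This is not mboost's code and does not use its API: it assumes squared-error loss, simple linear base-learners (one per centered covariate), and hypothetical defaults for the number of iterations `mstop` and the step length `nu`. In each iteration every covariate is fitted to the current residuals on its own, and only the best-fitting one is updated.

```python
def componentwise_l2_boost(X, y, mstop=100, nu=0.1):
    """Component-wise L2 boosting with linear base-learners.

    X is a list of rows, y a list of responses.
    Returns (intercept, coefficients) on the original covariate scale.
    """
    n, p = len(X), len(X[0])

    # Center each column so that a slope-only base-learner is sufficient.
    means = [sum(row[j] for row in X) / n for j in range(p)]
    cols = [[row[j] - means[j] for row in X] for j in range(p)]
    sumsq = [sum(v * v for v in col) for col in cols]

    offset = sum(y) / n            # start from the mean of the response
    fit = [offset] * n
    coef = [0.0] * p

    for _ in range(mstop):
        # Negative gradient of the squared-error loss = residuals.
        resid = [yi - fi for yi, fi in zip(y, fit)]

        best_j, best_beta, best_rss = None, 0.0, float("inf")
        for j in range(p):
            if sumsq[j] == 0.0:
                continue           # constant column, nothing to fit
            beta = sum(x * r for x, r in zip(cols[j], resid)) / sumsq[j]
            rss = sum((r - beta * x) ** 2 for x, r in zip(cols[j], resid))
            if rss < best_rss:
                best_j, best_beta, best_rss = j, beta, rss

        if best_j is None:
            break

        # Update only the selected component, shrunken by the step length.
        coef[best_j] += nu * best_beta
        fit = [f + nu * best_beta * x for f, x in zip(fit, cols[best_j])]

    intercept = offset - sum(c * m for c, m in zip(coef, means))
    return intercept, coef


if __name__ == "__main__":
    # Toy data: the response depends on the first covariate only.
    X = [[0, 1, 5], [1, 0, 3], [2, 1, 4], [3, 0, 1], [4, 1, 2], [5, 0, 6]]
    y = [1 + 2 * row[0] for row in X]
    print(componentwise_l2_boost(X, y))
```

Covariates that are never selected keep a coefficient of exactly zero, which is how stopping the algorithm early yields variable selection and keeps the fitted model interpretable.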