Boosting the concordance index for survival data - a unified framework to derive and evaluate biomarker combinations
The development of molecular signatures for the prediction of time-to-event
outcomes is a methodologically challenging task in bioinformatics and
biostatistics. Although there are numerous approaches for the derivation of
marker combinations and their evaluation, the underlying methodology often
suffers from the problem that different optimization criteria are mixed during
the feature selection, estimation and evaluation steps. This might result in
marker combinations that are only suboptimal regarding the evaluation criterion
of interest. To address this issue, we propose a unified framework to derive
and evaluate biomarker combinations. Our approach is based on the concordance
index for time-to-event data, which is a non-parametric measure to quantify the
discriminatory power of a prediction rule. Specifically, we propose a
component-wise boosting algorithm that results in linear biomarker combinations
that are optimal with respect to a smoothed version of the concordance index.
We investigate the performance of our algorithm in a large-scale simulation
study and in two molecular data sets for the prediction of survival in breast
cancer patients. Our numerical results show that the new approach is not only
methodologically sound but can also lead to a higher discriminatory power than
traditional approaches for the derivation of gene signatures.
Comment: revised manuscript - added simulation study, additional result
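The smoothed concordance index and the component-wise boosting step can be illustrated with a minimal Python sketch. It assumes linear predictors eta = X beta, treats a pair (i, j) as comparable when i had an event before j's observed time, and replaces the indicator I(eta_i > eta_j) with a sigmoid of bandwidth `s`. The helper names (`pair_mask`, `harrell_c`, `smooth_c_and_grad`, `cindex_boost`), the values of `s`, `nu` and `mstop`, and the simulated data are all hypothetical. The sketch uses equal weights for comparable pairs and does not reproduce any censoring weights or tuning used in the paper.

```python
import numpy as np

def pair_mask(time, event):
    # comp[i, j] is True when (i, j) is comparable: i had an event before j's observed time
    return (event[:, None] == 1) & (time[:, None] < time[None, :])

def harrell_c(eta, comp):
    # Unsmoothed C: share of comparable pairs in which the earlier failure has the higher risk score
    d = eta[:, None] - eta[None, :]
    concordant = (d > 0) + 0.5 * (d == 0)
    return (comp * concordant).sum() / comp.sum()

def smooth_c_and_grad(eta, comp, s=0.1):
    # Replace the indicator I(eta_i > eta_j) by a logistic sigmoid with bandwidth s
    d = eta[:, None] - eta[None, :]
    sig = 0.5 * (1.0 + np.tanh(d / (2.0 * s)))   # numerically stable logistic function
    w = comp / comp.sum()                          # equal weight per comparable pair (assumption)
    dsig = w * sig * (1.0 - sig) / s               # weighted derivative with respect to d_ij
    grad = dsig.sum(axis=1) - dsig.sum(axis=0)     # d/d eta_k: pairs where k comes first minus second
    return (w * sig).sum(), grad

def cindex_boost(X, time, event, mstop=100, nu=0.1, s=0.1):
    Xs = (X - X.mean(0)) / X.std(0)                # standardize covariates
    comp = pair_mask(time, event)
    ss = (Xs ** 2).sum(0)
    beta = np.zeros(X.shape[1])
    for _ in range(mstop):
        # Gradient of the smoothed C; it sums to zero, so no intercept base-learner is needed
        _, u = smooth_c_and_grad(Xs @ beta, comp, s)
        coefs = Xs.T @ u / ss                      # least-squares fit of u on each single covariate
        j = np.argmax(coefs ** 2 * ss)             # base-learner with the largest RSS reduction
        beta[j] += nu * coefs[j]                   # update only the selected component
    return beta, Xs, comp

# Hypothetical simulated data: only the first covariate affects the hazard
rng = np.random.default_rng(0)
n, p = 200, 10
X = rng.normal(size=(n, p))
t_event = rng.exponential(scale=np.exp(-1.5 * X[:, 0]))
t_cens = rng.exponential(scale=2.0, size=n)
time = np.minimum(t_event, t_cens)
event = (t_event <= t_cens).astype(int)

beta, Xs, comp = cindex_boost(X, time, event)
print(np.round(beta, 3))
print(harrell_c(Xs @ beta, comp))
```

Because the concordance index depends only on the ordering of eta, the overall scale of `beta` is not identified by the data; in this sketch it is driven by `s`, `nu` and `mstop`.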
Gradient boosting in Markov-switching generalized additive models for location, scale and shape
We propose a novel class of flexible latent-state time series regression
models which we call Markov-switching generalized additive models for location,
scale and shape. In contrast to conventional Markov-switching regression
models, the presented methodology allows us to model different state-dependent
parameters of the response distribution - not only the mean, but also variance,
skewness and kurtosis parameters - as potentially smooth functions of a given
set of explanatory variables. In addition, the set of possible distributions
that can be specified for the response is not limited to the exponential family
but additionally includes, for instance, a variety of Box-Cox-transformed,
zero-inflated and mixture distributions. We propose an estimation approach
based on the EM algorithm, where we use the gradient boosting framework to
prevent overfitting while simultaneously performing variable selection. The
feasibility of the suggested approach is assessed in simulation experiments and
illustrated in a real-data setting, where we model the conditional distribution
of the daily average price of energy in Spain over time.
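The EM scheme with boosting in the M-step can be sketched for a deliberately simplified case: two hidden states, a Gaussian response, state-dependent means that are linear in the covariates and fitted by weighted component-wise boosting, and state-dependent standard deviations updated in closed form. This is an assumption-laden illustration, not the authors' implementation: it boosts only the mean (no scale, skewness or kurtosis parameters), uses no smooth base-learners, and the fixed `mstop`, `nu`, number of EM iterations, initial values, helper names and simulated data are hypothetical.

```python
import numpy as np

def norm_pdf(y, mu, sd):
    return np.exp(-0.5 * ((y - mu) / sd) ** 2) / (sd * np.sqrt(2 * np.pi))

def forward_backward(dens, Gamma, delta):
    # Scaled forward-backward recursions; dens[t, k] is the density of y_t in state k
    T, N = dens.shape
    alpha, beta, c = np.zeros((T, N)), np.ones((T, N)), np.zeros(T)
    a = delta * dens[0]
    c[0] = a.sum(); alpha[0] = a / c[0]
    for t in range(1, T):
        a = (alpha[t - 1] @ Gamma) * dens[t]
        c[t] = a.sum(); alpha[t] = a / c[t]
    for t in range(T - 2, -1, -1):
        beta[t] = Gamma @ (dens[t + 1] * beta[t + 1]) / c[t + 1]
    u = alpha * beta                               # posterior state probabilities
    u /= u.sum(axis=1, keepdims=True)
    v = np.zeros((N, N))                           # expected transition counts
    for t in range(1, T):
        v += alpha[t - 1][:, None] * Gamma * (dens[t] * beta[t])[None, :] / c[t]
    return u, v, np.log(c).sum()

def weighted_cw_boost(Z, y, w, mstop=200, nu=0.1):
    # Component-wise L2 boosting with observation weights; each column of Z is one base-learner
    coef = np.zeros(Z.shape[1])
    zz = (w[:, None] * Z ** 2).sum(0)
    for _ in range(mstop):
        r = y - Z @ coef                           # negative gradient of the weighted L2 loss
        b = (w[:, None] * Z * r[:, None]).sum(0) / zz
        j = np.argmax(b ** 2 * zz)                 # largest reduction in weighted RSS
        coef[j] += nu * b[j]
    return coef

def ms_boost_em(y, X, n_iter=30, mstop=200, nu=0.1):
    T, p = X.shape
    Z = np.column_stack([np.ones(T), X])           # intercept treated as its own base-learner
    B = np.zeros((2, p + 1))
    B[:, 0] = np.quantile(y, [0.25, 0.75])         # low-mean state 0, high-mean state 1
    sd = np.full(2, y.std())
    Gamma = np.array([[0.9, 0.1], [0.1, 0.9]])
    delta = np.full(2, 0.5)
    for _ in range(n_iter):
        dens = norm_pdf(y[:, None], Z @ B.T, sd[None, :])
        u, v, ll = forward_backward(dens, Gamma, delta)        # E-step
        for k in range(2):                                     # M-step, one state at a time
            B[k] = weighted_cw_boost(Z, y, u[:, k], mstop, nu)
            res = y - Z @ B[k]
            sd[k] = np.sqrt((u[:, k] * res ** 2).sum() / u[:, k].sum())
        Gamma = v / v.sum(axis=1, keepdims=True)
        delta = u[0]
    return B, sd, Gamma, u, ll

# Hypothetical simulated series: each state uses a different covariate
rng = np.random.default_rng(1)
T, p = 500, 5
X = rng.normal(size=(T, p))
G_true = np.array([[0.95, 0.05], [0.05, 0.95]])
states = np.zeros(T, dtype=int)
for t in range(1, T):
    states[t] = rng.choice(2, p=G_true[states[t - 1]])
y = np.where(states == 0, -2 + X[:, 0], 2 - X[:, 1]) + 0.5 * rng.normal(size=T)

B, sd, Gamma, u, ll = ms_boost_em(y, X)
print(np.round(B, 2))
print(np.round(Gamma, 3))
```

Stopping each M-step after a fixed number of boosting iterations keeps coefficients of irrelevant covariates at or near zero, which is the variable-selection effect the abstract refers to; in practice the stopping iteration would need tuning.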
Model-based Boosting in R: A Hands-on Tutorial Using the R Package mboost
We provide a detailed hands-on tutorial for the R add-on package mboost. The package implements boosting for optimizing general risk functions, utilizing component-wise (penalized) least squares estimates as base-learners for fitting various kinds of generalized linear and generalized additive models to potentially high-dimensional data. We give a theoretical background and demonstrate how mboost can be used to fit interpretable models of different complexity. As an example, we use mboost throughout the tutorial to predict body fat based on anthropometric measurements.
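The core of component-wise least-squares boosting with linear base-learners fits in a few lines of Python. This is not mboost's code (the package is written for R, where functions such as `glmboost` and `cvrisk` fill these roles); it is a sketch assuming squared-error loss, centered covariates, an offset equal to the response mean, and a single hold-out split for choosing the stopping iteration `mstop`, whereas mboost offers resampling-based tuning. The helper names and the simulated data, which stand in for the body fat example, are hypothetical.

```python
import numpy as np

def l2_boost_path(X, y, mstop=300, nu=0.1):
    # Component-wise L2 boosting with one linear base-learner per covariate
    xm = X.mean(0)
    Xc = X - xm                                    # centering, so base-learners need no intercept
    offset = y.mean()                              # start from the mean of the response
    ss = (Xc ** 2).sum(0)
    beta = np.zeros(X.shape[1])
    path = np.zeros((mstop + 1, X.shape[1]))       # coefficients after each iteration
    for m in range(1, mstop + 1):
        u = y - offset - Xc @ beta                 # negative gradient of 0.5 * squared error
        b = Xc.T @ u / ss                          # least-squares fit of u on each covariate
        j = np.argmax(b ** 2 * ss)                 # base-learner with the largest RSS reduction
        beta[j] += nu * b[j]                       # small step of length nu
        path[m] = beta
    return offset, xm, path

def predict(offset, xm, beta, X):
    return offset + (X - xm) @ beta

# Hypothetical data: 20 covariates, only columns 0 and 3 are informative
rng = np.random.default_rng(2)
n, p = 200, 20
X = rng.normal(size=(n, p))
y = 3 * X[:, 0] - 2 * X[:, 3] + rng.normal(size=n)
train, test = np.arange(150), np.arange(150, 200)

offset, xm, path = l2_boost_path(X[train], y[train])
risk = [np.mean((y[test] - predict(offset, xm, b, X[test])) ** 2) for b in path]
m_best = int(np.argmin(risk))                      # early stopping on hold-out risk
beta = path[m_best]
selected = np.flatnonzero(beta)
print(m_best, selected, np.round(beta[[0, 3]], 2))
```

Stopping early shrinks all coefficients and leaves base-learners that were never selected at exactly zero, which is how this form of boosting combines estimation with variable selection.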