872 research outputs found
A Unified Framework of Constrained Regression
Generalized additive models (GAMs) play an important role in modeling and
understanding complex relationships in modern applied statistics. They allow
for flexible, data-driven estimation of covariate effects. Yet researchers
often have a priori knowledge of certain effects, which might be monotonic or
periodic (cyclic) or should fulfill boundary conditions. We propose a unified
framework to incorporate these constraints for both univariate and bivariate
effect estimates and for varying coefficients. As the framework is based on
component-wise boosting methods, variables can be selected intrinsically, and
effects can be estimated for a wide range of different distributional
assumptions. Bootstrap confidence intervals for the effect estimates are
derived to assess the models. We present three case studies from environmental
sciences to illustrate the proposed seamless modeling framework. All discussed
constrained effect estimates are implemented in the comprehensive R package
mboost for model-based boosting.Comment: This is a preliminary version of the manuscript. The final
publication is available at
http://link.springer.com/article/10.1007/s11222-014-9520-
Nonparametric estimation of an additive quantile regression model
This paper is concerned with estimating the additive components of a nonparametric
additive quantile regression model. We develop an estimator that is asymptotically
normally distributed with a rate of convergence in probability of n^{-r/(2+10)} when the
additive components are r-times continuously differentiable for some r\geq2. This result
holds regardless of the dimension of the covariates and, therefore, the new estimator
has no curse of dimensionality. In addition, the estimator has an oracle property and is
easily extended to a generalized additive quantile regression model with a link function.
The numerical performance and usefulness of the estimator are illustrated by Monte
Carlo experiments and an empirical example
Peaks detection and alignment for mass spectrometry data
The goal of this paper is to review existing methods for protein mass spectrometry data analysis, and to present a new methodology for automatic extraction of significant peaks (biomarkers). For the pre-processing step required for data from MALDI-TOF or SELDI- TOF spectra, we use a purely nonparametric approach that combines stationary invariant wavelet transform for noise removal and penalized spline quantile regression for baseline correction. We further present a multi-scale spectra alignment technique that is based on identification of statistically significant peaks from a set of spectra. This method allows one to find common peaks in a set of spectra that can subsequently be mapped to individual proteins. This may serve as useful biomarkers in medical applications, or as individual features for further multidimensional statistical analysis. MALDI-TOF spectra obtained from serum samples are used throughout the paper to illustrate the methodology
Conditional Transformation Models
The ultimate goal of regression analysis is to obtain information about the
conditional distribution of a response given a set of explanatory variables.
This goal is, however, seldom achieved because most established regression
models only estimate the conditional mean as a function of the explanatory
variables and assume that higher moments are not affected by the regressors.
The underlying reason for such a restriction is the assumption of additivity of
signal and noise. We propose to relax this common assumption in the framework
of transformation models. The novel class of semiparametric regression models
proposed herein allows transformation functions to depend on explanatory
variables. These transformation functions are estimated by regularised
optimisation of scoring rules for probabilistic forecasts, e.g. the continuous
ranked probability score. The corresponding estimated conditional distribution
functions are consistent. Conditional transformation models are potentially
useful for describing possible heteroscedasticity, comparing spatially varying
distributions, identifying extreme events, deriving prediction intervals and
selecting variables beyond mean regression effects. An empirical investigation
based on a heteroscedastic varying coefficient simulation model demonstrates
that semiparametric estimation of conditional distribution functions can be
more beneficial than kernel-based non-parametric approaches or parametric
generalised additive models for location, scale and shape
Convex mixture regression for quantitative risk assessment
There is wide interest in studying how the distribution of a continuous response changes with a predictor. We are motivated by environmental applications in which the predictor is the dose of an exposure and the response is a health outcome. A main focus in these studies is inference on dose levels associated with a given increase in risk relative to a baseline. In addressing this goal, popular methods either dichotomize the continuous response or focus on modeling changes with the dose in the expectation of the outcome. Such choices may lead to information loss and provide inaccurate inference on dose-response relationships. We instead propose a Bayesian convex mixture regression model that allows the entire distribution of the health outcome to be unknown and changing with the dose. To balance flexibility and parsimony, we rely on a mixture model for the density at the extreme doses, and express the conditional density at each intermediate dose via a convex combination of these extremal densities. This representation generalizes classical dose-response models for quantitative outcomes, and provides a more parsimonious, but still powerful, formulation compared to nonparametric methods, thereby improving interpretability and efficiency in inference on risk functions. A Markov chain Monte Carlo algorithm for posterior inference is developed, and the benefits of our methods are outlined in simulations, along with a study on the impact of dde exposure on gestational age
Nonparametric estimation of an additive quantile regression model
This paper is concerned with estimating the additive components of a nonparametric additive quantile regression model. We develop an estimator that is asymptotically normally distributed with a rate of convergence in probability of n-r/(2r+1) when the additive components are r-times continuously differentiable for some r = 2. This result holds regardless of the dimension of the covariates and, therefore, the new estimator has no curse of dimensionality. In addition, the estimator has an oracle property and is easily extended to a generalized additive quantile regression model with a link function. The numerical performance and usefulness of the estimator are illustrated by Monte Carlo experiments and an empirical example.
- …