Functional principal components analysis via penalized rank one approximation
Two existing approaches to functional principal components analysis (FPCA)
are due to Rice and Silverman (1991) and Silverman (1996), both based on
maximizing variance but introducing penalization in different ways. In this
article we propose an alternative approach to FPCA using penalized rank-one
approximation to the data matrix. Our contributions are four-fold: (1) by
considering invariance under scale transformation of the measurements, the new
formulation sheds light on how regularization should be performed for FPCA and
suggests an efficient power algorithm for computation; (2) it naturally
incorporates spline smoothing of discretized functional data; (3) the
connection with smoothing splines also facilitates construction of
cross-validation or generalized cross-validation criteria for smoothing
parameter selection that allow efficient computation; (4) different smoothing
parameters are permitted for different FPCs. The methodology is illustrated
with a real data example and a simulation.
Comment: Published at http://dx.doi.org/10.1214/08-EJS218 in the Electronic
Journal of Statistics (http://www.i-journals.org/ejs/) by the Institute of
Mathematical Statistics (http://www.imstat.org).
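As a rough illustration of the power algorithm mentioned above, the Python sketch below alternates a smoothing step for the component v with a normalization step for the scores u. It assumes the criterion ||X - u v^T||_F^2 + alpha (u^T u)(v^T Omega v), with curves stored as rows of X, and, as a simplifying assumption, a second-difference penalty Omega = D^T D in place of a smoothing-spline roughness matrix. The helper names `second_diff_penalty` and `penalized_rank_one` are hypothetical; this is a sketch, not the authors' code.

```python
import numpy as np

def second_diff_penalty(m):
    """Omega = D^T D, with D the (m-2) x m second-difference matrix (assumed penalty)."""
    D = np.diff(np.eye(m), n=2, axis=0)
    return D.T @ D

def penalized_rank_one(X, alpha, tol=1e-10, max_iter=500):
    """Power iterations for min ||X - u v^T||_F^2 + alpha * (u^T u) * (v^T Omega v).

    X has one discretized curve per row. Returns the unit-norm scores u, the
    unnormalized component v and the unit-norm smoothed component.
    """
    n, m = X.shape
    # smoother S(alpha) = (I + alpha * Omega)^{-1}; it does not depend on the scale of X
    S = np.linalg.inv(np.eye(m) + alpha * second_diff_penalty(m))
    # start from the leading singular vectors of X
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    u = U[:, 0]
    v = s[0] * Vt[0]
    for _ in range(max_iter):
        v_new = S @ (X.T @ u)           # v-step: smooth X^T u
        Xv = X @ v_new
        u = Xv / np.linalg.norm(Xv)     # u-step: normalized scores
        converged = np.linalg.norm(v_new - v) < tol * max(1.0, np.linalg.norm(v))
        v = v_new
        if converged:
            break
    return u, v, v / np.linalg.norm(v)
```

With alpha = 0 the smoother is the identity and the iteration reduces to the ordinary power method, so the component coincides with the leading right singular vector; larger alpha trades fidelity for smoothness of the estimated component.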
Conditional Transformation Models
The ultimate goal of regression analysis is to obtain information about the
conditional distribution of a response given a set of explanatory variables.
This goal is, however, seldom achieved because most established regression
models only estimate the conditional mean as a function of the explanatory
variables and assume that higher moments are not affected by the regressors.
The underlying reason for such a restriction is the assumption of additivity of
signal and noise. We propose to relax this common assumption in the framework
of transformation models. The novel class of semiparametric regression models
proposed herein allows transformation functions to depend on explanatory
variables. These transformation functions are estimated by regularised
optimisation of scoring rules for probabilistic forecasts, e.g. the continuous
ranked probability score. The corresponding estimated conditional distribution
functions are consistent. Conditional transformation models are potentially
useful for describing possible heteroscedasticity, comparing spatially varying
distributions, identifying extreme events, deriving prediction intervals and
selecting variables beyond mean regression effects. An empirical investigation
based on a heteroscedastic varying coefficient simulation model demonstrates
that semiparametric estimation of conditional distribution functions can be
more beneficial than kernel-based non-parametric approaches or parametric
generalised additive models for location, scale and shape.
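The continuous ranked probability score mentioned above compares a predictive distribution function F with an observation y through CRPS(F, y) = integral of (F(t) - 1{y <= t})^2 dt, so lower values are better. The Python sketch below evaluates this integral numerically and checks it against the known closed form for a normal predictive distribution. It only illustrates the scoring rule; it does not estimate a conditional transformation model, and the function names and integration grid are assumptions for illustration.

```python
import math
import numpy as np

def normal_cdf(z):
    """Standard normal CDF, vectorised over NumPy arrays via math.erf."""
    return 0.5 * (1.0 + np.vectorize(math.erf)(np.asarray(z, dtype=float) / math.sqrt(2.0)))

def crps_numeric(cdf, y, lo, hi, n=20001):
    """CRPS(F, y) by the trapezoid rule on [lo, hi]; cdf maps an array of t to F(t)."""
    t = np.linspace(lo, hi, n)
    g = (cdf(t) - (t >= y).astype(float)) ** 2   # squared distance to the step at y
    return float(np.sum((g[1:] + g[:-1]) * np.diff(t)) / 2.0)

def crps_normal(mu, sigma, y):
    """Closed-form CRPS for a N(mu, sigma^2) predictive distribution."""
    z = (y - mu) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return sigma * (z * (2.0 * cdf - 1.0) + 2.0 * pdf - 1.0 / math.sqrt(math.pi))
```

Averaging such scores over observations, with F replaced by the model's conditional distribution function, gives the kind of objective that the abstract says is optimised with regularisation.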
Nonlinear association structures in flexible Bayesian additive joint models
Joint models of longitudinal and survival data have become an important tool
for modeling associations between longitudinal biomarkers and event processes.
The association between marker and log-hazard is assumed to be linear in
existing shared random effects models, with this assumption usually remaining
unchecked. We present an extended framework of flexible additive joint models
that allows the estimation of nonlinear, covariate-specific associations by
making use of Bayesian P-splines. Our joint models are estimated in a Bayesian
framework using structured additive predictors for all model components,
allowing for great flexibility in the specification of smooth nonlinear,
time-varying and random effects terms for longitudinal submodel, survival
submodel and their association. The ability to capture truly linear and
nonlinear associations is assessed in simulations and illustrated on the widely
studied biomedical data on the rare fatal liver disease primary biliary
cirrhosis. All methods are implemented in the R package bamlss to facilitate
the application of this flexible joint model in practice.
Comment: Changes to initial commit: minor language editing, additional
information in Section 4, formatting in Supplementary Information
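To make the P-spline building block concrete, here is a frequentist Python sketch: a cubic B-spline basis on equally spaced knots combined with a second-order difference penalty on the coefficients, solved by penalized least squares. In a Bayesian formulation the difference penalty corresponds to a Gaussian random-walk prior on the coefficients and the smoothing variance is treated as unknown; the fixed `lam`, the knot layout and the function names here are assumptions for illustration, not the bamlss implementation.

```python
import numpy as np

def bspline_basis(x, xl, xr, nseg, deg=3):
    """B-spline basis of degree `deg` on `nseg` equal segments of [xl, xr],
    built with the Cox-de Boor recursion on an extended, equally spaced knot grid."""
    x = np.asarray(x, dtype=float)
    dx = (xr - xl) / nseg
    knots = xl + dx * np.arange(-deg, nseg + deg + 1)
    # degree 0: indicator of each half-open knot interval [t_j, t_{j+1})
    B = ((x[:, None] >= knots[None, :-1]) & (x[:, None] < knots[None, 1:])).astype(float)
    for k in range(1, deg + 1):
        # with equal spacing every denominator t_{i+k} - t_i equals k * dx
        left = (x[:, None] - knots[None, :-(k + 1)]) / (k * dx) * B[:, :-1]
        right = (knots[None, k + 1:] - x[:, None]) / (k * dx) * B[:, 1:]
        B = left + right
    return B  # shape (len(x), nseg + deg)

def pspline_fit(x, y, nseg=20, lam=1.0, order=2):
    """Penalized least squares: minimise ||y - B a||^2 + lam * ||D a||^2."""
    B = bspline_basis(x, np.min(x), np.max(x), nseg)
    D = np.diff(np.eye(B.shape[1]), n=order, axis=0)  # order-th difference matrix
    a = np.linalg.solve(B.T @ B + lam * D.T @ D, B.T @ np.asarray(y, dtype=float))
    return B @ a, a
```

A second-order penalty leaves straight lines unpenalized, so a very large `lam` shrinks the fit toward a linear function. This is one way a flexible specification can still recover an association that is truly linear.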
Boosted Beta regression
Regression analysis with a bounded outcome is a common problem in applied statistics. Typical examples include regression models for percentage outcomes and the analysis of ratings that are measured on a bounded scale. In this paper, we consider beta regression, which is a generalization of logit models to situations where the response is continuous on the interval (0,1). Consequently, beta regression is a convenient tool for analyzing percentage responses. The classical approach to fitting a beta regression model is maximum likelihood estimation with subsequent AIC-based variable selection. As an alternative to this established, yet unstable, approach, we propose a new estimation technique called boosted beta regression. With boosted beta regression, estimation and variable selection can be carried out simultaneously in a highly efficient way. Additionally, both the mean and the variance of a percentage response can be modeled using flexible nonlinear covariate effects. As a consequence, the new method accounts for common problems such as overdispersion and non-binomial variance structures.
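A minimal Python sketch of the idea, assuming the mean-precision parametrization of the beta distribution with a logit link for the mean: each iteration fits every covariate separately (with an intercept) to the negative gradient of the beta log-likelihood by least squares, and only the best-fitting covariate is updated, which is how variable selection becomes part of the fit. For brevity the precision `phi` is held fixed, whereas the method described above can also model it. The function names, step length `nu` and iteration count are illustrative assumptions.

```python
import numpy as np
from scipy.special import digamma, expit

def beta_negative_gradient(y, eta, phi):
    """Derivative of the beta log-likelihood w.r.t. eta, with mu = expit(eta)."""
    mu = expit(eta)
    y_star = np.log(y) - np.log1p(-y)                      # logit(y)
    mu_star = digamma(mu * phi) - digamma((1 - mu) * phi)  # E[logit(Y)]
    return phi * (y_star - mu_star) * mu * (1 - mu)

def boost_beta_mean(X, y, phi=10.0, n_iter=300, nu=0.1):
    """Component-wise gradient boosting of the logit-mean with fixed precision phi."""
    xbar = X.mean(axis=0)
    Xc = X - xbar                                  # centred covariates
    intercept = np.log(y.mean() / (1 - y.mean()))  # offset: logit of the mean response
    beta = np.zeros(X.shape[1])
    ss = np.sum(Xc ** 2, axis=0)
    for _ in range(n_iter):
        u = beta_negative_gradient(y, intercept + Xc @ beta, phi)
        b = Xc.T @ u / ss                          # least-squares slope per covariate
        rss = np.sum((u[:, None] - u.mean() - Xc * b) ** 2, axis=0)
        j = int(np.argmin(rss))                    # best-fitting base learner
        intercept += nu * u.mean()
        beta[j] += nu * b[j]
    # convert the intercept back to the scale of the uncentred covariates
    return intercept - xbar @ beta, beta
```

Stopping the loop early leaves covariates that were never selected at exactly zero; in practice the number of iterations would be tuned, for example by cross-validation.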
A Unified Framework of Constrained Regression
Generalized additive models (GAMs) play an important role in modeling and
understanding complex relationships in modern applied statistics. They allow
for flexible, data-driven estimation of covariate effects. Yet researchers
often have a priori knowledge of certain effects, which might be monotonic or
periodic (cyclic) or should fulfill boundary conditions. We propose a unified
framework to incorporate these constraints for both univariate and bivariate
effect estimates and for varying coefficients. As the framework is based on
component-wise boosting methods, variables can be selected intrinsically, and
effects can be estimated for a wide range of different distributional
assumptions. Bootstrap confidence intervals for the effect estimates are
derived to assess the models. We present three case studies from environmental
sciences to illustrate the proposed seamless modeling framework. All discussed
constrained effect estimates are implemented in the comprehensive R package
mboost for model-based boosting.
Comment: This is a preliminary version of the manuscript. The final
publication is available at
http://link.springer.com/article/10.1007/s11222-014-9520-
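The framework above builds constraints such as monotonicity into boosting base-learners. As a much simpler stand-in that only shows what a monotonicity constraint does to a fit, the Python sketch below uses the pool-adjacent-violators algorithm (isotonic least-squares regression). This is a different technique from the paper's, and the function name `pava` is hypothetical.

```python
import numpy as np

def pava(y, w=None):
    """Weighted least-squares nondecreasing fit to y via pool-adjacent-violators."""
    y = np.asarray(y, dtype=float)
    w = np.ones_like(y) if w is None else np.asarray(w, dtype=float)
    blocks = []  # each block: [weighted mean, total weight, number of points]
    for yi, wi in zip(y, w):
        blocks.append([yi, wi, 1])
        # pool the last two blocks while they violate monotonicity
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, c2 = blocks.pop()
            m1, w1, c1 = blocks.pop()
            wt = w1 + w2
            blocks.append([(m1 * w1 + m2 * w2) / wt, wt, c1 + c2])
    return np.concatenate([np.full(c, m) for m, _, c in blocks])
```

For example, `pava([1, 3, 2, 4])` pools the violating pair 3, 2 into their mean and returns 1, 2.5, 2.5, 4.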
Representation of Functional Data in Neural Networks
Functional Data Analysis (FDA) is an extension of traditional data analysis
to functional data, for example spectra, temporal series, spatio-temporal
images, gesture recognition data, etc. In practice, functional data are rarely
known as whole functions; usually only a regular or irregular sampling of them
is available. For this reason,
some processing is needed in order to benefit from the smooth character of
functional data in the analysis methods. This paper shows how to extend the
Radial-Basis Function Networks (RBFN) and Multi-Layer Perceptron (MLP) models
to functional data inputs, in particular when the latter are known through
lists of input-output pairs. Various possibilities for functional processing
are discussed, including the projection on smooth bases, Functional Principal
Component Analysis, functional centering and reduction, and the use of
differential operators. It is shown how to incorporate these functional
processing steps into the RBFN and MLP models. The functional approach is
illustrated on a benchmark of spectrometric data analysis.
Comment: Also available online from:
http://www.sciencedirect.com/science/journal/0925231
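One of the processing options listed above, projection on a smooth basis, turns curves known only through input-output pairs into fixed-length vectors that an RBFN or MLP can take as input. The Python sketch below fits each curve, on its own sampling grid, to a truncated Fourier basis by least squares. The Fourier basis, the period of 1 and the function names are assumptions for illustration; the network itself is not shown.

```python
import numpy as np

def fourier_design(t, n_harmonics, period=1.0):
    """Columns: 1, cos(w t), sin(w t), cos(2 w t), sin(2 w t), ... with w = 2 pi / period."""
    cols = [np.ones_like(t)]
    for k in range(1, n_harmonics + 1):
        w = 2 * np.pi * k / period
        cols += [np.cos(w * t), np.sin(w * t)]
    return np.column_stack(cols)

def project_curves(samples, n_harmonics=3):
    """samples: list of (t_i, y_i) pairs, each curve with its own (irregular) grid.
    Returns one row of basis coefficients per curve."""
    coefs = []
    for t, y in samples:
        Phi = fourier_design(np.asarray(t, dtype=float), n_harmonics)
        c, *_ = np.linalg.lstsq(Phi, np.asarray(y, dtype=float), rcond=None)
        coefs.append(c)
    return np.vstack(coefs)  # shape (n_curves, 2 * n_harmonics + 1)
```

Because every curve is mapped to the same number of coefficients, curves sampled at different numbers of points become comparable inputs.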
Boosting Functional Response Models for Location, Scale and Shape with an Application to Bacterial Competition
We extend Generalized Additive Models for Location, Scale, and Shape (GAMLSS)
to regression with functional response. This allows us to simultaneously model
point-wise mean curves, variances and other distributional parameters of the
response in dependence of various scalar and functional covariate effects. In
addition, the scope of distributions is extended beyond exponential families.
The model is fitted via gradient boosting, which offers inherent model
selection and is shown to be suitable for both complex model structures and
highly auto-correlated response curves. This enables us to analyze bacterial
growth in Escherichia coli in a complex interaction scenario,
fruitfully extending usual growth models.
Comment: bootstrap confidence interval type uncertainty bounds added; minor
changes in formulation
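To illustrate gradient boosting for location and scale in the simplest setting, the Python sketch below fits a scalar Gaussian response with mu = a0 + a1 x and log sigma = b0 + b1 x, cycling between the two parameters and moving each a small step along a least-squares fit to its negative gradient. It deliberately omits what the paper adds (functional responses, functional covariates, non-exponential-family distributions, component-wise selection); the function name, offsets, step length and iteration count are assumptions.

```python
import numpy as np

def boost_gaussian_lss(x, y, n_iter=1000, nu=0.1):
    """Cyclic gradient boosting of mu = a0 + a1 x and log sigma = b0 + b1 x."""
    Z = np.column_stack([np.ones_like(x), x])
    proj = np.linalg.solve(Z.T @ Z, Z.T)       # least-squares base learner on (1, x)
    a = np.array([y.mean(), 0.0])              # offset for mu
    b = np.array([np.log(y.std()), 0.0])       # offset for log sigma
    for _ in range(n_iter):
        sigma = np.exp(Z @ b)
        u_mu = (y - Z @ a) / sigma ** 2        # d loglik / d mu
        a += nu * proj @ u_mu
        u_ls = ((y - Z @ a) / sigma) ** 2 - 1  # d loglik / d log sigma
        b += nu * proj @ u_ls
    return a, b
```

With enough iterations the updates stop only where both gradients are orthogonal to (1, x), that is, at the maximum likelihood solution; stopping earlier gives shrunken estimates.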