Component selection and smoothing in multivariate nonparametric regression
We propose a new method for model selection and model fitting in multivariate
nonparametric regression models, in the framework of smoothing spline ANOVA.
The "COSSO" is a method of regularization with the penalty functional being
the sum of component norms, instead of the squared norm employed in the
traditional smoothing spline method. The COSSO provides a unified framework for
several recent proposals for model selection in linear models and smoothing
spline ANOVA models. Theoretical properties, such as the existence and the rate
of convergence of the COSSO estimator, are studied. In the special case of a
tensor product design with periodic functions, a detailed analysis reveals that
the COSSO does model selection by applying a novel soft thresholding type
operation to the function components. We give an equivalent formulation of the
COSSO estimator which leads naturally to an iterative algorithm. We compare the
COSSO with MARS, a popular method that builds functional ANOVA models, in
simulations and real examples. The COSSO method can be extended to
classification problems and we compare its performance with those of a number
of machine learning algorithms on real datasets. The COSSO gives very
competitive performance in these studies.
Comment: Published at http://dx.doi.org/10.1214/009053606000000722 in the
Annals of Statistics (http://www.imstat.org/aos/) by the Institute of
Mathematical Statistics (http://www.imstat.org)
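As a rough illustration of how a soft thresholding type operation can perform component selection, the sketch below applies the standard block soft-thresholding operator to coefficient blocks. This is a generic operator, not the specific operation derived for the COSSO in the paper; the function name, the coefficient values, and the threshold `lam` are all hypothetical.

```python
import math

def block_soft_threshold(component, lam):
    """Shrink a whole coefficient block toward zero by lam in Euclidean norm."""
    norm = math.sqrt(sum(c * c for c in component))
    if norm <= lam:
        # The entire component is dropped: this is the selection step.
        return [0.0] * len(component)
    scale = 1.0 - lam / norm
    return [scale * c for c in component]

# Hypothetical coefficient blocks, one per function component.
components = {"x1": [3.0, 4.0], "x2": [0.3, 0.4]}
selected = {name: block_soft_threshold(c, lam=1.0) for name, c in components.items()}
```

With these made-up values, the block with the large norm is shrunk but kept, while the block whose norm falls below the threshold is set to zero as a whole.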
P-spline anova-type interaction models for spatio-temporal smoothing
In recent years, spatial and spatio-temporal modelling has become an important area of research in many fields (epidemiology, environmental studies, disease mapping, ...). However, most of the models developed are constrained by the large amounts of data available. We propose the use of penalized splines (P-splines) in a mixed model framework for smoothing spatio-temporal data. Our approach allows the consideration of interaction terms, which can be decomposed as a sum of smooth functions, similarly to an ANOVA decomposition. The properties of the bases used for regression allow the use of algorithms that can handle large amounts of data. We show that, by imposing the same constraints as in a factorial design, it is possible to avoid identifiability problems. We illustrate the methodology with European ozone levels for the period 1999-2005.
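The penalty behind P-splines acts on differences of adjacent basis coefficients. The sketch below computes such a difference penalty for a one-dimensional coefficient vector; it is only the penalty term, not the spatio-temporal mixed model of the abstract, and the function name, the coefficient values, and `lam` are hypothetical.

```python
def difference_penalty(coefs, lam, order=2):
    """Return lam times the sum of squared order-th differences of coefs."""
    diffs = list(coefs)
    for _ in range(order):
        # Each pass replaces the sequence by differences of adjacent entries.
        diffs = [b - a for a, b in zip(diffs, diffs[1:])]
    return lam * sum(d * d for d in diffs)

smooth_cost = difference_penalty([1.0, 2.0, 3.0, 4.0], lam=0.5)  # linear trend
wiggly_cost = difference_penalty([0.0, 1.0, 0.0, 1.0], lam=0.5)  # oscillating
```

A second-order penalty leaves a linear coefficient sequence unpenalized and charges oscillating sequences, which is what pushes the fitted curve toward smoothness as `lam` grows.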
Smoothing Spline ANOVA Models and their Applications in Complex and Massive Datasets
Complex and massive datasets can now be easily accessed using newly developed data acquisition technology. Although smoothing spline ANOVA models have proven useful in a variety of fields, these datasets pose challenges for applying the models. In this chapter, we present a selected review of smoothing spline ANOVA models and highlight some challenges and opportunities in massive datasets. We review two approaches that significantly reduce the computational cost of fitting the model. One real case study is used to illustrate the performance of the reviewed methods.
Computationally Efficient Kalman Filter Approaches for Fitting Smoothing Splines
Smoothing spline models have been shown to be effective in various fields (e.g., engineering and biomedical sciences) for understanding complex signals from noisy data. As nonparametric models, smoothing spline ANOVA (analysis of variance) models do not fix the structure of the regression function, leading to more flexible model estimates (e.g., linear or nonlinear estimates). The functional ANOVA decomposition of the regression function estimates offers interpretable results that describe the relationship between the outcome variable and the main and interaction effects of different covariates/predictors. However, smoothing spline ANOVA (SS-ANOVA) models suffer from high computational costs, with a computational complexity of O(N^3) for N observations. Various numerical approaches can address this problem. In this chapter, we focus on introducing a state space representation of SS-ANOVA models. Estimation algorithms based on the Kalman filter are implemented within the SS-ANOVA framework using the state space representation, reducing the computational costs significantly.
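To show why a state space formulation is cheap, the sketch below runs a scalar Kalman filter for a local-level model in a single pass over the data, so the cost grows linearly in N. This is a deliberately simplified stand-in, not the SS-ANOVA state space representation of the chapter; the model, the function name, and the variance values are assumptions, and a backward smoothing pass would be needed to obtain smoothed rather than filtered estimates.

```python
def local_level_filter(ys, obs_var, state_var, m0=0.0, p0=1e6):
    """Forward Kalman pass for y_t = mu_t + noise, mu_t = mu_{t-1} + drift."""
    m, p = m0, p0  # prior mean and (diffuse) prior variance of the level
    means = []
    for y in ys:
        p = p + state_var         # predict: uncertainty grows by the state noise
        gain = p / (p + obs_var)  # Kalman gain
        m = m + gain * (y - m)    # update the mean with the innovation
        p = (1.0 - gain) * p      # update the variance
        means.append(m)
    return means

# Made-up observations and variances for illustration.
filtered = local_level_filter([1.0, 1.2, 0.9, 1.1], obs_var=0.1, state_var=0.01)
```

Each observation triggers a constant amount of work, in contrast to solving one dense N-by-N linear system.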
Calibration of Computational Models with Categorical Parameters and Correlated Outputs via Bayesian Smoothing Spline ANOVA
It has become commonplace to use complex computer models to predict outcomes
in regions where data does not exist. Typically these models need to be
calibrated and validated using some experimental data, which often consists of
multiple correlated outcomes. In addition, some of the model parameters may be
categorical in nature, such as a pointer variable to alternate models (or
submodels) for some of the physics of the system. Here we present a general
approach for calibration in such situations where an emulator of the
computationally demanding models and a discrepancy term from the model to
reality are represented within a Bayesian Smoothing Spline (BSS) ANOVA
framework. The BSS-ANOVA framework has several advantages over the traditional
Gaussian Process, including ease of handling categorical inputs and correlated
outputs, and improved computational efficiency. Finally, this framework is
applied to the problem that motivated its design: a calibration of a
computational fluid dynamics model of a bubbling fluidized bed, which is used as an
absorber in a CO2 capture system.
Multivariate Bernoulli distribution
In this paper, we consider the multivariate Bernoulli distribution as a model
to estimate the structure of graphs with binary nodes. This distribution is
discussed in the framework of the exponential family, and its statistical
properties regarding independence of the nodes are demonstrated. Importantly,
the model can estimate not only the main effects and pairwise interactions
among the nodes but also higher-order interactions,
allowing for the existence of complex clique effects. We compare the
multivariate Bernoulli model with existing graphical inference models - the
Ising model and the multivariate Gaussian model, where only the pairwise
interactions are considered. On the other hand, the multivariate Bernoulli
distribution has an interesting property in that independence and
uncorrelatedness of the component random variables are equivalent. Both the
marginal and conditional distributions of a subset of variables in the
multivariate Bernoulli distribution still follow the multivariate Bernoulli
distribution. Furthermore, the multivariate Bernoulli logistic model is
developed under generalized linear model theory by utilizing the canonical link
function in order to include covariate information on the nodes, edges and
cliques. We also consider variable selection techniques such as LASSO in the
logistic model to impose sparsity structure on the graph. Finally, we discuss
extending the smoothing spline ANOVA approach to the multivariate Bernoulli
logistic model to enable estimation of non-linear effects of the predictor
variables.
Comment: Published at http://dx.doi.org/10.3150/12-BEJSP10 in Bernoulli
(http://isi.cbs.nl/bernoulli/) by the International Statistical
Institute/Bernoulli Society (http://isi.cbs.nl/BS/bshome.htm)
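The equivalence of independence and uncorrelatedness can be checked numerically in the bivariate case. The sketch below computes the covariance and the log odds ratio (the pairwise interaction term in the exponential family form) from a 2x2 joint probability table; the function name and the probability values are hypothetical, and all four cells are assumed to be strictly positive.

```python
import math

def bivariate_summary(p00, p01, p10, p11):
    """Covariance and log odds ratio of (Y1, Y2), with pij = P(Y1=i, Y2=j)."""
    assert abs(p00 + p01 + p10 + p11 - 1.0) < 1e-12
    p1 = p10 + p11  # P(Y1 = 1)
    p2 = p01 + p11  # P(Y2 = 1)
    cov = p11 - p1 * p2
    # Requires every cell to be positive so the logarithm is defined.
    log_odds_ratio = math.log(p00 * p11 / (p01 * p10))
    return cov, log_odds_ratio

# Joint table built as a product of marginals 0.4 and 0.3 (independent case).
independent = bivariate_summary(0.42, 0.18, 0.28, 0.12)
# Joint table with mass concentrated on agreement (dependent case).
dependent = bivariate_summary(0.40, 0.10, 0.10, 0.40)
```

In the independent table both quantities vanish up to rounding, and in the dependent table both are positive, consistent with the covariance being zero exactly when the interaction term is zero.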
Nonparametric spectral analysis with applications to seizure characterization using EEG time series
Understanding the seizure initiation process and its propagation pattern(s)
is a critical task in epilepsy research. Characteristics of the pre-seizure
electroencephalograms (EEGs) such as oscillating powers and high-frequency
activities are believed to be indicative of the seizure onset and spread
patterns. In this article, we analyze epileptic EEG time series using
nonparametric spectral estimation methods to extract information on
seizure-specific power and characteristic frequency [or frequency band(s)].
Because the EEGs may become nonstationary before seizure events, we develop
methods for both stationary and locally stationary processes. Based on the penalized
Whittle likelihood, we propose direct generalized maximum likelihood (GML)
and generalized approximate cross-validation (GACV) methods to estimate
smoothing parameters in both smoothing spline spectrum estimation of a
stationary process and smoothing spline ANOVA time-varying spectrum estimation
of a locally stationary process. We also propose permutation methods to test if
a locally stationary process is stationary. Extensive simulations indicate that
the proposed direct methods, especially the direct GML, are stable and perform
better than other existing methods. We apply the proposed methods to the
intracranial electroencephalograms (IEEGs) of an epileptic patient to gain
insights into the seizure generation process.
Comment: Published at http://dx.doi.org/10.1214/08-AOAS185 in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org)
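For readers unfamiliar with the Whittle likelihood, the sketch below computes a periodogram and the unpenalized negative Whittle log-likelihood of a candidate spectrum. It omits the roughness penalty and the GML/GACV smoothing parameter selection that the article proposes; the scaling convention (periodogram divided by n, additive constants dropped), the function names, and the data are assumptions made for illustration.

```python
import cmath
import math

def periodogram(xs):
    """I_k = |sum_t x_t exp(-2*pi*i*k*t/n)|^2 / n for k = 0, ..., n-1."""
    n = len(xs)
    out = []
    for k in range(n):
        dft = sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(xs))
        out.append(abs(dft) ** 2 / n)
    return out

def whittle_neg_loglik(pgram, spectrum):
    """Sum over frequencies of log f_k + I_k / f_k (constants dropped)."""
    return sum(math.log(f) + i / f for i, f in zip(pgram, spectrum))

xs = [0.5, -1.0, 0.3, 0.8, -0.6, 0.1, -0.2, 0.9]  # made-up series
pg = periodogram(xs)
# Among flat spectra, the mean of the periodogram minimizes the criterion.
flat_good = whittle_neg_loglik(pg, [sum(pg) / len(pg)] * len(pg))
flat_bad = whittle_neg_loglik(pg, [10.0] * len(pg))
```

A spline-based estimator would replace the flat candidate with a smooth function of frequency and add a penalty on its roughness to this criterion.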