4,790 research outputs found

    Component selection and smoothing in multivariate nonparametric regression

    We propose a new method for model selection and model fitting in multivariate nonparametric regression models, in the framework of smoothing spline ANOVA. The "COSSO" is a method of regularization with the penalty functional being the sum of component norms, instead of the squared norm employed in the traditional smoothing spline method. The COSSO provides a unified framework for several recent proposals for model selection in linear models and smoothing spline ANOVA models. Theoretical properties, such as the existence and the rate of convergence of the COSSO estimator, are studied. In the special case of a tensor product design with periodic functions, a detailed analysis reveals that the COSSO does model selection by applying a novel soft thresholding type operation to the function components. We give an equivalent formulation of the COSSO estimator which leads naturally to an iterative algorithm. We compare the COSSO with MARS, a popular method that builds functional ANOVA models, in simulations and real examples. The COSSO method can be extended to classification problems and we compare its performance with those of a number of machine learning algorithms on real datasets. The COSSO gives very competitive performance in these studies. Comment: Published at http://dx.doi.org/10.1214/009053606000000722 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org)

    P-spline anova-type interaction models for spatio-temporal smoothing

    In recent years, spatial and spatio-temporal modelling have become an important area of research in many fields (epidemiology, environmental studies, disease mapping, ...). However, most of the models developed are constrained by the large amounts of data available. We propose the use of penalized splines (P-splines) in a mixed model framework for smoothing spatio-temporal data. Our approach allows the consideration of interaction terms which can be decomposed as a sum of smooth functions, similarly to an ANOVA decomposition. The properties of the bases used for regression allow the use of algorithms that can handle large amounts of data. We show that by imposing the same constraints as in a factorial design, it is possible to avoid identifiability problems. We illustrate the methodology using ozone levels in Europe for the period 1999-2005.

    Smoothing Spline ANOVA Models and their Applications in Complex and Massive Datasets

    Complex and massive datasets can be easily accessed using newly developed data acquisition technology. Although smoothing spline ANOVA models have proven useful in a variety of fields, these datasets pose challenges for the application of the models. In this chapter, we present a selected review of smoothing spline ANOVA models and highlight some challenges and opportunities in massive datasets. We review two approaches that significantly reduce the computational cost of fitting the model. One real case study is used to illustrate the performance of the reviewed methods.

    Computationally Efficient Kalman Filter Approaches for Fitting Smoothing Splines

    Smoothing spline models have been shown to be effective in various fields (e.g., engineering and biomedical sciences) for understanding complex signals from noisy data. As nonparametric models, smoothing spline ANOVA (Analysis of Variance) models do not fix the structure of the regression function, leading to more flexible model estimates (e.g., linear or nonlinear estimates). The functional ANOVA decomposition of the regression function estimates offers interpretable results that describe the relationship between the outcome variable and the main and interaction effects of different covariates/predictors. However, smoothing spline ANOVA (SS-ANOVA) models suffer from high computational costs, with a computational complexity of O(N^3) for N observations. Various numerical approaches can address this problem. In this chapter, we focus on the introduction to a state space representation of SS-ANOVA models. The estimation algorithms based on the Kalman filter are implemented within the SS-ANOVA framework using the state space representation, reducing the computational costs significantly.
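
    A Kalman filter processes each observation once, so with a state of fixed dimension a pass over N observations costs O(N). The sketch below is a minimal illustration under a strong simplifying assumption: it filters a scalar local-level (random walk plus noise) model, not the SS-ANOVA state space form the chapter describes. The function name `kalman_local_level` and its parameters are hypothetical.

```python
def kalman_local_level(ys, obs_var, state_var, m0=0.0, p0=1e6):
    """Filter a scalar local-level model in a single pass over the data.

    Model (an illustrative assumption, not the SS-ANOVA state space form):
        x_t = x_{t-1} + w_t,  w_t ~ N(0, state_var)
        y_t = x_t + v_t,      v_t ~ N(0, obs_var)
    m0 and p0 are the prior mean and variance; a large p0 acts as a
    nearly diffuse prior.
    """
    m, p = m0, p0
    means = []
    for y in ys:
        p = p + state_var        # predict: state variance grows by state_var
        k = p / (p + obs_var)    # Kalman gain
        m = m + k * (y - m)      # update the mean with the new observation
        p = (1.0 - k) * p        # update the variance
        means.append(m)
    return means


noisy = [1.2, 0.8, 1.1, 0.9, 1.0]
filtered = kalman_local_level(noisy, obs_var=1.0, state_var=0.0)
```

    With `state_var=0.0` and a nearly diffuse prior, the last filtered value is close to the sample mean of the observations, which is a quick sanity check on the recursion.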

    Calibration of Computational Models with Categorical Parameters and Correlated Outputs via Bayesian Smoothing Spline ANOVA

    It has become commonplace to use complex computer models to predict outcomes in regions where data does not exist. Typically these models need to be calibrated and validated using some experimental data, which often consists of multiple correlated outcomes. In addition, some of the model parameters may be categorical in nature, such as a pointer variable to alternate models (or submodels) for some of the physics of the system. Here we present a general approach for calibration in such situations where an emulator of the computationally demanding models and a discrepancy term from the model to reality are represented within a Bayesian Smoothing Spline (BSS) ANOVA framework. The BSS-ANOVA framework has several advantages over the traditional Gaussian Process, including ease of handling categorical inputs and correlated outputs, and improved computational efficiency. Finally, this framework is applied to the problem that motivated its design: a calibration of a computational fluid dynamics model of a bubbling fluidized bed which is used as an absorber in a CO2 capture system.

    Multivariate Bernoulli distribution

    In this paper, we consider the multivariate Bernoulli distribution as a model to estimate the structure of graphs with binary nodes. This distribution is discussed in the framework of the exponential family, and its statistical properties regarding independence of the nodes are demonstrated. Importantly, the model not only can estimate the main effects and pairwise interactions among the nodes but is also capable of modeling higher order interactions, allowing for the existence of complex clique effects. We compare the multivariate Bernoulli model with existing graphical inference models - the Ising model and the multivariate Gaussian model, where only the pairwise interactions are considered. On the other hand, the multivariate Bernoulli distribution has an interesting property in that independence and uncorrelatedness of the component random variables are equivalent. Both the marginal and conditional distributions of a subset of variables in the multivariate Bernoulli distribution still follow the multivariate Bernoulli distribution. Furthermore, the multivariate Bernoulli logistic model is developed under generalized linear model theory by utilizing the canonical link function in order to include covariate information on the nodes, edges and cliques. We also consider variable selection techniques such as LASSO in the logistic model to impose sparsity structure on the graph. Finally, we discuss extending the smoothing spline ANOVA approach to the multivariate Bernoulli logistic model to enable estimation of non-linear effects of the predictor variables. Comment: Published at http://dx.doi.org/10.3150/12-BEJSP10 in Bernoulli (http://isi.cbs.nl/bernoulli/) by the International Statistical Institute/Bernoulli Society (http://isi.cbs.nl/BS/bshome.htm)
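
    The equivalence of independence and uncorrelatedness can be checked by hand in the two-node case. The sketch below assumes the usual exponential-family form p(y1, y2) proportional to exp(y1 f1 + y2 f2 + y1 y2 f12); the covariance equals p11 p00 - p10 p01, so it vanishes exactly when the interaction parameter f12 is zero. The function name `bivariate_bernoulli_summary` and its argument names are hypothetical.

```python
import math


def bivariate_bernoulli_summary(p00, p10, p01, p11):
    """Return (f1, f2, f12, cov) for a bivariate Bernoulli distribution.

    p_ab is P(Y1 = a, Y2 = b). All four probabilities must be positive
    and sum to 1. f1, f2, f12 are the natural parameters of the assumed
    exponential-family form; cov is Cov(Y1, Y2).
    """
    probs = (p00, p10, p01, p11)
    if min(probs) <= 0 or abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must be positive and sum to 1")
    f1 = math.log(p10 / p00)                  # main effect of node 1
    f2 = math.log(p01 / p00)                  # main effect of node 2
    f12 = math.log(p11 * p00 / (p10 * p01))   # pairwise interaction
    cov = p11 - (p10 + p11) * (p01 + p11)     # E[Y1 Y2] - E[Y1] E[Y2]
    return f1, f2, f12, cov


# Independent nodes with P(Y1 = 1) = 0.3 and P(Y2 = 1) = 0.6.
independent = bivariate_bernoulli_summary(0.28, 0.12, 0.42, 0.18)
# Positively dependent nodes.
dependent = bivariate_bernoulli_summary(0.4, 0.1, 0.1, 0.4)
```

    In the first call both the interaction parameter and the covariance are zero up to rounding; in the second both are strictly positive.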

    Nonparametric spectral analysis with applications to seizure characterization using EEG time series

    Understanding the seizure initiation process and its propagation pattern(s) is a critical task in epilepsy research. Characteristics of the pre-seizure electroencephalograms (EEGs) such as oscillating powers and high-frequency activities are believed to be indicative of the seizure onset and spread patterns. In this article, we analyze epileptic EEG time series using nonparametric spectral estimation methods to extract information on seizure-specific power and characteristic frequency [or frequency band(s)]. Because the EEGs may become nonstationary before seizure events, we develop methods for both stationary and locally stationary processes. Based on penalized Whittle likelihood, we propose direct generalized maximum likelihood (GML) and generalized approximate cross-validation (GACV) methods to estimate smoothing parameters in both smoothing spline spectrum estimation of a stationary process and smoothing spline ANOVA time-varying spectrum estimation of a locally stationary process. We also propose permutation methods to test if a locally stationary process is stationary. Extensive simulations indicate that the proposed direct methods, especially the direct GML, are stable and perform better than other existing methods. We apply the proposed methods to the intracranial electroencephalograms (IEEGs) of an epileptic patient to gain insights into the seizure generation process. Comment: Published at http://dx.doi.org/10.1214/08-AOAS185 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org)
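
    The Whittle likelihood compares a candidate spectrum with the periodogram at the Fourier frequencies. The sketch below is a minimal, unpenalized illustration: it omits the roughness penalty and the smoothing parameter selection that the article is about, and it assumes the 1/n periodogram normalization (conventions differ by a constant factor). The names `periodogram` and `whittle_neg_loglik` are hypothetical.

```python
import cmath
import math


def periodogram(xs):
    """Periodogram I(w_k) = |sum_t x_t exp(-i w_k t)|^2 / n.

    Evaluated at the Fourier frequencies w_k = 2*pi*k/n for
    k = 1, ..., (n - 1) // 2 (frequency zero and Nyquist are skipped).
    """
    n = len(xs)
    values = []
    for k in range(1, (n - 1) // 2 + 1):
        w = 2.0 * math.pi * k / n
        dft = sum(x * cmath.exp(-1j * w * t) for t, x in enumerate(xs))
        values.append(abs(dft) ** 2 / n)
    return values


def whittle_neg_loglik(pgram, spec):
    """Unpenalized Whittle negative log-likelihood.

    Computes sum_k [log f(w_k) + I(w_k) / f(w_k)] for periodogram values
    pgram and positive candidate spectrum values spec at the same frequencies.
    """
    return sum(math.log(f) + i / f for i, f in zip(pgram, spec))


# A pure cosine at the third Fourier frequency of a length-16 series.
n = 16
signal = [math.cos(2.0 * math.pi * 3 * t / n) for t in range(n)]
pg = periodogram(signal)
```

    For this signal the periodogram concentrates at the third Fourier frequency. For a constant candidate spectrum, the Whittle criterion is minimized at the average of the periodogram values.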