3,277 research outputs found

    Functional Regression

    Full text link
    Functional data analysis (FDA) involves the analysis of data whose ideal units of observation are functions defined on some continuous domain, and the observed data consist of a sample of functions taken from some population, sampled on a discrete grid. Ramsay and Silverman's 1997 textbook sparked the development of this field, which has accelerated in the past 10 years to become one of the fastest growing areas of statistics, fueled by the growing number of applications yielding this type of data. One unique characteristic of FDA is the need to combine information both across and within functions, which Ramsay and Silverman called replication and regularization, respectively. This article will focus on functional regression, the area of FDA that has received the most attention in applications and methodological development. First will be an introduction to basis functions, key building blocks for regularization in functional regression methods, followed by an overview of functional regression methods, split into three types: [1] functional predictor regression (scalar-on-function), [2] functional response regression (function-on-scalar) and [3] function-on-function regression. For each, the role of replication and regularization will be discussed and the methodological development described in a roughly chronological manner, at times deviating from the historical timeline to group together similar methods. The primary focus is on modeling and methodology, highlighting the modeling structures that have been developed and the various regularization approaches employed. At the end is a brief discussion describing potential areas of future development in this field

    P-spline anova-type interaction models for spatio-temporal smoothing

    Get PDF
    In recent years, spatial and spatio-temporal modelling have become an important area of research in many fields (epidemiology, environmental studies, disease mapping, ...). However, most of the models developed are constrained by the large amounts of data available. We propose the use of Penalized splines (P-splines) in a mixed model framework for smoothing spatio-temporal data. Our approach allows the consideration of interaction terms which can be decomposed as a sum of smooth functions similarly as an ANOVA decomposition. The properties of the bases used for regression allow the use of algorithms that can handle large amount of data. We show that imposing the same constraints as in a factorial design it is possible to avoid identifiability problems. We illustrate the methodology for Europe ozone levels in the period 1999-2005

    Penalized Likelihood and Bayesian Function Selection in Regression Models

    Full text link
    Challenging research in various fields has driven a wide range of methodological advances in variable selection for regression models with high-dimensional predictors. In comparison, selection of nonlinear functions in models with additive predictors has been considered only more recently. Several competing suggestions have been developed at about the same time and often do not refer to each other. This article provides a state-of-the-art review on function selection, focusing on penalized likelihood and Bayesian concepts, relating various approaches to each other in a unified framework. In an empirical comparison, also including boosting, we evaluate several methods through applications to simulated and real data, thereby providing some guidance on their performance in practice

    Calibration of Computational Models with Categorical Parameters and Correlated Outputs via Bayesian Smoothing Spline ANOVA

    Full text link
    It has become commonplace to use complex computer models to predict outcomes in regions where data does not exist. Typically these models need to be calibrated and validated using some experimental data, which often consists of multiple correlated outcomes. In addition, some of the model parameters may be categorical in nature, such as a pointer variable to alternate models (or submodels) for some of the physics of the system. Here we present a general approach for calibration in such situations where an emulator of the computationally demanding models and a discrepancy term from the model to reality are represented within a Bayesian Smoothing Spline (BSS) ANOVA framework. The BSS-ANOVA framework has several advantages over the traditional Gaussian Process, including ease of handling categorical inputs and correlated outputs, and improved computational efficiency. Finally this framework is then applied to the problem that motivated its design; a calibration of a computational fluid dynamics model of a bubbling fluidized which is used as an absorber in a CO2 capture system

    Conditional Spectral Analysis of Replicated Multiple Time Series with Application to Nocturnal Physiology

    Full text link
    This article considers the problem of analyzing associations between power spectra of multiple time series and cross-sectional outcomes when data are observed from multiple subjects. The motivating application comes from sleep medicine, where researchers are able to non-invasively record physiological time series signals during sleep. The frequency patterns of these signals, which can be quantified through the power spectrum, contain interpretable information about biological processes. An important problem in sleep research is drawing connections between power spectra of time series signals and clinical characteristics; these connections are key to understanding biological pathways through which sleep affects, and can be treated to improve, health. Such analyses are challenging as they must overcome the complicated structure of a power spectrum from multiple time series as a complex positive-definite matrix-valued function. This article proposes a new approach to such analyses based on a tensor-product spline model of Cholesky components of outcome-dependent power spectra. The approach flexibly models power spectra as nonparametric functions of frequency and outcome while preserving geometric constraints. Formulated in a fully Bayesian framework, a Whittle likelihood based Markov chain Monte Carlo (MCMC) algorithm is developed for automated model fitting and for conducting inference on associations between outcomes and spectral measures. The method is used to analyze data from a study of sleep in older adults and uncovers new insights into how stress and arousal are connected to the amount of time one spends in bed

    Normal-Mixture-of-Inverse-Gamma Priors for Bayesian Regularization and Model Selection in Structured Additive Regression Models

    Get PDF
    In regression models with many potential predictors, choosing an appropriate subset of covariates and their interactions at the same time as determining whether linear or more flexible functional forms are required is a challenging and important task. We propose a spike-and-slab prior structure in order to include or exclude single coefficients as well as blocks of coefficients associated with factor variables, random effects or basis expansions of smooth functions. Structured additive models with this prior structure are estimated with Markov Chain Monte Carlo using a redundant multiplicative parameter expansion. We discuss shrinkage properties of the novel prior induced by the redundant parameterization, investigate its sensitivity to hyperparameter settings and compare performance of the proposed method in terms of model selection, sparsity recovery, and estimation error for Gaussian, binomial and Poisson responses on real and simulated data sets with that of component-wise boosting and other approaches

    Extending Functional kriging to a multivariate context

    Get PDF
    Environmental data usually have a spatio-temporal structure; pollutant concentrations, for example, are recorded along time and space. Generalized Additive Models (GAMs) represent a suitable tool to model spatial and/or temporal trends of this kind of data, that can be treated as functional, although they are collected as discrete observations. Frequently, the attention is focused on the prediction of a single pollutant at an unmonitored site and, at this aim, we extend kriging for functional data to a multivariate context by exploiting the correlation with the other pollutants. In particular, we propose two procedures: the first one (FKED) combines the regression of a variable (pollutant), of primary interest on the other variables, with functional kriging of the regression residuals; the second one (FCK) is based on linear unbiased prediction of spatially correlated multivariate random processes. The performance of the two proposed procedures is assessed by cross validation; data recorded during a year (2011) from the monitoring network of the state of California (USA) are considered
    corecore