Functional Regression
Functional data analysis (FDA) involves the analysis of data whose ideal
units of observation are functions defined on some continuous domain, and the
observed data consist of a sample of functions taken from some population,
sampled on a discrete grid. Ramsay and Silverman's 1997 textbook sparked the
development of this field, which has accelerated in the past 10 years to become
one of the fastest growing areas of statistics, fueled by the growing number of
applications yielding this type of data. One unique characteristic of FDA is
the need to combine information both across and within functions, which Ramsay
and Silverman called replication and regularization, respectively. This article
will focus on functional regression, the area of FDA that has received the most
attention in applications and methodological development. First will be an
introduction to basis functions, key building blocks for regularization in
functional regression methods, followed by an overview of functional regression
methods, split into three types: [1] functional predictor regression
(scalar-on-function), [2] functional response regression (function-on-scalar)
and [3] function-on-function regression. For each, the role of replication and
regularization will be discussed and the methodological development described
in a roughly chronological manner, at times deviating from the historical
timeline to group together similar methods. The primary focus is on modeling
and methodology, highlighting the modeling structures that have been developed
and the various regularization approaches employed. At the end is a brief
discussion describing potential areas of future development in this field.
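The role of basis functions in scalar-on-function regression can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the method of any particular paper covered by the review: the Fourier basis, the grid size, the simulated data and the ridge parameter `lam` are all illustrative choices. The coefficient function beta(t) is expanded in K basis functions, so the functional model reduces to an ordinary regression on K integrated covariates.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed setup: n curves observed on a common grid of m points in [0, 1].
n, m, K = 200, 101, 5
t = np.linspace(0.0, 1.0, m)
dt = t[1] - t[0]

def fourier_basis(t, K):
    """First K Fourier basis functions on [0, 1]: constant, then sin/cos pairs."""
    cols = [np.ones_like(t)]
    j = 1
    while len(cols) < K:
        cols.append(np.sqrt(2) * np.sin(2 * np.pi * j * t))
        if len(cols) < K:
            cols.append(np.sqrt(2) * np.cos(2 * np.pi * j * t))
        j += 1
    return np.column_stack(cols)  # shape (m, K)

Phi = fourier_basis(t, K)

# Simulated functional predictors: random combinations of basis functions plus noise.
scores = rng.normal(size=(n, K))
X = scores @ Phi.T + 0.05 * rng.normal(size=(n, m))

# Scalar responses y_i = integral of x_i(t) * beta(t) dt + error (Riemann sum).
beta_true = np.sin(2 * np.pi * t)
y = X @ beta_true * dt + 0.01 * rng.normal(size=n)

# Regularization by basis expansion: beta(t) = sum_k b_k * phi_k(t), so
# Z[i, k] approximates the integral of x_i(t) * phi_k(t) dt.
Z = X @ Phi * dt
lam = 1e-6  # small ridge penalty, an illustrative value
b_hat = np.linalg.solve(Z.T @ Z + lam * np.eye(K), Z.T @ y)
beta_hat = Phi @ b_hat  # estimated coefficient function on the grid
```

Here replication enters through the n curves used to estimate `b_hat`, and regularization enters through the restriction of beta(t) to a low-dimensional basis.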
P-spline anova-type interaction models for spatio-temporal smoothing
In recent years, spatial and spatio-temporal modelling have become an important area of research in many fields (epidemiology, environmental studies, disease mapping, ...). However, most of the models developed are constrained by the large amounts of data available. We propose the use of penalized splines (P-splines) in a mixed model framework for smoothing spatio-temporal data. Our approach allows the consideration of interaction terms which can be decomposed as a sum of smooth functions, similarly to an ANOVA decomposition. The properties of the bases used for regression allow the use of algorithms that can handle large amounts of data. We show that by imposing the same constraints as in a factorial design, it is possible to avoid identifiability problems. We illustrate the methodology with ozone levels in Europe over the period 1999-2005.
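As a rough illustration of the P-spline idea, the sketch below fits a one-dimensional penalized B-spline smoother. It is a minimal example under assumptions, not the spatio-temporal ANOVA-type model of the abstract: the function names `bspline_basis` and `pspline_fit`, the number of segments, the penalty order and the value of `lam` are all hypothetical choices, and the penalized least squares form is used rather than the mixed model representation.

```python
import numpy as np

def bspline_basis(x, n_seg=20, degree=3):
    """Equally spaced B-spline basis on [min(x), max(x)] via the Cox-de Boor recursion."""
    lo, hi = float(x.min()), float(x.max())
    h = (hi - lo) / n_seg
    knots = lo + h * np.arange(-degree, n_seg + degree + 1)
    # Degree-0 basis: indicator of the knot interval that contains each x.
    idx = np.clip(np.floor((x - lo) / h).astype(int), 0, n_seg - 1) + degree
    B = np.zeros((x.size, knots.size - 1))
    B[np.arange(x.size), idx] = 1.0
    # Raise the degree one step at a time; knot spacing is constant, so
    # both denominators in the recursion equal d * h.
    for d in range(1, degree + 1):
        left = (x[:, None] - knots[None, : -(d + 1)]) / (d * h)
        right = (knots[None, d + 1 :] - x[:, None]) / (d * h)
        B = left * B[:, :-1] + right * B[:, 1:]
    return B  # shape (len(x), n_seg + degree)

def pspline_fit(x, y, lam=10.0, n_seg=20, degree=3, diff_order=2):
    """Penalized least squares with a difference penalty on adjacent coefficients."""
    B = bspline_basis(x, n_seg, degree)
    D = np.diff(np.eye(B.shape[1]), n=diff_order, axis=0)
    coef = np.linalg.solve(B.T @ B + lam * D.T @ D, B.T @ y)
    return B @ coef, coef

# Simulated example: a noisy sine curve.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 200)
truth = np.sin(2 * np.pi * x)
y = truth + 0.3 * rng.normal(size=x.size)
fit, coef = pspline_fit(x, y)
rmse = float(np.sqrt(np.mean((fit - truth) ** 2)))
```

A two-dimensional or spatio-temporal version would typically replace the basis with a tensor product of marginal bases and use one penalty per dimension; that extension is not shown here.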
Penalized Likelihood and Bayesian Function Selection in Regression Models
Challenging research in various fields has driven a wide range of
methodological advances in variable selection for regression models with
high-dimensional predictors. In comparison, selection of nonlinear functions in
models with additive predictors has been considered only more recently. Several
competing suggestions have been developed at about the same time and often do
not refer to each other. This article provides a state-of-the-art review on
function selection, focusing on penalized likelihood and Bayesian concepts,
relating various approaches to each other in a unified framework. In an
empirical comparison, also including boosting, we evaluate several methods
through applications to simulated and real data, thereby providing some
guidance on their performance in practice.
Calibration of Computational Models with Categorical Parameters and Correlated Outputs via Bayesian Smoothing Spline ANOVA
It has become commonplace to use complex computer models to predict outcomes
in regions where data does not exist. Typically these models need to be
calibrated and validated using some experimental data, which often consists of
multiple correlated outcomes. In addition, some of the model parameters may be
categorical in nature, such as a pointer variable to alternate models (or
submodels) for some of the physics of the system. Here we present a general
approach for calibration in such situations where an emulator of the
computationally demanding models and a discrepancy term from the model to
reality are represented within a Bayesian Smoothing Spline (BSS) ANOVA
framework. The BSS-ANOVA framework has several advantages over the traditional
Gaussian Process, including ease of handling categorical inputs and correlated
outputs, and improved computational efficiency. Finally, this framework is
applied to the problem that motivated its design: a calibration of a
computational fluid dynamics model of a bubbling fluidized bed which is used as
an absorber in a CO2 capture system.
Conditional Spectral Analysis of Replicated Multiple Time Series with Application to Nocturnal Physiology
This article considers the problem of analyzing associations between power
spectra of multiple time series and cross-sectional outcomes when data are
observed from multiple subjects. The motivating application comes from sleep
medicine, where researchers are able to non-invasively record physiological
time series signals during sleep. The frequency patterns of these signals,
which can be quantified through the power spectrum, contain interpretable
information about biological processes. An important problem in sleep research
is drawing connections between power spectra of time series signals and
clinical characteristics; these connections are key to understanding biological
pathways through which sleep affects, and can be treated to improve, health.
Such analyses are challenging as they must overcome the complicated structure
of a power spectrum from multiple time series as a complex positive-definite
matrix-valued function. This article proposes a new approach to such analyses
based on a tensor-product spline model of Cholesky components of
outcome-dependent power spectra. The approach flexibly models power spectra as
nonparametric functions of frequency and outcome while preserving geometric
constraints. Formulated in a fully Bayesian framework, a Whittle likelihood
based Markov chain Monte Carlo (MCMC) algorithm is developed for automated
model fitting and for conducting inference on associations between outcomes and
spectral measures. The method is used to analyze data from a study of sleep in
older adults and uncovers new insights into how stress and arousal are
connected to the amount of time one spends in bed.
Normal-Mixture-of-Inverse-Gamma Priors for Bayesian Regularization and Model Selection in Structured Additive Regression Models
In regression models with many potential predictors, choosing an appropriate subset of covariates and their interactions at the same time as determining whether linear or more flexible functional forms are required is a challenging and important task. We propose a spike-and-slab prior structure in order to include or exclude single coefficients as well as blocks of coefficients associated with factor variables, random effects or basis expansions of smooth functions. Structured additive models with this prior structure are estimated with Markov Chain Monte Carlo using a redundant multiplicative parameter expansion. We discuss shrinkage properties of the novel prior induced by the redundant parameterization, investigate its sensitivity to hyperparameter settings and compare performance of the proposed method in terms of model selection, sparsity recovery, and estimation error for Gaussian, binomial and Poisson responses on real and simulated data sets with that of component-wise boosting and other approaches.
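To give a feel for a spike-and-slab prior of the normal-mixture-of-inverse-gamma type, the sketch below draws coefficients from one common form of such a prior. It is an assumption-laden illustration rather than the paper's exact specification: the function name `sample_nmig` and the hyperparameter values `w`, `v0`, `a` and `b` are hypothetical, and the redundant multiplicative parameter expansion for blocks of coefficients is not shown.

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_nmig(size, w=0.5, v0=0.001, a=5.0, b=25.0):
    """Draw coefficients from a two-component scale mixture (assumed form).

    gamma = 1 with probability w (slab), gamma = v0 otherwise (spike);
    tau2 ~ InverseGamma(a, b); beta ~ Normal(0, gamma * tau2).
    """
    gamma = np.where(rng.random(size) < w, 1.0, v0)
    # Inverse gamma draw: reciprocal of a Gamma(shape=a, rate=b) variable.
    tau2 = 1.0 / rng.gamma(shape=a, scale=1.0 / b, size=size)
    beta = rng.normal(0.0, np.sqrt(gamma * tau2))
    return beta, gamma

beta, gamma = sample_nmig(10_000)
spike_sd = float(beta[gamma < 1.0].std())   # draws concentrated near zero
slab_sd = float(beta[gamma == 1.0].std())   # draws with a much wider spread
```

Coefficients assigned to the spike are shrunk strongly towards zero, which is what lets the posterior inclusion indicator act as a model selection device.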
Extending Functional kriging to a multivariate context
Environmental data usually have a spatio-temporal structure; pollutant concentrations, for example, are recorded over time and space. Generalized Additive Models (GAMs) are a suitable tool for modelling the spatial and/or temporal trends of this kind of data, which can be treated as functional although they are collected as discrete observations. Frequently, attention is focused on the prediction of a single pollutant at an unmonitored site and, to this end, we extend kriging for functional data to a multivariate context by exploiting the correlation with the other pollutants. In particular, we propose two procedures: the first one (FKED) combines the regression of a variable (pollutant) of primary interest on the other variables with functional kriging of the regression residuals; the second one (FCK) is based on linear unbiased prediction of spatially correlated multivariate random processes. The performance of the two proposed procedures is assessed by cross-validation; data recorded during one year (2011) by the monitoring network of the state of California (USA) are considered.