Functional Regression
Functional data analysis (FDA) involves the analysis of data whose ideal
units of observation are functions defined on some continuous domain, and the
observed data consist of a sample of functions taken from some population,
sampled on a discrete grid. Ramsay and Silverman's 1997 textbook sparked the
development of this field, which has accelerated in the past 10 years to become
one of the fastest growing areas of statistics, fueled by the growing number of
applications yielding this type of data. One unique characteristic of FDA is
the need to combine information both across and within functions, which Ramsay
and Silverman called replication and regularization, respectively. This article
will focus on functional regression, the area of FDA that has received the most
attention in applications and methodological development. First will be an
introduction to basis functions, key building blocks for regularization in
functional regression methods, followed by an overview of functional regression
methods, split into three types: [1] functional predictor regression
(scalar-on-function), [2] functional response regression (function-on-scalar)
and [3] function-on-function regression. For each, the role of replication and
regularization will be discussed and the methodological development described
in a roughly chronological manner, at times deviating from the historical
timeline to group together similar methods. The primary focus is on modeling
and methodology, highlighting the modeling structures that have been developed
and the various regularization approaches employed. At the end is a brief
discussion describing potential areas of future development in this field.
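As an illustration of how a basis expansion regularizes a scalar-on-function model, the following minimal Python sketch expands the coefficient function beta(t) in a small Fourier basis and fits the basis coefficients by ridge-penalized least squares. The simulated data, the choice of a Fourier basis, the number of basis functions, the ridge value, and the names fourier_basis and fit_scalar_on_function are all assumptions made for illustration; none of them is taken from the article.

```python
import numpy as np

def fourier_basis(t, n_basis):
    """Orthonormal Fourier basis on [0, 1): constant, then sin/cos pairs."""
    cols = [np.ones_like(t)]
    k = 1
    while len(cols) < n_basis:
        cols.append(np.sqrt(2) * np.sin(2 * np.pi * k * t))
        if len(cols) < n_basis:
            cols.append(np.sqrt(2) * np.cos(2 * np.pi * k * t))
        k += 1
    return np.column_stack(cols)

def fit_scalar_on_function(X, y, t, n_basis=5, ridge=1e-6):
    """Estimate beta(t) in y_i = integral of X_i(t) beta(t) dt + error."""
    B = fourier_basis(t, n_basis)            # (n_grid, n_basis)
    dt = t[1] - t[0]
    Z = X @ B * dt                           # Riemann sum for each integral of X_i * b_k
    coef = np.linalg.solve(Z.T @ Z + ridge * np.eye(n_basis), Z.T @ y)
    return B @ coef                          # beta evaluated on the grid

# Hypothetical simulated data: 300 curves observed on a grid of 200 points.
rng = np.random.default_rng(0)
t = np.arange(200) / 200
X = rng.normal(size=(300, 7)) @ fourier_basis(t, 7).T
beta_true = np.sin(2 * np.pi * t)
y = X @ beta_true * (t[1] - t[0]) + 0.01 * rng.normal(size=300)

beta_hat = fit_scalar_on_function(X, y, t)
```

Here replication enters through the 300 curves, and regularization through the truncation to five basis functions plus the small ridge penalty.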
Group Iterative Spectrum Thresholding for Super-Resolution Sparse Spectral Selection
Recently, sparsity-based algorithms have been proposed for super-resolution
spectrum estimation. However, to achieve adequately high resolution in
real-world signal analysis, the dictionary atoms have to be close to each other
in frequency, thereby resulting in a coherent design. The popular convex
compressed sensing methods break down in the presence of high coherence and large
noise. We propose a new regularization approach to handle model collinearity
and obtain parsimonious frequency selection simultaneously. It takes advantage
of the pairing structure of sine and cosine atoms in the frequency dictionary.
A probabilistic spectrum screening is also developed for fast computation in
high dimensions. A data-resampling version of high-dimensional Bayesian
Information Criterion is used to determine the regularization parameters.
Experiments show the efficacy and efficiency of the proposed algorithms in
challenging situations with small sample size, high frequency resolution, and
low signal-to-noise ratio.
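The idea of selecting frequencies by thresholding sine and cosine atoms as pairs can be sketched as follows. This is a generic group soft-thresholding (proximal gradient) iteration, not the specific thresholding rule, screening step, or parameter selection proposed in the paper. The function names, the penalty value lam, and the toy signal are assumptions, and the frequency grid is deliberately coarse (orthogonal atoms) so the example stays easy to check, unlike the coherent dictionaries the abstract targets.

```python
import numpy as np

def group_soft_threshold(pairs, thresh):
    """Shrink each (cos, sin) coefficient pair by its Euclidean norm."""
    norms = np.linalg.norm(pairs, axis=1, keepdims=True)
    scale = np.maximum(0.0, 1.0 - thresh / np.maximum(norms, 1e-12))
    return pairs * scale

def group_iterative_thresholding(A, y, lam, n_iter=200):
    """Minimize 0.5*||y - A x||^2 + lam * sum of pair norms by proximal gradient."""
    step = 1.0 / np.linalg.norm(A, 2) ** 2   # 1 / Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        z = x - step * (A.T @ (A @ x - y))   # gradient step on the squared error
        x = group_soft_threshold(z.reshape(-1, 2), step * lam).ravel()
    return x

# Hypothetical toy signal: two sinusoids in noise, 40 candidate frequencies.
rng = np.random.default_rng(0)
n = 128
t = np.arange(n)
freqs = np.arange(1, 41)                     # cycles per record
phase = 2 * np.pi * np.outer(t, freqs) / n
A = np.empty((n, 2 * freqs.size))
A[:, 0::2] = np.cos(phase)                   # columns alternate cos, sin
A[:, 1::2] = np.sin(phase)
y = (2.0 * np.cos(2 * np.pi * 10 * t / n)
     + 1.5 * np.sin(2 * np.pi * 25 * t / n)
     + 0.1 * rng.normal(size=n))

x_hat = group_iterative_thresholding(A, y, lam=8.0)
selected = freqs[np.linalg.norm(x_hat.reshape(-1, 2), axis=1) > 0]
```

Because a pair is kept or dropped as a unit, a frequency is selected regardless of the phase of the underlying sinusoid.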
Stochastic partial differential equation based modelling of large space-time data sets
Increasingly large data sets of processes in space and time call for
statistical models and methods that can cope with such data. We show that the
solution of a stochastic advection-diffusion partial differential equation
provides a flexible model class for spatio-temporal processes which is
computationally feasible also for large data sets. The Gaussian process defined
through the stochastic partial differential equation has in general a
nonseparable covariance structure. Furthermore, its parameters can be
physically interpreted as explicitly modeling phenomena such as transport and
diffusion that occur in many natural processes in diverse fields ranging from
environmental sciences to ecology. In order to obtain computationally efficient
statistical algorithms we use spectral methods to solve the stochastic partial
differential equation. This has the advantage that approximation errors do not
accumulate over time, and that in the spectral space the computational cost
grows linearly with the dimension, the total computational costs of Bayesian or
frequentist inference being dominated by the fast Fourier transform. The
proposed model is applied to postprocessing of precipitation forecasts from a
numerical weather prediction model for northern Switzerland. In contrast to the
raw forecasts from the numerical model, the postprocessed forecasts are
calibrated and quantify prediction uncertainty. Moreover, they outperform the
raw forecasts, in the sense that they have a lower mean absolute error.
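The remark that spectral methods avoid accumulating approximation errors can be illustrated with a simplified sketch: in Fourier space each mode of a linear advection-diffusion equation evolves independently, so one time step is an exact multiplication per wavenumber between two fast Fourier transforms. The code below is a deterministic, one-dimensional, periodic toy version without the stochastic forcing; the function name, parameter names, and numerical values are assumptions and do not reproduce the paper's model.

```python
import numpy as np

def propagate_spectral(u0, dt, velocity, diffusion, damping=0.0, length=1.0):
    """Advance du/dt = -velocity*du/dx + diffusion*d2u/dx2 - damping*u by dt."""
    n = u0.size
    k = 2 * np.pi * np.fft.fftfreq(n, d=length / n)   # angular wavenumbers
    # Exact propagator of each Fourier mode over a step of size dt.
    multiplier = np.exp((-1j * velocity * k - diffusion * k**2 - damping) * dt)
    return np.fft.ifft(multiplier * np.fft.fft(u0)).real

# Hypothetical check on a single cosine mode on the periodic domain [0, 1).
x = np.arange(64) / 64
u0 = np.cos(2 * np.pi * x)
u1 = propagate_spectral(u0, dt=1.0, velocity=0.25, diffusion=0.01)

# Analytic solution: the wave is shifted by velocity*dt and damped by diffusion.
expected = np.exp(-0.01 * (2 * np.pi) ** 2) * np.cos(2 * np.pi * (x - 0.25))
```

The step size can be arbitrarily large here because the multiplier is the exact solution for each mode rather than a finite-difference approximation.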
Scalar and vector Slepian functions, spherical signal estimation and spectral analysis
It is a well-known fact that mathematical functions that are timelimited (or
spacelimited) cannot be simultaneously bandlimited (in frequency). Yet the
finite precision of measurement and computation unavoidably bandlimits our
observation and modeling of scientific data, and we often only have access to, or
are only interested in, a study area that is temporally or spatially bounded.
In the geosciences we may be interested in spectrally modeling a time series
defined only on a certain interval, or we may want to characterize a specific
geographical area observed using an effectively bandlimited measurement device.
It is clear that analyzing and representing scientific data of this kind will
be facilitated if a basis of functions can be found that are "spatiospectrally"
concentrated, i.e. "localized" in both domains at the same time. Here, we give
a theoretical overview of one particular approach to this "concentration"
problem, as originally proposed for time series by Slepian and coworkers, in
the 1960s. We show how this framework leads to practical algorithms and
statistically performant methods for the analysis of signals and their power
spectra in one and two dimensions, and, particularly for applications in the
geosciences, for scalar and vectorial signals defined on the surface of a unit
sphere.
Comment: Submitted to the 2nd Edition of the Handbook of Geomathematics,
edited by Willi Freeden, Zuhair M. Nashed and Thomas Sonar, and to be
published by Springer Verlag. This is a slightly modified but expanded
version of the paper arxiv:0909.5368 that appeared in the 1st Edition of the
Handbook, when it was called: Slepian functions and their use in signal
estimation and spectral analysis.
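For the one-dimensional time-series case, the concentration problem reduces to an eigenvalue problem for a sinc kernel: the eigenvectors are the discrete Slepian sequences and each eigenvalue is the fraction of that sequence's energy inside the band. The sketch below is a minimal numpy illustration under assumed values of the sequence length and half-bandwidth; the function name is hypothetical. A dense eigendecomposition is used for clarity only, and dedicated routines such as scipy.signal.windows.dpss are likely to be numerically preferable in practice.

```python
import numpy as np

def slepian_sequences(n, half_bandwidth):
    """Eigen-decompose the time-bandwidth concentration kernel.

    Returns eigenvalues (energy fraction within |f| <= half_bandwidth) in
    decreasing order and the matching unit-norm sequences as columns.
    """
    idx = np.arange(n)
    diff = idx[:, None] - idx[None, :]
    # sin(2*pi*W*d) / (pi*d), with the diagonal handled by np.sinc.
    kernel = 2 * half_bandwidth * np.sinc(2 * half_bandwidth * diff)
    eigvals, eigvecs = np.linalg.eigh(kernel)
    order = np.argsort(eigvals)[::-1]
    return eigvals[order], eigvecs[:, order]

n, half_bandwidth = 128, 4 / 128             # assumed values, so n*W = 4
concentrations, sequences = slepian_sequences(n, half_bandwidth)
shannon_number = 2 * n * half_bandwidth      # equals the trace of the kernel
```

Roughly shannon_number of the eigenvalues are close to one and the rest fall quickly toward zero, which is what makes the leading sequences useful as a well-concentrated basis.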
Automatic Classification of Irregularly Sampled Time Series with Unequal Lengths: A Case Study on Estimated Glomerular Filtration Rate
A patient's estimated glomerular filtration rate (eGFR) can provide important
information about disease progression and kidney function. Traditionally, an
eGFR time series is interpreted by a human expert labelling it as stable or
unstable. While this approach works for individual patients, its time-consuming
nature precludes the quick evaluation of risk in large numbers of
patients. However, automating this process poses significant challenges as eGFR
measurements are usually recorded at irregular intervals and the series of
measurements differs in length between patients. Here we present a two-tier
system to automatically classify an eGFR trend. First, we model the time series
using Gaussian process regression (GPR) to fill in "gaps" by resampling a
fixed-size vector of fifty time-dependent observations. Second, we classify the
resampled eGFR time series using a K-NN/SVM classifier, and evaluate its
performance via 5-fold cross validation. Using this approach we achieved an
F-score of 0.90, compared to 0.96 for 5 human experts when scored amongst
themselves.
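The two-tier idea (resample each irregular series onto a fixed-length vector with a Gaussian process, then classify the vectors) can be sketched in a few lines. Everything specific below is an assumption for illustration: the squared-exponential kernel and its fixed hyperparameters, the synthetic "stable" and "declining" series, the requirement that every synthetic patient has a first and last visit at the same times, and the use of a plain k-nearest-neighbour vote in place of the paper's K-NN/SVM classifier.

```python
import numpy as np

def rbf_kernel(a, b, length_scale, signal_var):
    d2 = (a[:, None] - b[None, :]) ** 2
    return signal_var * np.exp(-0.5 * d2 / length_scale**2)

def gp_resample(t_obs, y_obs, n_points=50, length_scale=3.0,
                signal_var=100.0, noise_var=1.0):
    """GP posterior mean on an even grid spanning the observed time range."""
    grid = np.linspace(t_obs.min(), t_obs.max(), n_points)
    mean = y_obs.mean()                       # constant prior mean
    K = rbf_kernel(t_obs, t_obs, length_scale, signal_var)
    K += noise_var * np.eye(t_obs.size)
    K_star = rbf_kernel(grid, t_obs, length_scale, signal_var)
    return mean + K_star @ np.linalg.solve(K, y_obs - mean)

def knn_predict(train_X, train_y, test_X, k=3):
    """Majority vote among the k nearest training vectors (binary labels)."""
    dist = np.linalg.norm(test_X[:, None, :] - train_X[None, :, :], axis=2)
    nearest = np.argsort(dist, axis=1)[:, :k]
    return (train_y[nearest].mean(axis=1) > 0.5).astype(int)

rng = np.random.default_rng(0)

def simulate(declining):
    """Hypothetical series: 8 to 20 irregular visits over 10 time units."""
    n_obs = int(rng.integers(8, 21))
    inner = rng.uniform(0.0, 10.0, size=n_obs - 2)
    t = np.sort(np.concatenate(([0.0, 10.0], inner)))
    slope = -3.0 if declining else 0.0
    return t, 60.0 + slope * t + rng.normal(scale=1.0, size=n_obs)

labels = np.array([0, 1] * 20)               # 0 = stable, 1 = declining
features = np.array([gp_resample(*simulate(bool(lab))) for lab in labels])
pred = knn_predict(features[:30], labels[:30], features[30:])
accuracy = (pred == labels[30:]).mean()
```

In practice the kernel hyperparameters would be estimated per series or per cohort, and performance would be assessed by cross-validation as in the abstract rather than on a single split.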
Slepian functions and their use in signal estimation and spectral analysis
It is a well-known fact that mathematical functions that are timelimited (or
spacelimited) cannot be simultaneously bandlimited (in frequency). Yet the
finite precision of measurement and computation unavoidably bandlimits our
observation and modeling of scientific data, and we often only have access to, or
are only interested in, a study area that is temporally or spatially bounded.
In the geosciences we may be interested in spectrally modeling a time series
defined only on a certain interval, or we may want to characterize a specific
geographical area observed using an effectively bandlimited measurement device.
It is clear that analyzing and representing scientific data of this kind will
be facilitated if a basis of functions can be found that are "spatiospectrally"
concentrated, i.e. "localized" in both domains at the same time. Here, we give
a theoretical overview of one particular approach to this "concentration"
problem, as originally proposed for time series by Slepian and coworkers, in
the 1960s. We show how this framework leads to practical algorithms and
statistically performant methods for the analysis of signals and their power
spectra in one and two dimensions, and on the surface of a sphere.
Comment: Submitted to the Handbook of Geomathematics, edited by Willi Freeden,
Zuhair M. Nashed and Thomas Sonar, and to be published by Springer Verlag.
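One common way Slepian sequences are used for power-spectrum analysis of a time series is multitaper estimation: the record is multiplied by several well-concentrated tapers, and the resulting periodograms are averaged. The sketch below is a minimal, unweighted version written under assumptions (record length, half-bandwidth, number of tapers, test signal, and all function names are hypothetical); it is not claimed to be the estimator developed in the paper.

```python
import numpy as np

def slepian_tapers(n, half_bandwidth, n_tapers):
    """Leading eigenvectors of the sinc concentration kernel, one per row."""
    idx = np.arange(n)
    diff = idx[:, None] - idx[None, :]
    kernel = 2 * half_bandwidth * np.sinc(2 * half_bandwidth * diff)
    eigvals, eigvecs = np.linalg.eigh(kernel)
    order = np.argsort(eigvals)[::-1][:n_tapers]
    return eigvecs[:, order].T

def multitaper_spectrum(x, half_bandwidth, n_tapers):
    """Average the periodograms of the tapered copies of x."""
    tapers = slepian_tapers(x.size, half_bandwidth, n_tapers)
    eigenspectra = np.abs(np.fft.rfft(tapers * x, axis=1)) ** 2
    return np.fft.rfftfreq(x.size), eigenspectra.mean(axis=0)

# Hypothetical test signal: one sinusoid at 0.2 cycles per sample in noise.
rng = np.random.default_rng(0)
n, half_bandwidth = 256, 4 / 256
t = np.arange(n)
x = np.cos(2 * np.pi * 0.2 * t) + 0.5 * rng.normal(size=n)

freq, spectrum = multitaper_spectrum(x, half_bandwidth, n_tapers=7)
peak_freq = freq[np.argmax(spectrum)]
```

Averaging over tapers lowers the variance of the estimate at the cost of smoothing the spectrum over a band of width twice the half-bandwidth, so the peak is located only to within that resolution.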