Functional Regression
Functional data analysis (FDA) involves the analysis of data whose ideal
units of observation are functions defined on some continuous domain, and the
observed data consist of a sample of functions taken from some population,
sampled on a discrete grid. Ramsay and Silverman's 1997 textbook sparked the
development of this field, which has accelerated in the past 10 years to become
one of the fastest growing areas of statistics, fueled by the growing number of
applications yielding this type of data. One unique characteristic of FDA is
the need to combine information both across and within functions, which Ramsay
and Silverman called replication and regularization, respectively. This article
will focus on functional regression, the area of FDA that has received the most
attention in applications and methodological development. First will be an
introduction to basis functions, key building blocks for regularization in
functional regression methods, followed by an overview of functional regression
methods, split into three types: [1] functional predictor regression
(scalar-on-function), [2] functional response regression (function-on-scalar)
and [3] function-on-function regression. For each, the role of replication and
regularization will be discussed and the methodological development described
in a roughly chronological manner, at times deviating from the historical
timeline to group together similar methods. The primary focus is on modeling
and methodology, highlighting the modeling structures that have been developed
and the various regularization approaches employed. At the end is a brief
discussion describing potential areas of future development in this field.
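As a minimal illustration of the basis-function idea described above, the following sketch fits a scalar-on-function regression y_i = ∫ x_i(t) β(t) dt + ε_i by expanding the coefficient function β(t) in a small Fourier basis. The data, basis choice, and ridge penalty are illustrative assumptions, not taken from the article.

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 100)        # discrete sampling grid
dt = t[1] - t[0]
n = 200

# simulated functional predictors x_i(t) and a known coefficient function
X = rng.normal(size=(n, t.size)) + np.sin(2 * np.pi * t)
beta_true = np.cos(2 * np.pi * t)
y = X @ beta_true * dt + rng.normal(scale=0.1, size=n)

# regularization via a K-term basis expansion beta(t) = sum_k c_k B_k(t)
B = np.column_stack([np.ones_like(t),
                     np.sin(2 * np.pi * t), np.cos(2 * np.pi * t),
                     np.sin(4 * np.pi * t), np.cos(4 * np.pi * t)])

# reduced design: Z[i, k] = integral of x_i(t) B_k(t) dt (Riemann approximation)
Z = X @ B * dt

# lightly ridge-penalized least squares for the basis coefficients
lam = 1e-3
coef = np.linalg.solve(Z.T @ Z + lam * np.eye(B.shape[1]), Z.T @ y)
beta_hat = B @ coef                   # estimated coefficient function on the grid
```

Expanding β(t) in a low-dimensional basis turns the infinite-dimensional functional regression into an ordinary K-parameter linear model, which is the sense in which basis functions provide regularization.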
Exact Bayesian curve fitting and signal segmentation
We consider regression models where the underlying functional relationship between the response and the explanatory variable is modeled as independent linear regressions on disjoint segments. We present an algorithm for perfect simulation from the posterior distribution of such a model, even allowing for an unknown number of segments and an unknown model order for the linear regressions within each segment. The algorithm is simple, can scale well to large data sets, and avoids the problem of diagnosing convergence that is present with Markov chain Monte Carlo (MCMC) approaches to this problem. We demonstrate our algorithm on standard denoising problems, on a piecewise constant AR model, and on a speech segmentation problem.
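The model class in this abstract, a signal composed of independent linear regressions on disjoint segments, can be illustrated with a simple penalized dynamic program. This is not the authors' perfect-simulation algorithm; it is a deterministic optimal-partitioning sketch under a fixed per-segment penalty, with all names and data invented for illustration.

```python
import numpy as np

def segment_cost(x, y):
    """Sum of squared residuals of a least-squares line fit to (x, y)."""
    A = np.column_stack([x, np.ones_like(x)])
    resid = y - A @ np.linalg.lstsq(A, y, rcond=None)[0]
    return float(resid @ resid)

def optimal_segments(x, y, penalty):
    """Dynamic program: minimize total segment SSE plus a penalty per segment."""
    n = len(y)
    best = np.full(n + 1, np.inf)     # best[j] = cost of segmenting y[:j]
    best[0] = 0.0
    back = np.zeros(n + 1, dtype=int)
    for j in range(2, n + 1):         # segment end (exclusive)
        for i in range(0, j - 1):     # segment start; each segment needs >= 2 points
            c = best[i] + segment_cost(x[i:j], y[i:j]) + penalty
            if c < best[j]:
                best[j], back[j] = c, i
    starts, j = [], n                 # backtrack to recover segment starts
    while j > 0:
        starts.append(back[j])
        j = back[j]
    return sorted(starts)

# piecewise-linear toy signal with a break near index 50
x = np.linspace(0.0, 1.0, 100)
y = np.where(x < 0.5, 2 * x, 2.0 - 2 * x) \
    + np.random.default_rng(1).normal(scale=0.02, size=100)
starts = optimal_segments(x, y, penalty=0.05)
```

The penalty plays the role that the prior on the number of segments plays in the Bayesian formulation: without it, the dynamic program would split every pair of points into its own segment.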
The Overlooked Potential of Generalized Linear Models in Astronomy - I: Binomial Regression
Revealing hidden patterns in astronomical data is often the path to
fundamental scientific breakthroughs; meanwhile the complexity of scientific
inquiry increases as more subtle relationships are sought. Contemporary data
analysis problems often elude the capabilities of classical statistical
techniques, suggesting the use of cutting edge statistical methods. In this
light, astronomers have overlooked a whole family of statistical techniques for
exploratory data analysis and robust regression, the so-called Generalized
Linear Models (GLMs). In this paper -- the first in a series aimed at
illustrating the power of these methods in astronomical applications -- we
elucidate the potential of a particular class of GLMs for handling
binary/binomial data, the so-called logit and probit regression techniques,
from both a maximum likelihood and a Bayesian perspective. As a case in point,
we present the use of these GLMs to explore the conditions of star formation
activity and metal enrichment in primordial minihaloes from cosmological
hydro-simulations including detailed chemistry, gas physics, and stellar
feedback. We predict that for a dark mini-halo with metallicity , an increase of in the gas molecular fraction increases the probability of star formation occurrence by a
factor of 75%. Finally, we highlight the use of receiver operating
characteristic curves as a diagnostic for binary classifiers, and ultimately we
use these to demonstrate the competitive predictive performance of GLMs against
the popular technique of artificial neural networks.
Comment: 20 pages, 10 figures, 3 tables, accepted for publication in Astronomy and Computing
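The two main tools named in this abstract, maximum-likelihood binomial GLM fitting and ROC-based assessment, can be sketched in a few lines. The sketch below fits a logistic regression by iteratively reweighted least squares (Newton's method) and computes the AUC as the probability that a random positive case outranks a random negative one; the synthetic data and parameter values are illustrative assumptions, not the paper's simulations.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=n)])   # intercept + predictor
beta_true = np.array([-0.5, 2.0])
p_true = 1 / (1 + np.exp(-X @ beta_true))
y = (rng.uniform(size=n) < p_true).astype(float)

# IRLS / Newton: beta <- beta + (X' W X)^{-1} X' (y - p), W = diag(p(1-p))
beta = np.zeros(2)
for _ in range(25):
    p = 1 / (1 + np.exp(-X @ beta))
    W = p * (1 - p)
    beta = beta + np.linalg.solve(X.T @ (W[:, None] * X), X.T @ (y - p))

# AUC: probability that a random positive scores above a random negative
scores = X @ beta
pos, neg = scores[y == 1], scores[y == 0]
auc = np.mean(pos[:, None] > neg[None, :])
```

Swapping the logistic link for the standard normal CDF in the same scheme gives the probit fit the abstract also discusses.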
Automatic, computer aided geometric design of free-knot, regression splines
A new algorithm for Computer Aided Geometric Design of least squares (LS) splines with variable knots, named GeDS, is presented. It is based on interpreting functional spline regression as a parametric B-spline curve, and on using the shape preserving property of its control polygon. The GeDS algorithm includes two major stages. For the first stage, an automatic, adaptive knot location algorithm is developed. By adding knots, one at a time, it sequentially "breaks" a straight line segment into pieces in order to construct a linear LS B-spline fit which captures the "shape" of the data. A stopping rule is applied which avoids both over- and under-fitting and selects the number of knots for the second stage of GeDS, in which smoother, higher order (quadratic, cubic, etc.) fits are generated. The knots appropriate for the second stage are determined according to a new knot location method, called the averaging method. It approximately preserves the linear precision property of B-spline curves and allows the attachment of smooth higher order LS B-spline fits to a control polygon, so that the shape of the linear polygon of stage one is followed. The GeDS method simultaneously produces linear, quadratic, cubic (and possibly higher order) spline fits with one and the same number of B-spline regression functions. The GeDS algorithm is very fast, since no deterministic or stochastic knot insertion/deletion and relocation search strategies are involved in either stage. Extensive numerical examples are provided, illustrating the performance of GeDS and the quality of the resulting LS spline fits. The GeDS procedure is compared with other existing variable knot spline methods and smoothing techniques, such as the SARS, HAS, MDL and AGS methods, and is shown to produce models with fewer parameters but with similar goodness-of-fit characteristics and visual quality.
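A simplified stand-in for the stage-one idea described above can be written in a few lines: insert knots into a linear least-squares spline one at a time, refitting after each insertion, and stop when the fit no longer improves. This is not the published GeDS algorithm or its stopping rule; the hat-function basis, interval-midpoint candidates, and tolerance are all simplifying assumptions.

```python
import numpy as np

def hat_basis(x, knots):
    """Linear B-spline (hat function) basis evaluated at x."""
    eye = np.eye(len(knots))
    return np.column_stack([np.interp(x, knots, eye[k]) for k in range(len(knots))])

def fit_spline(x, y, knots):
    """Least-squares linear spline fit; returns coefficients and SSE."""
    B = hat_basis(x, np.asarray(knots, float))
    c, *_ = np.linalg.lstsq(B, y, rcond=None)
    r = y - B @ c
    return c, float(r @ r)

def greedy_knots(x, y, max_knots=10, tol=1e-3):
    """Greedily insert the interval-midpoint knot that most reduces the SSE."""
    knots = [float(x.min()), float(x.max())]
    c, sse = fit_spline(x, y, knots)
    while len(knots) < max_knots:
        cands = [(a + b) / 2 for a, b in zip(knots, knots[1:])]
        trials = [fit_spline(x, y, sorted(knots + [m])) for m in cands]
        i = int(np.argmin([s for _, s in trials]))
        if trials[i][1] > (1 - tol) * sse:   # negligible improvement: stop
            break
        knots = sorted(knots + [cands[i]])
        c, sse = trials[i]
    return np.array(knots), c

# piecewise-linear test signal with a kink at x = 0.3
x = np.linspace(0.0, 1.0, 200)
y = np.abs(x - 0.3) + np.random.default_rng(3).normal(scale=0.01, size=x.size)
knots, coef = greedy_knots(x, y)
fitted = hat_basis(x, knots) @ coef
```

On this toy signal the greedy bisection concentrates knots near the kink at x = 0.3, mirroring how an adaptive knot placer lets the data determine where flexibility is spent.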