    Functional Regression

    Functional data analysis (FDA) involves the analysis of data whose ideal units of observation are functions defined on some continuous domain, and the observed data consist of a sample of functions taken from some population, sampled on a discrete grid. Ramsay and Silverman's 1997 textbook sparked the development of this field, which has accelerated in the past 10 years to become one of the fastest growing areas of statistics, fueled by the growing number of applications yielding this type of data. One unique characteristic of FDA is the need to combine information both across and within functions, which Ramsay and Silverman called replication and regularization, respectively. This article focuses on functional regression, the area of FDA that has received the most attention in applications and methodological development. It begins with an introduction to basis functions, key building blocks for regularization in functional regression methods, followed by an overview of functional regression methods, split into three types: [1] functional predictor regression (scalar-on-function), [2] functional response regression (function-on-scalar) and [3] function-on-function regression. For each, the role of replication and regularization is discussed and the methodological development described in a roughly chronological manner, at times deviating from the historical timeline to group together similar methods. The primary focus is on modeling and methodology, highlighting the modeling structures that have been developed and the various regularization approaches employed. The article closes with a brief discussion of potential areas of future development in this field.
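
    As a rough illustration of type [1] (scalar-on-function regression), the sketch below expands the coefficient function in a small Fourier basis and fits the basis coefficients with a light ridge penalty. It is not taken from the article: the function names, the choice of basis, the Riemann-sum quadrature, and the simulated data are all assumptions made for the example.

```python
import numpy as np

def fourier_basis(t, n_basis):
    """Evaluate a Fourier basis on [0, 1]: a constant, then sin/cos pairs."""
    cols = [np.ones_like(t)]
    k = 1
    while len(cols) < n_basis:
        cols.append(np.sqrt(2) * np.sin(2 * np.pi * k * t))
        if len(cols) < n_basis:
            cols.append(np.sqrt(2) * np.cos(2 * np.pi * k * t))
        k += 1
    return np.column_stack(cols)

def fit_scalar_on_function(X, y, t, n_basis=5, ridge=1e-8):
    """Fit y_i = alpha + integral of X_i(t) * beta(t) dt + error.

    X is an (n, len(t)) array of curves observed on the equally spaced grid t.
    beta(t) is represented as B(t) @ c, so the model is linear in c.
    """
    B = fourier_basis(t, n_basis)
    dt = t[1] - t[0]
    Z = (X @ B) * dt                      # Riemann sum for each basis integral
    D = np.column_stack([np.ones(len(y)), Z])
    P = ridge * np.eye(D.shape[1])
    P[0, 0] = 0.0                         # leave the intercept unpenalized
    coef = np.linalg.solve(D.T @ D + P, D.T @ y)
    return coef[0], B @ coef[1:]          # intercept, beta evaluated on the grid

# Simulated, noise-free example (assumed data, for illustration only).
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 101)
B = fourier_basis(t, 5)
X = rng.normal(size=(50, 5)) @ B.T        # 50 random smooth curves
beta_true = B @ np.array([0.5, 1.0, -1.0, 0.3, 0.0])
y = 2.0 + (X @ beta_true) * (t[1] - t[0])

alpha_hat, beta_hat = fit_scalar_on_function(X, y, t)
print(abs(alpha_hat - 2.0), np.max(np.abs(beta_hat - beta_true)))
```

    Here replication enters through the 50 curves and regularization through the truncated basis and the ridge term; with noisy data the basis size and penalty would have to be chosen, for instance by cross-validation.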

    A Framework for Unbiased Model Selection Based on Boosting

    Variable selection and model choice are of major concern in many statistical applications, especially in high-dimensional regression models. Boosting is a convenient statistical method that combines model fitting with intrinsic model selection. We investigate the impact of base-learner specification on the performance of boosting as a model selection procedure. We show that variable selection may be biased if the covariates are of different nature. Important examples are models combining continuous and categorical covariates, especially if the number of categories is large. In this case, least squares base-learners offer increased flexibility for the categorical covariate and lead to a preference even if the categorical covariate is non-informative. Similar difficulties arise when comparing linear and nonlinear base-learners for a continuous covariate. The additional flexibility in the nonlinear base-learner again yields a preference for the more complex modeling alternative. We investigate these problems from a theoretical perspective and suggest a framework for unbiased model selection based on a general class of penalized least squares base-learners. Making all base-learners comparable in terms of their degrees of freedom strongly reduces the selection bias observed in naive boosting specifications. The importance of unbiased model selection is demonstrated in simulations and an application to forest health models.
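
    The idea of making base-learners comparable in their degrees of freedom can be sketched as follows. This is an illustration under stated assumptions, not the paper's procedure: degrees of freedom are taken to be the trace of the ridge hat matrix (the paper may define them differently), the penalty is a plain ridge penalty, and the helper names `hat_matrix_df` and `lambda_for_df` are hypothetical.

```python
import numpy as np

def hat_matrix_df(X, lam):
    """Trace of the ridge hat matrix S = X (X'X + lam I)^{-1} X'."""
    p = X.shape[1]
    XtX = X.T @ X
    # trace(S) equals trace((X'X + lam I)^{-1} X'X), which is cheaper to form.
    return np.trace(np.linalg.solve(XtX + lam * np.eye(p), XtX))

def lambda_for_df(X, target_df, hi=1e8, iters=200):
    """Find lam with hat_matrix_df(X, lam) == target_df by bisection.

    The trace decreases monotonically in lam, so bisection is enough.
    """
    if not 0 < target_df < X.shape[1]:
        raise ValueError("target_df must lie between 0 and the column count")
    if hat_matrix_df(X, hi) > target_df:
        raise ValueError("upper bound hi is too small to reach target_df")
    lo = 0.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if hat_matrix_df(X, mid) > target_df:
            lo = mid      # still too flexible: increase the penalty
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Assumed toy data: one continuous covariate and one 10-level factor.
rng = np.random.default_rng(1)
n = 200
x_cont = rng.normal(size=(n, 1))
levels = rng.integers(0, 10, size=n)
X_cat = np.eye(10)[levels][:, 1:]         # 9 dummy columns, level 0 as reference

lam_cat = lambda_for_df(X_cat, target_df=1.0)
print("unpenalized df:", hat_matrix_df(x_cont, 0.0), hat_matrix_df(X_cat, 0.0))
print("penalized categorical df:", hat_matrix_df(X_cat, lam_cat))
```

    Without the penalty the categorical base-learner has far more freedom to fit noise than the linear one; after tuning the penalty both spend the same degrees of freedom per boosting step under this definition.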

    Model-Based Clustering and Classification of Functional Data

    The problem of complex data analysis is a central topic of modern statistical science and learning systems and is becoming of broader interest with the increasing prevalence of high-dimensional data. The challenge is to develop statistical models and autonomous algorithms that are able to acquire knowledge from raw data for exploratory analysis, which can be achieved through clustering techniques, or to make predictions of future data via classification (i.e., discriminant analysis) techniques. Latent data models, including mixture model-based approaches, are among the most popular and successful approaches in both the unsupervised context (i.e., clustering) and the supervised one (i.e., classification or discrimination). Although traditionally tools of multivariate analysis, they are growing in popularity when considered in the framework of functional data analysis (FDA). FDA is the data analysis paradigm in which the individual data units are functions (e.g., curves, surfaces), rather than simple vectors. In many areas of application, the analyzed data are indeed often available in the form of discretized values of functions or curves (e.g., time series, waveforms) and surfaces (e.g., 2d-images, spatio-temporal data). This functional aspect of the data adds difficulties compared to the case of a classical multivariate (non-functional) data analysis. We review and present approaches for model-based clustering and classification of functional data. We derive well-established statistical models along with efficient algorithmic tools to address problems regarding the clustering and the classification of these high-dimensional data, including their heterogeneity, missing information, and dynamical hidden structure. The presented models and algorithms are illustrated on real-world functional data analysis problems from several application areas.

    Facial Expression Recognition

    Bayesian semiparametric inference for multivariate doubly-interval-censored data

    Based on a data set obtained in a dental longitudinal study, conducted in Flanders (Belgium), the joint time to caries distribution of permanent first molars was modeled as a function of covariates. This involves an analysis of multivariate continuous doubly-interval-censored data since: (i) the emergence time of a tooth and the time it experiences caries were recorded yearly, and (ii) events on teeth of the same child are dependent. To model the joint distribution of the emergence times and the times to caries, we propose a dependent Bayesian semiparametric model. A major feature of the proposed approach is that survival curves can be estimated without imposing assumptions such as proportional hazards, additive hazards, proportional odds or accelerated failure time. Comment: Published at http://dx.doi.org/10.1214/10-AOAS368 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org).

    Parametric Regression on the Grassmannian

    We address the problem of fitting parametric curves on the Grassmann manifold for the purpose of intrinsic parametric regression. As customary in the literature, we start from the energy minimization formulation of linear least-squares in Euclidean spaces and generalize this concept to general nonflat Riemannian manifolds, following an optimal-control point of view. We then specialize this idea to the Grassmann manifold and demonstrate that it yields a simple, extensible and easy-to-implement solution to the parametric regression problem. In fact, it allows us to extend the basic geodesic model to (1) a time-warped variant and (2) cubic splines. We demonstrate the utility of the proposed solution on different vision problems, such as shape regression as a function of age, traffic-speed estimation and crowd-counting from surveillance video clips. Most notably, these problems can be conveniently solved within the same framework without any specifically-tailored steps along the processing pipeline. Comment: 14 pages, 11 figures.