
    Clusterwise methods, past and present

    Instead of fitting a single, global model (regression, PCA, etc.) to a set of observations, clusterwise methods look simultaneously for a partition into k clusters and for k local models optimizing some criterion. There are two main approaches: (1) the least squares approach introduced by E. Diday in the 1970s, derived from k-means, and (2) mixture models using maximum likelihood; only the first easily enables prediction. After a survey of classical methods, we present recent extensions to functional, symbolic and multiblock data.
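
    A minimal sketch of the least-squares (k-means-derived) approach described above: alternate between fitting one linear model per cluster and reassigning each observation to the cluster whose model gives it the smallest squared residual. The function and variable names are illustrative, not taken from any of the cited papers.

```python
import numpy as np

def clusterwise_ols(X, y, k, n_iter=50, seed=0):
    """Least-squares clusterwise regression: alternate fit and reassign."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    labels = rng.integers(0, k, size=n)        # random initial partition
    Xd = np.column_stack([np.ones(n), X])      # add an intercept column
    betas = np.zeros((k, Xd.shape[1]))
    for _ in range(n_iter):
        # (1) one OLS fit per cluster
        for j in range(k):
            mask = labels == j
            if mask.sum() >= Xd.shape[1]:      # keep the local fit identifiable
                betas[j], *_ = np.linalg.lstsq(Xd[mask], y[mask], rcond=None)
        # (2) reassign each point to its best-fitting local model
        sq_resid = (y[:, None] - Xd @ betas.T) ** 2
        new_labels = sq_resid.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break                              # the criterion can no longer decrease
        labels = new_labels
    return labels, betas
```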

    A Clusterwise Regression Method for the Prediction of the Disposal Income in Municipalities

    The paper illustrates a clusterwise regression procedure applied to the prediction of per capita disposal income (PCDI) in Italian municipalities. The municipal prediction is derived from the provincial PCDI, taking into account the discrepancy between municipality and province in indicators such as per capita taxable income, per capita bank deposits, and employment rate. The relation between PCDI and the indicators is shaped by a regression model. A single regression model does not fit all territorial units very well, but different regression models fit well within groups of them. The aim of clusterwise regression is just that: detecting clusters where the corresponding regression models explain the data better than an overall regression model does. The application of the procedure to a real case shows that a significant reduction of the regression standard error can be achieved.
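
    The reduction in regression standard error claimed above can be checked numerically. The following sketch compares the overall standard error with the clusterwise one on simulated data with two latent groups, reusing the clusterwise_ols function from the sketch above; all names and numbers are made up for illustration.

```python
import numpy as np   # requires clusterwise_ols from the sketch above

rng = np.random.default_rng(1)
n, p, k = 300, 3, 2
X = rng.normal(size=(n, p))                         # standardized indicator values
slopes = np.where(rng.random(n) < 0.5, 2.0, -1.5)   # two latent groups
y = slopes * X[:, 0] + rng.normal(scale=0.3, size=n)

# overall (single-model) regression standard error
Xd = np.column_stack([np.ones(n), X])
beta_all, *_ = np.linalg.lstsq(Xd, y, rcond=None)
rse_all = np.sqrt(((y - Xd @ beta_all) ** 2).sum() / (n - p - 1))

# clusterwise regression standard error (pooled over clusters)
labels, betas = clusterwise_ols(X, y, k)
rss = sum(((y[labels == j] - Xd[labels == j] @ betas[j]) ** 2).sum() for j in range(k))
rse_cw = np.sqrt(rss / (n - k * (p + 1)))
print(rse_all, rse_cw)                              # the clusterwise error is far smaller
```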

    Nonparametric time series forecasting with dynamic updating

    We present a nonparametric method to forecast a seasonal univariate time series, and propose four dynamic updating methods to improve point forecast accuracy. Our methods treat a seasonal univariate time series as a functional time series. We propose first to reduce the dimensionality by applying functional principal component analysis to the historical observations, and then to use univariate time series forecasting and functional principal component regression techniques. When data in the most recent year are only partially observed, we improve point forecast accuracy using the dynamic updating methods. We also introduce a nonparametric approach to construct prediction intervals of updated forecasts, and compare its empirical coverage probability with that of an existing parametric method. Our approaches are data-driven and computationally fast, making them feasible for real-time, high-frequency dynamic updating. The methods are demonstrated using monthly sea surface temperatures from 1950 to 2008.
    Keywords: functional time series; functional principal component analysis; ordinary least squares; penalized least squares; ridge regression; sea surface temperatures; seasonal time series.
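
    A compact sketch of the forecasting recipe described above: treat each year as one curve, extract functional principal components via the SVD of the centered data matrix, forecast each score series one step ahead, and rebuild the next curve. The drift forecast of the scores stands in for whatever univariate forecaster is preferred; all names are illustrative, not the authors' code.

```python
import numpy as np

def fpca_forecast(curves, n_components=2):
    """curves: (n_years, n_points) array, one row per fully observed curve."""
    mu = curves.mean(axis=0)
    centered = curves - mu
    # on a regular grid, FPCA reduces to the SVD of the centered data matrix
    U, s, Vt = np.linalg.svd(centered, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]   # one time series per component
    # one-step-ahead drift forecast of each score series (placeholder model)
    next_scores = scores[-1] + (scores[-1] - scores[0]) / (len(scores) - 1)
    return mu + next_scores @ Vt[:n_components]       # forecast curve for next year
```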

    Nonparametric modeling and forecasting electricity demand: an empirical study

    This paper uses half-hourly electricity demand data from South Australia as an empirical study of nonparametric modeling and forecasting methods for horizons from half an hour ahead to one year ahead. A notable feature of the univariate time series of electricity demand is the presence of both intraweek and intraday seasonalities. An intraday seasonal cycle is apparent from the similarity of the demand from one day to the next, and an intraweek seasonal cycle is evident from comparing the demand on the corresponding day of adjacent weeks. There is a strong appeal in using forecasting methods that are able to capture both seasonalities. In this paper, the forecasting methods slice a seasonal univariate time series into a time series of curves. They reduce the dimensionality by applying functional principal component analysis to the observed data, and then utilize a univariate time series forecasting method and functional principal component regression techniques. When data points in the most recent curve are sequentially observed, updating methods can improve the point and interval forecast accuracy. We also revisit a nonparametric approach to construct prediction intervals of updated forecasts, and evaluate the interval forecast accuracy.
    Keywords: functional principal component analysis; functional time series; multivariate time series; ordinary least squares; penalized least squares; ridge regression; seasonal time series.
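
    The updating step mentioned above can be sketched as a ridge-type estimator: once the first m points of the current curve are observed, the component scores are re-estimated from those points with a penalty that shrinks them toward the baseline forecast scores, and the remainder of the curve is rebuilt. The penalty form and names here are our assumptions, not the papers' exact estimator.

```python
import numpy as np

def update_forecast(partial, mu, V, base_scores, lam=1.0):
    """partial: first m observed values of the current curve;
    mu: (n_points,) mean curve; V: (n_components, n_points) components;
    base_scores: score forecasts made before the curve began."""
    m = len(partial)
    Phi = V[:, :m].T                   # components restricted to the observed grid
    r = partial - mu[:m]               # centered partial observations
    # ridge system: (Phi'Phi + lam I) b = Phi'r + lam * base_scores
    A = Phi.T @ Phi + lam * np.eye(V.shape[0])
    b = np.linalg.solve(A, Phi.T @ r + lam * base_scores)
    return mu + b @ V                  # updated forecast of the full curve
```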

    Methodology and theory for partial least squares applied to functional data

    The partial least squares procedure was originally developed to estimate the slope parameter in multivariate parametric models. More recently it has gained popularity in the functional data literature. There, the partial least squares estimator of slope is used either to construct linear predictive models or as a tool to project the data onto a one-dimensional quantity that is employed for further statistical analysis. Although the partial least squares approach is often viewed as an attractive alternative to projections onto the principal component basis, its properties are less well known than those of the latter, mainly because of its iterative nature. We develop an explicit formulation of partial least squares for functional data, which leads to insightful results and motivates new theory, demonstrating consistency and establishing convergence rates.
    Comment: Published at http://dx.doi.org/10.1214/11-AOS958 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org).
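
    The iterative nature discussed above can be made concrete with a NIPALS-style sketch for curves discretized on a grid: each step extracts the direction maximizing covariance with the current response residual, then deflates. This only illustrates the iteration; it is not the paper's explicit formulation.

```python
import numpy as np

def functional_pls(X, y, n_components=3):
    """X: (n, n_grid) centered discretized curves; y: (n,) centered response."""
    Xr, yr = X.copy(), y.copy()
    W, T = [], []
    for _ in range(n_components):
        w = Xr.T @ yr                  # covariance direction on the grid
        w /= np.linalg.norm(w)
        t = Xr @ w                     # scalar scores along that direction
        W.append(w)
        T.append(t)
        # deflate the curves and the response before the next iteration
        Xr = Xr - np.outer(t, Xr.T @ t) / (t @ t)
        yr = yr - t * (t @ yr) / (t @ t)
    # the scores in T can now feed a linear predictive model for y
    return np.array(W), np.array(T)
```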

    Fast Convergence on Perfect Classification for Functional Data

    In this study, we investigate whether perfect classification of functional data can be approached with finite samples. The seminal work of Delaigle and Hall (2012) showed that perfect classification is easier to achieve for functional data than for finite-dimensional data. This result rests on their finding that a sufficient condition for the existence of a perfect classifier, termed the Delaigle--Hall (DH) condition, is available only for functional data. However, a large sample size may still be required to approach perfect classification even when the DH condition holds, because the convergence of misclassification errors for functional data can be very slow. Specifically, the minimax rate of convergence of the error for functional data is logarithmic in the sample size. This study resolves this complication by proving that the DH condition also yields fast convergence of the misclassification error in the sample size. To this end, we study a classifier based on empirical risk minimization in a reproducing kernel Hilbert space (RKHS) and analyse its convergence rate under the DH condition. The result shows that the misclassification error of the RKHS classifier converges at an exponential rate in the sample size. Technically, the proof rests on two points: (i) connecting the DH condition to a margin condition for classifiers, and (ii) handling the metric entropy of functional data. Experimentally, we validate that the DH condition and the associated margin condition have a measurable impact on the convergence rate of the RKHS classifier. We also find that some other classifiers for functional data share this property.
    Comment: 26 pages.
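
    In the spirit of the RKHS classifier analysed above, the following sketch performs regularized empirical risk minimization for discretized curves, using a Gaussian kernel on squared L2 distances and a squared loss on labels in {-1, +1}; the kernel choice and loss are our assumptions, not the paper's exact setting.

```python
import numpy as np

def gaussian_gram(A, B, gamma=1.0):
    """Gram matrix of a Gaussian kernel on pairwise squared L2 distances."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * d2)

def rkhs_classifier(X_train, y_train, lam=1e-2, gamma=1.0):
    """X_train: (n, n_grid) curves; y_train: labels in {-1, +1}."""
    n = len(y_train)
    K = gaussian_gram(X_train, X_train, gamma)
    alpha = np.linalg.solve(K + lam * n * np.eye(n), y_train)   # kernel ridge
    def predict(X_new):
        return np.sign(gaussian_gram(X_new, X_train, gamma) @ alpha)
    return predict
```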

    Subgroup analysis for the functional linear model

    Classical functional linear regression models the relationship between a scalar response and a functional covariate, where the coefficient function is assumed to be identical for all subjects. In this paper, the classical model is extended to allow heterogeneous coefficient functions across different subgroups of subjects. The greatest challenge is that the subgroup structure is usually unknown. To this end, we develop a penalization-based approach that applies the penalized fusion technique to simultaneously determine the number and structure of the subgroups and the coefficient function within each subgroup. An effective computational algorithm is derived. We also establish the oracle properties and estimation consistency. Extensive numerical simulations demonstrate its superiority over several competing methods. The analysis of an air quality dataset leads to interesting findings and improved predictions.
    Comment: 24 pages, 9 figures.
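
    The penalized fusion estimator itself needs a specialized solver, but the subgroup idea can be mimicked with a clearly simplified surrogate: reduce each covariate curve to FPCA scores, so the functional inner product becomes a finite-dimensional one, and then find subgroups with their own coefficients using the clusterwise_ols sketch from the first abstract above. This mimics the goal, not the paper's penalized-fusion method.

```python
import numpy as np   # requires clusterwise_ols from the first sketch above

def subgroup_surrogate(curves, y, k, n_components=3):
    """curves: (n, n_grid) covariate curves; y: (n,) scalar responses."""
    mu = curves.mean(axis=0)
    U, s, Vt = np.linalg.svd(curves - mu, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]   # subject-level FPCA scores
    labels, betas = clusterwise_ols(scores, y, k)     # subgroup-specific coefficients
    # approximate coefficient function of each subgroup on the grid
    coef_funcs = betas[:, 1:] @ Vt[:n_components]     # drop the intercept column
    return labels, coef_funcs
```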

    Clusterwise analysis for multiblock component methods

    Multiblock component methods are applied to data sets in which several blocks of variables are measured on the same set of observations, with the goal of analyzing the relationships between these blocks of variables. In this article, we focus on multiblock component methods that integrate the information found in several blocks of explanatory variables in order to describe and explain one set of dependent variables. In the following, multiblock PLS and multiblock redundancy analysis are chosen as particular cases of multiblock component methods in which one set of variables is explained by a set of predictor variables organized into blocks. Because these multiblock techniques assume that the observations come from a homogeneous population, they will provide suboptimal results when the observations actually come from different populations. A strategy to palliate this problem, presented in this article, is to use a technique such as clusterwise regression in order to identify homogeneous clusters of observations. This approach creates two new methods that provide clusters that have their own sets of regression coefficients. This combination of clustering and regression improves the overall quality of the prediction and facilitates interpretation. In addition, the minimization of a well-defined criterion, by means of a sequential algorithm, ensures that the algorithm converges monotonically. Finally, the proposed methods are distribution-free and can be used when the explanatory variables outnumber the observations within clusters. The proposed clusterwise multiblock methods are illustrated with a simulation study and a (simulated) example from marketing.
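
    A hedged sketch of the clusterwise multiblock idea: alternate between fitting one multiblock regression per cluster and reassigning observations to the cluster whose local model predicts them best. Block scaling plus a ridge step stands in here for multiblock PLS or multiblock redundancy analysis, and it also keeps the local fits well defined when the explanatory variables outnumber the observations within a cluster; all names are illustrative.

```python
import numpy as np

def clusterwise_multiblock(blocks, Y, k, lam=1.0, n_iter=30, seed=0):
    """blocks: list of (n, p_b) explanatory blocks; Y: (n, q) dependent variables."""
    # scale each block to unit total variance so that no block dominates
    X = np.hstack([B / np.sqrt((B ** 2).sum()) for B in blocks])
    n, p = X.shape
    rng = np.random.default_rng(seed)
    labels = rng.integers(0, k, size=n)
    coefs = np.zeros((k, p, Y.shape[1]))
    for _ in range(n_iter):
        for j in range(k):
            m = labels == j
            if m.sum() > 1:
                # ridge handles p > n within a cluster
                A = X[m].T @ X[m] + lam * np.eye(p)
                coefs[j] = np.linalg.solve(A, X[m].T @ Y[m])
        # reassign by total squared prediction error under each local model
        errs = np.stack([((Y - X @ coefs[j]) ** 2).sum(axis=1) for j in range(k)], axis=1)
        new_labels = errs.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels, coefs
```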