
121 research outputs found

Horvitz-Thompson estimators for functional data: asymptotic confidence bands and optimal allocation for stratified sampling

Author: Cardot Hervé
Josserand Etienne
Publication date: 29/09/2010

When dealing with very large datasets of functional data, survey sampling approaches are useful in order to obtain estimators of simple functional quantities, without being obliged to store all the data. We propose here a Horvitz-Thompson estimator of the mean trajectory. In the context of a superpopulation framework, we prove under mild regularity conditions that we obtain uniformly consistent estimators of the mean function and of its variance function. With additional assumptions on the sampling design we state a functional Central Limit Theorem and deduce asymptotic confidence bands. Stratified sampling is studied in detail, and we also obtain a functional version of the usual optimal allocation rule considering a mean variance criterion. These techniques are illustrated by means of a test population of N=18902 electricity meters for which we have individual electricity consumption measures every 30 minutes over one week. We show that stratification can both substantially improve the accuracy of the estimators and reduce the width of the global confidence bands compared to simple random sampling without replacement. Comment: Accepted for publication in Biometrika.
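As a rough illustration of the estimator named in this abstract, the sketch below computes a Horvitz-Thompson estimate of a mean curve on a common time grid, $\hat\mu(t) = N^{-1}\sum_{k \in s} Y_k(t)/\pi_k$. The function name, the toy curves and the two-stratum design are assumptions made for illustration only; they are not taken from the paper.

```python
def ht_mean_curve(sampled_curves, inclusion_probs, population_size):
    """Horvitz-Thompson estimate of the mean trajectory.

    sampled_curves: list of curves, each a list of values on a shared time grid.
    inclusion_probs: first-order inclusion probability pi_k of each sampled unit.
    population_size: N, the number of units in the finite population.
    """
    n_points = len(sampled_curves[0])
    weighted_total = [0.0] * n_points
    for curve, pi_k in zip(sampled_curves, inclusion_probs):
        for j, value in enumerate(curve):
            # Each sampled value is weighted by the inverse inclusion probability.
            weighted_total[j] += value / pi_k
    return [total / population_size for total in weighted_total]


# Hypothetical stratified design: simple random sampling without replacement
# within each stratum, so pi_k = n_h / N_h for every unit of stratum h.
strata_sizes = {"A": 4, "B": 8}   # N_h (made-up values)
sample_sizes = {"A": 2, "B": 2}   # n_h (made-up values)
sample = [
    ("A", [1.0, 2.0]),
    ("A", [3.0, 4.0]),
    ("B", [10.0, 20.0]),
    ("B", [14.0, 22.0]),
]
curves = [curve for _, curve in sample]
probs = [sample_sizes[h] / strata_sizes[h] for h, _ in sample]
mu_hat = ht_mean_curve(curves, probs, sum(strata_sizes.values()))
```

With a full census (every $\pi_k = 1$) the function reduces to the ordinary pointwise mean, which is a quick sanity check on the weighting.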

arXiv.org e-Print Archive

CiteSeerX

Fast Estimation of the Median Covariation Matrix with Application to Online Robust Principal Components Analysis

Author: Cardot Hervé
Godichon-Baggioni Antoine
Publication date: 09/07/2016

The geometric median covariation matrix is a robust multivariate indicator of dispersion which can be extended without any difficulty to functional data. We define estimators, based on recursive algorithms, that can be simply updated at each new observation and are able to deal rapidly with large samples of high dimensional data without being obliged to store all the data in memory. Asymptotic convergence properties of the recursive algorithms are studied under weak conditions. The computation of the principal components can also be performed online and this approach can be useful for online outlier detection. A simulation study clearly shows that this robust indicator is a competitive alternative to the minimum covariance determinant when the dimension of the data is small, and to robust principal components analysis based on projection pursuit and spherical projections for high dimensional data. An illustration on a large sample and high dimensional dataset consisting of individual TV audiences measured at a minute scale over a period of 24 hours confirms the interest of considering the robust principal components analysis based on the median covariation matrix. All studied algorithms are available in the R package Gmedian on CRAN.

arXiv.org e-Print Archive

HAL-uB

HAL - Université de Franche-Comté

Crossref

Confidence bands for Horvitz-Thompson estimators using sampled noisy functional data

Author: Cardot Hervé
Degras David
Josserand Etienne
Publication venue: 'Bernoulli Society for Mathematical Statistics and Probability'
Publication date: 01/01/2013

When collections of functional data are too large to be exhaustively observed, survey sampling techniques provide an effective way to estimate global quantities such as the population mean function. Assuming functional data are collected from a finite population according to a probabilistic sampling scheme, with the measurements being discrete in time and noisy, we propose to first smooth the sampled trajectories with local polynomials and then estimate the mean function with a Horvitz-Thompson estimator. Under mild conditions on the population size, observation times, regularity of the trajectories, sampling scheme, and smoothing bandwidth, we prove a Central Limit Theorem in the space of continuous functions. We also establish the uniform consistency of a covariance function estimator and apply the former results to build confidence bands for the mean function. The bands attain nominal coverage and are obtained through Gaussian process simulations conditional on the estimated covariance function. To select the bandwidth, we propose a cross-validation method that accounts for the sampling weights. A simulation study assesses the performance of our approach and highlights the influence of the sampling scheme and bandwidth choice. Comment: Published at http://dx.doi.org/10.3150/12-BEJ443 in Bernoulli (http://isi.cbs.nl/bernoulli/) by the International Statistical Institute/Bernoulli Society (http://isi.cbs.nl/BS/bshome.htm)

arXiv.org e-Print Archive

CiteSeerX

Crossref

Online estimation of the geometric median in Hilbert spaces : non asymptotic confidence balls

Author: Cardot Hervé
Cénac Peggy
Godichon Antoine
Publication date: 27/01/2015

Estimation procedures based on recursive algorithms are interesting and powerful techniques that are able to deal rapidly with (very) large samples of high dimensional data. The collected data may be contaminated by noise so that robust location indicators, such as the geometric median, may be preferred to the mean. In this context, an estimator of the geometric median based on a fast and efficient averaged non linear stochastic gradient algorithm has been developed by Cardot, Cénac and Zitt (2013). This work aims at studying more precisely the non asymptotic behavior of this algorithm by giving non asymptotic confidence balls. This new result is based on the derivation of improved $L^2$ rates of convergence as well as an exponential inequality for the martingale terms of the recursive non linear Robbins-Monro algorithm.
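To make the kind of algorithm discussed in this abstract concrete, here is a minimal sketch of an averaged stochastic gradient recursion for the geometric median: each new observation moves the iterate a step of length $\gamma_n$ along the unit vector pointing toward it, and the iterates are averaged online. The step-size form $\gamma_n = c_\gamma n^{-\alpha}$, its default values, the function name and the simulated data are all assumptions chosen for illustration, not details taken from the listed paper.

```python
import math
import random


def averaged_sgd_median(points, c_gamma=1.0, alpha=0.75):
    """Averaged stochastic gradient estimate of the geometric median.

    points: iterable of equal-length lists, processed one at a time.
    Step sizes are gamma_n = c_gamma / n**alpha; c_gamma and alpha are
    tuning choices assumed here, with alpha in (1/2, 1).
    """
    iterator = iter(points)
    z = list(next(iterator))   # current iterate, started at the first point
    z_bar = list(z)            # running average of the iterates
    for n, x in enumerate(iterator, start=1):
        diff = [xi - zi for xi, zi in zip(x, z)]
        norm = math.sqrt(sum(d * d for d in diff))
        if norm > 0.0:         # the gradient is undefined when x == z; skip the move
            gamma = c_gamma / n ** alpha
            z = [zi + gamma * d / norm for zi, d in zip(z, diff)]
        # Online average of the iterates seen so far (n + 1 of them).
        z_bar = [bi + (zi - bi) / (n + 1) for bi, zi in zip(z_bar, z)]
    return z_bar


# Simulated data (hypothetical): Gaussian cloud centred at (1, -2),
# whose geometric median is the centre by symmetry.
rng = random.Random(0)
data = [[1.0 + rng.gauss(0.0, 1.0), -2.0 + rng.gauss(0.0, 1.0)] for _ in range(5000)]
estimate = averaged_sgd_median(data)
```

Only the current iterate and its running average are kept in memory, which is what makes this style of estimator suitable for data that arrive sequentially.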

arXiv.org e-Print Archive

HAL-uB

HAL - Université de Franche-Comté

Crossref

A fast and recursive algorithm for clustering large datasets with $k$-medians

Author: Cardot Hervé
Cénac Peggy
Monnez Jean-Marie
Publication date: 18/10/2011

Clustering large samples of high dimensional data with fast algorithms is an important challenge in computational statistics. Borrowing ideas from MacQueen (1967), who introduced a sequential version of the $k$-means algorithm, a new class of recursive stochastic gradient algorithms designed for the $k$-medians loss criterion is proposed. By their recursive nature, these algorithms are very fast and are well adapted to deal with large samples of data that are allowed to arrive sequentially. It is proved that the stochastic gradient algorithm converges almost surely to the set of stationary points of the underlying loss criterion. Particular attention is paid to the averaged versions, which are known to perform better, and a data-driven procedure that allows automatic selection of the value of the descent step is proposed. The performance of the averaged sequential estimator is compared in a simulation study, both in terms of computation speed and accuracy of the estimations, with more classical partitioning techniques such as $k$-means, trimmed $k$-means and PAM (partitioning around medoids). Finally, this new online clustering technique is illustrated on determining television audience profiles with a sample of more than 5000 individual television audiences measured every minute over a period of 24 hours. Comment: Under revision for Computational Statistics and Data Analysis.
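The recursive idea described in this abstract can be sketched as a single pass over the data: each incoming point is assigned to its nearest center, and that center alone is moved a small step along the unit vector toward the point (the stochastic gradient direction for a $k$-medians type loss). The step-size rule, the initialisation from the first two points, the function name and the simulated data below are assumptions for illustration, and the averaging step emphasised in the abstract is omitted for brevity.

```python
import math
import random


def online_k_medians(points, initial_centers, c_gamma=1.0, alpha=0.75):
    """One pass of a stochastic gradient update for a k-medians criterion."""
    centers = [list(c) for c in initial_centers]
    counts = [0] * len(centers)   # number of points assigned to each center so far
    for x in points:
        # Assign the new point to its nearest center (Euclidean distance).
        dists = [math.dist(x, c) for c in centers]
        r = dists.index(min(dists))
        counts[r] += 1
        if dists[r] > 0.0:
            # Move only the winning center, by a step of length gamma,
            # along the unit vector pointing toward the new point.
            gamma = c_gamma / counts[r] ** alpha
            centers[r] = [ci + gamma * (xi - ci) / dists[r]
                          for ci, xi in zip(centers[r], x)]
    return centers


# Simulated data (hypothetical): two well-separated Gaussian clusters,
# interleaved so that the first two points come from different clusters.
rng = random.Random(1)
true_centers = [(0.0, 0.0), (10.0, 10.0)]
data = [[m + rng.gauss(0.0, 1.0) for m in true_centers[i % 2]] for i in range(2000)]
centers = online_k_medians(data[2:], data[:2])
```

As with any procedure that converges to stationary points, the result depends on the initial centers; the interleaved toy data above is arranged so that each cluster receives one starting center.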

arXiv.org e-Print Archive

HAL-uB

HAL - Université de Franche-Comté

Crossref

INRIA a CCSD electronic archive server

Uniform convergence and asymptotic confidence bands for model-assisted estimators of the mean of sampled functional data

Author: Cardot Hervé
Goga Camelia
Lardin Pauline
Publication venue: 'Institute of Mathematical Statistics'
Publication date: 01/01/2013

Revised version for the Electronic Journal of Statistics. When the study variable is functional and storage capacities are limited or transmission costs are high, selecting with survey sampling techniques a small fraction of the observations is an interesting alternative to signal compression techniques, particularly when the goal is the estimation of simple quantities such as means or totals. We extend, in this functional framework, model-assisted estimators with linear regression models that can take account of auxiliary variables whose totals over the population are known. We first show, under weak hypotheses on the sampling design and the regularity of the trajectories, that the estimator of the mean function as well as its variance estimator are uniformly consistent. Then, under additional assumptions, we prove a functional central limit theorem and we assess rigorously a fast technique based on simulations of Gaussian processes which is employed to build asymptotic confidence bands. The accuracy of the variance function estimator is evaluated on a real dataset of sampled electricity consumption curves measured every half an hour over a period of one week.

arXiv.org e-Print Archive

HAL-uB

HAL - Université de Franche-Comté

Crossref

Estimation spline de quantiles conditionnels pour variables explicatives fonctionnelles

Author: Cardot Hervé
Crambes Christophe
Sarda Pascal
Publication venue: 'Elsevier BV'
Publication date: 01/01/2004

This Note deals with a linear quantile regression model in which the explanatory variable takes values in a functional space while the response variable is real-valued. We propose a spline estimator of the functional coefficient based on the minimization of a penalized L1-type criterion (the penalization is essential to ensure the existence and convergence of the estimator), and we then study the asymptotic behavior of this estimator.

HAL-uB

HAL - Université de Franche-Comté

Scientific Publications of the University of Toulouse II Le Mirail

Comptes Rendus Mathématique

Numérisation de Documents Anciens Mathématiques

HAL-INSA Toulouse