8 research outputs found

    Semiparametric curve alignment and shift density estimation for biological data

    Assume that we observe a large number of curves, all with an identical but unknown shape, each subject to a different random shift. The objective is to estimate the individual time shifts and their distribution. This objective arises in several biological applications, such as neuroscience and ECG signal processing, where one wants to estimate the distribution of the elapsed time between repetitive pulses, possibly at a low signal-to-noise ratio and without knowledge of the pulse shape. We suggest an M-estimator leading to a three-stage algorithm: we split the data set into blocks, on which the shifts are estimated by minimizing a cost criterion based on a functional of the periodogram; the estimated shifts are then plugged into a standard density estimator. We show that, under mild regularity assumptions, the density estimate converges weakly to the true shift distribution. The theory is applied both to simulations and to the alignment of real ECG signals. The estimator of the shift distribution performs well, even at a low signal-to-noise ratio, and is shown to outperform the standard methods for curve alignment. Comment: 30 pages; v5: minor changes and correction in the proof of Proposition 3.

    A Robbins-Monro procedure for estimation in semiparametric regression models

    This paper is devoted to the parametric estimation of a shift together with the nonparametric estimation of a regression function in a semiparametric regression model. We implement a very efficient and easy-to-handle Robbins-Monro procedure. On the one hand, we propose a stochastic algorithm similar to that of Robbins-Monro to estimate the shift parameter; no preliminary evaluation of the regression function is needed for this step. On the other hand, we use a recursive Nadaraya-Watson estimator for the regression function. This kernel estimator takes into account the previous estimate of the shift parameter. We establish the almost sure convergence of both the Robbins-Monro and Nadaraya-Watson estimators. The asymptotic normality of our estimates is also provided. Finally, we illustrate our semiparametric estimation procedure on simulated and real data. Comment: Published in the Annals of Statistics (http://www.imstat.org/aos/) at http://dx.doi.org/10.1214/12-AOS969 by the Institute of Mathematical Statistics (http://www.imstat.org
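
    As a rough illustration of the kind of recursion the abstract refers to, the sketch below runs a generic Robbins-Monro stochastic approximation to find the root of a function observed with noise. It is not the paper's shift estimator: the target f(theta) = theta - 2, the Gaussian noise, the step sizes 1/n, and the names robbins_monro and noisy_f are all assumptions chosen for the example.

```python
import random


def robbins_monro(noisy_f, theta0, n_steps):
    """Approximate a root of f using only noisy evaluations noisy_f(theta)."""
    theta = theta0
    for n in range(1, n_steps + 1):
        gamma = 1.0 / n  # step sizes whose sum diverges but whose squares are summable
        theta = theta - gamma * noisy_f(theta)
    return theta


rng = random.Random(0)  # fixed seed so the run is reproducible


def noisy_f(theta):
    # Hypothetical target: f(theta) = theta - 2, observed with standard Gaussian noise.
    return (theta - 2.0) + rng.gauss(0.0, 1.0)


estimate = robbins_monro(noisy_f, theta0=0.0, n_steps=10000)
print(estimate)  # should be close to the root 2.0
```

    With step sizes 1/n and this linear target, the recursion reduces to a running average of noisy observations of the root, which is why it settles near 2.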

    Fréchet means of curves for signal averaging and application to ECG data analysis

    Signal averaging is the process of computing a mean shape from a set of noisy signals. When the data exhibit geometric variability in time, the usual Euclidean mean of the raw data yields a mean pattern that does not reflect the typical shape of the observed signals. In this setting, it is necessary to use alignment techniques to synchronize the signals precisely, and then to average the aligned data to obtain a consistent mean shape. In this paper, we study the numerical performance of Fréchet means of curves, which extend the usual Euclidean mean to spaces endowed with non-Euclidean metrics. This yields a new algorithm for signal averaging and for estimating the time variability of a set of signals. We apply this approach to the analysis of heartbeats from ECG records.
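
    The align-then-average idea described above can be sketched on a toy example. The code below is not the Fréchet-mean algorithm of the paper: it swaps in a plain circular cross-correlation against the first signal as reference, uses noise-free shifted copies of a hypothetical Gaussian pulse, and all function names are illustrative.

```python
import math


def circular_shift(signal, k):
    """Return the signal cyclically shifted to the right by k samples."""
    n = len(signal)
    return [signal[(i - k) % n] for i in range(n)]


def best_shift(reference, signal):
    """Integer shift that maximizes the circular correlation with the reference."""
    def corr(k):
        return sum(r * s for r, s in zip(reference, circular_shift(signal, k)))
    return max(range(len(signal)), key=corr)


def align_and_average(signals):
    """Align every signal to the first one, then average pointwise."""
    reference = signals[0]
    aligned = [circular_shift(s, best_shift(reference, s)) for s in signals]
    return [sum(s[i] for s in aligned) / len(aligned) for i in range(len(reference))]


# Toy data: one Gaussian pulse observed with three different time shifts.
n = 64
template = [math.exp(-0.5 * ((i - 20) / 3.0) ** 2) for i in range(n)]
signals = [circular_shift(template, k) for k in (0, 5, 11)]

naive_mean = [sum(s[i] for s in signals) / len(signals) for i in range(n)]
aligned_mean = align_and_average(signals)
print(max(naive_mean), max(aligned_mean))  # the naive mean has a flattened peak
```

    On this toy data the pointwise mean of the raw signals smears the pulse over three locations, while the mean of the aligned signals recovers the pulse shape.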

    On the consistency of Fréchet means in deformable models for curve and image analysis

    A new class of statistical deformable models is introduced to study high-dimensional curves or images. In addition to the standard measurement error term, these deformable models include an extra error term modeling the individual variations in intensity around a mean pattern. It is shown that an appropriate tool for statistical inference in such models is the notion of sample Fréchet means, which leads to estimators of the deformation parameters and the mean pattern. The main contribution of this paper is to study how the behavior of these estimators depends on the number n of design points and the number J of observed curves (or images). Numerical experiments illustrate the finite-sample performance of the procedure.

    Template estimation for samples of curves and functional calibration estimation via the method of maximum entropy on the mean

    One of the main difficulties in functional data analysis is extracting a meaningful common pattern that summarizes the information conveyed by all functions in the sample. Chapter 2 considers the problem of finding a template function that represents this pattern, assuming that the functional data lie on, or sufficiently close to, an intrinsically low-dimensional nonlinear manifold with an unknown underlying geometric structure, embedded in a high-dimensional space. Under this setting, an approximation of the geodesic distance is developed based on a robust modification of the Isomap algorithm. This approximation is used to compute the corresponding empirical Fréchet median function, which provides a robust intrinsic estimator of the template. Chapter 3 investigates the asymptotic properties of the quantile normalization method of Bolstad et al. (2003), which has become one of the most popular methods for aligning density curves in microarray data analysis in bioinformatics. The properties are proved by treating the method as a particular case of the structural mean curve alignment procedure of Dupuy, Loubes and Maza (2011). However, the method fails in some cases involving mixtures, and a new methodology to cope with this issue is proposed, using the algorithm developed in Chapter 2. Finally, Chapter 4 studies calibration estimation for the finite population mean of a survey variable in a functional data framework. The functional calibration sampling weights of the estimator are obtained by matching the calibration estimation problem with the maximum entropy on the mean (MEM) principle. In particular, calibration estimation is viewed as an infinite-dimensional linear inverse problem following the structure of the MEM approach. A precise theoretical setting is given, and the functional calibration weights are estimated for two types of prior random measures: the centered Gaussian measure and the compound Poisson measure.
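
    For readers unfamiliar with the quantile normalization step mentioned above, here is a minimal sketch of its usual textbook form: each sample is sorted, the values at each rank are averaged across samples, and every observation is replaced by the average for its rank. The function name is hypothetical, the samples are assumed to have equal length, and ties are not treated specially; this is not the thesis's modified procedure.

```python
def quantile_normalize(samples):
    """Quantile-normalize equal-length samples (ties are not treated specially)."""
    n = len(samples[0])
    sorted_samples = [sorted(s) for s in samples]
    # Mean across samples of the r-th smallest value, for each rank r.
    rank_means = [sum(s[r] for s in sorted_samples) / len(samples) for r in range(n)]
    normalized = []
    for s in samples:
        order = sorted(range(n), key=lambda i: s[i])  # indices from smallest to largest
        out = [0.0] * n
        for rank, i in enumerate(order):
            out[i] = rank_means[rank]
        normalized.append(out)
    return normalized


result = quantile_normalize([[5, 2, 3], [4, 1, 6]])
print(result)  # [[5.5, 1.5, 3.5], [3.5, 1.5, 5.5]]
```

    After normalization every sample contains the same set of values, so all empirical distributions coincide; only the ordering within each sample is preserved.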

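    Etude des propriétés statistiques des moyennes de Fréchet dans des modèles de déformations pour l'analyse de courbes et d'images en grande dimension [Study of the statistical properties of Fréchet means in deformation models for the analysis of high-dimensional curves and images]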

    This thesis is concerned with the statistical analysis of data observed with additional nuisance deformations. To this end, we first introduce a new class of semiparametric deformable models. These models can be used to study the variability of time-dependent curves or high-dimensional images. We suppose that the curves or images at hand are generated by a noisy mean pattern on which a deformation operator acts. We then study the estimation of the parameters of interest of such models in the general case, and in the particular case of planar curves observed with rotation, translation and scaling. In a second part, we consider the non-Euclidean structures induced by the actions of deformation groups. One of the challenges of statistics in such spaces is to generalize the notion of Euclidean mean. More precisely, we study the conditions that guarantee the existence of the Fréchet mean in the particular case of the unit circle endowed with the arc-length distance.
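
    To make the notion of a Fréchet mean on the unit circle concrete, the sketch below minimizes the sum of squared arc-length distances over a grid of candidate angles. The grid search, the grid size, and the function names are assumptions made for illustration; the thesis studies existence conditions theoretically and does not prescribe this procedure.

```python
import math


def arc_distance(a, b):
    """Arc-length distance between two angles (in radians) on the unit circle."""
    d = abs(a - b) % (2.0 * math.pi)
    return min(d, 2.0 * math.pi - d)


def frechet_mean_circle(angles, grid_size=3600):
    """Grid-search minimizer of the sum of squared arc-length distances."""
    def cost(m):
        return sum(arc_distance(m, a) ** 2 for a in angles)
    grid = [-math.pi + 2.0 * math.pi * k / grid_size for k in range(grid_size)]
    return min(grid, key=cost)  # returns the first minimizer if there are several


angles = [3.0, -3.0]  # two points close to each other across the cut at +/- pi
print(sum(angles) / len(angles))    # arithmetic mean of the angles: 0.0
print(frechet_mean_circle(angles))  # close to -pi, the same point as +pi
```

    The arithmetic mean of the angles lands on the opposite side of the circle, whereas the minimizer sits between the two points. The minimizer need not be unique: for two antipodal points, for example, both midpoints attain the minimum, and the grid search simply returns the first one it finds.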