142 research outputs found

    Online estimation of the geometric median in Hilbert spaces : non asymptotic confidence balls

    Estimation procedures based on recursive algorithms are powerful techniques that can rapidly process (very) large samples of high-dimensional data. The collected data may be contaminated by noise, so robust location indicators such as the geometric median may be preferred to the mean. In this context, an estimator of the geometric median based on a fast and efficient averaged nonlinear stochastic gradient algorithm was developed by Cardot, Cénac and Zitt (2013). This work studies the non-asymptotic behavior of this algorithm more precisely by giving non-asymptotic confidence balls. This new result is based on the derivation of improved $L^2$ rates of convergence as well as an exponential inequality for the martingale terms of the recursive nonlinear Robbins-Monro algorithm.
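    The averaged stochastic gradient recursion mentioned in this abstract can be sketched as follows. This is only an illustration under stated assumptions: the step sequence `c_gamma * n ** (-alpha)`, the zero starting point, and the function name `averaged_sgd_median` are choices made here, not details given in the abstract.

```python
import math


def averaged_sgd_median(samples, dim, c_gamma=1.0, alpha=0.66):
    """Return an averaged Robbins-Monro estimate of the geometric median.

    Assumed (not from the abstract): steps gamma_n = c_gamma * n**(-alpha)
    with alpha in (1/2, 1), and a start at the origin.
    """
    z = [0.0] * dim      # current stochastic gradient iterate
    z_bar = [0.0] * dim  # running average of the iterates
    for n, x in enumerate(samples, start=1):
        diff = [xi - zi for xi, zi in zip(x, z)]
        norm = math.sqrt(sum(d * d for d in diff))
        if norm > 0.0:
            # Move a distance gamma_n along the unit vector towards x.
            gamma = c_gamma * n ** (-alpha)
            z = [zi + gamma * d / norm for zi, d in zip(z, diff)]
        # Online update of the mean of z_1, ..., z_n.
        z_bar = [bi + (zi - bi) / n for bi, zi in zip(z_bar, z)]
    return z_bar
```

    Each observation is used once and then discarded, which is what makes this kind of recursion suitable for data arriving sequentially.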

    Digital search trees and chaos game representation

    In this paper, we consider a possible representation of a DNA sequence in a quaternary tree, in which one can visualize repetitions of subwords. The CGR-tree turns a sequence of letters into a digital search tree (DST), obtained from the suffixes of the reversed sequence. Several results are known concerning the height and the insertion depth for DSTs built from successive i.i.d. sequences. Here, the successively inserted words are strongly dependent. We give the asymptotic behaviour of the insertion depth and of the length of branches for the CGR-tree obtained from the suffixes of a reversed i.i.d. or Markovian sequence. At first order, this behaviour turns out to be the same as in the case of independent words. As a by-product, asymptotic results on the length of the longest runs in a Markovian sequence are obtained.
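    A minimal sketch of the construction described above, under assumptions made here: the k-th inserted word is taken to be the first k letters of the sequence read backwards, the root is kept empty, and each word is stored at the first free node along its path. The function name `cgr_tree_insertion_depths` is hypothetical, and the paper's exact conventions may differ.

```python
def cgr_tree_insertion_depths(sequence):
    """Insert the reversed prefixes of `sequence` into a digital search tree.

    Returns the depth at which each word is stored. Nodes are identified by
    the string of letters read from the root (assumed empty) down to them.
    """
    occupied = set()  # labels of nodes that already hold a word
    depths = []
    for k in range(1, len(sequence) + 1):
        word = sequence[:k][::-1]  # k-th suffix of the reversed sequence
        # Follow the letters of the word until a free node is found.
        # A word of length k has k candidate nodes and only k - 1 words
        # were inserted before it, so a free node always exists.
        for depth in range(1, len(word) + 1):
            node = word[:depth]
            if node not in occupied:
                occupied.add(node)
                depths.append(depth)
                break
    return depths
```

    For a run such as "AAA" every new word shares its whole path with the previous ones, so the depths grow by one at each insertion, which gives some intuition for the link between branch lengths and long runs.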

    A fast and recursive algorithm for clustering large datasets with $k$-medians

    Clustering large samples of high-dimensional data with fast algorithms is an important challenge in computational statistics. Borrowing ideas from MacQueen (1967), who introduced a sequential version of the $k$-means algorithm, a new class of recursive stochastic gradient algorithms designed for the $k$-medians loss criterion is proposed. By their recursive nature, these algorithms are very fast and are well adapted to large samples of data that are allowed to arrive sequentially. It is proved that the stochastic gradient algorithm converges almost surely to the set of stationary points of the underlying loss criterion. Particular attention is paid to the averaged versions, which are known to perform better, and a data-driven procedure that allows automatic selection of the value of the descent step is proposed. The performance of the averaged sequential estimator is compared in a simulation study, in terms of both computation speed and accuracy of the estimations, with more classical partitioning techniques such as $k$-means, trimmed $k$-means and PAM (partitioning around medoids). Finally, this new online clustering technique is illustrated by determining television audience profiles from a sample of more than 5000 individual television audiences measured every minute over a period of 24 hours. Comment: Under revision for Computational Statistics and Data Analysis.
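    The recursive idea in this abstract can be illustrated with a short sketch: each new observation moves only its nearest centre, by a normalised gradient step, and a running average of each centre is kept. The per-cluster step `c_gamma * count ** (-alpha)`, the explicit initial centres, and the name `online_k_medians` are assumptions made for illustration. The data-driven step selection mentioned in the abstract is not reproduced here.

```python
import math


def _distance(p, q):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def online_k_medians(samples, init_centers, c_gamma=1.0, alpha=0.66):
    """Return averaged centres from a sequential k-medians style recursion.

    Assumed (not from the abstract): the step used for a centre depends on
    how many points that centre has received so far.
    """
    centers = [list(c) for c in init_centers]
    averaged = [list(c) for c in init_centers]
    counts = [0] * len(centers)
    for x in samples:
        dists = [_distance(x, c) for c in centers]
        j = min(range(len(centers)), key=dists.__getitem__)  # nearest centre
        counts[j] += 1
        if dists[j] > 0.0:
            # Normalised step towards x, as for a geometric median.
            gamma = c_gamma * counts[j] ** (-alpha)
            centers[j] = [ci + gamma * (xi - ci) / dists[j]
                          for xi, ci in zip(x, centers[j])]
        # Running average of the iterates of centre j.
        averaged[j] = [ai + (ci - ai) / counts[j]
                       for ai, ci in zip(averaged[j], centers[j])]
    return averaged
```

    Only the k centres, their averages and the counts are stored, so memory use does not grow with the number of observations.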

    Variable length Markov chains and dynamical sources

    Infinite random sequences of letters can be viewed as stochastic chains or as strings produced by a source, in the sense of information theory. The relationship between Variable Length Markov Chains (VLMC) and probabilistic dynamical sources is studied. We establish a probabilistic framework for context trees and VLMC, and we prove that any VLMC is a dynamical source for which we explicitly build the mapping. For two examples, the "comb" and the "bamboo blossom", we find a necessary and sufficient condition for the existence and uniqueness of a stationary probability measure for the VLMC. These two examples are detailed in order to provide the associated Dirichlet series as well as the generating functions of word occurrences. Comment: 45 pages, 15 figures.

    Attractive regular stochastic chains: perfect simulation and phase transition

    We prove that uniqueness of the stationary chain, or equivalently of the $g$-measure, compatible with an attractive regular probability kernel is equivalent to either one of the following two assertions for this chain: (1) it is a finitary coding of an i.i.d. process with countable alphabet; (2) concentration of measure holds at an exponential rate. We show in particular that if a stationary chain is uniquely defined by a kernel that is continuous and attractive, then this chain can be sampled using a coupling-from-the-past algorithm. For the original Bramson-Kalikow model, we further prove that there exists a unique compatible chain if and only if the chain is a finitary coding of a finite-alphabet i.i.d. process. Finally, we obtain some partial results on conditions for phase transition for general chains of infinite order. Comment: 22 pages, 1 pseudo-algorithm, 1 figure. Minor changes in the presentation. Lemma 6 has been removed.

    Control of friction forces with stationary wave piezoelectric actuator

    In research on piezoelectric motors, the control of friction forces by ultrasonic waves has been studied mainly from an experimental point of view [1]. The principle of friction force reduction by imposed mechanical vibrations in unlubricated contacts was recently studied in order to reduce the friction losses in an internal combustion engine with Langevin-type actuators [2]. This article deals with the advantages of flexural stationary wave piezoelectric actuators in the control of friction forces, thanks to the high pressures they generate at high frequencies (>10 kHz). The use of a specific contact geometry, defined by Hertz theory associated with partial slip contact conditions, allows the lubrication effect to be optimized. For a piezoelectric torque limiter application, the design and the numerical simulation of a dedicated piezoelectric actuator are compared. In agreement with the contact modeling, the characterization of the complete actuator on a mechanical test bench validates the "torque limiter" function and the optimization of the lubrication principle with a dedicated contact geometry.

    Digital search trees and chaos game representation

    Preliminary version (2006) of a work published in final form (2009). In this paper, we consider a possible representation of a DNA sequence in a quaternary tree, in which one can visualize repetitions of subwords. The CGR-tree turns a sequence of letters into a digital search tree (DST), obtained from the suffixes of the reversed sequence. Several results are known concerning the height and the insertion depth for DSTs built from successive i.i.d. sequences. Here, the successively inserted words are strongly dependent. We give the asymptotic behaviour of the insertion depth and of the length of branches for the CGR-tree obtained from the suffixes of a reversed i.i.d. or Markovian sequence. At first order, this behaviour turns out to be the same as in the case of independent words. As a by-product, asymptotic results on the length of the longest runs in a Markovian sequence are obtained.

    Dynamical Systems in the Analysis of Biological Sequences

    The Chaos Game Representation (CGR) maps a sequence of letters taken from a finite alphabet onto the unit square in $\mathbb{R}^2$. While it is a popular tool, few mathematical results have been proved to date. In this report, we show that the CGR gives rise to a limit measure, assuming only that the input sequence is stationary ergodic. Some more precise properties are given in the i.i.d. and Markov cases. A new family of statistical tests to characterize the randomness of the inputs is proposed and analyzed. Finally, some basic properties of the CGR are used to generalize the notion of genomic signature.
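    As an illustration of the mapping described above, the sketch below uses the common construction for a four-letter DNA alphabet: each letter is attached to a corner of the unit square, and each new point is the midpoint between the previous point and the corner of the current letter. The corner assignment, the starting point at the centre of the square, and the name `chaos_game_points` are assumptions made here, since the abstract does not fix a convention.

```python
# Assumed corner assignment; other conventions exist.
CORNERS = {
    "A": (0.0, 0.0),
    "C": (0.0, 1.0),
    "G": (1.0, 1.0),
    "T": (1.0, 0.0),
}


def chaos_game_points(sequence, start=(0.5, 0.5)):
    """Return the successive CGR points of `sequence` in the unit square."""
    x, y = start
    points = []
    for letter in sequence:
        cx, cy = CORNERS[letter]
        # Move halfway from the current point to the letter's corner.
        x, y = (x + cx) / 2.0, (y + cy) / 2.0
        points.append((x, y))
    return points
```

    With this convention, the point reached after n letters lies in a sub-square of side 2^{-n} determined by the last n letters, so frequencies of subwords can be read off as point counts in sub-squares.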