Online estimation of the geometric median in Hilbert spaces : non asymptotic confidence balls
Estimation procedures based on recursive algorithms are interesting and
powerful techniques that are able to deal rapidly with (very) large samples of
high dimensional data. The collected data may be contaminated by noise so that
robust location indicators, such as the geometric median, may be preferred to
the mean. In this context, an estimator of the geometric median based on a fast
and efficient averaged non linear stochastic gradient algorithm has been
developed by Cardot, Cénac and Zitt (2013). This work aims at studying more
precisely the non asymptotic behavior of this algorithm by giving non
asymptotic confidence balls. This new result is based on the derivation of
improved rates of convergence as well as an exponential inequality for
the martingale terms of the recursive non linear Robbins-Monro algorithm.
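To make the recursion concrete, here is a minimal finite-dimensional sketch of an averaged stochastic gradient estimator of the geometric median. The abstract does not give the update formulas, so the step-size form `c_gamma * n ** (-alpha)`, the default values, and all function names are illustrative assumptions rather than the authors' exact specification.

```python
import math
import random


def averaged_sgd_median(samples, c_gamma=1.0, alpha=0.75):
    """Averaged stochastic gradient estimate of the geometric median.

    Assumed (illustrative) recursion:
        z     <- z + gamma_n * (x - z) / ||x - z||,  gamma_n = c_gamma * n**(-alpha)
        z_bar <- z_bar + (z - z_bar) / (n + 1)
    """
    it = iter(samples)
    first = next(it)
    z = list(first)        # plain Robbins-Monro iterate
    z_bar = list(first)    # running average of the iterates
    for n, x in enumerate(it, start=1):
        diff = [xi - zi for xi, zi in zip(x, z)]
        norm = math.sqrt(sum(d * d for d in diff))
        if norm > 0.0:     # the gradient is undefined when x == z
            step = c_gamma * n ** (-alpha)
            z = [zi + step * d / norm for zi, d in zip(z, diff)]
        z_bar = [zb + (zi - zb) / (n + 1) for zb, zi in zip(z_bar, z)]
    return z_bar


def contaminated_sample(n, seed=0):
    """Hypothetical test data: Gaussian cloud around (1, -2) with 5% outliers."""
    rng = random.Random(seed)
    data = []
    for i in range(n):
        if i % 20 == 19:
            data.append((50.0, 50.0))  # gross outlier
        else:
            data.append((1.0 + rng.gauss(0, 1), -2.0 + rng.gauss(0, 1)))
    return data
```

Each observation is used once and then discarded, which is why such a scheme suits large or streaming samples. On contaminated data like `contaminated_sample(20000)`, the estimate should stay much closer to the bulk of the data than the sample mean does.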
Digital search trees and chaos game representation
In this paper, we consider a possible representation of a DNA sequence in a
quaternary tree, in which one can visualize repetitions of subwords. The
CGR-tree turns a sequence of letters into a digital search tree (DST), obtained
from the suffixes of the reversed sequence. Several results are known
concerning the height and the insertion depth for DST built from i.i.d.
successive sequences. Here, the successive inserted words are strongly
dependent. We give the asymptotic behaviour of the insertion depth and of the
length of branches for the CGR-tree obtained from the suffixes of reversed
i.i.d. or Markovian sequence. This behaviour turns out to be at first order the
same one as in the case of independent words. As a by-product, asymptotic
results on the length of longest runs in a Markovian sequence are obtained.
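The construction can be illustrated with a small sketch. It reflects one reading of the abstract, namely that the n-th inserted word is the first n letters of the sequence read backwards, and that each word is stored at the first free node met along its own letters (the root is kept empty). The function name and the representation of nodes by their path strings are assumptions made for illustration.

```python
def cgr_tree_depths(sequence):
    """Insertion depths in a digital search tree fed with reversed prefixes.

    A node is identified by the string of letters on its path from the root.
    The n-th word is sequence[:n] reversed; it is placed at the first
    unoccupied node along its own letters, and that depth is recorded.
    """
    occupied = set()   # paths of nodes that already hold a word
    depths = []
    for n in range(1, len(sequence) + 1):
        word = sequence[:n][::-1]
        for d in range(1, n + 1):
            path = word[:d]
            if path not in occupied:
                occupied.add(path)
                depths.append(d)
                break
    return depths
```

For a sequence made of a single repeated letter, every new word has to travel one level deeper than the previous one, which hints at the link between branch lengths and longest runs mentioned in the abstract.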
A fast and recursive algorithm for clustering large datasets with k-medians
Clustering large samples of high dimensional data with fast algorithms is an
important challenge in computational statistics. Borrowing ideas from MacQueen
(1967), who introduced a sequential version of the k-means algorithm, a new
class of recursive stochastic gradient algorithms designed for the k-medians
loss criterion is proposed. By their recursive nature, these algorithms are
very fast and are well adapted to deal with large samples of data that are
allowed to arrive sequentially. It is proved that the stochastic gradient
algorithm converges almost surely to the set of stationary points of the
underlying loss criterion. Particular attention is paid to the averaged
versions, which are known to perform better, and a data-driven
procedure that allows automatic selection of the value of the descent step is
proposed.
The performance of the averaged sequential estimator is compared in a
simulation study, both in terms of computation speed and accuracy of the
estimations, with more classical partitioning techniques such as k-means,
trimmed k-means and PAM (partitioning around medoids). Finally, this new
online clustering technique is illustrated on determining television audience
profiles with a sample of more than 5000 individual television audiences
measured every minute over a period of 24 hours.
Comment: Under revision for Computational Statistics and Data Analysis
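As a rough illustration of a MacQueen-style sequential scheme adapted to a k-medians criterion, the sketch below moves only the center nearest to each incoming point, by a small step along the unit vector pointing to that point. The per-cluster step size `c_gamma * count ** (-alpha)`, the explicit initial centers, and the helper that generates toy data are assumptions for illustration. The averaging and the data-driven step selection described in the abstract are not reproduced here.

```python
import math
import random


def online_k_medians(samples, init_centers, c_gamma=1.0, alpha=0.75):
    """Sequential stochastic gradient sketch for a k-medians criterion."""
    centers = [list(c) for c in init_centers]
    counts = [0] * len(centers)   # number of points assigned to each center
    for x in samples:
        dists = [math.dist(x, c) for c in centers]
        r = min(range(len(centers)), key=dists.__getitem__)  # nearest center
        counts[r] += 1
        if dists[r] > 0.0:
            step = c_gamma * counts[r] ** (-alpha)
            # Move the nearest center along the unit vector towards x.
            centers[r] = [ci + step * (xi - ci) / dists[r]
                          for xi, ci in zip(x, centers[r])]
    return centers


def two_cluster_sample(n, seed=0):
    """Hypothetical toy data: alternating Gaussian clusters at (0,0) and (10,10)."""
    rng = random.Random(seed)
    data = []
    for i in range(n):
        cx = 0.0 if i % 2 == 0 else 10.0
        data.append((cx + rng.gauss(0, 1), cx + rng.gauss(0, 1)))
    return data
```

For example, `online_k_medians(two_cluster_sample(4000), [(2, 2), (8, 8)])` processes each point once, so memory use does not grow with the sample size.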
Variable length Markov chains and dynamical sources
Infinite random sequences of letters can be viewed as stochastic chains or as
strings produced by a source, in the sense of information theory. The
relationship between Variable Length Markov Chains (VLMC) and probabilistic
dynamical sources is studied. We establish a probabilistic frame for context
trees and VLMC and we prove that any VLMC is a dynamical source for which we
explicitly build the mapping. On two examples, the "comb" and the "bamboo
blossom", we find a necessary and sufficient condition for the existence and
the uniqueness of a stationary probability measure for the VLMC. These two
examples are detailed in order to provide the associated Dirichlet series as
well as the generating functions of word occurrences.
Comment: 45 pages, 15 figures
Attractive regular stochastic chains: perfect simulation and phase transition
We prove that uniqueness of the stationary chain, or equivalently, of the
g-measure, compatible with an attractive regular probability kernel is
equivalent to either one of the following two assertions for this chain: (1) it
is a finitary coding of an i.i.d. process with countable alphabet, (2) the
concentration of measure holds at exponential rate. We show in particular that
if a stationary chain is uniquely defined by a kernel that is continuous and
attractive, then this chain can be sampled using a coupling-from-the-past
algorithm. For the original Bramson-Kalikow model we further prove that there
exists a unique compatible chain if and only if the chain is a finitary coding
of a finite alphabet i.i.d. process. Finally, we obtain some partial results on
conditions for phase transition for general chains of infinite order.
Comment: 22 pages, 1 pseudo-algorithm, 1 figure. Minor changes in the
presentation. Lemma 6 has been removed
Control of friction forces with stationary wave piezoelectric actuator
In the field of the research on piezoelectric motors, the control of friction forces by ultrasonic waves was studied mainly from an experimental point of view [1]. The principle of friction force reduction by imposed mechanical vibrations in unlubricated contacts was recently studied in order to reduce the friction losses in an internal combustion engine with Langevin type actuators [2]. This article deals with the advantages of flexural stationary wave piezoelectric actuators in the control of the friction forces thanks to the high pressures they generate at high frequencies (>10 kHz). The use of a specific contact geometry, defined by the Hertz theory associated with partial slip contact conditions, allows the lubrication effect to be optimized. In the case of a piezoelectric torque limiter application, the design and the numerical simulation of a dedicated piezoelectric actuator are compared. In agreement with the contact modeling, the characterization of the complete actuator on a mechanical test bench validates the "torque limiter" function and the optimization of the lubrication principle with dedicated contact geometry.
Digital search trees and chaos game representation
Preliminary version (2006) of a work published in final form (2009). In this paper, we consider a possible representation of a DNA sequence in a quaternary tree, in which one can visualize repetitions of subwords. The CGR-tree turns a sequence of letters into a digital search tree (DST), obtained from the suffixes of the reversed sequence. Several results are known concerning the height and the insertion depth for DST built from i.i.d. successive sequences. Here, the successive inserted words are strongly dependent. We give the asymptotic behaviour of the insertion depth and of the length of branches for the CGR-tree obtained from the suffixes of reversed i.i.d. or Markovian sequence. This behaviour turns out to be at first order the same one as in the case of independent words. As a by-product, asymptotic results on the length of longest runs in a Markovian sequence are obtained.
Dynamical Systems in the Analysis of Biological Sequences
The Chaos Game Representation (CGR) maps a sequence of letters taken from a finite alphabet onto the unit square in $\mathbb{R}^2$. While it is a popular tool, few mathematical results have been proved to date. In this report, we show that the CGR gives rise to a limit measure, assuming only the input sequence is stationary ergodic. Some more precise properties are given in the i.i.d. and Markov cases. A new family of statistical tests to characterize the randomness of the inputs is proposed and analyzed. Finally, some basic properties of the CGR are used to generalize the notion of genomic signature.
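For readers unfamiliar with the mapping, a minimal sketch of the usual chaos game iteration on a four-letter DNA alphabet follows: each letter is attached to a corner of the unit square, and the current point moves halfway towards the corner of the letter just read. The particular corner assignment and the starting point at the center of the square are conventions assumed here, not taken from the abstract.

```python
# Assumed convention: one corner of the unit square per nucleotide.
CORNERS = {
    "A": (0.0, 0.0),
    "C": (0.0, 1.0),
    "G": (1.0, 1.0),
    "T": (1.0, 0.0),
}


def cgr_points(sequence, start=(0.5, 0.5)):
    """Return the successive chaos game points for a sequence of letters."""
    x, y = start
    points = []
    for letter in sequence:
        cx, cy = CORNERS[letter]
        # Move halfway from the current point to the letter's corner.
        x, y = (x + cx) / 2.0, (y + cy) / 2.0
        points.append((x, y))
    return points
```

With this convention, `cgr_points("AG")` gives `[(0.25, 0.25), (0.625, 0.625)]`. Sequences sharing their last letters end up in the same sub-square, which is what makes repeated subwords visible in the picture.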