Linear and Geometric Mixtures - Analysis
Linear and geometric mixtures are two methods for combining arbitrary models in
data compression. Geometric mixtures generalize the empirically well-performing
PAQ7 mixture. Both mixture schemes rely on weight vectors, which largely
determine their performance. Typically, weight vectors are estimated via Online
Gradient Descent. In this work we show that one can obtain strong code-length
bounds for such a weight estimation scheme. These bounds hold for arbitrary
input sequences. For this purpose we introduce the class of nice mixtures and
analyze how Online Gradient Descent with a fixed step size performs when
combined with a nice mixture. As we show, these results translate to linear and
geometric mixtures, since both are nice. The results hold for PAQ7 mixtures as
well; thus we
provide the first theoretical analysis of PAQ7.
Comment: Data Compression Conference (DCC) 2012
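Illustrative aside: the weight estimation scheme above lends itself to a short sketch. The following Python snippet runs Online Gradient Descent with a fixed step size for a linear mixture under code-length loss. It is a minimal sketch under assumptions, not the paper's implementation; the step size, the simplex projection, and all names are illustrative.

```python
import numpy as np

def project_to_simplex(v):
    """Euclidean projection of v onto the probability simplex."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u * np.arange(1, len(v) + 1) > (css - 1.0))[0][-1]
    theta = (css[rho] - 1.0) / (rho + 1)
    return np.maximum(v - theta, 0.0)

def ogd_linear_mixture(preds, step=0.05):
    """Online Gradient Descent (fixed step size) for a linear mixture.

    preds[t, i] is the probability model i assigned to the symbol that
    was actually observed at time t. Returns the total code length in
    bits achieved by the mixture, with weights kept on the simplex.
    """
    T, M = preds.shape
    w = np.full(M, 1.0 / M)        # uniform initial weights
    total_bits = 0.0
    for t in range(T):
        p = float(w @ preds[t])    # mixture probability of the observed symbol
        total_bits += -np.log2(p)  # code length contributed by this symbol
        grad = -preds[t] / p       # gradient of -ln(w . preds[t]) w.r.t. w
        w = project_to_simplex(w - step * grad)
    return total_bits
```

For a geometric mixture the loss is still the code length, but the mixture distribution is a normalized weighted geometric average of the model distributions, so only the gradient expression changes.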
Leading strategies in competitive on-line prediction
We start from a simple asymptotic result for the problem of on-line
regression with the quadratic loss function: the class of continuous
limited-memory prediction strategies admits a "leading prediction strategy",
which not only asymptotically performs at least as well as any continuous
limited-memory strategy but also satisfies the property that the excess loss of
any continuous limited-memory strategy is determined by how closely it imitates
the leading strategy. More specifically, for any class of prediction strategies
constituting a reproducing kernel Hilbert space we construct a leading
strategy, in the sense that the loss of any prediction strategy whose norm is
not too large is determined by how closely it imitates the leading strategy.
This result is extended to the loss functions given by Bregman divergences and
by strictly proper scoring rules.
Comment: 20 pages; a conference version is to appear in the ALT'2006 proceedings
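To make the imitation property concrete for the quadratic loss: writing f_n and g_n for the predictions of a strategy F and of the leading strategy G, the abstract's statement can be read, schematically, as saying that the excess loss of F is governed by the cumulative squared distance between the two prediction sequences. The display below is a hedged paraphrase of that reading, not a verbatim theorem; the paper's results carry explicit error terms depending on the norm of F.

```latex
\operatorname{Loss}_N(F) - \operatorname{Loss}_N(G)
  \;\approx\; \sum_{n=1}^{N} (f_n - g_n)^2
```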
Recursive Aggregation of Estimators by Mirror Descent Algorithm with Averaging
We consider a recursive algorithm to construct an aggregated estimator from a
finite number of base decision rules in the classification problem. The
estimator approximately minimizes a convex risk functional under the
l1-constraint. It is defined by a stochastic version of the mirror descent
algorithm (i.e., of the method which performs gradient descent in the dual
space) with additional averaging. The main result of the paper is an upper
bound for the expected accuracy of the proposed estimator. This bound is of the
order √((log M)/t) with an explicit and small constant factor, where M
is the dimension of the problem and t stands for the sample size. A similar
bound is proved for a more general setting that covers, in particular, the
regression model with squared loss.
Comment: 29 pages; May 2005
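A minimal sketch of the kind of procedure described above: stochastic mirror descent with an entropic potential, so that the gradient step in the dual space becomes a multiplicative update on the simplex, followed by averaging of the trajectory. This is an illustration under assumptions, not the paper's exact algorithm; grad_oracle, the step-size schedule, and the uniform initialization are placeholders.

```python
import numpy as np

def mirror_descent_averaging(grad_oracle, M, T):
    """Stochastic mirror descent (entropic potential) with averaging.

    grad_oracle(w, t) should return a stochastic subgradient of the risk
    at the weight vector w, computed from the t-th observation. Iterates
    stay on the probability simplex (the l1-constrained weights of the M
    base rules); the returned estimator averages the trajectory.
    """
    w = np.full(M, 1.0 / M)                   # uniform initial weights
    avg = np.zeros(M)
    for t in range(1, T + 1):
        g = np.asarray(grad_oracle(w, t), dtype=float)
        eta = np.sqrt(np.log(max(M, 2)) / t)  # illustrative step-size schedule
        z = w * np.exp(-eta * (g - g.min()))  # multiplicative (dual-space) step;
                                              # shifting g cancels after normalization
        w = z / z.sum()                       # Bregman projection onto the simplex
        avg += (w - avg) / t                  # running average of the iterates
    return avg
```

In the classification setting of the abstract, grad_oracle would typically differentiate a convex surrogate loss of the weighted combination of the M base rules at a single training example; it is the averaging step that underlies guarantees of the √((log M)/t) type quoted above.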