Fast rates in statistical and online learning
The speed with which a learning algorithm converges as it is presented with
more data is a central problem in machine learning --- a fast rate of
convergence means less data is needed for the same level of performance. The
pursuit of fast rates in online and statistical learning has led to the
discovery of many conditions in learning theory under which fast learning is
possible. We show that most of these conditions are special cases of a single,
unifying condition that comes in two forms: the central condition for 'proper'
learning algorithms that always output a hypothesis in the given model, and
stochastic mixability for online algorithms that may make predictions outside
of the model. We show that under surprisingly weak assumptions both conditions
are, in a certain sense, equivalent. The central condition has a
re-interpretation in terms of convexity of a set of pseudoprobabilities,
linking it to density estimation under misspecification. For bounded losses, we
show how the central condition enables a direct proof of fast rates and we
prove its equivalence to the Bernstein condition, itself a generalization of
the Tsybakov margin condition, both of which have played a central role in
obtaining fast rates in statistical learning. Yet, while the Bernstein
condition is two-sided, the central condition is one-sided, making it more
suitable to deal with unbounded losses. In its stochastic mixability form, our
condition generalizes both a stochastic exp-concavity condition identified by
Juditsky, Rigollet and Tsybakov and Vovk's notion of mixability. Our unifying
conditions thus provide a substantial step towards a characterization of fast
rates in statistical learning, similar to how classical mixability
characterizes constant regret in the sequential prediction with expert advice
setting.
Comment: 69 pages, 3 figures
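For orientation, the two conditions named in this abstract are usually stated as follows. This is a sketch in standard fast-rates notation (with ℓ_f(Z) the loss of hypothesis f on outcome Z ~ P, and f* the risk minimizer in the model F), not text quoted from the paper. The η-central condition requires, for some η > 0,

\[
  \mathbb{E}_{Z \sim P}\!\left[ e^{-\eta\,(\ell_f(Z) - \ell_{f^*}(Z))} \right] \le 1
  \quad \text{for all } f \in \mathcal{F},
\]

while the (two-sided) Bernstein condition with exponent β in (0, 1] and constant B requires

\[
  \mathbb{E}\!\left[ (\ell_f - \ell_{f^*})^2 \right]
  \;\le\; B \left( \mathbb{E}[\ell_f - \ell_{f^*}] \right)^{\beta}
  \quad \text{for all } f \in \mathcal{F}.
\]

The one-sided versus two-sided contrast in the abstract refers to the fact that the central condition only constrains the lower tail of the excess loss, whereas the squared term in the Bernstein condition constrains both tails.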
Asymptotic equivalence and adaptive estimation for robust nonparametric regression
The asymptotic equivalence theory developed in the literature so far is only for
bounded loss functions. This limits the potential applications of the theory
because many commonly used loss functions in statistical inference are
unbounded. In this paper we develop asymptotic equivalence results for robust
nonparametric regression with unbounded loss functions. The results imply that
all the Gaussian nonparametric regression procedures can be robustified in a
unified way. A key step in our equivalence argument is to bin the data and then
take the median of each bin. The asymptotic equivalence results have
significant practical implications. To illustrate the general principles of the
equivalence argument we consider two important nonparametric inference
problems: robust estimation of the regression function and the estimation of a
quadratic functional. In both cases easily implementable procedures are
constructed and are shown to enjoy simultaneously a high degree of robustness
and adaptivity. Other problems such as construction of confidence sets and
nonparametric hypothesis testing can be handled in a similar fashion.
Comment: Published in the Annals of Statistics (http://www.imstat.org/aos/), http://dx.doi.org/10.1214/08-AOS681, by the Institute of Mathematical Statistics (http://www.imstat.org).
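The bin-the-data-then-take-medians step described in this abstract is simple enough to sketch. The following is a minimal illustration of that idea; the function and variable names are ours, not from the paper:

import numpy as np

def binned_medians(y, num_bins):
    """Split the observations into consecutive bins of (roughly) equal
    size and return the median of each bin. Using medians rather than
    means is what confers robustness to heavy-tailed noise."""
    bins = np.array_split(np.asarray(y), num_bins)
    return np.array([np.median(b) for b in bins])

# Example: 1000 noisy observations of a regression function, reduced to
# 100 robust summary points that a Gaussian-regression procedure can use.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 1000)
y = np.sin(2 * np.pi * x) + rng.standard_t(df=2, size=x.size)  # heavy tails
medians = binned_medians(y, num_bins=100)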
A Tight Excess Risk Bound via a Unified PAC-Bayesian-Rademacher-Shtarkov-MDL Complexity
We present a novel notion of complexity that interpolates between and
generalizes some classic existing complexity notions in learning theory: for
estimators like empirical risk minimization (ERM) with arbitrary bounded
losses, it is upper bounded in terms of data-independent Rademacher complexity;
for generalized Bayesian estimators, it is upper bounded by the data-dependent
information complexity (also known as stochastic or PAC-Bayesian,
KL(posterior || prior) complexity). For
(penalized) ERM, the new complexity reduces to (generalized) normalized maximum
likelihood (NML) complexity, i.e. a minimax log-loss individual-sequence
regret. Our first main result bounds excess risk in terms of the new
complexity. Our second main result links the new complexity via Rademacher
complexity to L2(P) entropy, thereby generalizing earlier results of Opper,
Haussler, Lugosi, and Cesa-Bianchi, who did the log-loss case with L-infinity
entropy.
Together, these results recover optimal bounds for VC- and large (polynomial
entropy) classes, replacing localized Rademacher complexity by a simpler
analysis which almost completely separates the two aspects that determine the
achievable rates: 'easiness' (Bernstein) conditions and model complexity.
Comment: 38 pages
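As background for the NML connection mentioned in this abstract, the normalized maximum likelihood (Shtarkov) complexity of a model F = {p_f} on samples of size n is usually written as follows (the standard definition, not quoted from the paper):

\[
  \mathrm{COMP}_n(\mathcal{F}) \;=\; \log \sum_{x^n \in \mathcal{X}^n} \, \max_{f \in \mathcal{F}} \, p_f(x^n),
\]

i.e. the log normalizer that turns the maximized likelihood into a probability distribution; it equals the minimax individual-sequence regret for log loss, with the sum replaced by an integral for continuous sample spaces.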
Wireless model-based predictive networked control system over cooperative wireless network
Owing to their distributed architecture, networked control systems (NCSs) have proven feasible in scenarios where a spatially distributed feedback control system is required. Traditionally, such NCSs operate over real-time wired networks. Recently, in order to achieve the utmost flexibility, scalability, ease of deployment, and maintainability, wireless networks such as IEEE 802.11 wireless local area networks (LANs) are being preferred over dedicated wired networks. However, conventional NCSs with event-triggered controllers and actuators cannot operate over such general-purpose wireless networks, since the stability of the system is compromised by the unbounded delays and unpredictable packet losses that are typical of the wireless medium. Approaching the wireless networked control problem from two perspectives, this work introduces a practical wireless NCS and an implementation of a cooperative medium access control protocol that work jointly to achieve decent control under severe impairments such as unbounded delay, bursts of packet loss, and ambient wireless traffic. The proposed system is evaluated on a dedicated test platform under numerous scenarios, and significant performance gains are observed, making cooperative communications a strong candidate for improving the reliability of industrial wireless networks.
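The core idea of a model-based predictive NCS, as referenced in this abstract, is that the controller sends a whole predicted control sequence each period and the actuator falls back on buffered predictions when packets are delayed or lost. The sketch below is our illustration of that general pattern, not the authors' implementation; all names are hypothetical:

class PredictiveActuator:
    """Actuator-side buffer for a model-based predictive NCS (sketch).

    Each controller packet carries a time stamp k and a predicted control
    sequence u[k], u[k+1], ..., u[k+H-1]. When packets are lost or late,
    the actuator keeps stepping through the last sequence it received.
    """

    def __init__(self, fallback=0.0):
        self.fallback = fallback   # safe input once predictions run out
        self.latest_stamp = -1
        self.sequence = []

    def receive(self, stamp, control_sequence):
        # Accept only packets newer than the one we already hold;
        # stale or duplicated packets are silently dropped.
        if stamp > self.latest_stamp:
            self.latest_stamp = stamp
            self.sequence = list(control_sequence)

    def actuate(self, k):
        # Index into the buffered predictions by the age of the packet.
        age = k - self.latest_stamp
        if 0 <= age < len(self.sequence):
            return self.sequence[age]
        return self.fallback  # prediction horizon exceeded

# Toy usage: the packet for k = 0 arrives, then five packets are lost;
# the actuator rides the buffered predictions until they run out.
act = PredictiveActuator()
act.receive(stamp=0, control_sequence=[1.0, 0.8, 0.6, 0.4, 0.2])
inputs = [act.actuate(k) for k in range(7)]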
Fast learning rates in statistical inference through aggregation
We develop minimax optimal risk bounds for the general learning task
consisting in predicting as well as the best function in a reference set G
up to the smallest possible additive term, called the convergence
rate. When the reference set is finite and when n denotes the size of the
training data, we provide minimax convergence rates of the form
C (log|G| / n)^v with tight evaluation of the positive
constant C and with exact 0 < v <= 1, the latter value depending on the
convexity of the loss function and on the level of noise in the output
distribution. The risk upper bounds are based on a sequential randomized
algorithm, which at each step concentrates on functions having both low risk
and low variance with respect to the previous step prediction function. Our
analysis puts forward the links between the probabilistic and worst-case
viewpoints, and allows us to obtain risk bounds unachievable with the standard
statistical learning approach. One of the key ideas of this work is to use
probabilistic inequalities with respect to appropriate (Gibbs) distributions on
the prediction function space instead of using them with respect to the
distribution generating the data. The risk lower bounds are based on
refinements of the Assouad lemma taking particularly into account the
properties of the loss function. Our key example to illustrate the upper and
lower bounds is to consider the Lq-regression setting for which an
exhaustive analysis of the convergence rates is given while q ranges in
[1, +infinity).
Comment: Published in the Annals of Statistics (http://www.imstat.org/aos/), http://dx.doi.org/10.1214/08-AOS623, by the Institute of Mathematical Statistics (http://www.imstat.org).
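The Gibbs distributions mentioned in this abstract are the standard exponential-weights posteriors of aggregation theory. As a sketch in standard notation (not quoted from the paper), at step t each function g in the reference set G is weighted according to its cumulative past loss,

\[
  \hat{\pi}_t(g) \;\propto\; \pi(g)\,
  \exp\!\Bigl(-\eta \sum_{i=1}^{t-1} \ell\bigl(g(X_i), Y_i\bigr)\Bigr),
\]

where π is a prior on G and η > 0 a learning rate; the resulting randomized predictor concentrates on functions with both low risk and low variance, which is what yields rates of the form C (log|G| / n)^v stated above.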
Dynamic Metric Learning from Pairwise Comparisons
Recent work in distance metric learning has focused on learning
transformations of data that best align with specified pairwise similarity and
dissimilarity constraints, often supplied by a human observer. The learned
transformations lead to improved retrieval, classification, and clustering
algorithms due to the better adapted distance or similarity measures. Here, we
address the problem of learning these transformations when the underlying
constraint generation process is nonstationary. This nonstationarity can be due
to changes in either the ground-truth clustering used to generate constraints
or changes in the feature subspaces in which the class structure is apparent.
We propose Online Convex Ensemble StrongLy Adaptive Dynamic Learning (OCELAD),
a general adaptive online approach for learning and tracking optimal metrics
as they change over time, one that is highly robust to a variety of nonstationary
behaviors in the changing metric. We apply the OCELAD framework to an ensemble
of online learners. Specifically, we create a retro-initialized composite
objective mirror descent (COMID) ensemble (RICE) consisting of a set of
parallel COMID learners with different learning rates. We demonstrate RICE-OCELAD
on both real and synthetic data sets and show significant performance
improvements relative to previously proposed batch and online distance metric
learning algorithms.
Comment: To appear in Allerton 2016. arXiv admin note: substantial text overlap with arXiv:1603.0367
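The ensemble idea behind RICE-OCELAD (many copies of an online learner run in parallel at different learning rates, with a meta-layer that tracks whichever rate is currently best) can be sketched generically. The code below is our illustration of that pattern using a simple Hedge-style stand-in for the strongly adaptive meta-layer, not the authors' algorithm; the loss and learner details are placeholders:

import numpy as np

class OnlineEnsemble:
    """Hedge-style meta-learner over base learners with different rates."""

    def __init__(self, rates, dim, meta_rate=0.5):
        self.rates = rates
        self.meta_rate = meta_rate
        self.params = [np.zeros(dim) for _ in rates]  # one copy per rate
        self.weights = np.ones(len(rates)) / len(rates)

    def predict(self):
        # Ensemble prediction: weight-averaged parameters.
        return sum(w * p for w, p in zip(self.weights, self.params))

    def update(self, grad_fn, loss_fn):
        losses = np.array([loss_fn(p) for p in self.params])
        # Multiplicative weight update: favor currently accurate rates.
        self.weights *= np.exp(-self.meta_rate * losses)
        self.weights /= self.weights.sum()
        # Each base learner takes a gradient step at its own rate.
        self.params = [p - r * grad_fn(p)
                       for p, r in zip(self.params, self.rates)]

# Toy usage: track a drifting target theta under squared loss; fast rates
# win while the target moves, slow rates win while it is stable.
rng = np.random.default_rng(1)
ens = OnlineEnsemble(rates=[0.01, 0.1, 0.5], dim=2)
theta = np.zeros(2)
for t in range(100):
    theta += 0.05 * rng.standard_normal(2)      # nonstationary target
    ens.update(lambda p: 2.0 * (p - theta),
               lambda p: float(np.sum((p - theta) ** 2)))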
Damping mechanisms for oscillations in solar prominences
Small amplitude oscillations are a commonly observed feature in
prominences/filaments. These oscillations appear to be of a local nature, are
associated with the fine structure of prominence plasmas, and occur alongside
simultaneous flows and counterflows. The existing observational evidence reveals
that small amplitude oscillations, once excited, are damped over short spatial
and temporal scales by physical mechanism(s) that are not yet well determined.
Commonly, these oscillations have been interpreted in terms of linear
magnetohydrodynamic (MHD) waves, and this paper reviews the theoretical damping
mechanisms that have been recently put forward in order to explain the observed
attenuation scales. These mechanisms include thermal effects, through
non-adiabatic processes, mass flows, resonant damping in non-uniform media, and
partial ionization effects. The relevance of each mechanism is assessed by
comparing the spatial and time scales produced by each of them with those
obtained from observations. Also, the application of the latest theoretical
results to perform prominence seismology is discussed, aiming to determine
physical parameters in prominence plasmas that are difficult to measure by
direct means.
Comment: 36 pages, 16 figures, Space Science Reviews (accepted)
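For context on what the damping scales in this abstract mean quantitatively: observed prominence oscillations are typically fitted with an exponentially damped sinusoid, and the diagnostic compared against each proposed mechanism is the damping-time-to-period ratio. This is our summary of standard practice in prominence seismology, not text from the paper:

\[
  v(t) \;=\; V_0\, e^{-t/\tau_d}\, \cos\!\left(\frac{2\pi t}{P} + \phi\right),
\]

where P is the oscillation period and tau_d the damping time; observations typically give tau_d of only a few periods, which is the constraint each damping mechanism must reproduce.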