CIM: Constrained Intrinsic Motivation for Sparse-Reward Continuous Control
Intrinsic motivation is a promising exploration technique for solving
reinforcement learning tasks with sparse or absent extrinsic rewards. There
exist two technical challenges in implementing intrinsic motivation: 1) how to
design a proper intrinsic objective to facilitate efficient exploration; and 2)
how to combine the intrinsic objective with the extrinsic objective to help
find better solutions. In the current literature, the intrinsic objectives are
all designed in a task-agnostic manner and combined with the extrinsic
objective via simple addition (or used on their own for reward-free pre-training).
In this work, we show that these designs would fail in typical sparse-reward
continuous control tasks. To address the problem, we propose Constrained
Intrinsic Motivation (CIM) to leverage readily attainable task priors to
construct a constrained intrinsic objective, and at the same time, exploit the
Lagrangian method to adaptively balance the intrinsic and extrinsic objectives
via a simultaneous-maximization framework. We empirically show, on multiple
sparse-reward continuous control tasks, that our CIM approach achieves greatly
improved performance and sample efficiency over state-of-the-art methods.
Moreover, the key techniques of our CIM can also be plugged into existing
methods to boost their performance.
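The abstract does not include an implementation, but the Lagrangian balancing idea it describes can be illustrated with a short sketch. The snippet below is a simplified, hypothetical interpretation, not the authors' code: names such as LagrangianBalancer, target, and lr_lambda are assumptions. The intrinsic objective is treated as a constraint with a task-prior threshold, the policy maximizes the extrinsic reward plus a multiplier-weighted constraint term, and the multiplier is updated in the opposite direction, so the weight on the intrinsic term adapts during training.

```python
import numpy as np

class LagrangianBalancer:
    """Minimal sketch (assumed interface, not the authors' CIM code):
    balance an extrinsic objective with a constrained intrinsic objective
    via a Lagrange multiplier.

    The intrinsic objective is treated as a constraint J_int(pi) >= target.
    The policy maximizes J_ext(pi) + lam * (J_int(pi) - target), while lam
    is updated by a dual step on the same expression.
    """

    def __init__(self, target, lr_lambda=1e-3):
        self.target = target    # assumed task-prior threshold on the intrinsic objective
        self.log_lam = 0.0      # parameterize lam = exp(log_lam) so lam stays >= 0
        self.lr = lr_lambda

    @property
    def lam(self):
        return float(np.exp(self.log_lam))

    def combined_reward(self, r_ext, r_int):
        # Reward handed to the policy optimizer: extrinsic reward plus the
        # multiplier-weighted slack of the intrinsic constraint.
        return r_ext + self.lam * (r_int - self.target)

    def update_multiplier(self, mean_r_int):
        # Dual step on the constraint violation: lam grows while the intrinsic
        # objective is below target and shrinks once the constraint is met,
        # which rebalances the two objectives adaptively during training.
        violation = self.target - mean_r_int
        self.log_lam += self.lr * violation
```

Under a scheme like this, the exploration pressure from the intrinsic term decays automatically once the constraint is satisfied, instead of being fixed by a hand-tuned additive weight.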
An adaptive nearest neighbor rule for classification
We introduce a variant of the k-nearest neighbor classifier in which k is
chosen adaptively for each query, rather than supplied as a parameter. The
choice of k depends on properties of each neighborhood, and therefore may
significantly vary between different points. (For example, the algorithm will
use a larger k for predicting the labels of points in noisy regions.)
We provide theory and experiments that demonstrate that the algorithm
performs comparably to, and sometimes better than, k-NN with an optimal
choice of k. In particular, we derive bounds on the convergence rates of our
classifier that depend on a local quantity we call the "advantage", which is
significantly weaker than the Lipschitz conditions used in previous convergence
rate proofs. These generalization bounds hinge on a variant of the seminal
Uniform Convergence Theorem due to Vapnik and Chervonenkis; this variant
concerns conditional probabilities and may be of independent interest.
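As a rough illustration of the adaptive idea (not the paper's actual rule, which is based on the "advantage" quantity), the sketch below grows k for each query until the leading label wins the neighborhood vote by a margin of roughly sqrt(k): in clean regions it stops early with a small k, while in noisy regions the margin test forces a larger neighborhood. All function and parameter names here are hypothetical.

```python
import numpy as np

def adaptive_knn_predict(X_train, y_train, x_query, k_max=50, z=1.0):
    """Illustrative heuristic for an adaptively chosen k (not the paper's
    exact algorithm): grow k until the leading label's vote count exceeds
    the runner-up by a confidence margin of z * sqrt(k)."""
    # Sort training points by distance to the query.
    dists = np.linalg.norm(X_train - x_query, axis=1)
    order = np.argsort(dists)

    labels = np.unique(y_train)
    counts = {c: 0 for c in labels}

    for k in range(1, min(k_max, len(order)) + 1):
        counts[y_train[order[k - 1]]] += 1
        top_two = sorted(counts.values(), reverse=True)[:2]
        lead = top_two[0] - (top_two[1] if len(top_two) > 1 else 0)
        # Stop as soon as the leading label wins by a margin that is
        # comfortable for the current neighborhood size k.
        if lead >= z * np.sqrt(k):
            break

    return max(counts, key=counts.get)
```

A caller would simply invoke adaptive_knn_predict(X, y, x0) and receive a label without ever specifying k; the effective neighborhood size is decided per query.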