CrossNorm: Normalization for Off-Policy TD Reinforcement Learning
Off-policy temporal difference (TD) methods are a powerful class of
reinforcement learning (RL) algorithms. Intriguingly, deep off-policy TD
algorithms are not commonly used in combination with feature normalization
techniques, despite positive effects of normalization in other domains. We show
that naive application of existing normalization techniques is indeed not
effective, but that well-designed normalization improves optimization stability
and removes the necessity of target networks. In particular, we introduce a
normalization based on a mixture of on- and off-policy transitions, which we
call cross-normalization. It can be regarded as an extension of batch
normalization that re-centers data for two different distributions, as present
in off-policy learning. Applied to DDPG and TD3, cross-normalization improves
over the state of the art across a range of MuJoCo benchmark tasks.
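A minimal sketch of the idea, assuming PyTorch and illustrative names (cross_norm and the mixture weight alpha are assumptions, not the paper's exact formulation), re-centers features with means mixed across the on- and off-policy batches:

import torch

def cross_norm(on_feats, off_feats, alpha=0.5):
    # Illustrative cross-normalization: re-center features using a
    # convex mixture of per-feature means from the on-policy and
    # off-policy batches; alpha is an assumed mixture weight.
    mu_on = on_feats.mean(dim=0)
    mu_off = off_feats.mean(dim=0)
    # Shared statistics computed across both distributions
    mu = alpha * mu_on + (1.0 - alpha) * mu_off
    # Re-center both batches with the same mixed mean (no re-scaling)
    return on_feats - mu, off_feats - mu

In a DDPG or TD3 critic, such a layer would take the place of per-batch centering, so that features from both transition distributions are normalized against one shared set of statistics.
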
Stabilizing Training of Generative Adversarial Networks through Regularization
Deep generative models based on Generative Adversarial Networks (GANs) have
demonstrated impressive sample quality, but in practice they require a
careful choice of architecture, parameter initialization, and
hyper-parameters. This fragility is in part due to a dimensional mismatch or
non-overlapping support between the model distribution and the data
distribution, causing their density ratio and the associated f-divergence to be
undefined. We overcome this fundamental limitation and propose a new
regularization approach with low computational cost that yields a stable GAN
training procedure. We demonstrate the effectiveness of this regularizer across
several architectures trained on common benchmark image generation tasks. Our
regularization turns GAN models into reliable building blocks for deep
learning.
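The abstract leaves the exact regularizer unspecified; as a hedged sketch, one low-cost way to keep the density ratio well-behaved when the two supports barely overlap is to penalize the discriminator's gradient norm at both real and generated samples (names such as discriminator_grad_penalty and gamma below are illustrative, not the paper's API):

import torch

def discriminator_grad_penalty(discriminator, real, fake, gamma=10.0):
    # Illustrative gradient-norm penalty evaluated at real and fake
    # samples; gamma is an assumed penalty weight.
    penalty = real.new_zeros(())
    for batch in (real, fake):
        batch = batch.detach().requires_grad_(True)
        d_out = discriminator(batch)
        # Gradient of D w.r.t. its inputs, kept in the graph so that
        # the penalty itself is differentiable
        (grads,) = torch.autograd.grad(d_out.sum(), batch, create_graph=True)
        penalty = penalty + grads.pow(2).flatten(1).sum(dim=1).mean()
    return 0.5 * gamma * penalty

Such a term is simply added to the discriminator loss before the optimizer step, which keeps the extra computational overhead small.
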
MIHash: Online Hashing with Mutual Information
Learning-based hashing methods are widely used for nearest neighbor
retrieval, and recently, online hashing methods have demonstrated good
performance-complexity trade-offs by learning hash functions from streaming
data. In this paper, we first address a key challenge for online hashing: the
binary codes for indexed data must be recomputed to keep pace with updates to
the hash functions. We propose an efficient quality measure for hash functions,
based on an information-theoretic quantity, mutual information, and use it
successfully as a criterion to eliminate unnecessary hash table updates. Next,
we show how to optimize the mutual information objective using stochastic
gradient descent. We thus develop a novel hashing method, MIHash, that can be
used in both online and batch settings. Experiments on image retrieval
benchmarks (including a 2.5M image dataset) confirm the effectiveness of our
formulation, both in reducing hash table recomputations and in learning
high-quality hash functions.
Comment: International Conference on Computer Vision (ICCV), 201
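As a hedged illustration of the quality measure (not the paper's exact estimator; all names below are assumptions), a hash function can be scored by the mutual information between a query's neighbor indicator and the Hamming distance of its code to the indexed items:

import numpy as np

def mi_quality(codes, query_code, neighbor_mask, eps=1e-12):
    # codes: (N, B) binary codes in {0, 1}; query_code: (B,);
    # neighbor_mask: (N,) boolean, True where the item is a true
    # neighbor of the query. Returns I(neighbor; Hamming distance).
    B = codes.shape[1]
    dists = (codes != query_code).sum(axis=1)  # Hamming distances
    # Conditional distance histograms over the 0..B distance bins
    p_d_n = np.bincount(dists[neighbor_mask], minlength=B + 1)
    p_d_x = np.bincount(dists[~neighbor_mask], minlength=B + 1)
    p_n = neighbor_mask.mean()
    p_d_n = p_d_n / max(p_d_n.sum(), 1)
    p_d_x = p_d_x / max(p_d_x.sum(), 1)
    p_d = p_n * p_d_n + (1 - p_n) * p_d_x
    h = lambda p: -(p * np.log2(p + eps)).sum()  # entropy in bits
    # I(N; D) = H(D) - H(D | N)
    return h(p_d) - (p_n * h(p_d_n) + (1 - p_n) * h(p_d_x))

A high score means neighbors and non-neighbors are well separated by Hamming distance; in the spirit of the abstract, an update to the hash functions would be committed to the hash table only when it improves such a score, skipping unnecessary recomputations of the indexed codes.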