Benchmarking Deep Reinforcement Learning for Continuous Control
Recently, researchers have made significant progress combining the advances
in deep learning for learning feature representations with reinforcement
learning. Some notable examples include training agents to play Atari games
based on raw pixel data and to acquire advanced manipulation skills using raw
sensory inputs. However, it has been difficult to quantify progress in the
domain of continuous control due to the lack of a commonly adopted benchmark.
In this work, we present a benchmark suite of continuous control tasks,
including classic tasks like cart-pole swing-up, tasks with very high state and
action dimensionality such as 3D humanoid locomotion, tasks with partial
observations, and tasks with hierarchical structure. We report novel findings
based on the systematic evaluation of a range of implemented reinforcement
learning algorithms. Both the benchmark and reference implementations are
released at https://github.com/rllab/rllab in order to facilitate experimental
reproducibility and to encourage adoption by other researchers. Comment: 14 pages, ICML 201
InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets
This paper describes InfoGAN, an information-theoretic extension to the
Generative Adversarial Network that is able to learn disentangled
representations in a completely unsupervised manner. InfoGAN is a generative
adversarial network that also maximizes the mutual information between a small
subset of the latent variables and the observation. We derive a lower bound to
the mutual information objective that can be optimized efficiently, and show
that our training procedure can be interpreted as a variation of the Wake-Sleep
algorithm. Specifically, InfoGAN successfully disentangles writing styles from
digit shapes on the MNIST dataset, pose from lighting of 3D rendered images,
and background digits from the central digit on the SVHN dataset. It also
discovers visual concepts that include hair styles, presence/absence of
eyeglasses, and emotions on the CelebA face dataset. Experiments show that
InfoGAN learns interpretable representations that are competitive with
representations learned by existing fully supervised methods.
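The lower bound mentioned in the abstract can be sketched as follows. The notation is an assumption, since the abstract introduces no symbols: c denotes the small subset of latent variables, z the remaining noise, G the generator, D the discriminator, Q an auxiliary distribution approximating the posterior of c given an observation x, V(D, G) the usual GAN objective, and \lambda a weighting hyperparameter.

```latex
% Variational lower bound on the mutual information (notation assumed, see above)
\begin{align}
I\bigl(c;\, G(z,c)\bigr)
  &= H(c) - H\bigl(c \mid G(z,c)\bigr) \\
  &\ge \mathbb{E}_{c \sim P(c),\; x \sim G(z,c)}\bigl[\log Q(c \mid x)\bigr] + H(c)
  \;=:\; L_I(G, Q), \\[4pt]
% Regularized minimax game using the bound in place of the intractable term
\min_{G,\,Q}\; \max_{D}\;\; & V(D, G) - \lambda\, L_I(G, Q).
\end{align}
```

The inequality holds because replacing the true posterior P(c | x) with any Q(c | x) can only lower the expected log-likelihood, by the non-negativity of the KL divergence. The bound is tight when Q matches the true posterior.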
VIME: Variational Information Maximizing Exploration
Scalable and effective exploration remains a key challenge in reinforcement
learning (RL). While there are methods with optimality guarantees in the
setting of discrete state and action spaces, these methods cannot be applied in
high-dimensional deep RL scenarios. As such, most contemporary RL relies on
simple heuristics such as epsilon-greedy exploration or adding Gaussian noise
to the controls. This paper introduces Variational Information Maximizing
Exploration (VIME), an exploration strategy based on maximization of
information gain about the agent's belief of environment dynamics. We propose a
practical implementation using variational inference in Bayesian neural
networks, which efficiently handles continuous state and action spaces. VIME
modifies the MDP reward function, and can be applied with several different
underlying RL algorithms. We demonstrate that VIME achieves significantly
better performance compared to heuristic exploration methods across a variety
of continuous control tasks and algorithms, including tasks with very sparse
rewards. Comment: Published in Advances in Neural Information Processing Systems 29
(NIPS), pages 1109-111
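The idea of modifying the MDP reward with an information-gain term can be illustrated with a minimal sketch. It rests on assumptions not stated in the abstract: the agent's belief over dynamics-model weights is taken to be a fully factorized Gaussian, the information gain is measured as the KL divergence between the updated and the previous belief, and the names `kl_diag_gaussians`, `augmented_reward`, and the weight `eta` are hypothetical.

```python
import numpy as np


def kl_diag_gaussians(mu_new, sigma_new, mu_old, sigma_old):
    """KL[N(mu_new, sigma_new^2) || N(mu_old, sigma_old^2)] for diagonal Gaussians.

    Assumption: the belief over model weights is a factorized Gaussian,
    so the KL divergence is a sum over independent dimensions.
    """
    mu_new, sigma_new = np.asarray(mu_new, float), np.asarray(sigma_new, float)
    mu_old, sigma_old = np.asarray(mu_old, float), np.asarray(sigma_old, float)
    var_new, var_old = sigma_new ** 2, sigma_old ** 2
    return 0.5 * float(np.sum(
        var_new / var_old
        + (mu_old - mu_new) ** 2 / var_old
        - 1.0
        + np.log(var_old / var_new)
    ))


def augmented_reward(reward, info_gain, eta=0.1):
    """External reward plus a bonus proportional to the information gain.

    eta is a hypothetical trade-off weight between exploitation and exploration.
    """
    return reward + eta * info_gain


if __name__ == "__main__":
    # Belief before and after observing one transition (toy numbers).
    gain = kl_diag_gaussians([1.0, 0.0], [1.0, 1.0], [0.0, 0.0], [1.0, 1.0])
    print(augmented_reward(reward=0.0, info_gain=gain, eta=0.1))
```

Because the bonus only changes the reward signal, a sketch like this can sit in front of any policy-optimization routine, which matches the abstract's remark that the approach works with several underlying RL algorithms.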
The bandmerged Planck Early Release Compact Source Catalogue: Probing sub-structure in the molecular gas at high Galactic latitude
The Planck Early Release Compact Source Catalogue (ERCSC) includes nine lists
of highly reliable sources, individually extracted at each of the nine Planck
frequency channels. To facilitate the study of the Planck sources, especially
their spectral behaviour across the radio/infrared frequencies, we provide a
"bandmerged" catalogue of the ERCSC sources. This catalogue consists of 15191
entries, with 79 sources detected in all nine frequency channels of Planck and
6818 sources detected in only one channel. We describe the bandmerging
algorithm, including the various steps used to disentangle sources in confused
regions. The multi-frequency matching allows us to develop spectral energy
distributions of sources between 30 and 857 GHz, in particular across the 100
GHz band, where the energetically important CO J=1->0 line enters the Planck
bandpass. We find ~3-5sigma evidence for contribution to the 100 GHz intensity
from foreground CO along the line of sight to 147 sources with |b|>30 deg. The
median excess contribution is 4.5+/-0.9 percent of their measured 100 GHz flux
density, which cannot be explained by calibration or beam uncertainties. This
translates to 0.5+/-0.1 K km s^{-1} of CO which must be clumped on the scale of
the Planck 100 GHz beam, i.e., ~10 arcmin. If this is due to a population of
low mass (~15 Msun) molecular gas clumps, the total mass in these clumps may be
more than 2000 Msun. Further, high-spatial-resolution, ground-based
observations of the high-latitude sky will help shed light on the origin of
this diffuse, clumpy CO emission. Comment: 15 pages, 15 figures, MNRAS in pres
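The numbers quoted in this abstract can be cross-checked with a few lines of arithmetic. This is only a consistency sketch: it assumes one ~15 Msun clump per affected line of sight, which the abstract does not state, and the function names are hypothetical.

```python
def total_clump_mass(n_sources, clump_mass_msun):
    """Total mass if every source sightline hosts one clump (an assumption)."""
    return n_sources * clump_mass_msun


def relative_uncertainty(value, error):
    """Fractional uncertainty error / value."""
    return error / value


if __name__ == "__main__":
    # 147 sources at |b| > 30 deg, ~15 Msun per clump (values from the abstract).
    mass = total_clump_mass(147, 15.0)
    print(f"total mass ~ {mass:.0f} Msun")

    # The flux excess (4.5 +/- 0.9 percent) and the CO intensity
    # (0.5 +/- 0.1 K km/s) should carry the same fractional uncertainty
    # if one is a linear rescaling of the other.
    print(relative_uncertainty(4.5, 0.9), relative_uncertainty(0.5, 0.1))
```

Under that one-clump-per-source assumption the total comes out slightly above 2000 Msun, in line with the abstract's "more than 2000 Msun", and both quoted measurements carry a fractional uncertainty of about 20 percent.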
Residue theorem and summing over Kaluza-Klein excitations
Applying the equations of motion together with corresponding boundary
conditions of bulk profiles at infrared and ultraviolet branes, we verify some
lemmas on the eigenvalues of Kaluza-Klein modes in the framework of a warped extra
dimension with custodial symmetry. Using the lemmas and performing proper
analytic extensions of bulk profiles, we present the sufficient condition for a
convergent series of Kaluza-Klein excitations and sum over the series through
the residue theorem. The method can also be applied to sum over the infinite
series of Kaluza-Klein excitations in unified extra dimension. Additionally, we
analyze the possible connection between the propagators in the five-dimensional
full theory and the product of bulk profiles with corresponding propagators of
excited Kaluza-Klein modes in the four-dimensional effective theory, and recover
some relations presented in the literature for warped and unified extra dimensions
respectively. As an example, we demonstrate that the corrections from neutral
Higgs to the Wilson coefficients of relevant operators for contain the suppression factor compared
with that from other sectors, and thus can be safely neglected. Comment: 44 pages, no figur
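Summing an infinite series through the residue theorem can be illustrated on a textbook toy case. The series below is not the Kaluza-Klein spectrum treated in the paper; it only shows the mechanism. For f(z) = 1/(z^2 + a^2), the sum of f(n) over all integers n equals minus the sum of the residues of pi*cot(pi*z)*f(z) at the poles z = +i*a and z = -i*a, which gives pi*coth(pi*a)/a. The function names are hypothetical.

```python
import math


def partial_sum(a, n_max):
    """Symmetric partial sum of 1/(n^2 + a^2) over n = -n_max..n_max."""
    positive = sum(1.0 / (n * n + a * a) for n in range(1, n_max + 1))
    return 1.0 / (a * a) + 2.0 * positive


def residue_closed_form(a):
    """Closed form pi*coth(pi*a)/a obtained from the two residues at z = +/- i*a."""
    return math.pi / (a * math.tanh(math.pi * a))


if __name__ == "__main__":
    a = 1.0
    # The truncation error of the partial sum falls off roughly like 2/n_max.
    print(partial_sum(a, 200_000), residue_closed_form(a))
```

The brute-force partial sum converges only slowly, while the residue evaluation gives the limit in closed form, which is the practical appeal of the technique for infinite towers of modes.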