
    Benchmarking Deep Reinforcement Learning for Continuous Control

    Recently, researchers have made significant progress combining the advances in deep learning for learning feature representations with reinforcement learning. Some notable examples include training agents to play Atari games based on raw pixel data and to acquire advanced manipulation skills using raw sensory inputs. However, it has been difficult to quantify progress in the domain of continuous control due to the lack of a commonly adopted benchmark. In this work, we present a benchmark suite of continuous control tasks, including classic tasks like cart-pole swing-up, tasks with very high state and action dimensionality such as 3D humanoid locomotion, tasks with partial observations, and tasks with hierarchical structure. We report novel findings based on the systematic evaluation of a range of implemented reinforcement learning algorithms. Both the benchmark and reference implementations are released at https://github.com/rllab/rllab in order to facilitate experimental reproducibility and to encourage adoption by other researchers. Comment: 14 pages, ICML 201

    InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets

    This paper describes InfoGAN, an information-theoretic extension to the Generative Adversarial Network that is able to learn disentangled representations in a completely unsupervised manner. InfoGAN is a generative adversarial network that also maximizes the mutual information between a small subset of the latent variables and the observation. We derive a lower bound to the mutual information objective that can be optimized efficiently, and show that our training procedure can be interpreted as a variation of the Wake-Sleep algorithm. Specifically, InfoGAN successfully disentangles writing styles from digit shapes on the MNIST dataset, pose from lighting of 3D rendered images, and background digits from the central digit on the SVHN dataset. It also discovers visual concepts that include hair styles, presence/absence of eyeglasses, and emotions on the CelebA face dataset. Experiments show that InfoGAN learns interpretable representations that are competitive with representations learned by existing fully supervised methods.
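
    The lower bound mentioned in the abstract is commonly written as L_I = E[log Q(c|x)] + H(c), where Q is an auxiliary network that predicts the latent code c from a generated sample x. The following is a minimal sketch of a Monte Carlo estimate of that bound, assuming a uniform categorical code; the function name and its inputs are hypothetical and are not taken from the authors' implementation.

```python
import math


def info_lower_bound(q_probs_of_true_code, num_categories):
    """Estimate L_I = E[log Q(c|x)] + H(c) for a uniform categorical code.

    q_probs_of_true_code: for each generated sample, the probability that the
        (hypothetical) auxiliary network Q assigns to the code that produced it.
    num_categories: number of values the categorical code can take.
    """
    # Entropy of a uniform categorical prior over the code.
    entropy = math.log(num_categories)
    # Sample average of log Q(c|x) over the generated batch.
    mean_log_q = sum(math.log(p) for p in q_probs_of_true_code) / len(
        q_probs_of_true_code
    )
    return mean_log_q + entropy
```

    Under these assumptions the estimate is zero when Q does no better than guessing uniformly, and it reaches log(num_categories) when Q recovers every code with certainty.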

    VIME: Variational Information Maximizing Exploration

    Scalable and effective exploration remains a key challenge in reinforcement learning (RL). While there are methods with optimality guarantees in the setting of discrete state and action spaces, these methods cannot be applied in high-dimensional deep RL scenarios. As such, most contemporary RL relies on simple heuristics such as epsilon-greedy exploration or adding Gaussian noise to the controls. This paper introduces Variational Information Maximizing Exploration (VIME), an exploration strategy based on maximization of information gain about the agent's belief of environment dynamics. We propose a practical implementation, using variational inference in Bayesian neural networks, which efficiently handles continuous state and action spaces. VIME modifies the MDP reward function, and can be applied with several different underlying RL algorithms. We demonstrate that VIME achieves significantly better performance compared to heuristic exploration methods across a variety of continuous control tasks and algorithms, including tasks with very sparse rewards. Comment: Published in Advances in Neural Information Processing Systems 29 (NIPS), pages 1109-111
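
    To illustrate how a reward can be modified by information gain, here is a minimal sketch that assumes the belief over dynamics parameters is a fully factorized Gaussian and measures information gain as the KL divergence between the belief after and before an update. The function names and the weight eta=0.1 are hypothetical choices for illustration, not values from the paper.

```python
import math


def kl_diag_gaussians(mu_new, sigma_new, mu_old, sigma_old):
    """KL(q_new || q_old) for two diagonal Gaussians given as per-dimension lists."""
    total = 0.0
    for m1, s1, m0, s0 in zip(mu_new, sigma_new, mu_old, sigma_old):
        # Closed-form KL divergence between two univariate Gaussians.
        total += math.log(s0 / s1) + (s1 ** 2 + (m1 - m0) ** 2) / (2 * s0 ** 2) - 0.5
    return total


def augmented_reward(reward, info_gain, eta=0.1):
    """Add an exploration bonus proportional to information gain (eta is assumed)."""
    return reward + eta * info_gain
```

    A transition that leaves the belief unchanged earns no bonus, while a transition that shifts the belief earns a bonus even when the environment reward is zero, which is the situation that matters in sparse-reward tasks.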

    The bandmerged Planck Early Release Compact Source Catalogue: Probing sub-structure in the molecular gas at high Galactic latitude

    The Planck Early Release Compact Source Catalogue (ERCSC) includes nine lists of highly reliable sources, individually extracted at each of the nine Planck frequency channels. To facilitate the study of the Planck sources, especially their spectral behaviour across the radio/infrared frequencies, we provide a "bandmerged" catalogue of the ERCSC sources. This catalogue consists of 15191 entries, with 79 sources detected in all nine frequency channels of Planck and 6818 sources detected in only one channel. We describe the bandmerging algorithm, including the various steps used to disentangle sources in confused regions. The multi-frequency matching allows us to develop spectral energy distributions of sources between 30 and 857 GHz, in particular across the 100 GHz band, where the energetically important CO J=1→0 line enters the Planck bandpass. We find ~3-5σ evidence for contribution to the 100 GHz intensity from foreground CO along the line of sight to 147 sources with |b|>30 deg. The median excess contribution is 4.5 ± 0.9 percent of their measured 100 GHz flux density, which cannot be explained by calibration or beam uncertainties. This translates to 0.5 ± 0.1 K km s^{-1} of CO which must be clumped on the scale of the Planck 100 GHz beam, i.e., ~10 arcmin. If this is due to a population of low mass (~15 Msun) molecular gas clumps, the total mass in these clumps may be more than 2000 Msun. Further, high-spatial-resolution, ground-based observations of the high-latitude sky will help shed light on the origin of this diffuse, clumpy CO emission. Comment: 15 pages, 15 figures, MNRAS in pres

    Residue theorem and summing over Kaluza-Klein excitations

    Applying the equations of motion together with the corresponding boundary conditions of bulk profiles at the infrared and ultraviolet branes, we verify some lemmas on the eigenvalues of Kaluza-Klein modes in the framework of a warped extra dimension with the custodial symmetry $SU(3)_c\times SU(2)_L\times SU(2)_R\times U(1)_X\times P_{LR}$. Using the lemmas and performing proper analytic extensions of the bulk profiles, we present the sufficient condition for a convergent series of Kaluza-Klein excitations and sum over the series through the residue theorem. The method can also be applied to sum over the infinite series of Kaluza-Klein excitations in unified extra dimension. Additionally, we analyze the possible connection between the propagators in the five-dimensional full theory and the product of bulk profiles with the corresponding propagators of excited Kaluza-Klein modes in the four-dimensional effective theory, and recover some relations presented in the literature for warped and unified extra dimensions, respectively. As an example, we demonstrate that the corrections from the neutral Higgs to the Wilson coefficients of relevant operators for $B\rightarrow X_s\gamma$ contain the suppression factor $m_b^3m_s/m_{_{\rm w}}^4$ compared with those from other sectors, and thus can be safely neglected. Comment: 44 pages, no figur
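
    As a generic illustration of summing an infinite series with the residue theorem (not the Kaluza-Klein series treated in the paper), the textbook identity sum over all integers n of 1/(n^2 + a^2) = (pi/a) coth(pi a) follows from the residues of pi cot(pi z)/(z^2 + a^2) at z = ±ia. The sketch below, with hypothetical function names, compares a truncated sum against that closed form.

```python
import math


def partial_sum(a, n_max):
    """Symmetric partial sum of 1/(n^2 + a^2) over n = -n_max, ..., n_max."""
    total = 1.0 / (a * a)  # the n = 0 term
    for n in range(1, n_max + 1):
        # The terms for +n and -n are equal, so count each pair once.
        total += 2.0 / (n * n + a * a)
    return total


def closed_form(a):
    """Residue-theorem result (pi/a) * coth(pi * a), valid for a > 0."""
    return math.pi / (a * math.tanh(math.pi * a))
```

    The truncated sum approaches the closed form from below, with a remainder of roughly 2/n_max, so the two agree to about four decimal places once n_max is of order 10^5.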