1,099 research outputs found

    Resampled Priors for Variational Autoencoders

    We propose Learned Accept/Reject Sampling (LARS), a method for constructing richer priors using rejection sampling with a learned acceptance function. This work is motivated by recent analyses of the VAE objective, which pointed out that commonly used simple priors can lead to underfitting. As the distribution induced by LARS involves an intractable normalizing constant, we show how to estimate it and its gradients efficiently. We demonstrate that LARS priors improve VAE performance on several standard datasets, both when they are learned jointly with the rest of the model and when they are fitted to a pretrained model. Finally, we show that LARS can be combined with existing methods for defining flexible priors for an additional boost in performance.
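
    The construction described in the abstract is easy to sketch: draw z from a simple proposal prior pi(z), accept with learned probability a(z), and estimate the normalizing constant Z = E_pi[a(z)] by Monte Carlo. Below is a minimal NumPy sketch under our own assumptions (the `accept_prob` function stands in for the learned acceptance network, and truncating after a fixed number of proposals to bound sampling cost is our simplification, not taken from the abstract):

```python
import numpy as np

rng = np.random.default_rng(0)

def accept_prob(z):
    # Stand-in for the learned acceptance function a(z) in (0, 1);
    # in LARS this would be a small neural network.
    return 1.0 / (1.0 + np.exp(-(z[..., 0] - z[..., 1])))

def sample_resampled_prior(dim=2, max_proposals=100):
    # Rejection-sample from p(z) proportional to pi(z) * a(z), pi = N(0, I).
    # Accepting the final proposal unconditionally bounds the cost.
    for t in range(max_proposals):
        z = rng.standard_normal(dim)
        if rng.uniform() < accept_prob(z) or t == max_proposals - 1:
            return z

def log_normalizer(n_samples=100_000, dim=2):
    # Monte Carlo estimate of log Z, where Z = E_{pi(z)}[a(z)].
    z = rng.standard_normal((n_samples, dim))
    return float(np.log(accept_prob(z).mean()))

print(sample_resampled_prior(), log_normalizer())
```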

    Conditional Restricted Boltzmann Machines for Structured Output Prediction

    Conditional Restricted Boltzmann Machines (CRBMs) are rich probabilistic models that have recently been applied to a wide range of problems, including collaborative filtering, classification, and modeling motion capture data. While much progress has been made in training non-conditional RBMs, these training algorithms are not applicable to conditional models, and there has been almost no work on training and generating predictions from conditional RBMs for structured output problems. We first argue that standard Contrastive Divergence-based learning may not be suitable for training CRBMs. We then identify two distinct types of structured output prediction problems and propose an improved learning algorithm for each. The first problem type is one where the output space has arbitrary structure but the set of likely output configurations is relatively small, such as in multi-label classification. The second is one where the output space is also arbitrarily structured but its variability is much greater, such as in image denoising or pixel labeling. We show that the new learning algorithms can work much better than Contrastive Divergence on both types of problems.
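
    For the first problem type, where the set of likely outputs is small, exact inference is feasible by enumerating candidate outputs and comparing their free energies under the model. A toy sketch, using our own simplified CRBM parameterization in which the input x only shifts the output and hidden biases:

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(0)

# Toy CRBM over binary outputs y with hidden units h, conditioned on x.
n_x, n_y, n_h = 4, 3, 5
W = rng.normal(scale=0.1, size=(n_h, n_y))   # hidden-to-output weights
A = rng.normal(scale=0.1, size=(n_y, n_x))   # input-to-output bias weights
B = rng.normal(scale=0.1, size=(n_h, n_x))   # input-to-hidden bias weights
b, c = np.zeros(n_y), np.zeros(n_h)

def free_energy(y, x):
    # F(y | x) = -(b + A x) . y - sum_j softplus(c_j + B_j x + W_j . y)
    pre = c + B @ x + W @ y
    return -(b + A @ x) @ y - np.logaddexp(0.0, pre).sum()

def predict(x):
    # With few plausible outputs (e.g. multi-label classification),
    # enumerate all 2^n_y labelings and return the lowest free energy.
    candidates = [np.array(y, float) for y in product([0, 1], repeat=n_y)]
    return min(candidates, key=lambda y: free_energy(y, x))

print(predict(rng.normal(size=n_x)))
```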

    A Policy Search Method For Temporal Logic Specified Reinforcement Learning Tasks

    Reward engineering is an important aspect of reinforcement learning: whether or not the user's intentions can be correctly encapsulated in the reward function can significantly impact the learning outcome. Current methods rely on manually crafted reward functions that often require parameter tuning to obtain the desired behavior, an operation that can be expensive when exploration requires systems to interact with the physical world. In this paper, we explore the use of temporal logic (TL) to specify tasks in reinforcement learning. A TL formula can be translated into a real-valued function that measures its level of satisfaction against a trajectory. We take advantage of this function and propose temporal logic policy search (TLPS), a model-free learning technique that finds a policy satisfying the TL specification. A set of simulated experiments is conducted to evaluate the proposed approach.
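
    The key ingredient is the quantitative (robustness) semantics of TL: a real value that is positive if and only if the trajectory satisfies the formula, with magnitude measuring the margin of satisfaction. A minimal sketch for one illustrative formula, "eventually, always inside a box" (F G), where the formula, region, and trajectory are our own assumptions rather than the paper's setup:

```python
import numpy as np

def robustness_eventually_always(traj, lo, hi):
    # Quantitative semantics for F G (lo <= s <= hi): the trajectory
    # eventually enters the box [lo, hi] and stays there. Positive
    # robustness means the formula is satisfied.
    margin = np.minimum(traj - lo, hi - traj).min(axis=-1)
    # G over each suffix (running min from the end), then F (max) over
    # candidate start times.
    suffix_min = np.minimum.accumulate(margin[::-1])[::-1]
    return float(suffix_min.max())

# In a TLPS-style setup, this robustness value would serve as the
# objective a model-free policy search tries to maximize.
traj = np.cumsum(np.random.default_rng(0).normal(size=(50, 2)), axis=0) * 0.05
print(robustness_eventually_always(traj, -1.0, 1.0))
```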

    Deep Ordinal Reinforcement Learning

    Reinforcement learning usually makes use of numerical rewards, which have nice properties but also come with drawbacks and difficulties. Using rewards on an ordinal scale (ordinal rewards) is an alternative to numerical rewards that has received more attention in recent years. In this paper, a general approach to adapting reinforcement learning problems to the use of ordinal rewards is presented and motivated. We show how to convert common reinforcement learning algorithms to an ordinal variation using the example of Q-learning, and introduce Ordinal Deep Q-Networks, which adapt deep reinforcement learning to ordinal rewards. Additionally, we run evaluations on problems provided by the OpenAI Gym framework, showing that our ordinal variants exhibit performance comparable to the numerical variants for a number of problems. We also give first evidence that our ordinal variant is able to produce better results for problems with less engineered and simpler-to-design reward signals.

    Comment: replaced figures for better visibility, added GitHub repository, more details about the source of experimental results, updated target value calculation for standard and ordinal Deep Q-Networks
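
    A tabular sketch of the idea, using our own simplification rather than the paper's exact update rule: instead of a scalar Q(s, a), keep a distribution over K ordinal reward ranks per state-action pair, and rank actions by an expected-rank score (one of several possible ordinal action-selection measures):

```python
import numpy as np

class OrdinalQ:
    # Sketch of tabular ordinal Q-learning: per (state, action) we keep
    # a distribution over K ordinal reward ranks instead of a scalar Q.
    def __init__(self, n_states, n_actions, n_ranks, lr=0.1):
        self.dist = np.full((n_states, n_actions, n_ranks), 1.0 / n_ranks)
        self.lr = lr

    def score(self, s):
        # Expected rank index per action; higher ranks are better.
        ranks = np.arange(self.dist.shape[-1])
        return (self.dist[s] * ranks).sum(axis=-1)

    def update(self, s, a, rank, s_next):
        # Move the distribution toward a one-hot at the observed rank,
        # mixed with the distribution of the greedy next action (a
        # distributional analogue of the Bellman target).
        target = np.zeros_like(self.dist[s, a])
        target[rank] = 1.0
        a_next = int(self.score(s_next).argmax())
        target = 0.5 * target + 0.5 * self.dist[s_next, a_next]
        self.dist[s, a] += self.lr * (target - self.dist[s, a])

agent = OrdinalQ(n_states=5, n_actions=2, n_ranks=3)
agent.update(s=0, a=1, rank=2, s_next=3)
print(agent.score(0))
```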

    Using Monte Carlo Search With Data Aggregation to Improve Robot Soccer Policies

    RoboCup soccer competitions are considered among the most challenging multi-robot adversarial environments, due to their high dynamism and the partial observability of the environment. In this paper we introduce a method based on a combination of Monte Carlo search and data aggregation (MCSDA) to adapt discrete-action soccer policies for a defender robot to the strategy of the opponent team. By exploiting a simple representation of the domain, a supervised learning algorithm is trained on an initial collection of data consisting of several simulations of human expert policies. Monte Carlo policy rollouts are then generated and aggregated with previous data to improve the learned policy over multiple epochs and games. The proposed approach has been extensively tested both on a soccer-dedicated simulator and on real robots. Using this method, our learning robot soccer team achieves an improvement in ball interceptions as well as a reduction in the number of opponents' goals. Together with this better performance, an overall more efficient positioning of the whole team within the field is achieved.
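
    A sketch of the MCSDA loop under our own toy assumptions (a 1-D "chase the ball" stand-in for the soccer domain, and a nearest-neighbour learner standing in for the supervised policy): each epoch relabels visited states with the Monte Carlo-best action, aggregates the new pairs with all earlier data, DAgger-style, and refits the policy.

```python
import numpy as np

rng = np.random.default_rng(0)
ACTIONS = (-1, 0, 1)

def simulate(s, a):
    # Hypothetical toy dynamics: the defender moves by a, the ball
    # drifts randomly; reward favours staying close to the ball.
    d, ball = s
    d, ball = d + a, ball + int(rng.integers(-1, 2))
    return (d, ball), -abs(d - ball)

def mc_value(state, action, policy, depth=10, n_rollouts=8):
    # Average return of Monte Carlo rollouts taking `action` first,
    # then following the current policy.
    total = 0.0
    for _ in range(n_rollouts):
        s, a, ret = state, action, 0.0
        for _ in range(depth):
            s, r = simulate(s, a)
            ret += r
            a = policy(s)
        total += ret
    return total / n_rollouts

def make_policy(dataset):
    # Nearest-neighbour "classifier" standing in for the learner.
    def policy(s):
        if not dataset:
            return 0
        xs, ys = zip(*dataset)
        i = min(range(len(xs)),
                key=lambda j: abs(xs[j][0] - s[0]) + abs(xs[j][1] - s[1]))
        return ys[i]
    return policy

dataset, policy = [], make_policy([])
for epoch in range(3):
    # Relabel freshly visited states with the MC-best action, then
    # aggregate with all earlier data before refitting.
    states = [(int(rng.integers(-5, 6)), int(rng.integers(-5, 6)))
              for _ in range(20)]
    dataset += [(s, max(ACTIONS, key=lambda a: mc_value(s, a, policy)))
                for s in states]
    policy = make_policy(dataset)
print(policy((2, -1)))
```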

    Measuring collaborative emergent behavior in multi-agent reinforcement learning

    Multi-agent reinforcement learning (RL) has important implications for the future of human-agent teaming. We show that improved performance with multi-agent RL is not a guarantee of the collaborative behavior thought to be important for solving multi-agent tasks. To address this, we present a novel approach for quantitatively assessing collaboration in continuous spatial tasks with multi-agent RL. Such a metric is useful for measuring collaboration between computational agents and may serve as a training signal for collaboration in future RL paradigms involving humans.

    Comment: 1st International Conference on Human Systems Engineering and Design, 6 pages, 2 figures, 1 table
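
    The paper's metric is task-specific; as an illustration of the general idea only (entirely our own stand-in, not the authors' definition), one could score spatial coordination by correlating two agents' step-wise movement directions over an episode:

```python
import numpy as np

def collaboration_score(traj_a, traj_b):
    # Hypothetical spatial coordination metric: average cosine
    # similarity of the two agents' per-step movement directions.
    # Coordinated agents move in related directions more often than
    # independently acting ones; the score lies in [-1, 1].
    va, vb = np.diff(traj_a, axis=0), np.diff(traj_b, axis=0)
    dots = (va * vb).sum(axis=1)
    norms = np.linalg.norm(va, axis=1) * np.linalg.norm(vb, axis=1) + 1e-8
    return float((dots / norms).mean())

rng = np.random.default_rng(0)
shared = rng.normal(size=(100, 2))  # common task signal both agents track
traj_a = np.cumsum(shared + 0.1 * rng.normal(size=(100, 2)), axis=0)
traj_b = np.cumsum(shared + 0.1 * rng.normal(size=(100, 2)), axis=0)
print(collaboration_score(traj_a, traj_b))
```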