3,419 research outputs found
Training Agents using Upside-Down Reinforcement Learning
Traditional Reinforcement Learning (RL) algorithms either predict rewards
with value functions or maximize them using policy search. We study an
alternative: Upside-Down Reinforcement Learning (Upside-Down RL or UDRL), that
solves RL problems primarily using supervised learning techniques. Many of its
main principles are outlined in a companion report [34]. Here we present the
first concrete implementation of UDRL and demonstrate its feasibility on
certain episodic learning problems. Experimental results show that its
performance can be surprisingly competitive with, and even exceed, that of
traditional baseline algorithms developed over decades of research.
Comment: NNAISENSE Technical Report. 17 pages, 6 figures
Emergence of Locomotion Behaviours in Rich Environments
The reinforcement learning paradigm allows, in principle, for complex
behaviours to be learned directly from simple reward signals. In practice,
however, it is common to carefully hand-design the reward function to encourage
a particular solution, or to derive it from demonstration data. In this paper
we explore how a rich environment can help to promote the learning of complex
behavior. Specifically, we train agents in diverse environmental contexts, and
find that this encourages the emergence of robust behaviours that perform well
across a suite of tasks. We demonstrate this principle for locomotion --
behaviours that are known for their sensitivity to the choice of reward. We
train several simulated bodies on a diverse set of challenging terrains and
obstacles, using a simple reward function based on forward progress. Using a
novel scalable variant of policy gradient reinforcement learning, our agents
learn to run, jump, crouch and turn as required by the environment without
explicit reward-based guidance. A visual depiction of highlights of the learned
behavior can be viewed following https://youtu.be/hx_bgoTF7bs
Agent Inspired Trading Using Recurrent Reinforcement Learning and LSTM Neural Networks
With breakthroughs in computational power and deep neural networks, many
areas that could not previously be explored have become feasible to tackle
with techniques that were rigorously researched in the past. In this paper, we
walk through possible concepts for achieving robo-like trading or advising. To
reach a level of performance and generality similar to that of a human trader,
our agents learn for themselves to create successful strategies that lead to
human-level long-term rewards. The learning model is implemented in Long
Short-Term Memory (LSTM) recurrent structures, with Reinforcement Learning or
Evolution Strategies acting as agents. The robustness and feasibility of the
system are verified on GBPUSD trading.
[Title unreadable in source]
Department of Computer Science and Engineering
Recently, deep reinforcement learning (DRL) algorithms have shown superhuman performance in simulated game domains. In practice, sample efficiency is also one of the most important measures of a model's performance. Especially in environments with large search spaces (e.g. continuous action spaces), it is critical for achieving state-of-the-art performance.
In this thesis, we design a model applicable to multi-end games in continuous space with high sample efficiency. A multi-end game consists of several sub-games that are independent of each other but affect the result of the game according to the rules of its domain. We verify the algorithm in a simulated curling environment.
Towards continuous control of flippers for a multi-terrain robot using deep reinforcement learning
In this paper we focus on developing a control algorithm for multi-terrain
tracked robots with flippers using a reinforcement learning (RL) approach. The
work is based on the deep deterministic policy gradient (DDPG) algorithm,
proven to be very successful in simple simulation environments. The algorithm
works in an end-to-end fashion in order to control the continuous position of
the flippers. This end-to-end approach makes it easy to apply the controller to
a wide array of circumstances, but this flexibility comes at the cost of an
increased difficulty of solution. The complexity of the task is increased
further by the fact that real multi-terrain robots move in partially observable
environments. Notwithstanding these complications, being able to smoothly
control a multi-terrain robot can produce huge benefits in the daily lives of
impaired people or in search and rescue situations.
Comment: 12 pages, single column, submitted to International Journal of
Robotics and Automation (IJRA)
Poker-CNN: A Pattern Learning Strategy for Making Draws and Bets in Poker Games
Poker is a family of card games that includes many variations. We hypothesize
that most poker games can be solved as a pattern matching problem, and propose
creating a strong poker playing system based on a unified poker representation.
Our poker player learns through iterative self-play, and improves its
understanding of the game by training on the results of its previous actions
without sophisticated domain knowledge. We evaluate our system on three poker
games: single player video poker, two-player Limit Texas Hold'em, and finally
two-player 2-7 triple draw poker. We show that our model can quickly learn
patterns in these very different poker games while it improves from zero
knowledge to a competitive player against human experts.
The contributions of this paper include: (1) a novel representation for poker
games, extendable to different poker variations, (2) a CNN based learning model
that can effectively learn the patterns in three different games, and (3) a
self-trained system that significantly beats the heuristic-based program on
which it is trained, and our system is competitive against human expert
players.
Comment: 8 pages
Deep reinforcement learning on a multi-asset environment for trading
Financial trading has been widely analyzed for decades with market
participants and academics always looking for advanced methods to improve
trading performance. Deep reinforcement learning (DRL), a recently
reinvigorated method with significant success in multiple domains, still has to
show its benefit in the financial markets. We use a deep Q-network (DQN) to
design long-short trading strategies for futures contracts. The state space
consists of volatility-normalized daily returns, with buying or selling being
the reinforcement learning action and the total reward defined as the
cumulative profits from our actions. Our trading strategy is trained and tested
both on real and simulated price series and we compare the results with an
index benchmark. We analyze how training based on a combination of artificial
data and actual price series can be successfully deployed in real markets. The
trained reinforcement learning agent is applied to trading the E-mini S&P 500
continuous futures contract. Our results in this study are preliminary and need
further improvement.
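The state and reward described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not say how volatility is estimated, so the trailing-window sample standard deviation, the `window` parameter, and all function names here are assumptions.

```python
import math


def daily_returns(prices):
    """Simple returns r_t = p_t / p_{t-1} - 1 for consecutive prices."""
    return [prices[i] / prices[i - 1] - 1.0 for i in range(1, len(prices))]


def volatility_normalized(returns, window=3):
    """Divide each return by the sample standard deviation of the
    `window` returns that precede it (assumed volatility estimator)."""
    normalized = []
    for t in range(window, len(returns)):
        past = returns[t - window:t]
        mean = sum(past) / window
        variance = sum((r - mean) ** 2 for r in past) / (window - 1)
        vol = math.sqrt(variance)
        # Guard against a flat window, where volatility is zero.
        normalized.append(returns[t] / vol if vol > 0 else 0.0)
    return normalized


def cumulative_profit(returns, positions):
    """Total reward: sum of position * return, where positions[t] is
    +1 (long) or -1 (short) and is chosen before returns[t] is realized."""
    return sum(p * r for p, r in zip(positions, returns))
```

In this sketch, an agent that is long on up days and short on down days accumulates the absolute value of every return, which is the upper bound on the cumulative-profit reward for unit positions.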
REPLAB: A Reproducible Low-Cost Arm Benchmark Platform for Robotic Learning
Standardized evaluation measures have aided in the progress of machine
learning approaches in disciplines such as computer vision and machine
translation. In this paper, we make the case that robotic learning would also
benefit from benchmarking, and present the "REPLAB" platform for benchmarking
vision-based manipulation tasks. REPLAB is a reproducible and self-contained
hardware stack (robot arm, camera, and workspace) that costs about 2000 USD,
occupies a cuboid of size 70x40x60 cm, and permits full assembly within a few
hours. Through this low-cost, compact design, REPLAB aims to drive wide
participation by lowering the barrier to entry into robotics and to enable easy
scaling to many robots. We envision REPLAB as a framework for reproducible
research across manipulation tasks, and as a step in this direction, we define
a template for a grasping benchmark consisting of a task definition, evaluation
protocol, performance measures, and a dataset of 92k grasp attempts. We
implement, evaluate, and analyze several previously proposed grasping
approaches to establish baselines for this benchmark. Finally, we also
implement and evaluate a deep reinforcement learning approach for 3D reaching
tasks on our REPLAB platform. Project page with assembly instructions, code,
and videos: https://goo.gl/5F9dP4.
Comment: Extended version of paper accepted to ICRA 201
Beyond Reward: Offline Preference-guided Policy Optimization
This study focuses on the topic of offline preference-based reinforcement
learning (PbRL), a variant of conventional reinforcement learning that
dispenses with the need for online interaction or specification of reward
functions. Instead, the agent is provided with fixed offline trajectories and
human preferences between pairs of trajectories to extract the dynamics and
task information, respectively. Since the dynamics and task information are
orthogonal, a naive approach would involve using preference-based reward
learning followed by an off-the-shelf offline RL algorithm. However, this
requires the separate learning of a scalar reward function, which is assumed to
be an information bottleneck of the learning process. To address this issue, we
propose the offline preference-guided policy optimization (OPPO) paradigm,
which models offline trajectories and preferences in a one-step process,
eliminating the need for separately learning a reward function. OPPO achieves
this by introducing an offline hindsight information matching objective for
optimizing a contextual policy and a preference modeling objective for finding
the optimal context. OPPO further integrates a well-performing decision policy
by optimizing the two objectives iteratively. Our empirical results demonstrate
that OPPO effectively models offline preferences and outperforms prior
competing baselines, including offline RL algorithms performed over either true
or pseudo reward function specifications. Our code is available on the project
website: https://sites.google.com/view/oppo-icml-2023
Extending Deep Reinforcement Learning Frameworks in Cryptocurrency Market Making
There has been a recent surge in interest in the application of artificial
intelligence to automated trading. Reinforcement learning has been applied to
single- and multi-instrument use cases, such as market making or portfolio
management. This paper proposes a new approach to framing cryptocurrency market
making as a reinforcement learning challenge by introducing an event-based
environment wherein an event is defined as a change in price greater or less
than a given threshold, as opposed to by tick or time-based events (e.g., every
minute, hour, day, etc.). Two policy-based agents are trained to learn a market
making trading strategy using eight days of training data, and their
performance is evaluated using 30 days of testing data. Limit order book data recorded from
Bitmex exchange is used to validate this approach, which demonstrates improved
profit and stability compared to a time-based approach for both agents when
using a simple multi-layer perceptron neural network for function approximation
and seven different reward functions.
Comment: 16 pages, 3 figures, 5 tables, 23 equations
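The event-based sampling described in this abstract can be illustrated with a short sketch. It assumes that the price change is measured relative to the price at the previous event and that the threshold is a fraction (e.g. 0.01 for 1%); the abstract does not specify either detail, and the function name `event_indices` is hypothetical.

```python
def event_indices(prices, threshold):
    """Return the indices at which the price has moved up or down by more
    than `threshold` (as a fraction) since the last event. The first
    price serves as the initial reference and is not itself an event."""
    if not prices:
        return []
    events = []
    reference = prices[0]
    for i in range(1, len(prices)):
        change = abs(prices[i] - reference) / reference
        if change > threshold:
            events.append(i)
            reference = prices[i]  # Measure the next move from this event.
    return events


# Example: with a 1% threshold, only the two large moves trigger a step.
prices = [100.0, 100.2, 101.5, 101.4, 99.0, 99.1]
steps = event_indices(prices, threshold=0.01)
```

Under this scheme the agent acts only at the returned indices, so quiet periods produce no environment steps while volatile periods produce many, in contrast to a fixed tick or time grid.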