Deep Reinforcement Learning from Self-Play in Imperfect-Information Games
Many real-world applications can be described as large-scale games of
imperfect information. To deal with these challenging domains, prior work has
focused on computing Nash equilibria in a handcrafted abstraction of the
domain. In this paper we introduce the first scalable end-to-end approach to
learning approximate Nash equilibria without prior domain knowledge. Our method
combines fictitious self-play with deep reinforcement learning. When applied to
Leduc poker, Neural Fictitious Self-Play (NFSP) approached a Nash equilibrium,
whereas common reinforcement learning methods diverged. In Limit Texas Hold'em,
a poker game of real-world scale, NFSP learnt a strategy that approached the
performance of state-of-the-art, superhuman algorithms based on significant
domain expertise.
Comment: updated version, incorporating conference feedback
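The key mechanism in NFSP is mixing two policies: a best response learned by deep Q-learning and an average policy learned by supervised classification of the agent's own best-response actions. A minimal sketch of that mixing step, with both policies as stand-in callables and the anticipatory parameter `eta` as described in the abstract (class and parameter names here are illustrative, not the paper's code):

```python
import random

class NFSPAgentSketch:
    """Minimal sketch of NFSP's policy mixing.

    A full NFSP agent trains two networks: a best-response policy via
    deep Q-learning and an average policy via supervised learning on
    its own best-response actions. Both are stand-in callables here.
    """

    def __init__(self, best_response, average_policy, eta=0.1):
        self.best_response = best_response    # e.g. greedy w.r.t. a DQN
        self.average_policy = average_policy  # supervised average strategy
        self.eta = eta                        # anticipatory parameter

    def act(self, state):
        # With probability eta, play the best response and flag the
        # (state, action) pair for the supervised-learning buffer;
        # otherwise play the average (approximate self-play) policy.
        if random.random() < self.eta:
            return self.best_response(state), True
        return self.average_policy(state), False
```

In self-play, every agent following this mixed policy lets the average-policy network track the time-averaged strategy, which is what converges toward a Nash equilibrium in fictitious play.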
Optimizing Execution Cost Using Stochastic Control
We devise an optimal allocation strategy for the execution of a predefined
number of stocks in a given time frame using the technique of discrete-time
Stochastic Control Theory for a defined market model. This market structure
allows an instant execution of the market orders and has been analyzed based on
the assumption of discretized geometric movement of the stock prices. We
consider two different cost functions where the first function involves just
the fiscal cost while the cost function of the second kind incorporates the
risks of non-strategic constrained investments along with fiscal costs.
Precisely, the strategic development of constrained execution of K stocks
within a stipulated time frame of T units is established mathematically using a
well-defined stochastic behaviour of stock prices and the same is compared with
some of the commonly-used execution strategies using the historical stock price
data.
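As a point of reference for the constrained-execution problem above: under the simplest discrete-time setting with linear, permanent price impact and random-walk prices (a classical benchmark, not necessarily the paper's geometric model), the dynamic-programming solution reduces to an even split of the K shares across the T periods. A hypothetical sketch of that baseline schedule:

```python
def naive_execution_schedule(K, T):
    """Distribute K shares over T integer-sized trades as evenly as
    possible. Under linear permanent impact and arithmetic random-walk
    prices, dynamic programming shows this even split minimizes
    expected execution cost; richer models (e.g. geometric prices or
    risk-penalized objectives, as in the abstract) shift the optimum.
    """
    base, rem = divmod(K, T)
    # Front-load the remainder so every period trades base or base+1.
    return [base + (1 if t < rem else 0) for t in range(T)]
```

Comparing a model-derived strategy against this naive split is a common sanity check when evaluating execution algorithms on historical data.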
DeepLOB: Deep Convolutional Neural Networks for Limit Order Books
We develop a large-scale deep learning model to predict price movements from
limit order book (LOB) data of cash equities. The architecture utilises
convolutional filters to capture the spatial structure of the limit order books
as well as LSTM modules to capture longer time dependencies. The proposed
network outperforms all existing state-of-the-art algorithms on the benchmark
LOB dataset [1]. In a more realistic setting, we test our model by using one
year market quotes from the London Stock Exchange and the model delivers a
remarkably stable out-of-sample prediction accuracy for a variety of
instruments. Importantly, our model translates well to instruments which were
not part of the training set, indicating the model's ability to extract
universal features. In order to better understand these features and to go
beyond a "black box" model, we perform a sensitivity analysis to understand the
rationale behind the model predictions and reveal the components of LOBs that
are most relevant. The ability to extract robust features which translate well
to other instruments is an important property of our model which has many other
applications.
Comment: 12 pages, 9 figures
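The abstract's combination of convolutional filters over the book's spatial structure and an LSTM over time can be sketched as follows. Layer sizes and the input convention (10 levels x price/volume x 2 sides = 40 features per time step) are illustrative simplifications, not the paper's exact architecture:

```python
import torch
import torch.nn as nn

class DeepLOBSketch(nn.Module):
    """Simplified CNN+LSTM in the spirit of the described model.
    Input shape: (batch, 1, time_steps, 40)."""

    def __init__(self, hidden=64, n_classes=3):
        super().__init__()
        # Convolutions capture spatial structure across LOB levels:
        # first pair price with volume, then merge sides, then collapse
        # the remaining 10 levels into one feature column.
        self.conv = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=(1, 2), stride=(1, 2)), nn.ReLU(),
            nn.Conv2d(16, 16, kernel_size=(1, 2), stride=(1, 2)), nn.ReLU(),
            nn.Conv2d(16, 16, kernel_size=(1, 10)), nn.ReLU(),
        )
        # LSTM captures longer time dependencies over the conv features.
        self.lstm = nn.LSTM(input_size=16, hidden_size=hidden,
                            batch_first=True)
        self.head = nn.Linear(hidden, n_classes)  # up / stationary / down

    def forward(self, x):
        f = self.conv(x)                   # (B, 16, T, 1)
        f = f.squeeze(3).permute(0, 2, 1)  # (B, T, 16)
        out, _ = self.lstm(f)
        return self.head(out[:, -1])       # classify from the last step
```

The three-class head matches the usual mid-price movement labels (up, stationary, down) used with benchmark LOB datasets.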
Reinforcement Learning Applications in Real Time Trading
This study focuses on applying reinforcement learning techniques to real-time trading. We first briefly introduce the concept of reinforcement learning and the definition of a reward function, and review previous studies as foundations for why reinforcement learning can work, specifically in the setting of financial trading. We demonstrate that it is possible to apply reinforcement learning to produce a valid, simple, and profitable trading strategy in a daily setting (one trade a day), and show an example of intraday trading with reinforcement learning. We use a modified Q-learning algorithm in this scenario to optimize trading results. We also interpret the output policy of reinforcement learning, and illustrate that its output is not completely void of economic sense.
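The tabular Q-learning update underlying such an approach can be sketched briefly. The state/action encoding here is hypothetical (e.g. state = sign of yesterday's return, action = today's position, reward = the position's daily P&L) and stands in for whatever modification the study uses:

```python
from collections import defaultdict

ACTIONS = ("long", "flat", "short")

def q_update(Q, state, action, reward, next_state,
             alpha=0.1, gamma=0.99):
    """One standard Q-learning step on a tabular Q function.

    Q is a defaultdict(float) keyed by (state, action). alpha is the
    learning rate, gamma the discount factor; both values are
    illustrative defaults, not the study's settings.
    """
    best_next = max(Q[(next_state, a)] for a in ACTIONS)
    # Temporal-difference update toward reward + discounted best value.
    Q[(state, action)] += alpha * (reward + gamma * best_next
                                   - Q[(state, action)])
    return Q[(state, action)]
```

Inspecting the learned table (which action each state maps to) is what makes the output policy interpretable in economic terms, as the abstract notes.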
Asynchronous Deep Double Duelling Q-Learning for Trading-Signal Execution in Limit Order Book Markets
We employ deep reinforcement learning (RL) to train an agent to successfully
translate a high-frequency trading signal into a trading strategy that places
individual limit orders. Based on the ABIDES limit order book simulator, we
build a reinforcement learning OpenAI gym environment and utilise it to
simulate a realistic trading environment for NASDAQ equities based on historic
order book messages. To train a trading agent that learns to maximise its
trading return in this environment, we use Deep Duelling Double Q-learning with
the APEX (asynchronous prioritised experience replay) architecture. The agent
observes the current limit order book state, its recent history, and a
short-term directional forecast. To investigate the performance of RL for
adaptive trading independently from a concrete forecasting algorithm, we study
the performance of our approach utilising synthetic alpha signals obtained by
perturbing forward-looking returns with varying levels of noise. Here, we find
that the RL agent learns an effective trading strategy for inventory management
and order placing that outperforms a heuristic benchmark trading strategy
having access to the same signal.
Comment: 10 pages
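The synthetic-alpha construction described above, a forward-looking return series perturbed with varying levels of noise, can be sketched as follows; the scaling convention (noise standard deviation proportional to the signal's own standard deviation) is an assumption, not necessarily the paper's exact construction:

```python
import numpy as np

def synthetic_alpha(future_returns, noise_level, rng=None):
    """Build a synthetic directional signal from forward-looking returns.

    noise_level = 0 yields a perfect oracle; larger values degrade the
    signal toward pure noise, letting one study RL trading performance
    as a function of signal quality, independent of any concrete
    forecasting model.
    """
    rng = rng or np.random.default_rng(0)
    r = np.asarray(future_returns, dtype=float)
    noise = rng.normal(0.0, noise_level * r.std(), size=r.shape)
    return r + noise
```

Sweeping `noise_level` over a grid then maps out how much forecast accuracy the RL agent needs before it beats the heuristic benchmark.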