
    Deep Reinforcement Learning from Self-Play in Imperfect-Information Games

    Many real-world applications can be described as large-scale games of imperfect information. To deal with these challenging domains, prior work has focused on computing Nash equilibria in a handcrafted abstraction of the domain. In this paper we introduce the first scalable end-to-end approach to learning approximate Nash equilibria without prior domain knowledge. Our method combines fictitious self-play with deep reinforcement learning. When applied to Leduc poker, Neural Fictitious Self-Play (NFSP) approached a Nash equilibrium, whereas common reinforcement learning methods diverged. In Limit Texas Hold'em, a poker game of real-world scale, NFSP learnt a strategy that approached the performance of state-of-the-art, superhuman algorithms based on significant domain expertise.
    Comment: updated version, incorporating conference feedback
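
    The mechanism behind NFSP is compact enough to sketch: each agent trains a best-response network by Q-learning and an average-policy network by supervised learning on its own best-response actions, mixing the two with an anticipatory parameter. A minimal, hypothetical sketch in Python (the class interface, method names such as greedy_action, and the value of eta are illustrative assumptions, not the authors' exact configuration):

    ```python
    import random

    class NFSPAgent:
        """Minimal Neural Fictitious Self-Play loop (illustrative sketch)."""

        def __init__(self, q_net, avg_policy_net, eta=0.1):
            self.q_net = q_net                    # best-response network (DQN-style)
            self.avg_policy_net = avg_policy_net  # average-policy network (supervised)
            self.eta = eta                        # anticipatory parameter
            self.rl_buffer = []                   # (s, a, r, s') transitions for Q-learning
            self.sl_buffer = []                   # (s, a) pairs of own best-response actions

        def act(self, state):
            if random.random() < self.eta:
                # Play the best response and log it as a supervised target
                # for the average policy (hypothetical method names).
                action = self.q_net.greedy_action(state)
                self.sl_buffer.append((state, action))
            else:
                # Otherwise play the average policy, which tracks the
                # time-average of past best responses.
                action = self.avg_policy_net.sample_action(state)
            return action

        def observe(self, state, action, reward, next_state):
            self.rl_buffer.append((state, action, reward, next_state))

        def learn(self):
            self.q_net.fit_q_learning(self.rl_buffer)               # RL step
            self.avg_policy_net.fit_classification(self.sl_buffer)  # SL step
    ```

    In self-play, one such agent sits in every seat; as training proceeds, the average policies of the agents jointly approach an approximate Nash equilibrium.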

    Optimizing Execution Cost Using Stochastic Control

    We devise an optimal allocation strategy for the execution of a predefined number of stocks in a given time frame, using discrete-time stochastic control theory for a defined market model. This market structure allows instant execution of market orders and is analysed under the assumption of discretised geometric movement of the stock prices. We consider two cost functions: the first involves only the fiscal cost, while the second incorporates the risks of non-strategic constrained investments alongside the fiscal cost. Precisely, the strategy for the constrained execution of K stocks within a stipulated time frame of T units is derived mathematically from a well-defined stochastic model of stock prices, and is compared with some commonly used execution strategies on historical stock price data.
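
    The computational core of such a scheme is backward induction on a value function indexed by time and remaining inventory. A minimal sketch, assuming a quadratic per-period execution cost (the impact coefficient theta and the cost model are illustrative assumptions, not the paper's market model):

    ```python
    import numpy as np

    def optimal_execution_schedule(K, T, theta=0.1):
        """Backward-induction DP for liquidating K shares over T periods."""
        # value[k]: minimal cost-to-go with k shares left; in the final
        # period the remaining inventory must be dumped, costing theta * k**2.
        value = np.array([theta * k**2 for k in range(K + 1)], dtype=float)
        policy = [None] * T  # policy[t][k] = shares to execute at t with k left
        for t in range(T - 2, -1, -1):
            new_value = np.empty(K + 1)
            best = np.zeros(K + 1, dtype=int)
            for k in range(K + 1):
                # Choose how many shares x to execute now, 0..k.
                costs = [theta * x**2 + value[k - x] for x in range(k + 1)]
                best[k] = int(np.argmin(costs))
                new_value[k] = costs[best[k]]
            value, policy[t] = new_value, best
        return value[K], policy  # policy[T-1] stays None: forced liquidation

    cost, policy = optimal_execution_schedule(K=100, T=5)
    print(cost)  # 200.0 == theta * T * (K / T) ** 2, the even TWAP split
    ```

    Under this convex cost the recursion recovers the even split of K/T shares per period; its value is that the same recursion handles richer, state-dependent cost functions for which no closed form exists.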

    DeepLOB: Deep Convolutional Neural Networks for Limit Order Books

    We develop a large-scale deep learning model to predict price movements from limit order book (LOB) data of cash equities. The architecture utilises convolutional filters to capture the spatial structure of the limit order books as well as LSTM modules to capture longer time dependencies. The proposed network outperforms all existing state-of-the-art algorithms on the benchmark LOB dataset [1]. In a more realistic setting, we test our model using one year of market quotes from the London Stock Exchange, and the model delivers remarkably stable out-of-sample prediction accuracy for a variety of instruments. Importantly, our model translates well to instruments which were not part of the training set, indicating its ability to extract universal features. To better understand these features and to go beyond a "black box" model, we perform a sensitivity analysis to understand the rationale behind the model predictions and reveal the components of LOBs that are most relevant. The ability to extract robust features which translate well to other instruments is an important property of our model, with many further applications.
    Comment: 12 pages, 9 figures
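
    The shape of the architecture is easy to convey in code: small convolutions first fuse price with volume and then aggregate across book levels, and the resulting per-timestep features feed an LSTM. A simplified PyTorch sketch (the 100x40 input, i.e. 100 snapshots of 10 levels x bid/ask x price/volume, follows the paper's setup; the exact layer stack below is a pared-down assumption):

    ```python
    import torch
    import torch.nn as nn

    class DeepLOBSketch(nn.Module):
        """Simplified CNN + LSTM for LOB mid-price movement (3 classes)."""

        def __init__(self, lstm_hidden=64, num_classes=3):
            super().__init__()
            # Convolutions over the 40 LOB features: first fuse price with
            # volume (kernel width 2), then fuse the remaining book levels.
            self.conv = nn.Sequential(
                nn.Conv2d(1, 16, kernel_size=(1, 2), stride=(1, 2)), nn.LeakyReLU(),
                nn.Conv2d(16, 16, kernel_size=(1, 2), stride=(1, 2)), nn.LeakyReLU(),
                nn.Conv2d(16, 16, kernel_size=(1, 10)), nn.LeakyReLU(),
            )
            # LSTM over the time axis captures longer-range dependencies.
            self.lstm = nn.LSTM(input_size=16, hidden_size=lstm_hidden,
                                batch_first=True)
            self.head = nn.Linear(lstm_hidden, num_classes)

        def forward(self, x):                   # x: (batch, 1, time=100, feat=40)
            z = self.conv(x)                    # -> (batch, 16, time, 1)
            z = z.squeeze(-1).permute(0, 2, 1)  # -> (batch, time, 16)
            out, _ = self.lstm(z)
            return self.head(out[:, -1])        # logits: down / stationary / up

    logits = DeepLOBSketch()(torch.randn(8, 1, 100, 40))  # -> shape (8, 3)
    ```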

    Reinforcement Learning Applications in Real Time Trading

    This study focuses on applying reinforcement learning techniques to real-time trading. We first briefly introduce the concept of reinforcement learning and the definition of a reward function, and review previous studies as foundations for why reinforcement learning can work, specifically in the setting of financial trading. We demonstrate that it is possible to apply reinforcement learning to produce a valid, simple, and profitable trading strategy in a daily setting (one trade a day), and show an example of intraday trading with reinforcement learning. We use a modified Q-learning algorithm in this scenario to optimize the trading result. We also interpret the output policy of reinforcement learning, and illustrate that it is not completely devoid of economic sense.
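
    As a point of reference for this setting, a plain tabular Q-learning loop for one-position-a-day trading fits in a few lines. This is the textbook baseline, not the authors' modified variant; the state discretisation, hyperparameters, and simulated returns are all assumptions of the sketch:

    ```python
    import numpy as np

    # States: discretised previous-day return (down / flat / up).
    # Actions: target position for the next day (short / flat / long).
    N_STATES = 3
    ACTIONS = np.array([-1, 0, 1])
    Q = np.zeros((N_STATES, len(ACTIONS)))
    alpha, gamma, epsilon = 0.1, 0.95, 0.1  # assumed hyperparameters

    def discretise(ret, band=0.002):
        return 0 if ret < -band else (2 if ret > band else 1)

    rng = np.random.default_rng(0)
    returns = rng.normal(0.0, 0.01, size=5000)  # stand-in for daily returns

    state = discretise(returns[0])
    for t in range(1, len(returns)):
        # Epsilon-greedy action selection.
        a = rng.integers(len(ACTIONS)) if rng.random() < epsilon \
            else int(Q[state].argmax())
        reward = ACTIONS[a] * returns[t]  # P&L of holding the chosen position
        next_state = discretise(returns[t])
        # Standard Q-learning update towards the bootstrapped target.
        Q[state, a] += alpha * (reward + gamma * Q[next_state].max() - Q[state, a])
        state = next_state

    print(Q)  # learned action values per state: the implied trading policy
    ```

    Inspecting the rows of Q is exactly the kind of policy interpretation the study performs: each row states which position the agent prefers after a down, flat, or up day.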

    Asynchronous Deep Double Duelling Q-Learning for Trading-Signal Execution in Limit Order Book Markets

    We employ deep reinforcement learning (RL) to train an agent to successfully translate a high-frequency trading signal into a trading strategy that places individual limit orders. Based on the ABIDES limit order book simulator, we build a reinforcement learning OpenAI gym environment and utilise it to simulate a realistic trading environment for NASDAQ equities based on historic order book messages. To train a trading agent that learns to maximise its trading return in this environment, we use Deep Duelling Double Q-learning with the APEX (asynchronous prioritised experience replay) architecture. The agent observes the current limit order book state, its recent history, and a short-term directional forecast. To investigate the performance of RL for adaptive trading independently of a concrete forecasting algorithm, we study the performance of our approach using synthetic alpha signals obtained by perturbing forward-looking returns with varying levels of noise. Here, we find that the RL agent learns an effective trading strategy for inventory management and order placement that outperforms a heuristic benchmark trading strategy with access to the same signal.
    Comment: 10 pages
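
    The two named components are compact to write down: a duelling head decomposes Q(s, a) into a state value plus a mean-centred advantage, and the double-Q target lets the online network choose the next action while the target network evaluates it. A PyTorch sketch (layer sizes are placeholders, and APEX's distributed prioritised replay machinery is omitted):

    ```python
    import torch
    import torch.nn as nn

    class DuellingQNet(nn.Module):
        """Duelling head: Q(s, a) = V(s) + A(s, a) - mean_a A(s, a)."""

        def __init__(self, obs_dim, n_actions, hidden=128):
            super().__init__()
            self.trunk = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
            self.value = nn.Linear(hidden, 1)       # state-value stream
            self.advantage = nn.Linear(hidden, n_actions)  # advantage stream

        def forward(self, obs):
            h = self.trunk(obs)
            a = self.advantage(h)
            # Mean-centring the advantages keeps the decomposition identifiable.
            return self.value(h) + a - a.mean(dim=1, keepdim=True)

    def double_q_target(online, target, reward, next_obs, done, gamma=0.99):
        """Double Q-learning: online net picks the action, target net scores it."""
        with torch.no_grad():
            next_a = online(next_obs).argmax(dim=1, keepdim=True)
            next_q = target(next_obs).gather(1, next_a).squeeze(1)
            return reward + gamma * (1.0 - done) * next_q
    ```

    Decoupling action selection from action evaluation in this way counteracts the overestimation bias of standard Q-learning, which matters in noisy reward settings such as trading on a perturbed alpha signal.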