60 research outputs found

    Universal Trading for Order Execution with Oracle Policy Distillation

    As a fundamental problem in algorithmic trading, order execution aims at fulfilling a specific trading order, either liquidation or acquirement, for a given instrument. Towards an effective execution strategy, recent years have witnessed a shift from the analytical view with model-based market assumptions to a model-free perspective, i.e., reinforcement learning, owing to its nature of sequential decision optimization. However, the noisy yet imperfect market information that can be leveraged by the policy makes it quite challenging to build sample-efficient reinforcement learning methods for effective order execution. In this paper, we propose a novel universal trading policy optimization framework to bridge the gap between the noisy yet imperfect market states and the optimal action sequences for order execution. In particular, the framework leverages a policy distillation method in which an oracle teacher with perfect information, approximating the optimal trading strategy, guides the learning of the common policy towards practically optimal execution. Extensive experiments show significant improvements of our method over various strong baselines, with reasonable trading actions. Comment: Accepted at AAAI 2021; the code and supplementary materials are available at https://seqml.github.io/opd
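    The following is a minimal, hypothetical sketch of the oracle-distillation idea described above, not the authors' released code: a teacher policy with perfect (oracle) information guides a student policy that only observes ordinary market states, via a KL distillation term added to the student's policy-gradient loss. All network sizes, dimensions, and the weighting coefficient beta are assumptions.

```python
# Sketch of oracle policy distillation (assumed setup, not the authors' code).
# The "teacher" sees oracle features; the "student" sees only ordinary observations
# and is trained with a policy-gradient loss plus a KL term toward the teacher.
import torch
import torch.nn as nn
import torch.nn.functional as F

class PolicyNet(nn.Module):
    def __init__(self, obs_dim: int, n_actions: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(), nn.Linear(hidden, n_actions)
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)  # action logits

obs_dim, oracle_dim, n_actions = 16, 32, 5        # hypothetical dimensions
student = PolicyNet(obs_dim, n_actions)
teacher = PolicyNet(oracle_dim, n_actions)        # assumed pre-trained with perfect info
optimizer = torch.optim.Adam(student.parameters(), lr=1e-3)

def distillation_step(obs, oracle_obs, actions, advantages, beta=0.1):
    """One update: policy-gradient loss + beta * KL(teacher || student)."""
    log_probs = F.log_softmax(student(obs), dim=-1)
    chosen = log_probs.gather(1, actions.unsqueeze(1)).squeeze(1)
    pg_loss = -(chosen * advantages).mean()
    with torch.no_grad():
        teacher_probs = F.softmax(teacher(oracle_obs), dim=-1)
    distill_loss = F.kl_div(log_probs, teacher_probs, reduction="batchmean")
    loss = pg_loss + beta * distill_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Dummy batch just to show the call signature.
batch = 8
loss = distillation_step(
    obs=torch.randn(batch, obs_dim),
    oracle_obs=torch.randn(batch, oracle_dim),
    actions=torch.randint(0, n_actions, (batch,)),
    advantages=torch.randn(batch),
)
```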

    Policy Gradient Methods for the Noisy Linear Quadratic Regulator over a Finite Horizon

    We explore reinforcement learning methods for finding the optimal policy in the linear quadratic regulator (LQR) problem. In particular, we consider the convergence of policy gradient methods in the settings of known and unknown parameters. We establish a global linear convergence guarantee for this approach in the setting of a finite time horizon and stochastic state dynamics under weak assumptions. The convergence of a projected policy gradient method is also established in order to handle problems with constraints. We illustrate the performance of the algorithm with two examples. The first example is the optimal liquidation of a holding in an asset; we show results both for the case where we assume a model for the underlying dynamics and for the case where we apply the method to the data directly. The empirical evidence suggests that the policy gradient method can learn the globally optimal solution for a larger class of stochastic systems containing the LQR framework, and that it is more robust to model mis-specification than a model-based approach. The second example is an LQR system in a higher-dimensional setting with synthetic data. Comment: 49 pages, 9 figures
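    As an illustration of the general approach (a sketch under assumptions, not the paper's algorithm or examples), the snippet below runs a model-free, zeroth-order policy-gradient update on the time-varying feedback gains of a small finite-horizon LQR with additive state noise; the dynamics, cost matrices, and hyperparameters are placeholders.

```python
# Model-free policy gradient for a finite-horizon noisy LQR (illustrative sketch).
# Dynamics: x_{t+1} = A x_t + B u_t + w_t; cost: sum_t x_t'Q x_t + u_t'R u_t.
# Policy: time-varying linear feedback u_t = -K[t] x_t, updated with a
# two-point zeroth-order gradient estimate of the rollout cost.
import numpy as np

rng = np.random.default_rng(0)
n, m, T = 2, 1, 10                       # state dim, input dim, horizon
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
Q, R = np.eye(n), 0.1 * np.eye(m)
noise_std = 0.05

def rollout_cost(K, n_rollouts=20):
    """Average quadratic cost of the time-varying policy u_t = -K[t] x_t."""
    total = 0.0
    for _ in range(n_rollouts):
        x = rng.normal(size=(n, 1))
        cost = 0.0
        for t in range(T):
            u = -K[t] @ x
            cost += float(x.T @ Q @ x + u.T @ R @ u)
            x = A @ x + B @ u + noise_std * rng.normal(size=(n, 1))
        total += cost
    return total / n_rollouts

K = np.zeros((T, m, n))                  # initial feedback gains
sigma, lr = 0.05, 1e-3                   # smoothing radius, step size
for it in range(200):
    U = rng.normal(size=K.shape)         # random perturbation direction
    grad = (rollout_cost(K + sigma * U) - rollout_cost(K - sigma * U)) / (2 * sigma) * U
    K -= lr * grad                       # gradient descent on the estimated cost
print("final cost:", rollout_cost(K))
```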

    Learn Continuously, Act Discretely: Hybrid Action-Space Reinforcement Learning For Optimal Execution

    Optimal execution is a sequential decision-making problem for cost saving in algorithmic trading. Studies have found that reinforcement learning (RL) can help decide the order-splitting sizes. However, one problem remains unsolved: how to place limit orders at appropriate limit prices? The key challenge lies in the "continuous-discrete duality" of the action space. On the one hand, a continuous action space based on percentage changes in price is preferred for generalization. On the other hand, the trader must ultimately choose limit prices discretely because of the tick size, which requires specialization for every single stock with different characteristics (e.g., liquidity and price range). We therefore need continuous control for generalization and discrete control for specialization. To this end, we propose a hybrid RL method that combines the advantages of both. We first use a continuous-control agent to scope an action subset, then deploy a fine-grained agent to choose a specific limit price. Extensive experiments show that our method has higher sample efficiency and better training stability than existing RL algorithms and significantly outperforms previous learning-based methods for order execution.
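    A toy illustration of the continuous-to-discrete hybrid action described above (assumed mechanics, not the paper's implementation): a continuous action scopes a window of tick-aligned candidate limit prices around the mid price, and a discrete action then picks one concrete price from that window. The tick size, window width, and price mapping are hypothetical.

```python
# Hybrid action mapping: continuous action scopes a price window, discrete action
# selects the final tick-aligned limit price (illustrative, assumed parameters).
import numpy as np

tick_size = 0.01
mid_price = 100.00
window_ticks = 5          # hypothetical: discrete agent chooses among 2*5+1 prices

def scope_prices(continuous_action: float) -> np.ndarray:
    """Map a continuous action in [-1, 1] (here up to +/-1% of mid price)
    to a small grid of valid tick-aligned limit prices."""
    center = mid_price * (1.0 + 0.01 * np.clip(continuous_action, -1.0, 1.0))
    center = round(center / tick_size) * tick_size      # snap to the tick grid
    offsets = np.arange(-window_ticks, window_ticks + 1) * tick_size
    return np.round(center + offsets, 2)

def choose_price(candidates: np.ndarray, discrete_action: int) -> float:
    """The fine-grained agent picks one concrete limit price from the window."""
    return float(candidates[discrete_action])

candidates = scope_prices(continuous_action=-0.3)   # e.g. bid slightly below mid
price = choose_price(candidates, discrete_action=2)
print(candidates, "->", price)
```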

    An Adaptive Dual-level Reinforcement Learning Approach for Optimal Trade Execution

    The purpose of this research is to devise a tactic that can closely track the daily cumulative volume-weighted average price (VWAP) using reinforcement learning. Previous studies often choose a relatively short trading horizon to implement their models, making it difficult to accurately track the daily cumulative VWAP, since the variations of financial data are often insignificant within a short trading horizon. In this paper, we aim to develop a strategy that accurately tracks the daily cumulative VWAP while minimizing the deviation from it. We propose a method that leverages the U-shaped pattern of intraday stock trading volume and uses Proximal Policy Optimization (PPO) as the learning algorithm. Our method follows a dual-level approach: a Transformer model captures the overall (global) U-shaped distribution of daily volume, and an LSTM model handles the distribution of orders within smaller (local) time intervals. The results of our experiments suggest that this dual-level architecture improves the accuracy of approximating the cumulative VWAP compared with previous reinforcement-learning-based models. Comment: Submitted to Expert Systems with Applications (under second review)
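    Below is a hedged sketch of what such a dual-level allocation network could look like (one reading of the abstract, not the authors' architecture): a Transformer encoder allocates the daily quantity across coarse intraday bins, capturing the global U-shaped profile, while an LSTM splits each bin's share across finer local steps. All dimensions and layer sizes are assumptions, and the PPO training loop is omitted.

```python
# Dual-level (global Transformer + local LSTM) allocation sketch; sizes assumed.
import torch
import torch.nn as nn

class DualLevelAllocator(nn.Module):
    def __init__(self, feat_dim=8, d_model=32):
        super().__init__()
        self.proj = nn.Linear(feat_dim, d_model)
        enc_layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=4, batch_first=True)
        self.global_encoder = nn.TransformerEncoder(enc_layer, num_layers=2)
        self.global_head = nn.Linear(d_model, 1)          # one score per intraday bin
        self.local_lstm = nn.LSTM(feat_dim, d_model, batch_first=True)
        self.local_head = nn.Linear(d_model, 1)           # one score per local step

    def forward(self, bin_feats, sub_feats):
        # bin_feats: (B, n_bins, feat_dim)      -- features of coarse intraday bins
        # sub_feats: (B, n_sub_steps, feat_dim) -- features inside the current bin
        h = self.global_encoder(self.proj(bin_feats))
        global_alloc = torch.softmax(self.global_head(h).squeeze(-1), dim=-1)
        out, _ = self.local_lstm(sub_feats)
        local_alloc = torch.softmax(self.local_head(out).squeeze(-1), dim=-1)
        return global_alloc, local_alloc                  # fractions summing to 1

model = DualLevelAllocator()
g, l = model(torch.randn(2, 13, 8), torch.randn(2, 10, 8))
print(g.shape, l.shape)   # (2, 13), (2, 10)
```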

    Relative entropy-regularized robust optimal order execution

    The problem of order execution is cast as a relative entropy-regularized robust optimal control problem in this article. The order execution agent's goal is to maximize an objective functional associated with the profit-and-loss of trading while simultaneously minimizing execution risk and the market's liquidity and uncertainty. We model the market's liquidity and uncertainty by the principle of least relative entropy associated with the market volume, which turns order execution into a relative entropy-regularized stochastic differential game. A standard dynamic programming argument shows that the value function of the differential game satisfies a relative entropy-regularized Hamilton-Jacobi-Isaacs (rHJI) equation. Under the assumptions of a linear-quadratic model with a Gaussian prior, the rHJI equation reduces to a system of Riccati and linear differential equations. Further imposing constant coefficients, the system can be solved in closed form, yielding analytical expressions for the optimal strategy and trajectory as well as the posterior distribution of market volume. Numerical examples illustrate the optimal strategies and compare them with conventional trading strategies. Comment: 32 pages, 8 figures
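    For illustration only: the abstract states that under the linear-quadratic Gaussian assumptions the rHJI equation reduces to a system of Riccati and linear differential equations. The snippet below numerically integrates a generic backward matrix Riccati ODE of the standard LQ form, the kind of equation involved; the coefficients are placeholders and not the paper's model.

```python
# Generic backward matrix Riccati ODE of standard LQ form (placeholder coefficients).
import numpy as np
from scipy.integrate import solve_ivp

n = 2
A = np.array([[0.0, 1.0], [0.0, -0.5]])
B = np.array([[0.0], [1.0]])
Q = np.eye(n)
R = np.array([[0.1]])
QT = np.eye(n)          # terminal cost weight
T = 1.0

def riccati_rhs(t, p_flat):
    """dP/dt = -(A'P + PA - P B R^{-1} B' P + Q), integrated backward in time."""
    P = p_flat.reshape(n, n)
    dP = A.T @ P + P @ A - P @ B @ np.linalg.solve(R, B.T) @ P + Q
    return (-dP).flatten()

# Integrate from t = T down to t = 0 with terminal condition P(T) = QT.
sol = solve_ivp(riccati_rhs, (T, 0.0), QT.flatten(), dense_output=True)
P0 = sol.y[:, -1].reshape(n, n)
K0 = np.linalg.solve(R, B.T @ P0)       # optimal feedback gain at t = 0
print("P(0) =\n", P0, "\nK(0) =", K0)
```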