Universal Trading for Order Execution with Oracle Policy Distillation
As a fundamental problem in algorithmic trading, order execution aims to fulfill a specific trading order, either liquidation or acquisition, for a given instrument. In the search for effective execution strategies, recent years have witnessed a shift from the analytical view with model-based market assumptions to a model-free perspective, i.e., reinforcement learning, owing to its nature as sequential decision optimization. However, the noisy and imperfect market information available to the policy makes it quite challenging to build sample-efficient reinforcement learning methods for effective order execution. In this paper, we propose a novel universal trading policy optimization framework to bridge the gap between noisy and imperfect market states and the optimal action sequences for order execution. In particular, the framework leverages a policy distillation method in which an oracle teacher with perfect information, approximating the optimal trading strategy, guides the learning of the common policy towards practically optimal execution. Extensive experiments show significant improvements of our method over various strong baselines, with reasonable trading actions.

Comment: Accepted at AAAI 2021; the code and supplementary materials are at https://seqml.github.io/opd
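To make the distillation idea concrete, here is a minimal sketch of what an oracle-guided training loss could look like, assuming a PyTorch setup with discrete actions; the function name, the `lambda_d` weight, and the exact loss composition are illustrative assumptions, not the paper's actual implementation:

```python
import torch.nn.functional as F

def oracle_distilled_loss(student_logits, teacher_logits,
                          log_probs, advantages, lambda_d=0.1):
    """Policy-gradient surrogate plus an oracle-distillation term (sketch).

    student_logits: action logits of the common policy (imperfect information)
    teacher_logits: action logits of the oracle policy (perfect information)
    log_probs:      log pi(a_t | s_t) of the executed actions
    advantages:     advantage estimates for those actions
    lambda_d:       distillation weight (illustrative value)
    """
    # Standard policy-gradient term: maximize E[log pi * A].
    pg_loss = -(log_probs * advantages).mean()
    # Distillation term: pull the student's action distribution towards
    # the (frozen) oracle teacher's distribution.
    kl = F.kl_div(
        F.log_softmax(student_logits, dim=-1),
        F.softmax(teacher_logits.detach(), dim=-1),
        reduction="batchmean",
    )
    return pg_loss + lambda_d * kl
```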
Policy Gradient Methods for the Noisy Linear Quadratic Regulator over a Finite Horizon
We explore reinforcement learning methods for finding the optimal policy in
the linear quadratic regulator (LQR) problem. In particular, we consider the
convergence of policy gradient methods in the setting of known and unknown
parameters. We establish a global linear convergence guarantee for this approach in the setting of a finite time horizon and stochastic state dynamics, under weak assumptions. The convergence of a projected policy gradient method is also established in order to handle constrained problems. We
illustrate the performance of the algorithm with two examples. The first
example is the optimal liquidation of a holding in an asset. We show results both for the case where we assume a model for the underlying dynamics and for the case where we apply the method to the data directly. The empirical evidence suggests that the policy gradient method can learn the globally optimal solution for a larger class of stochastic systems containing the LQR framework, and that it is more robust to model mis-specification than a model-based approach. The second example is an LQR system in a higher-dimensional setting with synthetic data.

Comment: 49 pages, 9 figures
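As a rough illustration of the model-free setting, the sketch below runs a two-point zeroth-order policy gradient on a small finite-horizon noisy LQR instance; all matrices, horizons, and step sizes here are arbitrary assumptions for the example, not the paper's experimental setup:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative finite-horizon noisy LQR (all values arbitrary):
# x_{t+1} = A x_t + B u_t + w_t, cost sum_t (x_t' Q x_t + u_t' R u_t).
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
Q, R = np.eye(2), 0.1 * np.eye(1)
T, sigma = 20, 0.05

def rollout_cost(K, n_samples=32):
    """Monte Carlo cost of the time-varying linear policy u_t = -K[t] x_t."""
    total = 0.0
    for _ in range(n_samples):
        x, c = np.array([1.0, 0.0]), 0.0
        for t in range(T):
            u = -K[t] @ x
            c += x @ Q @ x + u @ R @ u
            x = A @ x + B @ u + sigma * rng.standard_normal(2)
        total += c
    return total / n_samples

# Model-free two-point zeroth-order policy gradient over the gains K.
K = np.zeros((T, 1, 2))          # one feedback gain per time step
lr, eps = 1e-3, 0.1
for _ in range(200):
    U = rng.standard_normal(K.shape)
    delta = rollout_cost(K + eps * U) - rollout_cost(K - eps * U)
    K -= lr * (delta / (2 * eps)) * U
```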
Learn Continuously, Act Discretely: Hybrid Action-Space Reinforcement Learning For Optimal Execution
Optimal execution is a sequential decision-making problem for cost-saving in
algorithmic trading. Studies have found that reinforcement learning (RL) can
help decide the order-splitting sizes. However, a problem remains unsolved: how
to place limit orders at appropriate limit prices? The key challenge lies in
the "continuous-discrete duality" of the action space. On the one hand, the
continuous action space using percentage changes in prices is preferred for
generalization. On the other hand, the trader eventually needs to choose limit
prices discretely because of the tick size, which requires specialization for every single stock with different characteristics (e.g., liquidity and price range). Thus we need continuous control for generalization and discrete control for specialization. To this end, we propose a hybrid RL method that combines the advantages of both. We first use a continuous-control agent to scope a subset of the action space, then deploy a fine-grained agent to choose a specific limit price within it. Extensive experiments show that our method has higher sample efficiency and better training stability than existing RL algorithms, and that it significantly outperforms previous learning-based methods for order execution.
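A minimal sketch of the continuous-to-discrete hand-off, assuming the continuous agent outputs a relative price offset and the discrete agent then picks among neighbouring ticks; the function and the window size `k` are hypothetical, not the paper's exact scheme:

```python
import numpy as np

def candidate_limit_prices(mid_price, rel_offset, tick_size, k=2):
    """Hand-off from the continuous to the discrete agent (sketch).

    The continuous agent proposes a relative price change rel_offset
    (e.g. -0.001 = 10 bps below mid); we snap it to the tick grid and
    expose the 2k+1 neighbouring ticks as the discrete agent's choices.
    """
    raw = mid_price * (1.0 + rel_offset)
    center = round(raw / tick_size)              # nearest valid tick
    return np.array([(center + i) * tick_size for i in range(-k, k + 1)])

# Example: mid price 100.00, proposed offset -0.1%, tick size 0.01.
# The fine-grained agent then picks one of the five tick-aligned prices
# [99.88, 99.89, 99.90, 99.91, 99.92].
prices = candidate_limit_prices(100.0, -0.001, 0.01)
```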
An Adaptive Dual-level Reinforcement Learning Approach for Optimal Trade Execution
The purpose of this research is to devise a strategy that can closely track the daily cumulative volume-weighted average price (VWAP) using reinforcement learning. Previous studies often choose a relatively short trading horizon to implement their models, making it difficult to accurately track the daily cumulative VWAP, since variations in financial data are often insignificant within such a short horizon. In this paper, we aim to develop a strategy that can accurately track the daily cumulative VWAP while minimizing deviation from it. We propose a method that leverages the U-shaped pattern of intraday stock trade volumes and uses Proximal Policy Optimization (PPO) as the learning algorithm. Our method follows a dual-level approach: a Transformer model captures the overall (global) distribution of daily volumes, which follows a U-shape, and an LSTM model handles the distribution of orders within smaller (local) time intervals. The results from our experiments suggest that this dual-level architecture improves the accuracy of approximating the cumulative VWAP compared to previous reinforcement learning-based models.

Comment: Submitted to Expert Systems with Applications (under 2nd review)
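For context, here is a sketch of the quantities such a strategy targets: the market VWAP benchmark, a basis-point tracking metric, and a global allocation step that splits the parent order according to a predicted (U-shaped) intraday volume profile. The function names and the metric are illustrative assumptions, not the paper's definitions:

```python
import numpy as np

def vwap(prices, volumes):
    """Volume-weighted average price over a horizon."""
    prices, volumes = np.asarray(prices), np.asarray(volumes)
    return np.sum(prices * volumes) / np.sum(volumes)

def vwap_slippage_bps(exec_prices, exec_volumes, mkt_prices, mkt_volumes):
    """Deviation of the achieved price from the market VWAP, in basis
    points; a common tracking metric (the paper's reward may differ)."""
    benchmark = vwap(mkt_prices, mkt_volumes)
    return 1e4 * (vwap(exec_prices, exec_volumes) - benchmark) / benchmark

def allocate_global(total_qty, predicted_fracs):
    """Global level: split the parent order across intraday bins in
    proportion to the predicted (U-shaped) volume distribution; a local
    policy would then schedule orders within each bin."""
    fracs = np.asarray(predicted_fracs, dtype=float)
    fracs /= fracs.sum()
    return np.floor(total_qty * fracs).astype(int)
```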
Relative entropy-regularized robust optimal order execution
The problem of order execution is cast as a relative entropy-regularized robust optimal control problem in this article. The order execution agent's goal is to maximize an objective functional associated with his profit-and-loss of trading while simultaneously minimizing the execution risk posed by the market's liquidity and uncertainty. We model the market's liquidity and uncertainty by the principle of least relative entropy associated with the market volume. The problem of order execution thereby becomes a relative entropy-regularized stochastic differential game. A standard dynamic programming argument yields that the value function of the differential game satisfies a relative entropy-regularized Hamilton-Jacobi-Isaacs (rHJI) equation. Under the assumption of a linear-quadratic model with a Gaussian prior, the rHJI equation reduces to a system of Riccati and linear differential equations. Further imposing constant coefficients, the system of differential equations can be solved in closed form, yielding analytical expressions for the optimal strategy and trajectory as well as the posterior distribution of market volume. Numerical examples illustrate the optimal strategies and compare them with conventional trading strategies.

Comment: 32 pages, 8 figures
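The role of the relative entropy term can be seen from the Gibbs variational principle, which turns the inner adversarial minimization into a log-exponential transform. The schematic identity below uses generic notation ($\bar m$ for the prior volume distribution, $\theta > 0$ for the regularization weight, $\ell$ for the market-dependent part of the Hamiltonian) and only indicates the structure behind such rHJI equations, not the paper's exact formulation:

```latex
\inf_{m}\left\{ \mathbb{E}_{m}\!\left[\ell\right]
    + \tfrac{1}{\theta}\, D_{\mathrm{KL}}\!\left(m \,\middle\|\, \bar m\right) \right\}
  = -\tfrac{1}{\theta}\,\log \mathbb{E}_{\bar m}\!\left[ e^{-\theta \ell} \right],
\qquad
m^{\ast} \;\propto\; e^{-\theta \ell}\,\bar m .
```

Substituting the minimizing distribution $m^{\ast}$ back into the dynamic programming equation is what produces the entropic nonlinearity characteristic of equations of this type.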