The purpose of this research is to devise a strategy that can closely track the
daily cumulative volume-weighted average price (VWAP) using reinforcement
learning. Previous studies often implement their models over a relatively short
trading horizon, which makes it difficult to track the daily cumulative VWAP
accurately, since financial data often vary little within such a short
horizon. In this paper, we aim to develop a strategy
that can accurately track the daily cumulative VWAP while minimizing the
deviation from the VWAP. We propose a method that leverages the U-shaped
pattern of intraday stock trade volumes and use Proximal Policy Optimization
(PPO) as the learning algorithm. Our method follows a dual-level approach: a
Transformer model that captures the overall (global) U-shaped distribution of
daily volumes, and an LSTM model that handles the distribution of orders
within smaller (local) time intervals. The results from our experiments suggest
that this dual-level architecture improves the accuracy of approximating the
cumulative VWAP compared to previous reinforcement learning-based models.

Comment: Submitted to Expert Systems with Applications (under 2nd review)
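
To make the tracking objective concrete, the following is a minimal sketch of how a VWAP and a tracking deviation could be computed for a day split into intraday bins. It is not the model described above: the function names, the parabolic form of the U-shaped allocation, the `depth` parameter, and the basis-point error measure are all assumptions chosen for illustration, and the sketch assumes every order fills at the bin price.

```python
import numpy as np


def vwap(prices, volumes):
    """Volume-weighted average price: sum(p * v) / sum(v)."""
    prices = np.asarray(prices, dtype=float)
    volumes = np.asarray(volumes, dtype=float)
    return float(np.sum(prices * volumes) / np.sum(volumes))


def u_shaped_weights(n_bins, depth=0.5):
    """Hypothetical U-shaped allocation over n_bins intraday bins.

    Assumption: a parabola that equals 1 at the open and close and
    1 - depth at midday, normalized so the weights sum to 1.
    """
    x = np.linspace(-1.0, 1.0, n_bins)
    w = (1.0 - depth) + depth * x**2
    return w / w.sum()


def tracking_error_bps(prices, market_volumes, order_weights):
    """Absolute gap between executed and market VWAP, in basis points.

    Assumption: each bin's orders fill at that bin's price, so the
    executed VWAP is the price average weighted by order_weights.
    """
    market = vwap(prices, market_volumes)
    executed = vwap(prices, order_weights)
    return 1e4 * abs(executed - market) / market
```

Under these assumptions, an allocation proportional to the realized market volumes yields zero tracking error, so the deviation comes entirely from the mismatch between the planned allocation and the realized volume profile.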