Pair trading is one of the most effective statistical arbitrage strategies,
seeking market-neutral profit by hedging a pair of selected assets. Existing
methods generally decompose the task into two separate steps: pair selection
and trading. However, decoupling these two closely related subtasks can block
information propagation and limit overall performance. For pair selection,
ignoring trading performance can lead to selecting assets whose price
movements are unrelated, while a trading agent trained only on the selected
assets can overfit, since it sees no historical information about other
assets. To address this, we propose a paradigm that treats automatic pair
trading as a unified task rather than a two-step pipeline. We design a
hierarchical reinforcement learning framework to jointly learn and optimize the
two subtasks. A high-level policy selects two assets from all possible
combinations, and a low-level policy then performs a series of trading
actions. Experimental results on real-world stock data demonstrate the
effectiveness of our method on pair trading compared with existing pair
selection methods and existing trading methods.

Comment: 10 pages, 6 figures
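
The two-level structure described above can be illustrated with a minimal Python sketch. It is not the paper's method. As assumptions made purely for illustration, the high-level policy is replaced by an epsilon-greedy bandit over all asset pairs, and the low-level policy is replaced by a fixed threshold rule on the price spread rather than a learned agent. The names `enumerate_pairs`, `low_level_trade`, and `train`, along with the `threshold` and `epsilon` parameters, are hypothetical. The sketch only shows how trading profit can flow back into pair selection.

```python
import itertools
import random


def enumerate_pairs(assets):
    """All unordered asset pairs: the high-level action space."""
    return list(itertools.combinations(assets, 2))


def low_level_trade(prices_a, prices_b, threshold=1.0):
    """Toy stand-in for the low-level policy (assumption: a fixed rule).

    The first half of the series estimates the mean spread; the second
    half is traded. Returns the total profit of the episode.
    """
    spreads = [a - b for a, b in zip(prices_a, prices_b)]
    split = len(spreads) // 2
    mean = sum(spreads[:split]) / split
    position = 0  # +1 long the spread, -1 short the spread, 0 flat
    profit = 0.0
    prev = spreads[split - 1]
    for cur in spreads[split:]:
        profit += position * (cur - prev)  # mark the open position to market
        dev = cur - mean
        if dev > threshold:
            position = -1  # spread is high: short it
        elif dev < -threshold:
            position = 1   # spread is low: long it
        elif abs(dev) < threshold / 2:
            position = 0   # spread has reverted: close
        prev = cur
    return profit


def train(prices, episodes=200, epsilon=0.1, seed=0):
    """Toy stand-in for the high-level policy: epsilon-greedy over pairs.

    The reward for choosing a pair is the profit of the low-level
    episode, so selection is driven by trading performance.
    """
    rng = random.Random(seed)
    pairs = enumerate_pairs(sorted(prices))
    value = {p: 0.0 for p in pairs}
    count = {p: 0 for p in pairs}
    for _ in range(episodes):
        if rng.random() < epsilon:
            pair = rng.choice(pairs)          # explore
        else:
            pair = max(pairs, key=value.get)  # exploit
        reward = low_level_trade(prices[pair[0]], prices[pair[1]])
        count[pair] += 1
        value[pair] += (reward - value[pair]) / count[pair]  # running mean
    return max(pairs, key=value.get), value


# Synthetic prices, made up for illustration only.
prices = {
    "A": [10, 10, 10, 10, 12, 10, 8, 10],
    "B": [10, 10, 10, 10, 10, 10, 10, 10],
    "C": [1, 2, 3, 4, 5, 6, 7, 8],
}
best_pair, values = train(prices)
```

In this toy setting the selector settles on the pair whose spread reverts to its mean, because that is the only pair the trading rule profits from. A learned low-level agent would make the reward stochastic and state-dependent, which this sketch does not capture.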