With the rapid development of quantitative portfolio optimization in financial
engineering, many promising algorithmic trading strategies have shown
competitive advantages in recent years. However, the environment of real
financial markets is complex and hard to simulate fully, given the
non-stationarity of stock data, unpredictable hidden causal factors, and so
on. Fortunately, the differences of stock prices often form a stationary
series, and if the internal relationships among the price differences of
different stocks can be linked to the decision-making process, the portfolio
should be able to achieve better performance. In this paper, we first adopt
normalizing flows to simulate the high-dimensional joint probability
distribution of the complex trading environment and develop a novel
model-based reinforcement learning framework to better understand the
intrinsic mechanisms of quantitative online trading.
Second, we experiment with various stocks from three different financial markets
(Dow, NASDAQ and S&P 500) and show that, among these three markets, Dow
achieves the best results on various evaluation metrics under our
back-testing system. In particular, our proposed method even resists large drops
(lower maximum drawdown) during the COVID-19 pandemic period, when the financial
market experienced an unpredictable crisis. All these results are comparatively
better than those obtained by modeling the state transition dynamics with
independent Gaussian Processes. Third, we utilize a causal analysis method to study the causal
relationships among different stocks in the environment. Further, by visualizing
comparisons of high-dimensional state transition data from the real and virtual buffers
with t-SNE, we uncover some effective patterns of bet
Comment: arXiv admin note: text overlap with arXiv:2205.1505