Reinforcement Learning Applied to Trading Systems: A Survey
Financial domain tasks, such as trading in market exchanges, are challenging
and have long attracted researchers. The recent achievements and the consequent
notoriety of Reinforcement Learning (RL) have also increased its adoption in
trading tasks. RL uses a framework with well-established formal concepts, which
increases its appeal for learning profitable trading strategies. However,
applying RL in the financial area without due care can lead new researchers to
overlook standards or fail to adopt relevant conceptual guidelines. In
this work, we embrace the seminal RL technical fundamentals, concepts, and
recommendations to perform a unified, theoretically-grounded examination and
comparison of previous research that could serve as a structuring guide for the
field of study. From a large volume of available studies, we selected
twenty-nine articles and reviewed them under a classification that considers
RL's most common formulations and design patterns. This classification allowed for
precise inspection of the most relevant aspects regarding data input,
preprocessing, state and action composition, adopted RL techniques, evaluation
setups, and overall results. Our analysis, organized around fundamental RL
concepts, enabled clear identification of current best practices in system
design, gaps that require further investigation, and promising research
opportunities. Finally, this review aims to promote the development of this
field of study by encouraging researchers to adhere to established standards
and helping them stay on the firm ground of RL's constructs.
Comment: 38 pages
The Recurrent Reinforcement Learning Crypto Agent
We demonstrate a novel application of online transfer learning for a digital assets trading agent. This agent uses a powerful feature space representation in the form of an echo state network, the output of which is made available to a direct, recurrent reinforcement learning agent. The agent learns to trade the XBTUSD (Bitcoin versus US Dollars) perpetual swap derivatives contract on BitMEX on an intraday basis. By learning from the multiple sources of impact on the quadratic risk-adjusted utility that it seeks to maximise, the agent avoids excessive over-trading, captures a funding profit, and can predict the market's direction. Overall, our crypto agent realises a total return of 350%, net of transaction costs, over roughly five years, 71% of which is attributable to funding profit. The annualised information ratio it achieves is 1.46.
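As a rough sketch of the architecture described above, an echo state network supplies a fixed random recurrent feature map whose state can feed a trading policy scored by a quadratic risk-adjusted utility. All specifics here (reservoir size, the row-sum scaling heuristic, the utility form) are illustrative assumptions, not the paper's actual configuration:

```python
import math
import random

class EchoStateFeatures:
    """Minimal echo state network reservoir (illustrative sketch)."""

    def __init__(self, n_in, n_res, rho=0.9, seed=0):
        rng = random.Random(seed)
        self.W_in = [[rng.uniform(-1.0, 1.0) for _ in range(n_in)]
                     for _ in range(n_res)]
        W = [[rng.uniform(-1.0, 1.0) for _ in range(n_res)]
             for _ in range(n_res)]
        # Rescale recurrent weights by the largest absolute row sum, a cheap
        # upper bound on the spectral radius, so dynamics stay contractive.
        max_row = max(sum(abs(w) for w in row) for row in W)
        self.W = [[w * rho / max_row for w in row] for row in W]
        self.x = [0.0] * n_res

    def step(self, u):
        """Update the reservoir state with input vector u and return it."""
        self.x = [math.tanh(sum(wi * ui for wi, ui in zip(row_in, u)) +
                            sum(wr * xr for wr, xr in zip(row_r, self.x)))
                  for row_in, row_r in zip(self.W_in, self.W)]
        return self.x

def quadratic_utility(pnl, kappa=0.1):
    """Quadratic risk-adjusted utility of a one-period PnL (assumed form:
    utility grows with PnL but is penalised quadratically for magnitude)."""
    return pnl - 0.5 * kappa * pnl * pnl
```

Only a readout on top of the reservoir state is trained, which is what makes the feature map cheap enough for the online, intraday setting the abstract describes.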
Robust FOREX Trading with Deep Q Network (DQN)
Financial trading is one of the most attractive areas in finance. Developing trading systems is not an easy task because it requires extensive knowledge in several areas, such as quantitative analysis, financial skills, and computer programming. A human trading-systems expert also brings their own bias when developing the system. A more effective alternative may be to develop the system using artificial intelligence. The aim of this study was to compare the performance of AI agents to the performance of the buy-and-hold strategy and of an expert trader. The test data consisted of 15 years of Forex market data for two currency pairs (EURUSD, USDJPY) obtained from Dukascopy Bank SA, Switzerland. Both hypotheses were tested with a paired t-test at the 0.05 significance level. The findings showed that the AI significantly outperformed the buy-and-hold strategy in FOREX for both currency pairs (EURUSD, USDJPY), and that the AI also significantly outperformed the CTA (experienced trader) when trading EURUSD. However, the AI could not significantly outperform the CTA for USDJPY trading. Limitations and contributions are discussed, and further research is recommended.
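The paired t-test used for the hypothesis tests above can be sketched as follows; the sample returns and the critical-value constant are illustrative assumptions (the critical value is the standard two-tailed 0.05 figure for 14 degrees of freedom, i.e. 15 matched periods):

```python
from statistics import mean, stdev

def paired_t(a, b):
    """Paired t statistic for matched samples a and b of equal length n.
    Compare |t| against the two-tailed critical value for df = n - 1."""
    d = [x - y for x, y in zip(a, b)]
    n = len(d)
    return mean(d) / (stdev(d) / n ** 0.5)

# Two-tailed critical value at the 0.05 level for df = 14, from t tables.
T_CRIT_DF14 = 2.145
```

For example, with 15 yearly returns per strategy, the null hypothesis of equal mean returns is rejected when `abs(paired_t(ai_returns, bh_returns)) > T_CRIT_DF14`.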
Deep Reinforcement Learning for Active High Frequency Trading
We introduce the first end-to-end Deep Reinforcement Learning (DRL) based
framework for active high frequency trading. We train DRL agents to trade one
unit of Intel Corporation stock by employing the Proximal Policy Optimization
algorithm. The training is performed on three contiguous months of high
frequency Limit Order Book data, of which the last month constitutes the
validation data. In order to maximise the signal to noise ratio in the training
data, we compose the latter by only selecting training samples with largest
price changes. The test is then carried out on the following month of data.
Hyperparameters are tuned using the Sequential Model Based Optimization
technique. We consider three different state characterizations, which differ in
their LOB-based meta-features. Analysing the agents' performances on test data,
we argue that the agents are able to create a dynamic representation of the
underlying environment. They identify occasional regularities present in the
data and exploit them to create long-term profitable trading strategies.
Indeed, agents learn trading strategies able to produce stable positive returns
in spite of the highly stochastic and non-stationary environment.
Comment: 9 pages, 4 figures
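The Proximal Policy Optimization algorithm the agents are trained with centres on a clipped surrogate objective. A minimal scalar sketch of that objective (our own illustration, not the paper's code) is:

```python
def ppo_clip_loss(ratio, advantage, eps=0.2):
    """PPO clipped surrogate loss for one (state, action) sample, where
    ratio = pi_new(a|s) / pi_old(a|s); negated so it can be minimised.
    Real implementations average this over batches of trajectory samples."""
    clipped_ratio = max(min(ratio, 1.0 + eps), 1.0 - eps)
    return -min(ratio * advantage, clipped_ratio * advantage)
```

The clipping removes any incentive to move the policy ratio outside `[1 - eps, 1 + eps]` in a single update, which is what keeps training stable on noisy, non-stationary data such as a limit order book stream.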
- …