Dynamic Portfolio Management concerns the continuous redistribution of assets
within a portfolio to maximize the total return over a given period of time.
With recent advances in machine learning and artificial intelligence,
considerable effort has gone into designing and discovering efficient
algorithms for managing a portfolio. This paper
presents two reinforcement learning agents: a policy gradient actor-critic
agent and an evolution strategy agent. The performance of the two agents is
compared in backtesting. We also discuss the problem setup, from state space
design to the state value function approximator and policy control design.
We allow short positions to give the agent more flexibility during asset
redistribution, and we apply a constant trading cost of 0.25%. The agent
achieves a 5% return over 10 days of daily trading despite the 0.25% trading
cost.
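
To make the role of the trading cost and short positions concrete, the
following is a minimal sketch of a one-step net portfolio return. It is an
illustration under stated assumptions, not the paper's implementation: the
function name `step_return`, the use of signed weights to represent short
positions, and the choice to charge the 0.25% cost on turnover (the sum of
absolute weight changes) are all assumptions made here.

```python
def step_return(prev_weights, new_weights, asset_returns, cost_rate=0.0025):
    """Net one-period portfolio return after a proportional trading cost.

    Assumptions (not taken from the paper):
    - weights are fractions of portfolio value; a negative weight is a short
    - the cost is charged on turnover, i.e. sum of |new weight - old weight|
    """
    if not (len(prev_weights) == len(new_weights) == len(asset_returns)):
        raise ValueError("all inputs must have the same length")

    # Total fraction of the portfolio that is traded when rebalancing.
    turnover = sum(abs(n - p) for p, n in zip(prev_weights, new_weights))

    # Return earned by the rebalanced portfolio before costs.
    gross = sum(w * r for w, r in zip(new_weights, asset_returns))

    return gross - cost_rate * turnover


# Rebalance from an even split to long asset 0 and short asset 1.
net = step_return([0.5, 0.5], [1.2, -0.2], [0.01, -0.02])
print(f"net return: {net:.4%}")
```

In this example the short position in the second asset gains when that asset
falls by 2%, and the cost of 0.25% times a turnover of 1.4 is deducted from
the gross return.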