Deep Reinforcement Learning based Patch Selection for Illuminant Estimation
Previous deep learning based approaches to illuminant estimation either resized the raw image to a lower resolution or randomly cropped image patches for the deep learning model. Such practices inevitably lead to information loss or to the selection of noisy patches that degrade estimation accuracy. In this paper, we regard patch selection in neural network based illuminant estimation as a control problem: selecting image patches so as to remove noisy patches and improve estimation accuracy. To this end, we construct a selection network (SeNet) that learns a patch selection policy. Based on data statistics and the learning progression state of the deep illuminant estimation network (DeNet), the SeNet decides which training patches are fed to the DeNet, which in turn gives feedback to the SeNet so that it can update its selection policy. To achieve this interactive and intelligent learning, we optimize the SeNet with a reinforcement learning approach termed policy gradient. We show that the proposed learning strategy improves illuminant estimation accuracy, speeds up convergence, and stabilizes the training of the DeNet. We evaluate our method on two public datasets and demonstrate that it outperforms state-of-the-art approaches.
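The policy-gradient training loop the abstract describes can be sketched in miniature. The per-patch scores, the reward signal standing in for DeNet's feedback, and the baseline are all illustrative assumptions, not the paper's actual implementation:

```python
# Minimal REINFORCE sketch of a patch-selection policy: a softmax over
# learnable per-patch scores is updated from a scalar reward, so that
# probability mass shifts toward "clean" patches and away from "noisy" ones.
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

# Learnable per-patch scores standing in for the SeNet's output logits.
theta = np.zeros(4)

# Assumed rewards: high for clean patches (0, 1), low for noisy ones (2, 3).
# In the paper this signal would come from the DeNet's estimation accuracy.
true_reward = np.array([1.0, 0.9, 0.1, 0.0])

lr = 0.5
for _ in range(200):
    probs = softmax(theta)
    a = rng.choice(len(theta), p=probs)   # sample a patch to feed the DeNet
    r = true_reward[a]                    # feedback from the DeNet (assumed)
    baseline = probs @ true_reward        # simple value baseline
    # REINFORCE update: (r - baseline) * grad of log pi(a | theta)
    grad_logp = -probs
    grad_logp[a] += 1.0
    theta += lr * (r - baseline) * grad_logp

print(np.round(softmax(theta), 2))
```

After training, the selection probabilities concentrate on the high-reward (clean) patches, which is the behavior the SeNet is meant to learn.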
Practical Deep Reinforcement Learning Approach for Stock Trading
Stock trading strategy plays a crucial role in investment companies. However, it is challenging to obtain an optimal strategy in the complex and dynamic stock market. We explore the potential of deep reinforcement learning to optimize a stock trading strategy and thus maximize investment return. Thirty stocks are selected as our trading stocks, and their daily prices are used as the training and trading market environment. We train a deep reinforcement learning agent and obtain an adaptive trading strategy. The agent's performance is evaluated against the Dow Jones Industrial Average and the traditional minimum-variance portfolio allocation strategy. The proposed deep reinforcement learning approach is shown to outperform both baselines in terms of the Sharpe ratio and cumulative returns.
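The two evaluation metrics named above are standard and easy to state precisely. A sketch, assuming a daily-return series and the conventional 252-trading-day annualization (the sample returns are made up for illustration):

```python
# Sharpe ratio and cumulative return of a daily-return series, the two
# metrics used to compare the DRL agent against its baselines.
import numpy as np

def cumulative_return(daily_returns):
    """Total compounded return over the trading period."""
    return float(np.prod(1.0 + np.asarray(daily_returns)) - 1.0)

def sharpe_ratio(daily_returns, risk_free_rate=0.0, periods_per_year=252):
    """Annualized Sharpe ratio: mean excess return over its volatility."""
    r = np.asarray(daily_returns) - risk_free_rate / periods_per_year
    return float(np.sqrt(periods_per_year) * r.mean() / r.std(ddof=1))

# Illustrative five-day return series.
rets = np.array([0.01, -0.005, 0.007, 0.002, -0.001])
print(round(cumulative_return(rets), 4))
print(round(sharpe_ratio(rets), 2))
```

A higher Sharpe ratio means more return per unit of risk taken, which is why it complements raw cumulative return when comparing against the min-variance baseline.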
Explore, Exploit or Listen: Combining Human Feedback and Policy Model to Speed up Deep Reinforcement Learning in 3D Worlds
We describe a method that uses discrete human feedback to enhance the performance of deep learning agents in virtual three-dimensional environments by extending deep reinforcement learning to model the confidence and consistency of human feedback. This enables deep reinforcement learning algorithms to determine the most appropriate time to listen to the human feedback, exploit the current policy model, or explore the agent's environment. Managing the trade-off between these three strategies allows DRL agents to be robust to inconsistent or intermittent human feedback. Through experiments with a synthetic oracle, we show that our technique improves the training speed and overall performance of deep reinforcement learning agents navigating three-dimensional environments in Minecraft. We further show that our technique is robust to highly inaccurate human feedback and can also operate when no human feedback is given.
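The three-way arbitration the abstract describes can be sketched as a single decision rule. The confidence model, thresholds, and action names here are illustrative assumptions, not the paper's actual formulation:

```python
# Listen / exploit / explore arbitration: trust human feedback only when a
# running consistency estimate is high enough, otherwise fall back to the
# policy model with occasional random exploration.
import random

def choose_action(state, policy, feedback, feedback_confidence,
                  listen_threshold=0.7, explore_prob=0.1,
                  actions=("left", "right", "jump"), rng=random):
    # Listen: follow the human when their feedback has been consistent enough.
    if feedback is not None and feedback_confidence >= listen_threshold:
        return feedback
    # Explore: occasionally try a random action.
    if rng.random() < explore_prob:
        return rng.choice(actions)
    # Exploit: otherwise follow the current policy model.
    return policy(state)

def update_confidence(conf, agreed, rate=0.1):
    """Exponential running estimate of how consistent the human's feedback
    is with outcomes; decays toward 0 on disagreement, toward 1 on agreement."""
    return conf + rate * ((1.0 if agreed else 0.0) - conf)
```

Because inconsistent feedback drives the confidence estimate down, the agent automatically stops listening to an unreliable human, and with `feedback=None` it degrades gracefully to ordinary explore/exploit behavior.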
- …