Distributional Actor-Critic Ensemble for Uncertainty-Aware Continuous Control
Uncertainty quantification is one of the central challenges for machine
learning in real-world applications. In reinforcement learning, an agent
confronts two kinds of uncertainty, called epistemic uncertainty and aleatoric
uncertainty. Disentangling and evaluating these uncertainties simultaneously can improve the agent's final performance, accelerate training, and facilitate quality assurance after deployment. In this work, we
propose an uncertainty-aware reinforcement learning algorithm for continuous
control tasks that extends the Deep Deterministic Policy Gradient algorithm
(DDPG). It exploits epistemic uncertainty to accelerate exploration and
aleatoric uncertainty to learn a risk-sensitive policy. We conduct numerical
experiments showing that our variant of DDPG outperforms vanilla DDPG without
uncertainty estimation in benchmark tasks on robotic control and power-grid
optimization.

Comment: 10 pages, 6 figures. Accepted to the International Joint Conference on Neural Networks (IJCNN 2022), July 18-23, Padua, Italy.
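As a rough sketch of how the two uncertainty types can be separated with an ensemble of distributional critics (the architecture family the title names), the snippet below estimates epistemic uncertainty as across-ensemble disagreement and aleatoric uncertainty as within-distribution spread. The network sizes, quantile count, and estimators are illustrative assumptions, not necessarily the paper's exact design.

```python
import torch
import torch.nn as nn

class QuantileCritic(nn.Module):
    """Distributional critic predicting n_quantiles quantiles of the return."""
    def __init__(self, obs_dim, act_dim, n_quantiles=32, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_quantiles),
        )

    def forward(self, obs, act):
        return self.net(torch.cat([obs, act], dim=-1))  # (batch, n_quantiles)

def uncertainties(critics, obs, act):
    """One common way to disentangle the two uncertainty types:
    epistemic = variance of the ensemble members' mean returns,
    aleatoric = mean within-distribution variance of the quantiles."""
    q = torch.stack([c(obs, act) for c in critics])  # (ensemble, batch, n_quantiles)
    epistemic = q.mean(dim=-1).var(dim=0)            # disagreement across critics
    aleatoric = q.var(dim=-1).mean(dim=0)            # spread of each return distribution
    return epistemic, aleatoric

# Usage: an exploration bonus from epistemic uncertainty and a
# risk-sensitive value that penalizes aleatoric uncertainty.
obs_dim, act_dim = 8, 2
critics = [QuantileCritic(obs_dim, act_dim) for _ in range(5)]
obs, act = torch.randn(4, obs_dim), torch.randn(4, act_dim)
epi, ale = uncertainties(critics, obs, act)
q_mean = torch.stack([c(obs, act) for c in critics]).mean(0).mean(-1)
exploration_bonus = epi.sqrt()                    # drives exploration toward novel regions
risk_sensitive_value = q_mean - 0.5 * ale.sqrt()  # discourages high-variance returns
```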
Confronting Reward Model Overoptimization with Constrained RLHF
Large language models are typically aligned with human preferences by
optimizing reward models (RMs) fitted to human feedback. However, human preferences are multi-faceted, and it is increasingly common to derive reward from a composition of simpler reward models, each of which captures a
different aspect of language quality. This itself presents a challenge, as it
is difficult to appropriately weight these component RMs when combining them.
Compounding this difficulty, because any RM is only a proxy for human
evaluation, this process is vulnerable to overoptimization, wherein
past a certain point, accumulating higher reward is associated with worse human
ratings. In this paper, we perform, to our knowledge, the first study on
overoptimization in composite RMs, showing that correlation between component
RMs has a significant effect on the locations of these points. We then
introduce an approach to solve this issue using constrained reinforcement
learning as a means of preventing the agent from exceeding each RM's threshold
of usefulness. Our method addresses the problem of weighting component RMs by
learning dynamic weights, naturally expressed by Lagrange multipliers. As a
result, each RM stays within the range at which it is an effective proxy,
improving evaluation performance. Finally, we introduce an adaptive method
using gradient-free optimization to identify and optimize towards these points
during a single run.
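As an illustration of the constrained formulation, the sketch below treats each component RM's usefulness threshold as a constraint and runs dual ascent on one Lagrange multiplier per RM, so the effective weight on each RM changes dynamically. The thresholds, step size, and the (1 - lambda) weighting are assumptions made for this sketch, not the paper's exact objective.

```python
import numpy as np

# Assumed setup: each component RM i has a usefulness threshold tau_i (the
# point past which higher proxy reward correlates with worse human ratings),
# and the policy solves
#   max_pi min_{lambda >= 0}  sum_i V_i(pi) - sum_i lambda_i * (V_i(pi) - tau_i),
# so the effective weight on RM i is (1 - lambda_i). Names are illustrative.

def composite_reward(component_rewards, lambdas):
    """Training reward: a dynamically weighted sum of component RM outputs."""
    return float(np.dot(1.0 - lambdas, component_rewards))

def dual_update(lambdas, values, taus, eta=0.01):
    """Dual ascent step: lambda_i grows while V_i exceeds tau_i (down-weighting
    that RM) and shrinks toward 0 otherwise; projected to keep lambda >= 0."""
    return np.clip(lambdas + eta * (values - taus), 0.0, None)

taus = np.array([0.8, 0.6, 0.7])      # hypothetical per-RM usefulness thresholds
lambdas = np.zeros_like(taus)
values = np.array([0.5, 0.72, 0.4])   # current estimates of each RM's value V_i(pi)
r = composite_reward(values, lambdas) # feed this into the policy-gradient step
lambdas = dual_update(lambdas, values, taus)
```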
Reinforcement Learning Applied to Trading Systems: A Survey
Financial domain tasks, such as trading in market exchanges, are challenging
and have long attracted researchers. The recent achievements and consequent prominence of Reinforcement Learning (RL) have also increased its adoption in trading tasks. RL uses a framework with well-established formal concepts, which makes it attractive for learning profitable trading strategies. However, applying RL in the financial domain without due care can lead new researchers to stray from established standards or to overlook relevant conceptual guidelines. In
this work, we draw on seminal RL technical fundamentals, concepts, and recommendations to perform a unified, theoretically grounded examination and
comparison of previous research that could serve as a structuring guide for the
field of study. A selection of twenty-nine articles was reviewed under our
classification that considers RL's most common formulations and design patterns
from a large volume of available studies. This classification allowed for
precise inspection of the most relevant aspects regarding data input,
preprocessing, state and action composition, adopted RL techniques, evaluation
setups, and overall results. Our analysis approach, organized around fundamental RL concepts, allowed for a clear identification of current system design best
practices, gaps that require further investigation, and promising research
opportunities. Finally, this review attempts to promote the development of this field of study by encouraging researchers to adhere to standards and by helping them avoid straying from the firm ground of RL constructs.

Comment: 38 pages.
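To make the surveyed formulation concrete, the sketch below casts trading as a minimal MDP in one common way: the state is a window of preprocessed log returns, the action is a discrete position, and the reward is one-step profit and loss. The window size, action set, and reward shape are illustrative choices, not prescriptions from the survey.

```python
import numpy as np

class TradingEnv:
    """Minimal trading MDP: state = lookback window of log returns,
    action in {-1 short, 0 flat, +1 long}, reward = position * price return."""
    def __init__(self, prices, window=10):
        self.prices = np.asarray(prices, dtype=float)
        self.window = window
        self.t = window

    def reset(self):
        self.t = self.window
        return self._state()

    def _state(self):
        # Preprocessing step: log returns over the lookback window.
        w = self.prices[self.t - self.window : self.t + 1]
        return np.diff(np.log(w))

    def step(self, action):
        ret = self.prices[self.t + 1] / self.prices[self.t] - 1.0
        reward = action * ret            # PnL of holding the position one step
        self.t += 1
        done = self.t >= len(self.prices) - 1
        return self._state(), reward, done

# Usage with a synthetic (always-positive) price path.
prices = 100.0 * np.exp(np.cumsum(0.01 * np.random.randn(200)))
env = TradingEnv(prices)
state = env.reset()
state, reward, done = env.step(+1)   # go long for one step
```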