663 research outputs found

    Distributional Actor-Critic Ensemble for Uncertainty-Aware Continuous Control

    Full text link
    Uncertainty quantification is one of the central challenges for machine learning in real-world applications. In reinforcement learning, an agent confronts two kinds of uncertainty, called epistemic uncertainty and aleatoric uncertainty. Disentangling and evaluating these uncertainties simultaneously stands a chance of improving the agent's final performance, accelerating training, and facilitating quality assurance after deployment. In this work, we propose an uncertainty-aware reinforcement learning algorithm for continuous control tasks that extends the Deep Deterministic Policy Gradient algorithm (DDPG). It exploits epistemic uncertainty to accelerate exploration and aleatoric uncertainty to learn a risk-sensitive policy. We conduct numerical experiments showing that our variant of DDPG outperforms vanilla DDPG without uncertainty estimation in benchmark tasks on robotic control and power-grid optimization.Comment: 10 pages, 6 figures. Accepted to International Joint Conference on Neural Networks (IJCNN 2022), July 18-23, Padua, Ital

    Confronting Reward Model Overoptimization with Constrained RLHF

    Full text link
    Large language models are typically aligned with human preferences by optimizing reward models\textit{reward models} (RMs) fitted to human feedback. However, human preferences are multi-faceted, and it is increasingly common to derive reward from a composition of simpler reward models which each capture a different aspect of language quality. This itself presents a challenge, as it is difficult to appropriately weight these component RMs when combining them. Compounding this difficulty, because any RM is only a proxy for human evaluation, this process is vulnerable to overoptimization\textit{overoptimization}, wherein past a certain point, accumulating higher reward is associated with worse human ratings. In this paper, we perform, to our knowledge, the first study on overoptimization in composite RMs, showing that correlation between component RMs has a significant effect on the locations of these points. We then introduce an approach to solve this issue using constrained reinforcement learning as a means of preventing the agent from exceeding each RM's threshold of usefulness. Our method addresses the problem of weighting component RMs by learning dynamic weights, naturally expressed by Lagrange multipliers. As a result, each RM stays within the range at which it is an effective proxy, improving evaluation performance. Finally, we introduce an adaptive method using gradient-free optimization to identify and optimize towards these points during a single run

    Reinforcement Learning Applied to Trading Systems: A Survey

    Full text link
    Financial domain tasks, such as trading in market exchanges, are challenging and have long attracted researchers. The recent achievements and the consequent notoriety of Reinforcement Learning (RL) have also increased its adoption in trading tasks. RL uses a framework with well-established formal concepts, which raises its attractiveness in learning profitable trading strategies. However, RL use without due attention in the financial area can prevent new researchers from following standards or failing to adopt relevant conceptual guidelines. In this work, we embrace the seminal RL technical fundamentals, concepts, and recommendations to perform a unified, theoretically-grounded examination and comparison of previous research that could serve as a structuring guide for the field of study. A selection of twenty-nine articles was reviewed under our classification that considers RL's most common formulations and design patterns from a large volume of available studies. This classification allowed for precise inspection of the most relevant aspects regarding data input, preprocessing, state and action composition, adopted RL techniques, evaluation setups, and overall results. Our analysis approach organized around fundamental RL concepts allowed for a clear identification of current system design best practices, gaps that require further investigation, and promising research opportunities. Finally, this review attempts to promote the development of this field of study by facilitating researchers' commitment to standards adherence and helping them to avoid straying away from the RL constructs' firm ground.Comment: 38 page
    • …
    corecore