SAAC: Safe Reinforcement Learning as an Adversarial Game of Actor-Critics
Although Reinforcement Learning (RL) is effective for sequential
decision-making problems under uncertainty, it still fails to thrive in
real-world systems where risk or safety is a binding constraint. In this paper,
we formulate the RL problem with safety constraints as a non-zero-sum game.
When combined with maximum entropy RL, this formulation leads to a safe
adversarially guided soft actor-critic framework, called SAAC. In SAAC, the
adversary aims to break the safety constraint while the RL agent aims to
maximize the constrained value function given the adversary's policy. The
safety constraint on the agent's value function manifests only as a repulsion
term between the agent's and the adversary's policies. Unlike previous
approaches, SAAC can address different safety criteria such as safe
exploration, mean-variance risk sensitivity, and CVaR-like coherent risk
sensitivity. We illustrate the design of the adversary for these constraints.
Then, in each of these variations, we show the agent differentiates itself from
the adversary's unsafe actions in addition to learning to solve the task.
Finally, on challenging continuous control tasks, we demonstrate that SAAC
achieves faster convergence, better efficiency, and fewer safety-constraint
violations than risk-averse distributional RL and risk-neutral soft
actor-critic algorithms.
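The repulsion term described above can be sketched for a discrete action space. The following is a minimal illustration, not the paper's implementation: the coefficients, Q-values, and tabular policies are assumptions for demonstration only.

```python
import numpy as np

def saac_actor_objective(agent_probs, adv_probs, q_values, alpha=0.2, beta=0.5):
    """Illustrative SAAC-style actor objective for a discrete action space.

    The agent maximizes expected Q plus an entropy bonus (as in soft
    actor-critic), plus a KL "repulsion" term that pushes its policy away
    from the adversary's unsafe policy.
    """
    agent_probs = np.asarray(agent_probs, dtype=float)
    adv_probs = np.asarray(adv_probs, dtype=float)
    expected_q = float(np.sum(agent_probs * q_values))
    entropy = float(-np.sum(agent_probs * np.log(agent_probs)))
    # KL(agent || adversary): larger when the agent avoids the adversary's
    # preferred (unsafe) actions, so maximizing it acts as repulsion.
    repulsion = float(np.sum(agent_probs * np.log(agent_probs / adv_probs)))
    return expected_q + alpha * entropy + beta * repulsion

# With an adversary concentrated on an unsafe action, the repulsion term
# rewards the agent for placing little probability mass there.
q = np.array([1.0, 0.5, -2.0])
agent = np.array([0.6, 0.35, 0.05])
adversary = np.array([0.05, 0.05, 0.9])  # prefers the unsafe third action
```

Holding the agent's policy fixed, the objective is strictly larger against a divergent adversary than against an identical one, which is the repulsion effect in miniature.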
Robust and Efficient Planning using Adaptive Entropy Tree Search
In this paper, we present the Adaptive Entropy Tree Search (ANTS) algorithm.
ANTS builds on recent successes of maximum entropy planning while mitigating
its arguably major drawback: sensitivity to the temperature setting. We endow
ANTS with a mechanism that adapts the temperature to keep the action-selection
entropy in the nodes of the planning tree within a given range. With this
mechanism, the ANTS planner enjoys remarkable hyper-parameter robustness,
achieves high scores on the Atari benchmark, and is a capable component of a
planning-learning loop akin to AlphaZero. We believe that all these features
make ANTS a compelling choice as a general planner for complex tasks.
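Since softmax entropy is monotone in the temperature, matching a target entropy range can be done by a simple search over temperatures. The sketch below is an illustrative guess at such an adaptation mechanism (bisection in log-temperature space), not the ANTS authors' exact procedure.

```python
import numpy as np

def adapt_temperature(q_values, h_lo, h_hi, tau_lo=1e-4, tau_hi=1e4, iters=100):
    """Find a softmax temperature whose action-selection entropy lies in
    [h_lo, h_hi]. Entropy grows monotonically with temperature, so a
    bisection over log-temperature converges quickly."""
    q = np.asarray(q_values, dtype=float)

    def entropy(tau):
        z = q / tau
        z = z - z.max()                    # numerical stability
        p = np.exp(z) / np.exp(z).sum()
        p = p[p > 0]
        return float(-(p * np.log(p)).sum())

    tau = np.sqrt(tau_lo * tau_hi)
    for _ in range(iters):
        tau = np.sqrt(tau_lo * tau_hi)     # geometric midpoint
        h = entropy(tau)
        if h < h_lo:
            tau_lo = tau                   # too cold: raise temperature
        elif h > h_hi:
            tau_hi = tau                   # too hot: lower temperature
        else:
            break
    return tau
```

The returned temperature yields an action distribution whose entropy falls inside the requested band, which is the hyper-parameter-robustness idea in one function.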
Efficient Exploration via Epistemic-Risk-Seeking Policy Optimization
Exploration remains a key challenge in deep reinforcement learning (RL).
Optimism in the face of uncertainty is a well-known heuristic with theoretical
guarantees in the tabular setting, but how best to translate the principle to
deep reinforcement learning, which involves online stochastic gradients and
deep network function approximators, is not fully understood. In this paper we
propose a new, differentiable optimistic objective that when optimized yields a
policy that provably explores efficiently, with guarantees even under function
approximation. Our new objective is a zero-sum two-player game derived from
endowing the agent with an epistemic-risk-seeking utility function, which
converts uncertainty into value and encourages the agent to explore uncertain
states. We show that the solution to this game minimizes an upper bound on the
regret, with the 'players' each attempting to minimize one component of a
particular regret decomposition. We derive a new model-free algorithm which we
call 'epistemic-risk-seeking actor-critic' (ERSAC), which is simply an
application of simultaneous stochastic gradient ascent-descent to the game.
Finally, we discuss a recipe for incorporating off-policy data and show that
combining the risk-seeking objective with replay data yields a double benefit
in terms of statistical efficiency. We conclude with some results showing good
performance of a deep RL agent using the technique on the challenging 'DeepSea'
environment, showing significant performance improvements even over other
efficient exploration techniques, as well as improved performance on the Atari
benchmark.
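The core device, an epistemic-risk-seeking utility, can be written down directly: an exponential utility over posterior value samples that is always at least the posterior mean, so epistemic spread is converted into extra value. The risk parameter and the sampling scheme below are illustrative assumptions, not the paper's exact construction.

```python
import numpy as np

def risk_seeking_utility(value_samples, tau=1.0):
    """Exponential (risk-seeking) utility over epistemic value samples:

        U = (1 / tau) * log E[exp(tau * V)]

    By Jensen's inequality U >= E[V], with the gap growing with the spread
    of the samples, so uncertain states look more valuable and attract
    exploration. Uses a log-sum-exp shift for numerical stability.
    """
    v = np.asarray(value_samples, dtype=float)
    m = v.max()
    return float(m + np.log(np.mean(np.exp(tau * (v - m)))) / tau)
```

For a state whose value is known exactly the utility equals that value, while any epistemic spread strictly inflates it; this is the sense in which uncertainty is "converted into value."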
Reinforcement learning control of a biomechanical model of the upper extremity
Among the infinite number of possible movements that can be produced, humans
are commonly assumed to choose those that optimize criteria such as minimizing
movement time, subject to certain movement constraints like signal-dependent
and constant motor noise. While so far these assumptions have only been
evaluated for simplified point-mass or planar models, we address the question
of whether they can predict reaching movements in a full skeletal model of the
human upper extremity. We learn a control policy using a motor babbling
approach as implemented in reinforcement learning, using aimed movements of the
tip of the right index finger towards randomly placed 3D targets of varying
size. We use a state-of-the-art biomechanical model, which includes seven
actuated degrees of freedom. To deal with the curse of dimensionality, we use a
simplified second-order muscle model, acting at each degree of freedom instead
of individual muscles. The results confirm that the assumptions of
signal-dependent and constant motor noise, together with the objective of
movement time minimization, are sufficient for a state-of-the-art skeletal
model of the human upper extremity to reproduce complex phenomena of human
movement, in particular Fitts' Law and the 2/3 Power Law. This result supports
the notion that control of the complex human biomechanical system can plausibly
be determined by a set of simple assumptions and can easily be learned.
Comment: 19 pages, 7 figures
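Fitts' Law, one of the phenomena the learned policy reproduces, has a simple closed form worth stating: movement time grows linearly with the index of difficulty log2(2D/W). The intercept and slope below are illustrative constants, not values fitted to the paper's data.

```python
import math

def fitts_movement_time(a, b, distance, width):
    """Fitts' Law: MT = a + b * log2(2D / W).

    D is the distance to the target, W its width; the log term is the
    index of difficulty (ID). a and b are empirically fitted constants.
    """
    index_of_difficulty = math.log2(2.0 * distance / width)
    return a + b * index_of_difficulty

# Farther or smaller targets take longer, as the learned reaching policy
# reproduces for the index-finger aiming task.
mt_easy = fitts_movement_time(0.1, 0.15, distance=0.2, width=0.05)
mt_hard = fitts_movement_time(0.1, 0.15, distance=0.4, width=0.025)
```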
A guide to the Maximum Entropy species distribution model: a case study of the yellow-naped amazon ("lora nuca amarilla"), Amazona auropalliata, in El Salvador
The aim of this work is to offer a guide to the analysis and interpretation of the MaxEnt model, including the quality requirements needed to generate solid results, and to provide researchers with the key elements of this powerful ecological tool in order to improve the conservation and management of the biological diversity of El Salvador. To this end, a potential distribution model was built for Amazona auropalliata, a species catalogued as endangered in the country. The model achieved an AUC (Area Under the Curve) of 0.856, which is considered reliable. The variables contributing most to the model were mean temperature of the wettest month, precipitation of the warmest four-month period, and precipitation of the driest period. According to the model, the potential distribution of the species lies mainly in the departments of San Salvador, Santa Ana, Ahuachapán, Sonsonate, Usulután and La Libertad. Finally, based on statistical analyses, a bioclimatic profile of the species was constructed, which will facilitate future studies, including assessments of the effects of climate change.
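The AUC used to validate the model has a rank-based interpretation that can be computed directly: the probability that a randomly chosen presence site receives a higher model score than a randomly chosen background site, with ties counting half. The scores below are illustrative, not the study's data.

```python
def auc_from_scores(presence_scores, background_scores):
    """AUC as the probability that a random presence site outscores a
    random background site (ties count 0.5). This is the standard
    rank-based definition used to evaluate MaxEnt outputs."""
    wins = 0.0
    for p in presence_scores:
        for b in background_scores:
            if p > b:
                wins += 1.0
            elif p == b:
                wins += 0.5
    return wins / (len(presence_scores) * len(background_scores))
```

Perfectly separated scores give an AUC of 1.0 and indistinguishable scores give 0.5, which is why a value of 0.856 is read as a reliably discriminating model.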