Analysis of Q-learning with Adaptation and Momentum Restart for Gradient Descent
Existing convergence analyses of Q-learning mostly focus on the vanilla
stochastic gradient descent (SGD) type of updates. Although Adaptive Moment
Estimation (Adam) is commonly used in practical Q-learning algorithms,
no convergence guarantee has been provided for Q-learning with this
type of update. In this paper, we first characterize the convergence rate for
Q-AMSGrad, which is the Q-learning algorithm with AMSGrad update (a commonly
adopted alternative to Adam for theoretical analysis). To further improve
performance, we propose incorporating a momentum restart scheme into
Q-AMSGrad, resulting in the so-called Q-AMSGradR algorithm. The convergence
rate of Q-AMSGradR is also established. Our experiments on a linear quadratic
regulator problem show that the two proposed Q-learning algorithms outperform
the vanilla Q-learning with SGD updates. The two algorithms also exhibit
significantly better performance than the DQN learning method over a batch of
Atari 2600 games.
Comment: This paper extends the work presented at the 2020 International Joint
Conferences on Artificial Intelligence with supplementary material.
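As a rough illustration of the idea in the abstract above, the following minimal sketch applies an AMSGrad-style update to the temporal-difference error of tabular Q-learning, with an optional periodic momentum reset. It is not the paper's algorithm: the function names (`q_amsgrad`, `toy_step`), the toy environment, the hyperparameters, the fixed `restart_period` rule, and the choice to update only the visited table entry are all assumptions made for this example.

```python
import math
import random


def q_amsgrad(env_step, n_states, n_actions, steps=4000, alpha=0.05,
              beta1=0.9, beta2=0.999, gamma=0.9, eps=1e-8,
              restart_period=None, seed=0):
    """Tabular Q-learning with an AMSGrad-style update (illustrative sketch).

    env_step(state, action) -> (reward, next_state, done) is a hypothetical
    environment interface. If restart_period is set, the first-moment
    (momentum) table is reset to zero every restart_period steps; this
    restart rule is an assumption, not taken from the paper.
    """
    rng = random.Random(seed)

    def zeros():
        return [[0.0] * n_actions for _ in range(n_states)]

    q, m, v, v_hat = zeros(), zeros(), zeros(), zeros()
    state = 0
    for t in range(1, steps + 1):
        if restart_period and t % restart_period == 0:
            m = zeros()  # momentum restart (assumed periodic rule)

        action = rng.randrange(n_actions)  # uniform random exploration
        reward, next_state, done = env_step(state, action)

        # Semi-gradient of 0.5 * (Q(s, a) - target)^2 with respect to Q(s, a).
        target = reward if done else reward + gamma * max(q[next_state])
        g = q[state][action] - target

        # AMSGrad moments, updated only for the visited entry (a simplification).
        m[state][action] = beta1 * m[state][action] + (1 - beta1) * g
        v[state][action] = beta2 * v[state][action] + (1 - beta2) * g * g
        v_hat[state][action] = max(v_hat[state][action], v[state][action])

        q[state][action] -= alpha * m[state][action] / (
            math.sqrt(v_hat[state][action]) + eps)

        state = 0 if done else next_state
    return q


def toy_step(state, action):
    """Hypothetical one-state task: action 1 ends the episode with reward 1,
    action 0 stays in the same state with reward 0."""
    if action == 1:
        return 1.0, 0, True
    return 0.0, 0, False


if __name__ == "__main__":
    print(q_amsgrad(toy_step, n_states=1, n_actions=2))
    print(q_amsgrad(toy_step, n_states=1, n_actions=2, restart_period=50))
```

On this toy task with a discount factor of 0.9, the optimal values are 1 for action 1 and 0.9 for action 0, so both variants can be checked against those targets. Taking the running maximum in `v_hat` is what distinguishes AMSGrad from Adam: the effective step size never grows back once a large gradient has been seen.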
Rethink the Adversarial Scenario-based Safety Testing of Robots: the Comparability and Optimal Aggressiveness
This paper studies the class of scenario-based safety testing algorithms in
the black-box safety testing configuration. For algorithms sharing the same
state-action set coverage with different sampling distributions, it is commonly
believed that prioritizing the exploration of high-risk state-actions leads to
a better sampling efficiency. We dispute this intuition with an
impossibility theorem showing that all safety testing algorithms that
differ only in their sampling distributions achieve the same
expected sampling efficiency. Moreover, for testing algorithms covering
different sets of state-actions, the sampling efficiency criterion is no longer
applicable as different algorithms do not necessarily converge to the same
termination condition. We then propose a testing aggressiveness definition
based on the almost safe set concept along with an unbiased and efficient
algorithm that compares the aggressiveness between testing algorithms.
Empirical observations from the safety testing of bipedal locomotion
controllers and vehicle decision-making modules are also presented to support
the proposed theoretical implications and methodologies.
On Safety Testing, Validation, and Characterization with Scenario-Sampling: A Case Study of Legged Robots
The dynamic response of legged robot locomotion is non-Lipschitz and can
be stochastic due to environmental uncertainties. To test, validate, and
characterize the safety performance of legged robots, existing solutions based on
observed and inferred risk can be incomplete and sample-inefficient. Some
formal verification methods are limited by model precision and other surrogate
assumptions. In this paper, we propose a scenario-sampling-based testing
framework that characterizes the overall safety performance of a legged robot
by specifying (i) where (in terms of a set of states) the robot is potentially
safe, and (ii) how safe the robot is within the specified set. The framework
can also help certify the commercial deployment of legged robots in
real-world environments alongside humans and compare safety performance among
legged robots with different mechanical structures and dynamic properties. The
proposed framework is further deployed to evaluate a group of state-of-the-art
legged robot locomotion controllers from the literature, spanning model-based,
deep-neural-network-based, and reinforcement-learning-based methods.
Across a series of intended work domains of the studied legged robots (e.g.,
tracking speed on sloped surfaces, with abrupt changes in demanded velocity, and
against adversarial push-over disturbances), we show that the method can
adequately capture the overall safety characterization and subtle
performance insights. Many of the observed safety outcomes, to the best of our
knowledge, have never been reported in the existing legged robot
literature.
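To make the "how safe within a specified set" question in the abstract above concrete, here is a generic Monte Carlo sketch. It is not the paper's framework: it samples scenarios from a candidate set, runs a hypothetical rollout check, and reports the empirical safe fraction with a Hoeffding confidence interval. The names `estimate_safety`, `sample_speed`, and `toy_rollout`, and the toy failure rule, are assumptions for illustration only.

```python
import math
import random


def estimate_safety(sample_state, is_safe_rollout, n_samples=1000,
                    delta=0.05, seed=0):
    """Estimate the probability that a rollout started in the candidate set
    is safe, with a two-sided Hoeffding interval at confidence 1 - delta.

    sample_state(rng) draws an initial scenario from the candidate set and
    is_safe_rollout(state, rng) returns True if the rollout stays safe;
    both are hypothetical callables supplied by the user.
    """
    rng = random.Random(seed)
    safe = sum(1 for _ in range(n_samples)
               if is_safe_rollout(sample_state(rng), rng))
    p_hat = safe / n_samples
    # Hoeffding: P(|p_hat - p| >= margin) <= 2 * exp(-2 * n * margin^2).
    margin = math.sqrt(math.log(2.0 / delta) / (2.0 * n_samples))
    return p_hat, max(0.0, p_hat - margin), min(1.0, p_hat + margin)


def sample_speed(rng):
    """Toy candidate set: commanded speeds between 0 and 2 (made-up units)."""
    return rng.uniform(0.0, 2.0)


def toy_rollout(speed, rng):
    """Toy stand-in for a simulator: the rollout counts as safe when the
    commanded speed plus a random disturbance stays below 1.8."""
    return speed + rng.gauss(0.0, 0.1) < 1.8


if __name__ == "__main__":
    p_hat, lower, upper = estimate_safety(sample_speed, toy_rollout)
    print(f"safe fraction {p_hat:.3f}, interval [{lower:.3f}, {upper:.3f}]")
```

Shrinking or moving the candidate set in `sample_speed` changes where safety is assessed, while the returned interval quantifies how safe the system appears within that set under the sampled scenarios.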