13,537 research outputs found
Signal Temporal Logic Neural Predictive Control
Ensuring safety and meeting temporal specifications are critical challenges
for long-term robotic tasks. Signal temporal logic (STL) has been widely used
to systematically and rigorously specify these requirements. However,
traditional methods of finding the control policy under those STL requirements
are computationally complex and not scalable to high-dimensional or systems
with complex nonlinear dynamics. Reinforcement learning (RL) methods can learn
the policy to satisfy the STL specifications via hand-crafted or STL-inspired
rewards, but might encounter unexpected behaviors due to ambiguity and sparsity
in the reward. In this paper, we propose a method to directly learn a neural
network controller to satisfy the requirements specified in STL. Our controller
learns to roll out trajectories to maximize the STL robustness score in
training. In testing, similar to Model Predictive Control (MPC), the learned
controller predicts a trajectory within a planning horizon to ensure the
satisfaction of the STL requirement in deployment. A backup policy is designed
to ensure safety when our controller fails. Our approach can adapt to various
initial conditions and environmental parameters. We conduct experiments on six
tasks, where our method with the backup policy outperforms the classical
methods (MPC, STL-solver), model-free and model-based RL methods in STL
satisfaction rate, especially on tasks with complex STL specifications while
being 10X-100X faster than the classical methods.Comment: Accepted by IEEE Robotics and Automation Letters (RA-L) and ICRA202
Prescribed Performance Control Guided Policy Improvement for Satisfying Signal Temporal Logic Tasks
Signal temporal logic (STL) provides a user-friendly interface for defining
complex tasks for robotic systems. Recent efforts aim at designing control laws
or using reinforcement learning methods to find policies which guarantee
satisfaction of these tasks. While the former suffer from the trade-off between
task specification and computational complexity, the latter encounter
difficulties in exploration as the tasks become more complex and challenging to
satisfy. This paper proposes to combine the benefits of the two approaches and
use an efficient prescribed performance control (PPC) base law to guide
exploration within the reinforcement learning algorithm. The potential of the
method is demonstrated in a simulated environment through two sample
navigational tasks.Comment: This is the extended version of the paper accepted to the 2019
American Control Conference (ACC), Philadelphia (to be published
Q-learning for robust satisfaction of signal temporal logic specifications
This paper addresses the problem of learning optimal policies for satisfying signal temporal logic (STL) specifications by agents with unknown stochastic dynamics. The system is modeled as a Markov decision process, in which the states represent partitions of a continuous space and the transition probabilities are unknown. We formulate two synthesis problems where the desired STL specification is enforced by maximizing the probability of satisfaction, and the expected robustness degree, that is, a measure quantifying the quality of satisfaction. We discuss that Q-learning is not directly applicable to these problems because, based on the quantitative semantics of STL, the probability of satisfaction and expected robustness degree are not in the standard objective form of Q-learning. To resolve this issue, we propose an approximation of STL synthesis problems that can be solved via Q-learning, and we derive some performance bounds for the policies obtained by the approximate approach. The performance of the proposed method is demonstrated via simulations
A provably correct MPC approach to safety control of urban traffic networks
Model predictive control (MPC) is a popular strategy for urban traffic management that is able to incorporate physical and user defined constraints. However, the current MPC methods rely on finite horizon predictions that are unable to guarantee desirable behaviors over long periods of time. In this paper we design an MPC strategy that is guaranteed to keep the evolution of a network in a desirable yet arbitrary -safe- set, while optimizing a finite horizon cost function. Our approach relies on finding a robust controlled invariant set inside the safe set that provides an appropriate terminal constraint for the MPC optimization problem. An illustrative example is included.This work was partially supported by the NSF under grants CPS-1446151 and CMMI-1400167. (CPS-1446151 - NSF; CMMI-1400167 - NSF
- …