Prescribed Performance Control Guided Policy Improvement for Satisfying Signal Temporal Logic Tasks
Signal temporal logic (STL) provides a user-friendly interface for defining
complex tasks for robotic systems. Recent efforts aim at designing control laws
or using reinforcement learning methods to find policies which guarantee
satisfaction of these tasks. While the former suffer from the trade-off between
task specification and computational complexity, the latter encounter
difficulties in exploration as the tasks become more complex and challenging to
satisfy. This paper proposes to combine the benefits of the two approaches and
use an efficient prescribed performance control (PPC) base law to guide
exploration within the reinforcement learning algorithm. The potential of the
method is demonstrated in a simulated environment through two sample
navigational tasks.
Comment: This is the extended version of the paper accepted to the 2019
American Control Conference (ACC), Philadelphia (to be published
Incremental Sampling-based Algorithm for Minimum-violation Motion Planning
This paper studies the problem of control strategy synthesis for dynamical
systems with differential constraints to fulfill a given reachability goal
while satisfying a set of safety rules. Particular attention is devoted to
goals that become feasible only if a subset of the safety rules are violated.
The proposed algorithm computes a control law that minimizes the level of
unsafety while the desired goal is guaranteed to be reached. This problem is
motivated by an autonomous car navigating an urban environment while following
rules of the road such as "always travel in right lane" and "do not change
lanes frequently". Ideas behind sampling-based motion-planning algorithms,
such as Probabilistic Road Maps (PRMs) and Rapidly-exploring Random Trees
(RRTs), are employed to incrementally construct a finite concretization of the
dynamics as a durational Kripke structure. In conjunction with this, a weighted
finite automaton that captures the safety rules is used in order to find an
optimal trajectory that minimizes the violation of safety rules. We prove that
the proposed algorithm guarantees asymptotic optimality, i.e., almost-sure
convergence to optimal solutions. We present results of simulation experiments
and an implementation on an autonomous urban mobility-on-demand system.
Comment: 8 pages, final version submitted to CDC '1
Technical Report: A Receding Horizon Algorithm for Informative Path Planning with Temporal Logic Constraints
This technical report is an extended version of the paper 'A Receding Horizon
Algorithm for Informative Path Planning with Temporal Logic Constraints'
accepted to the 2013 IEEE International Conference on Robotics and Automation
(ICRA). This paper considers the problem of finding the most informative path
for a sensing robot under temporal logic constraints, a richer set of
constraints than have previously been considered in information gathering. An
algorithm for informative path planning is presented that leverages tools from
information theory and formal control synthesis, and is proven to give a path
that satisfies the given temporal logic constraints. The algorithm uses a
receding horizon approach in order to provide a reactive, on-line solution
while mitigating computational complexity. Statistics compiled from multiple
simulation studies indicate that this algorithm performs better than a baseline
exhaustive search approach.
Comment: Extended version of paper accepted to 2013 IEEE International
Conference on Robotics and Automation (ICRA
MDP Optimal Control under Temporal Logic Constraints
In this paper, we develop a method to automatically generate a control policy
for a dynamical system modeled as a Markov Decision Process (MDP). The control
specification is given as a Linear Temporal Logic (LTL) formula over a set of
propositions defined on the states of the MDP. We synthesize a control policy
such that the MDP satisfies the given specification almost surely, if such a
policy exists. In addition, we designate an "optimizing proposition" to be
repeatedly satisfied, and we formulate a novel optimization criterion in terms
of minimizing the expected cost in between satisfactions of this proposition.
We propose a sufficient condition for a policy to be optimal, and develop a
dynamic programming algorithm that synthesizes a policy that is optimal under
some conditions, and sub-optimal otherwise. This problem is motivated by
robotic applications requiring persistent tasks, such as environmental
monitoring or data gathering, to be performed.
Comment: Technical report accompanying the CDC2011 submissio
Robust Motion Planning employing Signal Temporal Logic
Motion planning classically concerns the problem of reaching a goal
configuration while avoiding obstacles. However, the need for more
sophisticated motion planning methodologies, taking temporal aspects into
account, has emerged. To address this issue, temporal logics have recently been
used to formulate such advanced specifications. This paper considers Signal
Temporal Logic in combination with Model Predictive Control. A robustness
metric, called Discrete Average Space Robustness, is introduced and used to
maximize the satisfaction of specifications, which results in a natural
robustness against noise. The resulting optimization problem is convex and
formulated as a Linear Program.
Comment: 6 page
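As a rough illustration of what a space robustness value measures, the sketch below computes the standard min/max STL robustness of "always" and "eventually" over a sampled signal for a predicate of the form x > threshold. It is not the Discrete Average Space Robustness introduced in the paper, whose definition is not given in this abstract. The function names, the predicate form, and the sample signal are all assumptions made for illustration.

```python
from typing import List, Sequence


def rho_predicate(signal: Sequence[float], threshold: float) -> List[float]:
    """Per-step robustness of the predicate x > threshold (positive means satisfied)."""
    return [x - threshold for x in signal]


def rho_always(rho: Sequence[float], a: int, b: int) -> float:
    """Robustness of 'always' on steps a..b inclusive: the worst-case margin."""
    return min(rho[a:b + 1])


def rho_eventually(rho: Sequence[float], a: int, b: int) -> float:
    """Robustness of 'eventually' on steps a..b inclusive: the best-case margin."""
    return max(rho[a:b + 1])


# Hypothetical sampled signal and threshold, chosen only for illustration.
signal = [1.0, 2.0, 0.5, 3.0]
margins = rho_predicate(signal, threshold=1.0)
always_value = rho_always(margins, 0, 3)
eventually_value = rho_eventually(margins, 0, 3)
```

A negative value of `always_value` means the predicate is violated at some step in the window, and its magnitude indicates by how much.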
Probably Approximately Correct MDP Learning and Control With Temporal Logic Constraints
We consider synthesis of control policies that maximize the probability of
satisfying given temporal logic specifications in unknown, stochastic
environments. We model the interaction between the system and its environment
as a Markov decision process (MDP) with initially unknown transition
probabilities. The solution we develop builds on the so-called model-based
probably approximately correct Markov decision process (PAC-MDP) methodology.
The algorithm attains an $\varepsilon$-approximately optimal policy with
probability $1-\delta$ using samples (i.e., observations), time, and space that
grow polynomially with the size of the MDP, the size of the automaton
expressing the temporal logic specification, $1/\varepsilon$, $1/\delta$,
and a finite time horizon. In this approach, the system
maintains a model of the initially unknown MDP, and constructs a product MDP
based on its learned model and the specification automaton that expresses the
temporal logic constraints. During execution, the policy is iteratively updated
using observation of the transitions taken by the system. The iteration
terminates in finitely many steps. With high probability, the resulting policy
is such that, for any state, the difference between the probability of
satisfying the specification under this policy and the optimal one is within a
predefined bound.
Comment: 9 pages, 5 figures, Accepted by 2014 Robotics: Science and Systems
(RSS
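The idea of maintaining a learned model of an initially unknown MDP can be sketched with simple transition counts, as is common in model-based PAC-MDP methods. The class below is a minimal sketch under stated assumptions, not the algorithm from the paper: the class name, the `known_threshold` parameter, and the state and action labels are hypothetical, and the product with the specification automaton is omitted.

```python
from collections import defaultdict
from typing import Dict, Hashable, Tuple


class EmpiricalModel:
    """Count-based estimate of MDP transition probabilities (illustrative only)."""

    def __init__(self, known_threshold: int) -> None:
        # Assumed rule: a state-action pair counts as "known" after this many visits.
        self.known_threshold = known_threshold
        self.counts: Dict[Tuple[Hashable, Hashable], Dict[Hashable, int]] = defaultdict(
            lambda: defaultdict(int)
        )

    def observe(self, state: Hashable, action: Hashable, next_state: Hashable) -> None:
        """Record one observed transition."""
        self.counts[(state, action)][next_state] += 1

    def visits(self, state: Hashable, action: Hashable) -> int:
        """Number of times the state-action pair has been observed."""
        return sum(self.counts[(state, action)].values())

    def is_known(self, state: Hashable, action: Hashable) -> bool:
        """True once the pair has been visited at least known_threshold times."""
        return self.visits(state, action) >= self.known_threshold

    def transition_probs(self, state: Hashable, action: Hashable) -> Dict[Hashable, float]:
        """Empirical next-state distribution, or an empty dict if never visited."""
        n = self.visits(state, action)
        if n == 0:
            return {}
        return {t: c / n for t, c in self.counts[(state, action)].items()}


# Hypothetical observations, chosen only for illustration.
model = EmpiricalModel(known_threshold=3)
for nxt in ["s1", "s1", "s2"]:
    model.observe("s0", "a", nxt)
probs = model.transition_probs("s0", "a")
```

In a full method, the policy would be recomputed on the learned model each time a new state-action pair becomes known.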