Probabilistically Safe Policy Transfer
Although learning-based methods have great potential for robotics, one
concern is that a robot that updates its parameters might cause large amounts
of damage before it learns the optimal policy. We formalize the idea of safe
learning in a probabilistic sense by defining an optimization problem: we
desire to maximize the expected return while keeping the expected damage below
a given safety limit. We study this optimization for the case of a robot
manipulator with safety-based torque limits. We would like to ensure that the
damage constraint is maintained at every step of the optimization and not just
at convergence. To achieve this aim, we introduce a novel method which predicts
how modifying the torque limit, as well as how updating the policy parameters,
might affect the robot's safety. We show through a number of experiments that
our approach allows the robot to improve its performance while ensuring that
the expected damage constraint is not violated during the learning process.
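
The optimization described in this abstract can be sketched as a constrained problem. The symbols are assumptions chosen for illustration, not the paper's notation: $\theta$ denotes the policy parameters, $\theta_k$ the parameters at iteration $k$ of learning, $R$ the return, $D$ the damage, and $d_{\max}$ the safety limit. The torque limit that the abstract also adapts is left out for brevity.

```latex
\begin{aligned}
\max_{\theta}\quad & \mathbb{E}\left[ R(\theta) \right] \\
\text{s.t.}\quad & \mathbb{E}\left[ D(\theta_k) \right] \le d_{\max}
  \quad \text{for every iterate } k = 0, 1, \dots, K .
\end{aligned}
```

Imposing the constraint at every iterate $\theta_k$, rather than only at the final $\theta_K$, is what separates safety during learning from safety at convergence.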
When propriety is improper
We argue that philosophers ought to distinguish epistemic decision theory from epistemology, in just the way ordinary decision theory is distinguished from ethics. Once one does this, the internalist arguments that motivate much of epistemic decision theory make sense, given specific interpretations of the formalism. Making this distinction also causes trouble for the principle called Propriety, which says, roughly, that the only acceptable epistemic utility functions make probabilistically coherent credence functions immodest. We cast doubt on this requirement, but then argue that epistemic decision theorists should never have wanted such a strong principle in any case.
Probabilistically safe vehicle control in a hostile environment
In this paper we present an approach to control a vehicle in a hostile environment with static obstacles and moving adversaries. The vehicle is required to satisfy a mission objective expressed as a temporal logic specification over a set of properties satisfied at regions of a partitioned environment. We model the movements of adversaries between regions of the environment as Poisson processes. Furthermore, we assume that the time it takes for the vehicle to traverse between two facets of each region is exponentially distributed, and we obtain the rate of this exponential distribution from a simulator of the environment. We capture the motion of the vehicle and the vehicle's updates of the adversaries' distributions as a Markov Decision Process. Using tools in Probabilistic Computation Tree Logic, we find a control strategy for the vehicle that maximizes the probability of accomplishing the mission objective. We demonstrate our approach with illustrative case studies.
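
As a rough illustration of why exponential timing assumptions are convenient here, the sketch below computes the probability that the vehicle finishes a traversal before an adversary event occurs, when both waiting times are independent exponentials. This is a standard property of exponential distributions, not the paper's construction; the function names and the rates used are hypothetical.

```python
import random


def prob_traverse_first(rate_vehicle: float, rate_adversary: float) -> float:
    """Closed-form P(T_vehicle < T_adversary) for independent exponentials.

    Both rates are hypothetical inputs (events per unit time).
    """
    if rate_vehicle <= 0 or rate_adversary <= 0:
        raise ValueError("rates must be positive")
    return rate_vehicle / (rate_vehicle + rate_adversary)


def estimate_by_simulation(rate_vehicle: float, rate_adversary: float,
                           trials: int = 100_000, seed: int = 0) -> float:
    """Monte Carlo check of the closed form using sampled waiting times."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        t_vehicle = rng.expovariate(rate_vehicle)      # traversal time
        t_adversary = rng.expovariate(rate_adversary)  # time to next adversary move
        if t_vehicle < t_adversary:
            wins += 1
    return wins / trials


if __name__ == "__main__":
    exact = prob_traverse_first(2.0, 1.0)
    approx = estimate_by_simulation(2.0, 1.0)
    print(f"closed form: {exact:.4f}, simulated: {approx:.4f}")
```

Ratios of this kind are one way rates taken from a simulator could be turned into transition probabilities of a discrete decision process.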
Certified Reinforcement Learning with Logic Guidance
This paper proposes the first model-free Reinforcement Learning (RL)
framework to synthesise policies for unknown, and continuous-state Markov
Decision Processes (MDPs), such that a given linear temporal property is
satisfied. We convert the given property into a Limit Deterministic Büchi
Automaton (LDBA), namely a finite-state machine expressing the property.
Exploiting the structure of the LDBA, we shape a synchronous reward function
on-the-fly, so that an RL algorithm can synthesise a policy resulting in traces
that probabilistically satisfy the linear temporal property. This probability
(certificate) is also calculated in parallel with policy learning when the
state space of the MDP is finite: as such, the RL algorithm produces a policy
that is certified with respect to the property. Under the assumption of finite
state space, theoretical guarantees are provided on the convergence of the RL
algorithm to an optimal policy, maximising the above probability. We also show
that our method produces "best available" control policies when the logical
property cannot be satisfied. In the general case of a continuous state space,
we propose a neural network architecture for RL and we empirically show that
the algorithm finds satisfying policies, if such policies exist. The
performance of the proposed framework is evaluated via a set of numerical
examples and benchmarks, where we observe an improvement of one order of
magnitude in the number of iterations required for the policy synthesis,
compared to existing approaches whenever available. Comment: This article draws from arXiv:1801.08099, arXiv:1809.0782
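
The idea of shaping a reward from an automaton that runs in lockstep with the environment can be sketched with a toy example. This is a minimal illustration under strong assumptions, not the paper's LDBA construction: `ToyAutomaton`, `shaped_reward`, the state names, and the labels are all hypothetical, and the automaton here is a plain deterministic one for the property "eventually reach the goal".

```python
from typing import Dict, FrozenSet, Tuple


class ToyAutomaton:
    """Deterministic automaton over state labels (hypothetical stand-in for an LDBA)."""

    def __init__(self, transitions: Dict[Tuple[str, str], str],
                 initial: str, accepting: FrozenSet[str]) -> None:
        self.transitions = transitions
        self.initial = initial
        self.accepting = accepting

    def step(self, q: str, label: str) -> str:
        # Unlisted (state, label) pairs leave the automaton state unchanged.
        return self.transitions.get((q, label), q)


def shaped_reward(automaton: ToyAutomaton, q: str, label: str) -> Tuple[str, float]:
    """Advance the automaton on the label of the new environment state.

    Returns the next automaton state and a reward of 1.0 if that state is
    accepting, else 0.0.
    """
    q_next = automaton.step(q, label)
    reward = 1.0 if q_next in automaton.accepting else 0.0
    return q_next, reward


if __name__ == "__main__":
    # Property "eventually goal": q0 moves to accepting q1 on the label "goal".
    automaton = ToyAutomaton({("q0", "goal"): "q1"}, "q0", frozenset({"q1"}))
    q = automaton.initial
    for label in ["empty", "empty", "goal", "empty"]:
        q, r = shaped_reward(automaton, q, label)
        print(label, q, r)
```

A learning agent would observe the pair of environment state and automaton state, so that the reward depends on progress toward the property and not only on the current environment state.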
Safety-Aware Apprenticeship Learning
Apprenticeship learning (AL) is a class of Learning from Demonstration
techniques where the reward function of a Markov Decision Process (MDP) is
unknown to the learning agent and the agent has to derive a good policy by
observing an expert's demonstrations. In this paper, we study the problem of
how to make AL algorithms inherently safe while still meeting their learning
objectives. We consider a setting where the unknown reward function is assumed
to be a linear combination of a set of state features, and the safety property
is specified in Probabilistic Computation Tree Logic (PCTL). By embedding
probabilistic model checking inside AL, we propose a novel
counterexample-guided approach that can ensure safety while retaining
performance of the learnt policy. We demonstrate the effectiveness of our
approach on several challenging AL scenarios where safety is essential. Comment: Accepted by International Conference on Computer Aided Verification
(CAV) 201
Sampling-Based Methods for Factored Task and Motion Planning
This paper presents a general-purpose formulation of a large class of
discrete-time planning problems, with hybrid state and control-spaces, as
factored transition systems. Factoring allows state transitions to be described
as the intersection of several constraints each affecting a subset of the state
and control variables. Robotic manipulation problems with many movable objects
involve constraints that only affect several variables at a time and therefore
exhibit large amounts of factoring. We develop a theoretical framework for
solving factored transition systems with sampling-based algorithms. The
framework characterizes conditions on the submanifold in which solutions lie,
leading to a characterization of robust feasibility that incorporates
dimensionality-reducing constraints. It then connects those conditions to
corresponding conditional samplers that can be composed to produce values on
this submanifold. We present two domain-independent, probabilistically complete
planning algorithms that take, as input, a set of conditional samplers. We
demonstrate the empirical efficiency of these algorithms on a set of
challenging task and motion planning problems involving picking, placing, and
pushing.
Pay Attention to How You Drive: Safe and Adaptive Model-Based Reinforcement Learning for Off-Road Driving
Autonomous off-road driving is challenging as risky actions taken by the
robot may lead to catastrophic damage. As such, developing controllers in
simulation is often desirable as it provides a safer and more economical
alternative. However, accurately modeling the robot is difficult due to
the complex robot dynamics and terrain interactions in unstructured
environments. Domain randomization addresses this problem by randomizing
simulation dynamics parameters; however, this approach sacrifices performance
for robustness, leading to policies that are sub-optimal for any target
dynamics. We introduce a novel model-based reinforcement learning approach that
aims to balance robustness with adaptability. Our approach trains a System
Identification Transformer (SIT) and an Adaptive Dynamics Model (ADM) under a
variety of simulated dynamics. The SIT uses attention mechanisms to distill
state-transition observations from the target system into a context vector,
which provides an abstraction for its target dynamics. Conditioned on this, the
ADM probabilistically models the system's dynamics. Online, we use a Risk-Aware
Model Predictive Path Integral controller (MPPI) to safely control the robot
under its current understanding of the dynamics. We demonstrate in simulation
as well as in multiple real-world environments that this approach enables safer
behaviors upon initialization and becomes less conservative (i.e. faster) as
its understanding of the target system dynamics improves with more
observations. In particular, our approach results in an approximately 41%
improvement in lap-time over the non-adaptive baseline while remaining safe
across different environments.
Improving time predictability of shared hardware resources in real-time multicore systems : emphasis on the space domain
Critical Real-Time Embedded Systems (CRTES) follow a verification and validation process for timing and functional correctness. This process includes timing analysis, which provides Worst-Case Execution Time (WCET) estimates as evidence that the execution time of the system, or parts of it, remains within the deadlines. A key design principle for CRTES is incremental qualification, whereby each software component can be subject to verification and validation independently of any other component, with obvious benefits for cost. At the timing level, this requires time composability, such that the timing behavior of a function is not affected by other functions. CRTES are experiencing unprecedented growth, with rising performance demands that have motivated the use of multicore architectures. Multicores can provide the performance required and bring the potential of integrating several software functions onto the same hardware. However, multicore contention in the access to shared hardware resources makes the execution time of a task depend on the other tasks running simultaneously. This dependence threatens time predictability and jeopardizes time composability. In this thesis we analyze and propose hardware solutions to be applied to current multicore designs for CRTES to improve time predictability and time composability, focusing on the on-chip bus and the memory controller. At the hardware level, we propose new bus and memory controller designs that control and mitigate contention between different cores and enable time composability by design, also in the context of mixed-criticality systems. At the analysis level, we propose contention prediction models that factor in the impact of contenders and don't need modifications to the hardware. We also propose a set of Performance Monitoring Counters (PMC) that provide evidence about the contention.
We give special emphasis to the space domain, focusing on the Cobham Gaisler NGMP multicore processor, which is currently being assessed by the European Space Agency for its future missions.