10 research outputs found

    Risk-sensitive reinforcement learning applied to control under constraints

    In this paper, we consider Markov Decision Processes (MDPs) with error states. Error states are states that are undesirable or dangerous to enter. We define the risk with respect to a policy as the probability of entering such a state when the policy is pursued. We consider the problem of finding good policies whose risk is smaller than some user-specified threshold, and formalize it as a constrained MDP with two criteria. The first criterion corresponds to the value function originally given. We show that the risk can be formulated as a second criterion function based on a cumulative return, whose definition is independent of the original value function. We present a model-free, heuristic reinforcement learning algorithm that aims at finding good deterministic policies. It is based on weighting the original value function and the risk. The weight parameter is adapted in order to find a feasible solution for the constrained problem that performs well with respect to the value function. The algorithm was successfully applied to the control of a feed tank with stochastic inflows that lies upstream of a distillation column. This control task was originally formulated as an optimal control problem with chance constraints, and it was solved under certain assumptions on the model to obtain an optimal solution. The strength of our learning algorithm is that it can be used even when some of these restrictive assumptions are relaxed.
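The weighting idea in this abstract can be illustrated with a minimal sketch. It is not the paper's algorithm: the candidate policies, their value and risk numbers, the scoring rule `xi * value - risk`, and the rule of raising the weight from zero while the chosen policy stays feasible are all assumptions made for illustration.

```python
# Hypothetical candidate policies as (name, value, risk) tuples.
# The numbers are made up for illustration only.
CANDIDATES = [
    ("aggressive", 10.0, 0.30),
    ("moderate", 8.0, 0.10),
    ("cautious", 5.0, 0.01),
]


def best_policy(candidates, xi):
    """Return the candidate that maximizes the weighted score xi * value - risk."""
    return max(candidates, key=lambda c: xi * c[1] - c[2])


def adapt_weight(candidates, threshold, step=0.01, max_steps=200):
    """Raise the weight xi from 0 while the selected policy remains feasible.

    Returns (xi, policy) for the last weight whose selected policy had
    risk <= threshold, or None if even xi = 0 selects an infeasible policy.
    """
    best = None
    for k in range(max_steps):
        xi = k * step
        choice = best_policy(candidates, xi)
        if choice[2] > threshold:
            break  # the weighted optimum now violates the risk constraint
        best = (xi, choice)
    return best


result = adapt_weight(CANDIDATES, threshold=0.15)
print(result)
```

With a risk threshold of 0.15, the sketch settles on the "moderate" candidate: a larger weight would favor the higher-value but infeasible "aggressive" one.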

    HTN planning: Overview, comparison, and beyond

    Hierarchies are one of the most common structures used to understand and conceptualise the world. Within the field of Artificial Intelligence (AI) planning, which deals with the automation of world-relevant problems, Hierarchical Task Network (HTN) planning is the branch that represents and handles hierarchies. In particular, the requirement for rich domain knowledge to characterise the world enables HTN planning to be very useful, and also to perform well. However, the history of almost 40 years obfuscates the current understanding of HTN planning in terms of accomplishments, planning models, similarities and differences among hierarchical planners, and its current and objective image. On top of these issues, the ability of hierarchical planning to truly cope with the requirements of real-world applications has been often questioned. As a remedy, we propose a framework-based approach where we first provide a basis for defining different formal models of hierarchical planning, and define two models that comprise a large portion of HTN planners. Second, we provide a set of concepts that helps in interpreting HTN planners from the aspect of their search space. Then, we analyse and compare the planners based on a variety of properties organised in five segments, namely domain authoring, expressiveness, competence, computation and applicability. Furthermore, we select Web service composition as a real-world and current application, and classify and compare the approaches that employ HTN planning to solve the problem of service composition. Finally, we conclude with our findings and present directions for future work. In summary, we provide a novel and comprehensive viewpoint on a core AI planning technique.

    Safe Robot Planning and Control Using Uncertainty-Aware Deep Learning

    In order for robots to autonomously operate in novel environments over extended periods of time, they must learn and adapt to changes in the dynamics of their motion and the environment. Neural networks have been shown to be a versatile and powerful tool for learning dynamics and semantic information. However, there is reluctance to deploy these methods in safety-critical or high-risk applications, since neural networks tend to be black-box function approximators. Therefore, there is a need to investigate how these machine learning methods can be safely leveraged for learning-based controls, planning, and traversability. The aim of this thesis is to explore methods both for establishing safety guarantees and for accurately quantifying risks when using deep neural networks for robot planning, especially in high-risk environments. First, we consider uncertainty-aware Bayesian Neural Networks for adaptive control, and introduce a method for guaranteeing safety under certain assumptions. Second, we investigate deep quantile regression learning methods for learning time- and state-varying uncertainties, which we use to perform trajectory optimization with Model Predictive Control. Third, we introduce a complete framework for risk-aware traversability and planning, which we use to enable safe exploration of extreme environments. Fourth, we again leverage deep quantile regression and establish a method for accurately learning the distribution of traversability risks in these environments, which can be used to create safety constraints for planning and control. (Ph.D. thesis)
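As background on the quantile regression mentioned in this abstract, the sketch below shows the standard pinball (quantile) loss that such methods commonly minimize. The abstract does not state which loss or network the thesis uses, so the function name and its use here are assumptions for illustration only.

```python
def pinball_loss(y_true, y_pred, tau):
    """Mean pinball loss for quantile level tau in (0, 1).

    Under-predictions (y > q) are weighted by tau and over-predictions
    (y < q) by 1 - tau, so minimizing it pushes q toward the tau-quantile.
    """
    if not 0.0 < tau < 1.0:
        raise ValueError("tau must lie strictly between 0 and 1")
    total = 0.0
    for y, q in zip(y_true, y_pred):
        diff = y - q
        total += tau * diff if diff >= 0 else (tau - 1.0) * diff
    return total / len(y_true)


# For a high quantile level, under-predicting costs more than over-predicting.
under = pinball_loss([1.0], [0.0], tau=0.9)  # prediction too low
over = pinball_loss([0.0], [1.0], tau=0.9)   # prediction too high
print(under, over)
```

The asymmetry is what makes the loss useful for risk estimates: at a high quantile level, the learned prediction is pushed toward a conservative upper bound rather than the mean.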

    Hybrid Mission Planning with Coalition Formation

    Safety of Autonomous Cognitive-oriented Robots

    Service robots will soon autonomously provide services in all spheres of life, including in close proximity to humans, by executing demanding and complex tasks in dynamic, complex environments and by collaborating with human users. To advance the understanding of the safety problem, a novel classification of robot hazards is provided. From it, the so-called object interaction hazards are derived, which arise when objects in the environment interact with objects that are manipulated by a robot. Given the current state of the art, this constitutes a novel problem area. A so-called dynamic risk assessment approach has already been proposed, which is meant to enable the robot to perceive the risk of current and upcoming situations. To realize such a risk-aware planning system for the first time, dynamic risk assessment is integrated into a cognitive architecture that provides cognitive functions such as anticipation, planning and learning. In this setting, action spaces (sets of possible upcoming situations) are dynamically anticipated and assessed with regard to the risks they comprise. This requires (initial) knowledge about hazards. Therefore, a novel procedural model is developed for systematically generating a safety knowledge base. However, this safety knowledge is potentially incomplete, so it must be extended and refined. AI-based approaches offer a useful opportunity here. For this reason, strategically important learning methods are examined with regard to their use in safety-critical contexts. Finally, this work describes the generation, integration, utilization, and maintenance of a system-internal safety knowledge base for dynamic risk assessment. It presents an overall concept that contributes to solving the extended safety problem and confirms in principle that safe behavior of autonomous and intelligent systems can be realized.