
    Probabilistic Plan Synthesis for Coupled Multi-Agent Systems

    This paper presents a fully automated procedure for controller synthesis for multi-agent systems in the presence of uncertainty. We model the motion of each of the N agents in the environment as a Markov Decision Process (MDP) and assign to each agent an individual high-level formula given in Probabilistic Computation Tree Logic (PCTL). An agent may need to collaborate with other agents in order to achieve a task; this collaboration is imposed by sharing actions between the agents. We aim to design local control policies such that each agent satisfies its individual PCTL formula. The proposed algorithm builds on clustering the agents, constructing MDP products, and designing control policies. We show that our approach has better computational complexity than the centralized approach, which traditionally suffers from very high computational demands. Comment: IFAC WC 2017, Toulouse, France
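
    To make the kind of property involved concrete, here is a minimal, hypothetical sketch (not the paper's synthesis procedure): value iteration for the maximal probability of eventually reaching a goal set in a tiny MDP, the core computation behind PCTL reachability properties such as P>=0.9 [F goal]. The toy MDP, its state names, and its actions are all invented for illustration.

        # transitions[state][action] = list of (next_state, probability) pairs
        transitions = {
            "s0": {"a": [("s1", 0.8), ("s0", 0.2)], "b": [("s2", 1.0)]},
            "s1": {"a": [("goal", 0.9), ("s0", 0.1)]},
            "s2": {"a": [("goal", 0.5), ("s2", 0.5)]},
            "goal": {"a": [("goal", 1.0)]},
        }
        goal_states = {"goal"}

        def max_reach_prob(transitions, goal_states, eps=1e-8):
            """Iterate V(s) = max_a sum_s' P(s'|s,a) * V(s') to a fixed point."""
            v = {s: (1.0 if s in goal_states else 0.0) for s in transitions}
            while True:
                delta = 0.0
                for s in transitions:
                    if s in goal_states:
                        continue
                    best = max(
                        sum(p * v[t] for t, p in succ)
                        for succ in transitions[s].values()
                    )
                    delta = max(delta, abs(best - v[s]))
                    v[s] = best
                if delta < eps:
                    return v

        print(max_reach_prob(transitions, goal_states))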

    Active Finite Reward Automaton Inference and Reinforcement Learning Using Queries and Counterexamples

    Despite the fact that deep reinforcement learning (RL) has surpassed human-level performance in various tasks, it still faces several fundamental challenges. First, most RL methods require intensive data from exploration of the environment to achieve satisfactory performance. Second, the use of neural networks in RL makes it hard to interpret the internals of the system in a way that humans can understand. To address these two challenges, we propose a framework that enables an RL agent to reason over its exploration process and distill high-level knowledge for effectively guiding its future explorations. Specifically, we propose a novel RL algorithm that learns high-level knowledge in the form of a finite reward automaton by using the L* learning algorithm. We prove that in episodic RL, a finite reward automaton can express any non-Markovian bounded reward function with finitely many reward values, and can approximate any non-Markovian bounded reward function (with infinitely many reward values) with arbitrary precision. We also provide a lower bound on the episode length such that the proposed RL approach almost surely converges to an optimal policy in the limit. We test this approach on two RL environments with non-Markovian reward functions, choosing a variety of tasks with increasing complexity for each environment. We compare our algorithm with state-of-the-art RL algorithms for non-Markovian reward functions, such as Joint Inference of Reward Machines and Policies for RL (JIRP), Learning Reward Machines (LRM), and Proximal Policy Optimization (PPO2). Our results show that our algorithm converges to an optimal policy faster than the baseline methods.
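
    The central object can be illustrated with a minimal sketch (the task and event names are hypothetical; the paper's L*-based inference and convergence machinery are not shown): a finite reward automaton assigns rewards to sequences of events rather than to single states, which is what makes non-Markovian rewards expressible.

        class RewardAutomaton:
            def __init__(self, transitions, rewards, start):
                self.transitions = transitions  # (state, event) -> next state
                self.rewards = rewards          # (state, event) -> reward
                self.state = start
                self.start = start

            def reset(self):
                self.state = self.start

            def step(self, event):
                """Advance on an observed event and emit the transition's reward."""
                reward = self.rewards.get((self.state, event), 0.0)
                self.state = self.transitions.get((self.state, event), self.state)
                return reward

        # Hypothetical task "reach A, then B": reward 1 only after both, in order.
        ra = RewardAutomaton(
            transitions={("u0", "A"): "u1", ("u1", "B"): "u2"},
            rewards={("u1", "B"): 1.0},
            start="u0",
        )
        for event in ["B", "A", "A", "B"]:   # reward arrives only on the final B
            print(event, ra.step(event))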

    Modular Learning and Optimization for Planning of Discrete Event Systems

    Optimization of industrial processes, such as manufacturing cells, can have a great impact on their performance. Finding optimal solutions for these large-scale systems is, however, a complex problem. They typically include multiple subsystems, and the search space generally grows exponentially with each subsystem. This is usually referred to as the state explosion problem and is well known within the control and optimization of automation systems. This thesis proposes two main contributions to improve and simplify the optimization of these systems. The first is a new method for solving these optimization problems using a compositional optimization approach, which integrates optimization with techniques from compositional supervisory control based on modular formal models, dividing the optimization of the subsystems into separate subproblems. The second is a modular learning approach that alleviates the need for prior knowledge of the systems, and for system experts, when applying compositional optimization. The key to both techniques is the division of the large system into smaller subsystems and the identification of local behavior in these subsystems, i.e. behavior that is independent of all other subsystems. The thesis proves that this local behavior can be partially optimized individually without affecting the global optimal solution. This is used to reduce the state space in each subsystem and to construct the global optimal solution compositionally. The thesis also shows that the proposed techniques can be integrated to compute global optimal solutions to large-scale optimization problems that are too big to solve with traditional monolithic models.
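
    As a toy illustration of the compositional idea (a hypothetical two-subsystem example, not the thesis' method): when the subsystems' alternatives are mutually independent, each can be optimized locally and the results composed, avoiding the product search space that causes state explosion.

        from itertools import product

        # Each subsystem offers alternative local operation sequences with costs.
        subsystems = {
            "robot": {"pick-fast": 4.0, "pick-safe": 6.0},
            "conveyor": {"route-A": 3.0, "route-B": 2.5},
        }

        def monolithic(subsystems):
            """Search all combinations: size grows as the product of alternatives."""
            return min(
                product(*(s.items() for s in subsystems.values())),
                key=lambda combo: sum(cost for _, cost in combo),
            )

        def compositional(subsystems):
            """Optimize each subsystem locally; valid when behavior is independent."""
            return [min(s.items(), key=lambda kv: kv[1]) for s in subsystems.values()]

        print(monolithic(subsystems))     # same optimum, exponential search
        print(compositional(subsystems))  # same optimum, linear in subsystems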

    Efficient Symbolic Supervisory Synthesis and Guard Generation: Evaluating partitioning techniques for the state-space exploration

    Supervisory control theory (SCT) is a model-based framework that automatically synthesizes a supervisor restricting a plant to be controlled according to specifications to be fulfilled. Two main problems, typically encountered in industrial applications, have prevented SCT from achieving a major breakthrough. First, the supervisor synthesized automatically from the given plant and specification models might be incomprehensible to the users. To tackle this problem, an approach was recently presented to extract compact propositional formulae (guards) from the supervisor, which is represented symbolically by binary decision diagrams (BDDs). These guards are then attached to the original models, resulting in a modular and comprehensible representation of the supervisor. However, this approach, which computes the supervisor symbolically in a conjunctive way, can lead to another problem, state-space explosion, caused by the large number of intermediate BDD nodes during computation. To alleviate this problem, this paper introduces an alternative approach based on the disjunctive partitioning technique, including a set of selection heuristics. This approach is then adapted to the guard generation procedure. Finally, the efficiency of the presented approach is demonstrated on a set of benchmark examples.
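
    A minimal sketch of the partitioning idea, using plain Python sets in place of BDDs (the model below is hypothetical): under a disjunctive partitioning, the image of a state set is the union of the per-partition images, so each partition can be applied separately during fixed-point reachability instead of building one monolithic transition relation.

        # One partition per component/event; each maps states to successor states.
        partitions = [
            {0: {1}, 1: {2}},        # transitions contributed by event/component 1
            {0: {3}, 3: {2}},        # transitions contributed by event/component 2
        ]

        def reachable(initial, partitions):
            """Least fixed point: frontier-based forward reachability."""
            reached = set(initial)
            frontier = set(initial)
            while frontier:
                image = set()
                for part in partitions:          # disjunction: union of images
                    for s in frontier:
                        image |= part.get(s, set())
                frontier = image - reached
                reached |= frontier
            return reached

        print(sorted(reachable({0}, partitions)))  # [0, 1, 2, 3]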

    Safe Multi-Agent Reinforcement Learning with Quantitatively Verified Constraints

    Multi-agent reinforcement learning is a machine learning technique that involves multiple agents attempting to solve sequential decision-making problems. This learning is driven by objectives and failures modelled as positive numerical rewards and negative numerical punishments, respectively. These multi-agent systems explore shared environments in order to find the highest cumulative reward for the sequential decision-making problem. Multi-agent reinforcement learning within autonomous systems has become a prominent research area with many examples of success and potential applications. However, the safety-critical nature of many of these potential applications is currently underexplored and under-supported. Reinforcement learning, being a stochastic process, is unpredictable, meaning there are no assurances that these systems will not harm themselves, other expensive equipment, or humans. This thesis introduces Assured Multi-Agent Reinforcement Learning (AMARL) to mitigate these issues. This approach constrains the actions of learning systems during and after a learning process. Unlike previous multi-agent reinforcement learning methods, AMARL synthesises constraints through the formal verification of abstracted multi-agent Markov decision processes that model the environment’s functional and safety aspects. Learned policies guided by these constraints are guaranteed to satisfy strict functional and safety requirements and are Pareto-optimal with respect to a set of optimisation objectives. Two AMARL extensions are also introduced in the thesis. Firstly, the thesis presents a Partial Policy Reuse method that allows the use of previously learned knowledge to reduce AMARL learning time significantly when initial models are inaccurate. Secondly, an Adaptive Constraints method is introduced to enable agents to adapt to environmental changes by constraining their learning through a runtime procedure that follows the style of monitoring, analysis, planning, and execution. AMARL and its extensions are evaluated in three case studies from different navigation-based domains and shown to produce policies that meet strict safety and functional requirements.
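
    A minimal sketch of the constraining step only (the per-state safe-action sets below are hypothetical stand-ins for what AMARL would obtain from formal verification): the learner's action selection is masked so that exploration never leaves the verified-safe set, during or after learning.

        import random

        safe_actions = {            # would come from formal verification in AMARL
            "s0": ["left", "right"],
            "s1": ["right"],        # "left" deemed unsafe in s1
        }

        # Q-values exist only for verified-safe (state, action) pairs.
        q = {(s, a): 0.0 for s, acts in safe_actions.items() for a in acts}

        def choose_action(state, epsilon=0.1):
            """Epsilon-greedy selection restricted to the verified-safe set."""
            allowed = safe_actions[state]
            if random.random() < epsilon:
                return random.choice(allowed)
            return max(allowed, key=lambda a: q[(state, a)])

        print(choose_action("s1"))  # can only ever return "right"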

    Mobile-Based Interactive Music for Public Spaces

    With the emergence of modern mobile devices equipped with various types of built-in sensors, interactive art has become easily accessible to everyone, musicians and non-musicians alike. These efficient computers are able to analyze human activity, location, gesture, etc., and based on this information dynamically change or create an artwork in real time. This thesis presents an interactive mobile system that uses only the standard embedded sensors available in current typical smart devices, such as phones and tablets, to create an audio-only augmented reality for a singled-out public space, in order to explore the potential for social-musical interaction without the need for any significant external infrastructure.
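
    As a small illustration of the sensor-driven approach (a hypothetical mapping, not the system described in the thesis): a common pattern in mobile interactive audio is to map a derived sensor quantity, such as accelerometer magnitude, onto a synthesis parameter such as pitch.

        import math

        def magnitude(ax, ay, az):
            """Overall acceleration magnitude from a 3-axis accelerometer sample."""
            return math.sqrt(ax * ax + ay * ay + az * az)

        def map_range(x, in_lo, in_hi, out_lo, out_hi):
            """Linearly map x from [in_lo, in_hi] to [out_lo, out_hi], clamped."""
            t = min(max((x - in_lo) / (in_hi - in_lo), 0.0), 1.0)
            return out_lo + t * (out_hi - out_lo)

        # Fake samples standing in for a real sensor stream (units of g).
        for ax, ay, az in [(0.0, 0.0, 1.0), (0.3, 0.2, 1.1), (1.2, 0.8, 1.5)]:
            pitch_hz = map_range(magnitude(ax, ay, az), 1.0, 2.5, 220.0, 880.0)
            print(f"acc={magnitude(ax, ay, az):.2f} g -> pitch={pitch_hz:.1f} Hz")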