
    Learning Models of Behavior From Demonstration and Through Interaction

    This dissertation is concerned with the autonomous learning of behavioral models for sequential decision-making. It addresses both the theoretical aspects of behavioral modeling, such as the learning of appropriate task representations, and the practical difficulties of algorithmic implementation. The first half of the dissertation deals with the problem of learning from demonstration, which consists in generalizing the behavior of an expert demonstrator based on observation data. Two alternative modeling paradigms are discussed. First, a nonparametric inference framework is developed to capture the behavior of the expert at the policy level. A key challenge in the design of the framework is the objective of making minimal assumptions about the observed behavior type while dealing with a potentially infinite number of system states. Because the model order adapts automatically to the complexity of the demonstrated behavior, the proposed approach can capture stochastic expert policies of arbitrary structure. Second, a nonparametric inverse reinforcement learning framework based on subgoal modeling is proposed, which allows the expert behavior to be efficiently reconstructed at the intentional level. Unlike most existing approaches, the proposed methodology naturally handles periodic tasks and situations where the intentions of the expert change over time. By adaptively decomposing the decision-making problem into a series of task-related subproblems, both inference frameworks are suitable for learning compact encodings of the expert behavior. For performance evaluation, the models are compared with existing frameworks on synthetic benchmark scenarios and on real-world data recorded on a KUKA lightweight robotic arm. In the second half of the work, the focus shifts to multi-agent modeling, with the aim of analyzing the decision-making process in large-scale homogeneous agent networks. To address the lack of decentralized system models with explicit agent homogeneity, a new class of agent systems is introduced. For this system class, the problem of inverse reinforcement learning is discussed and a meta-learning algorithm is devised that makes explicit use of the system symmetries. As part of the algorithm, a heterogeneous reinforcement learning scheme is proposed for optimizing the collective behavior of the system based on the local state observations made at the agent level. Finally, to scale the simulation of the network to large agent numbers, a continuum version of the model is derived. After discussing the system components and associated optimality criteria, numerical examples of collective tasks are given that demonstrate the capabilities of the continuum approach and show its advantages over large-scale agent-based modeling.
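    The subgoal idea behind the second framework can be illustrated with a small sketch. The following Python example is a hypothetical toy model, not the dissertation's actual framework: it assumes the expert's reward peaks at one of several candidate subgoals on a one-dimensional chain of states and infers a posterior over subgoals from observed state-action pairs.

```python
# Toy subgoal-based intent inference (illustrative assumptions throughout):
# the expert is softmax-optimal w.r.t. distance to an unknown subgoal on a
# 1-D chain; we score candidate subgoals by the likelihood of the trajectory.
import numpy as np

N_STATES, BETA = 10, 5.0          # chain length, Boltzmann inverse temperature
ACTIONS = (-1, +1)                # step left / step right

def boltzmann_policy(subgoal):
    """P(a | s) for an agent softmax-optimal w.r.t. distance-to-subgoal reward."""
    probs = np.zeros((N_STATES, len(ACTIONS)))
    for s in range(N_STATES):
        # action value = negative distance to the subgoal after moving
        q = np.array([-abs(np.clip(s + a, 0, N_STATES - 1) - subgoal)
                      for a in ACTIONS], dtype=float)
        e = np.exp(BETA * (q - q.max()))
        probs[s] = e / e.sum()
    return probs

def subgoal_posterior(trajectory):
    """Posterior over candidate subgoals given (state, action-index) pairs."""
    log_post = np.zeros(N_STATES)                 # uniform prior over subgoals
    for g in range(N_STATES):
        pi = boltzmann_policy(g)
        log_post[g] += sum(np.log(pi[s, a]) for s, a in trajectory)
    post = np.exp(log_post - log_post.max())
    return post / post.sum()

# Expert walks right toward state 8: posterior mass concentrates near 8.
demo = [(2, 1), (3, 1), (4, 1), (5, 1), (6, 1), (7, 1)]
print(subgoal_posterior(demo).round(3))
```

    Time-varying intentions, as handled by the dissertation's framework, could be approximated in this toy by running the same update over trajectory segments and letting the inferred subgoal switch between segments.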

    Kernel Density Bayesian Inverse Reinforcement Learning

    Inverse reinforcement learning (IRL) is a powerful framework to infer an agent's reward function by observing its behavior, but IRL algorithms that learn point estimates of the reward function can be misleading because there may be several functions that describe an agent's behavior equally well. A Bayesian approach to IRL models a distribution over candidate reward functions, alleviating the shortcomings of learning a point estimate. However, several Bayesian IRL algorithms use a Q-value function in place of the likelihood function; the resulting posterior is computationally intensive to calculate, comes with few theoretical guarantees, and the Q-value function is often a poor approximation of the likelihood. We introduce kernel density Bayesian IRL (KD-BIRL), which uses conditional kernel density estimation to directly approximate the likelihood, providing an efficient framework that, with a modified reward function parameterization, is applicable to environments with complex and infinite state spaces. We demonstrate KD-BIRL's benefits through a series of experiments in Gridworld environments and a simulated sepsis treatment task.
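    To make the likelihood approximation concrete, here is a minimal Python sketch of a conditional kernel density likelihood in the spirit of KD-BIRL; the Gaussian kernels, scalar state/action/reward parameterization, and bandwidths are illustrative assumptions, not the paper's exact estimator. Training tuples come from agents acting under known reward parameters, and the likelihood of a demonstration under a queried parameter weights training points by the kernel distance between their reward parameter and the query.

```python
# Conditional KDE likelihood sketch for Bayesian IRL (toy, 1-D quantities).
import numpy as np

def k(u, h):
    """Unnormalised Gaussian kernel."""
    return np.exp(-0.5 * (u / h) ** 2)

def log_likelihood(demo, train_s, train_a, train_th, theta, h=0.5):
    """log p(demo | theta): conditional KDE over training tuples whose
    reward parameter is close to the queried theta."""
    w = k(train_th - theta, h)                 # weight by reward proximity
    ll = 0.0
    for s, a in demo:
        num = np.sum(w * k(train_s - s, h) * k(train_a - a, h))
        ll += np.log(num / (w.sum() + 1e-12) + 1e-12)
    return ll

# Toy data: under theta=+1 the agent picks a ~ +s, under theta=-1 a ~ -s.
rng = np.random.default_rng(0)
ths = rng.choice([-1.0, 1.0], size=500)
ss = rng.uniform(-1, 1, size=500)
aa = ths * ss + 0.1 * rng.normal(size=500)

demo = [(0.8, 0.8), (-0.5, -0.5)]              # consistent with theta = +1
for th in (-1.0, 1.0):
    print(th, round(log_likelihood(demo, ss, aa, ths, th), 2))
```

    A posterior over reward parameters then follows by combining this likelihood with a prior, e.g. inside a standard MCMC sampler; that machinery is omitted here.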

    Novel Methods For Human-robot Shared Control In Collaborative Robotics

    Blended shared control is a method to continuously combine control inputs from traditional automatic control systems and human operators for the control of machines. An automatic control system generates control input based on feedback of measured signals, whereas a human operator generates control input based on experience, task knowledge, and awareness and sensing of the environment in which the machine is operating. Actively blending inputs from the automatic control agent and the human agent to jointly control machines is expected to provide benefits by utilizing the unique features of both agents: the better task-execution performance of automatic control systems based on sensed signals, and the situation awareness of the human in the loop to handle safety concerns and environmental uncertainties. Shared control in this sense provides an alternative to full autonomy. Existing and future applications of such an approach include automobiles, underwater vehicles, ships, airplanes, construction machines, space manipulators, surgery robots, and power wheelchairs, where machines are still mostly operated by humans for safety reasons. Developing machines for full autonomy requires not only advances in the machines but also the ability to sense the environment by placing sensors in it; the latter can be very difficult in many such applications due to uncertainties and changing conditions. Blended shared control, as a more practical alternative to full autonomy, keeps the human operator in the loop to initiate machine actions with real-time intelligent assistance provided by automatic control. The problem of how to blend the two inputs, and the development of associated scientific tools to formalize and achieve blended shared control, is the focus of this work. Specifically, the following essential aspects are investigated:
    - Task learning: modeling a human-operated robotic task from demonstration as subgoals, so that execution patterns are captured in a simple manner and provide a reference for human intent prediction and automatic control generation.
    - Intent prediction: predicting the human operator's intent in the framework of subgoal models, encoding the probability that the operator is seeking a particular subgoal.
    - Input blending: generating automatic control input and dynamically combining it with the human operator's input based on the prediction probability, while accounting for situations where the operator takes unexpected actions to avoid danger by yielding full control authority to the operator.
    - Subgoal adjustment: dynamically adjusting the learned, nominal task model to adapt to task changes, such as a change of target object, which would otherwise cause the nominal model learned from demonstration to lose its effectiveness.
    This dissertation formalizes these notions and develops novel tools and algorithms for enabling blended shared control. To evaluate the developed scientific tools and algorithms, a scaled hydraulic excavator performing a typical trenching and truck-loading task is employed as a specific example. Experimental results are provided to corroborate the tools and methods. To expand the developed methods and further explore shared control in different applications, this dissertation also studies the collaborative operation of robot manipulators. Specifically, various operational interfaces are systematically designed, a hybrid force-motion controller is integrated with shared control in a mixed world-robot frame to facilitate human-robot collaboration, and a method is proposed that utilizes vision-based feedback to predict the human operator's intent and provide shared-control assistance. These methods let human operators remotely control robotic manipulators effectively while receiving assistance from intelligent shared control in different applications. Several robotic manipulation experiments on different industrial robots were conducted to corroborate the expanded shared control methods.
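    The input-blending step lends itself to a compact illustration. The following Python sketch shows one plausible blending law under stated assumptions; the confidence threshold, the conflict test, and the choice of the intent probability as the blend weight are illustrative, not the dissertation's exact formulation.

```python
# Sketch of a blended shared control law: the automatic input is weighted by
# the confidence of the subgoal-intent prediction, and full authority returns
# to the human when confidence is low or the inputs conflict (hypothetical
# parameters, not the dissertation's exact rule).
import numpy as np

def blend(u_human, u_auto, p_subgoal, p_min=0.6):
    """Combine human and automatic control inputs.

    u_human, u_auto : np.ndarray control vectors
    p_subgoal       : probability of the most likely subgoal (intent belief)
    """
    # Yield full authority if intent is uncertain or the inputs point in
    # opposing directions (the human may be acting to avoid danger).
    if p_subgoal < p_min or np.dot(u_human, u_auto) < 0.0:
        return u_human
    alpha = p_subgoal                   # blend weight grows with confidence
    return alpha * u_auto + (1.0 - alpha) * u_human

print(blend(np.array([1.0, 0.0]), np.array([0.8, 0.2]), p_subgoal=0.9))  # blended
print(blend(np.array([1.0, 0.0]), np.array([-0.8, 0.0]), p_subgoal=0.9))  # human
```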

    Model-Based Bayesian Inference, Learning, and Decision-Making with Applications in Communication Systems

    This dissertation discusses the mathematical modeling of dynamical systems under uncertainty, Bayesian inference and learning of the unknown quantities, such as the system's state and its parameters, and the computation of optimal decisions within these models. Probabilistic dynamical models achieve substantial performance gains for decision-making: their ability to predict the system state as a function of the decisions enables efficient learning from small amounts of data and therefore makes guided optimal decisions possible. Multiple probabilistic models for dynamical state-space systems under discrete-time and continuous-time assumptions are presented. They provide the basis for computing Bayesian beliefs and optimal decisions under uncertainty. Numerical algorithms are developed by starting with the exact system description and making principled approximations to arrive at tractable algorithms for inference, learning, and decision-making. The developed methods are showcased on communication systems and other commonplace applications. The specific contributions to modeling, inference, and decision-making are outlined in the following. The first contribution is an inference method for non-stationary point process data, which arise, for example, in queues within communication systems. A hierarchical Bayesian nonparametric model with a gamma-distributional assumption on the holding times of the process serves as the basis. For inference, a computationally tractable method based on a Markov chain Monte Carlo sampler is derived and subsequently validated under the modeling assumption using synthetic data and in a real-data scenario. The second contribution is a fast algorithm for adapting bitrates in video streaming: a new algorithm for adaptive-bitrate video streaming that uses a sparse Bayesian linear model of a quality-of-experience score. The algorithm uses a tractable inference scheme to extract relevant features from network data and builds on a contextual bandit strategy for decision-making. The algorithm is validated numerically, and an implementation and evaluation in a named data networking scenario are given. The third contribution is a novel method that exploits correlations in decision-making problems: by building a Bayesian model for correlated count data from Markov decision processes, the underlying model parameters can be inferred very data-efficiently. To overcome the intractabilities arising in exact Bayesian inference, a tractable variational inference algorithm exploiting an augmentation scheme is presented. The method is extensively evaluated in various decision-making scenarios, such as reinforcement learning in a queueing system. The final contribution is concerned with simultaneous state inference and decision-making in continuous-time, partially observed environments. A new model for discrete state and action space systems is presented, and the corresponding equations for exact Bayesian inference are discussed. The optimality conditions for decision-making are derived. Two tractable numerical schemes are presented, which exploit function approximators to learn the solution in the belief space. The applicability of the method is shown on several examples, including a scheduling algorithm under partial observability.
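    The contextual-bandit flavour of the second contribution can be sketched briefly. The Python example below uses a plain (non-sparse) Bayesian linear model of a hypothetical quality-of-experience score with Thompson sampling over bitrate choices; the feature map, priors, and QoE model are illustrative assumptions, and the sparsity-inducing prior of the actual algorithm is omitted for brevity.

```python
# Thompson sampling with a Bayesian linear reward model for bitrate adaptation
# (toy sketch; the QoE function and features are hypothetical).
import numpy as np

class BayesLinearBandit:
    def __init__(self, dim, noise_var=0.1, prior_var=1.0):
        self.P = np.eye(dim) / prior_var     # posterior precision matrix
        self.b = np.zeros(dim)               # precision-weighted mean vector
        self.noise_var = noise_var

    def sample_weights(self):
        cov = np.linalg.inv(self.P)
        return np.random.multivariate_normal(cov @ self.b, cov)

    def update(self, x, reward):
        self.P += np.outer(x, x) / self.noise_var
        self.b += x * reward / self.noise_var

def features(bitrate, bandwidth):
    return np.array([1.0, bitrate, bandwidth, bitrate * bandwidth])

bandit = BayesLinearBandit(dim=4)
bitrates = [0.5, 1.0, 2.0, 4.0]              # Mbps options
for step in range(200):
    bw = np.random.uniform(0.5, 4.0)         # observed network context
    w = bandit.sample_weights()              # Thompson sample of the model
    x_all = [features(r, bw) for r in bitrates]
    choice = int(np.argmax([x @ w for x in x_all]))
    # Hypothetical QoE: higher bitrate helps until it exceeds the bandwidth.
    qoe = bitrates[choice] - 3.0 * max(0.0, bitrates[choice] - bw)
    bandit.update(x_all[choice], qoe)
```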

    Automated Video Game Testing Using Synthetic and Human-Like Agents

    In this paper, we present a new methodology that employs tester agents to automate video game testing. We introduce two types of agents, synthetic and human-like, and two distinct approaches to create them. Our agents are derived from Reinforcement Learning (RL) and Monte Carlo Tree Search (MCTS) agents, but focus on finding defects. The synthetic agent uses test goals generated from game scenarios, and these goals are further modified to examine the effects of unintended game transitions. The human-like agent uses test goals extracted from tester trajectories by our proposed multiple greedy-policy inverse reinforcement learning (MGP-IRL) algorithm. MGP-IRL captures the multiple policies executed by human testers. These testers aim to find defects by interacting with the game to break it, which is considerably different from playing the game; we introduce interaction states to model such interactions. We use our agents to produce test sequences, run the game with these sequences, and check each run with an automated test oracle. We analyze the proposed method in two parts: we compare the bug-finding success of human-like and synthetic agents, and we evaluate the similarity between human-like agents and human testers. We collected 427 trajectories from human testers using the General Video Game Artificial Intelligence (GVG-AI) framework and created three games with 12 levels containing 45 bugs. Our experiments reveal that human-like and synthetic agents are competitive with the bug-finding performance of human testers. Moreover, we show that MGP-IRL increases the human-likeness of the agents while improving bug-finding performance.
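    The execute-and-check loop described above is easy to picture in code. The Python sketch below is a hypothetical toy, not the paper's GVG-AI pipeline: a test sequence produced by an agent is replayed against a minimal game model, and an oracle checks each transition against the intended rules.

```python
# Toy test-execution loop with an automated oracle (illustrative game model).
from dataclasses import dataclass, field

@dataclass
class GameState:
    avatar: tuple = (0, 0)
    score: int = 0
    walls: set = field(default_factory=lambda: {(1, 0)})

def step(state, action):
    """Hypothetical game transition: move the avatar, ignoring moves into walls."""
    dx, dy = action
    nxt = (state.avatar[0] + dx, state.avatar[1] + dy)
    if nxt not in state.walls:
        state.avatar = nxt
    return state

def oracle(prev, cur):
    """Report a defect whenever an intended game rule is violated."""
    bugs = []
    if cur.avatar in cur.walls:
        bugs.append(f"avatar entered wall at {cur.avatar}")
    if cur.score < prev.score:
        bugs.append("score decreased without cause")
    return bugs

def run_test(sequence):
    state, found = GameState(), []
    for action in sequence:
        prev = GameState(state.avatar, state.score, set(state.walls))
        state = step(state, action)
        found += oracle(prev, state)
    return found

print(run_test([(1, 0), (0, 1), (1, 0)]))   # empty list: this toy step() is correct
```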

    18th IEEE Workshop on Nonlinear Dynamics of Electronic Systems: Proceedings

    Proceedings of the 18th IEEE Workshop on Nonlinear Dynamics of Electronic Systems, which took place in Dresden, Germany, 26–28 May 2010. Contents: Welcome Address; Table of Contents; Symposium Committees; Special Thanks; Conference Program; Invited Talks; Regular Papers (sessions of Wednesday, May 26, Thursday, May 27, and Friday, May 28, 2010); Author Index.

    Gaze control for visually guided manipulation

    Human studies have shown that gaze shifts are mostly driven by the task. One explanation is that fixations gather information about task-relevant properties, where task relevance is signalled by reward. This thesis pursues primarily an engineering science goal: to determine what mechanisms a rational decision maker could employ to select a gaze location optimally, or near optimally, given limited information and limited computation time. To do so, we formulate and characterise three computational models of gaze shifting (implemented on a simulated humanoid robot), which use lookahead to imagine the informational effects of possible gaze fixations. Our first model selects the gaze that most reduces uncertainty in the scene (Unc), the second maximises expected rewards by reducing uncertainty (Rew+Unc), and the third maximises the expected gain in cumulative reward by reducing uncertainty (Rew+Unc+Gain). We also integrate a visual search process into the Rew+Unc+Gain gaze scheme. Our secondary goal concerns the way in which humans select the next gaze location. We compare the hand-eye coordination timings of our models to previously published human data, and we provide evidence that only the models that incorporate both uncertainty and reward (Rew+Unc and Rew+Unc+Gain) match the human data.
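    The lookahead computation behind the Unc model can be sketched compactly. The Python example below is a toy under stated assumptions: a discrete belief over target locations, a hypothetical binary cue whose reliability depends on where the robot fixates, and a one-step search for the fixation that minimises expected posterior entropy.

```python
# One-step uncertainty-reduction gaze selection (toy sensor and belief model).
import numpy as np

def entropy(p):
    p = p[p > 0]
    return -np.sum(p * np.log(p))

def posterior(prior, lik):
    post = prior * lik
    return post / post.sum()

def expected_entropy_after(prior, fixation, noise=0.2):
    """Average posterior entropy over the possible observations at a fixation.
    Fixating location i yields a noisy binary cue about whether the target
    is at i (hypothetical sensor model)."""
    h = 0.0
    for obs in (0, 1):                       # cue absent / cue present
        lik = np.full(len(prior), noise if obs else 1 - noise)
        lik[fixation] = (1 - noise) if obs else noise
        p_obs = np.sum(prior * lik)          # marginal probability of this cue
        h += p_obs * entropy(posterior(prior, lik))
    return h

prior = np.array([0.5, 0.3, 0.15, 0.05])     # belief over target locations
best = min(range(len(prior)), key=lambda f: expected_entropy_after(prior, f))
print("fixate location", best)
```

    The Rew+Unc and Rew+Unc+Gain variants would replace the entropy criterion with reward-weighted versions of the same one-step lookahead; that extension is omitted here.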