129 research outputs found
Robot introspection through learned hidden Markov models
In this paper we describe a machine learning approach for acquiring a model of a robot's behaviour from raw sensor data. We are interested in automating the acquisition of behavioural models to provide a robot with an introspective capability. We assume that the behaviour of a robot in achieving a task can be modelled as a finite stochastic state transition system. Beginning with data recorded by a robot in the execution of a task, we use unsupervised learning techniques to estimate a hidden Markov model (HMM) that can be used both for predicting and explaining the behaviour of the robot in subsequent executions of the task. We demonstrate that it is feasible to automate the entire process of learning a high quality HMM from the data recorded by the robot during execution of its task. The learned HMM can be used both for monitoring and controlling the behaviour of the robot. The ultimate purpose of our work is to learn models for the full set of tasks associated with a given problem domain, and to integrate these models with a generative task planner. We want to show that these models can be used successfully in controlling the execution of a plan. However, this paper does not develop the planning and control aspects of our work, focussing instead on the learning methodology and the evaluation of a learned model. The essential property of the models we seek to construct is that the most probable trajectory through a model, given the observations made by the robot, accurately diagnoses, or explains, the behaviour that the robot actually performed when making these observations. In the work reported here we consider a navigation task. We explain the learning process, the experimental setup and the structure of the resulting learned behavioural models. We then evaluate the extent to which explanations proposed by the learned models accord with a human observer's interpretation of the behaviour exhibited by the robot in its execution of the task.
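The "most probable trajectory through a model, given the observations" mentioned in this abstract is conventionally computed with the Viterbi algorithm. The following is a minimal sketch of that standard algorithm, not the authors' implementation; the two-state model, its probabilities, and the state and observation meanings in the comments are all hypothetical values chosen for illustration.

```python
import math


def viterbi(obs, start_p, trans_p, emit_p):
    """Return the most probable hidden-state path for a sequence of observation indices."""
    n_states = len(start_p)

    def log(p):
        # Log-probabilities avoid numerical underflow on long sequences.
        return math.log(p) if p > 0 else float("-inf")

    # Score of the best path ending in each state after the first observation.
    score = [log(start_p[s]) + log(emit_p[s][obs[0]]) for s in range(n_states)]
    back = []  # back[t][s] = best predecessor of state s at step t + 1
    for o in obs[1:]:
        prev = score
        pointers, score = [], []
        for s in range(n_states):
            best = max(range(n_states), key=lambda r: prev[r] + log(trans_p[r][s]))
            pointers.append(best)
            score.append(prev[best] + log(trans_p[best][s]) + log(emit_p[s][o]))
        back.append(pointers)

    # Trace the best final state back to the start.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


# Hypothetical model: state 0 = "following corridor", state 1 = "passing doorway";
# observation 0 = "wall close", observation 1 = "gap detected".
START = [0.8, 0.2]
TRANS = [[0.9, 0.1], [0.3, 0.7]]
EMIT = [[0.9, 0.1], [0.2, 0.8]]

print(viterbi([0, 0, 1, 1, 0], START, TRANS, EMIT))  # prints [0, 0, 1, 1, 0]
```

In this toy run the decoded state sequence labels the two "gap detected" readings as a doorway passage, which is the kind of explanation a human observer could then check against what the robot actually did.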
Multi-SLAM Systems for Fault-Tolerant Simultaneous Localization and Mapping
Mobile robots need accurate, high fidelity models of their operating environments in order to complete their tasks safely and efficiently. Generating these models is most often done via Simultaneous Localization and Mapping (SLAM), a paradigm where the robot alternately estimates the most up-to-date model of the environment and its position relative to this model as it acquires new information from its sensors over time. Because robots operate in many different environments with different compute, memory, sensing, and form constraints, the nature and quality of information available to individual instances of different SLAM systems varies substantially. 'One-size-fits-all' solutions are thus exceedingly difficult to engineer, and highly specialized systems, which represent the state-of-the-art for most types of deployments, are not robust to operating conditions in which their assumptions are not met. This thesis seeks to investigate an alternative approach to these robustness and universality problems by incorporating existing SLAM solutions within a larger framework supported by planning and learning. The central idea is to combine learned models that estimate SLAM algorithm performance under a variety of sensory conditions, in this case neural networks, with planners designed for planning under uncertainty and partial observability, in this case partially observable Markov decision problems (POMDPs). Models of existing SLAM algorithms can be learned, and these models can then be used online to estimate the performance of a range of solutions to the SLAM problem at hand. The POMDP policy then selects the appropriate algorithm, given the estimated performance, cost of switching methods, and other information. This general approach may also be applicable to many other robotics problems that rely on data fusion, such as grasp planning, motion planning, or object identification.
Fast exploration and learning of latent graphs with aliased observations
We consider the problem of recovering a latent graph where the observations
at each node are aliased, and transitions are stochastic. Observations
are gathered by an agent traversing the graph. Aliasing means that multiple
nodes emit the same observation, so the agent cannot know in which node it is
located. The agent needs to uncover the hidden topology as accurately as
possible and in as few steps as possible. This is equivalent to efficient
recovery of the transition probabilities of a partially observable Markov
decision process (POMDP) in which the observation probabilities are known. An
algorithm for efficiently exploring (and ultimately recovering) the latent
graph is provided. Our approach is exponentially faster than naive exploration
in a variety of challenging topologies with aliased observations while
remaining competitive with existing baselines in the unaliased regime.
Probabilistic Guarantees for Safe Deep Reinforcement Learning
Deep reinforcement learning has been successfully applied to many control
tasks, but the application of such agents in safety-critical scenarios has been
limited due to safety concerns. Rigorous testing of these controllers is
challenging, particularly when they operate in probabilistic environments due
to, for example, hardware faults or noisy sensors. We propose MOSAIC, an
algorithm for measuring the safety of deep reinforcement learning agents in
stochastic settings. Our approach is based on the iterative construction of a
formal abstraction of a controller's execution in an environment, and leverages
probabilistic model checking of Markov decision processes to produce
probabilistic guarantees on safe behaviour over a finite time horizon. It
produces bounds on the probability of safe operation of the controller for
different initial configurations and identifies regions where correct behaviour
can be guaranteed. We implement and evaluate our approach on agents trained for
several benchmark control problems.
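To make "probabilistic guarantees on safe behaviour over a finite time horizon" concrete, the sketch below computes, for a small finite Markov chain, the probability of avoiding a set of unsafe states for a fixed number of steps. This is the standard backward recursion for bounded safety, not the MOSAIC algorithm itself; the three-state chain, its probabilities, and the name `safe_probability` are assumptions made for illustration. It also assumes the controller is already fixed, so the Markov decision process reduces to a Markov chain.

```python
def safe_probability(trans_p, unsafe, horizon):
    """Probability, per start state, of avoiding `unsafe` states for `horizon` steps."""
    n = len(trans_p)
    # With zero steps remaining, a state is safe exactly when it is not unsafe.
    prob = [0.0 if s in unsafe else 1.0 for s in range(n)]
    for _ in range(horizon):
        # One backward step: stay safe now, then stay safe for the remaining steps.
        prob = [
            0.0 if s in unsafe
            else sum(trans_p[s][t] * prob[t] for t in range(n))
            for s in range(n)
        ]
    return prob


# Hypothetical abstraction: state 0 = nominal, state 1 = degraded, state 2 = unsafe.
CHAIN = [
    [0.9, 0.1, 0.0],
    [0.5, 0.0, 0.5],
    [0.0, 0.0, 1.0],
]

# Approximately [0.95, 0.5, 0.0] for a two-step horizon.
print(safe_probability(CHAIN, unsafe={2}, horizon=2))
```

Reading the result per start state mirrors the abstract's point about initial configurations: some start states carry a high bound on safe operation, others a low one.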
Finding Approximate POMDP solutions Through Belief Compression
Standard value function approaches to finding policies for Partially
Observable Markov Decision Processes (POMDPs) are generally considered to be
intractable for large models. The intractability of these algorithms is to a
large extent a consequence of computing an exact, optimal policy over the
entire belief space. However, in real-world POMDP problems, computing the
optimal policy for the full belief space is often unnecessary for good control
even for problems with complicated policy classes. The beliefs experienced by
the controller often lie near a structured, low-dimensional subspace embedded
in the high-dimensional belief space. Finding a good approximation to the
optimal value function for only this subspace can be much easier than computing
the full value function. We introduce a new method for solving large-scale
POMDPs by reducing the dimensionality of the belief space. We use Exponential
family Principal Components Analysis (Collins, Dasgupta and Schapire, 2002) to
represent sparse, high-dimensional belief spaces using small sets of learned
features of the belief state. We then plan only in terms of the low-dimensional
belief features. By planning in this low-dimensional space, we can find
policies for POMDP models that are orders of magnitude larger than models that
can be handled by conventional techniques. We demonstrate the use of this
algorithm on a synthetic problem and on mobile robot navigation tasks.
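For context, a belief in a POMDP is a probability distribution over hidden states, and the belief space has one dimension per state, which is why it grows so large. The sketch below shows only the standard Bayes-filter belief update for one fixed action. It is not the belief compression method of the paper; the two-state model and the name `belief_update` are hypothetical.

```python
def belief_update(belief, trans_p, obs_p, obs):
    """One Bayes-filter step: predict through the transition model, then weight by the observation."""
    n = len(belief)
    # Prediction: probability of landing in each state t under the chosen action.
    predicted = [sum(belief[s] * trans_p[s][t] for s in range(n)) for t in range(n)]
    # Correction: weight each state by the likelihood of the observation received.
    unnormalised = [obs_p[t][obs] * predicted[t] for t in range(n)]
    total = sum(unnormalised)
    if total == 0:
        raise ValueError("observation has zero probability under the model")
    return [u / total for u in unnormalised]


# Hypothetical two-state example: the action leaves the state unchanged,
# and the sensor reports the true state with probability 0.8.
STAY = [[1.0, 0.0], [0.0, 1.0]]
SENSOR = [[0.8, 0.2], [0.2, 0.8]]

# Approximately [0.8, 0.2] after observing 0 from a uniform belief.
print(belief_update([0.5, 0.5], STAY, SENSOR, obs=0))
```

After a few informative observations the belief concentrates on a few states, which is consistent with the abstract's remark that the beliefs a controller actually encounters are sparse.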
Policy-Based Planning for Robust Robot Navigation
This thesis proposes techniques for constructing and implementing an extensible navigation framework suitable for operating alongside or in place of traditional navigation systems. Robot navigation is only possible when many subsystems work in tandem, such as localization and mapping, motion planning, control, and object tracking. Errors in any one of these subsystems can result in the robot failing to accomplish its task, oftentimes requiring human interventions that diminish the benefits theoretically provided by autonomous robotic systems.
Our first contribution is Direction Approximation through Random Trials (DART), a method for generating human-followable navigation instructions optimized for followability instead of traditional metrics such as path length. We show how this strategy can be extended to robot navigation planning, allowing the robot to compute the sequence of control policies and switching conditions maximizing the likelihood with which the robot will reach its goal. This technique allows robots to select plans based on reliability in addition to efficiency, avoiding error-prone actions or areas of the environment. We also show how DART can be used to build compact, topological maps of its environments, offering opportunities to scale to larger environments.
DART depends on the existence of a set of behaviors and switching conditions describing ways the robot can move through an environment. In the remainder of this thesis, we present methods for learning these behaviors and conditions in indoor environments. To support landmark-based navigation, we show how to train a Convolutional Neural Network (CNN) to distinguish between semantically labeled 2D occupancy grids generated from LIDAR data. By providing the robot the ability to recognize specific classes of places based on human labels, not only do we support transitioning between control laws, but also provide hooks for human-aided instruction and direction.
Additionally, we suggest a subset of behaviors that provide DART with a sufficient set of actions to navigate in most indoor environments and introduce a method to learn these behaviors from teleoperated demonstrations. Our method learns a cost function suitable for integration into gradient-based control schemes. This enables the robot to execute behaviors in the absence of global knowledge. We present results demonstrating these behaviors working in several environments with varied structure, indicating that they generalize well to new environments.
This work was motivated by the weaknesses and brittleness of many state-of-the-art navigation systems. Reliable navigation is the foundation of any mobile robotic system. It provides access to larger work spaces and enables a wide variety of tasks. Even though navigation systems have continued to improve, catastrophic failures can still occur (e.g. due to an incorrect loop closure) that limit their reliability. Furthermore, as work areas approach the scale of kilometers, constructing and operating on precise localization maps becomes expensive. These limitations prevent large scale deployments of robots outside of controlled settings and laboratory environments.
The work presented in this thesis is intended to augment or replace traditional navigation systems to mitigate concerns about scalability and reliability by considering the effects of navigation failures for particular actions. By considering these effects when evaluating the actions to take, our framework can adapt navigation strategies to best take advantage of the capabilities of the robot in a given environment. A natural output of our framework is a topological network of actions and switching conditions, providing compact representations of work areas suitable for fast, scalable planning.
PhD, Computer Science & Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies
https://deepblue.lib.umich.edu/bitstream/2027.42/144073/1/rgoeddel_1.pd
Spatial and Temporal Hierarchy for Autonomous Navigation using Active Inference in Minigrid Environment
Robust evidence suggests that humans explore their environment using a
combination of topological landmarks and coarse-grained path integration. This
approach relies on identifiable environmental features (topological landmarks)
in tandem with estimations of distance and direction (coarse-grained path
integration) to construct cognitive maps of the surroundings. This cognitive
map is believed to exhibit a hierarchical structure, allowing efficient
planning when solving complex navigation tasks. Inspired by human behaviour,
this paper presents a scalable hierarchical active inference model for
autonomous navigation, exploration, and goal-oriented behaviour. The model uses
visual observation and motion perception to combine curiosity-driven
exploration with goal-oriented behaviour. Motion is planned using different
levels of reasoning, i.e., from context to place to motion. This allows for
efficient navigation in new spaces and rapid progress toward a target. By
incorporating these human navigational strategies and their hierarchical
representation of the environment, this model proposes a new solution for
autonomous navigation and exploration. The approach is validated through
simulations in a mini-grid environment.
Comment: arXiv admin note: text overlap with arXiv:2309.0986
Bayesian learning for multi-agent coordination
Multi-agent systems draw together a number of significant trends in modern technology: ubiquity, decentralisation, openness, dynamism and uncertainty. As work in these fields develops, such systems face increasing challenges. Two particular challenges are decision making in uncertain and partially-observable environments, and coordination with other agents in such environments. Although uncertainty and coordination have been tackled as separate problems, formal models for an integrated approach are typically restricted to simple classes of problem and are not scalable to problems with tens of agents and millions of states. We improve on these approaches by extending a principled Bayesian model into more challenging domains, using Bayesian networks to visualise specific cases of the model and thus as an aid in deriving the update equations for the system. One approach which has been shown to scale well for networked offline problems uses finite state machines to model other agents. We used this insight to develop an approximate scalable algorithm applicable to our general model, in combination with adapting a number of existing approximation techniques, including state clustering. We examine the performance of this approximate algorithm on several cases of an urban rescue problem with respect to differing problem parameters. Specifically, we consider first scenarios where agents are aware of the complete situation, but are not certain about the behaviour of others; that is, our model with all elements but the actions observable. Secondly, we examine the more complex case where agents can see the actions of others, but cannot see the full state and thus are not sure about the beliefs of others. Finally, we look at the performance of the partially observable state model when the system is dynamic or open.
We find that our best response algorithm consistently outperforms a handwritten strategy for the problem, more noticeably as the number of agents and the number of states involved in the problem increase.
Interactive Learning of Probabilistic Decision Making by Service Robots with Multiple Skill Domains
This thesis makes a contribution to autonomous service robots, centered around two aspects. The first is modeling decision making in the face of incomplete information on top of the diverse basic skills of a service robot. The second, based on such a model, is how to transfer complex decision-making knowledge into the system. Interactive learning, drawing both on demonstrations by human teachers and on interaction with objects, yields decision-making models that the robot can apply.