49 research outputs found

    Perseus: Randomized Point-based Value Iteration for POMDPs

    Partially observable Markov decision processes (POMDPs) form an attractive and principled framework for agent planning under uncertainty. Point-based approximate techniques for POMDPs compute a policy based on a finite set of points collected in advance from the agent's belief space. We present a randomized point-based value iteration algorithm called Perseus. The algorithm performs approximate value backup stages, ensuring that in each backup stage the value of each point in the belief set is improved; the key observation is that a single backup may improve the value of many belief points. Contrary to other point-based methods, Perseus backs up only a (randomly selected) subset of points in the belief set, sufficient for improving the value of each belief point in the set. We show how the same idea can be extended to deal with continuous action spaces. Experimental results show the potential of Perseus in large-scale POMDP problems.
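The backup stage described in this abstract can be sketched roughly as follows. This is a minimal illustration for a small discrete POMDP, not the authors' implementation: the array layout (`T`, `O`, `R`), the function names, and the tolerance `tol` are all assumptions made for the example.

```python
import numpy as np

def backup(b, V, T, O, R, gamma):
    """Point-based Bellman backup at belief b (hypothetical array layout).

    T[a, s, s2] = P(s2 | s, a), O[a, s2, o] = P(o | s2, a),
    R[a, s] = immediate reward, V = (K, S) array of alpha vectors.
    """
    n_actions, _, n_obs = O.shape
    best_val, best_alpha = -np.inf, None
    for a in range(n_actions):
        g_a = np.array(R[a], dtype=float)
        for o in range(n_obs):
            # g[k, s] = sum_s2 T[a, s, s2] * O[a, s2, o] * V[k, s2]
            g = (V * O[a, :, o]) @ T[a].T
            # keep the back-projected vector that is best at b
            g_a += gamma * g[np.argmax(g @ b)]
        val = g_a @ b
        if val > best_val:
            best_val, best_alpha = val, g_a
    return best_alpha

def perseus_stage(B, V, T, O, R, gamma, rng, tol=1e-9):
    """One randomized backup stage over the belief set B (N, S)."""
    old = (B @ V.T).max(axis=1)      # current value of every belief point
    new_V = []
    todo = np.arange(len(B))         # points whose value is not yet restored
    while todo.size > 0:
        i = rng.choice(todo)         # back up one randomly chosen point
        alpha = backup(B[i], V, T, O, R, gamma)
        if alpha @ B[i] < old[i]:
            # the backup did not help here: reuse the best old vector
            alpha = V[np.argmax(V @ B[i])]
        new_V.append(alpha)
        new_vals = (B @ np.array(new_V).T).max(axis=1)
        # a single new vector may already cover many other points
        todo = np.flatnonzero(new_vals < old - tol)
    return np.array(new_V)
```

Because points are removed from `todo` as soon as any new vector matches their old value, a stage usually needs far fewer backups than there are belief points, which is the observation the abstract highlights.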

    Compact Representation of Value Function in Partially Observable Stochastic Games

    Value methods for solving stochastic games with partial observability model the uncertainty about states of the game as a probability distribution over possible states. The dimension of this belief space is the number of states. For many practical problems, for example in security, there are exponentially many possible states, which makes existing algorithms scale too poorly for real-world problems. To address this curse of dimensionality, we propose an abstraction technique that projects high-dimensional beliefs to characteristic vectors of significantly lower dimension (e.g., marginal probabilities). Our two main contributions are (1) a novel compact representation of the uncertainty in partially observable stochastic games and (2) a novel algorithm that uses this compact representation and builds on existing state-of-the-art algorithms for solving stochastic games with partial observability. Experimental evaluation confirms that the new algorithm over the compact representation dramatically increases the scalability compared to the state of the art.

    Compact parametric models for efficient sequential decision making in high-dimensional, uncertain domains

    Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2009. Cataloged from PDF version of thesis. Includes bibliographical references (p. 137-144). Within artificial intelligence and robotics there is considerable interest in how a single agent can autonomously make sequential decisions in large, high-dimensional, uncertain domains. This thesis presents decision-making algorithms for maximizing the expected sum of future rewards in two types of large, high-dimensional, uncertain situations: when the agent knows its current state but does not have a model of the world dynamics within a Markov decision process (MDP) framework, and in partially observable Markov decision processes (POMDPs), when the agent knows the dynamics and reward models, but only receives information about its state through its potentially noisy sensors. One of the key challenges in the sequential decision making field is the tradeoff between optimality and tractability. To handle high-dimensional (many variables), large (many potential values per variable) domains, an algorithm must have a computational complexity that scales gracefully with the number of dimensions. However, many prior approaches achieve such scalability through the use of heuristic methods with limited or no guarantees on how close to optimal the algorithm's decisions are, and under what circumstances. Algorithms that do provide rigorous optimality bounds often do so at the expense of tractability. This thesis proposes that the use of parametric models of the world dynamics, rewards and observations can enable efficient, provably close to optimal, decision making in large, high-dimensional uncertain environments.
In support of this, we present a reinforcement learning (RL) algorithm where the use of a parametric model allows the algorithm to make close to optimal decisions on all but a number of samples that scales polynomially with the dimension, a significant improvement over most prior provably approximately optimal RL algorithms. We also show that parametric models can be used to reduce the computational complexity from an exponential to a polynomial dependence on the state dimension in forward-search POMDP planning. Under mild conditions our new forward-search POMDP planner maintains prior optimality guarantees on the resulting decisions. We present experimental results on an RL task of robot navigation over varying terrain and on a large global driving POMDP planning simulation. By Emma Patricia Brunskill. Ph.D.

    Can bounded and self-interested agents be teammates? Application to planning in ad hoc teams

    Planning for ad hoc teamwork is challenging because it involves agents collaborating without any prior coordination or communication. The focus is on principled methods for a single agent to cooperate with others. This motivates investigating the ad hoc teamwork problem in the context of self-interested decision-making frameworks. Agents engaged in individual decision making in multiagent settings face the task of having to reason about other agents’ actions, which may in turn involve reasoning about others. An established approximation that operationalizes this approach is to bound the infinite nesting from below by introducing level 0 models. For the purposes of this study, individual, self-interested decision making in multiagent settings is modeled using interactive dynamic influence diagrams (I-DIDs). These are graphical models with the benefit that they naturally offer a factored representation of the problem, allowing agents to ascribe dynamic models to others and reason about them. We demonstrate that an implication of bounded, finitely-nested reasoning by a self-interested agent is that, when the agent is part of a team, we may not obtain optimal team solutions in cooperative settings. We address this limitation by including models at level 0 whose solutions involve reinforcement learning. We show how the learning is integrated into planning in the context of I-DIDs. This facilitates optimal teammate behavior, and we demonstrate its applicability to ad hoc teamwork on several problem domains and configurations.

    Target Search Planning in Uncertain Environments

    Autonomous robots are useful for many tasks, such as rescue in dangerous environments. In this thesis, we consider how autonomous robots, here Unmanned Aerial Vehicles (UAVs), plan their travel in an uncertain indoor environment. At the same time, the robots observe the environment and update its representation with their on-board sensors, and a path is planned for each robot in the robot group. The robots avoid collisions and cooperate with each other in the Complete Mission Process (CMP), which includes all operations of the robots before the mission is completed (all targets are visited). The environment cannot be represented exactly because of the inaccurate representation model and the sensor noise. In order to complete the mission efficiently, a single robot requires a method to plan a path for efficient travel from a start point to a target point and to plan an assignment for visiting all its targets one by one. For multiple robots in a robot group, we need to plan an allocation of multiple targets to multiple robots so that all robots can cooperate. All these planning operations have to be done based on an inexact representation of the environment. This thesis focuses on the path/assignment/allocation planning problem in environments which are not completely known, based on a reduced (simplified) Partially Observable Markov Decision Process (POMDP) framework. Previous research considers only the initial plan and neglects later replanning. Our approach considers the plan and the re-plans from the start to the completion of the mission. Our novel Monte Carlo based planning approaches plan a path for one robot to move efficiently from one point to one target, plan an assignment for one robot to visit multiple targets by travelling the shortest route, and plan an allocation for multiple robots to cooperate and visit multiple targets as soon as possible (the planning time plus the travelling time is minimized).
Our approach is based on a Monte Carlo sampling strategy. In order to decrease its computational cost, two strategies are proposed. We then extend our approach to the scenario with multiple robots and multiple targets. The approaches are characterised and evaluated experimentally through simulation. Compared with similar methods from the literature, our approach provides better solutions.

    Value of information in decision systems

    Ph.D, DOCTOR OF PHILOSOPHY

    Belief-space Planning for Active Visual SLAM in Underwater Environments.

    Autonomous mobile robots operating in a priori unknown environments must be able to integrate path planning with simultaneous localization and mapping (SLAM) in order to perform tasks like exploration, search and rescue, inspection, reconnaissance, target-tracking, and others. This level of autonomy is especially difficult in underwater environments, where GPS is unavailable, communication is limited, and environment features may be sparsely distributed. In these situations, the path taken by the robot can drastically affect the performance of SLAM, so the robot must plan and act intelligently and efficiently to ensure successful task completion. This document proposes novel research in belief-space planning for active visual SLAM in underwater environments. Our motivating application is ship hull inspection with an autonomous underwater robot. We design a Gaussian belief-space planning formulation that accounts for the randomness of the loop-closure measurements in visual SLAM and serves as the mathematical foundation for the research in this thesis. Combining this planning formulation with sampling-based techniques, we efficiently search for loop-closure actions throughout the environment and present a two-step approach for selecting revisit actions that results in an opportunistic active SLAM framework. The proposed active SLAM method is tested in hybrid simulations and real-world field trials of an underwater robot performing inspections of a physical modeling basin and a U.S. Coast Guard cutter. To reduce computational load, we present research into efficient planning by compressing the representation and examining the structure of the underlying SLAM system. We propose the use of graph sparsification methods online to reduce complexity by planning with an approximate distribution that represents the original, full pose graph. 
We also propose the use of the Bayes tree data structure, first introduced for fast inference in SLAM, to perform efficient incremental updates when evaluating candidate plans that are similar. As a final contribution, we design risk-averse objective functions that account for the randomness within our planning formulation. We show that this aversion to uncertainty in the posterior belief leads to desirable and intuitive behavior within active SLAM. PhD, Mechanical Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/133303/1/schaves_1.pd

    Learning Search Strategies from Human Demonstrations

    Decision making and planning with partial state information is a problem faced by all forms of intelligent entities. The formulation of a problem under partial state information leads to an exorbitant set of choices with associated probabilistic outcomes, making its resolution difficult when using traditional planning methods. Human beings have acquired the ability to act under uncertainty through education and self-learning. Transferring our know-how to artificial agents and robots will make it faster for them to learn, and even to improve upon us, in tasks in which only incomplete knowledge is available; this is the objective of this thesis. We model how humans reason with respect to their beliefs and transfer this knowledge in the form of a parameterised policy, following a Programming by Demonstration framework, to a robot apprentice for two spatial navigation tasks: the first task consists of localising a wooden block on a table, and in the second task a power socket must be found and connected to. In both tasks the human teacher and robot apprentice rely only on haptic and tactile information. We model the human's and the robot's beliefs by a probability density function which we update through recursive Bayesian state space estimation. To model the reasoning processes of human subjects performing the search tasks we learn a generative joint distribution over beliefs and actions (end-effector velocities) which were recorded during the executions of the task. For the first search task the direct mapping from belief to actions is learned, whilst for the second task we incorporate a cost function used to adapt the policy parameters in a Reinforcement Learning framework and show a considerable improvement, with respect to the distance taken to accomplish the task, over solely learning the behaviour. Both search tasks above can be considered as active localisation as the uncertainty originates only from the position of the agent in the world. 
We consider searches in which both the position of the robot and features of the environment are uncertain. Given the unstructured nature of the belief, a histogram parametrisation of the joint distribution of the robot's position and features is necessary. However, naively doing so quickly becomes intractable, as the space and time complexity are exponential. We demonstrate that by only parametrising the marginals and by memorising the parameters of the measurement likelihood functions we can recover the exact same solution as the naive parametrisation at a cost which is linear in space and time complexity.
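The recursive Bayesian belief update over a histogram that this abstract refers to can be illustrated with the textbook discrete Bayes filter. The sketch below is that generic form only; the abstract does not give the thesis's own equations, and the function and argument names are hypothetical.

```python
import numpy as np

def bayes_update(belief, transition, likelihood):
    """One predict-correct step of a discrete (histogram) Bayes filter.

    belief[i]        = prior probability of cell i
    transition[i, j] = P(next cell j | current cell i) for the applied action
    likelihood[j]    = P(received measurement | cell j)
    All three names are assumptions made for this example.
    """
    predicted = belief @ transition        # motion (prediction) step
    posterior = predicted * likelihood     # measurement (correction) step
    total = posterior.sum()
    if total == 0.0:
        raise ValueError("measurement has zero probability under the predicted belief")
    return posterior / total               # renormalise to a distribution
```

For example, starting from a uniform belief over three cells with no motion (`transition = np.eye(3)`), a contact measurement that is nine times more likely in the first cell shifts most of the probability mass there.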

    Stochastic Tools for Network Security: Anonymity Protocol Analysis and Network Intrusion Detection

    With the rapid development of the Internet and the sharp increase in network crime, network security has become very important and has received a lot of attention. In this dissertation, we model security issues as stochastic systems. This allows us to find weaknesses in existing security systems and propose new solutions. Exploring the vulnerabilities of existing security tools can prevent cyber-attacks from taking advantage of system weaknesses. We consider The Onion Router (Tor), which is one of the most popular anonymity systems in use today, and show how to detect a protocol tunnelled through Tor. A hidden Markov model (HMM) is used to represent the protocol. Hidden Markov models are statistical models of sequential data like network traffic, and are an effective tool for pattern analysis. New, flexible and adaptive security schemes are needed to cope with emerging security threats. We propose a hybrid network security scheme including intrusion detection systems (IDSs) and honeypots scattered throughout the network. This combines the advantages of two security technologies. A honeypot is an activity-based network security system, which could be the logical supplement of the passive detection policies used by IDSs. This integration forces us to balance security performance versus cost by scheduling device activities for the proposed system. By formulating the scheduling problem as a decentralized partially observable Markov decision process (DEC-POMDP), decisions are made in a distributed manner at each device without requiring centralized control. When using an HMM, it is important to ensure that it accurately represents both the data used to train the model and the underlying process. Current methods assume that observations used to construct an HMM completely represent the underlying process. It is often the case that the training data size is not large enough to adequately capture all statistical dependencies in the system. 
It is therefore important to know the level of statistical significance with which the constructed model represents the underlying process, not only the training set. We present a method to determine if the observation data and constructed model fully express the underlying process with a given level of statistical significance. We apply this approach to detecting the existence of protocols tunnelled through Tor. While HMMs are a powerful tool for representing patterns allowing for uncertainties, they cannot be used for system control. The partially observable Markov decision process (POMDP) is a useful choice for controlling stochastic systems. As a combination of two Markov models, POMDPs combine the strengths of HMMs (capturing dynamics that depend on unobserved states) and those of Markov decision processes (MDPs) (taking the decision aspect into account). Decision making under uncertainty is used in many parts of business and science. Here we use it for security tools. We propose three approximation methods for discrete-time infinite-horizon POMDPs. One of the main contributions of our work is a high-quality approximate solution for finite-space POMDPs with the average cost criterion, and its extension to DEC-POMDPs. The solution of the first algorithm is built out of the observable portion when the underlying MDP operates optimally. The other two methods presented here can be classified as policy-based approximation schemes, in which we formulate the POMDP planning as a quadratically constrained linear program (QCLP), which defines an optimal controller of a desired size. This representation allows a wide range of powerful nonlinear programming (NLP) algorithms to be used to solve POMDPs. Simulation results for a set of benchmark problems illustrate the effectiveness of the proposed method. We show how this tool could be used to design a network security framework.