11 research outputs found

    Perseus: Randomized Point-based Value Iteration for POMDPs

    Full text link
    Partially observable Markov decision processes (POMDPs) form an attractive and principled framework for agent planning under uncertainty. Point-based approximate techniques for POMDPs compute a policy based on a finite set of points collected in advance from the agents belief space. We present a randomized point-based value iteration algorithm called Perseus. The algorithm performs approximate value backup stages, ensuring that in each backup stage the value of each point in the belief set is improved; the key observation is that a single backup may improve the value of many belief points. Contrary to other point-based methods, Perseus backs up only a (randomly selected) subset of points in the belief set, sufficient for improving the value of each belief point in the set. We show how the same idea can be extended to dealing with continuous action spaces. Experimental results show the potential of Perseus in large scale POMDP problems

    Representing hierarchical POMDPs as DBNs for multi-scale robot localization

    No full text
    We explore the advantages of representing hierarchical partially observable Markov decision processes (H-POMDPs) as dynamic Bayesian networks (DBNs). In particular, we focus on the special case of using H-POMDPs to represent multiresolution spatial maps for indoor robot navigation. Our results show that a DBN representation of H-POMDPs can train significantly faster than the original learning algorithm for H-POMDPs or the equivalent flat POMDP, and requires much less data. In addition, the DBN formulation can easily be extended to parameter tying and factoring of variables, which further reduces the time and sample complexity. This enables us to apply H-POMDP methods to much larger problems than previously possible. 1

    Representing hierarchical POMDPs as DBNs for multi-scale robot localization

    No full text
    Abstract — We explore the advantages of representing hierarchical partially observable Markov decision processes (H-POMDPs) as dynamic Bayesian networks (DBNs). In particular, we focus on the special case of using H-POMDPs to represent multi-resolution spatial maps for indoor robot navigation. Our results show that a DBN representation of H-POMDPs can train significantly faster than the original learning algorithm for H-POMDPs or the equivalent flat POMDP, and requires much less data. In addition, the DBN formulation can easily be extended to parameter tying and factoring of variables, which further reduces the time and sample complexity. This enables us to apply H-POMDP methods to much larger problems than previously possible. I

    An Approach for Intention-Driven, Dialogue-Based Web Search

    Get PDF
    Web search engines facilitate the achievement of Web-mediated tasks, including information retrieval, Web page navigation, and online transactions. These tasks often involve goals that pertain to multiple topics, or domains. Current search engines are not suitable for satisfying complex, multi-domain needs due to their lack of interactivity and knowledge. This thesis presents a novel intention-driven, dialogue-based Web search approach that uncovers and combines users\u27 multi-domain goals to provide helpful virtual assistance. The intention discovery procedure uses a hierarchy of Partially Observable Markov Decision Process-based dialogue managers and a backing knowledge base to systematically explore the dialogue\u27s information space, probabilistically refining the perception of user goals. The search approach has been implemented in IDS, a search engine for online gift shopping. A usability study comparing IDS-based searching with Google-based searching found that the IDS-based approach takes significantly less time and effort, and results in higher user confidence in the retrieved results

    Probabilistic Human-Robot Information Fusion

    Get PDF
    This thesis is concerned with combining the perceptual abilities of mobile robots and human operators to execute tasks cooperatively. It is generally agreed that a synergy of human and robotic skills offers an opportunity to enhance the capabilities of today’s robotic systems, while also increasing their robustness and reliability. Systems which incorporate both human and robotic information sources have the potential to build complex world models, essential for both automated and human decision making. In this work, humans and robots are regarded as equal team members who interact and communicate on a peer-to-peer basis. Human-robot communication is addressed using probabilistic representations common in robotics. While communication can in general be bidirectional, this work focuses primarily on human-to-robot information flow. More specifically, the approach advocated in this thesis is to let robots fuse their sensor observations with observations obtained from human operators. While robotic perception is well-suited for lower level world descriptions such as geometric properties, humans are able to contribute perceptual information on higher abstraction levels. Human input is translated into the machine representation via Human Sensor Models. A common mathematical framework for humans and robots reinforces the notion of true peer-to-peer interaction. Human-robot information fusion is demonstrated in two application domains: (1) scalable information gathering, and (2) cooperative decision making. Scalable information gathering is experimentally demonstrated on a system comprised of a ground vehicle, an unmanned air vehicle, and two human operators in a natural environment. Information from humans and robots was fused in a fully decentralised manner to build a shared environment representation on multiple abstraction levels. Results are presented in the form of information exchange patterns, qualitatively demonstrating the benefits of human-robot information fusion. The second application domain adds decision making to the human-robot task. Rational decisions are made based on the robots’ current beliefs which are generated by fusing human and robotic observations. Since humans are considered a valuable resource in this context, operators are only queried for input when the expected benefit of an observation exceeds the cost of obtaining it. The system can be seen as adjusting its autonomy at run-time based on the uncertainty in the robots’ beliefs. A navigation task is used to demonstrate the adjustable autonomy system experimentally. Results from two experiments are reported: a quantitative evaluation of human-robot team effectiveness, and a user study to compare the system to classical teleoperation. Results show the superiority of the system with respect to performance, operator workload, and usability

    Compact parametric models for efficient sequential decision making in high-dimensional, uncertain domains

    Get PDF
    Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2009.Cataloged from PDF version of thesis.Includes bibliographical references (p. 137-144).Within artificial intelligence and robotics there is considerable interest in how a single agent can autonomously make sequential decisions in large, high-dimensional, uncertain domains. This thesis presents decision-making algorithms for maximizing the expected sum of future rewards in two types of large, high-dimensional, uncertain situations: when the agent knows its current state but does not have a model of the world dynamics within a Markov decision process (MDP) framework, and in partially observable Markov decision processes (POMDPs), when the agent knows the dynamics and reward models, but only receives information about its state through its potentially noisy sensors. One of the key challenges in the sequential decision making field is the tradeoff between optimality and tractability. To handle high-dimensional (many variables), large (many potential values per variable) domains, an algorithm must have a computational complexity that scales gracefully with the number of dimensions. However, many prior approaches achieve such scalability through the use of heuristic methods with limited or no guarantees on how close to optimal, and under what circumstances, are the decisions made by the algorithm. Algorithms that do provide rigorous optimality bounds often do so at the expense of tractability. This thesis proposes that the use of parametric models of the world dynamics, rewards and observations can enable efficient, provably close to optimal, decision making in large, high-dimensional uncertain environments.(cont.) In support of this, we present a reinforcement learning (RL) algorithm where the use of a parametric model allows the algorithm to make close to optimal decisions on all but a number of samples that scales polynomially with the dimension, a significant improvement over most prior RL provably approximately optimal algorithms. We also show that parametric models can be used to reduce the computational complexity from an exponential to polynomial dependence on the state dimension in forward search partially observable MDP planning. Under mild conditions our new forward-search POMDP planner maintains prior optimality guarantees on the resulting decisions. We present experimental results on a robot navigation over varying terrain RL task and a large global driving POMDP planning simulation.by Emma Patricia Brunskill.Ph.D

    Multi-modal on-body sensing of human activities

    Get PDF
    Increased usage and integration of state-of-the-art information technology in our everyday work life aims at increasing the working efficiency. Due to unhandy human-computer-interaction methods this progress does not always result in increased efficiency, for mobile workers in particular. Activity recognition based contextual computing attempts to balance this interaction deficiency. This work investigates wearable, on-body sensing techniques on their applicability in the field of human activity recognition. More precisely we are interested in the spotting and recognition of so-called manipulative hand gestures. In particular the thesis focuses on the question whether the widely used motion sensing based approach can be enhanced through additional information sources. The set of gestures a person usually performs on a specific place is limited -- in the contemplated production and maintenance scenarios in particular. As a consequence this thesis investigates whether the knowledge about the user's hand location provides essential hints for the activity recognition process. In addition, manipulative hand gestures -- due to their object manipulating character -- typically start in the moment the user's hand reaches a specific place, e.g. a specific part of a machinery. And the gestures most likely stop in the moment the hand leaves the position again. Hence this thesis investigates whether hand location can help solving the spotting problem. Moreover, as user-independence is still a major challenge in activity recognition, this thesis investigates location context as a possible key component in a user-independent recognition system. We test a Kalman filter based method to blend absolute position readings with orientation readings based on inertial measurements. A filter structure is suggested which allows up-sampling of slow absolute position readings, and thus introduces higher dynamics to the position estimations. In such a way the position measurement series is made aware of wrist motions in addition to the wrist position. We suggest location based gesture spotting and recognition approaches. Various methods to model the location classes used in the spotting and recognition stages as well as different location distance measures are suggested and evaluated. In addition a rather novel sensing approach in the field of human activity recognition is studied. This aims at compensating drawbacks of the mere motion sensing based approach. To this end we develop a wearable hardware architecture for lower arm muscular activity measurements. The sensing hardware based on force sensing resistors is designed to have a high dynamic range. In contrast to preliminary attempts the proposed new design makes hardware calibration unnecessary. Finally we suggest a modular and multi-modal recognition system; modular with respect to sensors, algorithms, and gesture classes. This means that adding or removing a sensor modality or an additional algorithm has little impact on the rest of the recognition system. Sensors and algorithms used for spotting and recognition can be selected and fine-tuned separately for each single activity. New activities can be added without impact on the recognition rates of the other activities

    Graphical Models: Modeling, Optimization, and Hilbert Space Embedding

    No full text
    Over the past two decades graphical models have been widely used as a powerful tool for compactly representing distributions. On the other hand, kernel methods have also been used extensively to come up with rich representations. This thesis aims to combine graphical models with kernels to produce compact models with rich representational abilities. The following four areas are our focus. 1. Conditional random fields for multi-agent reinforcement learning. Conditional random fields (CRFs) are graphical models for modeling the probability of labels given the observations. They have traditionally assumed that, conditioned on the training data, the label sequences of different training examples are independent and identically distributed (iid). We extended the use of CRFs to a class of temporal learning algorithms, namely policy gradient reinforcement learning (RL). Now the labels are no longer iid. They are actions that update the environment and affect the next observation. From an RL point of view, CRFs provide a natural way to model joint actions in a decentralized Markov decision process. Using tree sampling for inference, our experiment shows the RL methods employing CRFs clearly outperform those which do not model the proper joint policy. 2. Bayesian online multi-label classification. Gaussian density filtering provides fast and effective inference for graphical models (Maybeck, 1982). Based on it, we propose a Bayesian online multi-label classification (BOMC) framework which learns a probabilistic model of the linear classifier. The training labels are incorporated to update the posterior of the classifiers via a graphical model similar to TrueSkill (Herbrich et al, 2007). Using samples from the posterior, we label the test data by maximizing the expected F1-score. In our experiments, BOMC delivers significantly higher macro-averaged F1-score than the state-of-the-art online maximum margin learners. 3. Hilbert space embedment of distributions. Graphical models are also an essential tool in kernel measures of independence for non-iid data. Traditional information theory often requires density estimation, which makes it unideal for statistical estimation. Motivated by the fact that distributions often appear in machine learning via expectations, we can characterize the distance between distributions in terms of distances between means, especially means in reproducing kernel Hilbert spaces which are called kernel embeddings. Under this framework, the undirected graphical models further allow us to factorize the kernel embeddings onto cliques, which yields efficient measures of independence for non-iid data (Zhang et al, 2009). 4. Optimization in maximum margin models for structured data. Maximum margin estimation for structured data is an important task where graphical models also play a key role. They are special cases of regularized risk minimization, for which bundle methods (BMRM, Teo et al, 2007) are a state-of-the-art general purpose solver. Smola et al (2007) proved that BMRM requires O(1/epsilon) iterations to converge to an epsilon accurate solution, and we further show that this rate hits the lower bound. Motivated by (Nesterov 2003, 2005), we utilized the composite structure of the objective function and devised an algorithm for the structured loss which converges to an epsilon accurate solution in O(1/sqrt{epsilon}) iterations
    corecore