
    Stochastic Shortest Path with Energy Constraints in POMDPs

    We consider partially observable Markov decision processes (POMDPs) with a set of target states and positive integer costs associated with every transition. The traditional optimization objective (stochastic shortest path) asks to minimize the expected total cost until the target set is reached. We extend the traditional framework of POMDPs to model energy consumption, which represents a hard constraint. The energy levels may increase and decrease with transitions, and the hard constraint requires that the energy level remain positive in all steps until the target is reached. First, we present a novel algorithm for solving POMDPs with energy levels, building on existing POMDP solvers and using RTDP as its main method. Our second contribution is related to policy representation. For larger POMDP instances, the policies computed by existing solvers are too large to be understandable. We present an automated procedure based on machine learning techniques that extracts the important decisions of the policy, allowing us to compute succinct, human-readable policies. Finally, we show experimentally that our algorithm performs well and computes succinct policies on a number of POMDP instances from the literature that were naturally enhanced with energy levels. Comment: Technical report accompanying a paper published in proceedings of AAMAS 201

    A Survey of Knowledge-based Sequential Decision Making under Uncertainty

    Reasoning with declarative knowledge (RDK) and sequential decision-making (SDM) are two key research areas in artificial intelligence. RDK methods reason with declarative domain knowledge, including commonsense knowledge, that is either provided a priori or acquired over time, while SDM methods (probabilistic planning and reinforcement learning) seek to compute action policies that maximize the expected cumulative utility over a time horizon; both classes of methods reason in the presence of uncertainty. Despite the rich literature in these two areas, researchers have not fully explored their complementary strengths. In this paper, we survey algorithms that leverage RDK methods while making sequential decisions under uncertainty. We discuss significant developments, open problems, and directions for future work.

    Stochastic Tools for Network Security: Anonymity Protocol Analysis and Network Intrusion Detection

    With the rapid development of the Internet and the sharp increase in network crime, network security has become very important and has received a lot of attention. In this dissertation, we model security issues as stochastic systems. This allows us to find weaknesses in existing security systems and propose new solutions. Exploring the vulnerabilities of existing security tools can prevent cyber-attacks from taking advantage of system weaknesses. We consider The Onion Router (Tor), which is one of the most popular anonymity systems in use today, and show how to detect a protocol tunnelled through Tor. A hidden Markov model (HMM) is used to represent the protocol. Hidden Markov models are statistical models of sequential data like network traffic, and are an effective tool for pattern analysis. New, flexible and adaptive security schemes are needed to cope with emerging security threats. We propose a hybrid network security scheme including intrusion detection systems (IDSs) and honeypots scattered throughout the network. This combines the advantages of two security technologies. A honeypot is an activity-based network security system, which could be the logical supplement of the passive detection policies used by IDSs. This integration forces us to balance security performance versus cost by scheduling device activities for the proposed system. By formulating the scheduling problem as a decentralized partially observable Markov decision process (DEC-POMDP), decisions are made in a distributed manner at each device without requiring centralized control. When using an HMM, it is important to ensure that it accurately represents both the data used to train the model and the underlying process. Current methods assume that the observations used to construct an HMM completely represent the underlying process. It is often the case that the training data set is not large enough to adequately capture all statistical dependencies in the system. It is therefore important to know the statistical significance level at which the constructed model represents the underlying process, not only the training set. We present a method to determine if the observation data and constructed model fully express the underlying process with a given level of statistical significance. We apply this approach to detecting the existence of protocols tunnelled through Tor. While HMMs are a powerful tool for representing patterns allowing for uncertainties, they cannot be used for system control. The partially observable Markov decision process (POMDP) is a useful choice for controlling stochastic systems. As a combination of two Markov models, POMDPs combine the strength of the HMM (capturing dynamics that depend on unobserved states) and that of the Markov decision process (MDP) (taking the decision aspect into account). Decision making under uncertainty is used in many parts of business and science. We use it here for security tools. We propose three approximation methods for discrete-time infinite-horizon POMDPs. One of the main contributions of our work is a high-quality approximate solution for finite-space POMDPs with the average cost criterion, and its extension to DEC-POMDPs. The solution of the first algorithm is built out of the observable portion when the underlying MDP operates optimally. The other two methods presented here can be classified as policy-based approximation schemes, in which we formulate POMDP planning as a quadratically constrained linear program (QCLP), which defines an optimal controller of a desired size. This representation allows a wide range of powerful nonlinear programming (NLP) algorithms to be used to solve POMDPs. Simulation results for a set of benchmark problems illustrate the effectiveness of the proposed method. We show how this tool could be used to design a network security framework.

    Intention-Aware Motion Planning

    As robots venture into new application domains as autonomous vehicles on the road or as domestic helpers at home, they must recognize human intentions and behaviors in order to operate effectively. This paper investigates a new class of motion planning problems with uncertainty in human intention. We propose a method for constructing a practical model by assuming a finite set of unknown intentions. We first construct a motion model for each intention in the set and then combine these models into a single Mixed Observability Markov Decision Process (MOMDP), which is a structured variant of the more common Partially Observable Markov Decision Process (POMDP). By leveraging the latest advances in POMDP/MOMDP approximation algorithms, we can construct and solve moderately complex models for interesting robotic tasks. Experiments in simulation and with an autonomous vehicle show that the proposed method outperforms common alternatives because of its ability to recognize intentions and use the information effectively for decision making. Singapore-MIT Alliance for Research and Technology (SMART) (grant R-252-000-447-592); Singapore-MIT GAMBIT Game Lab (grant R-252-000-398-490); Singapore. Ministry of Education (AcRF grant 2010-T2-2-071)

    REBA: A Refinement-Based Architecture for Knowledge Representation and Reasoning in Robotics

    This paper describes an architecture for robots that combines the complementary strengths of probabilistic graphical models and declarative programming to represent and reason with logic-based and probabilistic descriptions of uncertainty and domain knowledge. An action language is extended to support non-boolean fluents and non-deterministic causal laws. This action language is used to describe tightly-coupled transition diagrams at two levels of granularity, with a fine-resolution transition diagram defined as a refinement of a coarse-resolution transition diagram of the domain. The coarse-resolution system description, and a history that includes (prioritized) defaults, are translated into an Answer Set Prolog (ASP) program. For any given goal, inference in the ASP program provides a plan of abstract actions. To implement each such abstract action, the robot automatically zooms to the part of the fine-resolution transition diagram relevant to this action. A probabilistic representation of the uncertainty in sensing and actuation is then included in this zoomed fine-resolution system description, and used to construct a partially observable Markov decision process (POMDP). The policy obtained by solving the POMDP is invoked repeatedly to implement the abstract action as a sequence of concrete actions, with the corresponding observations being recorded in the coarse-resolution history and used for subsequent reasoning. The architecture is evaluated in simulation and on a mobile robot moving objects in an indoor domain, to show that it supports reasoning with violation of defaults, noisy observations and unreliable actions, in complex domains. Comment: 72 pages, 14 figures

    Can bounded and self-interested agents be teammates? Application to planning in ad hoc teams

    Planning for ad hoc teamwork is challenging because it involves agents collaborating without any prior coordination or communication. The focus is on principled methods for a single agent to cooperate with others. This motivates investigating the ad hoc teamwork problem in the context of self-interested decision-making frameworks. Agents engaged in individual decision making in multiagent settings face the task of having to reason about other agents’ actions, which may in turn involve reasoning about others. An established approximation that operationalizes this approach is to bound the infinite nesting from below by introducing level 0 models. For the purposes of this study, individual, self-interested decision making in multiagent settings is modeled using interactive dynamic influence diagrams (I-DIDs). These are graphical models with the benefit that they naturally offer a factored representation of the problem, allowing agents to ascribe dynamic models to others and reason about them. We demonstrate that an implication of bounded, finitely-nested reasoning by a self-interested agent is that we may not obtain optimal team solutions in cooperative settings when the agent is part of a team. We address this limitation by including models at level 0 whose solutions involve reinforcement learning. We show how the learning is integrated into planning in the context of I-DIDs. This facilitates optimal teammate behavior, and we demonstrate its applicability to ad hoc teamwork on several problem domains and configurations.

    A Survey on Causal Reinforcement Learning

    While Reinforcement Learning (RL) has achieved tremendous success in sequential decision-making problems in many domains, it still faces the key challenges of data inefficiency and lack of interpretability. Interestingly, many researchers have recently leveraged insights from the causality literature, producing a flourishing body of work that unifies the merits of causality and addresses these challenges of RL. As such, it is of great necessity and significance to collate these Causal Reinforcement Learning (CRL) works, offer a review of CRL methods, and investigate the potential functionality that causality offers RL. In particular, we divide existing CRL approaches into two categories according to whether their causality-based information is given in advance or not. We further analyze each category in terms of the formalization of different models, including the Markov Decision Process (MDP), Partially Observed Markov Decision Process (POMDP), Multi-Armed Bandits (MAB), and Dynamic Treatment Regime (DTR). Moreover, we summarize the evaluation metrics and open-source resources, and we discuss emerging applications along with promising prospects for the future development of CRL. Comment: 29 pages, 20 figures

    Probabilistic Motion Planning for Automated Vehicles

    This thesis targets the problem of motion planning for automated vehicles. As a prerequisite for their on-road deployment, automated vehicles must show appropriate and reliable driving behavior in mixed traffic, i.e. alongside human drivers. Besides the uncertainties resulting from imperfect perception, occlusions and limited sensor range, the uncertainties in the behavior of other traffic participants must also be considered. Related approaches for motion planning in mixed traffic often employ a deterministic problem formulation. The solution of such formulations is restricted to a single trajectory. Deviations from the prediction of other traffic participants are accounted for during replanning, while large uncertainties lead to conservative and over-cautious behavior. As a result of the shortcomings of these formulations in cooperative scenarios and scenarios with severe uncertainties, probabilistic approaches are pursued. Due to the need for real-time capability, however, a holistic uncertainty treatment often induces a strong limitation of the action space of automated vehicles. Moreover, safety and traffic rule compliance are often not considered. Thus, in this work, three motion planning approaches and a scenario-based safety approach are presented. The safety approach is based on an existing concept, which targets the guarantee that automated vehicles will never cause accidents. This concept is enhanced by the consideration of traffic rules for crossing and merging traffic, occlusions, limited sensor range and lane changes. The three presented motion planning approaches are targeted towards the different predominant uncertainties in different scenarios, while operating in a continuous action space. For non-interactive scenarios with clear precedence, a probabilistic approach is presented. The problem is modeled as a partially observable Markov decision process (POMDP). In contrast to existing approaches, the underlying assumption is that the prediction of the future progression of the uncertainty in the behavior of other traffic participants can be performed independently of the automated vehicle's motion plan. In addition to this prediction of currently visible traffic participants, the influence of occlusions and limited sensor range is considered. Despite its thorough uncertainty consideration, the presented approach facilitates planning in a continuous action space. Two further approaches are targeted towards the predominant uncertainties in interactive scenarios. In order to facilitate lane changes in dense traffic, a rule-based approach is proposed. The latter seeks to actively reduce the uncertainty in whether other vehicles willingly make room for a lane change. The generated trajectories are safe and traffic rule compliant with respect to the presented safety approach. To facilitate cooperation in scenarios without clear precedence, a multi-agent approach is presented. The globally optimal solution to the multi-agent problem is first analyzed regarding its ambiguity. If an unambiguous, cooperative solution is found, it is pursued. Still, the compliance of other vehicles with the presumed cooperation model is checked, and a conservative fallback trajectory is pursued in case of non-compliance. The performance of the presented approaches is shown in various scenarios with intersecting lanes, partly with limited visibility, as well as lane changes and a narrowing without predefined right of way.