
    Tractable POMDP-planning for robots with complex non-linear dynamics

    Planning under partial observability is an essential capability of autonomous robots. While robots operate in the real world, they are inherently subject to various uncertainties such as control and sensing errors, and limited information regarding the operating environment. Conceptually, these types of planning problems can be solved in a principled manner when framed as a Partially Observable Markov Decision Process (POMDP). POMDPs model the aforementioned uncertainties as conditional probability functions and estimate the state of the system as probability distributions over the state space, called beliefs. Instead of computing the best strategy with respect to single states, POMDP solvers compute the best strategy with respect to beliefs. Solving a POMDP exactly is computationally intractable in general. However, in the past two decades we have seen tremendous progress in the development of approximately optimal solvers that trade optimality for computational tractability. Despite this progress, approximately solving POMDPs for systems with complex non-linear dynamics remains challenging. Most state-of-the-art solvers rely on a large number of expensive forward simulations of the system to find an approximately optimal strategy. For systems with complex non-linear dynamics that admit no closed-form solution, this approach can become prohibitively expensive. Another difficulty in applying POMDPs to physical robots with complex transition dynamics is that almost all implementations of state-of-the-art on-line POMDP solvers restrict the user to specific data structures for the POMDP model, and the model has to be hard-coded within the solver implementation. This, in turn, severely hinders the process of applying POMDPs to physical robots. In this thesis we aim to make POMDPs more practical for realistic robotic motion-planning tasks under partial observability. We show that systematic approximations of complex, non-linear transition dynamics can be used to design on-line POMDP solvers that are more efficient than current solvers. Furthermore, we propose a new software framework that supports the user in modeling complex planning problems under uncertainty with minimal implementation effort.
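    To make the notion of a belief concrete, below is a minimal sketch of the Bayesian belief update that such solvers maintain over the state space; the two-state model (transition matrix T, observation matrix Z) is a hypothetical toy, not a model from the thesis.

```python
import numpy as np

# Minimal sketch of the POMDP belief update: predict through the transition
# model, then correct by the observation likelihood. The two-state model
# below is a made-up toy example, not a model from the thesis.

# T[a][s, s'] = P(s' | s, a); Z[a][s', o] = P(o | s', a)
T = {0: np.array([[0.9, 0.1],
                  [0.2, 0.8]])}
Z = {0: np.array([[0.8, 0.2],
                  [0.3, 0.7]])}

def belief_update(b, a, o):
    """Posterior belief after taking action a and observing o."""
    predicted = T[a].T @ b              # prediction: push the belief through the dynamics
    posterior = Z[a][:, o] * predicted  # correction: weight by observation likelihood
    return posterior / posterior.sum()  # renormalize to a probability distribution

b = np.array([0.5, 0.5])                # uniform prior over the two states
print(belief_update(b, a=0, o=1))       # -> [0.2588..., 0.7411...]
```

    Sampling-based on-line solvers sidestep this exact update by propagating sampled states through forward simulations, which is precisely where expensive non-linear dynamics become the bottleneck the thesis targets.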

    Learning Augmented, Multi-Robot Long-Horizon Navigation in Partially Mapped Environments

    We present a novel approach for efficient and reliable goal-directed long-horizon navigation for a multi-robot team in a structured, unknown environment by predicting statistics of unknown space. Building on recent work in learning-augmented, model-based planning under uncertainty, we introduce a high-level state and action abstraction that lets us approximate the challenging Dec-POMDP as a tractable stochastic MDP. Our Multi-Robot Learning over Subgoals Planner (MR-LSP) guides agents towards coordinated exploration of regions more likely to lead to the unseen goal. We demonstrate improvement in cost against other multi-robot strategies; in simulated office-like environments, we show that our approach saves 13.29% (2-robot) and 4.6% (3-robot) average cost versus standard non-learned optimistic planning and a learning-informed baseline.
    Comment: 7 pages, 7 figures, ICRA202
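    For intuition, here is a rough sketch of the kind of subgoal-level expected-cost objective that learning-over-subgoals planners optimize; the Subgoal fields, probabilities, and fixed travel distances are illustrative assumptions rather than the paper's implementation (in the real planner, travel costs depend on the chosen ordering and on robot assignments).

```python
from dataclasses import dataclass
from itertools import permutations

# Rough sketch of a subgoal-level expected-cost objective for planning in a
# partial map. All fields and numbers are illustrative assumptions.

@dataclass(frozen=True)
class Subgoal:
    p_goal: float        # learned probability the subgoal leads to the goal
    cost_success: float  # expected cost beyond the subgoal if it succeeds
    cost_explore: float  # expected cost wasted if it turns out to be a dead end

def expected_cost(order, travel):
    """Expected cost of trying subgoals in `order`, paying travel[i] to reach
    the i-th one, until some subgoal reaches the goal."""
    cost, p_searching = 0.0, 1.0   # p_searching: P(no earlier subgoal succeeded)
    for sg, d in zip(order, travel):
        cost += p_searching * (d + sg.p_goal * sg.cost_success
                               + (1 - sg.p_goal) * sg.cost_explore)
        p_searching *= 1 - sg.p_goal
    return cost

subgoals = [Subgoal(0.6, 10.0, 4.0), Subgoal(0.3, 15.0, 2.0)]
best = min(permutations(subgoals), key=lambda o: expected_cost(o, [5.0, 8.0]))
print(best, expected_cost(best, [5.0, 8.0]))
```

    Because the learned per-subgoal statistics collapse everything behind each frontier into a few scalars, the search is over orderings of a handful of subgoals rather than over the full Dec-POMDP, which is what makes the abstraction tractable.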

    General-Purpose Planning Algorithms In Partially-Observable Stochastic Games

    Partially observable stochastic games (POSGs) are difficult domains to plan in because they feature multiple agents with potentially opposing goals, parts of the world are hidden from the agents, and some actions have random outcomes. Solving a large POSG optimally is infeasible. While it may be tempting to design a specialized algorithm for finding suboptimal solutions to a particular POSG, general-purpose planning algorithms can work just as well, with less complexity and less domain knowledge required. I explore this idea in two different POSGs: Navy Defense and Duelyst. In Navy Defense, I show that a specialized algorithm framework, goal-driven autonomy, which requires a complex subsystem separate from the planner for explicitly reasoning about goals, is unnecessary: simple general planners such as hindsight optimization exhibit implicit goal reasoning and perform strongly. In Duelyst, I show that a specialized expert-rule-based AI can be consistently beaten by a simple general planner using only a small amount of domain knowledge. I also introduce a modification to Monte Carlo tree search that increases performance when rollouts are slow and planning time is constrained.
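    As a concrete illustration of hindsight optimization, the sketch below scores each action by averaging its value over sampled, fully determinized futures, using a toy press-your-luck dice game; the game and all numbers are invented for illustration and are unrelated to Navy Defense or Duelyst.

```python
import random

# Toy hindsight-optimization sketch for a press-your-luck dice game:
# "roll" adds the next die value unless it is a 1 (bust, score 0);
# "stop" banks the current score. The game is an invented example.

def solve_hindsight(score, rolls):
    """Best achievable score when the future die rolls are known in advance."""
    best = score
    for r in rolls:
        if r == 1:
            break          # rolling again at this point would bust
        score += r
        best = max(best, score)
    return best

def hindsight_action(score, n_samples=1000, horizon=10):
    value = {"stop": 0.0, "roll": 0.0}
    for _ in range(n_samples):
        rolls = [random.randint(1, 6) for _ in range(horizon)]  # determinized future
        value["stop"] += score   # stopping banks the current score
        # committing to "roll" first, then playing optimally in hindsight
        value["roll"] += 0 if rolls[0] == 1 else solve_hindsight(score + rolls[0], rolls[1:])
    return max(value, key=value.get)

print(hindsight_action(score=8))
```

    The characteristic weakness of the technique is visible here too: the "roll" estimate assumes clairvoyant play after the first step, so hindsight optimization systematically over-values future information the agent will not actually have.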

    The Dynamic Defense of Network as POMDP and the DESPOT POMDP solver

    In recent years, artificial intelligence has become an ever larger part of our lives, with applications most of us would never have imagined. Representing the real world demands sophisticated models that we can hand to agents and observe how they act. This is where Markov Decision Processes (MDPs) and, above all, Partially Observable Markov Decision Processes (POMDPs) shine: they concern decision-making under uncertainty and provide a general framework for faithfully representing many different kinds of environments. The possibilities seem endless, with applications ranging from game-playing agents to automated driving systems. One such problem of growing interest is the dynamic defense of a computer network: a network that protects itself from would-be intruders in real time by predicting their moves and taking the appropriate measures to stop them from reaching vital points of the network. Attackers do not perform simple actions but use complex tactics that chain together many vulnerabilities, which makes developing such a defense system quite difficult. Although we can represent the problem quite faithfully as a POMDP, solving it quickly is an issue, since the POMDP model is already complicated in itself. Researchers therefore focus on developing fast algorithms that can solve these problems in realistic settings. We first introduce the basic concepts and background needed to understand the MDP model, and then continue with the POMDP model, which extends it to realistically applicable settings. We then present the formulation of the dynamic-defense problem as a POMDP, and conclude with the DESPOT solver, one of the best algorithms for scaling up to and coping with POMDP problems of this magnitude.
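    To give a flavor of DESPOT's central idea, the sketch below evaluates candidate policies on a small fixed set of K sampled scenarios, each a deterministic stream of random numbers that determinizes the POMDP; the patch-or-wait defense model and its probabilities are invented stand-ins, not the network model discussed in the thesis.

```python
import random

# Sketch of DESPOT's scenario idea: K fixed scenarios, each a deterministic
# stream of random numbers, determinize the POMDP so that different policies
# are compared on identical sampled futures. The patch-or-wait model below
# is a made-up toy, not a real network-defense POMDP.

K, DEPTH, GAMMA = 20, 10, 0.95

def step(action, rand):
    """Hypothetical one-step dynamics: 'patch' costs 1 but lowers the
    per-step compromise probability; a compromise incurs a large penalty."""
    p_compromise = 0.05 if action == "patch" else 0.30
    compromised = rand < p_compromise
    reward = (-1 if action == "patch" else 0) + (-100 if compromised else 0)
    return compromised, reward

def policy_value(policy):
    """Average discounted return of `policy` over the K fixed scenarios."""
    total = 0.0
    for seed in range(K):
        rng = random.Random(seed)       # the scenario's deterministic randomness
        value, discount = 0.0, 1.0
        for t in range(DEPTH):
            compromised, reward = step(policy(t), rng.random())
            value += discount * reward
            discount *= GAMMA
            if compromised:             # episode ends once the attacker succeeds
                break
        total += value
    return total / K

print(policy_value(lambda t: "patch"))  # always patch
print(policy_value(lambda t: "wait"))   # never patch
```

    DESPOT itself goes further, building a sparse tree over these scenarios and pruning it with lower and upper bounds on policy value, but the fixed-scenario evaluation above is the mechanism that keeps the tree small enough to search on-line.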