Towards effective planning strategies for robots in recycling
This work presents several ideas for planning under uncertainty. We seek to recycle electromechanical devices with a robotic arm, and we adopt the Markov Decision Process formulation. To avoid scalability issues, we employ determinization techniques and hierarchical planning.
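As a hedged illustration of the determinization idea mentioned above, the sketch below replaces each stochastic outcome with its most likely successor and then plans deterministically in the resulting model; the toy chain domain and its probabilities are invented for illustration, not taken from the paper.

```python
from collections import deque

def determinize(transitions):
    """Map each (state, action) to its most probable successor."""
    return {(s, a): max(outcomes, key=outcomes.get)
            for (s, a), outcomes in transitions.items()}

def plan(det, start, goal, actions):
    """Breadth-first search in the determinized model."""
    frontier, parent = deque([start]), {start: None}
    while frontier:
        s = frontier.popleft()
        if s == goal:                      # reconstruct the path
            path = []
            while s is not None:
                path.append(s)
                s = parent[s]
            return path[::-1]
        for a in actions:
            nxt = det.get((s, a))
            if nxt is not None and nxt not in parent:
                parent[nxt] = s
                frontier.append(nxt)
    return None

# Toy chain: 'move' usually advances one cell, sometimes stays put.
transitions = {(i, 'move'): {i + 1: 0.8, i: 0.2} for i in range(4)}
det = determinize(transitions)
print(plan(det, 0, 4, ['move']))  # [0, 1, 2, 3, 4]
```

In a full planner the deterministic plan would be re-computed whenever execution drifts off the predicted path (replanning), which is what makes determinization usable despite ignoring the low-probability outcomes.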
General-Purpose Planning Algorithms In Partially-Observable Stochastic Games
Partially observable stochastic games (POSGs) are difficult domains to plan in because they feature multiple agents with potentially opposing goals, parts of the world are hidden from the agents, and some actions have random outcomes. It is infeasible to solve a large POSG optimally. While it may be tempting to design a specialized algorithm for finding suboptimal solutions to a particular POSG, general-purpose planning algorithms can work just as well, but with less complexity and domain knowledge required. I explore this idea in two different POSGs: Navy Defense and Duelyst.
In Navy Defense, I show that a specialized algorithm framework, goal-driven autonomy, which requires a complex subsystem separate from the planner for explicitly reasoning about goals, is unnecessary, as simple general planners such as hindsight optimization exhibit implicit goal reasoning and have strong performance.
In Duelyst, I show that a specialized expert-rule-based AI can be consistently beaten by a simple general planner using only a small amount of domain knowledge. I also introduce a modification to Monte Carlo tree search that increases performance when rollouts are slow and there are time constraints on planning.
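For context, the sketch below shows a standard time-budgeted UCT-style loop of the kind such planners build on; it does not reproduce the thesis's specific rollout modification, and the two-armed toy domain is invented for illustration.

```python
import math
import random
import time

def uct_score(total, n, parent_n, c=1.4):
    """UCB1: mean reward plus an exploration bonus."""
    if n == 0:
        return float('inf')
    return total / n + c * math.sqrt(math.log(parent_n) / n)

def mcts(actions, rollout, budget_seconds=0.05):
    """Flat time-budgeted Monte Carlo search over a set of actions."""
    stats = {a: [0, 0.0] for a in actions}    # action -> [visits, total reward]
    visits = 0
    deadline = time.monotonic() + budget_seconds
    while time.monotonic() < deadline:
        visits += 1
        a = max(actions, key=lambda a: uct_score(stats[a][1], stats[a][0], visits))
        stats[a][0] += 1
        stats[a][1] += rollout(a)             # the (possibly slow) simulation
    return max(actions, key=lambda a: stats[a][0])   # most-visited action

random.seed(0)
means = {'left': 0.3, 'right': 0.7}
best = mcts(['left', 'right'], lambda a: float(random.random() < means[a]))
# 'right' has the higher mean reward, so it should be selected most often.
```

When rollouts are slow, the number of iterations completed before the deadline drops sharply, which is exactly the regime the thesis's modification targets.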
Domain-Independent Planning for Markov Decision Processes with Factored State and Action Spaces
Markov Decision Processes (MDPs) are the de facto formalism for studying sequential decision-making problems under uncertainty, ranging from classical problems such as inventory control and path planning to more complex problems such as reservoir control under rainfall uncertainty and emergency response optimization for fire and medical emergencies. Most prior research has focused on exact and approximate solutions to MDPs with factored states, assuming a small number of actions. In contrast, many applications are most naturally modeled as having factored actions described in terms of multiple action variables. In this thesis we study domain-independent algorithms that leverage the factored action structure in the MDP dynamics and reward, and that scale better than treating each of the exponentially many joint actions as atomic. Our contributions are three-fold, based on three fundamental approaches to MDP planning: exact solution using symbolic dynamic programming (DP), anytime online planning using heuristic search, and online action selection using hindsight optimization.
The first part focuses on deriving optimal policies over all states for MDPs whose state and action spaces are described in terms of multiple discrete random variables. In order to capture the factored action structure, we introduce new symbolic operators for computing DP updates over all states efficiently by leveraging the abstract and symbolic representation of Decision Diagrams. Addressing the potential bottleneck of diagrammatic blowup in these operators, we present a novel and optimal policy iteration algorithm that emphasizes the diagrammatic compactness of the intermediate value functions and policies. The impact is seen in experiments on the well-studied problems of inventory control and system administration, where our algorithm is able to exploit the increasing compactness of the optimal policy as the complexity of the action space grows.
Under the framework of anytime planning, the second part expands the scalability of our approach to factored actions by restricting its attention to the reachable part of the state space. Our contribution is the introduction of new symbolic generalization operators that guarantee a more moderate use of space and time while providing non-trivial generalization. These operators yield anytime algorithms that guarantee convergence to the optimal value and action for the current world state, while guaranteeing bounded growth in the size of the symbolic representation. We empirically show that our online algorithm is successfully able to combine forward search from an initial state with backwards generalized DP updates on symbolic states.
The third part considers a general class of hybrid (mixed discrete and continuous) state and action (HSA) MDPs. Whereas the insights from the above approaches remain valid for hybrid MDPs, there are significant limitations inherent to the DP approach. Existing solvers for hybrid state and action MDPs are either limited to very restricted transition distributions, require knowledge of domain-specific basis functions to achieve good approximations, or do not scale. We explore a domain-independent approach based on the framework of hindsight optimization (HOP) for HSA-MDPs, which uses an upper bound on the finite-horizon action values for action selection. Our main contribution is a linear-time reduction to a Mixed Integer Linear Program (MILP) that encodes the HOP objective, when the dynamics are specified as location-scale probability distributions parametrized by Piecewise Linear (PWL) functions of states and actions. In addition, we show how to use the same machinery to select actions based on a lower bound generated by straight-line plans. Our empirical results show that the HSA-HOP approach effectively scales to high-dimensional problems and outperforms baselines that are capable of scaling to such large hybrid MDPs. In a concluding case study, we cast the real-time dispatch optimization problem faced by the Corvallis Fire Department as an HSA-MDP with factored actions. We show that our domain-independent planner significantly improves upon the responsiveness of the baseline that dispatches the nearest responders.
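The HOP action-selection principle used throughout this thesis can be sketched in a few lines: sample complete realizations of the future randomness, solve each resulting deterministic problem, and pick the action with the best average hindsight value. The toy inventory domain, cost constants, and function names below are illustrative assumptions, not the thesis's MILP formulation.

```python
import random

def hop_action(state, actions, sample_future, solve_deterministic, n_samples=30):
    """Pick the action with the best average hindsight value."""
    values = {a: 0.0 for a in actions}
    for _ in range(n_samples):
        future = sample_future()      # one full realization of the randomness
        for a in actions:
            values[a] += solve_deterministic(state, a, future)
    return max(actions, key=lambda a: values[a])

# Toy inventory problem: choose an order quantity now; demand is random.
def sample_future():
    return [random.randint(0, 2) for _ in range(3)]   # demand per step

def solve_deterministic(stock, order, demands):
    """Hindsight value: revenue minus ordering and holding costs."""
    stock += order
    reward = -0.5 * order
    for d in demands:
        sold = min(stock, d)
        reward += 1.0 * sold - 0.1 * stock
        stock -= sold
    return reward

random.seed(1)
print(hop_action(0, [0, 1, 2], sample_future, solve_deterministic))
```

Because each sampled future is fully known, the inner problem is deterministic; averaging these clairvoyant values yields the upper bound on action values that HOP uses for selection.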
Scheduling and Online Planning in Stochastic Diffusion Networks
Diffusion processes in networks are common models in many domains, including species colonization, information and idea cascades, disease propagation, and fire spreading. In a diffusion network, a diffusion event occurs when a behavior spreads from one node to another following a probabilistic model, where the behavior could be a species, an idea, a virus, fire, etc. In the real world, in addition to observing diffusion processes, people are usually able to influence the diffusion by performing operations on individual nodes or groups of nodes. The diffusion network control problem is then to decide how to apply the available controls in order to maximize or minimize the spread of diffusion, especially when the resources for control are limited.
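As a concrete instance of such a probabilistic spread model, the sketch below simulates an independent cascade, in which each newly activated node gets one chance to activate each of its neighbors; the graph and activation probability are invented for illustration.

```python
import random

def simulate_cascade(edges, seeds, p=0.3, rng=random):
    """Independent cascade: each newly active node tries once per neighbor."""
    active, frontier = set(seeds), list(seeds)
    while frontier:
        nxt = []
        for u in frontier:
            for v in edges.get(u, []):
                if v not in active and rng.random() < p:
                    active.add(v)
                    nxt.append(v)
        frontier = nxt
    return active

random.seed(0)
edges = {0: [1, 2], 1: [3], 2: [3], 3: [4]}
print(sorted(simulate_cascade(edges, [0])))
```

A control operation in this setting might remove a node or edge, or change an activation probability, before re-running the cascade, which is the kind of intervention the control problem above reasons about.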
Diffusion network control problems are challenging for most AI planning techniques. The complexity comes from highly stochastic exogenous events, a large action branching factor (the number of combinations of individual operations), a long time horizon, and the need to reason about numeric resource limits. In this thesis, we explore approaches that offer high-quality policies for controlling diffusion processes in large-scale networks.
We first propose a non-adaptive policy for conservation planning, where the goal is to encourage species spread over the long term. Given a set of control operations of interest, this policy specifies a deadline for taking each operation, so that resources are used with maximum flexibility while keeping the loss of diffusion influence within a desired ratio. This is particularly applicable in cases where a domain expert can develop a set of control operations that captures their own objectives; our approach then provides a way of trading off diffusion influence against resource usage.
We further propose a fully adaptive approach for this conservation planning problem by computing a Hindsight Optimization (HOP) solution at every time step. Instead of computing a HOP action in the traditional way, whose cost is linear in the number of actions, we take advantage of its separable structure and develop an effective algorithm that scales to exponentially large, factored action spaces. In experiments on both synthetic and real data sets, we show that our algorithm returns near-optimal HOP solutions while scaling to large problems.
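The separability idea can be illustrated in miniature: when the hindsight objective decomposes additively over action variables, each variable is optimized independently instead of enumerating exponentially many joint actions. The node names and values below are invented for illustration.

```python
# Value of each choice for each action variable (toy numbers).
values = {
    'node_A': {'protect': 2.0, 'skip': 0.5},
    'node_B': {'protect': 1.0, 'skip': 1.5},
}
# Optimize each variable independently: cost grows linearly in the number
# of variables, not exponentially in the number of joint actions.
joint_action = {var: max(choices, key=choices.get)
                for var, choices in values.items()}
print(joint_action)  # {'node_A': 'protect', 'node_B': 'skip'}
```

With n binary action variables this replaces a maximization over 2^n joint actions with n independent two-way comparisons.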
Moreover, we extend our implementation of the HOP policy to a general framework of online planning for diffusion network control problems. In particular, we give a general and formal representation of diffusion network problems. Our framework proposes a schema for efficiently computing multiple lookahead policies, some of which have been successfully applied to various probabilistic planning problems. We evaluate our approach on diffusion network control problems in conservation planning, epidemic control, and firefighting. The experimental results demonstrate the behaviors of these lookahead policies and the advantages of each in different domains.
A survey on policy search algorithms for learning robot controllers in a handful of trials
Most policy search algorithms require thousands of training episodes to find an effective policy, which is often infeasible with a physical robot. This survey article focuses on the extreme other end of the spectrum: how can a robot adapt with only a handful of trials (a dozen) and a few minutes? By analogy with the term "big data", we refer to this challenge as "micro-data reinforcement learning". We show that a first strategy is to leverage prior knowledge on the policy structure (e.g., dynamic movement primitives), on the policy parameters (e.g., demonstrations), or on the dynamics (e.g., simulators). A second strategy is to create data-driven surrogate models of the expected reward (e.g., Bayesian optimization) or of the dynamical model (e.g., model-based policy search), so that the policy optimizer queries the model instead of the real system. Overall, all successful micro-data algorithms combine these two strategies by varying the kind of model and prior knowledge. The current scientific challenges essentially revolve around scaling up to complex robots (e.g., humanoids), designing generic priors, and optimizing the computing time.

Comment: 21 pages, 3 figures, 4 algorithms; accepted at IEEE Transactions on Robotics.
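The second strategy above (a learned surrogate queried in place of the real system) can be sketched as model-based policy search in a toy 1-D setting: fit a dynamics model from a handful of real trials, then optimize a linear policy against the model only. The point-mass system, least-squares fit, and random-search optimizer are illustrative assumptions, not the survey's methods.

```python
import random

def real_step(x, u):
    """The 'real system' (unknown to the learner)."""
    return 0.9 * x + 0.5 * u

def fit_linear_model(data):
    """Least-squares fit of x' = a*x + b*u from (x, u, x') tuples."""
    sxx = sum(x * x for x, u, y in data)
    sxu = sum(x * u for x, u, y in data)
    suu = sum(u * u for x, u, y in data)
    sxy = sum(x * y for x, u, y in data)
    suy = sum(u * y for x, u, y in data)
    det = sxx * suu - sxu * sxu
    return ((sxy * suu - suy * sxu) / det,   # a
            (suy * sxx - sxy * sxu) / det)   # b

def evaluate(gain, step, x0=1.0, horizon=10):
    """Negated cost of the linear policy u = gain * x (drive x to 0)."""
    x, cost = x0, 0.0
    for _ in range(horizon):
        x = step(x, gain * x)
        cost += x * x
    return -cost

random.seed(0)
# A handful of real trials: random state-action probes of the system.
data = [(x, u, real_step(x, u))
        for x, u in ((random.uniform(-1, 1), random.uniform(-1, 1))
                     for _ in range(8))]
a, b = fit_linear_model(data)
# Optimize the policy against the learned model only (random search).
k = max((random.uniform(-3, 0) for _ in range(200)),
        key=lambda g: evaluate(g, lambda x, u: a * x + b * u))
```

Only eight interactions with the real system are used; all 200 policy evaluations query the learned model, which is the data-efficiency argument made by the survey.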