Towards effective planning strategies for robots in recycling
This work presents several ideas for planning under uncertainty. We seek to recycle electromechanical devices with a robotic arm, and we adopt the Markov Decision Process formulation. To avoid scalability issues, we employ determinization techniques and hierarchical planning.
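As a hedged illustration of the determinization idea mentioned above, the sketch below replaces each stochastic outcome with its most likely successor and then plans deterministically in the resulting model; the toy chain domain and its probabilities are invented for illustration, not taken from the paper.

```python
from collections import deque

def determinize(transitions):
    """Map each (state, action) to its most probable successor."""
    return {(s, a): max(outcomes, key=outcomes.get)
            for (s, a), outcomes in transitions.items()}

def plan(det, start, goal, actions):
    """Breadth-first search in the determinized model."""
    frontier, parent = deque([start]), {start: None}
    while frontier:
        s = frontier.popleft()
        if s == goal:                      # reconstruct the path
            path = []
            while s is not None:
                path.append(s)
                s = parent[s]
            return path[::-1]
        for a in actions:
            nxt = det.get((s, a))
            if nxt is not None and nxt not in parent:
                parent[nxt] = s
                frontier.append(nxt)
    return None

# Toy chain: 'move' usually advances one cell, sometimes stays put.
transitions = {(i, 'move'): {i + 1: 0.8, i: 0.2} for i in range(4)}
det = determinize(transitions)
print(plan(det, 0, 4, ['move']))  # [0, 1, 2, 3, 4]
```

In a full planner the deterministic plan would be re-computed whenever execution drifts off the predicted path (replanning), which is what makes determinization usable despite ignoring the low-probability outcomes.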
General-Purpose Planning Algorithms In Partially-Observable Stochastic Games
Partially observable stochastic games (POSGs) are difficult domains to plan in because they feature multiple agents with potentially opposing goals, parts of the world are hidden from the agents, and some actions have random outcomes. It is infeasible to solve a large POSG optimally. While it may be tempting to design a specialized algorithm for finding suboptimal solutions to a particular POSG, general-purpose planning algorithms can work just as well, but with less complexity and domain knowledge required. I explore this idea in two different POSGs: Navy Defense and Duelyst.
In Navy Defense, I show that a specialized algorithm framework, goal-driven autonomy, which requires a complex subsystem separate from the planner for explicitly reasoning about goals, is unnecessary, as simple general planners such as hindsight optimization exhibit implicit goal reasoning and have strong performance.
In Duelyst, I show that a specialized expert-rule-based AI can be consistently beaten by a simple general planner using only a small amount of domain knowledge. I also introduce a modification to Monte Carlo tree search that increases performance when rollouts are slow and there are time constraints on planning.
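For context, the sketch below shows a standard time-budgeted UCT-style loop of the kind such planners build on; it does not reproduce the thesis's specific rollout modification, and the two-armed toy domain is invented for illustration.

```python
import math
import random
import time

def uct_score(total, n, parent_n, c=1.4):
    """UCB1: mean reward plus an exploration bonus."""
    if n == 0:
        return float('inf')
    return total / n + c * math.sqrt(math.log(parent_n) / n)

def mcts(actions, rollout, budget_seconds=0.05):
    """Flat time-budgeted Monte Carlo search over a set of actions."""
    stats = {a: [0, 0.0] for a in actions}    # action -> [visits, total reward]
    visits = 0
    deadline = time.monotonic() + budget_seconds
    while time.monotonic() < deadline:
        visits += 1
        a = max(actions, key=lambda a: uct_score(stats[a][1], stats[a][0], visits))
        stats[a][0] += 1
        stats[a][1] += rollout(a)             # the (possibly slow) simulation
    return max(actions, key=lambda a: stats[a][0])   # most-visited action

random.seed(0)
means = {'left': 0.3, 'right': 0.7}
best = mcts(['left', 'right'], lambda a: float(random.random() < means[a]))
# 'right' has the higher mean reward, so it should be selected most often.
```

When rollouts are slow, the number of iterations completed before the deadline drops sharply, which is exactly the regime the thesis's modification targets.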
Domain-Independent Planning for Markov Decision Processes with Factored State and Action Spaces
Markov Decision Processes (MDPs) are the de facto formalism for studying sequential decision-making problems under uncertainty, ranging from classical problems such as inventory control and path planning to more complex problems such as reservoir control under rainfall uncertainty and emergency response optimization for fire and medical emergencies. Most prior research has focused on exact and approximate solutions to MDPs with factored states, assuming a small number of actions. In contrast, many applications are most naturally modeled as having factored actions described in terms of multiple action variables. In this thesis we study domain-independent algorithms that leverage the factored action structure in the MDP dynamics and reward, and that scale better than treating each of the exponentially many joint actions as atomic. Our contributions are three-fold, based on three fundamental approaches to MDP planning: exact solution using symbolic dynamic programming (DP), anytime online planning using heuristic search, and online action selection using hindsight optimization.
The first part focuses on deriving optimal policies over all states for MDPs whose state and action spaces are described in terms of multiple discrete random variables. In order to capture the factored action structure, we introduce new symbolic operators for computing DP updates over all states efficiently by leveraging the abstract and symbolic representation of Decision Diagrams. Addressing the potential bottleneck of diagrammatic blowup in these operators, we present a novel and optimal policy iteration algorithm that emphasizes the diagrammatic compactness of the intermediate value functions and policies. The impact is seen in experiments on the well-studied problems of inventory control and system administration, where our algorithm is able to exploit the increasing compactness of the optimal policy as the complexity of the action space grows.
Under the framework of anytime planning, the second part expands the scalability of our approach to factored actions by restricting its attention to the reachable part of the state space. Our contribution is the introduction of new symbolic generalization operators that guarantee a more moderate use of space and time while providing non-trivial generalization. These operators yield anytime algorithms that guarantee convergence to the optimal value and action for the current world state, while guaranteeing bounded growth in the size of the symbolic representation. We empirically show that our online algorithm is successfully able to combine forward search from an initial state with backwards generalized DP updates on symbolic states.
The third part considers a general class of hybrid (mixed discrete and continuous) state and action (HSA) MDPs. Whereas the insights from the above approaches remain valid for hybrid MDPs, there are significant limitations inherent to the DP approach. Existing solvers for hybrid state and action MDPs are either limited to very restricted transition distributions, require knowledge of domain-specific basis functions to achieve good approximations, or do not scale. We explore a domain-independent approach based on the framework of hindsight optimization (HOP) for HSA-MDPs, which uses an upper bound on the finite-horizon action values for action selection. Our main contribution is a linear-time reduction to a Mixed Integer Linear Program (MILP) that encodes the HOP objective, when the dynamics are specified as location-scale probability distributions parametrized by Piecewise Linear (PWL) functions of states and actions. In addition, we show how to use the same machinery to select actions based on a lower bound generated by straight-line plans. Our empirical results show that the HSA-HOP approach effectively scales to high-dimensional problems and outperforms baselines that are capable of scaling to such large hybrid MDPs. In a concluding case study, we cast the real-time dispatch optimization problem faced by the Corvallis Fire Department as an HSA-MDP with factored actions. We show that our domain-independent planner significantly improves upon the responsiveness of the baseline that dispatches the nearest responders.
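The HOP action-selection principle used throughout this thesis can be sketched in a few lines: sample complete realizations of the future randomness, solve each resulting deterministic problem, and pick the action with the best average hindsight value. The toy inventory domain, cost constants, and function names below are illustrative assumptions, not the thesis's MILP formulation.

```python
import random

def hop_action(state, actions, sample_future, solve_deterministic, n_samples=30):
    """Pick the action with the best average hindsight value."""
    values = {a: 0.0 for a in actions}
    for _ in range(n_samples):
        future = sample_future()      # one full realization of the randomness
        for a in actions:
            values[a] += solve_deterministic(state, a, future)
    return max(actions, key=lambda a: values[a])

# Toy inventory problem: choose an order quantity now; demand is random.
def sample_future():
    return [random.randint(0, 2) for _ in range(3)]   # demand per step

def solve_deterministic(stock, order, demands):
    """Hindsight value: revenue minus ordering and holding costs."""
    stock += order
    reward = -0.5 * order
    for d in demands:
        sold = min(stock, d)
        reward += 1.0 * sold - 0.1 * stock
        stock -= sold
    return reward

random.seed(1)
print(hop_action(0, [0, 1, 2], sample_future, solve_deterministic))
```

Because each sampled future is fully known, the inner problem is deterministic; averaging these clairvoyant values yields the upper bound on action values that HOP uses for selection.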
Scheduling and Online Planning in Stochastic Diffusion Networks
Diffusion processes in networks are common models in many domains, including species colonization, information and idea cascades, disease propagation, and fire spreading. In a diffusion network, a diffusion event occurs when a behavior spreads from one node to another following a probabilistic model, where the behavior could be a species, an idea, a virus, fire, etc. In the real world, in addition to observing diffusion processes, people are usually able to influence the diffusion by performing operations on individual nodes or groups of nodes. The diffusion network control problem is then to decide how to apply the available controls in order to maximize or minimize the spread of diffusion, especially when the resources for control are limited.
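As a concrete instance of such a probabilistic spread model, the sketch below simulates an independent cascade, in which each newly activated node gets one chance to activate each of its neighbors; the graph and activation probability are invented for illustration.

```python
import random

def simulate_cascade(edges, seeds, p=0.3, rng=random):
    """Independent cascade: each newly active node tries once per neighbor."""
    active, frontier = set(seeds), list(seeds)
    while frontier:
        nxt = []
        for u in frontier:
            for v in edges.get(u, []):
                if v not in active and rng.random() < p:
                    active.add(v)
                    nxt.append(v)
        frontier = nxt
    return active

random.seed(0)
edges = {0: [1, 2], 1: [3], 2: [3], 3: [4]}
print(sorted(simulate_cascade(edges, [0])))
```

A control operation in this setting might remove a node or edge, or change an activation probability, before re-running the cascade, which is the kind of intervention the control problem above reasons about.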
Diffusion network control problems are challenging for most AI planning techniques. The complexity comes from highly stochastic exogenous events, a large action branching factor (the number of combinations of individual operations), a long time horizon, and the need to reason about numeric resource limits. In this thesis, we explore approaches that offer high-quality policies for controlling diffusion processes in large-scale networks.
We first propose a non-adaptive policy for conservation planning, where the goal is to encourage species spread over the long term. Given a set of control operations of interest, this policy specifies a deadline for taking each operation, so that resources are used with maximum flexibility while keeping the loss of diffusion influence within a desired ratio. This is particularly applicable in cases where a domain expert can develop a set of control operations that captures their own objectives; our approach then provides a way of trading off diffusion influence against resource usage.
We further propose a fully adaptive approach for this conservation planning problem by computing a Hindsight Optimization (HOP) solution at every time step. Instead of computing a HOP action in the traditional way, whose cost is linear in the number of actions, we take advantage of its separable structure and develop an effective algorithm that scales to exponentially large, factored action spaces. In experiments on both synthetic and real data sets, we show that our algorithm returns near-optimal HOP solutions while scaling to large problems.
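The separability idea can be illustrated in miniature: when the hindsight objective decomposes additively over action variables, each variable is optimized independently instead of enumerating exponentially many joint actions. The node names and values below are invented for illustration.

```python
# Value of each choice for each action variable (toy numbers).
values = {
    'node_A': {'protect': 2.0, 'skip': 0.5},
    'node_B': {'protect': 1.0, 'skip': 1.5},
}
# Optimize each variable independently: cost grows linearly in the number
# of variables, not exponentially in the number of joint actions.
joint_action = {var: max(choices, key=choices.get)
                for var, choices in values.items()}
print(joint_action)  # {'node_A': 'protect', 'node_B': 'skip'}
```

With n binary action variables this replaces a maximization over 2^n joint actions with n independent two-way comparisons.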
Moreover, we extend our implementation of the HOP policy to a general framework of online planning for diffusion network control problems. In particular, we give a general and formal representation of diffusion network problems. Our framework proposes a schema for efficiently computing multiple lookahead policies, some of which have been successfully applied to various probabilistic planning problems. We evaluate our approach on diffusion network control problems in conservation planning, epidemic control, and firefighting. The experimental results demonstrate the behaviors of these lookahead policies and the advantages of each in different domains.
A survey on policy search algorithms for learning robot controllers in a handful of trials
Most policy search algorithms require thousands of training episodes to find an effective policy, which is often infeasible with a physical robot. This survey article focuses on the extreme other end of the spectrum: how can a robot adapt with only a handful of trials (a dozen) and a few minutes? By analogy with the term "big data", we refer to this challenge as "micro-data reinforcement learning". We show that a first strategy is to leverage prior knowledge on the policy structure (e.g., dynamic movement primitives), on the policy parameters (e.g., demonstrations), or on the dynamics (e.g., simulators). A second strategy is to create data-driven surrogate models of the expected reward (e.g., Bayesian optimization) or of the dynamical model (e.g., model-based policy search), so that the policy optimizer queries the model instead of the real system. Overall, all successful micro-data algorithms combine these two strategies by varying the kind of model and prior knowledge. The current scientific challenges essentially revolve around scaling up to complex robots (e.g., humanoids), designing generic priors, and optimizing the computing time.

Comment: 21 pages, 3 figures, 4 algorithms; accepted at IEEE Transactions on Robotics.
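The second strategy above (a learned surrogate queried in place of the real system) can be sketched as model-based policy search in a toy 1-D setting: fit a dynamics model from a handful of real trials, then optimize a linear policy against the model only. The point-mass system, least-squares fit, and random-search optimizer are illustrative assumptions, not the survey's methods.

```python
import random

def real_step(x, u):
    """The 'real system' (unknown to the learner)."""
    return 0.9 * x + 0.5 * u

def fit_linear_model(data):
    """Least-squares fit of x' = a*x + b*u from (x, u, x') tuples."""
    sxx = sum(x * x for x, u, y in data)
    sxu = sum(x * u for x, u, y in data)
    suu = sum(u * u for x, u, y in data)
    sxy = sum(x * y for x, u, y in data)
    suy = sum(u * y for x, u, y in data)
    det = sxx * suu - sxu * sxu
    return ((sxy * suu - suy * sxu) / det,   # a
            (suy * sxx - sxy * sxu) / det)   # b

def evaluate(gain, step, x0=1.0, horizon=10):
    """Negated cost of the linear policy u = gain * x (drive x to 0)."""
    x, cost = x0, 0.0
    for _ in range(horizon):
        x = step(x, gain * x)
        cost += x * x
    return -cost

random.seed(0)
# A handful of real trials: random state-action probes of the system.
data = [(x, u, real_step(x, u))
        for x, u in ((random.uniform(-1, 1), random.uniform(-1, 1))
                     for _ in range(8))]
a, b = fit_linear_model(data)
# Optimize the policy against the learned model only (random search).
k = max((random.uniform(-3, 0) for _ in range(200)),
        key=lambda g: evaluate(g, lambda x, u: a * x + b * u))
```

Only eight interactions with the real system are used; all 200 policy evaluations query the learned model, which is the data-efficiency argument made by the survey.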