Planning for Decentralized Control of Multiple Robots Under Uncertainty
We describe a probabilistic framework for synthesizing control policies for general multi-robot systems, given environment and sensor models and a cost function. Decentralized, partially observable Markov decision processes (Dec-POMDPs) are a general model of decision processes in which a team of agents must cooperate to optimize some objective (specified by a shared reward or cost function) in the presence of uncertainty, but where communication limitations mean that the agents cannot share their state, so execution must proceed in a decentralized fashion. While Dec-POMDPs are typically intractable to solve for real-world problems, recent research on the use of macro-actions in Dec-POMDPs has significantly increased the size of problems that can be practically solved as Dec-POMDPs. We describe this general model and show how, in contrast to most existing methods that are specialized to a particular problem class, it can synthesize control policies that exploit whatever opportunities for coordination are present in the problem, while trading off uncertainty in outcomes, sensor information, and information about other agents. We use three variations on a warehouse task to show that a single planner of this type can generate cooperative behavior using task allocation, direct communication, and signaling, as appropriate.
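For readers unfamiliar with the formalism, the sketch below illustrates the decentralized-execution constraint the abstract describes: each agent's policy maps only its own local observation history to an action, while the reward is shared by the team. This is a minimal illustration under assumed names (DecPOMDP, run_episode, etc.), not the paper's planner.

```python
# Minimal Dec-POMDP sketch: shared reward, per-agent observations, no shared state.
# All names here are illustrative assumptions, not code from the paper.
import random
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

@dataclass
class DecPOMDP:
    states: List[str]
    transition: Callable[[str, Tuple[str, ...]], Dict[str, float]]  # T(s' | s, joint a)
    reward: Callable[[str, Tuple[str, ...]], float]                 # shared R(s, joint a)
    observe: Callable[[str, Tuple[str, ...]], Tuple[str, ...]]      # one obs per agent

def run_episode(model: DecPOMDP, policies, s0: str, horizon: int) -> float:
    """Each agent acts only on its own observation history (decentralized execution)."""
    state, total = s0, 0.0
    histories = [[] for _ in policies]          # per-agent local histories
    for _ in range(horizon):
        # Decentralized action selection: policy i sees only history i, never the state.
        joint = tuple(pi(tuple(h)) for pi, h in zip(policies, histories))
        total += model.reward(state, joint)     # team accumulates one shared reward
        dist = model.transition(state, joint)
        state = random.choices(list(dist), weights=list(dist.values()))[0]
        for h, o in zip(histories, model.observe(state, joint)):
            h.append(o)                         # agents receive only local observations
    return total
```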
Mixed Logical Inference and Probabilistic Planning for Robots in Unreliable Worlds
Deployment of robots in practical domains poses key knowledge representation and reasoning challenges. Robots need to represent and reason with incomplete domain knowledge, acquiring and using sensor inputs based on need and availability. This paper presents an architecture that exploits the complementary strengths of declarative programming and probabilistic graphical models as a step towards addressing these challenges. Answer Set Prolog (ASP), a declarative language, is used to represent, and perform inference with, incomplete domain knowledge, including default information that holds in all but a few exceptional situations. A hierarchy of partially observable Markov decision processes (POMDPs) probabilistically models the uncertainty in sensor input processing and navigation. Non-monotonic logical inference in ASP is used to generate a multinomial prior for probabilistic state estimation with the hierarchy of POMDPs. It is also used with historical data to construct a Beta (meta) density model of priors for metareasoning and early termination of trials when appropriate. Robots equipped with this architecture automatically tailor sensor input processing and navigation to tasks at hand, revising existing knowledge using information extracted from sensor inputs. The architecture is empirically evaluated in simulation and on a mobile robot visually localizing objects in indoor domains.
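The sketch below illustrates the coupling the abstract describes: non-monotonic inference (e.g., a default such as "books are usually in the library") yields a multinomial prior over locations, which then seeds standard Bayesian POMDP belief updates. Function names, probabilities, and the example domain are assumptions for illustration, not the paper's implementation.

```python
# Hypothetical sketch: turn ASP-style default conclusions into a multinomial
# prior, then refine it with a Bayesian belief update. Numbers are illustrative.
def prior_from_inference(cells, default_cells, default_mass=0.8):
    """Concentrate prior probability on cells that logical inference favors."""
    n_def = len(default_cells)
    n_other = len(cells) - n_def
    return {c: (default_mass / n_def if c in default_cells
                else (1.0 - default_mass) / n_other) for c in cells}

def belief_update(belief, likelihood):
    """Standard Bayesian POMDP belief update for an observation (static state)."""
    posterior = {s: belief[s] * likelihood[s] for s in belief}
    z = sum(posterior.values())
    return {s: p / z for s, p in posterior.items()}

# Usage: default knowledge says books are usually in the library.
cells = ["library", "kitchen", "office"]
b = prior_from_inference(cells, default_cells=["library"])
b = belief_update(b, likelihood={"library": 0.7, "kitchen": 0.2, "office": 0.2})
print(b)  # belief stays concentrated on the library after a supporting observation
```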
Accelerating decision making under partial observability using learned action priors
Thesis (M.Sc.)--University of the Witwatersrand, Faculty of Science, School of Computer Science and Applied Mathematics, 2017.

Partially Observable Markov Decision Processes (POMDPs) provide a principled mathematical framework allowing a robot to reason about the consequences of actions and observations with respect to the agent's limited perception of its environment. They allow an agent to plan and act optimally in uncertain environments. Although they have been successfully applied to various robotic tasks, they are infamous for their high computational cost. This thesis demonstrates the use of knowledge transfer, learned from previous experiences, to accelerate the learning of POMDP tasks. We propose that in order for an agent to learn to solve these tasks more quickly, it must be able to generalise from past behaviours and transfer knowledge, learned from solving multiple tasks, between different circumstances. We present a method for accelerating this learning process by learning the statistics of action choices over the lifetime of an agent, known as action priors. Action priors specify the usefulness of actions in particular situations and allow us to bias exploration, which in turn improves the performance of the learning process. Using navigation domains, we study the degree to which transferring knowledge between tasks in this way results in a considerable speed-up in solution times.

This thesis therefore makes the following contributions. We provide an algorithm for learning action priors from a set of approximately optimal value functions, and two approaches by which prior knowledge over actions can be used in a POMDP context. We show that considerable gains in speed can be achieved when learning subsequent tasks using prior knowledge rather than learning from scratch. Learning with action priors is particularly useful for reducing the cost of exploration in the early stages of the learning process, since the priors act as a mechanism that allows the agent to select more useful actions in particular circumstances. We thus demonstrate how the initial losses associated with unguided exploration can be alleviated through the use of action priors, which allow for safer exploration. Additionally, we illustrate that action priors can reduce the computation time needed to learn feasible policies.
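A minimal sketch of the action-prior idea described above, under assumed names: count which actions near-optimal policies chose in each situation across past tasks, then bias exploration in a new task toward those actions instead of exploring uniformly. This is an illustration of the general technique, not the thesis's algorithm.

```python
# Illustrative action priors: Dirichlet-style pseudo-counts over past action
# choices, used to bias epsilon-exploration. All names are assumptions.
import random
from collections import defaultdict

def learn_action_priors(solved_tasks, actions, alpha0=1.0):
    """Count how often each action was (near-)optimal per observation across tasks."""
    counts = defaultdict(lambda: {a: alpha0 for a in actions})  # uniform pseudo-counts
    for policy in solved_tasks:           # policy: dict mapping obs -> chosen action
        for obs, a in policy.items():
            counts[obs][a] += 1.0
    return counts

def explore_with_prior(counts, obs, epsilon=0.2, greedy_action=None):
    """Epsilon-style exploration that samples from the prior rather than uniformly."""
    if greedy_action is not None and random.random() > epsilon:
        return greedy_action
    acts, weights = zip(*counts[obs].items())
    return random.choices(acts, weights=weights)[0]  # useful actions drawn more often

# Usage: two past navigation tasks mostly chose "forward" in an open corridor,
# so early exploration in a new task favors "forward" there.
priors = learn_action_priors(
    [{"corridor": "forward"}, {"corridor": "forward"}],
    actions=["forward", "left", "right"])
print(explore_with_prior(priors, "corridor"))
```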