DualSMC: Tunneling Differentiable Filtering and Planning under Continuous POMDPs
A major difficulty of solving continuous POMDPs is to infer the multi-modal
distribution of the unobserved true states and to make the planning algorithm
dependent on the perceived uncertainty. We cast POMDP filtering and planning
problems as two closely related Sequential Monte Carlo (SMC) processes, one
over the real states and the other over the future optimal trajectories, and
combine the merits of these two parts in a new model named the DualSMC network.
In particular, we first introduce an adversarial particle filter that leverages
the adversarial relationship between its internal components. Based on the
filtering results, we then propose a planning algorithm that extends the
previous SMC planning approach [Piche et al., 2018] to continuous POMDPs with
an uncertainty-dependent policy. Crucially, DualSMC not only handles complex
observations such as image input but also remains highly interpretable. It
is shown to be effective in three continuous POMDP domains: the floor
positioning domain, the 3D light-dark navigation domain, and a modified Reacher
domain. Comment: IJCAI 202
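As a point of reference for the filtering half of the abstract above, the following is a minimal sketch of one step of a generic bootstrap particle filter (propagate, reweight, resample). It is not the adversarial particle filter of DualSMC. The 1D state, the additive Gaussian motion and observation models, and every name and parameter (`particle_filter_step`, `motion_noise`, `obs_noise`) are assumptions made for illustration.

```python
import math
import random


def particle_filter_step(particles, weights, action, observation, rng,
                         motion_noise=0.1, obs_noise=0.5):
    """One bootstrap SMC step on a 1D state (illustrative assumptions only)."""
    # Propagate: assumed motion model x' = x + action + Gaussian noise.
    moved = [x + action + rng.gauss(0.0, motion_noise) for x in particles]

    # Reweight: assumed observation model o = x + Gaussian noise.
    new_weights = [
        w * math.exp(-0.5 * ((observation - x) / obs_noise) ** 2)
        for w, x in zip(weights, moved)
    ]
    total = sum(new_weights)
    n = len(moved)
    if total == 0.0:
        # All likelihoods underflowed; fall back to uniform weights.
        new_weights = [1.0 / n] * n
    else:
        new_weights = [w / total for w in new_weights]

    # Resample: multinomial resampling, then reset to uniform weights.
    resampled = rng.choices(moved, weights=new_weights, k=n)
    return resampled, [1.0 / n] * n


# Usage: a static hidden state at 2.0, observed directly for a few steps.
rng = random.Random(0)
particles = [rng.uniform(-5.0, 5.0) for _ in range(500)]
weights = [1.0 / len(particles)] * len(particles)
for _ in range(5):
    particles, weights = particle_filter_step(
        particles, weights, action=0.0, observation=2.0, rng=rng)
estimate = sum(particles) / len(particles)
```

The particle set starts spread over the interval and concentrates near the observed value after a few steps; a multi-modal belief would show up as several surviving clusters of particles.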
An Adversarial Interpretation of Information-Theoretic Bounded Rationality
Recently, there has been a growing interest in modeling planning with
information constraints. Accordingly, an agent maximizes a regularized expected
utility known as the free energy, where the regularizer is given by the
information divergence from a prior to a posterior policy. While this approach
can be justified in various ways, including from statistical mechanics and
information theory, it is still unclear how it relates to decision-making
against adversarial environments. This connection has previously been suggested
in work relating the free energy to risk-sensitive control and to extensive
form games. Here, we show that a single-agent free energy optimization is
equivalent to a game between the agent and an imaginary adversary. The
adversary can, by paying an exponential penalty, generate costs that diminish
the decision maker's payoffs. It turns out that the optimal strategy of the
adversary consists in choosing costs so as to render the decision maker
indifferent among its choices, which is a defining property of a Nash
equilibrium, thus tightening the connection between free energy optimization
and game theory. Comment: 7 pages, 4 figures. Proceedings of AAAI-1
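The free energy objective described in the abstract above can be illustrated numerically. The sketch below assumes a finite choice set and the standard form F[p] = E_p[U] - (1/beta) KL(p || q), whose maximizer is p*(x) proportional to q(x) exp(beta U(x)) with optimal value (1/beta) log Z. It then checks that subtracting the per-choice penalty (1/beta) log(p*(x)/q(x)) from U(x) leaves every choice with the same net payoff. This is the standard identity only, not necessarily the paper's exact adversarial construction; the function names and numbers are hypothetical.

```python
import math


def free_energy(posterior, prior, utility, beta):
    """F[p] = E_p[U] - (1/beta) * KL(p || q) on a finite choice set."""
    expected = sum(p * u for p, u in zip(posterior, utility))
    kl = sum(p * math.log(p / q)
             for p, q in zip(posterior, prior) if p > 0)
    return expected - kl / beta


def free_energy_optimum(prior, utility, beta):
    """Closed-form maximizer p*(x) ~ q(x) exp(beta U(x)) and its value."""
    unnormalized = [q * math.exp(beta * u) for q, u in zip(prior, utility)]
    z = sum(unnormalized)
    posterior = [w / z for w in unnormalized]
    return posterior, math.log(z) / beta


# Hypothetical three-choice example.
prior = [0.5, 0.3, 0.2]
utility = [1.0, 2.0, 0.5]
beta = 2.0

posterior, value = free_energy_optimum(prior, utility, beta)

# Net payoff per choice after the information penalty: constant at the
# optimum, so the decision maker is indifferent among its choices.
net_payoff = [u - math.log(p / q) / beta
              for u, p, q in zip(utility, posterior, prior)]
```

Every entry of `net_payoff` equals `value`, which is the indifference property the abstract associates with a Nash equilibrium.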
Multi-Agent Planning with Planning Graph
In this paper, we consider planning for multi-agent situations in STRIPS-like domains using planning graphs. Three possible relationships between agents' goals are considered in order to evaluate plans: the agents may be collaborative, adversarial, or indifferent entities. We propose algorithms to deal with each situation. Collaborative situations can be handled easily with the original Graphplan algorithm by redefining the domain appropriately. Forward-chaining and backward-chaining algorithms are discussed for finding infallible plans in adversarial situations. If no such plan can be found, the agent can still attempt to find a plan that achieves part of the goals. A forward-chaining algorithm is also proposed to find plans for agents with independent goals.
Efficient Supervision for Robot Learning via Imitation, Simulation, and Adaptation
Recent successes in machine learning have led to a shift in the design of
autonomous systems, improving performance on existing tasks and rendering new
applications possible. Data-focused approaches gain relevance across diverse,
intricate applications when developing data collection and curation pipelines
becomes more effective than manual behaviour design. The following work aims at
increasing the efficiency of this pipeline in two principal ways: by utilising
more powerful sources of informative data and by extracting additional
information from existing data. In particular, we target three orthogonal
fronts: imitation learning, domain adaptation, and transfer from simulation. Comment: Dissertation Summar