Scaling Up Probabilistic Circuits by Latent Variable Distillation
Probabilistic Circuits (PCs) are a unified framework for tractable
probabilistic models that support efficient computation of various
probabilistic queries (e.g., marginal probabilities). One key challenge is to
scale PCs to model large and high-dimensional real-world datasets: we observe
that as the number of parameters in PCs increases, their performance
immediately plateaus. This phenomenon suggests that the existing optimizers
fail to exploit the full expressive power of large PCs. We propose to overcome
this bottleneck by latent variable distillation: we leverage less tractable
but more expressive deep generative models to provide extra supervision over
the latent variables of PCs. Specifically, we extract information from
Transformer-based generative models to assign values to latent variables of
PCs, providing guidance to PC optimizers. Experiments on both image and
language modeling benchmarks (e.g., ImageNet and WikiText-2) show that latent
variable distillation substantially boosts the performance of large PCs
compared to their counterparts without latent variable distillation. In
particular, on the image modeling benchmarks, PCs achieve competitive
performance against some of the widely-used deep generative models, including
variational autoencoders and flow-based models, opening up new avenues for
tractable generative modeling.
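The supervision scheme can be illustrated with a toy sketch: when every training example's latent (mixture-component) assignment is supplied by an external teacher, maximum-likelihood estimation of a one-layer PC (a mixture of product-of-Bernoullis) has a closed-form counting solution, with no EM needed. The function names below are illustrative only; the actual pipeline distills assignments from Transformer representations into far deeper circuits:

```python
import numpy as np

def distill_fit_mixture(data, teacher_assignments, n_components, alpha=1.0):
    """Fit a mixture of product-of-Bernoullis (a one-layer PC) when the
    latent mixture component of every example is supplied externally,
    e.g. extracted from a deep teacher model. With observed latents,
    maximum likelihood reduces to smoothed counts."""
    n, d = data.shape
    # Mixture weights: relative frequency of each latent assignment.
    counts = np.bincount(teacher_assignments, minlength=n_components)
    weights = (counts + alpha) / (n + alpha * n_components)
    # Per-component Bernoulli means with Laplace smoothing.
    means = np.empty((n_components, d))
    for k in range(n_components):
        members = data[teacher_assignments == k]
        means[k] = (members.sum(axis=0) + alpha) / (len(members) + 2 * alpha)
    return weights, means

def log_likelihood(data, weights, means):
    """Average log-likelihood of binary data under the fitted mixture."""
    # log p(x) = logsumexp_k [ log w_k + sum_j log Bern(x_j; mu_kj) ]
    log_px_given_k = (data[:, None, :] * np.log(means)[None]
                      + (1 - data[:, None, :]) * np.log(1 - means)[None]).sum(-1)
    joint = np.log(weights)[None] + log_px_given_k
    m = joint.max(axis=1, keepdims=True)
    return float((m.squeeze(1) + np.log(np.exp(joint - m).sum(axis=1))).mean())
```

The point of the sketch is that teacher-provided latent assignments turn a non-convex latent-variable problem into simple supervised estimation, which is the extra guidance the PC optimizer receives.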
Sparse Probabilistic Circuits via Pruning and Growing
Probabilistic circuits (PCs) are a tractable representation of probability
distributions allowing for exact and efficient computation of likelihoods and
marginals. There has been significant recent progress on improving the scale
and expressiveness of PCs. However, PC training performance plateaus as model
size increases. We discover that most capacity in existing large PC structures
is wasted: fully-connected parameter layers are only sparsely used. We propose
two operations: pruning and growing, that exploit the sparsity of PC
structures. Specifically, the pruning operation removes unimportant
sub-networks of the PC for model compression and comes with theoretical
guarantees. The growing operation increases model capacity by increasing the
size of the latent space. By alternately applying pruning and growing, we
increase the capacity that is meaningfully used, allowing us to significantly
scale up PC learning. Empirically, our learner achieves state-of-the-art
likelihoods on MNIST-family image datasets and on Penn Tree Bank language data
compared to other PC learners and less tractable deep generative models such as
flow-based models and variational autoencoders (VAEs).
Comment: 36th Conference on Neural Information Processing Systems (NeurIPS 2022).
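A heavily simplified sketch of the two operations on a single sum node's weight vector follows. The paper scores edges by circuit flows computed over data; raw weight mass is used here only as a stand-in, and the helper names are hypothetical:

```python
import numpy as np

def prune_sum_weights(weights, keep_frac=0.5):
    """Prune a sum node's child edges, keeping the top fraction by weight
    mass, then renormalize. (The paper scores edges by expected circuit
    flows over data; weight mass is a simplified proxy here.)"""
    k = max(1, int(np.ceil(keep_frac * len(weights))))
    keep = np.argsort(weights)[-k:]
    pruned = np.zeros_like(weights)
    pruned[keep] = weights[keep]
    return pruned / pruned.sum(), keep

def grow_sum_weights(weights, rng, noise=0.1):
    """Grow a sum node by duplicating every child edge with multiplicative
    noise, doubling the latent space size, then renormalizing."""
    doubled = np.concatenate([weights, weights])
    doubled = doubled * rng.uniform(1 - noise, 1 + noise, size=len(doubled))
    return doubled / doubled.sum()
```

Alternating the two steps mirrors the loop described above: pruning frees capacity that was only sparsely used, and growing reinvests it in a larger latent space.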
Understanding the Distillation Process from Deep Generative Models to Tractable Probabilistic Circuits
Probabilistic Circuits (PCs) are a general and unified computational
framework for tractable probabilistic models that support efficient computation
of various inference tasks (e.g., computing marginal probabilities). Towards
enabling such reasoning capabilities in complex real-world tasks, Liu et al.
(2022) propose to distill knowledge (through latent variable assignments) from
less tractable but more expressive deep generative models. However, it is still
unclear what factors make this distillation work well. In this paper, we
theoretically and empirically discover that the performance of a PC can exceed
that of its teacher model. Therefore, instead of performing distillation from
the most expressive deep generative model, we study what properties the teacher
model and the PC should have in order to achieve good distillation performance.
This leads to a generic algorithmic improvement as well as other
data-type-specific ones over the existing latent variable distillation
pipeline. Empirically, we outperform SoTA TPMs by a large margin on challenging
image modeling benchmarks. In particular, on ImageNet32, PCs achieve 4.06
bits-per-dimension, which is only 0.34 behind variational diffusion models
(Kingma et al., 2021).
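The bits-per-dimension figure quoted above is a standard unit conversion from average negative log-likelihood in nats; for ImageNet32 the dimensionality is 32 × 32 × 3 = 3072:

```python
import math

def bits_per_dimension(nll_nats, n_dims):
    """Convert an average negative log-likelihood in nats into
    bits-per-dimension: bpd = NLL / (D * ln 2)."""
    return nll_nats / (n_dims * math.log(2))
```

Under this metric, the reported 0.34 bpd gap means the PC's per-pixel-channel code length is about a third of a bit longer than the diffusion model's.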
Efficient Meta Reinforcement Learning for Preference-based Fast Adaptation
Learning new task-specific skills from a few trials is a fundamental
challenge for artificial intelligence. Meta reinforcement learning (meta-RL)
tackles this problem by learning transferable policies that support few-shot
adaptation to unseen tasks. Despite recent advances in meta-RL, most existing
methods require access to the environmental reward function of new tasks to
infer the task objective, which is not realistic in many practical
applications. To bridge this gap, we study the problem of few-shot adaptation
in the context of human-in-the-loop reinforcement learning. We develop a
meta-RL algorithm that enables fast policy adaptation with preference-based
feedback. The agent can adapt to new tasks by querying a human's preference
between behavior trajectories instead of using per-step numeric rewards. By
extending techniques from information theory, our approach can design query
sequences to maximize the information gain from human interactions while
tolerating the inherent error of a non-expert human oracle. In experiments, we
extensively evaluate our method, Adaptation with Noisy OracLE (ANOLE), on a
variety of meta-RL benchmark tasks and demonstrate substantial improvement over
baseline algorithms in terms of both feedback efficiency and error tolerance.
Comment: Thirty-sixth Conference on Neural Information Processing Systems (NeurIPS 2022).
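The query-design idea can be sketched as choosing, among candidate trajectory pairs, the one whose binary preference answer carries maximal mutual information with the task hypothesis. The code below is a generic Bayesian sketch, not ANOLE's actual estimator; oracle noise would enter by flattening each hypothesis's preference probability toward 0.5:

```python
import numpy as np

def expected_info_gain(posterior, pref_probs):
    """Expected information gain (mutual information between the oracle's
    binary answer and the task hypothesis) for one candidate query.
    posterior:  p(task h), shape (H,).
    pref_probs: p(answer = 'first trajectory' | h), shape (H,)."""
    def entropy(p):
        p = np.clip(p, 1e-12, 1.0)
        return -(p * np.log(p)).sum()
    p_yes = float(posterior @ pref_probs)
    post_yes = posterior * pref_probs / p_yes
    post_no = posterior * (1 - pref_probs) / (1 - p_yes)
    return entropy(posterior) - (p_yes * entropy(post_yes)
                                 + (1 - p_yes) * entropy(post_no))

def select_query(posterior, query_pref_probs):
    """Pick the trajectory-pair query with maximal expected info gain."""
    gains = [expected_info_gain(posterior, pp) for pp in query_pref_probs]
    return int(np.argmax(gains)), gains
```

A query whose answer is equally likely under every hypothesis has zero gain and is never asked, which is how this criterion buys feedback efficiency.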
Open-World Multi-Task Control Through Goal-Aware Representation Learning and Adaptive Horizon Prediction
We study the problem of learning goal-conditioned policies in Minecraft, a
popular, widely accessible yet challenging open-ended environment for
developing human-level multi-task agents. We first identify two main challenges
of learning such policies: 1) the indistinguishability of tasks from the state
distribution, due to the vast scene diversity, and 2) the non-stationary nature
of environment dynamics caused by partial observability. To tackle the first
challenge, we propose Goal-Sensitive Backbone (GSB) for the policy to encourage
the emergence of goal-relevant visual state representations. To tackle the
second challenge, the policy is further fueled by an adaptive horizon
prediction module that helps alleviate the learning uncertainty brought by the
non-stationary dynamics. Experiments on 20 Minecraft tasks show that our method
significantly outperforms the best baseline so far; in many of them, we double
the performance. Our ablation and exploratory studies then explain how our
approach beats the counterparts and also unveil the surprising bonus of
zero-shot generalization to new scenes (biomes). We hope our agent could help
shed some light on learning goal-conditioned, multi-task agents in challenging,
open-ended environments like Minecraft.
Comment: This paper is accepted by CVPR202
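As a loose illustration of what it means to make a visual backbone goal-sensitive, the snippet below applies feature-wise modulation (FiLM-style) driven by a goal embedding. This is a generic stand-in for the idea only, not the paper's actual GSB design, and all names are hypothetical:

```python
import numpy as np

def goal_modulate(features, goal_emb, w_gamma, w_beta):
    """Feature-wise modulation of visual features by a goal embedding.
    features:        (C,) visual feature vector
    goal_emb:        (G,) goal embedding
    w_gamma, w_beta: (G, C) learned projection matrices
    A zero goal embedding leaves the features unchanged."""
    gamma = goal_emb @ w_gamma  # per-channel scale derived from the goal
    beta = goal_emb @ w_beta    # per-channel shift derived from the goal
    return (1.0 + gamma) * features + beta
```

Conditioning intermediate features on the goal in this spirit is one way to make states that look identical under vast scene diversity become distinguishable once the task is known.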
GROOT: Learning to Follow Instructions by Watching Gameplay Videos
We study the problem of building a controller that can follow open-ended
instructions in open-world environments. We propose to follow reference videos
as instructions, which offer expressive goal specifications while eliminating
the need for expensive text-gameplay annotations. A new learning framework is
derived to allow learning such instruction-following controllers from gameplay
videos while producing a video instruction encoder that induces a structured
goal space. We implement our agent GROOT in a simple yet effective
encoder-decoder architecture based on causal transformers. We evaluate GROOT
against open-world counterparts and human players on a proposed Minecraft
SkillForge benchmark. The Elo ratings clearly show that GROOT is closing the
human-machine gap as well as exhibiting a 70% winning rate over the best
generalist agent baseline. Qualitative analysis of the induced goal space
further demonstrates some interesting emergent properties, including the goal
composition and complex gameplay behavior synthesis. The project page is
available at https://craftjarvis-groot.github.io
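The Elo comparison reported above follows the standard Elo model, under which a 70% head-to-head win rate corresponds to a rating gap of roughly 147 points:

```python
import math

def elo_expected_score(r_a, r_b):
    """Expected score of player A under the standard Elo logistic model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))

def elo_update(r_a, r_b, score_a, k=32):
    """Update both ratings after a game (score_a: 1 win, 0.5 draw, 0 loss)."""
    e_a = elo_expected_score(r_a, r_b)
    return r_a + k * (score_a - e_a), r_b + k * ((1 - score_a) - (1 - e_a))

def rating_gap_for_winrate(p):
    """Rating gap implied by a head-to-head win probability p."""
    return 400 * math.log10(p / (1 - p))
```

Expressing pairwise win rates as ratings on one scale is what lets the benchmark place GROOT, baselines, and human players in a single ordering.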
Describe, Explain, Plan and Select: Interactive Planning with Large Language Models Enables Open-World Multi-Task Agents
We investigate the challenge of task planning for multi-task embodied agents
in open-world environments. Two main difficulties are identified: 1) executing
plans in an open-world environment (e.g., Minecraft) necessitates accurate and
multi-step reasoning due to the long-term nature of tasks, and 2) as vanilla
planners do not consider how easily the current agent can achieve a given
sub-task when ordering parallel sub-goals within a complicated plan, the
resulting plan could be inefficient or even infeasible. To this end, we propose
"escribe, xplain, lan and
elect" (), an interactive planning approach based
on Large Language Models (LLMs). DEPS facilitates better error correction on
initial LLM-generated by integrating of
the plan execution process and providing self- of
feedback when encountering failures during the extended planning phases.
Furthermore, it includes a goal , which is a trainable
module that ranks parallel candidate sub-goals based on the estimated steps of
completion, consequently refining the initial plan. Our experiments mark the
milestone of the first zero-shot multi-task agent that can robustly accomplish
70+ Minecraft tasks and nearly double the overall performance. Further testing
reveals our method's general effectiveness in popularly adopted non-open-ended
domains as well (i.e., ALFWorld and tabletop manipulation). The ablation and
exploratory studies detail how our design beats the counterparts and provide a
promising update on the grand challenge with our
approach. The code is released at https://github.com/CraftJarvis/MC-Planner.
Comment: NeurIPS 202
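The describe-explain-plan-select loop can be sketched schematically, with every component stubbed as a callable; all names and signatures here are illustrative, not the released implementation:

```python
def deps_plan_loop(llm_plan, execute, describe, explain, selector, max_rounds=5):
    """Schematic DEPS-style interactive planning loop.
    llm_plan(context) -> list of sub-goals proposed by the LLM planner
    execute(goal)     -> (success, state) from acting in the environment
    describe(state)   -> text description of what happened
    explain(desc)     -> text diagnosis of why the plan failed
    selector(goals)   -> goals reordered by estimated steps-to-completion"""
    context = ""
    plan = []
    for _ in range(max_rounds):
        # Plan, then Select: rank parallel sub-goals by achievability.
        plan = selector(llm_plan(context))
        for goal in plan:
            ok, state = execute(goal)
            if not ok:
                # Describe the failure, Explain it, and replan with feedback.
                context = explain(describe(state))
                break
        else:
            return True, plan  # every sub-goal was achieved
    return False, plan
```

Ordering sub-goals by estimated completion cost is what turns an infeasible plan (e.g. crafting before gathering) into an executable one, per the second difficulty identified above.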