Decision Stacks: Flexible Reinforcement Learning via Modular Generative Models
Reinforcement learning presents an attractive paradigm to reason about
several distinct aspects of sequential decision making, such as specifying
complex goals, planning future observations and actions, and critiquing their
utilities. However, integrating these capabilities poses competing
algorithmic challenges: retaining maximal expressivity while allowing
flexibility in modeling choices for efficient learning and
inference. We present Decision Stacks, a generative framework that decomposes
goal-conditioned policy agents into 3 generative modules. These modules
simulate the temporal evolution of observations, rewards, and actions via
independent generative models that can be learned in parallel via teacher
forcing. Our framework guarantees both expressivity and flexibility in
designing individual modules to account for key factors such as architectural
bias, optimization objective and dynamics, transferability across domains, and
inference speed. Our empirical results demonstrate the effectiveness of
Decision Stacks for offline policy optimization for several MDP and POMDP
environments, outperforming existing methods and enabling flexible generative
decision making.
Comment: Published at NeurIPS 2023. Project page:
https://siyan-zhao.github.io/decision-stacks
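To make the decomposition concrete, the sketch below factorizes a goal-conditioned trajectory model into three independently trainable stream models, each fit with teacher forcing on ground-truth inputs so the losses decouple and the modules can be optimized in parallel. The MLP modules, dimensions, and conditioning pattern are illustrative assumptions, not the paper's actual architectures (which can be more expressive generative models such as diffusion or autoregressive backbones):

```python
import torch
import torch.nn as nn

class StreamModel(nn.Module):
    """Toy stand-in for one generative module; predicts its stream's
    next value from the goal and the conditioning streams."""
    def __init__(self, in_dim, out_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(), nn.Linear(hidden, out_dim))

    def forward(self, x):
        return self.net(x)

obs_dim, act_dim, goal_dim = 4, 2, 3
obs_model = StreamModel(goal_dim + obs_dim, obs_dim)      # p(o_t | o_{t-1}, g)
rew_model = StreamModel(goal_dim + obs_dim, 1)            # p(r_t | o_t, g)
act_model = StreamModel(goal_dim + obs_dim + 1, act_dim)  # p(a_t | o_t, r_t, g)

# Teacher forcing: every module sees ground-truth inputs from an offline
# trajectory, so the three losses are independent of one another and the
# modules can be learned in parallel.
g, o_prev, o, r, a = (torch.randn(8, d) for d in
                      (goal_dim, obs_dim, obs_dim, 1, act_dim))
loss_obs = nn.functional.mse_loss(obs_model(torch.cat([g, o_prev], -1)), o)
loss_rew = nn.functional.mse_loss(rew_model(torch.cat([g, o], -1)), r)
loss_act = nn.functional.mse_loss(act_model(torch.cat([g, o, r], -1)), a)
```

At inference time the modules are chained in the opposite direction: sample observations, then rewards conditioned on them, then actions conditioned on both, which is what lets each module be swapped independently.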
AlignFlow: Cycle Consistent Learning from Multiple Domains via Normalizing Flows
Given datasets from multiple domains, a key challenge is to efficiently
exploit these data sources for modeling a target domain. Variants of this
problem have been studied in many contexts, such as cross-domain translation
and domain adaptation. We propose AlignFlow, a generative modeling framework
that models each domain via a normalizing flow. The use of normalizing flows
allows for a) flexibility in specifying learning objectives via adversarial
training, maximum likelihood estimation, or a hybrid of the two methods; and b)
learning and exact inference of a shared representation in the latent space of
the generative model. We derive a uniform set of conditions under which
AlignFlow is marginally-consistent for the different learning objectives.
Furthermore, we show that AlignFlow guarantees exact cycle consistency in
mapping datapoints from a source domain to target and back to the source
domain. Empirically, AlignFlow outperforms relevant baselines on image-to-image
translation and unsupervised domain adaptation and can be used to
simultaneously interpolate across the various domains using the learned
representation.
Comment: AAAI 2020
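To see why invertibility yields exact cycle consistency: each domain's flow maps into a shared latent space, so translating from A to B composes one flow's forward map with the other's inverse, and the round trip is the identity by construction. A minimal sketch with toy affine flows (the class and variable names are illustrative, not the AlignFlow codebase):

```python
import torch

class AffineFlow:
    """Toy invertible map z = s * x + t, applied per dimension."""
    def __init__(self, dim):
        self.s = torch.rand(dim) + 0.5   # keep scales away from zero
        self.t = torch.randn(dim)
    def forward(self, x):   # data -> shared latent
        return self.s * x + self.t
    def inverse(self, z):   # shared latent -> data
        return (z - self.t) / self.s

f_A, f_B = AffineFlow(4), AffineFlow(4)
x_A = torch.randn(8, 4)

# A -> shared latent -> B, then B -> shared latent -> A
x_B  = f_B.inverse(f_A.forward(x_A))   # translate A to B
x_A2 = f_A.inverse(f_B.forward(x_B))   # translate back
assert torch.allclose(x_A, x_A2, atol=1e-5)  # exact cycle consistency
```

The same shared latent also explains the interpolation result: because every domain's flow is anchored to one latent space, points from different domains can be mixed there and decoded into any domain.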