14 research outputs found
Learning Calibratable Policies using Programmatic Style-Consistency
We study the important and challenging problem of controllable generation of long-term sequential behaviors. Solutions to this problem would impact many applications, such as calibrating behaviors of AI agents in games or predicting player trajectories in sports. In contrast to the well-studied areas of controllable generation of images, text, and speech, there are significant challenges that are unique to or exacerbated by generating long-term behaviors: how should we specify the factors of variation to control, and how can we ensure that the generated temporal behavior faithfully demonstrates diverse styles? In this paper, we leverage large amounts of raw behavioral data to learn policies that can be calibrated to generate a diverse range of behavior styles (e.g., aggressive versus passive play in sports). Inspired by recent work on leveraging programmatic labeling functions, we present a novel framework that combines imitation learning with data programming to learn style-calibratable policies. Our primary technical contribution is a formal notion of style-consistency as a learning objective, and its integration with conventional imitation learning approaches. We evaluate our framework using demonstrations from professional basketball players and agents in the MuJoCo physics environment, and show that our learned policies can be accurately calibrated to generate interesting behavior styles in both domains.
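The idea of a programmatic labeling function and a style-consistency score can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the names `average_speed`, `speed_label`, and `style_consistency`, the speed threshold, and the choice of measuring consistency as a simple match rate between the requested style and the style a labeling function assigns to the generated trajectory. It is not the paper's implementation or its training objective.

```python
from typing import Callable, Sequence, Tuple

Point = Tuple[float, float]
Trajectory = Sequence[Point]


def average_speed(traj: Trajectory) -> float:
    """Mean Euclidean distance between consecutive points of a trajectory."""
    steps = list(zip(traj, traj[1:]))
    if not steps:
        return 0.0
    total = sum(((x1 - x0) ** 2 + (y1 - y0) ** 2) ** 0.5
                for (x0, y0), (x1, y1) in steps)
    return total / len(steps)


def speed_label(traj: Trajectory, threshold: float = 1.0) -> int:
    """Hypothetical labeling function: 1 for 'fast' play, 0 for 'slow' play."""
    return int(average_speed(traj) > threshold)


def style_consistency(
    trajectories: Sequence[Trajectory],
    requested_labels: Sequence[int],
    labeling_fn: Callable[[Trajectory], int],
) -> float:
    """Fraction of trajectories whose programmatic label matches the
    style label the policy was asked to produce."""
    if not trajectories or len(trajectories) != len(requested_labels):
        raise ValueError("need equally many trajectories and labels (at least one)")
    matches = sum(labeling_fn(t) == y
                  for t, y in zip(trajectories, requested_labels))
    return matches / len(trajectories)


# Two hand-written trajectories standing in for policy rollouts.
slow = [(0.0, 0.0), (0.5, 0.0), (1.0, 0.0)]
fast = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
score = style_consistency([slow, fast], [0, 1], speed_label)
```

In this sketch a policy that is perfectly calibrated with respect to `speed_label` would score 1.0; the labeling function itself is just ordinary code over the trajectory, which is what makes the style definition programmatic.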
Learning a Hierarchical Planner from Humans in Multiple Generations
A typical way in which a machine acquires knowledge from humans is by
programming. Compared to learning from demonstrations or experiences,
programmatic learning allows the machine to acquire a novel skill as soon as
the program is written, and, by building a library of programs, a machine can
quickly learn how to perform complex tasks. However, as programs often take
their execution contexts for granted, they are brittle when the contexts
change, making it difficult to adapt complex programs to new contexts. We
present natural programming, a library learning system that combines
programmatic learning with a hierarchical planner. Natural programming
maintains a library of decompositions, each consisting of a goal, a linguistic
description of how this goal decomposes into sub-goals, and a concrete instance
of its decomposition into sub-goals. A user teaches the system via curriculum
building, by identifying a challenging yet not impossible goal along with
linguistic hints on how this goal may be decomposed into sub-goals. The system
solves for the goal via hierarchical planning, using the linguistic hints to
guide its probability distribution in proposing the right plans. The system
learns from this interaction by adding newly found decompositions in the
successful search into its library. Simulated studies and a human experiment
(n=360) on a controlled environment demonstrate that natural programming can
robustly compose programs learned from different users and contexts, adapting
faster and solving more complex tasks when compared to programmatic baselines.
Comment: First two authors contributed equally.
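The library of decompositions described in this abstract can be pictured as a small data structure. The sketch below is illustrative only: the names `Decomposition`, `Library`, and `expand` are hypothetical, and the paper's hint-guided probabilistic search is replaced here by a plain depth-first lookup over stored decompositions, assuming the stored decompositions contain no cycles.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple


@dataclass(frozen=True)
class Decomposition:
    goal: str                  # the goal being decomposed
    hint: str                  # linguistic description of the decomposition
    subgoals: Tuple[str, ...]  # one concrete decomposition into sub-goals


@dataclass
class Library:
    entries: Dict[str, List[Decomposition]] = field(default_factory=dict)

    def add(self, d: Decomposition) -> None:
        """Store a newly found decomposition, skipping exact duplicates."""
        bucket = self.entries.setdefault(d.goal, [])
        if d not in bucket:
            bucket.append(d)

    def lookup(self, goal: str) -> List[Decomposition]:
        return list(self.entries.get(goal, []))


def expand(goal: str, library: Library, primitives: Set[str]) -> Optional[List[str]]:
    """Depth-first expansion of a goal into primitive steps.

    Returns None when no stored decomposition leads to primitives only.
    Assumes the library is acyclic; a cycle would recurse forever.
    """
    if goal in primitives:
        return [goal]
    for d in library.lookup(goal):
        plan: List[str] = []
        for sub in d.subgoals:
            sub_plan = expand(sub, library, primitives)
            if sub_plan is None:
                break  # this decomposition fails, try the next one
            plan.extend(sub_plan)
        else:
            return plan
    return None


library = Library()
library.add(Decomposition("make_tea", "boil water, then steep",
                          ("boil_water", "steep_tea")))
library.add(Decomposition("boil_water", "fill the kettle and heat it",
                          ("fill_kettle", "heat_kettle")))
plan = expand("make_tea", library, {"fill_kettle", "heat_kettle", "steep_tea"})
```

Because decompositions are keyed by goal rather than by the program that first produced them, an entry taught in one context can be reused when a different goal later needs the same sub-goal.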
Task Programming: Learning Data Efficient Behavior Representations
Specialized domain knowledge is often necessary to accurately annotate training sets for in-depth analysis, but can be burdensome and time-consuming to acquire from domain experts. This issue arises prominently in automated behavior analysis, in which agent movements or actions of interest are detected from video tracking data. To reduce annotation effort, we present TREBA: a method to learn annotation-sample efficient trajectory embedding for behavior analysis, based on multi-task self-supervised learning. The tasks in our method can be efficiently engineered by domain experts through a process we call "task programming", which uses programs to explicitly encode structured knowledge from domain experts. Total domain expert effort can be reduced by exchanging data annotation time for the construction of a small number of programmed tasks. We evaluate this trade-off using data from behavioral neuroscience, in which specialized domain knowledge is used to identify behaviors. We present experimental results in three datasets across two domains: mice and fruit flies. Using embeddings from TREBA, we reduce annotation burden by up to a factor of 10 without compromising accuracy compared to state-of-the-art features. Our results thus suggest that task programming can be an effective way to reduce annotation effort for domain experts.
UniMASK: Unified Inference in Sequential Decision Problems
Randomly masking and predicting word tokens has been a successful approach in
pre-training language models for a variety of downstream tasks. In this work,
we observe that the same idea also applies naturally to sequential
decision-making, where many well-studied tasks like behavior cloning, offline
reinforcement learning, inverse dynamics, and waypoint conditioning correspond
to different sequence maskings over a sequence of states, actions, and returns.
We introduce the UniMASK framework, which provides a unified way to specify
models which can be trained on many different sequential decision-making tasks.
We show that a single UniMASK model is often capable of carrying out many tasks
with performance similar to or better than single-task models. Additionally,
after fine-tuning, our UniMASK models consistently outperform comparable
single-task models. Our code is publicly available at
https://github.com/micahcarroll/uniMASK.
Comment: NeurIPS 2022 (Oral). A prior version was published at an ICML
Workshop, available at arXiv:2204.1332
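The claim that different decision-making tasks correspond to different maskings of one sequence can be made concrete with a small sketch. The mask layouts below are illustrative assumptions, not the maskings used in the paper or in the uniMASK repository, and every function name is hypothetical. A value of `True` means the token is given to the model as input; `False` means it is hidden.

```python
from typing import Dict, List

Mask = Dict[str, List[bool]]


def empty_mask(horizon: int) -> Mask:
    """All state, action, and return tokens hidden."""
    return {key: [False] * horizon for key in ("states", "actions", "returns")}


def behavior_cloning_mask(horizon: int, t: int) -> Mask:
    """Show states up to t and actions before t; the action at t is hidden."""
    if not 0 <= t < horizon:
        raise ValueError("t must lie inside the sequence")
    mask = empty_mask(horizon)
    for i in range(t + 1):
        mask["states"][i] = True
    for i in range(t):
        mask["actions"][i] = True
    return mask


def inverse_dynamics_mask(horizon: int, t: int) -> Mask:
    """Show the states at t and t + 1; the action between them is hidden."""
    if not 0 <= t < horizon - 1:
        raise ValueError("t and t + 1 must lie inside the sequence")
    mask = empty_mask(horizon)
    mask["states"][t] = True
    mask["states"][t + 1] = True
    return mask


def waypoint_mask(horizon: int, waypoint: int) -> Mask:
    """Show the first state and one later waypoint state; all else is hidden."""
    if not 0 < waypoint < horizon:
        raise ValueError("waypoint must be a later step inside the sequence")
    mask = empty_mask(horizon)
    mask["states"][0] = True
    mask["states"][waypoint] = True
    return mask


bc = behavior_cloning_mask(horizon=4, t=2)
inv = inverse_dynamics_mask(horizon=4, t=1)
wp = waypoint_mask(horizon=4, waypoint=3)
```

Under this view, a single model trained to fill in hidden tokens can be queried for any of these tasks by changing only the mask, which is the sense in which the framework is unified.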