Sampling Through the Lens of Sequential Decision Making
Sampling is ubiquitous in machine learning methodologies. With the growth
of large datasets and model complexity, we want to learn and adapt the sampling
process while training a representation. Towards this goal, a
variety of sampling techniques have been proposed. However, most of them either
use a fixed sampling scheme or adjust the sampling scheme based on simple
heuristics, and so cannot choose the best samples for model training at
different stages. Inspired by "Thinking, Fast and Slow" (System 1 and System 2) in cognitive
science, we propose a reward-guided sampling strategy called Adaptive Sample
with Reward (ASR) to tackle this challenge. To the best of our knowledge, this
is the first work utilizing reinforcement learning (RL) to address the sampling
problem in representation learning. Our approach adaptively adjusts the sampling
process for optimal performance, exploiting geometric relationships
among samples through distance-based sampling to maximize the overall cumulative reward.
We apply ASR to the long-standing sampling problems in similarity-based loss
functions. Empirical results in information retrieval and clustering
demonstrate ASR's superb performance across different datasets. We also discuss
an intriguing phenomenon observed in our experiments, which we name the "ASR gravity well".
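To make the idea concrete, here is a minimal Python sketch of a reward-guided sampler in the spirit of the abstract: sampling is cast as a sequential decision problem in which an agent picks a distance bin for negative mining and is updated from a reward signal. All names (DistanceBinSampler, the simulated reward) are illustrative assumptions, not the authors' implementation.

    # Hedged sketch of reward-guided, distance-based sampling (not ASR itself).
    import numpy as np

    class DistanceBinSampler:
        """Bandit-style sampler: actions are distance bins for negative mining."""
        def __init__(self, n_bins, lr=0.1):
            self.prefs = np.zeros(n_bins)   # preferences over distance bins
            self.lr = lr

        def probs(self):
            e = np.exp(self.prefs - self.prefs.max())
            return e / e.sum()

        def sample_bin(self, rng):
            return rng.choice(len(self.prefs), p=self.probs())

        def update(self, chosen_bin, reward, baseline):
            # REINFORCE-style update: push probability toward bins whose
            # sampled negatives improved the model (reward > baseline).
            grad = -self.probs()
            grad[chosen_bin] += 1.0
            self.prefs += self.lr * (reward - baseline) * grad

    rng = np.random.default_rng(0)
    sampler = DistanceBinSampler(n_bins=5)
    baseline = 0.0
    for step in range(100):
        b = sampler.sample_bin(rng)
        # Stand-in reward: e.g., the change in a validation retrieval metric
        # after training on negatives from bin b (simulated here, an assumption).
        reward = rng.normal(loc=[0.1, 0.3, 0.5, 0.2, 0.0][b], scale=0.1)
        baseline = 0.9 * baseline + 0.1 * reward   # running baseline
        sampler.update(b, reward, baseline)
    print("learned bin probabilities:", np.round(sampler.probs(), 3))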
Learning Agile Skills via Adversarial Imitation of Rough Partial Demonstrations
Learning agile skills is one of the main challenges in robotics. To this end,
reinforcement learning approaches have achieved impressive results. However,
these methods require explicit task information in the form of a reward function or an
expert that can be queried in simulation to provide a target control output,
which limits their applicability. In this work, we propose a generative
adversarial method for inferring reward functions from partial and potentially
physically incompatible demonstrations, enabling successful skill acquisition
when reference or expert demonstrations are not easily accessible. Moreover, we show
that by using a Wasserstein GAN formulation and transitions from demonstrations
with rough and partial information as input, we are able to extract policies
that are robust and capable of imitating demonstrated behaviors. Finally, the
obtained skills, such as a backflip, are tested on the agile quadruped robot
Solo 8 and faithfully replicate hand-held human demonstrations.
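The core mechanism can be sketched compactly: a Wasserstein critic scores state transitions, giving demonstrations high scores and policy rollouts low ones, and its output then serves as the learned reward for reinforcement learning. The following Python/PyTorch fragment is a hedged sketch under assumed shapes and hyperparameters, not the paper's exact setup.

    # Hedged sketch: Wasserstein critic over transitions (s, s') as a reward.
    import torch
    import torch.nn as nn

    obs_dim = 8  # assumed observation size

    critic = nn.Sequential(
        nn.Linear(2 * obs_dim, 64), nn.Tanh(),
        nn.Linear(64, 64), nn.Tanh(),
        nn.Linear(64, 1),
    )
    opt = torch.optim.Adam(critic.parameters(), lr=3e-4)

    def critic_loss(demo_trans, policy_trans, gp_weight=10.0):
        # WGAN objective: demonstrations score high, policy samples score low.
        loss = critic(policy_trans).mean() - critic(demo_trans).mean()
        # Gradient penalty on interpolated transitions (WGAN-GP style; the
        # weight 10.0 is a conventional choice, not the paper's value).
        eps = torch.rand(demo_trans.size(0), 1)
        mix = (eps * demo_trans + (1 - eps) * policy_trans).requires_grad_(True)
        grad = torch.autograd.grad(critic(mix).sum(), mix, create_graph=True)[0]
        return loss + gp_weight * ((grad.norm(dim=1) - 1) ** 2).mean()

    # Toy update with random stand-ins for demonstration and policy transitions.
    demo = torch.randn(32, 2 * obs_dim)
    roll = torch.randn(32, 2 * obs_dim)
    opt.zero_grad()
    critic_loss(demo, roll).backward()
    opt.step()

    # The imitation reward used for RL training is the critic's score.
    reward = critic(roll).detach().squeeze(-1)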
On the power of foundation models
With infinitely many high-quality data points, infinite computational power,
an infinitely large foundation model with a perfect training algorithm and
guaranteed zero generalization error on the pretext task, can the model be used
for everything? This question cannot be answered by the existing theory of
representation, optimization or generalization, because the issues they mainly
investigate are assumed to be nonexistent here. In this paper, we show that
category theory provides powerful machinery to answer this question. We prove
three results. The first limits the power of prompt-based learning,
saying that the model can solve a downstream task with prompts if and only if
the task is representable. The second shows that fine-tuning does not have this
limit, as a foundation model with the minimum required power (up to symmetry)
can theoretically solve downstream tasks with fine-tuning and enough resources.
Our final result can be seen as a new type of generalization theorem, showing
that the foundation model can generate unseen objects from the target category
(e.g., images) using the structural information from the source category (e.g.,
texts). Along the way, we provide a categorical framework for supervised and
self-supervised learning, which might be of independent interest.
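In standard categorical terms, the first result can be paraphrased as follows. This is an informal rendering under assumed notation; the paper's own definitions of task and solvability apply.

    % Informal paraphrase, not the paper's verbatim statement.
    % \mathcal{C} is the category learned on the pretext task; a downstream
    % task is a functor T : \mathcal{C} \to \mathbf{Set}.
    \[
      T \text{ is solvable by prompting}
      \iff
      T \cong \mathrm{Hom}_{\mathcal{C}}(c, -)
      \ \text{for some } c \in \mathrm{Ob}(\mathcal{C}),
    \]
    i.e., $T$ is a representable functor, with the prompt playing the role of
    the representing object $c$.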
Unsupervised Behavior Extraction via Random Intent Priors
Reward-free data is abundant and contains rich prior knowledge of human
behaviors, but it is not well exploited by offline reinforcement learning (RL)
algorithms. In this paper, we propose UBER, an unsupervised approach to extract
useful behaviors from offline reward-free datasets via diversified rewards.
UBER assigns different pseudo-rewards sampled from a given prior distribution
to different agents to extract a diverse set of behaviors, and reuses them as
candidate policies to facilitate the learning of new tasks. Perhaps
surprisingly, we show that rewards generated from random neural networks are
sufficient to extract diverse and useful behaviors, some even close to expert
ones. We provide both empirical and theoretical evidence to justify the use of
random priors for the reward function. Experiments on multiple benchmarks
showcase UBER's ability to learn effective and diverse behavior sets that
enhance sample efficiency for online RL, outperforming existing baselines. By
reducing reliance on human supervision, UBER broadens the applicability of RL
to real-world scenarios with abundant reward-free data.
Comment: Thirty-seventh Conference on Neural Information Processing Systems
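As a concrete reading of the abstract, the relabeling step might look like the Python sketch below: each agent receives pseudo-rewards from its own randomly initialized, frozen network and is then trained offline against them. Network sizes, the prior over rewards, and all variable names are illustrative assumptions.

    # Hedged sketch of reward relabeling with random intent priors.
    import torch
    import torch.nn as nn

    obs_dim, act_dim, n_agents = 8, 2, 4  # assumed dimensions

    def random_reward_net(seed):
        torch.manual_seed(seed)
        net = nn.Sequential(
            nn.Linear(obs_dim + act_dim, 64), nn.ReLU(),
            nn.Linear(64, 1),
        )
        for p in net.parameters():   # frozen: the reward is a fixed random prior
            p.requires_grad_(False)
        return net

    # Relabel a reward-free batch with a different random reward per agent.
    states = torch.randn(256, obs_dim)
    actions = torch.randn(256, act_dim)
    for agent_id in range(n_agents):
        r_net = random_reward_net(seed=agent_id)
        pseudo_r = r_net(torch.cat([states, actions], dim=-1)).squeeze(-1)
        # Train agent `agent_id` with any offline RL algorithm on
        # (states, actions, pseudo_r); the resulting policies form a
        # diverse behavior set reusable for downstream tasks.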