Value function estimation using conditional diffusion models for control
A fairly reliable trend in deep reinforcement learning is that performance
scales with the number of parameters, provided a complementary scaling in the
amount of training data. As the appetite for large models increases, it is
imperative to address, sooner rather than later, the potential problem of
running out of high-quality demonstrations. In this case, instead of collecting
only new data via costly human demonstrations or risking a simulation-to-real
transfer with uncertain effects, it would be beneficial to leverage vast
amounts of readily-available low-quality data. Since classical control
algorithms such as behavior cloning or temporal difference learning cannot be
used on reward-free or action-free data out-of-the-box, this solution warrants
novel training paradigms for continuous control. We propose a simple algorithm
called Diffused Value Function (DVF), which learns a joint multi-step model of
the environment-robot interaction dynamics using a diffusion model. This model
can be efficiently learned from state sequences (i.e., without access to reward
functions nor actions), and subsequently used to estimate the value of each
action out-of-the-box. We show how DVF can be used to efficiently capture the
state visitation measure for multiple controllers, and show promising
qualitative and quantitative results on challenging robotics benchmarks.
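The core idea the abstract describes, scoring states by a learned model of which states the robot will visit, can be illustrated without the diffusion component. The sketch below is a minimal, hypothetical illustration (not the authors' DVF implementation): given samples drawn from some generative model of the discounted future-state distribution, and a separately learned reward model, the value estimate is just the mean predicted reward rescaled by the discount horizon.

```python
import numpy as np

def value_from_occupancy(future_state_samples, reward_fn, gamma=0.99):
    """Estimate V(s) from samples of the discounted state-occupancy
    measure d_gamma(. | s):  V(s) = E_{s'~d_gamma}[r(s')] / (1 - gamma).
    `future_state_samples` is an (N, state_dim) array of states drawn
    from a generative model of the controller's futures after s; in the
    paper's setting such a model is learned from reward-free,
    action-free state sequences."""
    rewards = np.array([reward_fn(s) for s in future_state_samples])
    return rewards.mean() / (1.0 - gamma)

# Toy check: if every sampled future state yields reward 1, the value
# equals the discounted sum of ones, 1 / (1 - gamma).
samples = np.zeros((100, 3))
v = value_from_occupancy(samples, reward_fn=lambda s: 1.0, gamma=0.9)
# v == 10.0 here
```

The generative model and `reward_fn` are placeholders for the learned components; the point is that once the visitation measure is captured, the value of any state follows from a simple Monte Carlo average.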
Evaluation of Hydrodynamic Drag on Experimental Fouling-release Surfaces, using Rotating Disks
Fouling by biofilms significantly increases frictional drag on ships' hulls. A device, the friction disk machine, designed to measure torque on rotating disks, was used to examine differences among experimental fouling-release coatings in the drag penalty due to accumulated biofilms. Penalties were measured as the percentage change in the frictional resistance coefficient C f . Drag penalties due to microfouling ranged from 9% to 29%, comparable to previously reported values. An antifouling control coating showed a smaller drag penalty than the fouling-release coatings. There were also significant differences among the fouling-release coatings in drag due to biofilm formation. These results indicate that the friction disk machine may serve as a valuable tool for investigating the effects of experimental coatings, both antifouling and fouling-release, on microfouling and associated drag penalties
Large Language Models as Generalizable Policies for Embodied Tasks
We show that large language models (LLMs) can be adapted to be generalizable
policies for embodied visual tasks. Our approach, called Large LAnguage model
Reinforcement Learning Policy (LLaRP), adapts a pre-trained frozen LLM to take
as input text instructions and visual egocentric observations and output
actions directly in the environment. Using reinforcement learning, we train
LLaRP to see and act solely through environmental interactions. We show that
LLaRP is robust to complex paraphrasings of task instructions and can
generalize to new tasks that require novel optimal behavior. In particular, on
1,000 unseen tasks it achieves 42% success rate, 1.7x the success rate of other
common learned baselines or zero-shot applications of LLMs. Finally, to aid the
community in studying language conditioned, massively multi-task, embodied AI
problems we release a novel benchmark, Language Rearrangement, consisting of
150,000 training and 1,000 testing tasks for language-conditioned
rearrangement. Video examples of LLaRP in unseen Language Rearrangement
instructions are at https://llm-rl.github.io
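The architecture described above, a frozen pre-trained backbone with small trainable modules mapping observations in and actions out, can be caricatured in a few lines. This is a hypothetical toy sketch, not LLaRP itself: a fixed random matrix stands in for the frozen LLM, and the two small matrices play the role of the trainable input adapter and action head.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions; the real system uses a pre-trained LLM.
obs_dim, hid_dim, n_actions = 32, 64, 6

frozen_backbone = rng.normal(size=(hid_dim, hid_dim))        # never updated
obs_adapter = 0.1 * rng.normal(size=(obs_dim, hid_dim))      # trainable: maps observations into the backbone's space
action_head = 0.1 * rng.normal(size=(hid_dim, n_actions))    # trainable: maps hidden state to action logits

def act(observation):
    """Forward pass: adapt observation -> frozen backbone -> action logits."""
    token = observation @ obs_adapter
    hidden = np.tanh(token @ frozen_backbone)
    logits = hidden @ action_head
    return int(np.argmax(logits))

a = act(rng.normal(size=obs_dim))
```

Under this scheme, reinforcement learning updates only the adapter and head while the backbone's pre-trained representations are reused, which is what lets the policy inherit the language model's generalization.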
Position Prediction as an Effective Pretraining Strategy
Transformers have gained increasing popularity in a wide range of
applications, including Natural Language Processing (NLP), Computer Vision and
Speech Recognition, because of their powerful representational capacity.
However, harnessing this representational capacity effectively requires a large
amount of data, strong regularization, or both, to mitigate overfitting.
Recently, the power of the Transformer has been unlocked by self-supervised
pretraining strategies based on masked autoencoders, which rely on
reconstructing masked inputs, either directly or contrastively, from unmasked
content. This pretraining strategy, used in BERT models in NLP, Wav2Vec models
in Speech, and, recently, MAE models in Vision, forces the model to learn
relationships between the content in different parts of the input using
autoencoding-related objectives. In this paper, we propose a novel but
surprisingly simple alternative to content reconstruction: predicting
locations from content, without providing positional information. Doing
so requires the Transformer to understand the positional relationships between
different parts of the input, from their content alone. This amounts to an
efficient implementation where the pretext task is a classification problem
among all possible positions for each input token. We experiment on both Vision
and Speech benchmarks, where our approach brings improvements over strong
supervised training baselines and is comparable to modern
unsupervised/self-supervised pretraining methods. Our method also enables
Transformers trained without position embeddings to outperform ones trained
with full position information.
Comment: Accepted to ICML 202
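The pretext task described above reduces to a per-token classification over all possible positions. A minimal sketch, under the assumption of a linear position head (the function and variable names are illustrative, not from the paper): token features computed without position embeddings are scored against every position, and the cross-entropy target for each token is its own index.

```python
import numpy as np

def position_prediction_loss(features, W):
    """Pretext-task loss sketch: each token's features (computed
    WITHOUT position embeddings) are classified among all T possible
    positions.  features: (T, d) token representations; W: (d, T)
    linear position head.  The target for token t is position t."""
    T = features.shape[0]
    logits = features @ W                         # (T, T) scores over positions
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Cross-entropy with each token's own index as the label.
    return -log_probs[np.arange(T), np.arange(T)].mean()

rng = np.random.default_rng(0)
T, d = 8, 16
feats = rng.normal(size=(T, d))
loss = position_prediction_loss(feats, rng.normal(size=(d, T)))
# loss is a positive scalar; minimizing it forces the encoder to infer
# each token's position from content alone
```

Because the label set is just the T positions, the pretext task adds only a small classification head on top of the encoder, which is what makes the implementation efficient.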