32 research outputs found
Model-based Offline Policy Optimization with Adversarial Network
Model-based offline reinforcement learning (RL), which builds a supervised
transition model from a logged dataset to avoid costly interactions with the
online environment, has been a promising approach for offline policy
optimization. As the discrepancy between the logging data and online
environment may result in a distributional shift problem, many prior works have
studied how to build robust transition models conservatively and estimate the
model uncertainty accurately. However, the over-conservatism can limit the
exploration of the agent, and the uncertainty estimates may be unreliable. In
this work, we propose a novel Model-based Offline policy optimization framework
with Adversarial Network (MOAN). The key idea is to use adversarial learning to
build a transition model with better generalization, where an adversary is
introduced to distinguish between in-distribution and out-of-distribution
samples. Moreover, the adversary can naturally provide a quantification of the
model's uncertainty with theoretical guarantees. Extensive experiments show
that our approach outperforms existing state-of-the-art baselines on widely
studied offline RL benchmarks. It can also generate diverse in-distribution
samples and quantify the uncertainty more accurately.
Comment: Accepted by the 26th European Conference on Artificial Intelligence (ECAI 2023)
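The adversary's role can be illustrated with a minimal, self-contained sketch (not the MOAN implementation): a logistic-regression discriminator is trained to tell logged, in-distribution transitions from model-generated ones, and its rejection probability is read off as an uncertainty score. All data, names, and hyperparameters below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def train_discriminator(real, fake, lr=0.1, steps=500):
    """Train a logistic-regression adversary to separate in-distribution
    (logged) transitions from model-generated (fake) ones."""
    X = np.vstack([real, fake])
    y = np.concatenate([np.ones(len(real)), np.zeros(len(fake))])
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # P(in-distribution)
        grad = p - y                            # d(BCE loss)/d(logit)
        w -= lr * X.T @ grad / len(y)
        b -= lr * grad.mean()
    return w, b

def uncertainty(x, w, b):
    """Samples the adversary rejects get high uncertainty, so a pessimistic
    policy can be penalized for visiting them."""
    return 1.0 - 1.0 / (1.0 + np.exp(-(x @ w + b)))

# Toy data: logged transitions cluster near 0, the model's rollouts drift to 4.
real = rng.normal(0.0, 1.0, size=(256, 2))
fake = rng.normal(4.0, 1.0, size=(256, 2))
w, b = train_discriminator(real, fake)
print(uncertainty(np.array([0.0, 0.0]), w, b))  # low: in-distribution
print(uncertainty(np.array([4.0, 4.0]), w, b))  # high: out-of-distribution
```

The same discriminator output that shapes the generator during adversarial training doubles as the uncertainty estimate, which is the framework's key idea.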
Baby's CoThought: Leveraging Large Language Models for Enhanced Reasoning in Compact Models
Large Language Models (LLMs) demonstrate remarkable performance on a variety
of natural language understanding (NLU) tasks, primarily due to their
in-context learning ability. This ability could be applied to building baby-like
models, i.e., models at small scales, improving training efficiency. In this
paper, we propose a "CoThought" pipeline, which efficiently trains smaller
"baby" language models (BabyLMs) by leveraging the Chain of Thought prompting
of LLMs. Our pipeline restructures a dataset of less than 100M in size using
GPT-3.5-turbo, transforming it into task-oriented, human-readable texts that
are comparable to the school texts for language learners. The BabyLM is then
pretrained on this restructured dataset in a RoBERTa fashion. In evaluations
across 4 benchmarks, our BabyLM outperforms the vanilla RoBERTa in 10
linguistic, NLU, and question-answering tasks by more than 3 points, showing a
superior ability to extract contextual information. These results suggest that
compact LMs pretrained on small, LLM-restructured data can better understand
tasks and achieve improved performance.
Comment: CoNLL 2023 BabyLM Challenge
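The restructuring step can be sketched roughly as follows; the `llm` callable, prompt wording, and chunking below are assumptions standing in for the GPT-3.5-turbo calls described above, not the paper's actual pipeline.

```python
# Minimal sketch of corpus restructuring; all names here are illustrative.
PROMPT = (
    "Rewrite the following raw sentences as a short, task-oriented, "
    "human-readable passage suitable for a language learner:\n\n{chunk}"
)

def restructure_corpus(raw_sentences, llm, chunk_size=5):
    """Group raw sentences into chunks and ask the LLM to rewrite each
    chunk; the rewritten passages form the BabyLM pretraining corpus."""
    passages = []
    for i in range(0, len(raw_sentences), chunk_size):
        chunk = " ".join(raw_sentences[i:i + chunk_size])
        passages.append(llm(PROMPT.format(chunk=chunk)))
    return passages

# Stub LLM for demonstration; a real pipeline would call the chat API here.
fake_llm = lambda prompt: "LESSON: " + prompt.rsplit("\n\n", 1)[-1]

corpus = restructure_corpus([f"sentence {i}." for i in range(12)], fake_llm)
print(len(corpus))  # 12 sentences in chunks of 5 -> 3 passages
```

The resulting passages would then be fed to ordinary RoBERTa-style masked-language-model pretraining.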
Propagation of elastic solitons in chains of pre-deformed beams
We use a combination of experiments, numerical analysis and theory to investigate the nonlinear dynamic response of a chain of precompressed elastic beams. Our results show that this simple system offers a rich platform to study the propagation of large-amplitude waves. Compression waves are strongly dispersive, whereas rarefaction pulses propagate in the form of solitons. Further, we find that the model describing our structure closely resembles those introduced to characterize the dynamics of several molecular chains and macromolecular crystals, suggesting that our macroscopic system can provide insights into the effect of nonlinear vibrations on molecular mechanisms.
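The kind of lattice dynamics at play can be illustrated with a generic nonlinear mass-spring chain (an FPU-type toy, not the paper's beam model), integrated with velocity Verlet; the quadratic spring term loosely echoes the compression/rarefaction asymmetry described above. All parameters are illustrative.

```python
import numpy as np

def accel(u, k=1.0, a=0.5):
    """Net force on each unit mass in a chain with nonlinear inter-site
    springs f(d) = k*d + a*d**2 (fixed ends); the quadratic term breaks
    the symmetry between compression and rarefaction."""
    d = np.diff(u)              # relative displacements between neighbors
    f = k * d + a * d**2        # nonlinear spring forces
    out = np.zeros_like(u)
    out[1:-1] = f[1:] - f[:-1]  # interior masses; chain ends stay fixed
    return out

def step(u, v, dt=0.05):
    """One velocity-Verlet time step (unit masses)."""
    a0 = accel(u)
    u_new = u + dt * v + 0.5 * dt**2 * a0
    v_new = v + 0.5 * dt * (a0 + accel(u_new))
    return u_new, v_new

n = 200
u, v = np.zeros(n), np.zeros(n)
u[n // 2] = 0.5                 # localized initial displacement
for _ in range(400):
    u, v = step(u, v)
print(np.abs(np.delete(u, n // 2)).sum())  # nonzero: pulse spread along chain
```

Tracking how an initially localized pulse disperses or stays coherent in such a chain is the basic numerical experiment behind soliton studies of this type.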
MetaDrive: Composing Diverse Driving Scenarios for Generalizable Reinforcement Learning
Driving safely requires multiple capabilities from human and intelligent
agents, such as the generalizability to unseen environments, the safety
awareness of the surrounding traffic, and the decision-making in complex
multi-agent settings. Despite the great success of Reinforcement Learning (RL),
most RL research investigates each capability separately due to the
lack of integrated environments. In this work, we develop a new driving
simulation platform called MetaDrive to support the research of generalizable
reinforcement learning algorithms for machine autonomy. MetaDrive is highly
compositional and can generate an infinite number of diverse driving
scenarios through both procedural generation and real data import.
Based on MetaDrive, we construct a variety of RL tasks and baselines in both
single-agent and multi-agent settings, including benchmarking generalizability
across unseen scenes, safe exploration, and learning multi-agent traffic. The
generalization experiments conducted on both procedurally generated scenarios
and real-world scenarios show that increasing the diversity and size of the
training set improves the generalizability of the RL agents.
We further evaluate various safe reinforcement learning and multi-agent
reinforcement learning algorithms in MetaDrive environments and provide the
benchmarks. Source code, documentation, and demo video are available at
https://metadriverse.github.io/metadrive . More research projects based on
MetaDrive simulator are listed at https://metadriverse.github.io
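Compositional scenario generation can be sketched at a toy level; the block vocabulary and function below are illustrative, not MetaDrive's API. The point is that a seeded generator yields unlimited, reproducible scenario diversity.

```python
import random

# Illustrative road-block vocabulary; the real simulator composes typed
# blocks (straights, curves, intersections, ramps) with matching sockets.
BLOCKS = ["straight", "curve_left", "curve_right", "intersection", "ramp"]

def generate_scenario(seed, n_blocks=6):
    """Procedurally compose a driving scenario as a sequence of road
    blocks; fixing the seed makes every scenario reproducible, so held-out
    seeds can serve as unseen test scenes."""
    rng = random.Random(seed)
    return [rng.choice(BLOCKS) for _ in range(n_blocks)]

train_set = [generate_scenario(s) for s in range(100)]  # diverse training maps
print(generate_scenario(7) == generate_scenario(7))     # reproducible
```

Splitting seeds into disjoint training and test ranges is what enables the generalization benchmark described above: an agent trained on one set of seeds is evaluated on scenarios it has never seen.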
GAN Inversion: A Survey
GAN inversion aims to invert a given image back into the latent space of a
pretrained GAN model, for the image to be faithfully reconstructed from the
inverted code by the generator. As an emerging technique to bridge the real and
fake image domains, GAN inversion plays an essential role in enabling
pretrained GAN models such as StyleGAN and BigGAN to be used for real image
editing applications. Meanwhile, GAN inversion also provides insight into the
interpretation of a GAN's latent space and how realistic images can be
generated. In this paper, we provide an overview of GAN inversion with a focus
on its recent algorithms and applications. We cover important techniques of GAN
inversion and their applications to image restoration and image manipulation.
We further elaborate on some trends and challenges for future directions.
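The core optimization-based inversion loop surveyed in such work can be sketched with a toy linear "generator"; real inversion runs the same loop through a pretrained StyleGAN/BigGAN, and everything below (dimensions, learning rate, names) is illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy "generator": a fixed linear map from a 4-d latent code to a 16-d image.
G = rng.normal(size=(16, 4)) / 4.0
generate = lambda z: G @ z

def invert(x, lr=0.1, steps=3000):
    """Optimization-based inversion: gradient-descend the latent code z to
    minimize the reconstruction error 0.5 * ||G(z) - x||^2."""
    z = np.zeros(G.shape[1])
    for _ in range(steps):
        z -= lr * (G.T @ (generate(z) - x))  # gradient w.r.t. z
    return z

z_true = rng.normal(size=4)
x = generate(z_true)   # "real" image to be inverted
z_hat = invert(x)
print(np.allclose(generate(z_hat), x, atol=1e-3))  # faithful reconstruction
```

With a neural generator the gradient comes from backpropagation rather than a closed form, and learning-based methods replace the loop with an encoder, but the objective is the same: a latent code whose reconstruction is faithful enough to edit.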
Decentralized Policy Coordination in Mobile Sensing with Consensual Communication
In a typical mobile-sensing scenario, multiple autonomous vehicles cooperatively navigate to maximize the spatial–temporal coverage of the environment. However, as each vehicle can only make decentralized navigation decisions based on limited local observations, coordinating the vehicles for cooperation in an open, dynamic environment remains a critical challenge. In this paper, we propose a novel framework that incorporates consensual communication in multi-agent reinforcement learning for cooperative mobile sensing. At each step, the vehicles first learn to communicate with each other and then navigate based on the messages received from others. Through communication, the decentralized vehicles can share information to break through the dilemma of local observation. Moreover, we utilize mutual information as a regularizer to promote consensus among the vehicles. The mutual information enforces a positive correlation between the navigation policy and the communication message, and therefore implicitly coordinates the decentralized policies. The convergence of this regularized algorithm can be proved theoretically under certain mild assumptions. In the experiments, we show that our algorithm is scalable and converges very fast during the training phase. It also outperforms other baselines significantly in the execution phase. The results validate that consensual communication plays a very important role in coordinating the behaviors of decentralized vehicles.
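The mutual-information regularizer can be illustrated for the discrete case with a minimal sketch (not the paper's estimator): MI between message and action is high exactly when an agent's action is predictable from the messages it exchanges, which is what "consensus" rewards.

```python
import numpy as np

def mutual_information(joint):
    """Mutual information between a discrete message M and action A given
    their joint distribution P(M, A); as a regularizer it rewards policies
    whose actions correlate with the exchanged messages."""
    pm = joint.sum(axis=1, keepdims=True)  # marginal P(M)
    pa = joint.sum(axis=0, keepdims=True)  # marginal P(A)
    mask = joint > 0                       # skip zero cells (0 * log 0 = 0)
    return float((joint[mask] * np.log(joint[mask] / (pm @ pa)[mask])).sum())

# Perfectly correlated message/action versus independent ones.
correlated = np.array([[0.5, 0.0], [0.0, 0.5]])
independent = np.array([[0.25, 0.25], [0.25, 0.25]])
print(mutual_information(correlated))   # log 2: action determined by message
print(mutual_information(independent))  # 0: message carries no information
```

In the continuous policy setting this quantity is not computed from an explicit table; it is typically bounded or estimated from samples, but the coordination pressure it exerts is the same.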