Principle-Driven Self-Alignment of Language Models from Scratch with Minimal Human Supervision
Recent AI-assistant agents, such as ChatGPT, predominantly rely on supervised
fine-tuning (SFT) with human annotations and reinforcement learning from human
feedback (RLHF) to align the output of large language models (LLMs) with human
intentions, ensuring they are helpful, ethical, and reliable. However, this
dependence can significantly constrain the true potential of AI-assistant
agents due to the high cost of obtaining human supervision and the related
issues of quality, reliability, diversity, self-consistency, and undesirable
biases. To address these challenges, we propose a novel approach called
SELF-ALIGN, which combines principle-driven reasoning and the generative power
of LLMs for the self-alignment of AI agents with minimal human supervision. Our
approach encompasses four stages: first, we use an LLM to generate synthetic
prompts, and a topic-guided method to augment the prompt diversity; second, we
use a small set of human-written principles for AI models to follow, and guide
the LLM through in-context learning from demonstrations (of principles
application) to produce helpful, ethical, and reliable responses to users'
queries; third, we fine-tune the original LLM with the high-quality
self-aligned responses so that the resulting model can generate desirable
responses for each query directly without the principle set and the
demonstrations anymore; and finally, we offer a refinement step to address the
issues of overly-brief or indirect responses. Applying SELF-ALIGN to the
LLaMA-65b base language model, we develop an AI assistant named Dromedary. With
fewer than 300 lines of human annotations (including < 200 seed prompts, 16
generic principles, and 5 exemplars for in-context learning), Dromedary
significantly surpasses the performance of several state-of-the-art AI systems,
including Text-Davinci-003 and Alpaca, on benchmark datasets with various
settings.
Comment: Accepted at NeurIPS 2023 (Spotlight). Project page: https://github.com/IBM/Dromedary
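The four-stage pipeline described above maps naturally onto code. The Python sketch below only illustrates the ordering of the stages; every helper is an injected placeholder for the corresponding component in the paper, not the authors' implementation.

```python
from typing import Callable, List, Tuple

def self_align(
    base_llm,
    seed_prompts: List[str],
    principles: str,
    demonstrations: str,
    *,
    gen_prompts: Callable,  # stage 1: topic-guided synthetic prompt generation
    respond: Callable,      # stage 2: principle-driven in-context response
    fine_tune: Callable,    # stage 3: SFT on the self-aligned pairs
    refine: Callable,       # stage 4: refinement of brief/indirect answers
):
    """Minimal sketch of the four SELF-ALIGN stages; all helpers are hypothetical."""
    # Stage 1: expand a small seed set into diverse synthetic prompts.
    prompts = gen_prompts(base_llm, seed_prompts)

    # Stage 2: answer each prompt with the principles and demonstrations
    # in context, so the base model produces aligned responses.
    pairs: List[Tuple[str, str]] = [
        (p, respond(base_llm, principles, demonstrations, p)) for p in prompts
    ]

    # Stage 3: fine-tune on the self-aligned pairs so the model answers
    # directly, without the principle set or demonstrations in context.
    aligned_llm = fine_tune(base_llm, pairs)

    # Stage 4: refinement pass to fix overly brief or indirect responses.
    return refine(aligned_llm)
```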
Siren's Song in the AI Ocean: A Survey on Hallucination in Large Language Models
While large language models (LLMs) have demonstrated remarkable capabilities
across a range of downstream tasks, a significant concern revolves around their
propensity to exhibit hallucinations: LLMs occasionally generate content that
diverges from the user input, contradicts previously generated context, or
misaligns with established world knowledge. This phenomenon poses a substantial
challenge to the reliability of LLMs in real-world scenarios. In this paper, we
survey recent efforts on the detection, explanation, and mitigation of
hallucination, with an emphasis on the unique challenges posed by LLMs. We
present taxonomies of the LLM hallucination phenomena and evaluation
benchmarks, analyze existing approaches aiming at mitigating LLM hallucination,
and discuss potential directions for future research.
Comment: work in progress; 32 pages
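One family of detection approaches covered by surveys of this kind checks the self-consistency of repeated stochastic generations: if sampled answers to the same prompt disagree, the response is flagged as a likely hallucination. The sketch below is a generic illustration of that idea rather than a method from this survey; the `sample` callable is an assumed stand-in for drawing one LLM completion.

```python
from collections import Counter
from typing import Callable, List

def consistency_score(sample: Callable[[str], str], prompt: str, n: int = 10) -> float:
    """Fraction of n sampled answers that agree with the majority answer.

    A low score means repeated generations disagree, which sampling-based
    detectors treat as a warning sign of hallucination.
    """
    answers: List[str] = [sample(prompt).strip().lower() for _ in range(n)]
    _, majority_count = Counter(answers).most_common(1)[0]
    return majority_count / n
```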
Invariant Causal Imitation Learning for Generalizable Policies
Consider learning an imitation policy on the basis of demonstrated behavior
from multiple environments, with an eye towards deployment in an unseen
environment. Since the observable features from each setting may be different,
directly learning individual policies as mappings from features to actions is
prone to spurious correlations -- and may not generalize well. However, the
expert's policy is often a function of a shared latent structure underlying
those observable features that is invariant across settings. By leveraging data
from multiple environments, we propose Invariant Causal Imitation Learning
(ICIL), a novel technique in which we learn a feature representation that is
invariant across domains, on the basis of which we learn an imitation policy
that matches expert behavior. To cope with transition dynamics mismatch, ICIL
learns a shared representation of causal features (for all training
environments), that is disentangled from the specific representations of noise
variables (for each of those environments). Moreover, to ensure that the
learned policy matches the observation distribution of the expert's policy,
ICIL estimates the energy of the expert's observations and uses a
regularization term that minimizes the imitator policy's next state energy.
Experimentally, we compare our method against several benchmarks in control
and healthcare tasks and show its effectiveness in learning imitation policies
capable of generalizing to unseen environments.
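To make the objective concrete, the following PyTorch sketch shows one plausible shape of an ICIL-style loss: imitation from the shared representation, a disentanglement penalty between shared and environment-specific features, and the next-state energy regularizer. The module names, the cross-covariance surrogate for disentanglement, and the hyperparameters are all assumptions for illustration, not the authors' code.

```python
import torch
import torch.nn.functional as F

def icil_loss(shared_enc, noise_encs, policy, energy_model, batches,
              lam_disentangle=1.0, lam_energy=0.1):
    """Schematic ICIL-style loss; `batches` holds one dict per training
    environment with 'obs', 'act' (discrete), and 'next_obs' tensors."""
    loss = 0.0
    for env_id, b in enumerate(batches):
        z = shared_enc(b["obs"])          # shared causal features
        e = noise_encs[env_id](b["obs"])  # env-specific noise features

        # Imitation: match expert actions from the invariant representation.
        loss = loss + F.cross_entropy(policy(z), b["act"])

        # Disentanglement: a simple cross-covariance surrogate that
        # discourages dependence between shared and noise features.
        zc, ec = z - z.mean(0), e - e.mean(0)
        cov = zc.T @ ec / z.shape[0]
        loss = loss + lam_disentangle * cov.pow(2).mean()

        # Energy regularization: treating 'next_obs' as states reached by
        # the imitator (an assumption of this sketch), push them toward low
        # energy under a model fit to expert observations.
        loss = loss + lam_energy * energy_model(b["next_obs"]).mean()

    return loss / len(batches)
```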
Scenic4RL: Programmatic Modeling and Generation of Reinforcement Learning Environments
The capability of a reinforcement learning (RL) agent directly depends on the
diversity of learning scenarios the environment generates and how closely it
captures real-world situations. However, existing environments/simulators lack
the support to systematically model distributions over initial states and
transition dynamics. Furthermore, in complex domains such as soccer, the space
of possible scenarios is infinite, which makes it impossible for one research
group to provide a comprehensive set of scenarios to train, test, and benchmark
RL algorithms. To address this issue, for the first time, we adopt an existing
formal scenario specification language, SCENIC, to intuitively model and
generate interactive scenarios. We interfaced SCENIC to the Google Research Soccer
environment to create a platform called SCENIC4RL. Using this platform, we
provide a dataset consisting of 36 scenario programs encoded in SCENIC and
demonstration data generated from a subset of them. We share our experimental
results to show the effectiveness of our dataset and the platform to train,
test, and benchmark RL algorithms. More importantly, we open-source our
platform to enable the RL community to collectively contribute to constructing
a comprehensive set of scenarios.
Comment: First two authors contributed equally. Currently under review.
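The core idea of programmatic scenario modeling is that each episode's initial state is sampled from declared distributions rather than fixed by hand. The Python sketch below illustrates only that idea; real SCENIC programs express such distributions declaratively, and none of the names here belong to the SCENIC4RL API.

```python
import random

class CounterattackScenario:
    """Hypothetical distribution over initial states for a 2-vs-1 drill."""

    def sample_initial_state(self) -> dict:
        return {
            # Ball somewhere in the attacking half (normalized coordinates).
            "ball": (random.uniform(0.5, 0.9), random.uniform(-0.3, 0.3)),
            # Two attackers behind the ball line.
            "attackers": [(random.uniform(0.4, 0.6), random.uniform(-0.4, 0.4))
                          for _ in range(2)],
            # One defender near the goal.
            "defenders": [(random.uniform(0.8, 0.95), random.uniform(-0.2, 0.2))],
        }

# Each episode resets the environment from a fresh sample, so the agent
# trains on a distribution of situations instead of a single fixed layout.
state = CounterattackScenario().sample_initial_state()
```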