Lumos: Learning Agents with Unified Data, Modular Design, and Open-Source LLMs
We introduce Lumos, a novel framework for training language agents that
employs a unified data format and a modular architecture based on open-source
large language models (LLMs). Lumos consists of three distinct modules:
planning, grounding, and execution. The planning module breaks down a task into
a series of high-level, tool-agnostic subgoals, which are then made specific by
the grounding module through a set of low-level actions. These actions are
subsequently executed by the execution module, utilizing a range of
off-the-shelf tools and APIs. In order to train these modules effectively,
high-quality annotations of subgoals and actions were collected and are made
available for fine-tuning open-source LLMs for various tasks such as complex
question answering, web tasks, and math problems. Leveraging this unified data
and modular design, Lumos not only achieves comparable or superior performance
to current, state-of-the-art agents, but also exhibits several key advantages:
(1) Lumos surpasses GPT-4/3.5-based agents in complex question answering and
web tasks, while equalling the performance of significantly larger LLM agents
on math tasks; (2) Lumos outperforms open-source agents created through
conventional training methods and those using chain-of-thought training; and
(3) Lumos is capable of effectively generalizing to unseen interactive tasks,
outperforming larger LLM-based agents and even exceeding the performance of
specialized agents.
Comment: Project website: https://allenai.github.io/lumos
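The three-module flow described in the Lumos abstract (plan, then ground, then execute) can be illustrated with a minimal sketch. Every name below (plan, ground, execute, TOOLS, run_agent) is hypothetical and not taken from the Lumos codebase. In Lumos the planner and grounder are fine-tuned open-source LLMs, whereas here they are hard-coded stubs that only show how data moves between modules.

```python
# Hypothetical sketch of a plan -> ground -> execute pipeline.
# The planner and grounder are stubs standing in for fine-tuned LLMs.
from typing import Callable, Dict, List, Tuple

Action = Tuple[str, str]  # (tool name, tool argument)


def plan(task: str) -> List[str]:
    """Stub planner: return high-level, tool-agnostic subgoals."""
    return [f"Find the facts needed for: {task}", "Compute the final answer"]


def ground(subgoal: str) -> List[Action]:
    """Stub grounder: turn one subgoal into low-level tool calls."""
    if subgoal.startswith("Find"):
        return [("search", subgoal)]
    return [("calculator", "6 * 7")]


def calculator(expr: str) -> str:
    """Toy tool that only handles 'a * b' with integer operands."""
    a, op, b = expr.split()
    if op != "*":
        raise ValueError("this toy calculator only multiplies")
    return str(int(a) * int(b))


# Assumed tool registry; a real system would wrap off-the-shelf tools and APIs.
TOOLS: Dict[str, Callable[[str], str]] = {
    "search": lambda query: f"<results for '{query}'>",
    "calculator": calculator,
}


def execute(action: Action) -> str:
    """Execution module: dispatch an action to the matching tool."""
    name, arg = action
    return TOOLS[name](arg)


def run_agent(task: str) -> List[str]:
    """Run every grounded action of every subgoal, in order."""
    results = []
    for subgoal in plan(task):
        for action in ground(subgoal):
            results.append(execute(action))
    return results
```

Keeping the planner tool-agnostic means that only the grounder and the tool registry would need to change when the available tools change.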
SwiftSage: A Generative Agent with Fast and Slow Thinking for Complex Interactive Tasks
We introduce SwiftSage, a novel agent framework inspired by the dual-process
theory of human cognition, designed to excel in action planning for complex
interactive reasoning tasks. SwiftSage integrates the strengths of behavior
cloning and prompting large language models (LLMs) to enhance task completion
performance. The framework comprises two primary modules: the Swift module,
representing fast and intuitive thinking, and the Sage module, emulating
deliberate thought processes. The Swift module is a small encoder-decoder LM
fine-tuned on the oracle agent's action trajectories, while the Sage module
employs LLMs such as GPT-4 for subgoal planning and grounding. We develop a
heuristic method to harmoniously integrate the two modules, resulting in a more
efficient and robust problem-solving process. In 30 tasks from the ScienceWorld
benchmark, SwiftSage significantly outperforms other methods such as SayCan,
ReAct, and Reflexion, demonstrating its effectiveness in solving complex
real-world tasks.
Comment: Project website: https://yuchenlin.xyz/swiftsage
Personalized Soups: Personalized Large Language Model Alignment via Post-hoc Parameter Merging
While Reinforcement Learning from Human Feedback (RLHF) aligns Large Language
Models (LLMs) with general, aggregate human preferences, it is suboptimal for
learning diverse, individual perspectives. In this work, we study the
Reinforcement Learning from Personalized Human Feedback (RLPHF) problem, wherein LLMs are
aligned to multiple (sometimes conflicting) preferences by modeling alignment
as a Multi-Objective Reinforcement Learning (MORL) problem. Compared to strong
single-objective baselines, we show that we can achieve personalized alignment
by decomposing preferences into multiple dimensions. These dimensions are
defined based on personalizations that are declared as desirable by the user.
In this work, we show that they can be efficiently trained independently in a
distributed manner and combined effectively post-hoc through parameter merging.
The code is available at https://github.com/joeljang/RLPHF.
Comment: Preprint
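As a rough illustration of post-hoc parameter merging, the sketch below takes a weighted average of the parameters of independently trained models. The abstract does not state the exact merging rule, so the weighted average, the function name merge_parameters, and the toy parameter dictionaries are all assumptions made for illustration.

```python
# Hypothetical sketch: merge independently trained models by a weighted
# average of their parameters. The averaging rule is an assumption.
from typing import Dict, List, Sequence

Params = Dict[str, List[float]]  # parameter name -> flat list of values


def merge_parameters(models: Sequence[Params], weights: Sequence[float]) -> Params:
    """Return the weighted average of several parameter sets."""
    if not models or len(models) != len(weights):
        raise ValueError("need one weight per model")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    norm = [w / total for w in weights]  # normalize so the weights sum to 1

    merged: Params = {}
    for name, values in models[0].items():
        merged[name] = [
            sum(w * model[name][i] for w, model in zip(norm, models))
            for i in range(len(values))
        ]
    return merged


# Toy "models", each trained for one preference dimension.
concise = {"w": [1.0, 0.0]}
friendly = {"w": [0.0, 1.0]}
blend = merge_parameters([concise, friendly], [1.0, 1.0])
```

Because merging happens after training, a new preference combination only requires choosing new weights, not retraining.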
Scaling and statistics of bottom-up synthesized armchair graphene nanoribbon transistors
Bottom-up assembled nanomaterials and nanostructures enable the study of rich and unprecedented quantum-related and mesoscopic transport phenomena. However, it can be difficult to quantify the correlations between the geometrical or structural parameters obtained from advanced microscopy and the measured electrical characteristics once the nanomaterials are made into macroscopic devices. Here, we propose a strategy to connect the nanomaterial morphologies and the device performance through a Monte Carlo device model and apply it to understand the scaling trends of bottom-up synthesized armchair graphene nanoribbon (GNR) transistors. A new nanofabrication process is developed for GNR transistors with channel length down to 7 nm. The impacts of the GNR spatial distributions and the device geometries on the device performance are investigated systematically through comparison of experimental data with the model. Through this study, challenges and opportunities of transistor technologies based on bottom-up synthesized GNRs are pinpointed, paving the way to the further improvement of the GNR device performance for future transistor technology nodes.
Inference-Time Policy Adapters (IPA): Tailoring Extreme-Scale LMs without Fine-tuning
Large language models excel at a variety of language tasks when prompted with
examples or instructions. Yet controlling these models through prompting alone
is limited. Tailoring language models through fine-tuning (e.g., via
reinforcement learning) can be effective, but it is expensive and requires
model access.
We propose Inference-time Policy Adapters (IPA), a method that efficiently
tailors a language model such as GPT-3 without fine-tuning it. IPA guides a
large base model during decoding time through a lightweight policy adapter
trained to optimize an arbitrary user objective with reinforcement learning.
On five challenging text generation tasks, such as toxicity reduction and
open-domain generation, IPA consistently brings significant improvements over
off-the-shelf language models. It outperforms competitive baseline methods,
sometimes even including expensive fine-tuning. In particular, tailoring GPT-2
with IPA can outperform GPT-3, while tailoring GPT-3 with IPA brings a major
performance boost over GPT-3 (and sometimes even over GPT-4). Our promising
results highlight the potential of IPA as a lightweight alternative for
tailoring extreme-scale language models.
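One plausible way to guide a frozen base model at decoding time is to add a small adapter's per-token scores to the base model's logits and renormalize. The abstract does not specify how IPA combines the two models, so this additive rule, the function names, and the toy vocabulary are assumptions used only to make the idea concrete.

```python
# Hypothetical sketch: steer a frozen base model at decoding time by adding
# adapter logits to base logits. The combination rule is an assumption.
import math
from typing import Dict


def softmax(logits: Dict[str, float]) -> Dict[str, float]:
    """Convert token logits into a probability distribution."""
    peak = max(logits.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(val - peak) for tok, val in logits.items()}
    norm = sum(exps.values())
    return {tok: e / norm for tok, e in exps.items()}


def adapted_distribution(
    base_logits: Dict[str, float], adapter_logits: Dict[str, float]
) -> Dict[str, float]:
    """Next-token distribution after the adapter shifts the base logits."""
    combined = {
        tok: val + adapter_logits.get(tok, 0.0)  # tokens the adapter skips stay unchanged
        for tok, val in base_logits.items()
    }
    return softmax(combined)


# Toy example: the base model prefers "rude"; the adapter penalizes it.
base = {"nice": 0.0, "rude": 1.0}
adapter = {"rude": -5.0}
probs = adapted_distribution(base, adapter)
```

Only the adapter would be trained in such a setup. The base model's parameters are never updated, which is why access to its output distribution is enough.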
Faith and Fate: Limits of Transformers on Compositionality
Transformer large language models (LLMs) have sparked admiration for their
exceptional performance on tasks that demand intricate multi-step reasoning.
Yet, these models simultaneously show failures on surprisingly trivial
problems. This raises the question: Are these errors incidental, or do they
signal more substantial limitations? In an attempt to demystify Transformers,
we investigate the limits of these models across three representative
compositional tasks -- multi-digit multiplication, logic grid puzzles, and a
classic dynamic programming problem. These tasks require breaking problems down
into sub-steps and synthesizing these steps into a precise answer. We formulate
compositional tasks as computation graphs to systematically quantify the level
of complexity, and break down reasoning steps into intermediate sub-procedures.
Our empirical findings suggest that Transformers solve compositional tasks by
reducing multi-step compositional reasoning into linearized subgraph matching,
without necessarily developing systematic problem-solving skills. To round off
our empirical study, we provide theoretical arguments on abstract multi-step
reasoning problems that highlight how Transformers' performance will rapidly
decay with increased task complexity.
Comment: 10 pages + appendix (21 pages
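To make the idea of a computation graph concrete, the sketch below builds a simplified graph for long multiplication and measures its size and depth. It is an illustration only: the node naming, the folding of carries into place-value-weighted products, and the helper names are assumptions, not the paper's actual graph construction.

```python
# Hypothetical sketch: long multiplication as a computation graph.
# Carries are folded into place-value-weighted products for brevity.
from typing import Dict, List, Tuple

Parents = Dict[str, List[str]]  # node -> nodes it is computed from
Values = Dict[str, int]         # node -> value computed at that node


def long_multiplication_graph(x: int, y: int) -> Tuple[Parents, Values]:
    """Build the graph for x * y (non-negative integers)."""
    xd = [int(c) for c in reversed(str(x))]  # least significant digit first
    yd = [int(c) for c in reversed(str(y))]
    parents: Parents = {}
    values: Values = {}

    for i, d in enumerate(xd):  # input nodes for digits of x
        parents[f"x{i}"], values[f"x{i}"] = [], d
    for j, d in enumerate(yd):  # input nodes for digits of y
        parents[f"y{j}"], values[f"y{j}"] = [], d

    for j in range(len(yd)):
        for i in range(len(xd)):  # one product node per digit pair
            node = f"p{i}_{j}"
            parents[node] = [f"x{i}", f"y{j}"]
            values[node] = xd[i] * yd[j] * 10 ** (i + j)
        row = f"r{j}"  # partial product for digit j of y
        parents[row] = [f"p{i}_{j}" for i in range(len(xd))]
        values[row] = sum(values[p] for p in parents[row])

    parents["out"] = [f"r{j}" for j in range(len(yd))]
    values["out"] = sum(values[r] for r in parents["out"])
    return parents, values


def depth(parents: Parents, node: str) -> int:
    """Length of the longest path from any input node to this node."""
    if not parents[node]:
        return 0
    return 1 + max(depth(parents, p) for p in parents[node])


graph, vals = long_multiplication_graph(12, 34)
```

In this simplified graph the node count grows with the product of the digit counts, which gives one crude measure of how task complexity scales with input length.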