Building agents using large language models (LLMs) to control computers is an
emerging research field, where the agent perceives computer states and performs
actions to accomplish complex tasks. Previous computer agents have demonstrated
the benefits of in-context learning (ICL); however, their performance is
hindered by several issues. First, the limited context length of LLMs and
complex computer states restrict the number of exemplars, as a single webpage
can consume the entire context. Second, the exemplars in current methods, such
as high-level plans and multi-choice questions, cannot represent complete
trajectories, leading to suboptimal performance in tasks that require many
steps or repeated actions. Third, existing computer agents rely on
task-specific exemplars and overlook the similarity among tasks, resulting in
poor generalization to novel tasks. To address these challenges, we introduce
Synapse, featuring three key components: i) state abstraction, which filters
out task-irrelevant information from raw states, allowing more exemplars within
the limited context, ii) trajectory-as-exemplar prompting, which prompts the
LLM with complete trajectories of the abstracted states and actions for
improved multi-step decision-making, and iii) exemplar memory, which stores the
embeddings of exemplars and retrieves them via similarity search for
generalization to novel tasks. We evaluate Synapse on MiniWoB++, a standard
task suite, and Mind2Web, a real-world website benchmark. In MiniWoB++, Synapse
achieves a 99.2% average success rate (a 10% relative improvement) across 64
tasks using demonstrations from only 48 tasks. Notably, Synapse is the first
ICL method to solve the book-flight task in MiniWoB++. Synapse also exhibits a
53% relative improvement in average step success rate over the previous
state-of-the-art prompting scheme on Mind2Web.

Comment: 22 pages, 7 figures
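The exemplar memory described in component iii) can be illustrated with a minimal sketch: store each demonstration trajectory alongside an embedding of its task description, then retrieve the most similar exemplars for a new task by cosine similarity. The `Exemplar` fields, the toy 2-D embeddings, and the `retrieve` interface below are illustrative assumptions, not the paper's actual implementation (which would use a real embedding model and vector index).

```python
import math
from dataclasses import dataclass, field

@dataclass
class Exemplar:
    task: str                 # task description
    trajectory: list          # (abstracted_state, action) pairs
    embedding: list           # embedding of the task description (assumed precomputed)

def cosine(a, b):
    # cosine similarity between two vectors; 0.0 for zero vectors
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

@dataclass
class ExemplarMemory:
    exemplars: list = field(default_factory=list)

    def add(self, exemplar):
        self.exemplars.append(exemplar)

    def retrieve(self, query_embedding, k=3):
        # similarity search: rank stored exemplars by cosine similarity
        # to the query embedding and return the top k
        ranked = sorted(self.exemplars,
                        key=lambda e: cosine(e.embedding, query_embedding),
                        reverse=True)
        return ranked[:k]

# Hypothetical usage: two stored demonstrations, one query for a novel task
mem = ExemplarMemory()
mem.add(Exemplar("click-button", [("state1", "click")], [1.0, 0.0]))
mem.add(Exemplar("book-flight", [("state2", "type")], [0.0, 1.0]))
top = mem.retrieve([0.9, 0.1], k=1)  # nearest neighbor of the query embedding
```

The retrieved trajectories would then be placed in the prompt as complete exemplars, which is what distinguishes this scheme from task-specific, plan-only prompting.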