Large Language Models (LLMs) have emerged as promising agents for web
navigation tasks, interpreting objectives and interacting with web pages.
However, the efficiency of spliced prompts for such tasks remains
underexplored. We introduce AllTogether, a standardized prompt template that
enhances task context representation, thereby improving LLMs' performance in
HTML-based web navigation. We evaluate the efficacy of this approach through
prompt learning and instruction finetuning based on open-source Llama-2 and
API-accessible GPT models. Our results reveal that models like GPT-4 outperform
smaller models in web navigation tasks. Additionally, we find that the lengths
of the HTML snippet and the history trajectory significantly influence performance, and
prior step-by-step instructions prove less effective than real-time
environmental feedback. Overall, we believe our work provides valuable insights
for future research in LLM-driven web agents.

Comment: Included wrong information in comment. Should be 7 pages and not
published yet.