471 research outputs found

    Synapse: Trajectory-as-Exemplar Prompting with Memory for Computer Control

    Building agents using large language models (LLMs) to control computers is an emerging research field, where the agent perceives computer states and performs actions to accomplish complex tasks. Previous computer agents have demonstrated the benefits of in-context learning (ICL); however, their performance is hindered by several issues. First, the limited context length of LLMs and complex computer states restrict the number of exemplars, as a single webpage can consume the entire context. Second, the exemplars in current methods, such as high-level plans and multi-choice questions, cannot represent complete trajectories, leading to suboptimal performance in tasks that require many steps or repeated actions. Third, existing computer agents rely on task-specific exemplars and overlook the similarity among tasks, resulting in poor generalization to novel tasks. To address these challenges, we introduce Synapse, featuring three key components: i) state abstraction, which filters out task-irrelevant information from raw states, allowing more exemplars within the limited context, ii) trajectory-as-exemplar prompting, which prompts the LLM with complete trajectories of the abstracted states and actions for improved multi-step decision-making, and iii) exemplar memory, which stores the embeddings of exemplars and retrieves them via similarity search for generalization to novel tasks. We evaluate Synapse on MiniWoB++, a standard task suite, and Mind2Web, a real-world website benchmark. In MiniWoB++, Synapse achieves a 99.2% average success rate (a 10% relative improvement) across 64 tasks using demonstrations from only 48 tasks. Notably, Synapse is the first ICL method to solve the book-flight task in MiniWoB++. Synapse also exhibits a 53% relative improvement in average step success rate over the previous state-of-the-art prompting scheme in Mind2Web. Comment: 22 pages, 7 figures.
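
    The exemplar-memory idea in the abstract above (store embeddings of exemplars, retrieve by similarity search) can be illustrated with a minimal sketch. This is not the Synapse implementation: the `ExemplarMemory` class, the `embed` and `cosine` helpers, the bag-of-words "embedding", and the example tasks are all hypothetical stand-ins chosen so the snippet runs with the standard library alone. A real system would presumably use a learned embedding model and a vector index.

```python
import math
from collections import Counter


def embed(text):
    """Toy bag-of-words vector (assumption: stands in for a learned embedding)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ExemplarMemory:
    """Hypothetical store mapping task descriptions to stored trajectories."""

    def __init__(self):
        self.entries = []  # list of (embedding, trajectory) pairs

    def add(self, task_description, trajectory):
        self.entries.append((embed(task_description), trajectory))

    def retrieve(self, query, k=1):
        """Return the k trajectories whose task descriptions best match the query."""
        q = embed(query)
        ranked = sorted(self.entries, key=lambda e: cosine(q, e[0]), reverse=True)
        return [trajectory for _, trajectory in ranked[:k]]


# Illustrative data only; these tasks and actions are made up.
memory = ExemplarMemory()
memory.add("book a flight from A to B",
           ["type origin", "type destination", "click search"])
memory.add("send an email to a contact",
           ["click compose", "type recipient", "click send"])

# A new task retrieves the most similar stored trajectory as its exemplar.
result = memory.retrieve("book a one-way flight", k=1)
```

    The retrieved trajectories would then be placed in the prompt as exemplars for the new task; how they are formatted there is outside the scope of this sketch.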

    From Keyword Search to Exploration: How Result Visualization Aids Discovery on the Web

    A key to the Web's success is the power of search. The elegant way in which search results are returned is usually remarkably effective. However, for exploratory search in which users need to learn, discover, and understand novel or complex topics, there is substantial room for improvement. Human-computer interaction researchers and web browser designers have developed novel strategies to improve Web search by enabling users to conveniently visualize, manipulate, and organize their Web search results. This monograph offers fresh ways to think about search-related cognitive processes and describes innovative design approaches to browsers and related tools. For instance, while keyword search presents users with results for specific information (e.g., what is the capital of Peru), other methods may let users see and explore the contexts of their requests for information (related or previous work, conflicting information), or the properties that associate groups of information assets (group legal decisions by lead attorney). We also consider both the traditional and novel ways in which these strategies have been evaluated. From our review of cognitive processes, browser design, and evaluations, we reflect on the future opportunities and new paradigms for exploring and interacting with Web search results.

    A hypertext system that learns from user feedback

    Retrieving specific information from large amounts of documentation is not an easy task. It could be facilitated if information relevant in the current problem solving context could be automatically supplied to the user. As a first step towards this goal, we have developed an intelligent hypertext system called CID (Computer Integrated Documentation). Besides providing a hypertext interface for browsing large documents, the CID system automatically acquires and reuses the context in which previous searches were appropriate. This mechanism utilizes on-line user information requirements and relevance feedback either to reinforce current indexing in case of success or to generate new knowledge in case of failure. Thus, the user continually augments and refines the intelligence of the retrieval system. This allows the CID system to provide helpful responses, based on previous usage of the documentation, and to improve its performance over time. We successfully tested the CID system with users of the Space Station Freedom requirements documents. We are currently extending CID to other application domains (Space Shuttle operations documents, airplane maintenance manuals, and on-line training). We are also exploring the potential commercialization of this technique.

    A Real-World WebAgent with Planning, Long Context Understanding, and Program Synthesis

    Pre-trained large language models (LLMs) have recently achieved better generalization and sample efficiency in autonomous web navigation. However, the performance on real-world websites has still suffered from (1) open domainness, (2) limited context length, and (3) lack of inductive bias on HTML. We introduce WebAgent, an LLM-driven agent that can complete the tasks on real websites following natural language instructions. WebAgent plans ahead by decomposing instructions into canonical sub-instructions, summarizes long HTML documents into task-relevant snippets, and acts on websites via generated Python programs from those. We design WebAgent with Flan-U-PaLM, for grounded code generation, and HTML-T5, new pre-trained LLMs for long HTML documents using local and global attention mechanisms and a mixture of long-span denoising objectives, for planning and summarization. We empirically demonstrate that our recipe improves the success on a real website by over 50%, and that HTML-T5 is the best model to solve HTML-based tasks, achieving 14.9% higher success rate than prior SoTA on the MiniWoB web navigation benchmark and better accuracy on offline task planning evaluation.

    Understanding HTML with Large Language Models

    Large language models (LLMs) have shown exceptional performance on a variety of natural language tasks. Yet, their capabilities for HTML understanding -- i.e., parsing the raw HTML of a webpage, with applications to automation of web-based tasks, crawling, and browser-assisted retrieval -- have not been fully explored. We contribute HTML understanding models (fine-tuned LLMs) and an in-depth analysis of their capabilities under three tasks: (i) Semantic Classification of HTML elements, (ii) Description Generation for HTML inputs, and (iii) Autonomous Web Navigation of HTML pages. While previous work has developed dedicated architectures and training procedures for HTML understanding, we show that LLMs pretrained on standard natural language corpora transfer remarkably well to HTML understanding tasks. For instance, fine-tuned LLMs are 12% more accurate at semantic classification compared to models trained exclusively on the task dataset. Moreover, when fine-tuned on data from the MiniWoB benchmark, LLMs successfully complete 50% more tasks using 192x less data compared to the previous best supervised model. Out of the LLMs we evaluate, we show evidence that T5-based models are ideal due to their bidirectional encoder-decoder architecture. To promote further research on LLMs for HTML understanding, we create and open-source a large-scale HTML dataset distilled and auto-labeled from CommonCrawl

    Visual Occam: High level visualization and design of process networks

    With networks, multiprocessors, and multi-threaded systems becoming more common in our world, it is increasingly evident that concurrent programming is not something to be ignored or marginalized, even though many takes on concurrency (mainly by means of monitors or shared resources) have proven to be difficult to deal with on large scales. Thankfully, a good deal of work has already been done to combat this, through CSP, occam, and other such derivatives, to produce a scalable process-oriented paradigm. Still, it is cumbersome to attempt to deal with the intricacies of such communicating networks down to every detail; if, instead, it were possible to manage communicating elements on a higher level, it would be far more practical to design large-scale networks of processes. As such, Visual Occam has been designed to automate some of the inner workings of occam to allow any user (novice or otherwise) to create complex networks of communicating processes through easy-to-understand user interactions and interfaces. Taking a number of cues from digital circuit design software and modern integrated development environments, it is possible to select components (both predefined and arbitrarily complex user-created systems) from a library of objects, hook them together in a network, and produce compilable code without having to worry about how or why the chosen components perform their function. Since any of these components may themselves be networks of processes, it becomes trivial to construct large systems that would otherwise be unwieldy to put together by hand. The end result? A high-level, easy-to-understand, visual abstraction of those concurrent networks previously so frustrating to develop.

    Index to 1984 NASA Tech Briefs, volume 9, numbers 1-4

    Short announcements of new technology derived from the R&D activities of NASA are presented. These briefs emphasize information considered likely to be transferable across industrial, regional, or disciplinary lines and are issued to encourage commercial application. This index for 1984 Tech Briefs contains abstracts and four indexes: subject, personal author, originating center, and Tech Brief Number. The following areas are covered: electronic components and circuits, electronic systems, physical sciences, materials, life sciences, mechanics, machinery, fabrication technology, and mathematics and information sciences.

    MyBookStore-eshopping for books

    Master of Science. Department of Computing and Information Sciences. Daniel A. Andresen. The Web is a shopper's paradise boasting every kind of product imaginable, plus many more that are almost unimaginable. People find it easy and secure to shop online these days, which saves time and puts more options at their fingertips. Based on this comes MyBookStore, a web application designed exclusively to cater to the needs of students purchasing books online. The primary focus of this application is to make it easier for the user to search for a particular book and to navigate the website. A search engine has been designed in this application which filters the products based on various user criteria. Searching and viewing the details about a book is available. The application also has an administrator side through which the administrator can update the website with new products, remove any of the available products, and add new categories, subcategories and products, along with updating the shipping status of orders placed. This section is primarily responsible for user account maintenance, product maintenance, and order maintenance. The major emphasis of this application is to build interactive search techniques that simplify user needs and provide the specific products the user requires.