6,625 research outputs found

    A Real-World WebAgent with Planning, Long Context Understanding, and Program Synthesis

    Pre-trained large language models (LLMs) have recently achieved better generalization and sample efficiency in autonomous web navigation. However, performance on real-world websites has still suffered from (1) open domainness, (2) limited context length, and (3) lack of inductive bias on HTML. We introduce WebAgent, an LLM-driven agent that can complete tasks on real websites by following natural language instructions. WebAgent plans ahead by decomposing instructions into canonical sub-instructions, summarizes long HTML documents into task-relevant snippets, and acts on websites via Python programs generated from those sub-instructions and snippets. We design WebAgent with Flan-U-PaLM, for grounded code generation, and HTML-T5, a new pre-trained LLM for long HTML documents that uses local and global attention mechanisms and a mixture of long-span denoising objectives, for planning and summarization. We empirically demonstrate that our recipe improves the success rate on a real website by over 50%, and that HTML-T5 is the best model for solving HTML-based tasks, achieving a 14.9% higher success rate than the prior SoTA on the MiniWoB web navigation benchmark and better accuracy on offline task planning evaluation.

    Augmented Behavioral Annotation Tools, with Application to Multimodal Datasets and Models: A Systematic Review

    Annotation tools are an essential component in the creation of datasets for machine learning purposes. Annotation tools have evolved greatly since the turn of the century, and now commonly include collaborative features to divide labor efficiently, as well as automation employed to amplify human efforts. Recent developments in machine learning models, such as Transformers, allow for training upon very large and sophisticated multimodal datasets and enable generalization across domains of knowledge. These models also herald an increasing emphasis on prompt engineering to provide qualitative fine-tuning of the model itself, adding a novel emerging layer of direct machine learning annotation. These capabilities enable machine intelligence to recognize, predict, and emulate human behavior with much greater accuracy and nuance, a noted shortfall of which has contributed to algorithmic injustice in previous techniques. However, the scale and complexity of training data required for multimodal models present engineering challenges. Best practices for conducting annotation for large multimodal models in the safest and most ethical, yet efficient, manner have not been established. This paper presents a systematic literature review of crowd and machine learning augmented behavioral annotation methods to distill practices that may have value in multimodal implementations, cross-correlated across disciplines. Research questions were defined to provide an overview of the evolution of augmented behavioral annotation tools in the past, in relation to the present state of the art. (Contains five figures and four tables)

    Bildung in der digitalen Transformation

    The coronavirus pandemic, and the temporary shift from in-person to distance teaching that it forced, have greatly accelerated the digitalization of education. Both the positive and the negative aspects of this development became more visible than before. While universities managed the switch with comparatively little friction, that friction was far more apparent at schools. Despite all the adversity, one thing seems clear: the temporary changes will have lasting effects. A complete return to the status quo ante is hardly conceivable any more. Against this background, two questions define the dual nature of the theme of the 29th Annual Conference of the Gesellschaft für Medien in der Wissenschaft (GMW). First: how does education 'work' in the digital transformation currently under way, and what challenges does it face? And second: is education itself possibly undergoing transformation? This conference volume brings together contributions on these and further questions.

    CAMEL: Communicative Agents for "Mind" Exploration of Large Scale Language Model Society

    The rapid advancement of conversational and chat-based language models has led to remarkable progress in complex task-solving. However, their success heavily relies on human input to guide the conversation, which can be challenging and time-consuming. This paper explores the potential of building scalable techniques to facilitate autonomous cooperation among communicative agents and provide insight into their "cognitive" processes. To address the challenges of achieving autonomous cooperation, we propose a novel communicative agent framework named role-playing. Our approach involves using inception prompting to guide chat agents toward task completion while maintaining consistency with human intentions. We showcase how role-playing can be used to generate conversational data for studying the behaviors and capabilities of chat agents, providing a valuable resource for investigating conversational language models. Our contributions include introducing a novel communicative agent framework, offering a scalable approach for studying the cooperative behaviors and capabilities of multi-agent systems, and open-sourcing our library to support research on communicative agents and beyond. The GitHub repository of this project is made publicly available on: https://github.com/lightaime/camel

    On Transforming Reinforcement Learning by Transformer: The Development Trajectory

    The Transformer, originally devised for natural language processing, has also achieved significant success in computer vision. Thanks to its strong expressive power, researchers are investigating ways to deploy transformers in reinforcement learning (RL), and transformer-based models have shown their potential on representative RL benchmarks. In this paper, we collect and dissect recent advances in transforming RL with transformers (transformer-based RL, or TRL) in order to explore its development trajectory and future trends. We group existing developments into two categories, architecture enhancement and trajectory optimization, and examine the main applications of TRL in robotic manipulation, text-based games, navigation, and autonomous driving. Architecture-enhancement methods consider how to apply the powerful transformer structure to RL problems under the traditional RL framework; they model agents and environments much more precisely than deep RL methods, but they are still limited by the inherent defects of traditional RL algorithms, such as bootstrapping and the "deadly triad". Trajectory-optimization methods treat RL problems as sequence modeling and train a joint state-action model over entire trajectories under the behavior cloning framework; they can extract policies from static datasets and make full use of the transformer's long-sequence modeling capability. Given these advancements, extensions and challenges in TRL are reviewed and proposals for future directions are discussed. We hope that this survey can provide a detailed introduction to TRL and motivate future research in this rapidly developing field. Comment: 26 pages

    Handbuch kommunikationswissenschaftliche Erinnerungsforschung

    One-sided differentiability: a challenge for computer algebra systems

    Computer Algebra Systems (CASs) are extremely powerful and widely used digital tools. For differentiation, CASs include a command that computes the derivative of functions of one variable (and also the partial derivatives of functions of several variables). In this article we focus on real-valued functions of one real variable. Since CASs usually compute the derivative of a real-valued function as a whole, the value of the computed derivative at points where the left derivative and the right derivative differ (which we call conflicting points) should be something like "undefined", although this is not always the case: the output can differ strongly depending on the chosen CAS. We have analysed and compared how some well-known CASs behave when differentiating at the conflicting points of five different functions chosen by the authors. Finally, the ability of CASs to calculate one-sided limits makes it possible to compute the result directly in these cumbersome cases using the formal definition of the one-sided derivative, which we have also analysed and compared for the selected CASs. Regarding teaching, this is an important issue, as it is a topic of Secondary Education and the use of CASs as an auxiliary digital tool for teaching mathematics is now very common.
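
The idea of a conflicting point can be illustrated with a small numerical sketch. This is not the article's method: the article works with symbolic one-sided limits in CASs, whereas the hypothetical helper `one_sided_derivative` below only approximates the one-sided difference quotient with a small step. The test function |x| at 0 is an assumed example and not necessarily one of the five functions the authors chose.

```python
def one_sided_derivative(f, a, side, h=1e-6):
    """Approximate the left or right derivative of f at a.

    Uses the difference quotient (f(a + step) - f(a)) / step with a small
    positive step for the right derivative and a small negative step for
    the left derivative. This is a numerical stand-in for the limit, not a
    symbolic computation.
    """
    if side not in ("left", "right"):
        raise ValueError("side must be 'left' or 'right'")
    step = h if side == "right" else -h
    return (f(a + step) - f(a)) / step


def is_conflicting_point(f, a, tol=1e-3):
    """Report whether the two one-sided estimates at a disagree by more than tol."""
    left = one_sided_derivative(f, a, "left")
    right = one_sided_derivative(f, a, "right")
    return abs(left - right) > tol


# Assumed example: |x| has left derivative -1 and right derivative +1 at 0.
left_slope = one_sided_derivative(abs, 0.0, "left")
right_slope = one_sided_derivative(abs, 0.0, "right")
print(left_slope, right_slope, is_conflicting_point(abs, 0.0))
```

For |x| the two estimates come out with opposite signs, so 0 is flagged as a conflicting point, while a smooth function such as x squared is not flagged at the same point.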

    Towards a Burning Method: how might the contemporary performer build on the legacy of Grotowski's Total Act?

    Towards a Burning Method is a practical-theoretical investigation of Jerzy Grotowski's ideas, particularly the Total Act from the Theatre of Productions period, in the context of contemporary theatre practice. The study employs a practice of confrontation with Grotowski's ideas rather than identification, and text and practice are inseparable, interpenetrating each other. The method of confrontation (which is one of Grotowski's ideas) means, in the context of my research, a dialogue which through its dynamics creates a performative situation, stimulates the deepening of knowledge, and seeks its own answers. The resulting performance Burning Method - Four Lectures on Conditional Love is also integral to the theoretical reflections, both representing and containing Grotowski's ideas and my response to them. Other practical activities include video notes of process, classes with students and workshops. One of the important results of the research is to broaden the understanding of what performance is and what kinds of forms it takes in relation to the Total Act. The thesis consists of four chapters. Chapter One introduces the cultural-historical context of the emergence of the Total Act, while the following chapters are a guide to understanding the Act in practice. The final chapter is entirely devoted to my practical activities through a dialogue on contemporary performance, its lineage of origin and projective reflections for the future. A reference point or dialogue partner on contemporary theatre is mainly Grotowski's book Towards a Poor Theatre but also Artaud's The Theatre and its Double. The research has helped me understand what performance work might result from the continuation of a distinctly masculine Polish Romantic tradition, and what new possibilities emerge from my perspective as a female performer and theatre-maker.
In reading Grotowski (and confronting it in practice) I discover the potential for creative freedom, with the rigour of attention to the execution of ideas. At the same time, working with video projections I verify my thinking about the performer's body and space by finding another dimension of expression in the image in relation to the audience. I also believe that actor training from this tradition has no temporal, cultural or other limitations, as its essence is the search for the living impulse in the performer's body. Summing up the research is an opening for further explorations revealing the potential and currency of Grotowski's ideas for today.