7 research outputs found

    ScriptWorld: Text Based Environment For Learning Procedural Knowledge

    Full text link
    Text-based games provide a framework for developing natural language understanding and commonsense knowledge about the world in reinforcement learning based agents. Existing text-based environments often rely on fictional situations and characters to create a gaming framework and are far from real-world scenarios. In this paper, we introduce ScriptWorld: a text-based environment for teaching agents about real-world daily chores and hence imparting commonsense knowledge. To the best of our knowledge, it is the first interactive text-based gaming framework that consists of daily real-world human activities designed using scripts dataset. We provide gaming environments for 10 daily activities and perform a detailed analysis of the proposed environment. We develop RL-based baseline models/agents to play the games in Scriptworld. To understand the role of language models in such environments, we leverage features obtained from pre-trained language models in the RL agents. Our experiments show that prior knowledge obtained from a pre-trained language model helps to solve real-world text-based gaming environments. We release the environment via Github: https://github.com/Exploration-Lab/ScriptWorldComment: Accepted at IJCAI 2023, 26 Pages (7 main + 19 for appendix

    Learn to automate GUI tasks from demonstration

    Get PDF
    This thesis explores and extends Computer Vision applications in the context of Graphical User Interface (GUI) environments to address the challenges of Programming by Demonstration (PbD). The challenges are explored in PbD which could be addressed through innovations in Computer Vision, when GUIs are treated as an application domain, analogous to automotive or factory settings. Existing PbD systems were restricted by domain applications or special application interfaces. Although they use the term Demonstration, the systems did not actually see what the user performs. Rather they listen to the demonstrations through internal communications via operating system. Machine Vision and Human in the Loop Machine Learning are used to circumvent many restrictions, allowing the PbD system to watch the demonstration like another human observer would. This thesis will demonstrate that our prototype PbD systems allow non-programmer users to easily create their own automation scripts for their repetitive and looping tasks. Our PbD systems take their input from sequences of screenshots, and sometimes from easily available keyboard and mouse sniffer software. It will also be shown that the problem of inconsistent human demonstration can be remedied with our proposed Human in the Loop Computer Vision techniques. Lastly, the problem is extended to learn from demonstration videos. Due to the sheer complexity of computer desktop GUI manipulation videos, attention is focused on the domain of video game environments. The initial studies illustrate that it is possible to teach a computer to watch gameplay videos and to estimate what buttons the user pressed

    Decompositional Semantics for Events, Participants, and Scripts in Text

    Get PDF
    This thesis presents a sequence of practical and conceptual developments in decompositional meaning representations for events, participants, and scripts in text under the framework of Universal Decompositional Semantics (UDS) (White et al., 2016a). Part I of the thesis focuses on the semantic representation of individual events and their participants. Chapter 3 examines the feasibility of deriving semantic representations of events from dependency syntax; we demonstrate that predicate- argument structure may be extracted from syntax, but other desirable semantic attributes are not directly discernible. Accordingly, we present in Chapters 4 and 5 state of the art models for predicting these semantic attributes from text. Chapter 4 presents a model for predicting semantic proto-role labels (SPRL), attributes of participants in events based on Dowty’s seminal theory of thematic proto-roles (Dowty, 1991). In Chapter 5 we present a model of event factuality prediction (EFP), the task of determining whether an event mentioned in text happened (according to the meaning of the text). Both chapters include extensive experiments on multi-task learning for improving performance on each semantic prediction task. Taken together, Chapters 3, 4, and 5 represent the development of individual components of a UDS parsing pipeline. In Part II of the thesis, we shift to modeling sequences of events, or scripts (Schank and Abelson, 1977). Chapter 7 presents a case study in script induction using a collection of restaurant narratives from an online blog to learn the canonical “Restaurant Script.” In Chapter 8, we introduce a simple discriminative neural model for script induction based on narrative chains (Chambers and Jurafsky, 2008) that outperforms prior methods. Because much existing work on narrative chains employs semantically impoverished representations of events, Chapter 9 draws on the contributions of Part I to learn narrative chains with semantically rich, decompositional event representations. Finally, in Chapter 10, we observe that corpus based approaches to script induction resemble the task of language modeling. We explore the broader question of the relationship between language modeling and acquisition of common-sense knowledge, and introduce an approach that combines language modeling and light human supervision to construct datasets for common-sense inference
    corecore