22,564 research outputs found

    Evaluating Visual Conversational Agents via Cooperative Human-AI Games

    Full text link
    As AI continues to advance, human-AI teams are inevitable. However, progress in AI is routinely measured in isolation, without a human in the loop. It is crucial to benchmark progress in AI, not just in isolation, but also in terms of how it translates to helping humans perform certain tasks, i.e., the performance of human-AI teams. In this work, we design a cooperative game - GuessWhich - to measure human-AI team performance in the specific context of the AI being a visual conversational agent. GuessWhich involves live interaction between the human and the AI. The AI, which we call ALICE, is provided an image which is unseen by the human. Following a brief description of the image, the human questions ALICE about this secret image to identify it from a fixed pool of images. We measure performance of the human-ALICE team by the number of guesses it takes the human to correctly identify the secret image after a fixed number of dialog rounds with ALICE. We compare performance of the human-ALICE teams for two versions of ALICE. Our human studies suggest a counterintuitive trend - that while AI literature shows that one version outperforms the other when paired with an AI questioner bot, we find that this improvement in AI-AI performance does not translate to improved human-AI performance. This suggests a mismatch between benchmarking of AI in isolation and in the context of human-AI teams.Comment: HCOMP 201

    No Grice: Computers that Lie, Deceive and Conceal

    Get PDF
    In the future our daily life interactions with other people, with computers, robots and smart environments will be recorded and interpreted by computers or embedded intelligence in environments, furniture, robots, displays, and wearables. These sensors record our activities, our behavior, and our interactions. Fusion of such information and reasoning about such information makes it possible, using computational models of human behavior and activities, to provide context- and person-aware interpretations of human behavior and activities, including determination of attitudes, moods, and emotions. Sensors include cameras, microphones, eye trackers, position and proximity sensors, tactile or smell sensors, et cetera. Sensors can be embedded in an environment, but they can also move around, for example, if they are part of a mobile social robot or if they are part of devices we carry around or are embedded in our clothes or body. \ud \ud Our daily life behavior and daily life interactions are recorded and interpreted. How can we use such environments and how can such environments use us? Do we always want to cooperate with these environments; do these environments always want to cooperate with us? In this paper we argue that there are many reasons that users or rather human partners of these environments do want to keep information about their intentions and their emotions hidden from these smart environments. On the other hand, their artificial interaction partner may have similar reasons to not give away all information they have or to treat their human partner as an opponent rather than someone that has to be supported by smart technology.\ud \ud This will be elaborated in this paper. We will survey examples of human-computer interactions where there is not necessarily a goal to be explicit about intentions and feelings. In subsequent sections we will look at (1) the computer as a conversational partner, (2) the computer as a butler or diary companion, (3) the computer as a teacher or a trainer, acting in a virtual training environment (a serious game), (4) sports applications (that are not necessarily different from serious game or education environments), and games and entertainment applications

    Can You Explain That? Lucid Explanations Help Human-AI Collaborative Image Retrieval

    Full text link
    While there have been many proposals on making AI algorithms explainable, few have attempted to evaluate the impact of AI-generated explanations on human performance in conducting human-AI collaborative tasks. To bridge the gap, we propose a Twenty-Questions style collaborative image retrieval game, Explanation-assisted Guess Which (ExAG), as a method of evaluating the efficacy of explanations (visual evidence or textual justification) in the context of Visual Question Answering (VQA). In our proposed ExAG, a human user needs to guess a secret image picked by the VQA agent by asking natural language questions to it. We show that overall, when AI explains its answers, users succeed more often in guessing the secret image correctly. Notably, a few correct explanations can readily improve human performance when VQA answers are mostly incorrect as compared to no-explanation games. Furthermore, we also show that while explanations rated as "helpful" significantly improve human performance, "incorrect" and "unhelpful" explanations can degrade performance as compared to no-explanation games. Our experiments, therefore, demonstrate that ExAG is an effective means to evaluate the efficacy of AI-generated explanations on a human-AI collaborative task.Comment: 2019 AAAI Conference on Human Computation and Crowdsourcin

    The Hanabi Challenge: A New Frontier for AI Research

    Full text link
    From the early days of computing, games have been important testbeds for studying how well machines can do sophisticated decision making. In recent years, machine learning has made dramatic advances with artificial agents reaching superhuman performance in challenge domains like Go, Atari, and some variants of poker. As with their predecessors of chess, checkers, and backgammon, these game domains have driven research by providing sophisticated yet well-defined challenges for artificial intelligence practitioners. We continue this tradition by proposing the game of Hanabi as a new challenge domain with novel problems that arise from its combination of purely cooperative gameplay with two to five players and imperfect information. In particular, we argue that Hanabi elevates reasoning about the beliefs and intentions of other agents to the foreground. We believe developing novel techniques for such theory of mind reasoning will not only be crucial for success in Hanabi, but also in broader collaborative efforts, especially those with human partners. To facilitate future research, we introduce the open-source Hanabi Learning Environment, propose an experimental framework for the research community to evaluate algorithmic advances, and assess the performance of current state-of-the-art techniques.Comment: 32 pages, 5 figures, In Press (Artificial Intelligence
    • ā€¦
    corecore