2,812 research outputs found

    GuessWhat?! Visual object discovery through multi-modal dialogue

    Get PDF
    We introduce GuessWhat?!, a two-player guessing game as a testbed for research on the interplay of computer vision and dialogue systems. The goal of the game is to locate an unknown object in a rich image scene by asking a sequence of questions. Higher-level image understanding, like spatial reasoning and language grounding, is required to solve the proposed task. Our key contribution is the collection of a large-scale dataset consisting of 150K human-played games with a total of 800K visual question-answer pairs on 66K images. We explain our design decisions in collecting the dataset and introduce the oracle and questioner tasks that are associated with the two players of the game. We prototyped deep learning models to establish initial baselines of the introduced tasks.Comment: 23 pages; CVPR 2017 submission; see https://guesswhat.a

    Regret-Minimizing Double Oracle for Extensive-Form Games

    Full text link
    By incorporating regret minimization, double oracle methods have demonstrated rapid convergence to Nash Equilibrium (NE) in normal-form games and extensive-form games, through algorithms such as online double oracle (ODO) and extensive-form double oracle (XDO), respectively. In this study, we further examine the theoretical convergence rate and sample complexity of such regret minimization-based double oracle methods, utilizing a unified framework called Regret-Minimizing Double Oracle. Based on this framework, we extend ODO to extensive-form games and determine its sample complexity. Moreover, we demonstrate that the sample complexity of XDO can be exponential in the number of information sets S|S|, owing to the exponentially decaying stopping threshold of restricted games. To solve this problem, we propose the Periodic Double Oracle (PDO) method, which has the lowest sample complexity among regret minimization-based double oracle methods, being only polynomial in S|S|. Empirical evaluations on multiple poker and board games show that PDO achieves significantly faster convergence than previous double oracle algorithms and reaches a competitive level with state-of-the-art regret minimization methods.Comment: Accepted at ICML, 202

    Sketching-out virtual humans: A smart interface for human modelling and animation

    Get PDF
    In this paper, we present a fast and intuitive interface for sketching out 3D virtual humans and animation. The user draws stick figure key frames first and chooses one for “fleshing-out” with freehand body contours. The system automatically constructs a plausible 3D skin surface from the rendered figure, and maps it onto the posed stick figures to produce the 3D character animation. A “creative model-based method” is developed, which performs a human perception process to generate 3D human bodies of various body sizes, shapes and fat distributions. In this approach, an anatomical 3D generic model has been created with three distinct layers: skeleton, fat tissue, and skin. It can be transformed sequentially through rigid morphing, fatness morphing, and surface fitting to match the original 2D sketch. An auto-beautification function is also offered to regularise the 3D asymmetrical bodies from users’ imperfect figure sketches. Our current system delivers character animation in various forms, including articulated figure animation, 3D mesh model animation, 2D contour figure animation, and even 2D NPR animation with personalised drawing styles. The system has been formally tested by various users on Tablet PC. After minimal training, even a beginner can create vivid virtual humans and animate them within minutes

    Synthesizing Training Data for Object Detection in Indoor Scenes

    Full text link
    Detection of objects in cluttered indoor environments is one of the key enabling functionalities for service robots. The best performing object detection approaches in computer vision exploit deep Convolutional Neural Networks (CNN) to simultaneously detect and categorize the objects of interest in cluttered scenes. Training of such models typically requires large amounts of annotated training data which is time consuming and costly to obtain. In this work we explore the ability of using synthetically generated composite images for training state-of-the-art object detectors, especially for object instance detection. We superimpose 2D images of textured object models into images of real environments at variety of locations and scales. Our experiments evaluate different superimposition strategies ranging from purely image-based blending all the way to depth and semantics informed positioning of the object models into real scenes. We demonstrate the effectiveness of these object detector training strategies on two publicly available datasets, the GMU-Kitchens and the Washington RGB-D Scenes v2. As one observation, augmenting some hand-labeled training data with synthetic examples carefully composed onto scenes yields object detectors with comparable performance to using much more hand-labeled data. Broadly, this work charts new opportunities for training detectors for new objects by exploiting existing object model repositories in either a purely automatic fashion or with only a very small number of human-annotated examples.Comment: Added more experiments and link to project webpag

    Crowdsourcing in Computer Vision

    Full text link
    Computer vision systems require large amounts of manually annotated data to properly learn challenging visual concepts. Crowdsourcing platforms offer an inexpensive method to capture human knowledge and understanding, for a vast number of visual perception tasks. In this survey, we describe the types of annotations computer vision researchers have collected using crowdsourcing, and how they have ensured that this data is of high quality while annotation effort is minimized. We begin by discussing data collection on both classic (e.g., object recognition) and recent (e.g., visual story-telling) vision tasks. We then summarize key design decisions for creating effective data collection interfaces and workflows, and present strategies for intelligently selecting the most important data instances to annotate. Finally, we conclude with some thoughts on the future of crowdsourcing in computer vision.Comment: A 69-page meta review of the field, Foundations and Trends in Computer Graphics and Vision, 201
    corecore