4,284 research outputs found

    HoME: a Household Multimodal Environment

    Full text link
    We introduce HoME: a Household Multimodal Environment for artificial agents to learn from vision, audio, semantics, physics, and interaction with objects and other agents, all within a realistic context. HoME integrates over 45,000 diverse 3D house layouts based on the SUNCG dataset, a scale which may facilitate learning, generalization, and transfer. HoME is an open-source, OpenAI Gym-compatible platform extensible to tasks in reinforcement learning, language grounding, sound-based navigation, robotics, multi-agent learning, and more. We hope HoME better enables artificial agents to learn as humans do: in an interactive, multimodal, and richly contextualized setting.Comment: Presented at NIPS 2017's Visually-Grounded Interaction and Language Worksho

    Diagrams Based on Structured Object Perception

    Get PDF
    Most diagrams, particularly those used in software engineering, are line drawings consisting of nodes drawn as rectangles or circles, and edges drawn as lines linking them. In the present paper we review some of the literature on human perception to develop guidelines for effective diagram drawing. Particular attention is paid to structural object recognition theory. According to this theory as objects are perceived they are decomposed into 3D set of primitives called geons, together with the skeleton structure connecting them. We present a set of guidelines for drawing variations on node-link diagrams using geon-like primitives, and provide some examples. Results from three experiments are reported that evaluate 3D geon diagrams in comparison with 2D UML (Unified Modeling Language) diagrams. The first experiment measures the time and accuracy for a subject to recognize a sub-structure of a diagram represented either using geon primitives or UML primitives. The second and third experiments compare the accuracy of recalling geon vs. UML diagrams. The results of these experiments show that geon diagrams can be visually analyzed more rapidly, with fewer errors, and can be remembered better in comparison with equivalent UML diagrams

    Perception, cognition, and action in hyperspaces: implications on brain plasticity, learning, and cognition

    Get PDF
    We live in a three-dimensional (3D) spatial world; however, our retinas receive a pair of 2D projections of the 3D environment. By using multiple cues, such as disparity, motion parallax, perspective, our brains can construct 3D representations of the world from the 2D projections on our retinas. These 3D representations underlie our 3D perceptions of the world and are mapped into our motor systems to generate accurate sensorimotor behaviors. Three-dimensional perceptual and sensorimotor capabilities emerge during development: the physiology of the growing baby changes hence necessitating an ongoing re-adaptation of the mapping between 3D sensory representations and the motor coordinates. This adaptation continues in adulthood and is quite general to successfully deal with joint-space changes (longer arms due to growth), skull and eye size changes (and still being able of accurate eye movements), etc. A fundamental question is whether our brains are inherently limited to 3D representations of the environment because we are living in a 3D world, or alternatively, our brains may have the inherent capability and plasticity of representing arbitrary dimensions; however, 3D representations emerge from the fact that our development and learning take place in a 3D world. Here, we review research related to inherent capabilities and limitations of brain plasticity in terms of its spatial representations and discuss whether with appropriate training, humans can build perceptual and sensorimotor representations of spatial 4D environments, and how the presence or lack of ability of a solid and direct 4D representation can reveal underlying neural representations of space.Published versio

    When Computer Vision Gazes at Cognition

    Get PDF
    Joint attention is a core, early-developing form of social interaction. It is based on our ability to discriminate the third party objects that other people are looking at. While it has been shown that people can accurately determine whether another person is looking directly at them versus away, little is known about human ability to discriminate a third person gaze directed towards objects that are further away, especially in unconstraint cases where the looker can move her head and eyes freely. In this paper we address this question by jointly exploring human psychophysics and a cognitively motivated computer vision model, which can detect the 3D direction of gaze from 2D face images. The synthesis of behavioral study and computer vision yields several interesting discoveries. (1) Human accuracy of discriminating targets 8{\deg}-10{\deg} of visual angle apart is around 40% in a free looking gaze task; (2) The ability to interpret gaze of different lookers vary dramatically; (3) This variance can be captured by the computational model; (4) Human outperforms the current model significantly. These results collectively show that the acuity of human joint attention is indeed highly impressive, given the computational challenge of the natural looking task. Moreover, the gap between human and model performance, as well as the variability of gaze interpretation across different lookers, require further understanding of the underlying mechanisms utilized by humans for this challenging task.Comment: Tao Gao and Daniel Harari contributed equally to this wor
    corecore