Search CORE

4,284 research outputs found

HoME: a Household Multimodal Environment

Author: Anand Ankesh
Brodeur Simon
Celotti Luca
Courville Aaron
Golemo Florian
Larochelle Hugo
Perez Ethan
Rouat Jean
Strub Florian
Publication venue
Publication date: 29/11/2017
Field of study

We introduce HoME: a Household Multimodal Environment for artificial agents to learn from vision, audio, semantics, physics, and interaction with objects and other agents, all within a realistic context. HoME integrates over 45,000 diverse 3D house layouts based on the SUNCG dataset, a scale which may facilitate learning, generalization, and transfer. HoME is an open-source, OpenAI Gym-compatible platform extensible to tasks in reinforcement learning, language grounding, sound-based navigation, robotics, multi-agent learning, and more. We hope HoME better enables artificial agents to learn as humans do: in an interactive, multimodal, and richly contextualized setting.Comment: Presented at NIPS 2017's Visually-Grounded Interaction and Language Worksho

arXiv.org e-Print Archive

INRIA a CCSD electronic archive server

HAL Descartes

Hal-Diderot

Diagrams Based on Structured Object Perception

Author: Irani Pourang
Ware Colin
Publication venue: University of New Hampshire Scholars\u27 Repository
Publication date: 01/05/2000
Field of study

Most diagrams, particularly those used in software engineering, are line drawings consisting of nodes drawn as rectangles or circles, and edges drawn as lines linking them. In the present paper we review some of the literature on human perception to develop guidelines for effective diagram drawing. Particular attention is paid to structural object recognition theory. According to this theory as objects are perceived they are decomposed into 3D set of primitives called geons, together with the skeleton structure connecting them. We present a set of guidelines for drawing variations on node-link diagrams using geon-like primitives, and provide some examples. Results from three experiments are reported that evaluate 3D geon diagrams in comparison with 2D UML (Unified Modeling Language) diagrams. The first experiment measures the time and accuracy for a subject to recognize a sub-structure of a diagram represented either using geon primitives or UML primitives. The second and third experiments compare the accuracy of recalling geon vs. UML diagrams. The results of these experiments show that geon diagrams can be visually analyzed more rapidly, with fewer errors, and can be remembered better in comparison with equivalent UML diagrams

UNH Scholars' Repository

Perception, cognition, and action in hyperspaces: implications on brain plasticity, learning, and cognition

Author: Ogmen Haluk
Shibata Kazuhisa
Yazdanbakhsh Arash
Publication venue: 'Frontiers Media SA'
Publication date: 22/01/2020
Field of study

We live in a three-dimensional (3D) spatial world; however, our retinas receive a pair of 2D projections of the 3D environment. By using multiple cues, such as disparity, motion parallax, perspective, our brains can construct 3D representations of the world from the 2D projections on our retinas. These 3D representations underlie our 3D perceptions of the world and are mapped into our motor systems to generate accurate sensorimotor behaviors. Three-dimensional perceptual and sensorimotor capabilities emerge during development: the physiology of the growing baby changes hence necessitating an ongoing re-adaptation of the mapping between 3D sensory representations and the motor coordinates. This adaptation continues in adulthood and is quite general to successfully deal with joint-space changes (longer arms due to growth), skull and eye size changes (and still being able of accurate eye movements), etc. A fundamental question is whether our brains are inherently limited to 3D representations of the environment because we are living in a 3D world, or alternatively, our brains may have the inherent capability and plasticity of representing arbitrary dimensions; however, 3D representations emerge from the fact that our development and learning take place in a 3D world. Here, we review research related to inherent capabilities and limitations of brain plasticity in terms of its spatial representations and discuss whether with appropriate training, humans can build perceptual and sensorimotor representations of spatial 4D environments, and how the presence or lack of ability of a solid and direct 4D representation can reveal underlying neural representations of space.Published versio

Boston University Institutional Repository (OpenBU)

Recommended from our members

Towards Rapid Generation and Visualisation of Large 3D Urban Landscapes for Mobile Device Navigation

Author: Baker S.
Brujic-Okretic V.
Gatzidis C.
Liarokapis F.
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2008
Field of study

In this paper a procedural 3D modelling solution for mobile devices is presented based on scripting algorithms allowing for both the automatic and also semi-automatic creation of photorealistic quality virtual urban content. The combination of aerial images, GIS data, 2D ground maps and terrestrial photographs as input data coupled with a user-friendly customized interface permits the automatic and interactive generation of large-scale, accurate, georeferenced and fully-textured 3D virtual city content, content that can be specially optimized for use with mobile devices but also with navigational tasks in mind. Furthermore, a user-centred mobile virtual reality (VR) visualisation and interaction tool operating on PDAs (Personal Digital Assistants) for pedestrian navigation is also discussed. Via this engine, the import and display of various navigational file formats (2D and 3D) is supported, including a comprehensive front-end user-friendly graphical user interface providing immersive virtual 3D navigation

City Research Online

Kingston University Research Repository

When Computer Vision Gazes at Cognition

Author: Gao Tao
Harari Daniel
Tenenbaum Joshua
Ullman Shimon
Publication venue
Publication date: 08/12/2014
Field of study

Joint attention is a core, early-developing form of social interaction. It is based on our ability to discriminate the third party objects that other people are looking at. While it has been shown that people can accurately determine whether another person is looking directly at them versus away, little is known about human ability to discriminate a third person gaze directed towards objects that are further away, especially in unconstraint cases where the looker can move her head and eyes freely. In this paper we address this question by jointly exploring human psychophysics and a cognitively motivated computer vision model, which can detect the 3D direction of gaze from 2D face images. The synthesis of behavioral study and computer vision yields several interesting discoveries. (1) Human accuracy of discriminating targets 8{\deg}-10{\deg} of visual angle apart is around 40% in a free looking gaze task; (2) The ability to interpret gaze of different lookers vary dramatically; (3) This variance can be captured by the computational model; (4) Human outperforms the current model significantly. These results collectively show that the acuity of human joint attention is indeed highly impressive, given the computational challenge of the natural looking task. Moreover, the gap between human and model performance, as well as the variability of gaze interpretation across different lookers, require further understanding of the underlying mechanisms utilized by humans for this challenging task.Comment: Tao Gao and Daniel Harari contributed equally to this wor

arXiv.org e-Print Archive

DSpace@MIT