HoME: a Household Multimodal Environment
We introduce HoME: a Household Multimodal Environment for artificial agents to learn from vision, audio, semantics, physics, and interaction with objects and other agents, all within a realistic context. HoME integrates over 45,000 diverse 3D house layouts based on the SUNCG dataset, a scale which may facilitate learning, generalization, and transfer. HoME is an open-source, OpenAI Gym-compatible platform extensible to tasks in reinforcement learning, language grounding, sound-based navigation, robotics, multi-agent learning, and more. We hope HoME better enables artificial agents to learn as humans do: in an interactive, multimodal, and richly contextualized setting. Comment: Presented at NIPS 2017's Visually-Grounded Interaction and Language Workshop.
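"OpenAI Gym-compatible" refers to the Gym convention in which an environment exposes `reset()`, returning an observation, and `step(action)`, returning `(observation, reward, done, info)` in the classic Gym API. The sketch below is a hypothetical toy environment that only illustrates that interface shape with a multimodal observation dictionary. `ToyHouseEnv`, its action names, observation keys, and reward scheme are all assumptions for illustration and are not HoME's actual API.

```python
from typing import Any, Callable, Dict, Tuple

Observation = Dict[str, Any]


class ToyHouseEnv:
    """Hypothetical toy environment following the classic Gym interface.

    reset() -> observation
    step(action) -> (observation, reward, done, info)
    Illustrates the interface shape only; this is NOT HoME's API.
    """

    ACTIONS = ("forward", "turn_left", "turn_right")

    def __init__(self, goal_distance: int = 3) -> None:
        self.goal_distance = goal_distance
        self.position = 0

    def _observe(self) -> Observation:
        # One dummy entry per modality (vision, audio, semantics).
        return {
            "rgb": [[0, 0, 0]],                # placeholder 1x1 RGB image
            "audio": [0.0, 0.0, 0.0, 0.0],     # placeholder audio frame
            "semantics": {"room": "kitchen"},  # placeholder semantic label
            "position": self.position,
        }

    def reset(self) -> Observation:
        self.position = 0
        return self._observe()

    def step(self, action: int) -> Tuple[Observation, float, bool, Dict[str, Any]]:
        name = self.ACTIONS[action]
        if name == "forward":
            self.position += 1
        done = self.position >= self.goal_distance
        reward = 1.0 if done else 0.0  # sparse reward on reaching the goal
        return self._observe(), reward, done, {"action": name}


def run_episode(env: ToyHouseEnv, policy: Callable[[Observation], int],
                max_steps: int = 100) -> float:
    """Standard agent-environment loop; returns the episode's total reward."""
    obs = env.reset()
    total = 0.0
    for _ in range(max_steps):
        obs, reward, done, _info = env.step(policy(obs))
        total += reward
        if done:
            break
    return total
```

For example, `run_episode(ToyHouseEnv(), lambda obs: 0)` always moves forward and returns `1.0` after three steps. Because any agent written against this loop only relies on `reset` and `step`, the same agent code can in principle be pointed at any environment that honors the convention.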
Diagrams Based on Structured Object Perception
Most diagrams, particularly those used in software engineering, are line drawings consisting of nodes drawn as rectangles or circles and edges drawn as lines linking them. In this paper we review some of the literature on human perception to develop guidelines for effective diagram drawing, paying particular attention to structural object recognition theory. According to this theory, objects are perceived by decomposing them into a set of 3D primitives called geons, together with the skeleton structure connecting them. We present a set of guidelines for drawing variations on node-link diagrams using geon-like primitives, and provide some examples. We also report results from three experiments that evaluate 3D geon diagrams in comparison with 2D UML (Unified Modeling Language) diagrams. The first experiment measures the time and accuracy with which a subject recognizes a sub-structure of a diagram represented using either geon primitives or UML primitives. The second and third experiments compare the accuracy of recalling geon vs. UML diagrams. The results show that geon diagrams can be visually analyzed more rapidly and with fewer errors, and are remembered better, than equivalent UML diagrams.
Perception, cognition, and action in hyperspaces: implications on brain plasticity, learning, and cognition
We live in a three-dimensional (3D) spatial world; however, our retinas receive a pair of 2D projections of the 3D environment. By using multiple cues, such as disparity, motion parallax, and perspective, our brains can construct 3D representations of the world from the 2D projections on our retinas. These 3D representations underlie our 3D perceptions of the world and are mapped into our motor systems to generate accurate sensorimotor behaviors. Three-dimensional perceptual and sensorimotor capabilities emerge during development: the physiology of the growing baby changes, necessitating ongoing re-adaptation of the mapping between 3D sensory representations and motor coordinates. This adaptation continues into adulthood and is general enough to cope with changes in joint space (longer arms due to growth), changes in skull and eye size (while still producing accurate eye movements), and so on. A fundamental question is whether our brains are inherently limited to 3D representations of the environment because we live in a 3D world, or whether they have the inherent capability and plasticity to represent arbitrary dimensions, with 3D representations emerging only because our development and learning take place in a 3D world. Here, we review research on the inherent capabilities and limitations of brain plasticity with respect to spatial representations, and discuss whether, with appropriate training, humans can build perceptual and sensorimotor representations of spatial 4D environments, and how the presence or absence of a solid, direct 4D representation can reveal the underlying neural representations of space. Published version
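The abstract's starting point, that each retina receives a 2D projection of a 3D world, generalizes directly to a 4D world projected onto a 3D "retina". The sketch below is an idealized pinhole projection under stated assumptions (viewer at the origin looking along the last axis, image hyperplane at distance `focal`); it is not a model from the review. `perspective_project` is a hypothetical helper name. It also shows why depth must be recovered from additional cues: points along the same line of sight project to the same image point.

```python
from typing import Sequence, Tuple


def perspective_project(point: Sequence[float], focal: float = 1.0) -> Tuple[float, ...]:
    """Idealized pinhole projection of an n-D point onto an (n-1)-D image.

    Assumes the viewer sits at the origin looking along the last axis and
    the image hyperplane lies at distance `focal`. Works for 3D -> 2D
    (x, y, z) -> (f*x/z, f*y/z) and equally for 4D -> 3D.
    """
    if len(point) < 2:
        raise ValueError("need at least 2 coordinates")
    depth = point[-1]
    if depth <= 0:
        raise ValueError("point must lie in front of the viewer (last coordinate > 0)")
    # Each remaining coordinate is scaled by focal / depth.
    return tuple(focal * c / depth for c in point[:-1])
```

Under these assumptions, `perspective_project((2.0, 4.0, 2.0))` gives `(1.0, 2.0)`, and so does `perspective_project((1.0, 2.0, 1.0))`: the depth coordinate is lost, which is the ambiguity that cues such as disparity and motion parallax help resolve.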
Towards Rapid Generation and Visualisation of Large 3D Urban Landscapes for Mobile Device Navigation
In this paper we present a procedural 3D modelling solution for mobile devices, based on scripting algorithms, that supports both automatic and semi-automatic creation of photorealistic virtual urban content. Combining aerial images, GIS data, 2D ground maps, and terrestrial photographs as input data with a user-friendly customized interface permits the automatic and interactive generation of large-scale, accurate, georeferenced, and fully textured 3D virtual city content, which can be specifically optimized for mobile devices and for navigational tasks. We also discuss a user-centred mobile virtual reality (VR) visualisation and interaction tool, running on PDAs (Personal Digital Assistants), for pedestrian navigation. This engine supports importing and displaying various 2D and 3D navigational file formats, and includes a comprehensive, user-friendly graphical front end providing immersive virtual 3D navigation.
When Computer Vision Gazes at Cognition
Joint attention is a core, early-developing form of social interaction. It is based on our ability to discriminate the third-party objects that other people are looking at. While it has been shown that people can accurately determine whether another person is looking directly at them or away, little is known about the human ability to discriminate a third person's gaze directed towards objects that are farther away, especially in unconstrained cases where the looker can move her head and eyes freely. In this paper we address this question by jointly exploring human psychophysics and a cognitively motivated computer vision model that can detect the 3D direction of gaze from 2D face images. The synthesis of behavioral study and computer vision yields several interesting findings: (1) human accuracy in discriminating targets 8°-10° of visual angle apart is around 40% in a free-looking gaze task; (2) the ability to interpret the gaze of different lookers varies dramatically; (3) this variance can be captured by the computational model; and (4) humans significantly outperform the current model. Together, these results show that the acuity of human joint attention is indeed highly impressive, given the computational challenge of the natural looking task. Moreover, the gap between human and model performance, as well as the variability of gaze interpretation across lookers, calls for further investigation of the mechanisms humans use for this challenging task. Comment: Tao Gao and Daniel Harari contributed equally to this work.
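To make "targets 8°-10° of visual angle apart" concrete, the sketch below computes the angle two targets subtend at a viewpoint, and the lateral spacing that produces a given angle for a symmetric target pair. It assumes the angle is measured at the looker's eye and uses simple 3D point coordinates; the function names are hypothetical and this is not the paper's model or experimental setup.

```python
import math
from typing import Sequence


def visual_angle_deg(eye: Sequence[float], a: Sequence[float], b: Sequence[float]) -> float:
    """Angle in degrees subtended at `eye` by target points `a` and `b`."""
    u = [ai - ei for ai, ei in zip(a, eye)]
    v = [bi - ei for bi, ei in zip(b, eye)]
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(x * x for x in v))
    # Clamp to [-1, 1] to guard against floating-point drift before acos.
    cos_theta = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.degrees(math.acos(cos_theta))


def separation_for_angle(distance: float, angle_deg: float) -> float:
    """Spacing of two targets placed symmetrically at `distance` that
    subtend `angle_deg` at the eye: s = 2 * d * tan(theta / 2)."""
    return 2.0 * distance * math.tan(math.radians(angle_deg) / 2.0)
```

For instance, at an assumed viewing distance of 2 m, `separation_for_angle(2.0, 9.0)` is roughly 0.31 m, so under this geometry the task asks observers to tell apart objects only about a third of a metre apart from another person's gaze alone.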