Active End-Effector Pose Selection for Tactile Object Recognition through Monte Carlo Tree Search
This paper considers the problem of active object recognition using touch
only. The focus is on adaptively selecting a sequence of wrist poses that
achieves accurate recognition by enclosure grasps. It seeks to minimize the
number of touches and maximize recognition confidence. The actions are
formulated as wrist poses relative to each other, making the algorithm
independent of absolute workspace coordinates. The optimal sequence is
approximated by Monte Carlo tree search. We demonstrate results in a physics
engine and on a real robot. In the physics engine, most object instances were
recognized in at most 16 grasps. On a real robot, our method recognized objects
in 2 to 9 grasps and outperformed a greedy baseline. Comment: Accepted to International Conference on Intelligent Robots and
Systems (IROS) 201
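The abstract above approximates the optimal grasp sequence with Monte Carlo tree search. As a hedged illustration of the standard UCT selection step only (not the authors' implementation), the sketch below scores candidate actions with UCB1 and backpropagates rollout rewards. The action labels, the reward values, and the exploration constant c = √2 are assumptions chosen for illustration; the paper's actual objective, which trades off the number of touches against recognition confidence, is not modeled.

```python
from __future__ import annotations

import math
from dataclasses import dataclass, field


@dataclass
class Node:
    """A search-tree node; `action` is a hypothetical relative wrist-pose label."""
    action: str
    visits: int = 0
    total_reward: float = 0.0
    children: list[Node] = field(default_factory=list)


def ucb1(child: Node, parent_visits: int, c: float = math.sqrt(2)) -> float:
    """UCB1: mean reward (exploitation) plus an exploration bonus."""
    if child.visits == 0:
        return math.inf  # untried actions are always selected first
    mean = child.total_reward / child.visits
    return mean + c * math.sqrt(math.log(parent_visits) / child.visits)


def select_child(node: Node, c: float = math.sqrt(2)) -> Node:
    """UCT selection: pick the child with the highest UCB1 score."""
    return max(node.children, key=lambda ch: ucb1(ch, node.visits, c))


def backpropagate(path: list[Node], reward: float) -> None:
    """Credit a simulated episode's reward to every node from root to leaf."""
    for n in path:
        n.visits += 1
        n.total_reward += reward


if __name__ == "__main__":
    # Hypothetical relative-pose actions with made-up statistics.
    root = Node("root", visits=10, children=[
        Node("rotate_wrist_+30deg", visits=5, total_reward=3.0),
        Node("translate_+5cm", visits=5, total_reward=4.0),
        Node("tilt_+15deg"),  # never tried yet
    ])
    print(select_child(root).action)  # tilt_+15deg (untried, score = inf)
```

Setting c = 0 turns the rule into purely greedy selection by mean reward; a larger c spends more simulations on rarely tried poses.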
Multimodality in VR: A survey
Virtual reality (VR) is rapidly growing, with the potential to change the way we create and consume content. In VR, users integrate the multimodal sensory information they receive to create a unified perception of the virtual world. In this survey, we review the body of work addressing multimodality in VR, its role and benefits in user experience, and applications across many disciplines that leverage multimodality. These works span several fields of research and demonstrate that multimodality plays a fundamental role in VR: it enhances the experience, improves overall performance, and yields unprecedented abilities in skill and knowledge transfer.
Learning efficient haptic shape exploration with a rigid tactile sensor array
Haptic exploration is a key skill for both robots and humans to discriminate
and handle unknown objects or to recognize familiar objects. Its active nature
is evident in humans who from early on reliably acquire sophisticated
sensory-motor capabilities for active exploratory touch and directed manual
exploration that associates surfaces and object properties with their spatial
locations. Robotics stands in stark contrast. In this field, the relative
lack of good real-world interaction models, along with very restricted sensors
and a scarcity of suitable training data for machine learning methods,
has so far left haptic exploration a largely underdeveloped skill. In the
present work, we connect recent advances in recurrent models of visual
attention with previous insights about the organisation of human haptic search
behavior, exploratory procedures, and haptic glances in a novel architecture
that learns a generative model of haptic exploration in a simulated
three-dimensional environment. The proposed algorithm simultaneously optimizes
the main perception-action loop components: feature extraction, integration of
features over time, and the control strategy, while continuously acquiring data
online. We train a multi-module neural network comprising a feature
extractor and a recurrent neural network module that stores and combines
sequential sensory data to aid pose control. The resulting haptic meta-controller for
the rigid tactile sensor array moving in a physics-driven
simulation environment, called the Haptic Attention Model, performs a sequence
of haptic glances and outputs the corresponding force measurements. The
method has been successfully tested with four different objects. It achieved
results close to while performing object contour exploration that has
been optimized for its own sensor morphology.
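The abstract describes a recurrent module that stores and combines sequential sensory data to aid pose control. The following is a minimal, untrained Elman-style recurrent forward pass in plain Python, sketching only how such a module can fold successive haptic glances into one state and emit a pose proposal after each. The layer sizes, the linear pose head, and the tanh cell are assumptions; this is not the Haptic Attention Model itself, which also learns a feature extractor and is trained in simulation.

```python
import math
import random


def matvec(W, v):
    """Dense matrix-vector product on nested lists."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def random_matrix(rng, rows, cols, scale=0.5):
    """Untrained weights drawn uniformly from [-scale, scale]."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def rnn_step(h, x, W_h, W_x, b):
    """Elman update: h_t = tanh(W_h h_{t-1} + W_x x_t + b)."""
    return [math.tanh(a + c + bi)
            for a, c, bi in zip(matvec(W_h, h), matvec(W_x, x), b)]


def run_glances(glances, hidden_size=8, pose_dim=3, seed=0):
    """Fold a sequence of force readings into one hidden state and emit a
    next-pose proposal after every glance. Weights are random and untrained."""
    rng = random.Random(seed)
    W_h = random_matrix(rng, hidden_size, hidden_size)
    W_x = random_matrix(rng, hidden_size, len(glances[0]))
    W_p = random_matrix(rng, pose_dim, hidden_size)  # hypothetical pose head
    b = [0.0] * hidden_size
    h = [0.0] * hidden_size
    poses = []
    for x in glances:
        h = rnn_step(h, x, W_h, W_x, b)  # integrate features over time
        poses.append(matvec(W_p, h))     # propose where to touch next
    return h, poses


if __name__ == "__main__":
    # Hypothetical 4-taxel force readings from three successive glances.
    readings = [[0.1, 0.0, 0.4, 0.2], [0.3, 0.1, 0.0, 0.0], [0.0, 0.5, 0.2, 0.1]]
    state, proposals = run_glances(readings)
    print(len(proposals), [round(v, 3) for v in proposals[-1]])
```

Because the hidden state is carried across glances, the last pose proposal depends on the whole touch history rather than on the most recent reading alone.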
Robot in the mirror: toward an embodied computational model of mirror self-recognition
Self-recognition or self-awareness is a capacity attributed typically only to
humans and few other species. The definitions of these concepts vary and little
is known about the mechanisms behind them. However, there is a Turing test-like
benchmark: the mirror self-recognition test, which consists of covertly putting a
mark on the face of the tested subject, placing her in front of a mirror, and
observing the reactions. In this work, first, we provide a mechanistic
decomposition, or process model, of what components are required to pass this
test. Based on these, we provide suggestions for empirical research. In
particular, in our view, the way infants or animals reach for the mark
should be studied in detail. Second, we develop a model to enable the humanoid
robot Nao to pass the test. The core of our technical contribution is learning
the appearance representation and visual novelty detection by means of learning
the generative model of the face with deep auto-encoders and exploiting the
prediction error. The mark is identified as a salient region on the face and
reaching action is triggered, relying on a previously learned mapping to arm
joint angles. The architecture is tested on two robots with completely
different faces. Comment: To appear in KI - Künstliche Intelligenz - German Journal of
Artificial Intelligence - Springe
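The abstract identifies the mark as a salient region of high prediction error under a learned generative model of the face. A minimal sketch of that final step, assuming a reconstruction is already available from some trained auto-encoder (not shown): compute the per-pixel squared error and return the centroid of pixels above a threshold. The threshold and the toy 5 × 5 "face" are hypothetical.

```python
def error_map(image, reconstruction):
    """Per-pixel squared prediction error between an input and its reconstruction."""
    return [[(p - q) ** 2 for p, q in zip(row_in, row_rec)]
            for row_in, row_rec in zip(image, reconstruction)]


def salient_region(image, reconstruction, threshold):
    """Centroid (row, col) of pixels whose error exceeds `threshold`, or None."""
    err = error_map(image, reconstruction)
    hits = [(r, c) for r, row in enumerate(err)
            for c, e in enumerate(row) if e > threshold]
    if not hits:
        return None  # nothing novel: the face matches the learned model
    n = len(hits)
    return (sum(r for r, _ in hits) / n, sum(c for _, c in hits) / n)


if __name__ == "__main__":
    # Hypothetical case: the auto-encoder reproduces the unmarked face,
    # and a two-pixel mark appears in the current camera image.
    reconstruction = [[0.5] * 5 for _ in range(5)]
    observed = [row[:] for row in reconstruction]
    observed[1][3] = observed[2][3] = 1.0
    print(salient_region(observed, reconstruction, threshold=0.1))  # (1.5, 3.0)
```

In the described system, the detected location would then feed the previously learned mapping to arm joint angles that triggers the reach; that mapping is not sketched here.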
Multimodality in VR: A Survey
Virtual reality has the potential to change the way we create and consume content in our everyday life. Entertainment, training, design and manufacturing, communication, and advertising are all applications that already benefit from this new medium reaching consumer level. VR is inherently different from traditional media: it offers a more immersive experience and can elicit a sense of presence through the place and plausibility illusions. It also gives users unprecedented capabilities to explore their environment, in contrast with traditional media. In VR, as in the real world, users integrate the multimodal sensory information they receive to create a unified perception of the virtual world. Therefore, the sensory cues available in a virtual environment can be leveraged to enhance the final experience. This may include increasing realism or the sense of presence; predicting or guiding the attention of the user through the experience; or increasing their performance if the experience involves the completion of certain tasks. In this state-of-the-art report, we survey the body of work addressing multimodality in virtual reality, its role, and its benefits in the final user experience. The works reviewed here thus encompass several fields of research, including computer graphics, human-computer interaction, and psychology and perception. Additionally, we give an overview of different applications that leverage multimodal input in areas such as medicine, training and education, or entertainment; we include works in which the integration of multiple sensory modalities yields significant improvements, demonstrating how multimodality can play a fundamental role in the way VR systems are designed and VR experiences are created and consumed.
Fusing Multimedia Data Into Dynamic Virtual Environments
In spite of the dramatic growth of virtual and augmented reality (VR and AR) technology, content creation for immersive and dynamic virtual environments remains a significant challenge. In this dissertation, we present our research in fusing multimedia data, including text, photos, panoramas, and multi-view videos, to create rich and compelling virtual environments.
First, we present Social Street View, which renders geo-tagged social media in its natural geo-spatial context provided by 360° panoramas. Our system takes into account visual saliency and uses maximal Poisson-disc placement with spatiotemporal filters to render social multimedia in an immersive setting. We also present a novel GPU-driven pipeline for saliency computation in 360° panoramas using spherical harmonics (SH). Our spherical residual model can be applied to virtual cinematography in 360° videos. We further present Geollery, a mixed-reality platform to render an interactive mirrored world in real time with three-dimensional (3D) buildings, user-generated content, and geo-tagged social media. Our user study has identified several use cases for these systems, including immersive social storytelling, experiencing the culture, and crowd-sourced tourism.
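The paragraph above mentions maximal Poisson-disc placement for laying out social media in 360° panoramas. A minimal sketch of the underlying sampling idea uses naive dart throwing on a flat rectangle: a candidate is kept only if it lies at least `radius` from every accepted sample, and sampling stops after many consecutive rejections, which only approximates a maximal set. The rectangle, radius, and stopping rule are assumptions; the saliency weighting, spherical geometry, and spatiotemporal filtering of Social Street View are not modeled.

```python
import math
import random


def poisson_disc_dart_throwing(width, height, radius, max_failures=2000, seed=0):
    """Naive dart throwing: accept a uniform random point only if it is at
    least `radius` away from all accepted points. Stopping after
    `max_failures` consecutive rejections approximates a maximal sampling."""
    rng = random.Random(seed)
    points = []
    failures = 0
    while failures < max_failures:
        p = (rng.uniform(0, width), rng.uniform(0, height))
        if all(math.dist(p, q) >= radius for q in points):
            points.append(p)
            failures = 0  # reset the rejection streak after every success
        else:
            failures += 1
    return points


if __name__ == "__main__":
    # Hypothetical 100 x 50 canvas; each point could anchor one media item.
    anchors = poisson_disc_dart_throwing(100.0, 50.0, radius=10.0)
    print(len(anchors), "non-overlapping anchors")
```

Dart throwing is the simplest correct Poisson-disc sampler but scales poorly; grid-accelerated variants are common when many items must be placed per frame.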
We next present Video Fields, a web-based interactive system to create, calibrate, and render dynamic videos overlaid on 3D scenes. Our system renders dynamic entities from multiple videos using early and deferred texture sampling. Video Fields can be used for immersive surveillance in virtual environments. Furthermore, we present the VRSurus and ARCrypt projects, which explore the applications of gesture recognition, haptic feedback, and visual cryptography for virtual and augmented reality.
Finally, we present our work on Montage4D, a real-time system for seamlessly fusing multi-view video textures with dynamic meshes. We use geodesics on meshes with view-dependent rendering to mitigate spatial occlusion seams while maintaining temporal consistency. Our experiments show significant enhancement in rendering quality, especially for salient regions such as faces. We believe that Social Street View, Geollery, Video Fields, and Montage4D will greatly facilitate several applications such as virtual tourism, immersive telepresence, and remote education.
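Montage4D is described as using geodesics on meshes to mitigate occlusion seams. As a hedged approximation, geodesic distance can be estimated with Dijkstra's algorithm over mesh edges (which overestimates true surface geodesics), and the distance from seam vertices can be turned into a blending weight that fades a texture's contribution near a seam. The linear falloff, the `seam_weight` helper, and the small strip mesh in the usage note are hypothetical illustrations, not the Montage4D pipeline.

```python
import heapq
import math


def edge_graph(vertices, triangles):
    """Adjacency map of a triangle mesh, with edges weighted by Euclidean length."""
    adj = {i: {} for i in range(len(vertices))}
    for a, b, c in triangles:
        for u, v in ((a, b), (b, c), (c, a)):
            w = math.dist(vertices[u], vertices[v])
            adj[u][v] = w
            adj[v][u] = w
    return adj


def geodesic_from(sources, adj):
    """Multi-source Dijkstra over mesh edges: approximate geodesic distance
    from the nearest source vertex (e.g. a vertex on an occlusion seam)."""
    dist = {v: math.inf for v in adj}
    heap = []
    for s in sources:
        dist[s] = 0.0
        heap.append((0.0, s))
    heapq.heapify(heap)
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale queue entry
        for v, w in adj[u].items():
            nd = d + w
            if nd < dist[v]:
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def seam_weight(dist, falloff):
    """Hypothetical blend weight: 0 on the seam, rising linearly to 1 at `falloff`."""
    return {v: min(1.0, d / falloff) for v, d in dist.items()}
```

For example, on a 2 × 1 strip of four triangles with the seam at vertex 0, the far corner ends up at edge-path distance 3.0 and receives full weight once `falloff` is 3.0 or less.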