Tactile mesh saliency
While the concept of visual saliency has been previously explored in the areas of mesh and image processing, saliency detection also applies to other sensory stimuli. In this paper, we explore the problem of tactile mesh saliency, where we define salient points on a virtual mesh as those that a human is more likely to grasp, press, or touch if the mesh were a real-world object. We solve the problem of taking as input a 3D mesh and computing the relative tactile saliency of every mesh vertex. Since it is difficult to manually define a tactile saliency measure, we introduce a crowdsourcing and learning framework. It is typically easier for humans to provide relative rankings of saliency between vertices than absolute values. We therefore collect crowdsourced data of such relative rankings and take a learning-to-rank approach. We develop a new formulation to combine deep learning and learning-to-rank methods to compute a tactile saliency measure. We demonstrate our framework with a variety of 3D meshes and various applications including material suggestion for rendering and fabricatio
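The learning-to-rank idea in this abstract can be illustrated with a minimal sketch. It assumes a linear scoring function over hypothetical per-vertex feature vectors and a pairwise logistic (RankNet-style) loss; the paper itself combines deep learning with learning-to-rank, so this is a stand-in for the general technique, not the authors' formulation. The names `train_pairwise_ranker` and `score` are illustrative.

```python
import math
import random


def score(w, x):
    """Linear saliency score for one feature vector (assumed stand-in for a deep net)."""
    return sum(wk * xk for wk, xk in zip(w, x))


def train_pairwise_ranker(features, pairs, epochs=200, lr=0.1, seed=0):
    """Fit weights from relative rankings.

    features: list of per-vertex feature vectors (hypothetical features).
    pairs: list of (i, j) meaning vertex i was judged more salient than vertex j.
    """
    rng = random.Random(seed)
    w = [0.0] * len(features[0])
    pairs = list(pairs)
    for _ in range(epochs):
        rng.shuffle(pairs)
        for i, j in pairs:
            diff = [a - b for a, b in zip(features[i], features[j])]
            margin = score(w, diff)
            # Clamp to keep exp() well inside floating-point range.
            margin = max(-30.0, min(30.0, margin))
            # Pairwise logistic loss: log(1 + exp(-margin)).
            # Its negative gradient with respect to w is sigmoid(-margin) * diff.
            g = 1.0 / (1.0 + math.exp(margin))
            w = [wk + lr * g * dk for wk, dk in zip(w, diff)]
    return w


# Toy data: three vertices, with crowd judgments 0 > 1, 1 > 2, 0 > 2.
toy_features = [[1.0, 0.0], [0.5, 0.2], [0.0, 1.0]]
toy_pairs = [(0, 1), (1, 2), (0, 2)]
toy_weights = train_pairwise_ranker(toy_features, toy_pairs)
toy_scores = [score(toy_weights, x) for x in toy_features]
```

Only the ordering of `toy_scores` is meaningful here, which matches the abstract's point that relative rankings, not absolute values, are what the crowd provides.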
AVEID: Automatic Video System for Measuring Engagement In Dementia
Engagement in dementia is typically measured using behavior observational
scales (BOS) that are tedious and involve intensive manual labor to annotate,
and are therefore not easily scalable. We propose AVEID, a low cost and
easy-to-use video-based engagement measurement tool to determine the engagement
level of a person with dementia (PwD) during digital interaction. We show that
the objective behavioral measures computed via AVEID correlate well with
subjective expert impressions for the popular MPES and OME BOS, confirming its
viability and effectiveness. Moreover, AVEID measures can be obtained for a
variety of engagement designs, thereby facilitating large-scale studies with
PwD populations
Digging Deeper into Egocentric Gaze Prediction
This paper digs deeper into factors that influence egocentric gaze. Instead
of training deep models for this purpose in a blind manner, we propose to
inspect factors that contribute to gaze guidance during daily tasks. Bottom-up
saliency and optical flow are assessed versus strong spatial prior baselines.
Task-specific cues such as vanishing point, manipulation point, and hand
regions are analyzed as representatives of top-down information. We also look
into the contribution of these factors by investigating a simple recurrent
neural model for egocentric gaze prediction. First, deep features are
extracted for all input video frames. Then, a gated recurrent unit is employed
to integrate information over time and to predict the next fixation. We also
propose an integrated model that combines the recurrent model with several
top-down and bottom-up cues. Extensive experiments over multiple datasets
reveal that (1) spatial biases are strong in egocentric videos, (2) bottom-up
saliency models perform poorly in predicting gaze and underperform spatial
biases, (3) deep features perform better compared to traditional features, (4)
as opposed to hand regions, the manipulation point is a strong influential cue
for gaze prediction, (5) combining the proposed recurrent model with bottom-up
cues, vanishing points and, in particular, manipulation point results in the
best gaze prediction accuracy over egocentric videos, (6) the knowledge
transfer works best for cases where the tasks or sequences are similar, and (7)
task and activity recognition can benefit from gaze prediction. Our findings
suggest that (1) there should be more emphasis on hand-object interaction and
(2) the egocentric vision community should consider larger datasets including
diverse stimuli and more subjects. Comment: presented at WACV 201
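The recurrent pipeline described in this abstract (per-frame features, a gated recurrent unit integrating over time, then a prediction of the next fixation) can be sketched as follows. This is only an illustration of the data flow under stated assumptions: weights are random rather than trained, biases are omitted, the per-frame features are hypothetical lists of floats, and the readout to normalized (x, y) coordinates is an assumed design. The class name `TinyGRUGaze` is illustrative.

```python
import math
import random


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class TinyGRUGaze:
    """Untrained GRU cell plus a linear readout to a 2D fixation (assumed design)."""

    def __init__(self, feat_dim, hidden_dim, seed=0):
        rng = random.Random(seed)
        in_dim = feat_dim + hidden_dim  # each gate sees the concatenation [x, h]
        self.wz = rand_matrix(hidden_dim, in_dim, rng)  # update gate
        self.wr = rand_matrix(hidden_dim, in_dim, rng)  # reset gate
        self.wh = rand_matrix(hidden_dim, in_dim, rng)  # candidate state
        self.wo = rand_matrix(2, hidden_dim, rng)       # readout to (x, y)
        self.hidden_dim = hidden_dim

    def step(self, x, h):
        xh = x + h
        z = [sigmoid(v) for v in matvec(self.wz, xh)]
        r = [sigmoid(v) for v in matvec(self.wr, xh)]
        gated = x + [ri * hi for ri, hi in zip(r, h)]
        cand = [math.tanh(v) for v in matvec(self.wh, gated)]
        # Blend the previous state and the candidate using the update gate.
        return [(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]

    def predict(self, frames):
        """frames: list of per-frame feature lists; returns normalized (x, y)."""
        h = [0.0] * self.hidden_dim
        for x in frames:
            h = self.step(x, h)
        gx, gy = (sigmoid(v) for v in matvec(self.wo, h))
        return gx, gy


demo_frames = [[0.1, 0.2, 0.3, 0.4] for _ in range(5)]
demo_gaze = TinyGRUGaze(feat_dim=4, hidden_dim=3).predict(demo_frames)
```

In a real system the frame features would come from a pretrained network and the weights would be learned from recorded fixations; neither is modeled here.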
Robot in the mirror: toward an embodied computational model of mirror self-recognition
Self-recognition or self-awareness is a capacity attributed typically only to
humans and a few other species. The definitions of these concepts vary and little
is known about the mechanisms behind them. However, there is a Turing test-like
benchmark: the mirror self-recognition test, which consists of covertly putting a
mark on the face of the tested subject, placing her in front of a mirror, and
observing the reactions. In this work, first, we provide a mechanistic
decomposition, or process model, of what components are required to pass this
test. Based on these, we provide suggestions for empirical research. In
particular, in our view, the way the infants or animals reach for the mark
should be studied in detail. Second, we develop a model to enable the humanoid
robot Nao to pass the test. The core of our technical contribution is learning
the appearance representation and visual novelty detection by means of learning
the generative model of the face with deep auto-encoders and exploiting the
prediction error. The mark is identified as a salient region on the face and
reaching action is triggered, relying on a previously learned mapping to arm
joint angles. The architecture is tested on two robots with completely
different faces. Comment: To appear in KI - Künstliche Intelligenz - German Journal of
Artificial Intelligence - Springe
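The novelty-detection step in this abstract (learn a generative model of the face, then treat large prediction error as a salient mark) can be sketched with a deliberately simple stand-in. Instead of a deep auto-encoder, the sketch below assumes a per-pixel mean image as the "generative model", and images are hypothetical flat lists of grayscale values in [0, 1]. The function names and the threshold are illustrative, not taken from the paper.

```python
def fit_mean_model(images):
    """Stand-in generative model: the per-pixel mean of the training faces."""
    n = len(images)
    return [sum(img[k] for img in images) / n for k in range(len(images[0]))]


def salient_pixels(model, image, threshold):
    """Indices where the prediction error exceeds the (assumed) threshold."""
    return [
        k
        for k, (predicted, observed) in enumerate(zip(model, image))
        if abs(observed - predicted) > threshold
    ]


# Two unmarked training "faces" and one test face with a bright mark at pixel 2.
train_faces = [[0.5, 0.5, 0.5, 0.5], [0.6, 0.4, 0.5, 0.5]]
marked_face = [0.55, 0.45, 1.0, 0.5]
face_model = fit_mean_model(train_faces)
mark_location = salient_pixels(face_model, marked_face, threshold=0.2)
```

An auto-encoder would replace `fit_mean_model` with a learned reconstruction, but the thresholding of reconstruction error would follow the same pattern.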
DRLViz: Understanding Decisions and Memory in Deep Reinforcement Learning
We present DRLViz, a visual analytics interface to interpret the internal
memory of an agent (e.g. a robot) trained using deep reinforcement learning.
This memory is composed of large temporal vectors updated when the agent moves
in an environment and is not trivial to understand due to the number of
dimensions, dependencies on past vectors, spatial/temporal correlations, and
co-correlation between dimensions. It is often referred to as a black box as
only inputs (images) and outputs (actions) are intelligible for humans. Using
DRLViz, experts can interpret decisions using memory reduction
interactions, and to investigate the role of parts of the memory when errors
have been made (e.g. wrong direction). We report on DRLViz applied in the
context of video game simulators (ViZDoom) for a navigation scenario with item
gathering tasks. We also report on an expert evaluation of DRLViz, on the
applicability of DRLViz to other scenarios and navigation problems beyond
simulation games, and on its contribution to the interpretability and
explainability of black-box models in the field of visual analytics