A Software Retina for Egocentric & Robotic Vision Applications on Mobile Platforms
We present work in progress to develop a low-cost, highly integrated camera sensor for egocentric and robotic vision. Our underlying approach is to address current limitations of image analysis by deep convolutional neural networks, such as the need to learn simple scale and rotation transformations, which contribute to the large computational demands of training and to the opaqueness of the learned structure, by applying structural constraints based on known properties of the human visual system. We propose to apply a version of the retino-cortical transform to reduce the dimensionality of the input image space by a factor of ~100, and to map it spatially so that rotations and scale changes become spatial shifts. By reducing the input image size, and therefore the learning requirements, accordingly, we aim to develop a compact and lightweight egocentric and robot vision sensor using a smartphone as the target platform.
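The abstract's key idea, mapping the image onto a retina-like grid so that rotations and scalings of the input become simple shifts of the output, can be illustrated with a minimal log-polar sampler. This is an illustrative sketch, not the authors' software retina: the grid sizes, nearest-neighbour sampling, and centre choice are all assumptions.

```python
import numpy as np

def log_polar_transform(img, n_rings=32, n_wedges=64):
    """Sample a grayscale image on a log-polar grid centred on the image midpoint.

    Rotations of the input become circular shifts along the wedge axis, and
    uniform scalings become shifts along the ring axis -- the property the
    abstract exploits. Nearest-neighbour sampling keeps the sketch simple.
    """
    h, w = img.shape
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    r_max = min(cy, cx)
    # Log-spaced radii from 1 pixel out to the largest inscribed circle.
    radii = np.exp(np.linspace(0.0, np.log(r_max), n_rings))
    thetas = np.linspace(0.0, 2.0 * np.pi, n_wedges, endpoint=False)
    rr, tt = np.meshgrid(radii, thetas, indexing="ij")
    ys = np.clip(np.round(cy + rr * np.sin(tt)).astype(int), 0, h - 1)
    xs = np.clip(np.round(cx + rr * np.cos(tt)).astype(int), 0, w - 1)
    return img[ys, xs]  # shape (n_rings, n_wedges)
```

For a 255x255 input the 32x64 output carries roughly 30x fewer samples, in the spirit of the dimensionality reduction the abstract describes; rotating the input by 90 degrees reduces to rolling the output a quarter turn along the wedge axis.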
Out of my real body: Cognitive neuroscience meets eating disorders
Clinical psychology is starting to explain eating disorders (ED) as the outcome of the interaction among cognitive, socio-emotional, and interpersonal elements. In particular, two influential models, the revised cognitive-interpersonal maintenance model and the transdiagnostic cognitive-behavioral theory, identify possible key predisposing and maintaining factors. Although very influential and able to provide clear suggestions for therapy, these models still cannot answer several critical questions: why do not all individuals with obsessive-compulsive features, anxious avoidance, or a dysfunctional scheme for self-evaluation develop an ED? What is the role of the body experience in the etiology of these disorders? In this paper we suggest that the path to a meaningful answer requires the integration of these models with recent outcomes of cognitive neuroscience. First, our bodily representations are not just a way to map an external space but the main tool we use to generate meaning, organize our experience, and shape our social identity. In particular, we argue that our bodily experience evolves over time by integrating six different representations of the body, each characterized by a specific pathology: body schema (phantom limb), spatial body (unilateral hemi-neglect), active body (alien hand syndrome), personal body (autoscopic phenomena), objectified body (xenomelia), and body image (body dysmorphia). Second, these representations include either schematic (allocentric) or perceptual (egocentric) contents that interact within the working memory of the individual through the alignment between the contents retrieved from long-term memory and the ongoing egocentric contents from perception. In this view, EDs may be the outcome of an impairment in the ability to update a negative body representation stored in autobiographical memory (allocentric) with real-time sensorimotor and proprioceptive data (egocentric).
Knowledge Distillation for Action Anticipation via Label Smoothing
The human capability to anticipate the near future from visual observations and non-verbal cues is essential for developing intelligent systems that need to interact with people. Several research areas, such as human-robot interaction (HRI), assisted living, and autonomous driving, need to foresee future events to avoid crashes or help people. Egocentric scenarios are a classic setting for action anticipation, given their numerous applications. Such a challenging task requires capturing and modelling the domain's hidden structure to reduce prediction uncertainty. Since multiple actions may equally occur in the future, we treat action anticipation as a multi-label problem with missing labels, extending the concept of label smoothing. This idea resembles the knowledge distillation process, since useful information is injected into the model during training. We implement a multi-modal framework based on long short-term memory (LSTM) networks to summarize past observations and make predictions at different time steps. We perform extensive experiments on the EPIC-Kitchens and EGTEA Gaze+ datasets, which include more than 2500 and 100 action classes, respectively. The experiments show that label smoothing systematically improves the performance of state-of-the-art models for action anticipation.
Comment: Accepted to ICPR 2020
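The central mechanism, replacing a one-hot target with a smoothed distribution that acknowledges multiple plausible future actions, can be sketched in a few lines. This is a hedged illustration of generic label smoothing and a multi-label-style variant, not the paper's exact formulation: the `plausible` set is passed in explicitly here as an assumption, whereas the paper derives such structure from the data.

```python
import numpy as np

def smooth_targets(true_idx, n_classes, eps=0.1, plausible=None):
    """Build a smoothed target distribution for one training example.

    Standard label smoothing spreads eps uniformly over all other classes.
    The multi-label-style variant instead concentrates that mass on a set of
    plausible-but-unlabelled future actions (assumed disjoint from true_idx).
    """
    if plausible:
        t = np.zeros(n_classes)
        # Share the smoothing mass among the plausible alternative actions.
        t[list(plausible)] = eps / len(plausible)
    else:
        # Uniform smoothing over the n_classes - 1 other classes.
        t = np.full(n_classes, eps / (n_classes - 1))
    t[true_idx] = 1.0 - eps
    return t  # a valid probability distribution summing to 1
```

Training against such soft targets with a cross-entropy loss injects extra information into the model, which is why the abstract likens the scheme to knowledge distillation.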
VNect: Real-time 3D Human Pose Estimation with a Single RGB Camera
We present the first real-time method to capture the full global 3D skeletal
pose of a human in a stable, temporally consistent manner using a single RGB
camera. Our method combines a new convolutional neural network (CNN) based pose
regressor with kinematic skeleton fitting. Our novel fully-convolutional pose
formulation regresses 2D and 3D joint positions jointly in real time and does
not require tightly cropped input frames. A real-time kinematic skeleton
fitting method uses the CNN output to yield temporally stable 3D global pose
reconstructions on the basis of a coherent kinematic skeleton. This makes our
approach the first monocular RGB method usable in real-time applications such
as 3D character control---thus far, the only monocular methods for such
applications employed specialized RGB-D cameras. Our method's accuracy is
quantitatively on par with the best offline 3D monocular RGB pose estimation
methods. Our results are qualitatively comparable to, and sometimes better
than, results from monocular RGB-D approaches, such as the Kinect. However, we
show that our approach is more broadly applicable than RGB-D solutions, i.e. it
works for outdoor scenes, community videos, and low quality commodity RGB
cameras.
Comment: Accepted to SIGGRAPH 2017
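The per-frame fitting step the abstract describes balances agreement with the CNN's 2D and 3D joint predictions against temporal stability. A minimal sketch of that kind of energy is below; the weights, the simple pinhole projection, and evaluating the energy directly on joint positions (rather than on skeleton parameters via forward kinematics, as a full kinematic fit would) are all illustrative assumptions, not VNect's actual objective.

```python
import numpy as np

def fitting_energy(joints_3d, cnn_3d, cnn_2d, prev_joints,
                   focal=1.0, w_3d=1.0, w_2d=1.0, w_smooth=0.1):
    """Evaluate a simplified VNect-style per-frame fitting energy.

    joints_3d : (J, 3) candidate 3D joint positions being optimised
    cnn_3d    : (J, 3) 3D joint estimates regressed by the CNN
    cnn_2d    : (J, 2) 2D joint estimates from the CNN heatmaps
    prev_joints : (J, 3) fitted joints from the previous frame
    """
    # Pinhole reprojection of the candidate joints into the image plane.
    proj = focal * joints_3d[:, :2] / joints_3d[:, 2:3]
    e_3d = np.sum((joints_3d - cnn_3d) ** 2)        # match CNN 3D output
    e_2d = np.sum((proj - cnn_2d) ** 2)             # match CNN 2D output
    e_smooth = np.sum((joints_3d - prev_joints) ** 2)  # temporal stability
    return w_3d * e_3d + w_2d * e_2d + w_smooth * e_smooth
```

Minimising such an energy each frame, subject to a coherent kinematic skeleton, is what yields the temporally stable global pose the abstract claims; the smoothness term is what distinguishes this from independent per-frame regression.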