2,451 research outputs found

    A Software Retina for Egocentric & Robotic Vision Applications on Mobile Platforms

    We present work in progress to develop a low-cost, highly integrated camera sensor for egocentric and robotic vision. Our underlying approach is to address current limitations to image analysis by Deep Convolutional Neural Networks, such as the requirement to learn simple scale and rotation transformations, which contribute to the large computational demands for training and opaqueness of the learned structure, by applying structural constraints based on known properties of the human visual system. We propose to apply a version of the retino-cortical transform to reduce the dimensionality of the input image space by a factor of ex100, and map this spatially to transform rotations and scale changes into spatial shifts. By reducing the input image size accordingly, and therefore learning requirements, we aim to develop a compact and lightweight egocentric and robot vision sensor using a smartphone as the target platform.
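
    The claim that a retino-cortical mapping turns rotations and scale changes into spatial shifts can be illustrated with a plain log-polar coordinate change. The sketch below is not the authors' software retina; it assumes an idealized log-polar map centred on the fixation point, and the function names (`to_log_polar`, `scale_and_rotate`, `log_polar_shift`) are hypothetical helpers introduced only for this illustration.

```python
import math


def to_log_polar(x, y):
    """Map a point (x, y), given relative to the fixation point, to (log r, angle)."""
    r = math.hypot(x, y)
    if r == 0:
        raise ValueError("the fixation point itself has no log-polar image")
    return math.log(r), math.atan2(y, x)


def scale_and_rotate(x, y, scale, angle):
    """Scale by `scale` and rotate by `angle` (radians) about the fixation point."""
    c, s = math.cos(angle), math.sin(angle)
    return scale * (c * x - s * y), scale * (s * x + c * y)


def log_polar_shift(point, scale, angle):
    """Return how far `point` moves in log-polar space under the transform."""
    u0, v0 = to_log_polar(*point)
    u1, v1 = to_log_polar(*scale_and_rotate(*point, scale, angle))
    # Wrap the angular difference into (-pi, pi] so it compares cleanly.
    dv = math.atan2(math.sin(v1 - v0), math.cos(v1 - v0))
    return u1 - u0, dv


if __name__ == "__main__":
    # Every point shifts by the same (log scale, angle), whatever its position.
    for p in [(3.0, 4.0), (-1.0, 2.5), (10.0, -0.2)]:
        print(p, log_polar_shift(p, scale=2.0, angle=0.5))
```

    Under this assumption, scaling by a factor s and rotating by an angle θ about the fixation point moves every point by the same offset (log s, θ) in the mapped space, which is why a network operating on the mapped image would not need to learn those transformations separately.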

    Out of my real body: Cognitive neuroscience meets eating disorders

    Clinical psychology is starting to explain eating disorders (ED) as the outcome of the interaction among cognitive, socio-emotional and interpersonal elements. In particular, two influential models (the revised cognitive-interpersonal maintenance model and the transdiagnostic cognitive behavioral theory) have identified possible key predisposing and maintaining factors. These models, although very influential and able to provide clear suggestions for therapy, still cannot answer several critical questions: why do not all individuals with obsessive-compulsive features, anxious avoidance or a dysfunctional scheme for self-evaluation develop an ED? What is the role of the body experience in the etiology of these disorders? In this paper we suggest that the path to a meaningful answer requires the integration of these models with the recent outcomes of cognitive neuroscience. First, our bodily representations are not just a way to map an external space but the main tool we use to generate meaning, organize our experience, and shape our social identity. In particular, we will argue that our bodily experience evolves over time by integrating six different representations of the body, each characterized by a specific pathology: body schema (phantom limb), spatial body (unilateral hemi-neglect), active body (alien hand syndrome), personal body (autoscopic phenomena), objectified body (xenomelia) and body image (body dysmorphia). Second, these representations include either schematic (allocentric) or perceptual (egocentric) contents that interact within the working memory of the individual through the alignment between the contents retrieved from long-term memory and the ongoing egocentric contents from perception. In this view, EDs may be the outcome of an impairment in the ability to update a negative body representation stored in autobiographical memory (allocentric) with real-time sensorimotor and proprioceptive data (egocentric).

    Knowledge Distillation for Action Anticipation via Label Smoothing

    The human capability to anticipate the near future from visual observations and non-verbal cues is essential for developing intelligent systems that need to interact with people. Several research areas, such as human-robot interaction (HRI), assisted living or autonomous driving, need to foresee future events to avoid crashes or help people. Egocentric scenarios are classic examples where action anticipation is applied due to their numerous applications. Such a challenging task demands capturing and modeling the domain's hidden structure to reduce prediction uncertainty. Since multiple actions may equally occur in the future, we treat action anticipation as a multi-label problem with missing labels, extending the concept of label smoothing. This idea resembles the knowledge distillation process, since useful information is injected into the model during training. We implement a multi-modal framework based on long short-term memory (LSTM) networks to summarize past observations and make predictions at different time steps. We perform extensive experiments on the EPIC-Kitchens and EGTEA Gaze+ datasets, including more than 2500 and 100 action classes, respectively. The experiments show that label smoothing systematically improves performance of state-of-the-art models for action anticipation. Comment: Accepted to ICPR 202
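
    For readers unfamiliar with label smoothing, the sketch below shows the standard uniform variant, in which a one-hot target gives up a small mass epsilon that is spread evenly over all classes. It is an assumption-laden illustration, not the paper's method: the abstract describes an extension for multi-label targets with missing labels, which is not reproduced here, and the names `smooth_labels` and `cross_entropy` as well as the default `epsilon=0.1` are hypothetical choices.

```python
import math


def smooth_labels(target_index, num_classes, epsilon=0.1):
    """Return a soft target: (1 - epsilon) one-hot plus epsilon spread uniformly."""
    if not 0 <= target_index < num_classes:
        raise ValueError("target_index must be a valid class index")
    if not 0.0 <= epsilon <= 1.0:
        raise ValueError("epsilon must lie in [0, 1]")
    soft = [epsilon / num_classes] * num_classes
    soft[target_index] += 1.0 - epsilon
    return soft


def cross_entropy(soft_targets, probabilities):
    """Cross-entropy between a soft target and strictly positive predicted probabilities."""
    return -sum(t * math.log(p)
                for t, p in zip(soft_targets, probabilities) if t > 0)


if __name__ == "__main__":
    soft = smooth_labels(target_index=1, num_classes=4, epsilon=0.1)
    predicted = [0.1, 0.7, 0.1, 0.1]  # illustrative model output, not real data
    print(soft)
    print(cross_entropy(soft, predicted))
```

    Because the soft target keeps a little mass on every class, the loss also penalizes predictions that assign near-zero probability to the other actions, which fits the abstract's observation that several actions may equally occur in the future.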

    VNect: Real-time 3D Human Pose Estimation with a Single RGB Camera

    We present the first real-time method to capture the full global 3D skeletal pose of a human in a stable, temporally consistent manner using a single RGB camera. Our method combines a new convolutional neural network (CNN) based pose regressor with kinematic skeleton fitting. Our novel fully-convolutional pose formulation regresses 2D and 3D joint positions jointly in real time and does not require tightly cropped input frames. A real-time kinematic skeleton fitting method uses the CNN output to yield temporally stable 3D global pose reconstructions on the basis of a coherent kinematic skeleton. This makes our approach the first monocular RGB method usable in real-time applications such as 3D character control; thus far, the only monocular methods for such applications employed specialized RGB-D cameras. Our method's accuracy is quantitatively on par with the best offline 3D monocular RGB pose estimation methods. Our results are qualitatively comparable to, and sometimes better than, results from monocular RGB-D approaches, such as the Kinect. However, we show that our approach is more broadly applicable than RGB-D solutions, i.e. it works for outdoor scenes, community videos, and low quality commodity RGB cameras. Comment: Accepted to SIGGRAPH 201