851 research outputs found
Egocentric Hand Detection Via Dynamic Region Growing
Egocentric videos, which mainly record the activities carried out by the
users of the wearable cameras, have drawn much research attentions in recent
years. Due to its lengthy content, a large number of ego-related applications
have been developed to abstract the captured videos. As the users are
accustomed to interacting with the target objects using their own hands while
their hands usually appear within their visual fields during the interaction,
an egocentric hand detection step is involved in tasks like gesture
recognition, action recognition and social interaction understanding. In this
work, we propose a dynamic region growing approach for hand region detection in
egocentric videos, by jointly considering hand-related motion and egocentric
cues. We first determine seed regions that most likely belong to the hand, by
analyzing the motion patterns across successive frames. The hand regions can
then be located by extending from the seed regions, according to the scores
computed for the adjacent superpixels. These scores are derived from four
egocentric cues: contrast, location, position consistency and appearance
continuity. We discuss how to apply the proposed method in real-life scenarios,
where multiple hands irregularly appear and disappear from the videos.
Experimental results on public datasets show that the proposed method achieves
superior performance compared with the state-of-the-art methods, especially in
complicated scenarios
Survey on Vision-based Path Prediction
Path prediction is a fundamental task for estimating how pedestrians or
vehicles are going to move in a scene. Because path prediction as a task of
computer vision uses video as input, various information used for prediction,
such as the environment surrounding the target and the internal state of the
target, need to be estimated from the video in addition to predicting paths.
Many prediction approaches that include understanding the environment and the
internal state have been proposed. In this survey, we systematically summarize
methods of path prediction that take video as input and and extract features
from the video. Moreover, we introduce datasets used to evaluate path
prediction methods quantitatively.Comment: DAPI 201
The Evolution of First Person Vision Methods: A Survey
The emergence of new wearable technologies such as action cameras and
smart-glasses has increased the interest of computer vision scientists in the
First Person perspective. Nowadays, this field is attracting attention and
investments of companies aiming to develop commercial devices with First Person
Vision recording capabilities. Due to this interest, an increasing demand of
methods to process these videos, possibly in real-time, is expected. Current
approaches present a particular combinations of different image features and
quantitative methods to accomplish specific objectives like object detection,
activity recognition, user machine interaction and so on. This paper summarizes
the evolution of the state of the art in First Person Vision video analysis
between 1997 and 2014, highlighting, among others, most commonly used features,
methods, challenges and opportunities within the field.Comment: First Person Vision, Egocentric Vision, Wearable Devices, Smart
Glasses, Computer Vision, Video Analytics, Human-machine Interactio
Limbs detection and tracking of head-fixed mice for behavioral phenotyping using motion tubes and deep learning
The broad accessibility of affordable and reliable recording equipment and its relative ease of use has enabled neuroscientists to record large amounts of neurophysiological and behavioral data. Given that most of this raw data is unlabeled, great effort is required to adapt it for behavioral phenotyping or signal extraction, for behavioral and neurophysiological data, respectively. Traditional methods for labeling datasets rely on human annotators which is a resource and time intensive process, which often produce data that that is prone to reproducibility errors. Here, we propose a deep learning-based image segmentation framework to automatically extract and label limb movements from movies capturing frontal and lateral views of head-fixed mice. The method decomposes the image into elemental regions (superpixels) with similar appearance and concordant dynamics and stacks them following their partial temporal trajectory. These 3D descriptors (referred as motion cues) are used to train a deep convolutional neural network (CNN). We use the features extracted at the last fully connected layer of the network for training a Long Short Term Memory (LSTM) network that introduces spatio-temporal coherence to the limb segmentation. We tested the pipeline in two video acquisition settings. In the first, the camera is installed on the right side of the mouse (lateral setting). In the second, the camera is installed facing the mouse directly (frontal setting). We also investigated the effect of the noise present in the videos and the amount of training data needed, and we found that reducing the number of training samples does not result in a drop of more than 5% in detection accuracy even when as little as 10% of the available data is used for training
- …