Egocentric Hand Detection Via Dynamic Region Growing
Egocentric videos, which mainly record the activities carried out by the
users of wearable cameras, have drawn much research attention in recent
years. Due to their lengthy content, a large number of ego-related applications
have been developed to summarize the captured videos. Since users typically
interact with target objects using their own hands, and their hands usually
appear within their visual fields during the interaction, an egocentric hand
detection step is involved in tasks such as gesture recognition, action
recognition and social interaction understanding. In this
work, we propose a dynamic region growing approach for hand region detection in
egocentric videos, by jointly considering hand-related motion and egocentric
cues. We first determine seed regions that most likely belong to the hand, by
analyzing the motion patterns across successive frames. The hand regions can
then be located by extending from the seed regions, according to the scores
computed for the adjacent superpixels. These scores are derived from four
egocentric cues: contrast, location, position consistency and appearance
continuity. We discuss how to apply the proposed method in real-life scenarios,
where multiple hands irregularly appear in and disappear from the videos.
Experimental results on public datasets show that the proposed method achieves
superior performance compared with state-of-the-art methods, especially in
complicated scenarios.
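The seed-and-grow step described above can be illustrated as a breadth-first expansion over a superpixel adjacency graph. The sketch below is a minimal illustration, not the paper's implementation: it assumes the four egocentric cues (contrast, location, position consistency, appearance continuity) have already been fused into one score per superpixel, and all names and the threshold value are hypothetical.

```python
from collections import deque

def grow_hand_region(seeds, adjacency, scores, threshold=0.5):
    """Greedy region growing for hand detection (illustrative sketch).

    seeds     -- superpixel ids believed to belong to the hand
    adjacency -- maps each superpixel id to its neighbouring ids
    scores    -- fused egocentric-cue score per superpixel (the four
                 cues are assumed combined into one value beforehand)
    """
    region = set(seeds)
    frontier = deque(seeds)
    while frontier:
        sp = frontier.popleft()
        for nb in adjacency[sp]:
            # absorb a neighbour only if its cue score is high enough
            if nb not in region and scores[nb] >= threshold:
                region.add(nb)
                frontier.append(nb)
    return region
```

On a toy four-superpixel chain with scores {0: 0.9, 1: 0.8, 2: 0.4, 3: 0.9} and seed 0, growth stops at superpixel 2, so the detected region is {0, 1} even though superpixel 3 scores highly: regions extend only through connected, high-scoring neighbours.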
Detecting Hands in Egocentric Videos: Towards Action Recognition
Recently, there has been a growing interest in analyzing human daily
activities from data collected by wearable cameras. Since the hands are
involved in a vast set of daily tasks, detecting hands in egocentric images is
an important step towards the recognition of a variety of egocentric actions.
However, hand detection in egocentric images is not a trivial task: besides
extreme illumination changes, it must cope with the intrinsically large
variability of hand appearance. We propose a hand detector that exploits skin modeling for
fast hand proposal generation and Convolutional Neural Networks for hand
recognition. We tested our method on the UNIGE-HANDS dataset and showed that the
proposed approach achieves competitive hand detection results.
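The proposal stage of such a detector can be sketched as follows: a per-pixel skin rule marks candidate pixels, and connected skin blobs become bounding-box proposals that a CNN could then classify. This is a generic illustration, not the paper's learned skin model; the explicit RGB thresholds below are a classic heuristic and all names are hypothetical.

```python
def is_skin(r, g, b):
    """Classic explicit RGB skin rule (illustrative thresholds)."""
    return (r > 95 and g > 40 and b > 20 and
            max(r, g, b) - min(r, g, b) > 15 and
            abs(r - g) > 15 and r > g and r > b)

def skin_proposals(image):
    """Return bounding boxes (x0, y0, x1, y1) of connected skin blobs.

    `image` is a list of rows of (r, g, b) tuples.  Each blob found by
    a 4-connected flood fill yields one hand proposal.
    """
    h, w = len(image), len(image[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if seen[y][x] or not is_skin(*image[y][x]):
                continue
            # flood fill one blob, tracking its bounding box
            stack, x0, y0, x1, y1 = [(x, y)], x, y, x, y
            seen[y][x] = True
            while stack:
                cx, cy = stack.pop()
                x0, y0 = min(x0, cx), min(y0, cy)
                x1, y1 = max(x1, cx), max(y1, cy)
                for nx, ny in ((cx+1, cy), (cx-1, cy),
                               (cx, cy+1), (cx, cy-1)):
                    if (0 <= nx < w and 0 <= ny < h and not seen[ny][nx]
                            and is_skin(*image[ny][nx])):
                        seen[ny][nx] = True
                        stack.append((nx, ny))
            boxes.append((x0, y0, x1, y1))
    return boxes
```

Each returned box would then be cropped and passed to the CNN for hand/non-hand classification; in practice a learned skin model and morphological cleanup replace the fixed rule.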
The Evolution of First Person Vision Methods: A Survey
The emergence of new wearable technologies such as action cameras and
smart-glasses has increased the interest of computer vision scientists in the
First Person perspective. Nowadays, this field is attracting attention and
investments of companies aiming to develop commercial devices with First Person
Vision recording capabilities. Due to this interest, an increasing demand for
methods to process these videos, possibly in real time, is expected. Current
approaches present particular combinations of different image features and
quantitative methods to accomplish specific objectives such as object detection,
activity recognition, user-machine interaction and so on. This paper summarizes
the evolution of the state of the art in First Person Vision video analysis
between 1997 and 2014, highlighting, among others, most commonly used features,
methods, challenges and opportunities within the field.
Comment: First Person Vision, Egocentric Vision, Wearable Devices, Smart Glasses, Computer Vision, Video Analytics, Human-machine Interaction
Analysis of Hand Segmentation in the Wild
A large number of works in egocentric vision have concentrated on action and
object recognition. Detection and segmentation of hands in first-person videos,
however, have been less explored. For many applications in this domain, it is
necessary to accurately segment not only the hands of the camera wearer but also
the hands of others with whom they are interacting. Here, we take an in-depth look
at the hand segmentation problem. In the quest for robust hand segmentation
methods, we evaluated the performance of state-of-the-art semantic
segmentation methods, both off the shelf and fine-tuned, on existing datasets. We
fine-tune RefineNet, a leading semantic segmentation method, for hand
segmentation and find that it performs much better than the best contenders.
Existing hand segmentation datasets were collected in laboratory settings.
To overcome this limitation, we contribute by collecting two new datasets: a)
EgoYouTubeHands including egocentric videos containing hands in the wild, and
b) HandOverFace to analyze the performance of our models in the presence of
similar-appearance occlusions. We further explore whether conditional random fields can
help refine generated hand segmentations. To demonstrate the benefit of
accurate hand maps, we train a CNN for hand-based activity recognition and
achieve higher accuracy when the CNN is trained using hand maps produced by the
fine-tuned RefineNet. Finally, we annotate a subset of the EgoHands dataset for
fine-grained action recognition and show that an accuracy of 58.6% can be
achieved by looking at a single hand pose, which is much better than the
chance level (12.5%).
Comment: Accepted at CVPR 2018
Analysis of the hands in egocentric vision: A survey
Egocentric vision (a.k.a. first-person vision - FPV) applications have
thrived over the past few years, thanks to the availability of affordable
wearable cameras and large annotated datasets. The position of the wearable
camera (usually mounted on the head) allows recording exactly what the camera
wearers have in front of them, in particular hands and manipulated objects.
This intrinsic advantage enables the study of the hands from multiple
perspectives: localizing hands and their parts within the images; understanding
what actions and activities the hands are involved in; and developing
human-computer interfaces that rely on hand gestures. In this survey, we review
the literature that focuses on the hands using egocentric vision, categorizing
the existing approaches into: localization (where are the hands or parts of
them?); interpretation (what are the hands doing?); and application (e.g.,
systems that use egocentric hand cues to solve a specific problem).
Moreover, a list of the most prominent datasets with hand-based annotations is
provided.
Analysis of Swine Movements in a Province in Northern Vietnam and Application in the Design of Surveillance Strategies for Infectious Diseases
While swine production is rapidly growing in South-East Asia, the structure of the swine industry and the dynamics of pig movements have not been well studied. However, this knowledge is a prerequisite for understanding the dynamics of disease transmission in swine populations and designing cost-effective surveillance strategies for infectious diseases. In this study, we assessed the farming and trading practices in the Vietnamese swine familial farming sector, which accounts for most pigs in Vietnam, and for which disease surveillance is a major challenge. Farmers from two communes of a Red River Delta Province (northern Vietnam) were interviewed, along with traders involved in pig transactions. Major differences in the trade structure were observed between the two communes. One commune had mainly transversal trades, that is, between farms of equivalent sizes, whereas the other had pyramidal trades, that is, from larger to smaller farms. Companies and large familial farrow-to-finish farms were likely to act as major sources of disease spread through pig sales, demonstrating their importance for disease control. Familial fattening farms with high pig purchases were at greater risk of disease introduction and should be targeted for disease detection as part of a risk-based surveillance. In contrast, many other familial farms were isolated or weakly connected to the swine trade network, limiting their relevance for surveillance activities. However, some of these farms used boar hiring for breeding, increasing the risk of disease spread. Most familial farms were slaughtering pigs at the farm or in small local slaughterhouses, making surveillance at the slaughterhouse inefficient. In terms of the spatial distribution of the trades, the results suggested that northern provinces were highly connected and showed some connection with central and southern provinces.
These results are useful for developing risk-based surveillance protocols for disease detection in the swine familial sector and for making recommendations for disease control.
Learning to Localize and Align Fine-Grained Actions to Sparse Instructions
Automatic generation of textual video descriptions that are time-aligned with
video content is a long-standing goal in computer vision. The task is
challenging due to the difficulty of bridging the semantic gap between the
visual and natural language domains. This paper addresses the task of
automatically generating an alignment between a set of instructions and a first
person video demonstrating an activity. The sparse descriptions and ambiguity
of written instructions create significant alignment challenges. The key to our
approach is the use of egocentric cues to generate a concise set of action
proposals, which are then matched to recipe steps using object recognition and
computational linguistic techniques. We obtain promising results on both the
Extended GTEA Gaze+ dataset and the Bristol Egocentric Object Interactions
Dataset.
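The matching of action proposals to instruction steps can be illustrated with a simple order-preserving alignment: a dynamic program assigns each proposal a step index, non-decreasing over time, maximising total similarity. This is a generic DTW-style sketch under stated assumptions, not the authors' method, which also uses object recognition and linguistic cues; `sim[i][j]` is assumed to be a precomputed similarity between action proposal i and instruction step j.

```python
def align_monotonic(sim):
    """Assign each action proposal i a step index a[i], with
    a[0] <= a[1] <= ..., maximising total similarity in O(n*m)."""
    n, m = len(sim), len(sim[0])
    dp = [[0.0] * m for _ in range(n)]      # best score ending at (i, j)
    choice = [[0] * m for _ in range(n)]    # best predecessor step
    dp[0] = list(sim[0])
    choice[0] = list(range(m))
    for i in range(1, n):
        best, best_k = dp[i - 1][0], 0
        for j in range(m):
            # running max of dp[i-1][k] for k <= j keeps the DP linear
            if dp[i - 1][j] > best:
                best, best_k = dp[i - 1][j], j
            dp[i][j] = sim[i][j] + best
            choice[i][j] = best_k
    # backtrack from the best final step
    j = max(range(m), key=lambda k: dp[n - 1][k])
    assignment = [0] * n
    for i in range(n - 1, -1, -1):
        assignment[i] = j
        j = choice[i][j]
    return assignment
```

With a clearly diagonal similarity matrix the program recovers the one-to-one step order, while allowing several consecutive proposals to share a step when the instructions are sparser than the observed actions.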