Scaling Egocentric Vision: The EPIC-KITCHENS Dataset
First-person vision is gaining interest as it offers a unique viewpoint on
people's interaction with objects, their attention, and even intention.
However, progress in this challenging domain has been relatively slow due to
the lack of sufficiently large datasets. In this paper, we introduce
EPIC-KITCHENS, a large-scale egocentric video benchmark recorded by 32
participants in their native kitchen environments. Our videos depict
nonscripted daily activities: we simply asked each participant to start
recording every time they entered their kitchen. Recording took place in 4
cities (in North America and Europe) by participants belonging to 10 different
nationalities, resulting in highly diverse cooking styles. Our dataset features
55 hours of video consisting of 11.5M frames, which we densely labeled for a
total of 39.6K action segments and 454.3K object bounding boxes. Our annotation
is unique in that we had the participants narrate their own videos (after
recording), thus reflecting true intention, and we crowd-sourced ground-truths
based on these. We describe our object, action and anticipation challenges, and
evaluate several baselines over two test splits, seen and unseen kitchens.
Dataset and Project page: http://epic-kitchens.github.io
Comment: European Conference on Computer Vision (ECCV) 2018
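Dense labels of this kind are typically consumed as timed action segments. The following is a minimal sketch of summarizing such annotations; the field layout and values are illustrative assumptions, not the dataset's actual schema.

```python
# Hypothetical sketch: summarizing densely labeled action segments of the
# kind EPIC-KITCHENS provides. The tuple layout (start_sec, stop_sec, verb,
# noun) is an illustrative assumption, not the dataset's real schema.
from collections import Counter

# Toy stand-ins for real annotations
segments = [
    (0.0, 2.5, "open", "fridge"),
    (2.5, 6.0, "take", "milk"),
    (6.0, 9.0, "pour", "milk"),
    (9.0, 11.0, "close", "fridge"),
]

def summarize(segments):
    """Return total labeled seconds and per-verb segment counts."""
    total = sum(stop - start for start, stop, _, _ in segments)
    verbs = Counter(verb for _, _, verb, _ in segments)
    return total, verbs

total, verbs = summarize(segments)
print(total)  # 11.0
```

The same aggregation scales directly to the full 39.6K segments once the real annotation files are loaded.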
An Effective and Efficient Method for Detecting Hands in Egocentric Videos for Rehabilitation Applications
Objective: Individuals with spinal cord injury (SCI) report upper limb
function as their top recovery priority. To accurately represent the true
impact of new interventions on patient function and independence, evaluation
should occur in a natural setting. Wearable cameras can be used to monitor hand
function at home, using computer vision to automatically analyze the resulting
videos (egocentric video). A key step in this process, hand detection, is
difficult to do robustly and reliably, hindering deployment of a complete
monitoring system in the home and community. We propose an accurate and
efficient hand detection method that uses a simple combination of existing
detection and tracking algorithms. Methods: Detection, tracking, and
combination methods were evaluated on a new hand detection dataset, consisting
of 167,622 frames of egocentric videos collected on 17 individuals with SCI
performing activities of daily living in a home simulation laboratory. Results:
The F1-scores for the best detector and tracker alone (SSD and Median Flow)
were 0.90 ± 0.07 and 0.42 ± 0.18, respectively. The best combination
method, in which a detector was used to initialize and reset a tracker,
resulted in an F1-score of 0.87 ± 0.07 while being two times faster than the
fastest detector alone. Conclusion: The combination of the fastest detector and
best tracker improved the accuracy over online trackers while improving the
speed of detectors. Significance: The method proposed here, in combination with
wearable cameras, will help clinicians directly measure hand function in a
patient's daily life at home, enabling independence after SCI.
Comment: 7 pages, 3 figures, 5 tables
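The combination strategy described above can be sketched in a framework-agnostic way: a slow but accurate detector periodically initializes and resets a fast tracker, which handles the frames in between. The Detector and Tracker stubs below are illustrative assumptions standing in for, e.g., SSD and Median Flow.

```python
# Minimal sketch of the detector-resets-tracker combination. The stub
# classes are placeholders for real models (e.g. SSD, Median Flow).

class StubDetector:
    def detect(self, frame):
        # In practice: run the detector and return a hand box (x, y, w, h).
        return (frame, frame, 10, 10)

class StubTracker:
    def __init__(self):
        self.box = None

    def init(self, frame, box):
        self.box = box

    def update(self, frame):
        # In practice: the tracker propagates the box to the new frame.
        return self.box

def detect_hands(frames, detector, tracker, reset_every=5):
    """Run the detector every `reset_every` frames, track in between."""
    boxes = []
    for i, frame in enumerate(frames):
        if i % reset_every == 0:
            box = detector.detect(frame)  # accurate but slow
            tracker.init(frame, box)      # reset the tracker on fresh evidence
        else:
            box = tracker.update(frame)   # fast propagation between detections
        boxes.append(box)
    return boxes

boxes = detect_hands(range(10), StubDetector(), StubTracker())
```

Raising `reset_every` trades accuracy for speed, which is the knob behind the reported 2x speedup over running the detector on every frame.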
CAD-based Learning for Egocentric Object Detection in Industrial Context
Industries nowadays have an increasing need for real-time and accurate vision-based algorithms. Although the performance of object detection methods has improved considerably thanks to massive public datasets, instance detection in an industrial context must be approached differently, since annotated images are usually rare or unavailable. In addition, when the video stream comes from a head-mounted camera, frequent movements and blurred frames alter the image content. For this purpose, we propose a framework to generate a dataset of egocentric synthetic images using only CAD models of the objects of interest. To evaluate different strategies exploiting synthetic and real images, we train a Convolutional Neural Network (CNN) for the task of object detection in egocentric images. Results show that training a CNN on synthetic images that reproduce the characteristics of egocentric vision may perform as well as training on a set of real images, reducing, if not removing, the need to manually annotate a large quantity of images to achieve accurate performance.
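The core appeal of the approach is that a renderer provides ground-truth boxes for free. The sketch below illustrates that idea with rendering stubbed out; the function names and parameters are illustrative assumptions, not the authors' actual pipeline.

```python
# Hedged sketch of synthetic-data generation from CAD models: render an
# object over varied placements with egocentric-style degradations (random
# viewpoint, motion blur), recording the exact bounding box from the
# renderer. Rendering itself is stubbed out.
import random

def render_cad(model_id, pose):
    # In practice: an offscreen renderer produces an object crop plus its
    # exact extent for the given pose; here we fake the crop size.
    w, h = random.randint(40, 120), random.randint(40, 120)
    return {"model": model_id, "pose": pose, "size": (w, h)}

def synth_sample(model_id, img_size=(640, 480)):
    pose = tuple(random.uniform(0, 360) for _ in range(3))  # random viewpoint
    obj = render_cad(model_id, pose)
    x = random.randint(0, img_size[0] - obj["size"][0])
    y = random.randint(0, img_size[1] - obj["size"][1])
    blur = random.uniform(0, 3)  # mimic head-motion blur in egocentric video
    # Ground truth comes from the renderer: no manual annotation needed.
    return {"bbox": (x, y, *obj["size"]), "blur": blur, "label": model_id}

random.seed(0)
dataset = [synth_sample("valve_cad") for _ in range(1000)]
```

A real pipeline would swap the stubs for an actual CAD renderer and write images to disk, but the annotation-for-free property shown here is what removes the manual labeling cost.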
1st International Conference on Computer Vision and Image Processing
This edited volume contains technical contributions in the field of computer vision and image processing presented at the First International Conference on Computer Vision and Image Processing (CVIP 2016). The contributions are thematically divided based on their relation to operations at the lower, middle and higher levels of vision systems, and their applications. The technical contributions in the areas of sensors, acquisition, visualization and enhancement are classified as related to low-level operations. They discuss various modern topics – reconfigurable image system architecture, Scheimpflug camera calibration, real-time autofocusing, climate visualization, tone mapping, super-resolution and image resizing. The technical contributions in the areas of segmentation and retrieval are classified as related to mid-level operations. They discuss some state-of-the-art techniques – non-rigid image registration, iterative image partitioning, egocentric object detection and video shot boundary detection. The technical contributions in the areas of classification and recognition are categorized as related to high-level operations. They discuss some state-of-the-art approaches – extreme learning machines, and target, gesture and action recognition. A non-regularized state-preserving extreme learning machine is presented for natural scene classification. An algorithm for human action recognition through dynamic frame warping based on depth cues is given. Target recognition in night vision through a convolutional neural network is also presented. The use of a convolutional neural network in detecting static hand gestures is also discussed. Finally, the technical contributions in the areas of surveillance, coding and data security, and biometrics and document processing are considered as applications of computer vision and image processing. They discuss some contemporary applications.
A few of them are a system for tackling blind curves, a quick-reaction target acquisition and tracking system, an algorithm to detect copy-move forgery based on circle blocks, a novel visual secret sharing scheme using affine cipher and image interleaving, a finger knuckle print recognition system based on wavelet and Gabor filtering, and palmprint recognition based on minutiae quadruplets.