8 research outputs found

    Scaling Egocentric Vision: The EPIC-KITCHENS Dataset

    First-person vision is gaining interest as it offers a unique viewpoint on people's interaction with objects, their attention, and even intention. However, progress in this challenging domain has been relatively slow due to the lack of sufficiently large datasets. In this paper, we introduce EPIC-KITCHENS, a large-scale egocentric video benchmark recorded by 32 participants in their native kitchen environments. Our videos depict non-scripted daily activities: we simply asked each participant to start recording every time they entered their kitchen. Recording took place in 4 cities (in North America and Europe) by participants belonging to 10 different nationalities, resulting in highly diverse cooking styles. Our dataset features 55 hours of video consisting of 11.5M frames, which we densely labeled for a total of 39.6K action segments and 454.3K object bounding boxes. Our annotation is unique in that we had the participants narrate their own videos (after recording), thus reflecting true intention, and we crowd-sourced ground-truths based on these. We describe our object, action and anticipation challenges, and evaluate several baselines over two test splits, seen and unseen kitchens. Dataset and Project page: http://epic-kitchens.github.io
    Comment: European Conference on Computer Vision (ECCV) 2018.
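    As a sketch of how dense action-segment annotations of the kind described above might be consumed, the snippet below loads a per-segment CSV and summarizes it with pandas. The file name, column names (video_id, start_frame, stop_frame, verb, noun), and the video identifier "P01_01" are assumptions about the release format, not a documented schema; consult the project page for the actual layout.

```python
import pandas as pd

# Load one annotation file; the file name and columns are assumed, not official.
segments = pd.read_csv("EPIC_train_action_labels.csv")

# Length of each labelled action segment, in frames.
segments["n_frames"] = segments["stop_frame"] - segments["start_frame"]

# The most frequent verbs across the ~39.6K action segments.
print(segments["verb"].value_counts().head(10))

# All segments of a single video, in temporal order.
video = segments[segments["video_id"] == "P01_01"].sort_values("start_frame")
print(video[["start_frame", "stop_frame", "verb", "noun"]].head())
```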

    An Effective and Efficient Method for Detecting Hands in Egocentric Videos for Rehabilitation Applications

    Objective: Individuals with spinal cord injury (SCI) report upper limb function as their top recovery priority. To accurately represent the true impact of new interventions on patient function and independence, evaluation should occur in a natural setting. Wearable cameras can be used to monitor hand function at home, using computer vision to automatically analyze the resulting (egocentric) videos. A key step in this process, hand detection, is difficult to do robustly and reliably, hindering deployment of a complete monitoring system in the home and community. We propose an accurate and efficient hand detection method that uses a simple combination of existing detection and tracking algorithms. Methods: Detection, tracking, and combination methods were evaluated on a new hand detection dataset, consisting of 167,622 frames of egocentric video collected from 17 individuals with SCI performing activities of daily living in a home simulation laboratory. Results: The F1-scores for the best detector and tracker alone (SSD and Median Flow) were 0.90±0.07 and 0.42±0.18, respectively. The best combination method, in which a detector was used to initialize and reset a tracker, resulted in an F1-score of 0.87±0.07 while being twice as fast as the fastest detector alone. Conclusion: The combination of the fastest detector and best tracker improved the accuracy over online trackers while improving the speed of detectors. Significance: The method proposed here, in combination with wearable cameras, will help clinicians directly measure hand function in a patient's daily life at home, enabling independence after SCI.
    Comment: 7 pages, 3 figures, 5 tables.
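    The detector-initializes-and-resets-tracker pattern described in the Methods is simple enough to sketch. The version below is an illustration under assumptions, not the authors' implementation: the SSD detector is stubbed with a crude skin-colour threshold so the example is self-contained, Median Flow comes from OpenCV's contrib trackers, and the re-detection interval `redetect_every` is an invented parameter.

```python
import cv2

def detect_hands(frame):
    # Stand-in for the paper's SSD detector: a crude skin-colour threshold
    # in YCrCb space, used only so this sketch runs end-to-end.
    ycrcb = cv2.cvtColor(frame, cv2.COLOR_BGR2YCrCb)
    mask = cv2.inRange(ycrcb, (0, 133, 77), (255, 173, 127))
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    return [cv2.boundingRect(c) for c in contours if cv2.contourArea(c) > 1000]

def detect_and_track(video_path, redetect_every=10):
    # Run the (slow) detector every `redetect_every` frames to initialize
    # or reset a (fast) Median Flow tracker; the tracker fills the frames
    # in between. Only the first detected hand is tracked here.
    cap = cv2.VideoCapture(video_path)
    tracker, boxes, frame_idx = None, [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if tracker is None or frame_idx % redetect_every == 0:
            detections = detect_hands(frame)
            if detections:
                tracker = cv2.legacy.TrackerMedianFlow_create()
                tracker.init(frame, detections[0])
                boxes.append(detections[0])
            else:
                tracker = None
                boxes.append(None)
        else:
            ok, box = tracker.update(frame)
            boxes.append(tuple(int(v) for v in box) if ok else None)
        frame_idx += 1
    cap.release()
    return boxes
```

    Periodically resetting the tracker from fresh detections is what keeps drift bounded while retaining most of the tracker's speed advantage.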

    CAD-based Learning for Egocentric Object Detection in Industrial Context

    Industries nowadays have an increasing need for real-time, accurate vision-based algorithms. Although the performance of object detection methods has improved greatly thanks to massive public datasets, instance detection in an industrial context must be approached differently, since annotated images are usually rare or unavailable. In addition, when the video stream comes from a head-mounted camera, frequent movements and blurred frames alter the image content. For this purpose, we propose a framework to generate a dataset of egocentric synthetic images using only CAD models of the objects of interest. To evaluate different strategies exploiting synthetic and real images, we train a Convolutional Neural Network (CNN) for the task of object detection in egocentric images. Results show that training a CNN on synthetic images that reproduce the characteristics of egocentric vision can perform as well as training on a set of real images, reducing, if not removing, the need to manually annotate a large quantity of images to achieve accurate performance.
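    To make the idea concrete, below is a minimal sketch of how egocentric characteristics (motion blur from head movement, varied backgrounds) might be reproduced when compositing pre-rendered CAD views onto real backgrounds. It illustrates the general domain-randomization idea rather than the paper's pipeline; rendering the CAD model itself (e.g. with Blender or pyrender) is assumed to happen upstream, and all names and parameter values are invented.

```python
import cv2
import numpy as np

def egocentric_augment(render_rgba, background, max_blur=9):
    # Composite an RGBA render of the CAD model onto a background image,
    # then add a random linear motion blur to mimic head-mounted footage.
    h, w = background.shape[:2]
    render = cv2.resize(render_rgba, (w, h))
    alpha = render[:, :, 3:4].astype(np.float32) / 255.0
    composite = (alpha * render[:, :, :3]
                 + (1.0 - alpha) * background).astype(np.uint8)

    # Motion-blur kernel with random length and orientation.
    k = np.random.randint(3, max_blur + 1)
    kernel = np.zeros((k, k), np.float32)
    kernel[k // 2, :] = 1.0
    rot = cv2.getRotationMatrix2D(((k - 1) / 2.0, (k - 1) / 2.0),
                                  np.random.uniform(0, 180), 1.0)
    kernel = cv2.warpAffine(kernel, rot, (k, k))
    kernel /= max(kernel.sum(), 1e-6)
    return cv2.filter2D(composite, -1, kernel)
```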

    1st International Conference on Computer Vision and Image Processing

    This edited volume contains technical contributions in the field of computer vision and image processing presented at the First International Conference on Computer Vision and Image Processing (CVIP 2016). The contributions are thematically divided based on their relation to operations at the lower, middle and higher levels of vision systems, and their applications. The technical contributions in the areas of sensors, acquisition, visualization and enhancement are classified as related to low-level operations. They discuss various modern topics – reconfigurable image system architecture, Scheimpflug camera calibration, real-time autofocusing, climate visualization, tone mapping, super-resolution and image resizing. The technical contributions in the areas of segmentation and retrieval are classified as related to mid-level operations. They discuss some state-of-the-art techniques – non-rigid image registration, iterative image partitioning, egocentric object detection and video shot boundary detection. The technical contributions in the areas of classification and recognition are categorized as related to high-level operations. They discuss some state-of-the-art approaches – extreme learning machines, and target, gesture and action recognition. A non-regularized state-preserving extreme learning machine is presented for natural scene classification. An algorithm for human action recognition through dynamic frame warping based on depth cues is given. Target recognition in night vision through a convolutional neural network is also presented, as is the use of a convolutional neural network for detecting static hand gestures. Finally, the technical contributions in the areas of surveillance, coding and data security, and biometrics and document processing are considered as applications of computer vision and image processing. They discuss some contemporary applications, among them a system for tackling blind curves, a quick-reaction target acquisition and tracking system, an algorithm to detect copy-move forgery based on circle blocks, a novel visual secret sharing scheme using an affine cipher and image interleaving, a finger knuckle print recognition system based on wavelet and Gabor filtering, and a palmprint recognition method based on minutiae quadruplets.