The Evolution of First Person Vision Methods: A Survey
The emergence of new wearable technologies such as action cameras and
smart-glasses has increased the interest of computer vision scientists in the
First Person perspective. Nowadays, this field is attracting the attention and
investment of companies aiming to develop commercial devices with First Person
Vision recording capabilities. Given this interest, an increasing demand for
methods to process these videos, possibly in real time, is expected. Current
approaches combine particular sets of image features and quantitative methods
to accomplish specific objectives such as object detection, activity
recognition, user-machine interaction, and so on. This paper summarizes
the evolution of the state of the art in First Person Vision video analysis
between 1997 and 2014, highlighting, among other aspects, the most commonly
used features, methods, challenges, and opportunities within the field.
Comment: First Person Vision, Egocentric Vision, Wearable Devices, Smart Glasses, Computer Vision, Video Analytics, Human-machine Interaction
Learning Action Maps of Large Environments via First-Person Vision
When people observe and interact with physical spaces, they are able to
associate functionality to regions in the environment. Our goal is to automate
dense functional understanding of large spaces by leveraging sparse activity
demonstrations recorded from an egocentric viewpoint. The method we describe
enables functionality estimation in large scenes where people have behaved, as
well as novel scenes where no behaviors are observed. Our method learns and
predicts "Action Maps", which encode the ability for a user to perform
activities at various locations. With the usage of an egocentric camera to
observe human activities, our method scales with the size of the scene without
the need for mounting multiple static surveillance cameras and is well-suited
to the task of observing activities up-close. We demonstrate that by capturing
appearance-based attributes of the environment and associating these attributes
with activity demonstrations, our proposed mathematical framework allows for
the prediction of Action Maps in new environments. Additionally, we offer a
preliminary glance of the applicability of Action Maps by demonstrating a
proof-of-concept application in which they are used in concert with activity
detections to perform localization.
Comment: To appear at CVPR 2016
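To make the idea concrete, here is a minimal sketch (not the authors' code) of how sparse demonstrations and appearance features could yield a dense Action Map: a classifier is fit on a few demonstrated locations and then scores every cell of the scene. The grid size, feature dimension, and choice of logistic regression are illustrative assumptions; the paper's actual framework, which also transfers to novel scenes, is more sophisticated.

```python
# A minimal, illustrative sketch of the Action Map idea (not the authors'
# code): per-location appearance features plus sparse activity
# demonstrations yield a dense map of where an activity can be performed.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Assumed setup: an H x W grid of scene locations, each with a D-dim
# appearance feature (e.g. pooled CNN descriptors).
H, W, D = 20, 30, 64
features = rng.normal(size=(H * W, D))

# Sparse demonstrations: a few cells where the activity (say, "sit") was
# observed from the egocentric camera.
labels = np.zeros(H * W, dtype=int)
labels[rng.choice(H * W, size=15, replace=False)] = 1

# Train a per-activity classifier, treating unobserved cells as negatives,
# then densely score every cell to obtain the Action Map.
clf = LogisticRegression(max_iter=1000).fit(features, labels)
action_map = clf.predict_proba(features)[:, 1].reshape(H, W)
print(action_map.shape)  # (20, 30): one functional score per location
```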
Visual Object Tracking in First Person Vision
The understanding of human-object interactions is fundamental in First Person Vision (FPV). Visual tracking algorithms that follow the objects manipulated by the camera wearer can provide useful information to effectively model such interactions. In recent years, the computer vision community has significantly improved the performance of tracking algorithms for a large variety of target objects and scenarios. Despite a few previous attempts to exploit trackers in the FPV domain, a methodical analysis of the performance of state-of-the-art trackers is still missing. This research gap raises the question of whether current solutions can be used "off-the-shelf" or whether more domain-specific investigations should be carried out. This paper aims to provide answers to such questions. We present the first systematic investigation of single object tracking in FPV. Our study extensively analyses the performance of 42 algorithms including generic object trackers and baseline FPV-specific trackers. The analysis is carried out by focusing on different aspects of the FPV setting, introducing new performance measures, and in relation to FPV-specific tasks. The study is made possible through the introduction of TREK-150, a novel benchmark dataset composed of 150 densely annotated video sequences. Our results show that object tracking in FPV poses new challenges to current visual trackers. We highlight the factors causing such behavior and point out possible research directions. Despite these difficulties, we show that trackers bring benefits to FPV downstream tasks requiring short-term object tracking. We expect that generic object tracking will gain popularity in FPV as new and FPV-specific methodologies are investigated.
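For context, benchmarks of this kind commonly score a tracker by the per-frame overlap between predicted and ground-truth boxes. The sketch below computes a standard one-pass success score; the (x, y, w, h) box format and the threshold grid are assumptions, and TREK-150's FPV-specific measures are not reproduced here.

```python
# Illustrative one-pass tracking evaluation: per-frame IoU between predicted
# and ground-truth boxes, turned into a success score (mean success rate
# over overlap thresholds). Boxes are assumed to be (x, y, w, h).
import numpy as np

def iou(a, b):
    # Intersection-over-union of two (x, y, w, h) boxes.
    ix = max(0.0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0

def success_auc(pred_boxes, gt_boxes, thresholds=np.linspace(0, 1, 21)):
    # Fraction of frames whose overlap exceeds each threshold, averaged
    # over thresholds: the usual area-under-the-success-curve score.
    overlaps = np.array([iou(p, g) for p, g in zip(pred_boxes, gt_boxes)])
    return float(np.mean([(overlaps > t).mean() for t in thresholds]))

pred = [(10, 10, 50, 50), (10, 10, 50, 50), (10, 10, 50, 50)]
gt = [(12, 12, 50, 50), (30, 30, 50, 50), (200, 200, 10, 10)]
print(success_auc(pred, gt))  # higher is better, in [0, 1]
```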
Is First Person Vision Challenging for Object Tracking?
Understanding human-object interactions is fundamental in First Person Vision
(FPV). Tracking algorithms which follow the objects manipulated by the camera
wearer can provide useful cues to effectively model such interactions. Visual
tracking solutions available in the computer vision literature have
significantly improved their performance in recent years for a large variety
of target objects and tracking scenarios. However, despite a few previous
attempts to exploit trackers in FPV applications, a methodical analysis of the
performance of state-of-the-art trackers in this domain is still missing. In
this paper, we fill the gap by presenting the first systematic study of object
tracking in FPV. Our study extensively analyses the performance of recent
visual trackers and baseline FPV trackers with respect to different aspects and
considering a new performance measure. This is achieved through TREK-150, a
novel benchmark dataset composed of 150 densely annotated video sequences. Our
results show that object tracking in FPV is challenging, which suggests that
more research efforts should be devoted to this problem so that tracking could
benefit FPV tasks.
Comment: IEEE/CVF International Conference on Computer Vision (ICCV) 2021, Visual Object Tracking Challenge VOT2021 workshop. arXiv admin note: text overlap with arXiv:2011.1226
View-Action Representation Learning for Active First-Person Vision
In visual navigation, a moving agent equipped with a camera is traditionally controlled by an input action, and the estimation of features from a sensory state (i.e., the camera view) is treated as a pre-processing step for performing high-level vision tasks. In this paper, we present a representation learning approach that, instead, considers both state and action as inputs. We condition the encoded feature from the state transition network on the action that changes the view of the camera, thus describing the scene more effectively. Specifically, we introduce an action representation module that decodes an input action into a higher-dimensional representation to increase representational power. We then fuse the output of the action representation module with the intermediate response of the state transition network that predicts the future state. To enhance the discrimination capability among predictions from different input actions, we further introduce triplet ranking and N-tuplet loss functions, which can in turn be integrated with the regression loss. We demonstrate the proposed representation learning approach in reinforcement and imitation learning-based mapless navigation tasks, where the camera agent learns to navigate only through the view of the camera and the performed action, without external information.
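A rough PyTorch sketch of the architecture the abstract describes follows: an action representation module lifts a low-dimensional action into a higher-dimensional code, which is fused with the encoded current view to predict the next view's features, trained with a regression loss plus a triplet ranking term. Layer sizes, concatenation-based fusion, and the permuted-action negatives are assumptions, not the authors' design.

```python
# Sketch of an action-conditioned state-transition network in the spirit of
# the abstract. All dimensions and design choices here are assumptions.
import torch
import torch.nn as nn

class ActionRepresentation(nn.Module):
    def __init__(self, action_dim=3, hidden=128):
        super().__init__()
        # Decode a compact action vector into a higher-dimensional code.
        self.net = nn.Sequential(
            nn.Linear(action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden),
        )

    def forward(self, a):
        return self.net(a)

class TransitionModel(nn.Module):
    def __init__(self, state_dim=256, hidden=128):
        super().__init__()
        self.encode = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU())
        self.action_rep = ActionRepresentation(hidden=hidden)
        # Fuse state and action codes, then regress next-state features.
        self.predict = nn.Sequential(
            nn.Linear(2 * hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, state_dim),
        )

    def forward(self, state, action):
        fused = torch.cat([self.encode(state), self.action_rep(action)], dim=-1)
        return self.predict(fused)

model = TransitionModel()
state = torch.randn(8, 256)       # encoded current camera views
action = torch.randn(8, 3)        # e.g. (forward, turn-left, turn-right)
next_state = torch.randn(8, 256)  # features of the views actually reached
pred = model(state, action)

# Regression loss plus a triplet ranking term: the prediction under the true
# action should lie closer to the reached view than a prediction made with a
# randomly permuted (wrong) action.
wrong = model(state, action[torch.randperm(8)])
loss = nn.functional.mse_loss(pred, next_state) \
     + nn.TripletMarginLoss(margin=1.0)(next_state, pred, wrong)
loss.backward()
```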
Left/Right Hand Segmentation in Egocentric Videos
Wearable cameras allow people to record their daily activities from a
user-centered (First Person Vision) perspective. Due to their favorable
location, wearable cameras frequently capture the hands of the user, and may
thus represent a promising user-machine interaction tool for different
applications. Existing First Person Vision methods handle hand segmentation as
a background-foreground problem, ignoring two important facts: i) hands are not
a single "skin-like" moving element, but a pair of interacting cooperative
entities, ii) close hand interactions may lead to hand-to-hand occlusions and,
as a consequence, create a single hand-like segment. These facts complicate a
proper understanding of hand movements and interactions. Our approach extends
traditional background-foreground strategies, by including a
hand-identification step (left-right) based on a Maxwell distribution of angle
and position. Hand-to-hand occlusions are addressed by exploiting temporal
superpixels. The experimental results show that, in addition to reliable
left/right hand segmentation, our approach considerably improves traditional
background-foreground hand segmentation.
Towards a unified framework for hand-based methods in First Person Vision
First Person Vision (egocentric) video analysis stands today as one of the emerging fields in computer vision. The availability of wearable devices that record exactly what the user is looking at is now inevitable, and the opportunities and challenges carried by this kind of device are broad. In particular, for the first time a device is intimate enough with the user to record the movements of his hands, making hand-based applications one of the most explored areas in First Person Vision. This paper explores the most popular processing steps used to develop hand-based applications, and proposes a hierarchical structure that optimally switches between levels to reduce the computational cost of the system and improve its performance.
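The computational saving of such a hierarchy comes from letting cheap levels gate the expensive ones, so most frames exit early. A minimal sketch of that control flow follows; the three stage implementations are trivial placeholders, not the paper's components.

```python
# Minimal sketch of a hierarchical hand-processing pipeline: cheap levels
# gate expensive ones. Stage bodies are crude stand-ins, not the paper's
# actual detectors or segmenters.
import numpy as np

def detect_hands(frame):
    # Level 1 (cheap): crude channel-threshold "hands present?" test.
    skin = (frame[..., 0] > 95) & (frame[..., 1] > 40)
    return skin.mean() > 0.02

def segment_hands(frame):
    # Level 2 (moderate): pixel-wise hand mask; here, the same crude test.
    return (frame[..., 0] > 95) & (frame[..., 1] > 40)

def identify_hands(mask):
    # Level 3 (expensive): left/right labeling; here, split by image half.
    h, w = mask.shape
    return {"left": int(mask[:, : w // 2].sum()),
            "right": int(mask[:, w // 2 :].sum())}

def process_frame(frame):
    # Ascend the hierarchy only while each level keeps firing.
    if not detect_hands(frame):
        return None  # most frames exit at the cheapest level
    mask = segment_hands(frame)
    if not mask.any():
        return None
    return identify_hands(mask)

frame = np.random.randint(0, 256, size=(120, 160, 3), dtype=np.uint8)
print(process_frame(frame))
```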