Search CORE

975 research outputs found

Egocentric Video-Language Pretraining

Author: Damen Dima
Gao Difei
Ghanem Bernard
Lin Kevin Qinghong
Liu Wei
Shou Mike Zheng
Soldan Mattia
Wang Alex Jinpeng
Wray Michael
Xu Eric Zhongcong
Yan Rui
Publication venue
Publication date: 12/12/2022
Field of study

Egocentric Video Task Translation

Author: Grauman Kristen
Song Yale
Torresani Lorenzo
Xue Zihui
Publication venue
Publication date: 12/12/2022
Field of study

Different video understanding tasks are typically treated in isolation, and even with distinct types of curated data (e.g., classifying sports in one dataset, tracking animals in another). However, in wearable cameras, the immersive egocentric perspective of a person engaging with the world around them presents an interconnected web of video understanding tasks -- hand-object manipulations, navigation in the space, or human-human interactions -- that unfold continuously, driven by the person's goals. We argue that this calls for a much more unified approach. We propose EgoTask Translation (EgoT2), which takes a collection of models optimized on separate tasks and learns to translate their outputs for improved performance on any or all of them at once. Unlike traditional transfer or multi-task learning, EgoT2's flipped design entails separate task-specific backbones and a task translator shared across all tasks, which captures synergies between even heterogeneous tasks and mitigates task competition. Demonstrating our model on a wide array of video tasks from Ego4D, we show its advantages over existing transfer paradigms and achieve top-ranked results on four of the Ego4D 2022 benchmark challenges.Comment: Project page: https://vision.cs.utexas.edu/projects/egot2

arXiv.org e-Print Archive

Keyframe Summarisation of Egocentric Video

Author: Yousefi Paria
Publication venue
Publication date: 04/11/2019
Field of study

Bangor University Research Portal

Video Registration in Egocentric Vision under Day and Night Illumination Changes

Author: Alletto Stefano
Cucchiara Rita
Serra Giuseppe
Publication venue
Publication date: 28/07/2016
Field of study

With the spread of wearable devices and head mounted cameras, a wide range of application requiring precise user localization is now possible. In this paper we propose to treat the problem of obtaining the user position with respect to a known environment as a video registration problem. Video registration, i.e. the task of aligning an input video sequence to a pre-built 3D model, relies on a matching process of local keypoints extracted on the query sequence to a 3D point cloud. The overall registration performance is strictly tied to the actual quality of this 2D-3D matching, and can degrade if environmental conditions such as steep changes in lighting like the ones between day and night occur. To effectively register an egocentric video sequence under these conditions, we propose to tackle the source of the problem: the matching process. To overcome the shortcomings of standard matching techniques, we introduce a novel embedding space that allows us to obtain robust matches by jointly taking into account local descriptors, their spatial arrangement and their temporal robustness. The proposal is evaluated using unconstrained egocentric video sequences both in terms of matching quality and resulting registration performance using different 3D models of historical landmarks. The results show that the proposed method can outperform state of the art registration algorithms, in particular when dealing with the challenges of night and day sequences

arXiv.org e-Print Archive

Archivio istituzionale della ricerca - Università degli Studi di Udine

Archivio istituzionale della ricerca - Università di Modena e Reggio Emilia