Multitask Learning to Improve Egocentric Action Recognition
In this work we employ multitask learning to capitalize on the structure that
exists in related supervised tasks to train complex neural networks. Multitask learning trains a network for multiple objectives in parallel, improving performance on at least one of them by developing a shared representation that accommodates more information than it would for any single task. We employ this idea to tackle action recognition in egocentric videos by introducing additional supervised tasks. We learn the verbs and nouns of which action labels consist, and we predict coordinates
that capture the hand locations and the gaze-based visual saliency for all the
frames of the input video segments. This forces the network to explicitly focus
on cues from secondary tasks that it might otherwise have missed, resulting in
improved inference. Our experiments on EPIC-Kitchens and EGTEA Gaze+ show
consistent improvements when training with multiple tasks over the single-task
baseline. Furthermore, in EGTEA Gaze+ we outperform the state-of-the-art in
action recognition by 3.84%. Apart from actions, our method produces accurate
hand and gaze estimations as side tasks, without requiring any additional input
at test time other than the RGB video clips.

Comment: 10 pages, 3 figures, accepted at the 5th Egocentric Perception, Interaction and Computing (EPIC) workshop at ICCV 2019. Code repository: https://github.com/georkap/hand_track_classificatio
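A minimal sketch of the multitask setup described above, assuming a PyTorch-style shared backbone with one head per task; the head names, output dimensions, and loss weights are illustrative placeholders rather than the paper's actual configuration:

```python
import torch.nn as nn

class MultiTaskActionModel(nn.Module):
    """Illustrative multitask model: one shared video representation,
    separate heads for verbs, nouns, hand coordinates, and gaze saliency.
    Dimensions are hypothetical, not the paper's values."""
    def __init__(self, backbone: nn.Module, feat_dim: int,
                 num_verbs: int, num_nouns: int, num_frames: int):
        super().__init__()
        self.backbone = backbone                        # shared video encoder
        self.verb_head = nn.Linear(feat_dim, num_verbs)
        self.noun_head = nn.Linear(feat_dim, num_nouns)
        # (x, y) per hand per frame -> 4 * num_frames coordinates
        self.hand_head = nn.Linear(feat_dim, 4 * num_frames)
        # one (x, y) gaze point per frame
        self.gaze_head = nn.Linear(feat_dim, 2 * num_frames)

    def forward(self, clip):
        feat = self.backbone(clip)                      # shared features
        return (self.verb_head(feat), self.noun_head(feat),
                self.hand_head(feat), self.gaze_head(feat))

def multitask_loss(outputs, targets, w=(1.0, 1.0, 1.0, 1.0)):
    """Weighted sum of per-task losses: cross-entropy for verb/noun
    classification, regression for hand and gaze coordinates."""
    verb_logits, noun_logits, hands, gaze = outputs
    verb_t, noun_t, hand_t, gaze_t = targets
    ce, mse = nn.CrossEntropyLoss(), nn.MSELoss()
    return (w[0] * ce(verb_logits, verb_t) + w[1] * ce(noun_logits, noun_t)
            + w[2] * mse(hands, hand_t) + w[3] * mse(gaze, gaze_t))
```

Training against such a combined loss is what lets the shared backbone absorb the hand and gaze cues without requiring extra inputs at test time.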
Rescaling Egocentric Vision: Collection, Pipeline and Challenges for EPIC-KITCHENS-100
This paper introduces the pipeline to extend the largest dataset in egocentric vision, EPIC-KITCHENS. The effort culminates in EPIC-KITCHENS-100, a collection of 100 hours, 20M frames, 90K actions in 700 variable-length videos, capturing long-term unscripted activities in 45 environments, using head-mounted cameras. Compared to its previous version (Damen in Scaling egocentric vision: ECCV, 2018), EPIC-KITCHENS-100 has been annotated using a novel pipeline that allows denser (54% more actions per minute) and more complete annotations of fine-grained actions (+128% more action segments). This collection enables new challenges such as action detection and evaluating the "test of time", i.e. whether models trained on data collected in 2018 can generalise to new footage collected two years later. The dataset is aligned with 6 challenges: action recognition (full and weak supervision), action detection, action anticipation, cross-modal retrieval (from captions), as well as unsupervised domain adaptation for action recognition. For each challenge, we define the task and provide baselines and evaluation metrics.

Acknowledgments: Research at Bristol is supported by the Engineering and Physical Sciences Research Council (EPSRC) Doctoral Training Program (DTP) and EPSRC Fellowship UMPIRE (EP/T004991/1). Research at Catania is sponsored by Piano della Ricerca 2016-2018 Linea di Intervento 2 of DMI, by MISE - PON I&C 2014-2020, ENIGMA project (CUP: B61B19000520008), and by MIUR AIM - Attrazione e Mobilità Internazionale Linea 1 - AIM1893589 - CUP E64118002540007.
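As a rough consistency check on the scale reported above (a back-of-the-envelope sketch using only the figures quoted in the abstract, not part of the paper itself):

```python
# Average annotation density implied by the quoted totals:
# 90K action segments over 100 hours of video.
actions = 90_000
hours = 100
actions_per_minute = actions / (hours * 60)
print(f"{actions_per_minute:.1f} actions per minute on average")  # 15.0
```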
Leveraging Depth for 3D Scene Perception
3D scene perception aims to understand the geometric and semantic information of the surrounding environment. It is crucial in many downstream applications, such as autonomous driving, robotics, AR/VR, and human-computer interaction. Despite its significance, understanding the 3D scene has been a challenging task, due to the complex interactions between objects, heavy occlusions, cluttered indoor environments, and major appearance, viewpoint, and scale changes. The study of 3D scene perception has been significantly reshaped by powerful deep learning models, which are capable of leveraging large-scale training data to achieve outstanding performance. Learning-based models unlock new challenges and opportunities in the field.

In this dissertation, we first present learning-based approaches to estimate depth maps, a crucial input to many 3D scene perception models. We describe two overlooked challenges in learning monocular depth estimators and present our proposed solutions. Specifically, we address the high-level domain gap between real and synthetic training data and the shift in camera pose distribution between training and testing data. Following that, we present two application-driven works that leverage depth maps to achieve better 3D scene perception. We explore in detail the tasks of reference-based image inpainting and 3D object instance tracking in scenes from egocentric videos.
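As background for how such monocular depth estimators are commonly trained, a minimal sketch of the widely used scale-invariant log-depth loss (Eigen et al., 2014) is shown below; it illustrates the general family of training objectives only and is not the specific method proposed in this dissertation:

```python
import torch

def scale_invariant_log_loss(pred_depth, gt_depth, lam=0.5, eps=1e-6):
    """Scale-invariant log-depth loss often used for monocular depth
    estimation (Eigen et al., 2014). Shown as a generic illustration,
    not the dissertation's method."""
    valid = gt_depth > 0                                # ignore missing depth
    d = torch.log(pred_depth[valid] + eps) - torch.log(gt_depth[valid] + eps)
    return (d ** 2).mean() - lam * (d.mean() ** 2)
```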