Streaming egocentric action anticipation: An evaluation scheme and approach
Egocentric action anticipation aims to predict the future actions the camera
wearer will perform from the observation of the past. While predictions about
the future should be available before the predicted events take place, most
approaches do not pay attention to the computational time required to make such
predictions. As a result, current evaluation schemes assume that predictions
are available right after the input video is observed, i.e., presuming a
negligible runtime, which may lead to overly optimistic evaluations. We propose
a streaming egocentric action anticipation evaluation scheme which assumes that predictions
are performed online and made available only after the model has processed the
current input segment, which depends on its runtime. To evaluate all models
considering the same prediction horizon, we hence propose that slower models
should base their predictions on temporal segments sampled ahead of time. Based
on the observation that model runtime can affect performance in the considered
streaming evaluation scenario, we further propose a lightweight action
anticipation model based on feed-forward 3D CNNs which is optimized using
knowledge distillation techniques with a novel past-to-future distillation
loss. Experiments on the three popular datasets EPIC-KITCHENS-55,
EPIC-KITCHENS-100 and EGTEA Gaze+ show that (i) the proposed evaluation scheme
induces a different ranking on state-of-the-art methods as compared to classic
evaluations, (ii) lightweight approaches tend to outmatch more computationally
expensive ones, and (iii) the proposed model based on feed-forward 3D CNNs and
knowledge distillation outperforms the state of the art in the streaming egocentric
action anticipation scenario.
Published in Computer Vision and Image Understanding, 2023.
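To make the streaming timing rule concrete, here is a minimal sketch of the constraint described above, assuming a model with a fixed per-segment runtime; the function name and the example values are illustrative assumptions, not the paper's API.

```python
def last_observable_time(action_time, horizon, runtime):
    """Latest time (in seconds of video) a model may observe so that
    its prediction for an action occurring at `action_time` is ready
    `horizon` seconds in advance, under the streaming assumption that
    a prediction computed from a segment ending at t_end only becomes
    available at t_end + runtime.

    We require: t_end + runtime <= action_time - horizon.
    """
    return action_time - horizon - runtime

# Example: an action occurs at t = 10 s with a 1 s anticipation horizon.
# A 50 ms model may observe video up to t = 8.95 s, while a 500 ms model
# must base its prediction on a segment ending by t = 8.50 s; this is why
# slower models must sample their input segments further ahead of time.
for runtime in (0.05, 0.5):
    t_end = last_observable_time(10.0, 1.0, runtime)
    print(f"runtime={runtime:.2f}s -> observe up to t={t_end:.2f}s")
```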
MECCANO: A Multimodal Egocentric Dataset for Humans Behavior Understanding in the Industrial-like Domain
Wearable cameras allow images and videos to be acquired from the user's
perspective, and these data can be processed to understand human behavior. Although
human behavior analysis has been thoroughly investigated in third-person vision, it
remains understudied in egocentric settings, and in particular in industrial
scenarios. To encourage research in this field, we present MECCANO, a multimodal
dataset of egocentric videos for studying human behavior understanding in
industrial-like settings. The multimodality is characterized by the presence of
gaze signals, depth maps and RGB videos acquired simultaneously with a custom
headset. The dataset has been explicitly labeled for fundamental tasks in the
context of human behavior understanding from a first-person view, such as
recognizing and anticipating human-object interactions. With the MECCANO dataset,
we explore five different tasks: 1) Action Recognition, 2) Active Objects Detection
and Recognition, 3) Egocentric Human-Objects Interaction Detection, 4) Action
Anticipation and 5) Next-Active Objects Detection. We propose a benchmark aimed at
studying human behavior in the considered industrial-like scenario, which
demonstrates that the investigated tasks and scenario are challenging for
state-of-the-art algorithms. To support research in this field, we publicly
release the dataset at https://iplab.dmi.unict.it/MECCANO/.
StillFast: An End-to-End Approach for Short-Term Object Interaction Anticipation
The anticipation problem has been studied considering different aspects, such as
predicting people's future locations, predicting hand and object trajectories, and
forecasting actions and human-object interactions. In this paper, we study the
short-term object interaction anticipation problem from the egocentric point of
view and propose a new end-to-end architecture named StillFast. Our approach
simultaneously processes a still image and a video to detect and localize
next-active objects, predict the verb describing the future interaction, and
determine when the interaction will start. Experiments on the large-scale
egocentric dataset EGO4D show that our method outperforms state-of-the-art
approaches on the considered task. Our method ranks first on the public
leaderboard of the EGO4D short-term object interaction anticipation challenge
2022. Please see the project web page for code and additional details:
https://iplab.dmi.unict.it/stillfast/
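As an illustration of the two-branch idea, the sketch below pairs a 2D image backbone with a 3D video backbone and attaches heads for the next-active-object class, the interaction verb, and the time to contact. It is a minimal sketch under our own assumptions: the backbones, fusion layer, and heads are illustrative, and localization is simplified to classification, whereas StillFast itself detects and localizes objects with bounding boxes.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18
from torchvision.models.video import r3d_18

class TwoBranchAnticipator(nn.Module):
    """Illustrative two-branch model in the spirit of StillFast: a 2D CNN
    encodes the still image and a 3D CNN encodes the video clip; the fused
    features feed three heads predicting the next-active-object class, the
    future interaction verb, and the time to contact. All architectural
    details here are assumptions, not the authors' implementation."""

    def __init__(self, num_nouns, num_verbs):
        super().__init__()
        self.still = resnet18(weights=None)
        self.still.fc = nn.Identity()       # 512-d still-image features
        self.fast = r3d_18(weights=None)
        self.fast.fc = nn.Identity()        # 512-d video-clip features
        self.fuse = nn.Linear(512 + 512, 512)
        self.noun_head = nn.Linear(512, num_nouns)  # next-active object class
        self.verb_head = nn.Linear(512, num_verbs)  # future interaction verb
        self.ttc_head = nn.Linear(512, 1)           # time to contact (seconds)

    def forward(self, image, clip):
        # image: (B, 3, H, W); clip: (B, 3, T, H', W')
        feats = torch.cat([self.still(image), self.fast(clip)], dim=1)
        f = torch.relu(self.fuse(feats))
        return self.noun_head(f), self.verb_head(f), self.ttc_head(f)

# Usage example with random inputs (batch of 2, 16-frame clips):
model = TwoBranchAnticipator(num_nouns=300, num_verbs=100)
noun_logits, verb_logits, ttc = model(torch.randn(2, 3, 224, 224),
                                      torch.randn(2, 3, 16, 112, 112))
```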
- …