The anticipation problem has been studied from different perspectives, such as
predicting humans' locations, predicting hand and object trajectories, and
forecasting actions and human-object interactions. In this paper, we study
the short-term object interaction anticipation problem from the egocentric
point of view, proposing a new end-to-end architecture named StillFast. Our
approach simultaneously processes a still image and a video, detecting and
localizing next-active objects, predicting the verb that describes the future
interaction, and determining when the interaction will start. Experiments on the
large-scale egocentric dataset EGO4D show that our method outperforms
state-of-the-art approaches on the considered task. Our method is ranked first
on the public leaderboard of the EGO4D short-term object interaction
anticipation challenge 2022. Please see the project web page for code and
additional details: https://iplab.dmi.unict.it/stillfast/
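The two-branch processing described above can be illustrated with a toy sketch: a high-resolution still image and a low-resolution video clip are encoded separately, fused, and passed to three prediction heads (next-active-object box, verb, and time to contact). All dimensions, pooling operations, and head shapes below are illustrative assumptions, not the actual StillFast implementation, which uses 2D/3D CNN backbones and a detection head.

```python
import numpy as np

# Toy two-branch "still + fast" sketch with assumed shapes and random weights.
rng = np.random.default_rng(0)
D = 64                                  # assumed fused feature dimension
W_still = rng.standard_normal((3, D))   # projects pooled still-image statistics
W_fast = rng.standard_normal((3, D))    # projects pooled video statistics
W_box = rng.standard_normal((D, 4))     # head: (x, y, w, h) next-active object box
W_verb = rng.standard_normal((D, 10))   # head: 10 hypothetical verb classes
W_ttc = rng.standard_normal((D, 1))     # head: time to contact (seconds)

def forward(image, video):
    """image: (H, W, 3) high-res still frame; video: (T, h, w, 3) low-res clip."""
    f_still = image.mean(axis=(0, 1)) @ W_still     # spatial pooling + projection
    f_fast = video.mean(axis=(0, 1, 2)) @ W_fast    # spatio-temporal pooling + projection
    fused = f_still + f_fast                        # simple additive fusion
    box = fused @ W_box                             # localize the next-active object
    verb_logits = fused @ W_verb                    # verb describing the interaction
    ttc = float(np.exp(fused @ W_ttc)[0])           # positive time to contact
    return box, verb_logits, ttc

image = rng.random((256, 256, 3))   # still branch input (high resolution)
video = rng.random((16, 64, 64, 3)) # fast branch input (low resolution, 16 frames)
box, verb_logits, ttc = forward(image, video)
```

The key design point mirrored here is that the two inputs are processed jointly in a single end-to-end forward pass, so all three outputs are predicted from one fused representation.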