43,012 research outputs found
Encouraging LSTMs to Anticipate Actions Very Early
In contrast to the widely studied problem of recognizing an action given a
complete sequence, action anticipation aims to identify the action from only
partially available videos. As such, it is therefore key to the success of
computer vision applications requiring to react as early as possible, such as
autonomous navigation. In this paper, we propose a new action anticipation
method that achieves high prediction accuracy even in the presence of a very
small percentage of a video sequence. To this end, we develop a multi-stage
LSTM architecture that leverages context-aware and action-aware features, and
introduce a novel loss function that encourages the model to predict the
correct class as early as possible. Our experiments on standard benchmark
datasets evidence the benefits of our approach; We outperform the
state-of-the-art action anticipation methods for early prediction by a relative
increase in accuracy of 22.0% on JHMDB-21, 14.0% on UT-Interaction and 49.9% on
UCF-101.Comment: 13 Pages, 7 Figures, 11 Tables. Accepted in ICCV 2017. arXiv admin
note: text overlap with arXiv:1611.0552
VIENA2: A Driving Anticipation Dataset
Action anticipation is critical in scenarios where one needs to react before
the action is finalized. This is, for instance, the case in automated driving,
where a car needs to, e.g., avoid hitting pedestrians and respect traffic
lights. While solutions have been proposed to tackle subsets of the driving
anticipation tasks, by making use of diverse, task-specific sensors, there is
no single dataset or framework that addresses them all in a consistent manner.
In this paper, we therefore introduce a new, large-scale dataset, called
VIENA2, covering 5 generic driving scenarios, with a total of 25 distinct
action classes. It contains more than 15K full HD, 5s long videos acquired in
various driving conditions, weathers, daytimes and environments, complemented
with a common and realistic set of sensor measurements. This amounts to more
than 2.25M frames, each annotated with an action label, corresponding to 600
samples per action class. We discuss our data acquisition strategy and the
statistics of our dataset, and benchmark state-of-the-art action anticipation
techniques, including a new multi-modal LSTM architecture with an effective
loss function for action anticipation in driving scenarios.Comment: Accepted in ACCV 201
Scaling Egocentric Vision: The EPIC-KITCHENS Dataset
First-person vision is gaining interest as it offers a unique viewpoint on
people's interaction with objects, their attention, and even intention.
However, progress in this challenging domain has been relatively slow due to
the lack of sufficiently large datasets. In this paper, we introduce
EPIC-KITCHENS, a large-scale egocentric video benchmark recorded by 32
participants in their native kitchen environments. Our videos depict
nonscripted daily activities: we simply asked each participant to start
recording every time they entered their kitchen. Recording took place in 4
cities (in North America and Europe) by participants belonging to 10 different
nationalities, resulting in highly diverse cooking styles. Our dataset features
55 hours of video consisting of 11.5M frames, which we densely labeled for a
total of 39.6K action segments and 454.3K object bounding boxes. Our annotation
is unique in that we had the participants narrate their own videos (after
recording), thus reflecting true intention, and we crowd-sourced ground-truths
based on these. We describe our object, action and anticipation challenges, and
evaluate several baselines over two test splits, seen and unseen kitchens.
Dataset and Project page: http://epic-kitchens.github.ioComment: European Conference on Computer Vision (ECCV) 2018 Dataset and
Project page: http://epic-kitchens.github.i
Anticipation and Risk – From the inverse problem to reverse computation
Abstract. Risk assessment is relevant only if it has predictive relevance. In this sense, the anticipatory perspective has yet to contribute to more adequate predictions. For purely physics-based phenomena, predictions are as good as the science describing such phenomena. For the dynamics of the living, the physics of the matter making up the living is only a partial description of their change over time. The space of possibilities is the missing component, complementary to physics and its associated predictions based on probabilistic methods. The inverse modeling problem, and moreover the reverse computation model guide anticipatory-based predictive methodologies. An experimental setting for the quantification of anticipation is advanced and structural measurement is suggested as a possible mathematics for anticipation-based risk assessment
- …