Am I Done? Predicting Action Progress in Videos
In this paper we deal with the problem of predicting action progress in
videos. We argue that this is an extremely important task since it can be
valuable for a wide range of interaction applications. To this end we introduce
a novel approach, named ProgressNet, capable of predicting when an action takes
place in a video, where it is located within the frames, and how far it has
progressed during its execution. To provide a general definition of action
progress, we ground our work in the linguistics literature, borrowing terms and
concepts to understand which actions can be the subject of progress estimation.
As a result, we define a categorization of actions and their phases. Motivated
by the recent success of combining Convolutional and Recurrent Neural
Networks, our model is based on a combination of the Faster R-CNN framework, to
make frame-wise predictions, and LSTM networks, to estimate action progress
through time. After introducing two evaluation protocols for the task at hand,
we demonstrate the capability of our model to effectively predict action
progress on the UCF-101 and J-HMDB datasets.
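As a concrete illustration of the supervision signal such a model regresses, frame-wise progress targets for a single action instance can be sketched as follows (a hypothetical example in plain Python; the linear-progress assumption and the function name are ours, not the paper's):

```python
def progress_targets(start, end, num_frames):
    """Frame-wise progress targets for an action spanning frames [start, end].

    Frames before the action get 0.0, frames after it get 1.0, and frames
    inside advance linearly up to 1.0. This linear schedule is a common
    ground-truth choice; the paper's phase-aware, linguistics-grounded
    definition may differ.
    """
    duration = end - start + 1
    targets = []
    for t in range(num_frames):
        if t < start:
            targets.append(0.0)          # action not started yet
        elif t > end:
            targets.append(1.0)          # action already completed
        else:
            targets.append((t - start + 1) / duration)
    return targets

# e.g. a 10-frame clip whose action occupies frames 2..6
print(progress_targets(2, 6, 10))
```

A recurrent model like ProgressNet would regress one such value per frame, alongside the spatial localization produced by the detection branch.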
Indexing ensembles of exemplar-SVMs with rejecting taxonomies
Ensembles of Exemplar-SVMs have been used for a wide variety of tasks, such as object detection, segmentation, label transfer and mid-level feature learning. To be effective, though, this technique requires a large collection of classifiers, which often makes the evaluation phase prohibitive. To overcome this issue we exploit the joint distribution of exemplar classifier scores to build a taxonomy capable of indexing each Exemplar-SVM and enabling a fast evaluation of the whole ensemble. We experiment on the Pascal 2007 benchmark on the task of object detection and on a simple segmentation task, in order to verify the robustness of our indexing data structure relative to the standard ensemble. We also introduce a rejection strategy to discard irrelevant image patches, allowing more efficient access to the data.
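The taxonomy-based evaluation with rejection can be sketched as follows (a simplified, hypothetical illustration: the halving split and the mean-weight representative stand in for the paper's clustering on joint score distributions, and the rejection threshold is hand-set):

```python
class Node:
    """A taxonomy node over a set of (name, weight_vector) Exemplar-SVMs."""
    def __init__(self, exemplars, left=None, right=None):
        self.exemplars = exemplars
        self.left, self.right = left, right
        # Representative classifier: mean weight vector of the subtree
        # (a simplification of the paper's taxonomy construction).
        dim = len(exemplars[0][1])
        self.rep = [sum(w[d] for _, w in exemplars) / len(exemplars)
                    for d in range(dim)]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def build_taxonomy(exemplars):
    """Recursively split the ensemble in half until leaves hold one
    classifier each (a stand-in for similarity-based clustering)."""
    if len(exemplars) == 1:
        return Node(exemplars)
    mid = len(exemplars) // 2
    return Node(exemplars,
                build_taxonomy(exemplars[:mid]),
                build_taxonomy(exemplars[mid:]))

def evaluate(node, x, reject_thr):
    """Score patch x, pruning any subtree whose representative score
    falls below reject_thr -- the full ensemble is never evaluated."""
    if dot(node.rep, x) < reject_thr:
        return {}                        # reject the whole subtree
    if node.left is None:                # leaf: run the single Exemplar-SVM
        name, w = node.exemplars[0]
        return {name: dot(w, x)}
    scores = evaluate(node.left, x, reject_thr)
    scores.update(evaluate(node.right, x, reject_thr))
    return scores
```

With well-clustered exemplars, whole branches of dissimilar classifiers are skipped in a single representative evaluation, which is where the speed-up over flat ensemble evaluation comes from.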
MANTRA: Memory Augmented Networks for Multiple Trajectory Prediction
Autonomous vehicles are expected to drive in complex scenarios with several
independent, non-cooperating agents. Path planning for safely navigating such
environments cannot rely solely on perceiving the present location and motion
of other agents; it must instead predict these variables sufficiently far into
the future. In this paper we address the problem of multimodal trajectory
prediction exploiting a Memory Augmented Neural Network. Our method learns past
and future trajectory embeddings using recurrent neural networks and exploits
an associative external memory to store and retrieve such embeddings.
Trajectory prediction is then performed by decoding in-memory future encodings
conditioned on the observed past. We incorporate scene knowledge in the
decoding state by learning a CNN on top of semantic scene maps. Memory growth
is limited by learning a writing controller based on the predictive capability
of existing embeddings. We show that our method is able to natively perform
multi-modal trajectory prediction, obtaining state-of-the-art results on three
datasets. Moreover, thanks to the non-parametric nature of the memory module,
we show how, once trained, our system can continuously improve by ingesting
novel patterns.
Comment: Accepted at CVPR2
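The associative read step underlying the multimodality can be sketched as follows (a hypothetical, simplified illustration: the stored values stand in for future embeddings that a decoder would turn into trajectories, and cosine similarity stands in for the learned read controller):

```python
import math

def cosine(a, b):
    """Cosine similarity between two encoding vectors."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return sum(x * y for x, y in zip(a, b)) / (na * nb)

class TrajectoryMemory:
    """Associative memory of (past encoding, future encoding) pairs."""
    def __init__(self):
        self.keys, self.values = [], []

    def write(self, past, future):
        self.keys.append(past)
        self.values.append(future)

    def read(self, past, k=3):
        """Return the k stored futures whose pasts best match the query.

        Decoding each returned future separately, conditioned on the
        observed past, yields the multiple modes of the prediction.
        """
        ranked = sorted(range(len(self.keys)),
                        key=lambda i: cosine(self.keys[i], past),
                        reverse=True)
        return [self.values[i] for i in ranked[:k]]
```

Because predictions are looked up rather than parameterized, adding a new pattern to memory immediately changes what the model can predict, which is the property behind the continuous-improvement claim.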
Multiple Trajectory Prediction of Moving Agents with Memory Augmented Networks
Pedestrians and drivers are expected to safely navigate complex urban environments along with several non-cooperating agents. Autonomous vehicles will soon replicate this capability. Each agent acquires a representation of the world from an egocentric perspective and must make decisions ensuring safety for itself and others. This requires predicting the motion patterns of observed agents sufficiently far into the future. In this paper we propose MANTRA, a model that exploits memory augmented networks to effectively predict multiple trajectories of other agents, observed from an egocentric perspective. Our model stores observations in memory and uses trained controllers to write meaningful pattern encodings and read trajectories that are most likely to occur in the future. We show that our method is able to natively perform multi-modal trajectory prediction, obtaining state-of-the-art results on four datasets. Moreover, thanks to the non-parametric nature of the memory module, we show how, once trained, our system can continuously improve by ingesting novel patterns.
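The writing controller that keeps memory growth bounded can be sketched as follows (a hypothetical illustration: a hand-set L2 error threshold and nearest-past lookup stand in for the learned controllers described in the abstract):

```python
import math

def l2(a, b):
    """Euclidean distance between two encoding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def maybe_write(memory, past, future, err_thr=0.5):
    """Write a (past, future) pair only when the memory's best existing
    match for this past predicts the true future poorly.

    `memory` maps past encodings (tuples) to future encodings (tuples).
    Redundant patterns are skipped, so memory grows only on novel ones.
    """
    if memory:
        nearest = min(memory, key=lambda stored_past: l2(stored_past, past))
        if l2(memory[nearest], future) <= err_thr:
            return False                 # already well predicted: skip
    memory[tuple(past)] = tuple(future)  # novel pattern: store it
    return True
```

In the actual model this decision is learned from the predictive capability of existing embeddings; the fixed threshold here only conveys the mechanism.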