
Recognize Human Activities from Partially Observed Videos

Publication venue: Institute of Electrical and Electronics Engineers (IEEE)

Am I Done? Predicting Action Progress in Videos

Author: Ballan Lamberto
Becattini Federico
Del Bimbo Alberto
Seidenari Lorenzo
Uricchio Tiberio
Publication date: 01/01/2020

In this paper we deal with the problem of predicting action progress in videos. We argue that this is an extremely important task since it can be valuable for a wide range of interaction applications. To this end, we introduce a novel approach, named ProgressNet, capable of predicting when an action takes place in a video, where it is located within the frames, and how far it has progressed during its execution. To provide a general definition of action progress, we ground our work in the linguistics literature, borrowing terms and concepts to understand which actions can be the subject of progress estimation. As a result, we define a categorization of actions and their phases. Motivated by the recent success obtained from the interaction of Convolutional and Recurrent Neural Networks, our model is based on a combination of the Faster R-CNN framework, to make frame-wise predictions, and LSTM networks, to estimate action progress through time. After introducing two evaluation protocols for the task at hand, we demonstrate the capability of our model to effectively predict action progress on the UCF-101 and J-HMDB datasets.

arXiv.org e-Print Archive

Archivio della Ricerca - Università degli Studi di Siena

Florence Research

Archivio istituzionale della ricerca - Università di Macerata

Archivio istituzionale della ricerca - Università di Padova

Temporal Relational Reasoning in Videos

Author: A Gaidon
A Gaidon
BM Lake
GA Sigurdsson
L Pinto
L Wang
L Wang
Publication date: 24/07/2018

Temporal relational reasoning, the ability to link meaningful transformations of objects or entities over time, is a fundamental property of intelligent species. In this paper, we introduce an effective and interpretable network module, the Temporal Relation Network (TRN), designed to learn and reason about temporal dependencies between video frames at multiple time scales. We evaluate TRN-equipped networks on activity recognition tasks using three recent video datasets (Something-Something, Jester, and Charades) which fundamentally depend on temporal relational reasoning. Our results demonstrate that the proposed TRN gives convolutional neural networks a remarkable capacity to discover temporal relations in videos. Through only sparsely sampled video frames, TRN-equipped networks can accurately predict human-object interactions in the Something-Something dataset and identify various human gestures on the Jester dataset with very competitive performance. TRN-equipped networks also outperform two-stream networks and 3D convolution networks in recognizing daily activities in the Charades dataset. Further analyses show that the models learn intuitive and interpretable visual common sense knowledge in videos. Comment: camera-ready version for ECCV'1

arXiv.org e-Print Archive

DSpace@MIT

Crossref

Progressive Teacher-student Learning for Early Action Prediction

Author: Hu Jian-Fang
Lai Jianhuang
Wang Xionghui
Zhang Jianguo
Zheng Wei-Shi
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 09/01/2020

Crossref

University of Dundee Online Publications

Learning activity progression in LSTMs for activity detection and early detection

Author: Ma Shugao
Sclaroff Stan
Sigal Leonid
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2016

In this work we improve training of temporal deep models to better learn activity progression for activity detection and early detection tasks. Conventionally, when training a Recurrent Neural Network, specifically a Long Short-Term Memory (LSTM) model, the training loss only considers classification error. However, we argue that the detection score of the correct activity category, or the detection score margin between the correct and incorrect categories, should be monotonically non-decreasing as the model observes more of the activity. We design novel ranking losses that directly penalize the model on violation of such monotonicities, which are used together with classification loss in training of LSTM models. Evaluation on ActivityNet shows significant benefits of the proposed ranking losses in both activity detection and early detection tasks.
https://www.cv-foundation.org/openaccess/content_cvpr_2016/html/Ma_Learning_Activity_Progression_CVPR_2016_paper.html
Published version

Crossref

Boston University Institutional Repository (OpenBU)
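
The monotonicity idea in the abstract of "Learning activity progression in LSTMs for activity detection and early detection" can be illustrated with a small sketch. It assumes one simple formulation: a hinge penalty whenever the correct-class detection score falls below its running maximum. This is not necessarily the exact ranking loss defined in the paper, and the names monotonicity_penalty, total_loss, and weight are hypothetical, chosen only for illustration.

```python
from typing import Sequence


def monotonicity_penalty(scores: Sequence[float]) -> float:
    """Penalize drops in the correct-class score over time.

    Assumed formulation (not taken from the paper): at each time step,
    add max(0, running_max - score), so a sequence that never decreases
    incurs zero penalty.
    """
    penalty = 0.0
    running_max = float("-inf")
    for score in scores:
        if score < running_max:
            # The score dropped below an earlier value: a violation.
            penalty += running_max - score
        else:
            running_max = score
    return penalty


def total_loss(classification_loss: float,
               scores: Sequence[float],
               weight: float = 1.0) -> float:
    """Combine a classification loss with the weighted monotonicity penalty."""
    return classification_loss + weight * monotonicity_penalty(scores)
```

A non-decreasing score sequence such as [0.1, 0.3, 0.3, 0.9] receives no penalty, while a sequence that dips after an earlier peak is penalized in proportion to the size of each dip. The weight parameter is an assumed knob for balancing the two terms.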