Localizing Actions from Video Labels and Pseudo-Annotations
The goal of this paper is to determine the spatio-temporal location of
actions in video. Where training from hard-to-obtain box annotations is the
norm, we propose an intuitive and effective algorithm that localizes actions
from their class label only. We are inspired by recent work showing that
unsupervised action proposals selected with human point-supervision perform as
well as using expensive box annotations. Rather than asking users to provide
point supervision, we propose fully automatic visual cues that replace manual
point annotations. We call the cues pseudo-annotations, introduce five of them,
and propose a correlation metric for automatically selecting and combining
them. Thorough evaluation on challenging action localization datasets shows
that we reach results comparable to those obtained with full box supervision. We
also show that pseudo-annotations can be leveraged during testing to improve
weakly- and strongly-supervised localizers.
Comment: BMVC
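The cue selection and combination step is simple enough to sketch. Below is a minimal, illustrative rendering (not the authors' code): each pseudo-annotation is treated as a per-frame point track, proposals are scored by how often those points fall inside their boxes, and only cues whose scores correlate positively with the consensus of the remaining cues are averaged. The point-in-box scoring, function names, and correlation threshold are assumptions for illustration.

```python
import numpy as np

def overlap_score(proposal_boxes, cue_points):
    """Fraction of per-frame cue points [x, y] that fall inside the
    proposal's box [x1, y1, x2, y2] for the same frame."""
    hits = 0
    for (x1, y1, x2, y2), (px, py) in zip(proposal_boxes, cue_points):
        if x1 <= px <= x2 and y1 <= py <= y2:
            hits += 1
    return hits / max(len(proposal_boxes), 1)

def combine_cues(proposals, cues):
    """Score every proposal under every pseudo-annotation cue, keep cues whose
    scores correlate positively with the consensus of the remaining cues, and
    average the kept scores into one score per proposal."""
    # scores[i, j]: score of proposal i under cue j
    scores = np.array([[overlap_score(p, c) for c in cues] for p in proposals])
    kept = []
    for j in range(scores.shape[1]):
        consensus = np.delete(scores, j, axis=1).mean(axis=1)
        corr = np.corrcoef(scores[:, j], consensus)[0, 1]
        if corr > 0.0:  # illustrative threshold, not taken from the paper
            kept.append(j)
    kept = kept or list(range(scores.shape[1]))  # fall back to all cues
    return scores[:, kept].mean(axis=1)
```

Scoring proposals against points rather than boxes is what lets the same selection machinery accept either human point-supervision or automatic cues.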
TubeR: Tubelet Transformer for Video Action Detection
We propose TubeR: a simple solution for spatio-temporal video action
detection. Different from existing methods that depend on either an off-line
actor detector or hand-designed actor-positional hypotheses like proposals or
anchors, we propose to directly detect an action tubelet in a video by
simultaneously performing action localization and recognition from a single
representation. TubeR learns a set of tubelet-queries and utilizes a
tubelet-attention module to model the dynamic spatio-temporal nature of a video
clip, which effectively reinforces the model capacity compared to using
actor-positional hypotheses in the spatio-temporal space. For videos containing
transitional states or scene changes, we propose a context-aware classification
head to utilize short-term and long-term context to strengthen action
classification, and an action switch regression head for detecting the precise
temporal action extent. TubeR directly produces action tubelets with variable
lengths and even maintains good results for long video clips. TubeR outperforms
the previous state-of-the-art on the commonly used action detection datasets
AVA, UCF101-24, and JHMDB51-21.
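As a rough picture of the query-based design, the sketch below shows a DETR-style decoder in which a fixed set of learned tubelet queries cross-attends to video features and is read out into per-frame boxes, action classes, and an action-switch signal marking the temporal extent. Layer sizes, head designs, and names are assumptions, not the published TubeR architecture.

```python
import torch
import torch.nn as nn

class TubeletDecoderSketch(nn.Module):
    """Illustrative tubelet-query decoder: learned queries cross-attend to
    flattened spatio-temporal video features and are decoded into per-frame
    boxes, an action classification, and a per-frame "action switch" logit.
    All dimensions here are assumptions for the sketch."""
    def __init__(self, num_queries=15, dim=256, num_classes=80, num_frames=32):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(num_queries, dim))
        self.cross_attn = nn.MultiheadAttention(dim, num_heads=8, batch_first=True)
        self.box_head = nn.Linear(dim, 4 * num_frames)   # one box per frame
        self.cls_head = nn.Linear(dim, num_classes)
        self.switch_head = nn.Linear(dim, num_frames)    # per-frame on/off logit

    def forward(self, video_feats):
        # video_feats: (batch, tokens, dim) flattened spatio-temporal features
        b = video_feats.size(0)
        q = self.queries.unsqueeze(0).expand(b, -1, -1)
        tubelets, _ = self.cross_attn(q, video_feats, video_feats)
        boxes = self.box_head(tubelets).sigmoid()   # normalized box coordinates
        logits = self.cls_head(tubelets)
        switch = self.switch_head(tubelets)         # which frames the action spans
        return boxes, logits, switch
```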
Detecting events and key actors in multi-person videos
Multi-person event recognition is a challenging task, often with many people
active in the scene but only a small subset contributing to an actual event. In
this paper, we propose a model which learns to detect events in such videos
while automatically "attending" to the people responsible for the event. Our
model does not use explicit annotations regarding who or where those people are
during training and testing. In particular, we track people in videos and use a
recurrent neural network (RNN) to represent the track features. We learn
time-varying attention weights to combine these features at each time-instant.
The attended features are then processed using another RNN for event
detection/classification. Since most video datasets with multiple people are
restricted to a small number of videos, we also collected a new basketball
dataset comprising 257 basketball games with 14K event annotations
corresponding to 11 event classes. Our model outperforms state-of-the-art
methods for both event classification and detection on this new dataset.
Additionally, we show that the attention mechanism is able to consistently
localize the relevant players.
Comment: Accepted for publication in CVPR'16
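The attention-over-tracks idea can be illustrated in a few lines. In the sketch below, each person track is encoded by a shared GRU, a learned scoring layer produces per-time-step attention weights over players, and the attended feature is passed to a second recurrent network that classifies the event. The layer sizes and the specific attention form are assumptions rather than the paper's exact model.

```python
import torch
import torch.nn as nn

class KeyActorAttentionSketch(nn.Module):
    """Rough sketch of attending over player tracks: a shared track GRU,
    per-time-step attention weights over players, and a second GRU for
    event classification. Feature and hidden sizes are illustrative."""
    def __init__(self, feat_dim=512, hidden=256, num_events=11):
        super().__init__()
        self.track_rnn = nn.GRU(feat_dim, hidden, batch_first=True)
        self.attn = nn.Linear(hidden, 1)
        self.event_rnn = nn.GRU(hidden, hidden, batch_first=True)
        self.classifier = nn.Linear(hidden, num_events)

    def forward(self, tracks):
        # tracks: (batch, num_players, time, feat_dim) per-track features
        b, p, t, d = tracks.shape
        h, _ = self.track_rnn(tracks.reshape(b * p, t, d))        # (b*p, t, hidden)
        h = h.reshape(b, p, t, -1)
        weights = torch.softmax(self.attn(h).squeeze(-1), dim=1)  # attend over players
        attended = (weights.unsqueeze(-1) * h).sum(dim=1)         # (b, t, hidden)
        _, last = self.event_rnn(attended)
        return self.classifier(last.squeeze(0))                   # (b, num_events)
```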
Im2Flow: Motion Hallucination from Static Images for Action Recognition
Existing methods to recognize actions in static images take the images at
their face value, learning the appearances---objects, scenes, and body
poses---that distinguish each action class. However, such models are deprived
of the rich dynamic structure and motions that also define human activity. We
propose an approach that hallucinates the unobserved future motion implied by a
single snapshot to help static-image action recognition. The key idea is to
learn a prior over short-term dynamics from thousands of unlabeled videos,
infer the anticipated optical flow on novel static images, and then train
discriminative models that exploit both streams of information. Our main
contributions are twofold. First, we devise an encoder-decoder convolutional
neural network and a novel optical flow encoding that can translate a static
image into an accurate flow map. Second, we show the power of hallucinated flow
for recognition, successfully transferring the learned motion into a standard
two-stream network for activity recognition. On seven datasets, we demonstrate
the strength of the approach. It not only achieves state-of-the-art accuracy for
dense optical flow prediction, but also consistently enhances recognition of
actions and dynamic scenes.
Comment: Published in CVPR 2018, project page:
http://vision.cs.utexas.edu/projects/im2flow
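At a high level, the pipeline is an image-to-flow translator followed by two-stream fusion. The sketch below is an illustrative stand-in, not the paper's network: a toy encoder-decoder predicts a 2-channel flow field from an RGB image, and the hallucinated flow feeds the motion stream of a two-stream classifier whose scores are fused with the appearance stream.

```python
import torch
import torch.nn as nn

class Im2FlowSketch(nn.Module):
    """Toy encoder-decoder mapping an RGB image to a 2-channel flow field,
    in the spirit of image-to-flow translation; the layer counts are
    assumptions, not the published architecture."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 2, 4, stride=2, padding=1),  # (dx, dy) per pixel
        )

    def forward(self, image):                      # image: (batch, 3, H, W)
        return self.decoder(self.encoder(image))   # flow: (batch, 2, H, W)

def two_stream_logits(rgb_net, flow_net, flow_model, image):
    """Late fusion of an appearance stream and a motion stream fed with the
    hallucinated flow, as the abstract describes at a high level."""
    flow = flow_model(image)
    return rgb_net(image) + flow_net(flow)  # simple score-level fusion
```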