Search CORE

10,384 research outputs found

Spatial-Aware Object Embeddings for Zero-Shot Localization and Classification of Actions

Author: Mettes Pascal
Snoek Cees G. M.
Publication venue
Publication date: 01/01/2017
Field of study

We aim for zero-shot localization and classification of human actions in video. Where traditional approaches rely on global attribute or object classification scores for their zero-shot knowledge transfer, our main contribution is a spatial-aware object embedding. To arrive at spatial awareness, we build our embedding on top of freely available actor and object detectors. Relevance of objects is determined in a word embedding space and further enforced with estimated spatial preferences. Besides local object awareness, we also embed global object awareness into our embedding to maximize actor and object interaction. Finally, we exploit the object positions and sizes in the spatial-aware embedding to demonstrate a new spatio-temporal action retrieval scenario with composite queries. Action localization and classification experiments on four contemporary action video datasets support our proposal. Apart from state-of-the-art results in the zero-shot localization and classification settings, our spatial-aware embedding is even competitive with recent supervised action localization alternatives.Comment: ICC

arXiv.org e-Print Archive

Crossref

UvA-DARE

International Migration, Integration and Social Cohesion online publications

Spatio-temporal wardrobe generation of actor's clothing in video content

Author: E Simo-Serra
F Wang
H Wang
J Liaukonyte
K Nogueira
K Taşdemir
L Baraldi
L dos Santos Belo
M Ajmal
P Šaloun
R Achanta
SA Chatzichristofis
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2016
Field of study

Crossref

Ghent University Academic Bibliography

Action Recognition in Videos: from Motion Capture Labs to the Web

Author: Ana Paula Br
Arnaldo Albuquerque De Araújo
De Almeida
Eduardo Alves
Jussara Marques
Publication venue
Publication date: 17/06/2010
Field of study

This paper presents a survey of human action recognition approaches based on visual data recorded from a single video camera. We propose an organizing framework which puts in evidence the evolution of the area, with techniques moving from heavily constrained motion capture scenarios towards more challenging, realistic, "in the wild" videos. The proposed organization is based on the representation used as input for the recognition task, emphasizing the hypothesis assumed and thus, the constraints imposed on the type of video that each technique is able to address. Expliciting the hypothesis and constraints makes the framework particularly useful to select a method, given an application. Another advantage of the proposed organization is that it allows categorizing newest approaches seamlessly with traditional ones, while providing an insightful perspective of the evolution of the action recognition task up to now. That perspective is the basis for the discussion in the end of the paper, where we also present the main open issues in the area.Comment: Preprint submitted to CVIU, survey paper, 46 pages, 2 figures, 4 table

arXiv.org e-Print Archive

CiteSeerX

Content-based Video Retrieval

Author: Petkovic M.
Publication venue: University of Konstanz
Publication date: 01/01/2000
Field of study

no abstract

CiteSeerX

Pure OAI Repository

University of Twente Research Information

Spatio-temporal Video Re-localization by Warp LSTM

Author: Feng Yang
Liu Wei
Luo Jiebo
Ma Lin
Publication venue
Publication date: 09/05/2019
Field of study

The need for efficiently finding the video content a user wants is increasing because of the erupting of user-generated videos on the Web. Existing keyword-based or content-based video retrieval methods usually determine what occurs in a video but not when and where. In this paper, we make an answer to the question of when and where by formulating a new task, namely spatio-temporal video re-localization. Specifically, given a query video and a reference video, spatio-temporal video re-localization aims to localize tubelets in the reference video such that the tubelets semantically correspond to the query. To accurately localize the desired tubelets in the reference video, we propose a novel warp LSTM network, which propagates the spatio-temporal information for a long period and thereby captures the corresponding long-term dependencies. Another issue for spatio-temporal video re-localization is the lack of properly labeled video datasets. Therefore, we reorganize the videos in the AVA dataset to form a new dataset for spatio-temporal video re-localization research. Extensive experimental results show that the proposed model achieves superior performances over the designed baselines on the spatio-temporal video re-localization task

arXiv.org e-Print Archive

Crossref

Unsupervised Object Discovery and Tracking in Video Collections

Author: Cho Minsu
Kwak Suha
Laptev Ivan
Ponce Jean
Schmid Cordelia
Publication venue
Publication date: 14/05/2015
Field of study

This paper addresses the problem of automatically localizing dominant objects as spatio-temporal tubes in a noisy collection of videos with minimal or even no supervision. We formulate the problem as a combination of two complementary processes: discovery and tracking. The first one establishes correspondences between prominent regions across videos, and the second one associates successive similar object regions within the same video. Interestingly, our algorithm also discovers the implicit topology of frames associated with instances of the same object class across different videos, a role normally left to supervisory information in the form of class labels in conventional image and video understanding methods. Indeed, as demonstrated by our experiments, our method can handle video collections featuring multiple object classes, and substantially outperforms the state of the art in colocalization, even though it tackles a broader problem with much less supervision

arXiv.org e-Print Archive

CiteSeerX

Crossref

Hal - Université Grenoble Alpes

INRIA a CCSD electronic archive server

포항공과대학교

Search Tracker: Human-derived object tracking in-the-wild through large-scale search and retrieval

Author: Bency Archith J.
De Leo Carter
Karthikeyan S.
Manjunath B. S.
Sunderrajan Santhoshkumar
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 04/02/2016
Field of study

Humans use context and scene knowledge to easily localize moving objects in conditions of complex illumination changes, scene clutter and occlusions. In this paper, we present a method to leverage human knowledge in the form of annotated video libraries in a novel search and retrieval based setting to track objects in unseen video sequences. For every video sequence, a document that represents motion information is generated. Documents of the unseen video are queried against the library at multiple scales to find videos with similar motion characteristics. This provides us with coarse localization of objects in the unseen video. We further adapt these retrieved object locations to the new video using an efficient warping scheme. The proposed method is validated on in-the-wild video surveillance datasets where we outperform state-of-the-art appearance-based trackers. We also introduce a new challenging dataset with complex object appearance changes.Comment: Under review with the IEEE Transactions on Circuits and Systems for Video Technolog

arXiv.org e-Print Archive

Crossref

eScholarship - University of California