Action recognition based on efficient deep feature learning in the spatio-temporal domain
Authors
Babette Dellen
Syed Farzad Husain
Carme Torras
Publication date
1 January 2016
Publisher
Institute of Electrical and Electronics Engineers (IEEE)
Abstract
© 20xx IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.

Hand-crafted feature functions are usually designed based on the domain knowledge of a presumably controlled environment and often fail to generalize, as the statistics of real-world data cannot always be modeled correctly. Data-driven feature learning methods, on the other hand, have emerged as an alternative that often generalizes better in uncontrolled environments. We present a simple, yet robust, 2D convolutional neural network extended to a concatenated 3D network that learns to extract features from the spatio-temporal domain of raw video data. The resulting network model is used for content-based recognition of videos. Relying on a 2D convolutional neural network allows us to exploit a pretrained network as a descriptor that yielded the best results on the large-scale and challenging ILSVRC-2014 dataset. Experimental results on commonly used benchmark video datasets demonstrate that our results are state-of-the-art in terms of accuracy and computational time without requiring any preprocessing (e.g., optic flow) or a priori knowledge of the data capture (e.g., camera motion estimation), which makes our approach more general and flexible than others. Our implementation is made available.

Peer Reviewed
Postprint (author's final draft)
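This page does not reproduce the authors' code, but the architecture the abstract describes (per-frame features from a pretrained 2D CNN, stacked along the time axis and passed to a small 3D convolutional head) can be sketched as below. This is a minimal illustration, not the paper's implementation: the choice of VGG-16 as the ILSVRC-2014 backbone, the 16-frame clip length, the 3D head dimensions, and the UCF-101 class count are all assumptions, and the pretrained weights are downloaded by torchvision on first use.

```python
# Hypothetical sketch of a 2D-CNN-to-3D-CNN action recognition model,
# loosely following the abstract. Backbone (VGG-16), clip length, and
# head sizes are assumptions, not the authors' settings.
import torch
import torch.nn as nn
from torchvision import models

class SpatioTemporalNet(nn.Module):
    def __init__(self, num_classes: int, num_frames: int = 16):
        super().__init__()
        # Pretrained 2D backbone (ImageNet/ILSVRC); VGG-16 is an assumption.
        vgg = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1)
        self.backbone = vgg.features          # per-frame 2D feature maps
        self.num_frames = num_frames
        # Small 3D head over the time-stacked feature maps.
        self.head3d = nn.Sequential(
            nn.Conv3d(512, 256, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool3d(1),
        )
        self.classifier = nn.Linear(256, num_classes)

    def forward(self, clip: torch.Tensor) -> torch.Tensor:
        # clip: (batch, time, 3, H, W) raw video frames
        b, t, c, h, w = clip.shape
        feats = self.backbone(clip.reshape(b * t, c, h, w))  # 2D features per frame
        feats = feats.reshape(b, t, *feats.shape[1:])        # restore the time axis
        feats = feats.permute(0, 2, 1, 3, 4)                 # (b, C, T, H', W') for Conv3d
        out = self.head3d(feats).flatten(1)
        return self.classifier(out)

model = SpatioTemporalNet(num_classes=101)             # e.g., UCF-101 (assumed)
logits = model(torch.randn(2, 16, 3, 224, 224))        # two dummy 16-frame clips
print(logits.shape)                                    # torch.Size([2, 101])
```

Reusing the 2D backbone per frame, as in this sketch, is what lets a pretrained image classifier be exploited for video: only the 3D head must be trained from scratch on the spatio-temporal feature volume.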
Available Versions
UPCommons
oai:upcommons.upc.edu:2117/103...
Last updated on 17/04/2020
UPCommons. Portal del coneixement obert de la UPC
oai:upcommons.upc.edu:2117/103...
Last updated on 01/05/2017