Search CORE

63,747 research outputs found

A robust and efficient video representation for action recognition

Author: Oneata Dan
Schmid Cordelia
Verbeek Jakob
Wang Heng
Publication venue
Publication date: 21/04/2015
Field of study

This paper introduces a state-of-the-art video representation and applies it to efficient action recognition and detection. We first propose to improve the popular dense trajectory features by explicit camera motion estimation. More specifically, we extract feature point matches between frames using SURF descriptors and dense optical flow. The matches are used to estimate a homography with RANSAC. To improve the robustness of homography estimation, a human detector is employed to remove outlier matches from the human body as human motion is not constrained by the camera. Trajectories consistent with the homography are considered as due to camera motion, and thus removed. We also use the homography to cancel out camera motion from the optical flow. This results in significant improvement on motion-based HOF and MBH descriptors. We further explore the recent Fisher vector as an alternative feature encoding approach to the standard bag-of-words histogram, and consider different ways to include spatial layout information in these encodings. We present a large and varied set of evaluations, considering (i) classification of short basic actions on six datasets, (ii) localization of such actions in feature-length movies, and (iii) large-scale recognition of complex events. We find that our improved trajectory features significantly outperform previous dense trajectories, and that Fisher vectors are superior to bag-of-words encodings for video recognition tasks. In all three tasks, we show substantial improvements over the state-of-the-art results

arXiv.org e-Print Archive

Crossref

Hal - Université Grenoble Alpes

INRIA a CCSD electronic archive server

Discriminatively Trained Latent Ordinal Model for Video Classification

Author: Sharma Gaurav
Sikka Karan
Publication venue
Publication date: 14/08/2017
Field of study

We study the problem of video classification for facial analysis and human action recognition. We propose a novel weakly supervised learning method that models the video as a sequence of automatically mined, discriminative sub-events (eg. onset and offset phase for "smile", running and jumping for "highjump"). The proposed model is inspired by the recent works on Multiple Instance Learning and latent SVM/HCRF -- it extends such frameworks to model the ordinal aspect in the videos, approximately. We obtain consistent improvements over relevant competitive baselines on four challenging and publicly available video based facial analysis datasets for prediction of expression, clinical pain and intent in dyadic conversations and on three challenging human action datasets. We also validate the method with qualitative results and show that they largely support the intuitions behind the method.Comment: Paper accepted in IEEE TPAMI. arXiv admin note: substantial text overlap with arXiv:1604.0150

arXiv.org e-Print Archive

MPG.PuRe

Action recognition based on efficient deep feature learning in the spatio-temporal domain

Author: Dellen Babette
Husain Syed Farzad
Torras Carme
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2016
Field of study

© 20xx IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.Hand-crafted feature functions are usually designed based on the domain knowledge of a presumably controlled environment and often fail to generalize, as the statistics of real-world data cannot always be modeled correctly. Data-driven feature learning methods, on the other hand, have emerged as an alternative that often generalize better in uncontrolled environments. We present a simple, yet robust, 2D convolutional neural network extended to a concatenated 3D network that learns to extract features from the spatio-temporal domain of raw video data. The resulting network model is used for content-based recognition of videos. Relying on a 2D convolutional neural network allows us to exploit a pretrained network as a descriptor that yielded the best results on the largest and challenging ILSVRC-2014 dataset. Experimental results on commonly used benchmarking video datasets demonstrate that our results are state-of-the-art in terms of accuracy and computational time without requiring any preprocessing (e.g., optic flow) or a priori knowledge on data capture (e.g., camera motion estimation), which makes it more general and flexible than other approaches. Our implementation is made available.Peer ReviewedPostprint (author's final draft

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

Multi-Action Recognition via Stochastic Modelling of Optical Flow and Gradients

Author: Carvajal Johanna
Lovell Brian C.
McCool Chris
Sanderson Conrad
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2014
Field of study

In this paper we propose a novel approach to multi-action recognition that performs joint segmentation and classification. This approach models each action using a Gaussian mixture using robust low-dimensional action features. Segmentation is achieved by performing classification on overlapping temporal windows, which are then merged to produce the final result. This approach is considerably less complicated than previous methods which use dynamic programming or computationally expensive hidden Markov models (HMMs). Initial experiments on a stitched version of the KTH dataset show that the proposed approach achieves an accuracy of 78.3%, outperforming a recent HMM-based approach which obtained 71.2%

arXiv.org e-Print Archive

CiteSeerX

Crossref

Queensland University of Technology ePrints Archive

University of Queensland eSpace

Towards Vision-Based Smart Hospitals: A System for Tracking and Monitoring Hand Hygiene Compliance

Author: Alahi Alexandre
Beninati William
Downing Lance
Fei-Fei Li
Guo Michelle
Haque Albert
Jopling Jeffrey
Luo Zelun
Milstein Arnold
Platchek Terry
Rege Alisha
Singh Amit
Yeung Serena
Publication venue
Publication date: 18/08/2017
Field of study

One in twenty-five patients admitted to a hospital will suffer from a hospital acquired infection. If we can intelligently track healthcare staff, patients, and visitors, we can better understand the sources of such infections. We envision a smart hospital capable of increasing operational efficiency and improving patient care with less spending. In this paper, we propose a non-intrusive vision-based system for tracking people's activity in hospitals. We evaluate our method for the problem of measuring hand hygiene compliance. Empirically, our method outperforms existing solutions such as proximity-based techniques and covert in-person observational studies. We present intuitive, qualitative results that analyze human movement patterns and conduct spatial analytics which convey our method's interpretability. This work is a step towards a computer-vision based smart hospital and demonstrates promising results for reducing hospital acquired infections.Comment: Machine Learning for Healthcare Conference (MLHC

arXiv.org e-Print Archive

Infoscience - École polytechnique fédérale de Lausanne

Going Deeper into Action Recognition: A Survey

Author: Harandi Mehrtash
Herath Samitha
Porikli Fatih
Publication venue
Publication date: 01/01/2017
Field of study

Understanding human actions in visual data is tied to advances in complementary research areas including object recognition, human dynamics, domain adaptation and semantic segmentation. Over the last decade, human action analysis evolved from earlier schemes that are often limited to controlled environments to nowadays advanced solutions that can learn from millions of videos and apply to almost all daily activities. Given the broad range of applications from video surveillance to human-computer interaction, scientific milestones in action recognition are achieved more rapidly, eventually leading to the demise of what used to be good in a short time. This motivated us to provide a comprehensive review of the notable steps taken towards recognizing human actions. To this end, we start our discussion with the pioneering methods that use handcrafted representations, and then, navigate into the realm of deep learning based approaches. We aim to remain objective throughout this survey, touching upon encouraging improvements as well as inevitable fallbacks, in the hope of raising fresh questions and motivating new research directions for the reader

arXiv.org e-Print Archive

The Australian National University

Temporal Extension of Scale Pyramid and Spatial Pyramid Matching for Action Recognition

Author: Hauptmann Alexandar G.
Lan Zhenzhong
Li Xuanchong
Publication venue
Publication date: 29/08/2014
Field of study

Historically, researchers in the field have spent a great deal of effort to create image representations that have scale invariance and retain spatial location information. This paper proposes to encode equivalent temporal characteristics in video representations for action recognition. To achieve temporal scale invariance, we develop a method called temporal scale pyramid (TSP). To encode temporal information, we present and compare two methods called temporal extension descriptor (TED) and temporal division pyramid (TDP) . Our purpose is to suggest solutions for matching complex actions that have large variation in velocity and appearance, which is missing from most current action representations. The experimental results on four benchmark datasets, UCF50, HMDB51, Hollywood2 and Olympic Sports, support our approach and significantly outperform state-of-the-art methods. Most noticeably, we achieve 65.0% mean accuracy and 68.2% mean average precision on the challenging HMDB51 and Hollywood2 datasets which constitutes an absolute improvement over the state-of-the-art by 7.8% and 3.9%, respectively

arXiv.org e-Print Archive

CiteSeerX

Nonlinear brain dynamics as macroscopic manifestation of underlying many-body field dynamics

Author: Freeman Walter J.
Vitiello Giuseppe
Publication venue: 'Elsevier BV'
Publication date: 22/11/2005
Field of study

Neural activity patterns related to behavior occur at many scales in time and space from the atomic and molecular to the whole brain. Here we explore the feasibility of interpreting neurophysiological data in the context of many-body physics by using tools that physicists have devised to analyze comparable hierarchies in other fields of science. We focus on a mesoscopic level that offers a multi-step pathway between the microscopic functions of neurons and the macroscopic functions of brain systems revealed by hemodynamic imaging. We use electroencephalographic (EEG) records collected from high-density electrode arrays fixed on the epidural surfaces of primary sensory and limbic areas in rabbits and cats trained to discriminate conditioned stimuli (CS) in the various modalities. High temporal resolution of EEG signals with the Hilbert transform gives evidence for diverse intermittent spatial patterns of amplitude (AM) and phase modulations (PM) of carrier waves that repeatedly re-synchronize in the beta and gamma ranges at near zero time lags over long distances. The dominant mechanism for neural interactions by axodendritic synaptic transmission should impose distance-dependent delays on the EEG oscillations owing to finite propagation velocities. It does not. EEGs instead show evidence for anomalous dispersion: the existence in neural populations of a low velocity range of information and energy transfers, and a high velocity range of the spread of phase transitions. This distinction labels the phenomenon but does not explain it. In this report we explore the analysis of these phenomena using concepts of energy dissipation, the maintenance by cortex of multiple ground states corresponding to AM patterns, and the exclusive selection by spontaneous breakdown of symmetry (SBS) of single states in sequences.Comment: 31 page

arXiv.org e-Print Archive

Crossref

eScholarship - University of California

Archivio della Ricerca - Università di Salerno