3,389 research outputs found
Multi-View Region Adaptive Multi-temporal DMM and RGB Action Recognition
Human action recognition remains an important yet challenging task. This work
proposes a novel action recognition system. It uses a novel Multiple View
Region Adaptive Multi-resolution in time Depth Motion Map (MV-RAMDMM)
formulation combined with appearance information. Multiple stream 3D
Convolutional Neural Networks (CNNs) are trained on the different views and
time resolutions of the region adaptive Depth Motion Maps. Multiple views are
synthesised to enhance the view invariance. The region adaptive weights, based
on localised motion, accentuate and differentiate parts of actions possessing
faster motion. Dedicated 3D CNN streams for multi-time resolution appearance
information (RGB) are also included. These help to identify and differentiate
between small object interactions. A pre-trained 3D-CNN is used here with
fine-tuning for each stream along with multiple class Support Vector Machines
(SVM)s. Average score fusion is used on the output. The developed approach is
capable of recognising both human action and human-object interaction. Three
public domain datasets including: MSR 3D Action,Northwestern UCLA multi-view
actions and MSR 3D daily activity are used to evaluate the proposed solution.
The experimental results demonstrate the robustness of this approach compared
with state-of-the-art algorithms.Comment: 14 pages, 6 figures, 13 tables. Submitte
Action Recognition in Videos: from Motion Capture Labs to the Web
This paper presents a survey of human action recognition approaches based on
visual data recorded from a single video camera. We propose an organizing
framework which puts in evidence the evolution of the area, with techniques
moving from heavily constrained motion capture scenarios towards more
challenging, realistic, "in the wild" videos. The proposed organization is
based on the representation used as input for the recognition task, emphasizing
the hypothesis assumed and thus, the constraints imposed on the type of video
that each technique is able to address. Expliciting the hypothesis and
constraints makes the framework particularly useful to select a method, given
an application. Another advantage of the proposed organization is that it
allows categorizing newest approaches seamlessly with traditional ones, while
providing an insightful perspective of the evolution of the action recognition
task up to now. That perspective is the basis for the discussion in the end of
the paper, where we also present the main open issues in the area.Comment: Preprint submitted to CVIU, survey paper, 46 pages, 2 figures, 4
table
STV-based Video Feature Processing for Action Recognition
In comparison to still image-based processes, video features can provide rich and intuitive information about dynamic events occurred over a period of time, such as human actions, crowd behaviours, and other subject pattern changes. Although substantial progresses have been made in the last decade on image processing and seen its successful applications in face matching and object recognition, video-based event detection still remains one of the most difficult challenges in computer vision research due to its complex continuous or discrete input signals, arbitrary dynamic feature definitions, and the often ambiguous analytical methods. In this paper, a Spatio-Temporal Volume (STV) and region intersection (RI) based 3D shape-matching method has been proposed to facilitate the definition and recognition of human actions recorded in videos. The distinctive characteristics and the performance gain of the devised approach stemmed from a coefficient factor-boosted 3D region intersection and matching mechanism developed in this research. This paper also reported the investigation into techniques for efficient STV data filtering to reduce the amount of voxels (volumetric-pixels) that need to be processed in each operational cycle in the implemented system. The encouraging features and improvements on the operational performance registered in the experiments have been discussed at the end
Discriminatively Trained Latent Ordinal Model for Video Classification
We study the problem of video classification for facial analysis and human
action recognition. We propose a novel weakly supervised learning method that
models the video as a sequence of automatically mined, discriminative
sub-events (eg. onset and offset phase for "smile", running and jumping for
"highjump"). The proposed model is inspired by the recent works on Multiple
Instance Learning and latent SVM/HCRF -- it extends such frameworks to model
the ordinal aspect in the videos, approximately. We obtain consistent
improvements over relevant competitive baselines on four challenging and
publicly available video based facial analysis datasets for prediction of
expression, clinical pain and intent in dyadic conversations and on three
challenging human action datasets. We also validate the method with qualitative
results and show that they largely support the intuitions behind the method.Comment: Paper accepted in IEEE TPAMI. arXiv admin note: substantial text
overlap with arXiv:1604.0150
Vision based behavior recognition of laboratory animals for drug analysis and testing
Ankara : The Department of Electrical and Electronics Engineering and the Institute of Engineering and Sciences of Bilkent University, 2009.Thesis (Master's) -- Bilkent University, 2009.Includes bibliographical references leaves 92-101.In pharmacological experiments, a popular method to discover the effects of psychotherapeutic
drugs is to monitor behaviors of laboratory mice subjected to
drugs by vision sensors. Such surveillance operations are currently performed by
human observers for practical reasons. Automating behavior analysis of laboratory
mice by vision-based methods saves both time and human labor. In this
study, we focus on automated action recognition of laboratory mice from short
video clips in which only one action is performed. A two-stage hierarchical recognition
method is designed to address the problem. In the first stage, still actions
such as sleeping are separated from other action classes based on the amount
of the motion area. Remaining action classes are discriminated by the second
stage for which we propose four alternative methods. In the first method, we
project 3D action volume onto 2D images by encoding temporal variations of each
pixel using discrete wavelet transform (DWT). Resulting images are modeled and
classified by hidden Markov models in maximum likelihood sense. The second
method transforms action recognition problem into a sequence matching problem
by explicitly describing pose of the subject in each frame. Instead of segmenting
the subject from the background, we only take temporally active portions of the
subject into consideration in pose description. Histograms of oriented gradients
are employed to describe poses in frames. In the third method, actions are represented
by a set of histograms of normalized spatio-temporal gradients computed
from entire action volume at different temporal resolutions. The last method
assumes that actions are collections of known spatio-temporal templates and can
be described by histograms of those. To locate and describe such templates in actions,
multi-scale 3D Harris corner detector and histogram of oriented gradients
and optical flow vectors are employed, respectively. We test the proposed action
recognition framework on a publicly available mice action dataset. In addition,
we provide comparisons of each method with well-known studies in the literature.
We find that the second and the fourth methods outperform both related studies
and the other two methods in our framework in overall recognition rates. However,
the more successful methods suffer from heavy computational cost. This
study shows that representing actions as an ordered sequence of pose descriptors
is quite effective in action recognition. In addition, success of the fourth method
reveals that sparse spatio-temporal templates characterize the content of actions
quite well.Sandıkcı, SelçukM.S
Going Deeper into Action Recognition: A Survey
Understanding human actions in visual data is tied to advances in
complementary research areas including object recognition, human dynamics,
domain adaptation and semantic segmentation. Over the last decade, human action
analysis evolved from earlier schemes that are often limited to controlled
environments to nowadays advanced solutions that can learn from millions of
videos and apply to almost all daily activities. Given the broad range of
applications from video surveillance to human-computer interaction, scientific
milestones in action recognition are achieved more rapidly, eventually leading
to the demise of what used to be good in a short time. This motivated us to
provide a comprehensive review of the notable steps taken towards recognizing
human actions. To this end, we start our discussion with the pioneering methods
that use handcrafted representations, and then, navigate into the realm of deep
learning based approaches. We aim to remain objective throughout this survey,
touching upon encouraging improvements as well as inevitable fallbacks, in the
hope of raising fresh questions and motivating new research directions for the
reader
- …