Advantages of dynamic analysis in HOG-PCA feature space for video moving object classification
Classification of moving objects for video surveillance applications remains a challenging problem due to inherently changing video conditions such as lighting and resolution. This paper proposes a new approach to vehicle/pedestrian classification based on a static kNN classifier, a dynamic Hidden Markov Model (HMM)-based classifier, and a fusion rule that combines their outputs. The main novelty is the study of the dynamic aspects of moving objects by analysing the trajectories that the features follow in the HOG-PCA feature space, instead of the classical trajectory study based on frame coordinates. The complete hybrid system was tested on the VIRAT database and worked in real time, yielding up to 100% peak accuracy on the tested video sequences.
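The feature-space-trajectory idea above can be sketched roughly as follows. This is an illustrative toy, not the paper's implementation: a minimal gradient-orientation histogram stands in for HOG, an SVD-based projection stands in for PCA, and the helper names `mini_hog` and `pca_project` are hypothetical.

```python
import numpy as np

def mini_hog(frame, n_bins=9):
    # crude stand-in for HOG: one global unsigned-orientation histogram
    gy, gx = np.gradient(frame.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), np.pi)        # orientations in [0, pi)
    bins = np.minimum((ang / np.pi * n_bins).astype(int), n_bins - 1)
    hist = np.bincount(bins.ravel(), weights=mag.ravel(), minlength=n_bins)
    return hist / (hist.sum() + 1e-9)

def pca_project(X, k=2):
    # PCA via SVD of the centred data matrix
    Xc = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ vt[:k].T                           # one k-dim point per frame

# Toy usage: 10 random grayscale "frames" of one tracked object; the
# resulting sequence of low-dimensional points is the feature-space
# trajectory that the dynamic (HMM) stage would classify.
rng = np.random.default_rng(0)
frames = [rng.random((32, 32)) for _ in range(10)]
traj = pca_project(np.array([mini_hog(f) for f in frames]), k=2)
print(traj.shape)  # (10, 2)
```

In the paper's setting the PCA basis would be learned from training data and the per-frame points fed to the HMM; here the projection is fit on the toy sequence itself purely for illustration.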
Unsupervised Understanding of Location and Illumination Changes in Egocentric Videos
Wearable cameras stand out as one of the most promising devices for the
upcoming years, and as a consequence, the demand for computer algorithms to
automatically understand the videos recorded with them is increasing quickly.
Automatic understanding of these videos is not an easy task, and their mobile
nature poses important challenges, such as changing light conditions and the
unrestricted locations recorded. This paper proposes an unsupervised strategy
based on global features and manifold learning to endow wearable cameras with
contextual information regarding the light conditions and the locations
captured. Results show that non-linear manifold methods can capture contextual
patterns from global features without demanding large computational resources.
As an application case, the proposed strategy is used as a switching mechanism
to improve hand detection in egocentric videos.
Comment: Submitted for publication
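The switching mechanism described above can be sketched as follows, with heavy simplifications: a global intensity histogram stands in for the paper's global features, and a tiny deterministic 2-means replaces the non-linear manifold learning. All function names are illustrative.

```python
import numpy as np

def global_feature(frame, n_bins=8):
    # global intensity histogram as a cheap illumination descriptor
    h, _ = np.histogram(frame, bins=n_bins, range=(0.0, 1.0))
    return h / h.sum()

def kmeans2(X, iters=20):
    # deterministic 2-means: seed the centres with the first and last samples
    centers = X[[0, -1]].astype(float)
    for _ in range(iters):
        labels = np.argmin(((X[:, None] - centers) ** 2).sum(-1), axis=1)
        centers = np.array([X[labels == j].mean(0) for j in range(2)])
    return labels

# Toy usage: five dim frames and five bright frames; the unsupervised
# context label could then select a per-context hand-detection model.
rng = np.random.default_rng(1)
dark = [rng.random((16, 16)) * 0.3 for _ in range(5)]
bright = [0.7 + rng.random((16, 16)) * 0.3 for _ in range(5)]
X = np.array([global_feature(f) for f in dark + bright])
labels = kmeans2(X)
print(labels)
```

The two illumination contexts separate cleanly in this toy; the paper's contribution is doing this unsupervised, at low cost, with manifold methods rather than a fixed clustering.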
Assessing the Performance of Handcrafted Features for Human Action Recognition
Recognition of human actions such as running, punching, bending, and kicking plays a vital role in applications like intelligent video surveillance, health-care monitoring, robotics, smart automation systems, and computer gaming. The field relies on approaches based on handcrafted features such as PCA, HOG, LBPH, DWT, STIP, SWF, and SWFHOG, and on deep learning techniques such as CNNs, RNNs, and their variants. Although many approaches have been proposed and implemented, the literature survey suggests that a detailed understanding of these approaches and a comparison of their advantages and limitations is required to develop more accurate action recognition methods. This paper addresses this issue and gives a detailed analysis of the results obtained by implementing the algorithms on standardized open-source datasets of varying complexity, namely Weizmann, KTH, UT-Interaction, and UCF Sports. The results are compared on classification accuracy, as it is one of the key performance measures for checking the reliability of a method. The comparison shows that the SWFHOG feature gives the best classification accuracy among the handcrafted features and also outperforms a simple CNN.
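As a concrete instance of one of the handcrafted features surveyed above, a minimal (non-rotation-invariant) LBP histogram can be computed as follows. This is a generic textbook sketch, not the code evaluated in the paper.

```python
import numpy as np

def lbp_histogram(img):
    # basic 8-neighbour LBP code per interior pixel, pooled into a
    # 256-bin normalised histogram that could feed any of the classifiers
    c = img[1:-1, 1:-1]
    offs = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
            (1, 1), (1, 0), (1, -1), (0, -1)]   # clockwise from top-left
    code = np.zeros_like(c, dtype=np.uint8)
    for bit, (dy, dx) in enumerate(offs):
        nb = img[1 + dy:img.shape[0] - 1 + dy, 1 + dx:img.shape[1] - 1 + dx]
        code |= (nb >= c).astype(np.uint8) << bit
    hist = np.bincount(code.ravel(), minlength=256)
    return hist / hist.sum()

rng = np.random.default_rng(0)
h = lbp_histogram(rng.integers(0, 256, (32, 32), dtype=np.uint8))
print(h.shape, round(float(h.sum()), 6))  # (256,) 1.0
```

In practice LBP histograms are computed per cell and concatenated; the single global histogram here just keeps the sketch short.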
A robust and efficient video representation for action recognition
This paper introduces a state-of-the-art video representation and applies it
to efficient action recognition and detection. We first propose to improve the
popular dense trajectory features by explicit camera motion estimation. More
specifically, we extract feature point matches between frames using SURF
descriptors and dense optical flow. The matches are used to estimate a
homography with RANSAC. To improve the robustness of homography estimation, a
human detector is employed to remove outlier matches from the human body as
human motion is not constrained by the camera. Trajectories consistent with the
homography are considered to be due to camera motion and are thus removed. We also
use the homography to cancel out camera motion from the optical flow. This
results in significant improvement on motion-based HOF and MBH descriptors. We
further explore the recent Fisher vector as an alternative feature encoding
approach to the standard bag-of-words histogram, and consider different ways to
include spatial layout information in these encodings. We present a large and
varied set of evaluations, considering (i) classification of short basic
actions on six datasets, (ii) localization of such actions in feature-length
movies, and (iii) large-scale recognition of complex events. We find that our
improved trajectory features significantly outperform previous dense
trajectories, and that Fisher vectors are superior to bag-of-words encodings
for video recognition tasks. In all three tasks, we show substantial
improvements over the state of the art.
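The camera-motion step described above can be sketched as follows. The matches here are synthetic (the paper obtains them from SURF descriptors and dense optical flow), and the unnormalized DLT and the small RANSAC loop are illustrative helpers, not the paper's code.

```python
import numpy as np

def fit_homography(p, q):
    # Direct Linear Transform on >= 4 correspondences p -> q
    rows = []
    for (x, y), (u, v) in zip(p, q):
        rows.append([x, y, 1, 0, 0, 0, -u * x, -u * y, -u])
        rows.append([0, 0, 0, x, y, 1, -v * x, -v * y, -v])
    _, _, vt = np.linalg.svd(np.asarray(rows))
    return vt[-1].reshape(3, 3)          # null vector of the DLT system

def project(H, p):
    ph = np.c_[p, np.ones(len(p))] @ H.T
    return ph[:, :2] / ph[:, 2:3]        # dehomogenise

def ransac_homography(p, q, iters=200, thresh=2.0, seed=0):
    rng = np.random.default_rng(seed)
    best = np.zeros(len(p), dtype=bool)
    for _ in range(iters):
        idx = rng.choice(len(p), 4, replace=False)   # minimal sample
        H = fit_homography(p[idx], q[idx])
        err = np.linalg.norm(project(H, p) - q, axis=1)
        inl = err < thresh
        if inl.sum() > best.sum():
            best = inl
    return fit_homography(p[best], q[best]), best    # refit on inliers

# Synthetic matches: a pure camera translation, plus five gross outliers
# playing the role of independently moving humans.
rng = np.random.default_rng(1)
p = rng.uniform(0, 100, (30, 2))
q = p + np.array([5.0, -3.0])
q[:5] += rng.uniform(40, 80, (5, 2))
H, inliers = ransac_homography(p, q)
print(inliers.sum())  # 25
```

The surviving inlier flow is what the camera-motion model explains; subtracting it from the dense optical flow is what improves the motion-based HOF and MBH descriptors.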
Dynamic texture recognition using time-causal and time-recursive spatio-temporal receptive fields
This work presents a first evaluation of using spatio-temporal receptive
fields from a recently proposed time-causal spatio-temporal scale-space
framework as primitives for video analysis. We propose a new family of video
descriptors based on regional statistics of spatio-temporal receptive field
responses and evaluate this approach on the problem of dynamic texture
recognition. Our approach generalises a previously used method, based on joint
histograms of receptive field responses, from the spatial to the
spatio-temporal domain and from object recognition to dynamic texture
recognition. The time-recursive formulation enables computationally efficient
time-causal recognition. The experimental evaluation demonstrates competitive
performance compared to the state of the art. In particular, it is shown that binary
versions of our dynamic texture descriptors achieve improved performance
compared to a large range of similar methods using different primitives either
handcrafted or learned from data. Further, our qualitative and quantitative
investigation into parameter choices and the use of different sets of receptive
fields highlights the robustness and flexibility of our approach. Together,
these results support the descriptive power of this family of time-causal
spatio-temporal receptive fields, validate our approach for dynamic texture
recognition and point towards the possibility of designing a range of video
analysis methods based on these new time-causal spatio-temporal primitives.
Comment: 29 pages, 16 figures
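The descriptor idea above can be sketched, with heavy simplification, as follows: a first-order time-recursive integrator stands in for the time-causal temporal smoothing, and a joint histogram pools three derivative responses. This is not the paper's receptive-field family, only an illustration of its two ingredients (time-recursive filtering and joint histograms).

```python
import numpy as np

def temporal_smooth(frames, mu=2.0):
    # time-recursive first-order integrator: each output depends only on
    # past and present frames, so the computation is time-causal
    out, L = [], np.zeros_like(frames[0], dtype=float)
    for f in frames:
        L = L + (f - L) / (1.0 + mu)
        out.append(L.copy())
    return out

def joint_histogram(frames, bins=4):
    smoothed = temporal_smooth(frames)
    resp = []
    for prev, cur in zip(smoothed, smoothed[1:]):
        lx = np.gradient(cur, axis=1)    # spatial derivative responses
        ly = np.gradient(cur, axis=0)
        lt = cur - prev                  # causal temporal derivative
        resp.append(np.stack([lx, ly, lt], axis=-1).reshape(-1, 3))
    R = np.concatenate(resp)
    # joint (multi-dimensional) histogram over the three responses
    H, _ = np.histogramdd(R, bins=bins)
    return H / H.sum()

rng = np.random.default_rng(0)
frames = [rng.random((16, 16)) for _ in range(8)]
H = joint_histogram(frames)
print(H.shape, round(float(H.sum()), 6))  # (4, 4, 4) 1.0
```

A real descriptor in this family would use a bank of receptive fields at several spatio-temporal scales; the single scale and three responses here only show the shape of the computation.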