Manifold Regularized Slow Feature Analysis for Dynamic Texture Recognition
Dynamic textures exist in various forms, e.g., fire, smoke, and traffic jams,
but recognizing dynamic texture is challenging due to the complex temporal
variations. In this paper, we present a novel approach stemming from slow
feature analysis (SFA) for dynamic texture recognition. SFA extracts slowly
varying features from fast varying signals. Fortunately, SFA is capable of
extracting invariant representations from dynamic textures. However, complex
temporal variations require high-level semantic representations to fully
achieve temporal slowness, and thus it is impractical to learn a high-level
representation from dynamic textures directly by SFA. In order to learn a
robust low-level feature to resolve the complexity of dynamic textures, we
propose manifold regularized SFA (MR-SFA) by exploring the neighbor
relationship of the initial state of each temporal transition and retaining the
locality of their variations. Therefore, the learned features are not only
slowly varying, but also partly predictable. MR-SFA for dynamic texture
recognition is proposed in the following steps: 1) learning feature extraction
functions as convolution filters by MR-SFA, 2) extracting local features by
convolution and pooling, and 3) employing Fisher vectors to form a video-level
representation for classification. Experimental results on dynamic texture and
dynamic scene recognition datasets validate the effectiveness of the proposed
approach.
Comment: 12 pages
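As a rough illustration of the SFA step this abstract builds on, the sketch below implements plain linear SFA with NumPy: whiten the signal, then keep the directions in which the temporal differences have the smallest variance. It does not include the manifold regularizer, the convolution filters, or the Fisher vector stage of MR-SFA; the function name `linear_sfa` and the toy signal are assumptions made for this example.

```python
import numpy as np


def linear_sfa(x, n_components):
    """Plain linear SFA (not MR-SFA).

    x has shape (T, D); rows are consecutive time steps.
    Returns a projection matrix w of shape (D, n_components) and the
    slowness value (mean squared temporal difference) of each feature.
    """
    x = x - x.mean(axis=0)

    # Whitening: rotate and rescale so the covariance becomes the identity.
    cov = x.T @ x / (len(x) - 1)
    evals, evecs = np.linalg.eigh(cov)
    keep = evals > 1e-10 * evals.max()  # drop numerically empty directions
    whiten = evecs[:, keep] / np.sqrt(evals[keep])
    z = x @ whiten

    # Covariance of the temporal differences of the whitened signal.
    dz = np.diff(z, axis=0)
    dcov = dz.T @ dz / len(dz)

    # eigh returns eigenvalues in ascending order, so the slowest
    # directions come first.
    d_evals, d_evecs = np.linalg.eigh(dcov)
    w = whiten @ d_evecs[:, :n_components]
    return w, d_evals[:n_components]


# Toy usage: a slow and a fast sinusoid, linearly mixed.
t = np.linspace(0.0, 2.0 * np.pi, 2000)
sources = np.column_stack([np.sin(t), np.sin(40.0 * t)])
mixing = np.array([[1.0, 0.5], [0.3, 2.0]])  # hypothetical mixing matrix
x = sources @ mixing.T

w, slowness = linear_sfa(x, n_components=2)
features = (x - x.mean(axis=0)) @ w
print("slowness per feature:", slowness)
```

In this toy setting the first extracted feature should track the slow sinusoid up to sign and scale, since the two sources are nearly uncorrelated.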
Low Rank Representation on Grassmann Manifolds: An Extrinsic Perspective
Many computer vision algorithms employ subspace models to represent data.
Low-rank representation (LRR) has been successfully applied in subspace
clustering, in which data are clustered according to their subspace structures.
The possibility of extending LRR to Grassmann manifolds is explored in this
paper. Rather than directly embedding the Grassmann manifold into a symmetric
matrix space, an extrinsic view is taken by building the self-representation of
LRR over the tangent space of each Grassmannian point. A new algorithm for
solving the proposed Grassmannian LRR model is designed and implemented.
Several clustering experiments are conducted on a handwritten digit dataset,
dynamic texture video clips and YouTube celebrity face video data. The
experimental results show our method outperforms a number of existing methods.
Comment: 9 pages
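To make the self-representation idea behind LRR concrete, here is a minimal sketch of the ordinary Euclidean, noise-free case only. It is not the Grassmannian model of this paper. For clean data X, the minimizer of the nuclear norm of Z subject to X = XZ is Z = V V^T, where V holds the right singular vectors of X, and for independent subspaces this matrix is block diagonal. The function name `noiseless_lrr` and the synthetic data are assumptions made for this example.

```python
import numpy as np


def noiseless_lrr(x, tol=1e-10):
    """Closed-form LRR for clean Euclidean data (not the Grassmann model).

    x has shape (D, N); columns are data points.
    Returns Z = V_r V_r^T built from the skinny SVD of x.
    """
    _, s, vt = np.linalg.svd(x, full_matrices=False)
    rank = int(np.sum(s > tol * s[0]))  # numerical rank of the data
    v = vt[:rank].T
    return v @ v.T


# Synthetic data: 10 points from each of two random 2-D subspaces of R^6.
rng = np.random.default_rng(0)
basis_a = rng.standard_normal((6, 2))
basis_b = rng.standard_normal((6, 2))
x = np.hstack([
    basis_a @ rng.standard_normal((2, 10)),
    basis_b @ rng.standard_normal((2, 10)),
])

z = noiseless_lrr(x)
print("self-representation error:", np.abs(x @ z - x).max())
print("largest cross-subspace coefficient:", np.abs(z[:10, 10:]).max())
```

The near-zero cross-subspace block is what lets a spectral clustering step, applied to |Z| + |Z^T|, separate the two groups.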
Covariance of Motion and Appearance Features for Spatio Temporal Recognition Tasks
In this paper, we introduce an end-to-end framework for video analysis
focused towards practical scenarios built on theoretical foundations from
sparse representation, including a novel descriptor for general purpose video
analysis. In our approach, we compute kinematic features from optical flow and
first and second-order derivatives of intensities to represent motion and
appearance respectively. These features are then used to construct covariance
matrices which capture joint statistics of both low-level motion and appearance
features extracted from a video. Using an over-complete dictionary of the
covariance based descriptors built from labeled training samples, we formulate
low-level event recognition as a sparse linear approximation problem. Within
this, we pose the sparse decomposition of a covariance matrix, which also
conforms to the space of positive semidefinite matrices, as a determinant
maximization problem. Also, since covariance matrices lie on non-linear
Riemannian manifolds, we compare our former approach with a sparse linear
approximation alternative that is suitable for equivalent vector spaces of
covariance matrices. This is done by searching for the best projection of the
query data on a dictionary using an Orthogonal Matching Pursuit algorithm. We
show the applicability of our video descriptor in two different application
domains - namely low-level event recognition in unconstrained scenarios and
gesture recognition using one shot learning. Our experiments provide promising
insights into large-scale video analysis.
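The descriptor construction can be sketched as follows, under stated assumptions. Only the appearance part (intensity plus first and second derivatives) is computed; the kinematic features from optical flow are omitted. The matrix-logarithm vectorization is one common way to move covariance matrices into a vector space and is used here as an assumption, since the abstract does not name the mapping. All function names are hypothetical.

```python
import numpy as np


def appearance_features(frame):
    """Per-pixel intensity and first/second derivatives, shape (H*W, 5)."""
    gy, gx = np.gradient(frame)   # derivatives along rows and columns
    gyy, _ = np.gradient(gy)      # second derivative along rows
    _, gxx = np.gradient(gx)      # second derivative along columns
    stacked = np.stack([frame, gx, gy, gxx, gyy], axis=-1)
    return stacked.reshape(-1, 5)


def covariance_descriptor(features, eps=1e-6):
    """Covariance of the feature vectors, with a small ridge so that the
    result is strictly positive definite."""
    centered = features - features.mean(axis=0)
    cov = centered.T @ centered / (len(features) - 1)
    return cov + eps * np.eye(cov.shape[0])


def log_euclidean_vector(cov):
    """Matrix logarithm followed by upper-triangle vectorization.

    Off-diagonal entries are scaled by sqrt(2) so that the Euclidean norm
    of the vector equals the Frobenius norm of log(cov).
    """
    evals, evecs = np.linalg.eigh(cov)
    log_cov = (evecs * np.log(evals)) @ evecs.T
    rows, cols = np.triu_indices(cov.shape[0])
    scale = np.where(rows == cols, 1.0, np.sqrt(2.0))
    return scale * log_cov[rows, cols]


# Toy usage on a random "frame" standing in for real video data.
rng = np.random.default_rng(0)
frame = rng.random((32, 32))
cov = covariance_descriptor(appearance_features(frame))
vec = log_euclidean_vector(cov)
print(cov.shape, vec.shape)
```

A vector of this kind can be fed to any sparse coding routine that expects Euclidean inputs, for example a matching pursuit solver over a dictionary of training descriptors.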
Crowd Behavior Analysis: A Review where Physics meets Biology
Although the traits that emerge in a mass gathering are often non-deliberative,
the act of mass impulse may lead to irrevocable crowd disasters. The two-fold
increase in crowd carnage over the past two decades has spurred significant
advances in the field of computer vision, towards effective and proactive crowd
surveillance. Computer vision studies related to crowds are observed to
resonate with the understanding of the emergent behavior in physics (complex
systems) and biology (animal swarm). These studies, which are inspired by
biology and physics, share surprisingly common insights, and interesting
contradictions. However, this aspect of discussion has not been fully explored.
Therefore, this survey provides the readers with a review of the
state-of-the-art methods in crowd behavior analysis from the physics and
biologically inspired perspectives. We provide insights and comprehensive
discussions for a broader understanding of the underlying prospect of blending
physics and biology studies in computer vision.
Comment: Accepted in Neurocomputing, 31 pages, 180 references
Kernelized Low Rank Representation on Grassmann Manifolds
Low rank representation (LRR) has recently attracted great interest due to
its pleasing efficacy in exploring low-dimensional subspace structures embedded
in data. One of its successful applications is subspace clustering, in which
data are clustered according to the subspaces they belong to. In this paper, at
a higher level, we intend to cluster subspaces into classes of subspaces. This
is naturally described as a clustering problem on the Grassmann manifold. The
novelty of this paper is to generalize LRR on Euclidean space onto an LRR model
on the Grassmann manifold in a uniform kernelized framework. The new methods have
many applications in computer vision tasks. Several clustering experiments are
conducted on handwritten digit images, dynamic textures, human face clips and
traffic scene sequences. The experimental results show that the proposed
methods outperform a number of state-of-the-art subspace clustering methods.
Comment: 13 pages
Anomaly Detection using Edge Computing in Video Surveillance System: Review
The current concept of Smart Cities influences urban planners and researchers
to provide modern, secured and sustainable infrastructure and give a decent
quality of life to its residents. To fulfill this need, video surveillance
cameras have been deployed to enhance the safety and well-being of the
citizens. Despite technical developments in modern science, abnormal event
detection in surveillance video systems is challenging and requires exhaustive
human efforts. In this paper, we survey various methodologies developed to
detect anomalies in intelligent video surveillance. Firstly, we revisit the
surveys on anomaly detection in the last decade. We then present a systematic
categorization of methodologies developed for ease of understanding.
Considering the notion of anomaly depends on context, we identify different
objects-of-interest and publicly available datasets in anomaly detection. Since
anomaly detection is considered a time-critical application of computer vision,
our emphasis is on anomaly detection using edge devices and approaches
explicitly designed for them. Further, we discuss the challenges and
opportunities involved in anomaly detection at the edge.
Comment: 26 pages, 6 figures, 5 tables
Human Action Recognition using Factorized Spatio-Temporal Convolutional Networks
Human actions in video sequences are three-dimensional (3D) spatio-temporal
signals characterizing both the visual appearance and motion dynamics of the
involved humans and objects. Inspired by the success of convolutional neural
networks (CNN) for image classification, recent attempts have been made to
learn 3D CNNs for recognizing human actions in videos. However, partly due to
the high complexity of training 3D convolution kernels and the need for large
quantities of training videos, only limited success has been reported. This has
motivated us to investigate in this paper a new deep architecture which can
handle 3D signals more effectively. Specifically, we propose factorized
spatio-temporal convolutional networks (FstCN) that factorize the original 3D
convolution kernel learning as a sequential process of learning 2D spatial
kernels in the lower layers (called spatial convolutional layers), followed by
learning 1D temporal kernels in the upper layers (called temporal convolutional
layers). We introduce a novel transformation and permutation operator to make
factorization in FstCN possible. Moreover, to address the issue of sequence
alignment, we propose an effective training and inference strategy based on
sampling multiple video clips from a given action video sequence. We have
tested FstCN on two commonly used benchmark datasets (UCF-101 and HMDB-51).
Without using auxiliary training videos to boost the performance, FstCN
outperforms existing CNN based methods and achieves comparable performance with
a recent method that benefits from using auxiliary training videos.
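The factorization idea can be illustrated with a single fixed filter, as a sketch only. When a 3D kernel is the outer product of a 1D temporal kernel and a 2D spatial kernel, filtering each frame spatially and then filtering along time gives the same output as the full 3D filter, with fewer parameters. This does not reproduce FstCN itself (learned kernels, multiple channels, the transformation and permutation operator, or the clip sampling strategy); all function names are hypothetical, and `sliding_window_view` requires NumPy 1.20 or later.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def correlate_valid(volume, kernel):
    """'Valid' 3D cross-correlation of a (T, H, W) volume with a 3D kernel."""
    windows = sliding_window_view(volume, kernel.shape)
    return np.einsum("abcijk,ijk->abc", windows, kernel)


def factorized_filter(video, spatial_kernel, temporal_kernel):
    """Apply a 2D spatial kernel to every frame, then a 1D temporal kernel."""
    spatial = correlate_valid(video, spatial_kernel[None, :, :])
    return correlate_valid(spatial, temporal_kernel[:, None, None])


def parameter_counts(kt, kh, kw):
    """Weights in one full 3D kernel versus one 2D + 1D factorized pair."""
    return kt * kh * kw, kh * kw + kt


# Toy usage: random video and random separable kernel.
rng = np.random.default_rng(0)
video = rng.standard_normal((8, 10, 10))   # (frames, height, width)
k2d = rng.standard_normal((3, 3))          # spatial kernel
k1d = rng.standard_normal(3)               # temporal kernel

full = correlate_valid(video, k1d[:, None, None] * k2d[None, :, :])
factored = factorized_filter(video, k2d, k1d)
print("outputs match:", np.allclose(full, factored))
print("parameters (3D, factorized):", parameter_counts(3, 3, 3))
```

The equivalence holds only for separable (rank-1) kernels, which is why a factorized network trades some expressiveness for easier training.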
Localized LRR on Grassmann Manifolds: An Extrinsic View
Subspace data representation has recently become a common practice in many
computer vision tasks. It demands generalizing classical machine learning
algorithms for subspace data. Low-Rank Representation (LRR) is one of the most
successful models for clustering vectorial data according to their subspace
structures. This paper explores the possibility of extending LRR for subspace
data on Grassmann manifolds. Rather than directly embedding the Grassmann
manifolds into the symmetric matrix space, an extrinsic view is taken to build
the LRR self-representation in the local area of the tangent space at each
Grassmannian point, resulting in a localized LRR method on Grassmann manifolds.
A novel algorithm for solving the proposed model is investigated and
implemented. The performance of the new clustering algorithm is assessed
through experiments on several real-world datasets including MNIST handwritten
digits, ballet video clips, SKIG action clips, DynTex++ dataset and highway
traffic video clips. The experimental results show the new method outperforms a
number of state-of-the-art clustering methods.
Comment: IEEE Transactions on Circuits and Systems for Video Technology with
Minor Revisions. arXiv admin note: text overlap with arXiv:1504.0180
Review on Computer Vision Techniques in Emergency Situation
In emergency situations, actions that save lives and limit the impact of
hazards are crucial. In order to act, situational awareness is needed to decide
what to do. Geolocalized photos and video of the situations as they evolve can
be crucial in better understanding them and making decisions faster. Cameras
are almost everywhere these days, either in terms of smartphones, installed
CCTV cameras, UAVs or others. However, this poses challenges in big data and
information overflow. Moreover, most of the time there are no disasters at any
given location, so humans aiming to detect sudden situations may not be as
alert as needed at any point in time. Consequently, computer vision tools can
be an excellent decision support. The number of emergencies where computer
vision tools have been considered or used is very large, and there is a great
overlap across related emergency research. Researchers tend to focus on
state-of-the-art systems that cover the same emergency as they are studying,
overlooking important research in other fields. In order to unveil this overlap,
the survey is divided along four main axes: the types of emergencies that have
been studied in computer vision, the objective that the algorithms can address,
the type of hardware needed and the algorithms used. Therefore, this review
provides a broad overview of the progress of computer vision covering all sorts
of emergencies.
Comment: 25 pages
A Survey on Object Detection in Optical Remote Sensing Images
Object detection in optical remote sensing images, being a fundamental but
challenging problem in the field of aerial and satellite image analysis, plays
an important role in a wide range of applications and has received significant
attention in recent years. While a large number of methods exist, a deep review of the
literature concerning generic object detection is still lacking. This paper
aims to provide a review of the recent progress in this field. Different from
several previously published surveys that focus on a specific object class such
as building and road, we concentrate on more generic object categories
including, but not limited to, road, building, tree, vehicle, ship,
airport, and urban area. Covering about 270 publications, we survey 1) template
matching-based object detection methods, 2) knowledge-based object detection
methods, 3) object-based image analysis (OBIA)-based object detection methods,
4) machine learning-based object detection methods, and 5) five publicly
available datasets and three standard evaluation metrics. We also discuss the
challenges of current studies and propose two promising research directions,
namely deep learning-based feature representation and weakly supervised
learning-based geospatial object detection. It is our hope that this survey
will help researchers gain a better understanding of this
research field.
Comment: This manuscript is the accepted version for ISPRS Journal of
Photogrammetry and Remote Sensing