Recurrent Attention Models for Depth-Based Person Identification
We present an attention-based model that reasons about human body shape and
motion dynamics to identify individuals in the absence of RGB information,
hence in the dark. Our approach leverages unique 4D spatio-temporal signatures
to address the identification problem across days. Formulated as a
reinforcement learning task, our model is based on a combination of
convolutional and recurrent neural networks with the goal of identifying small,
discriminative regions indicative of human identity. We demonstrate that our
model produces state-of-the-art results on several published datasets given
only depth images. We further study the robustness of our model towards
viewpoint, appearance, and volumetric changes. Finally, we share insights
gleaned from interpretable 2D, 3D, and 4D visualizations of our model's
spatio-temporal attention.
Comment: Computer Vision and Pattern Recognition (CVPR) 201
Extrinsic Methods for Coding and Dictionary Learning on Grassmann Manifolds
Sparsity-based representations have recently led to notable results in
various visual recognition tasks. In a separate line of research, Riemannian
manifolds have been shown useful for dealing with features and models that do
not lie in Euclidean spaces. With the aim of building a bridge between the two
realms, we address the problem of sparse coding and dictionary learning over
the space of linear subspaces, which form Riemannian structures known as
Grassmann manifolds. To this end, we propose to embed Grassmann manifolds into
the space of symmetric matrices by an isometric mapping. This in turn enables
us to extend two sparse coding schemes to Grassmann manifolds. Furthermore, we
propose closed-form solutions for learning a Grassmann dictionary, atom by
atom. Lastly, to handle non-linearity in data, we extend the proposed Grassmann
sparse coding and dictionary learning algorithms through embedding into Hilbert
spaces.
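The embedding step can be illustrated with a minimal NumPy sketch. It assumes the projection embedding Π(X) = X Xᵀ, a standard isometric embedding of the Grassmannian into symmetric matrices, under which the squared embedded distance equals 2p − 2‖XᵀY‖²_F for p-dimensional subspaces. Sparse coding then becomes an ordinary lasso problem, min_y ½‖Π(X) − Σ_j y_j Π(D_j)‖²_F + λ‖y‖₁, solved here with plain ISTA. The helper names and the choice of ISTA are illustrative assumptions, not necessarily the authors' solver.

```python
import numpy as np


def orthonormal_basis(A):
    """Return an orthonormal basis (n x p) for the column span of A."""
    Q, _ = np.linalg.qr(A)
    return Q


def projection_embedding(X):
    """Map a subspace, given by an orthonormal basis X, to the symmetric matrix X X^T."""
    return X @ X.T


def projection_distance_sq(X, Y):
    """Squared Frobenius distance between embeddings (equals 2p - 2 * ||X^T Y||_F^2)."""
    return np.linalg.norm(projection_embedding(X) - projection_embedding(Y), "fro") ** 2


def grassmann_sparse_code(X, atoms, lam=0.05, n_iter=500):
    """Hypothetical extrinsic sparse coder: a lasso in the embedded space, solved with ISTA.

    Minimises 0.5 * ||vec(X X^T) - D y||^2 + lam * ||y||_1, where column j of D is vec(A_j A_j^T).
    """
    target = projection_embedding(X).ravel()
    D = np.stack([projection_embedding(A).ravel() for A in atoms], axis=1)
    step = 1.0 / np.linalg.norm(D, 2) ** 2  # 1 / Lipschitz constant of the quadratic term's gradient
    y = np.zeros(D.shape[1])
    for _ in range(n_iter):
        z = y - step * (D.T @ (D @ y - target))                   # gradient step on the quadratic term
        y = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)  # soft-thresholding (prox of the l1 term)
    return y


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Five random 2-dimensional subspaces of R^6 act as dictionary atoms.
    atoms = [orthonormal_basis(rng.standard_normal((6, 2))) for _ in range(5)]
    code = grassmann_sparse_code(atoms[2], atoms, n_iter=5000)
    print(np.round(code, 3))  # the weight concentrates on atom 2
```

Coding one of the atoms against the dictionary should put almost all of the weight on that atom, slightly shrunk by the l1 penalty. The paper's kernelised extension would replace these explicit embeddings with kernel evaluations such as ‖XᵀY‖²_F.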
Experiments on several classification tasks (gender recognition, gesture
classification, scene analysis, face recognition, action recognition and
dynamic texture classification) show that the proposed approaches achieve
considerable improvements in discrimination accuracy, in comparison to
state-of-the-art methods such as kernelized Affine Hull Method and
graph-embedding Grassmann discriminant analysis.
Comment: Appearing in International Journal of Computer Vision
Sparse error gait image: a new representation for gait recognition
The performance of a gait recognition system depends closely on the efficiency of its feature representation and recognition modules. The former extracts features from an input image sequence to represent a user's distinctive gait pattern; the recognition module then compares the features of a probe user with those registered in the gallery database. This paper presents a novel gait feature representation, called Sparse Error Gait Image (SEGI), derived by applying Robust Principal Component Analysis (RPCA) to Gait Energy Images (GEIs). GEIs obtained from the same user at different instants always present some differences. Applying RPCA yields low-rank and sparse error components: the former captures the commonalities and absorbs the small differences between input GEIs, while the larger differences are captured by the sparse error component. The proposed SEGI representation exploits the latter for recognition. This paper also proposes two simple approaches for the recognition module to exploit the SEGI, based on computing a Euclidean norm or a Euclidean distance. Using these simple recognition methods with the proposed SEGI representation, gait recognition results equivalent to the state of the art are obtained.
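The RPCA step behind SEGI can be sketched with the inexact augmented Lagrange multiplier method for principal component pursuit. The abstract does not say which RPCA solver the authors use, so this solver, its defaults (λ = 1/√max(m, n), ρ = 1.5) and the helper names `rpca` and `segi_from_geis` are assumptions. Each column of M is one vectorised GEI of the same user. The sparse component S, reshaped back to image size, gives one sparse error image per input GEI.

```python
import numpy as np


def soft_threshold(X, tau):
    """Element-wise shrinkage: the proximal operator of tau * ||X||_1."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)


def singular_value_threshold(X, tau):
    """Shrink singular values: the proximal operator of tau * ||X||_* (nuclear norm)."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U * soft_threshold(s, tau)) @ Vt


def rpca(M, lam=None, tol=1e-7, max_iter=500):
    """Split M into low-rank L and sparse S (M ~ L + S) with the inexact ALM method."""
    m, n = M.shape
    if lam is None:
        lam = 1.0 / np.sqrt(max(m, n))
    norm_two = np.linalg.norm(M, 2)
    Y = M / max(norm_two, np.abs(M).max() / lam)  # initial dual variable
    mu, rho = 1.25 / norm_two, 1.5
    L = np.zeros_like(M)
    S = np.zeros_like(M)
    norm_M = np.linalg.norm(M, "fro")
    for _ in range(max_iter):
        L = singular_value_threshold(M - S + Y / mu, 1.0 / mu)  # low-rank update
        S = soft_threshold(M - L + Y / mu, lam / mu)            # sparse update
        residual = M - L - S
        Y = Y + mu * residual                                   # dual ascent on the constraint M = L + S
        mu = min(mu * rho, 1e7)
        if np.linalg.norm(residual, "fro") / norm_M < tol:
            break
    return L, S


def segi_from_geis(geis):
    """Hypothetical helper: stack GEIs (k, h, w) as columns, run RPCA, return the sparse parts as images."""
    k, h, w = geis.shape
    M = geis.reshape(k, h * w).T.astype(float)  # one column per GEI
    _, S = rpca(M)
    return S.T.reshape(k, h, w)
```

The abstract's recognition step would then work on these sparse images through a Euclidean norm or distance (for instance `np.linalg.norm(a - b)` between two flattened SEGIs). How the probe and gallery SEGIs are formed is not specified in the abstract, so the sketch stops at the decomposition.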
Review of Person Re-identification Techniques
Person re-identification across different surveillance cameras with disjoint
fields of view has become one of the most interesting and challenging subjects
in the area of intelligent video surveillance. Although several methods have
been developed and proposed, certain limitations and unresolved issues remain.
In all of the existing re-identification approaches, feature vectors are
extracted from segmented still images or video frames. Different similarity or
dissimilarity measures have been applied to these vectors. Some methods have
used simple constant metrics, whereas others have utilised models to obtain
optimised metrics. Some have created models based on local colour or texture
information, and others have built models based on the gait of people. In
general, the main objective of all these approaches is to achieve higher accuracy and
lower computational costs. This study summarises
several developments in recent literature and discusses the various available
methods used in person re-identification. Specifically, their advantages and
disadvantages are discussed and compared.
Comment: Published 201
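The difference between a fixed metric and an optimised one can be made concrete with a small ranking sketch. It ranks gallery vectors by the squared Mahalanobis distance (x − g)ᵀ M (x − g). With M = I this is plain squared Euclidean distance, and any other positive semi-definite M stands in for a learned metric. The results are summarised as a cumulative match characteristic (CMC) curve, a common re-identification evaluation. The toy data and the diagonal M are hypothetical.

```python
import numpy as np


def mahalanobis_sq(probe, gallery, M):
    """Squared distance (g - x)^T M (g - x) from one probe vector to every gallery row."""
    diff = gallery - probe
    return np.einsum("ij,jk,ik->i", diff, M, diff)


def cmc_curve(probes, probe_ids, gallery, gallery_ids, M, max_rank):
    """Fraction of probes whose true match appears within the top r ranks, for r = 1..max_rank.

    Assumes every probe identity is present in the gallery.
    """
    hits = np.zeros(max_rank)
    for x, pid in zip(probes, probe_ids):
        order = np.argsort(mahalanobis_sq(x, gallery, M))
        first_correct = np.flatnonzero(gallery_ids[order] == pid)[0]  # 0-based rank of the true match
        if first_correct < max_rank:
            hits[first_correct:] += 1
    return hits / len(probes)


if __name__ == "__main__":
    # Toy data: feature 0 identifies the person, feature 1 is a large nuisance (e.g. lighting).
    gallery = np.array([[0.0, 0.0], [1.0, 10.0]])
    gallery_ids = np.array([0, 1])
    probes = np.array([[0.0, 10.0], [1.0, 0.0]])
    probe_ids = np.array([0, 1])
    euclidean = np.eye(2)
    weighted = np.diag([1.0, 1e-4])  # hypothetical learned metric that down-weights feature 1
    print(cmc_curve(probes, probe_ids, gallery, gallery_ids, euclidean, 2))  # [0. 1.]
    print(cmc_curve(probes, probe_ids, gallery, gallery_ids, weighted, 2))   # [1. 1.]
```

On this toy data the fixed Euclidean metric ranks every true match second, while the weighted metric ranks them first. This is the kind of gap that metric-learning approaches aim to close.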
Person re-Identification over distributed spaces and time
PhD
Replicating the human visual system and the cognitive abilities that the brain uses to process the
information it receives is an area of substantial scientific interest. With the prevalence of video
surveillance cameras, a portion of this scientific drive has gone into providing useful automated
counterparts to human operators. A prominent task in visual surveillance is that of matching
people between disjoint camera views, or re-identification. This allows operators to locate people
of interest, to track people across cameras and can be used as a precursory step to multi-camera
activity analysis. However, due to the contrasting conditions between camera views and their
effects on the appearance of people, re-identification is a non-trivial task. This thesis proposes
solutions for reducing the visual ambiguity in observations of people between camera views.
This thesis first looks at a method for mitigating the effects of differing lighting conditions
between camera views on the appearance of people. This thesis builds on work modelling
inter-camera illumination based on known pairs of images. A Cumulative Brightness Transfer
Function (CBTF) is proposed to estimate the mapping of colour brightness values based on limited
training samples. Unlike previous methods that use a mean-based representation for a set of
training samples, the cumulative nature of the CBTF retains colour information from underrepresented
samples in the training set. Additionally, the bi-directionality of the mapping function
is explored to maximise re-identification accuracy by ensuring samples are accurately
mapped between cameras.
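A cumulative brightness transfer function can be sketched as histogram matching on histograms pooled over all training images from each camera. Pooling the counts, rather than averaging per-pair mappings, mirrors the point above about keeping information from under-represented samples. This sketch assumes one 8-bit channel per call (apply it once per colour channel). It maps camera-A brightness b to the smallest camera-B level whose cumulative frequency reaches H_A(b). The function names are hypothetical.

```python
import numpy as np


def pooled_cdf(images, levels=256):
    """Normalised cumulative histogram of all pixels from a list of 8-bit single-channel images."""
    counts = np.zeros(levels)
    for img in images:
        counts += np.bincount(np.asarray(img, dtype=np.int64).ravel(), minlength=levels)[:levels]
    cdf = np.cumsum(counts)
    return cdf / cdf[-1]


def cbtf(images_a, images_b, levels=256):
    """Lookup table f with f[b] = smallest level v such that H_B(v) >= H_A(b)."""
    cdf_a = pooled_cdf(images_a, levels)
    cdf_b = pooled_cdf(images_b, levels)
    table = np.searchsorted(cdf_b, cdf_a, side="left")
    return np.clip(table, 0, levels - 1)


if __name__ == "__main__":
    # Toy "cameras": camera B renders each brightness at twice camera A's value.
    cam_a = [np.arange(128, dtype=np.uint8).reshape(8, 16)]
    cam_b = [(2 * np.arange(128)).astype(np.uint8).reshape(8, 16)]
    a_to_b = cbtf(cam_a, cam_b)
    b_to_a = cbtf(cam_b, cam_a)  # the reverse direction
    print(a_to_b[10], a_to_b[100])  # 20 200
    mapped = a_to_b[cam_a[0]]       # apply the lookup table to a camera-A image
```

Computing the table in both directions is a natural starting point for the bi-directional variant; how the thesis combines the two directions is not reproduced here.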
Secondly, an extension is proposed to the CBTF framework that addresses the issue of changing
lighting conditions within a single camera. As the CBTF requires manually labelled training
samples, it is limited to static lighting conditions and is less effective if the lighting changes. This
Adaptive CBTF (A-CBTF) differs from previous approaches that either do not consider lighting
change over time, or rely on camera transition time information to update. By utilising contextual
information drawn from the background in each camera view, an estimation of the lighting
change within a single camera can be made. This background lighting model allows colour
information to be mapped back to the original training conditions, thus removing the need for
retraining.
Thirdly, a novel reformulation of re-identification as a ranking problem is proposed. Previous
methods use a score based on a direct distance measure of set features to form a correct/incorrect
match result. Rather than offering an operator a single outcome, the ranking paradigm is to give
the operator a ranked list of possible matches and allow them to make the final decision. By utilising
a Support Vector Machine (SVM) ranking method, a weighting on the appearance features
can be learned that capitalises on the fact that not all image features are equally important to
re-identification. Additionally, an Ensemble-RankSVM is proposed to address scalability issues
by separating the training samples into smaller subsets and boosting the trained models.
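The ranking formulation can be sketched as a linear RankSVM over absolute-difference features. A probe-gallery pair is scored by wᵀ|x_p − x_g|, and training asks every correct match to outscore every incorrect one by a margin. The sketch below optimises the primal hinge-loss objective by plain subgradient descent. For the ensemble, it trains one model per random subset of constraints and averages the weights. Averaging is a simplification standing in for the boosting the abstract mentions. All names, hyperparameters and the toy data are hypothetical.

```python
import numpy as np


def ranking_constraints(probes, probe_ids, gallery, gallery_ids):
    """Rows |x_p - g_pos| - |x_p - g_neg|; RankSVM wants w . row >= 1 for each of them."""
    rows = []
    for x, pid in zip(probes, probe_ids):
        d = np.abs(gallery - x)
        for pos in d[gallery_ids == pid]:
            rows.append(pos - d[gallery_ids != pid])
    return np.vstack(rows)


def train_rank_svm(D, C=1.0, lr=0.01, epochs=300):
    """Subgradient descent on 0.5 * ||w||^2 + C * mean(max(0, 1 - D @ w))."""
    w = np.zeros(D.shape[1])
    for _ in range(epochs):
        active = D @ w < 1.0  # constraints currently violating the margin
        grad = w - C * D[active].sum(axis=0) / len(D)
        w -= lr * grad
    return w


def train_ensemble(D, n_subsets=3, seed=0, **kwargs):
    """Train one RankSVM per random subset of constraints and average the weights (simplified ensemble)."""
    rng = np.random.default_rng(seed)
    parts = np.array_split(rng.permutation(len(D)), n_subsets)
    return np.mean([train_rank_svm(D[idx], **kwargs) for idx in parts], axis=0)


def rank_gallery(w, probe, gallery):
    """Gallery indices from most to least likely match: the ranked list shown to an operator."""
    return np.argsort(-(np.abs(gallery - probe) @ w))


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    ids = np.arange(10)

    def sample(i):
        # Feature 0 encodes identity; features 1-3 are uninformative clutter.
        return np.concatenate([[10.0 * i + rng.normal(0, 0.5)], rng.uniform(0, 2, 3)])

    gallery = np.array([sample(i) for i in ids])
    probes = np.array([sample(i) for i in ids])
    w = train_ensemble(ranking_constraints(probes, ids, gallery, ids))
    print(np.round(w, 3))  # the weight on feature 0 dominates
```

Because the learned weights multiply absolute differences, a discriminative feature ends up with a large negative weight (a small difference means a high score), while uninformative features stay close to zero. This is the per-feature weighting the abstract describes.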
Finally, the thesis looks at a practical application of the ranking paradigm in a real-world system.
The system encompasses both the re-identification stage and the precursory extraction
and tracking stages to form an aid for CCTV operators. Segmentation and detection are combined
to extract relevant information from the video, while several matching
techniques are combined with temporal priors to form a more comprehensive overall matching
criterion.
The effectiveness of the proposed approaches is tested on datasets obtained from a variety
of challenging environments including offices, apartment buildings, airports and outdoor public
spaces.
Uncooperative gait recognition by learning to rank
This work has partially been supported by projects CICYT TIN2009-14205-C04-04 from the Spanish Ministry of Innovation and Science, and P1-1B2012-22, PREDOC/2008/04 and E-2011-36 from Universitat Jaume I of Castellón
Action Recognition in Videos: from Motion Capture Labs to the Web
This paper presents a survey of human action recognition approaches based on
visual data recorded from a single video camera. We propose an organizing
framework which puts in evidence the evolution of the area, with techniques
moving from heavily constrained motion capture scenarios towards more
challenging, realistic, "in the wild" videos. The proposed organization is
based on the representation used as input for the recognition task, emphasizing
the hypotheses assumed and, thus, the constraints imposed on the type of video
that each technique is able to address. Making the hypotheses and
constraints explicit makes the framework particularly useful for selecting a method for
a given application. Another advantage of the proposed organization is that it
allows the newest approaches to be categorized seamlessly alongside traditional ones, while
providing an insightful perspective of the evolution of the action recognition
task up to now. That perspective is the basis for the discussion at the end of
the paper, where we also present the main open issues in the area.
Comment: Preprint submitted to CVIU, survey paper, 46 pages, 2 figures, 4 tables