2,636 research outputs found
Towards Effective Codebookless Model for Image Classification
The bag-of-features (BoF) model for image classification has been thoroughly
studied over the last decade. Different from the widely used BoF methods which
modeled images with a pre-trained codebook, the alternative codebook free image
modeling method, which we call Codebookless Model (CLM), attracted little
attention. In this paper, we present an effective CLM that represents an image
with a single Gaussian for classification. By embedding Gaussian manifold into
a vector space, we show that the simple incorporation of our CLM into a linear
classifier achieves very competitive accuracy compared with state-of-the-art
BoF methods (e.g., Fisher Vector). Since our CLM lies in a high dimensional
Riemannian manifold, we further propose a joint learning method of low-rank
transformation with support vector machine (SVM) classifier on the Gaussian
manifold, in order to reduce computational and storage cost. To study and
alleviate the side effect of background clutter on our CLM, we also present a
simple yet effective partial background removal method based on saliency
detection. Experiments are extensively conducted on eight widely used databases
to demonstrate the effectiveness and efficiency of our CLM method
Understanding Traffic Density from Large-Scale Web Camera Data
Understanding traffic density from large-scale web camera (webcam) videos is
a challenging problem because such videos have low spatial and temporal
resolution, high occlusion and large perspective. To deeply understand traffic
density, we explore both deep learning based and optimization based methods. To
avoid individual vehicle detection and tracking, both methods map the image
into vehicle density map, one based on rank constrained regression and the
other one based on fully convolution networks (FCN). The regression based
method learns different weights for different blocks in the image to increase
freedom degrees of weights and embed perspective information. The FCN based
method jointly estimates vehicle density map and vehicle count with a residual
learning framework to perform end-to-end dense prediction, allowing arbitrary
image resolution, and adapting to different vehicle scales and perspectives. We
analyze and compare both methods, and get insights from optimization based
method to improve deep model. Since existing datasets do not cover all the
challenges in our work, we collected and labelled a large-scale traffic video
dataset, containing 60 million frames from 212 webcams. Both methods are
extensively evaluated and compared on different counting tasks and datasets.
FCN based method significantly reduces the mean absolute error from 10.99 to
5.31 on the public dataset TRANCOS compared with the state-of-the-art baseline.Comment: Accepted by CVPR 2017. Preprint version was uploaded on
http://welcome.isr.tecnico.ulisboa.pt/publications/understanding-traffic-density-from-large-scale-web-camera-data
Proposal Flow: Semantic Correspondences from Object Proposals
Finding image correspondences remains a challenging problem in the presence
of intra-class variations and large changes in scene layout. Semantic flow
methods are designed to handle images depicting different instances of the same
object or scene category. We introduce a novel approach to semantic flow,
dubbed proposal flow, that establishes reliable correspondences using object
proposals. Unlike prevailing semantic flow approaches that operate on pixels or
regularly sampled local regions, proposal flow benefits from the
characteristics of modern object proposals, that exhibit high repeatability at
multiple scales, and can take advantage of both local and geometric consistency
constraints among proposals. We also show that the corresponding sparse
proposal flow can effectively be transformed into a conventional dense flow
field. We introduce two new challenging datasets that can be used to evaluate
both general semantic flow techniques and region-based approaches such as
proposal flow. We use these benchmarks to compare different matching
algorithms, object proposals, and region features within proposal flow, to the
state of the art in semantic flow. This comparison, along with experiments on
standard datasets, demonstrates that proposal flow significantly outperforms
existing semantic flow methods in various settings.Comment: arXiv admin note: text overlap with arXiv:1511.0506
Review of Person Re-identification Techniques
Person re-identification across different surveillance cameras with disjoint
fields of view has become one of the most interesting and challenging subjects
in the area of intelligent video surveillance. Although several methods have
been developed and proposed, certain limitations and unresolved issues remain.
In all of the existing re-identification approaches, feature vectors are
extracted from segmented still images or video frames. Different similarity or
dissimilarity measures have been applied to these vectors. Some methods have
used simple constant metrics, whereas others have utilised models to obtain
optimised metrics. Some have created models based on local colour or texture
information, and others have built models based on the gait of people. In
general, the main objective of all these approaches is to achieve a
higher-accuracy rate and lowercomputational costs. This study summarises
several developments in recent literature and discusses the various available
methods used in person re-identification. Specifically, their advantages and
disadvantages are mentioned and compared.Comment: Published 201
Log-Euclidean Bag of Words for Human Action Recognition
Representing videos by densely extracted local space-time features has
recently become a popular approach for analysing actions. In this paper, we
tackle the problem of categorising human actions by devising Bag of Words (BoW)
models based on covariance matrices of spatio-temporal features, with the
features formed from histograms of optical flow. Since covariance matrices form
a special type of Riemannian manifold, the space of Symmetric Positive Definite
(SPD) matrices, non-Euclidean geometry should be taken into account while
discriminating between covariance matrices. To this end, we propose to embed
SPD manifolds to Euclidean spaces via a diffeomorphism and extend the BoW
approach to its Riemannian version. The proposed BoW approach takes into
account the manifold geometry of SPD matrices during the generation of the
codebook and histograms. Experiments on challenging human action datasets show
that the proposed method obtains notable improvements in discrimination
accuracy, in comparison to several state-of-the-art methods
SIFT Flow: Dense Correspondence across Scenes and its Applications
While image alignment has been studied in different areas of computer vision for decades, aligning images depicting different scenes remains a challenging problem. Analogous to optical flow where an image is aligned to its temporally adjacent frame, we propose SIFT flow, a method to align an image to its nearest neighbors in a large image corpus containing a variety of scenes. The SIFT flow algorithm consists of matching densely sampled, pixel-wise SIFT features between two images, while preserving spatial discontinuities. The SIFT features allow robust matching across different scene/object appearances, whereas the discontinuity-preserving spatial model allows matching of objects located at different parts of the scene. Experiments show that the proposed approach robustly aligns complex scene pairs containing significant spatial differences. Based on SIFT flow, we propose an alignment-based large database framework for image analysis and synthesis, where image information is transferred from the nearest neighbors to a query image according to the dense scene correspondence. This framework is demonstrated through concrete applications, such as motion field prediction from a single image, motion synthesis via object transfer, satellite image registration and face recognition
- …