5,808 research outputs found
Embedding based on function approximation for large scale image search
The objective of this paper is to design an embedding method that maps local
features describing an image (e.g. SIFT) to a higher dimensional representation
useful for the image retrieval problem. First, motivated by the relationship
between the linear approximation of a nonlinear function in high dimensional
space and the stateof-the-art feature representation used in image retrieval,
i.e., VLAD, we propose a new approach for the approximation. The embedded
vectors resulted by the function approximation process are then aggregated to
form a single representation for image retrieval. Second, in order to make the
proposed embedding method applicable to large scale problem, we further derive
its fast version in which the embedded vectors can be efficiently computed,
i.e., in the closed-form. We compare the proposed embedding methods with the
state of the art in the context of image search under various settings: when
the images are represented by medium length vectors, short vectors, or binary
vectors. The experimental results show that the proposed embedding methods
outperform existing the state of the art on the standard public image retrieval
benchmarks.Comment: Accepted to TPAMI 2017. The implementation and precomputed features
of the proposed F-FAemb are released at the following link:
http://tinyurl.com/F-FAem
Affine Subspace Representation for Feature Description
This paper proposes a novel Affine Subspace Representation (ASR) descriptor
to deal with affine distortions induced by viewpoint changes. Unlike the
traditional local descriptors such as SIFT, ASR inherently encodes local
information of multi-view patches, making it robust to affine distortions while
maintaining a high discriminative ability. To this end, PCA is used to
represent affine-warped patches as PCA-patch vectors for its compactness and
efficiency. Then according to the subspace assumption, which implies that the
PCA-patch vectors of various affine-warped patches of the same keypoint can be
represented by a low-dimensional linear subspace, the ASR descriptor is
obtained by using a simple subspace-to-point mapping. Such a linear subspace
representation could accurately capture the underlying information of a
keypoint (local structure) under multiple views without sacrificing its
distinctiveness. To accelerate the computation of ASR descriptor, a fast
approximate algorithm is proposed by moving the most computational part (ie,
warp patch under various affine transformations) to an offline training stage.
Experimental results show that ASR is not only better than the state-of-the-art
descriptors under various image transformations, but also performs well without
a dedicated affine invariant detector when dealing with viewpoint changes.Comment: To Appear in the 2014 European Conference on Computer Visio
Fast, Dense Feature SDM on an iPhone
In this paper, we present our method for enabling dense SDM to run at over 90
FPS on a mobile device. Our contributions are two-fold. Drawing inspiration
from the FFT, we propose a Sparse Compositional Regression (SCR) framework,
which enables a significant speed up over classical dense regressors. Second,
we propose a binary approximation to SIFT features. Binary Approximated SIFT
(BASIFT) features, which are a computationally efficient approximation to SIFT,
a commonly used feature with SDM. We demonstrate the performance of our
algorithm on an iPhone 7, and show that we achieve similar accuracy to SDM
Low-rank SIFT: An Affine Invariant Feature for Place Recognition
In this paper, we present a novel affine-invariant feature based on SIFT,
leveraging the regular appearance of man-made objects. The feature achieves
full affine invariance without needing to simulate over affine parameter space.
Low-rank SIFT, as we name the feature, is based on our observation that local
tilt, which are caused by changes of camera axis orientation, could be
normalized by converting local patches to standard low-rank forms. Rotation,
translation and scaling invariance could be achieved in ways similar to SIFT.
As an extension of SIFT, our method seeks to add prior to solve the ill-posed
affine parameter estimation problem and normalizes them directly, and is
applicable to objects with regular structures. Furthermore, owing to recent
breakthrough in convex optimization, such parameter could be computed
efficiently. We will demonstrate its effectiveness in place recognition as our
major application. As extra contributions, we also describe our pipeline of
constructing geotagged building database from the ground up, as well as an
efficient scheme for automatic feature selection
Fast and robust 3D feature extraction from sparse point clouds
Matching 3D point clouds, a critical operation in map building and localization, is difficult with Velodyne-type sensors due to the sparse and non-uniform point clouds that they produce. Standard methods from dense 3D point clouds are generally not effective. In this paper, we describe a featurebased approach using Principal Components Analysis (PCA) of neighborhoods of points, which results in mathematically principled line and plane features. The key contribution in this work is to show how this type of feature extraction can be done efficiently and robustly even on non-uniformly sampled point clouds. The resulting detector runs in real-time and can be easily tuned to have a low false positive rate, simplifying data association. We evaluate the performance of our algorithm on an autonomous car at the MCity Test Facility using a Velodyne HDL-32E, and we compare our results against the state-of-theart NARF keypoint detector. © 2016 IEEE
Scalable Image Retrieval by Sparse Product Quantization
Fast Approximate Nearest Neighbor (ANN) search technique for high-dimensional
feature indexing and retrieval is the crux of large-scale image retrieval. A
recent promising technique is Product Quantization, which attempts to index
high-dimensional image features by decomposing the feature space into a
Cartesian product of low dimensional subspaces and quantizing each of them
separately. Despite the promising results reported, their quantization approach
follows the typical hard assignment of traditional quantization methods, which
may result in large quantization errors and thus inferior search performance.
Unlike the existing approaches, in this paper, we propose a novel approach
called Sparse Product Quantization (SPQ) to encoding the high-dimensional
feature vectors into sparse representation. We optimize the sparse
representations of the feature vectors by minimizing their quantization errors,
making the resulting representation is essentially close to the original data
in practice. Experiments show that the proposed SPQ technique is not only able
to compress data, but also an effective encoding technique. We obtain
state-of-the-art results for ANN search on four public image datasets and the
promising results of content-based image retrieval further validate the
efficacy of our proposed method.Comment: 12 page
On using gait to enhance frontal face extraction
Visual surveillance finds increasing deployment formonitoring urban environments. Operators need to be able to determine identity from surveillance images and often use face recognition for this purpose. In surveillance environments, it is necessary to handle pose variation of the human head, low frame rate, and low resolution input images. We describe the first use of gait to enable face acquisition and recognition, by analysis of 3-D head motion and gait trajectory, with super-resolution analysis. We use region- and distance-based refinement of head pose estimation. We develop a direct mapping to relate the 2-D image with a 3-D model. In gait trajectory analysis, we model the looming effect so as to obtain the correct face region. Based on head position and the gait trajectory, we can reconstruct high-quality frontal face images which are demonstrated to be suitable for face recognition. The contributions of this research include the construction of a 3-D model for pose estimation from planar imagery and the first use of gait information to enhance the face extraction process allowing for deployment in surveillance scenario
- …