Pose Estimation using Local Structure-Specific Shape and Appearance Context
We address the problem of estimating the alignment pose between two models
using structure-specific local descriptors. Our descriptors are generated using
a combination of 2D image data and 3D contextual shape data, resulting in a set
of semi-local descriptors containing rich appearance and shape information for
both edge and texture structures. This is achieved by defining feature space
relations which describe the neighborhood of a descriptor. By quantitative
evaluations, we show that our descriptors provide high discriminative power
compared to state of the art approaches. In addition, we show how to utilize
this for the estimation of the alignment pose between two point sets. We
present experiments both in controlled and real-life scenarios to validate our
approach.
CSIFT Based Locality-constrained Linear Coding for Image Classification
Over the past decade, the SIFT descriptor has proven to be one of the most
robust local invariant feature descriptors and has been widely used in various
vision tasks. Most traditional image classification systems depend on the
luminance-based SIFT descriptors, which only analyze the gray level variations
of the images. Misclassification may happen since their color contents are
ignored. In this article, we concentrate on improving the performance of
existing image classification algorithms by adding color information. To
achieve this purpose, different kinds of colored SIFT descriptors are
introduced and implemented. Locality-constrained Linear Coding (LLC), a
state-of-the-art sparse coding technology, is employed to construct the image
classification system for the evaluation. Experiments are carried out
on several benchmarks. With the color SIFT enhancements, the proposed image
classification system obtains an approximately 3% improvement in classification
accuracy on the Caltech-101 dataset and an approximately 4% improvement in
classification accuracy on the Caltech-256 dataset.
Comment: 9 pages, 5 figures
3D Pose Estimation and 3D Model Retrieval for Objects in the Wild
We propose a scalable, efficient and accurate approach to retrieve 3D models
for objects in the wild. Our contribution is twofold. We first present a 3D
pose estimation approach for object categories which significantly outperforms
the state-of-the-art on Pascal3D+. Second, we use the estimated pose as a prior
to retrieve 3D models which accurately represent the geometry of objects in RGB
images. For this purpose, we render depth images from 3D models under our
predicted pose and match learned image descriptors of RGB images against those
of rendered depth images using a CNN-based multi-view metric learning approach.
In this way, we are the first to report quantitative results for 3D model
retrieval on Pascal3D+, where our method chooses the same models as human
annotators for 50% of the validation images on average. In addition, we show
that our method, which was trained purely on Pascal3D+, retrieves rich and
accurate 3D models from ShapeNet given RGB images of objects in the wild.
Comment: Accepted to Conference on Computer Vision and Pattern Recognition
(CVPR) 201
Robust Depth-based Person Re-identification
Person re-identification (re-id) aims to match people across non-overlapping
camera views. So far, RGB-based appearance has been widely used in most existing
works. However, when people appear under extreme illumination or change
clothes, RGB appearance-based re-id methods tend to fail. To overcome
this problem, we propose to exploit depth information to provide more invariant
body shape and skeleton information regardless of illumination and color
change. More specifically, we exploit a depth voxel covariance descriptor and
further propose a locally rotation-invariant depth shape descriptor called the
Eigen-depth feature to describe pedestrian body shape. We prove that the
distance between any two covariance matrices on the Riemannian manifold is
equivalent to the Euclidean distance between the corresponding Eigen-depth
features. Furthermore, we propose a kernelized implicit feature transfer scheme
to estimate the Eigen-depth feature implicitly from RGB images when depth
information is not available. We find that combining the estimated depth
features with RGB-based appearance features can sometimes help to better reduce
visual ambiguities of appearance features caused by illumination and similar
clothes. The effectiveness of our models was validated on publicly available
depth pedestrian datasets as compared to related methods for person
re-identification.
Comment: IEEE Transactions on Image Processing Early Access
U-CATCH: Using Color ATtribute of image patCHes in binary descriptors
In this study, we propose a simple yet very effective method for extracting
color information through a binary feature description framework. Our method
expands the dimension of binary comparisons into RGB and YCbCr spaces, showing
more than 100% matching improvement compared to non-color binary descriptors
for a wide range of hard-to-match cases. The proposed method is general and can
be applied to any binary descriptor to make it color sensitive. It is faster
than classical binary descriptors for RGB sampling due to the abandonment of
grayscale conversion and has almost identical complexity (insignificant
compared to smoothing operation) for YCbCr sampling.
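The idea of running binary comparisons per color channel can be illustrated with a minimal sketch. This is not the authors' implementation: the random sampling pattern, the patch size, and the names `sample_pairs` and `color_binary_descriptor` are assumptions made only for illustration, and no smoothing step is included.

```python
import numpy as np

def sample_pairs(patch_size, n_pairs, seed=0):
    """Draw a fixed set of pixel-location pairs (hypothetical sampling pattern)."""
    rng = np.random.default_rng(seed)
    # Shape (n_pairs, 2, 2): for each pair, two (row, col) locations.
    return rng.integers(0, patch_size, size=(n_pairs, 2, 2))

def color_binary_descriptor(patch, pairs):
    """Emit one bit per sampled pair and per color channel.

    patch: array of shape (H, W, 3), e.g. RGB or YCbCr values.
    Returns a flat uint8 vector of 0/1 values with 3 * n_pairs entries.
    """
    patch = np.asarray(patch)
    first = patch[pairs[:, 0, 0], pairs[:, 0, 1]]    # shape (n_pairs, 3)
    second = patch[pairs[:, 1, 0], pairs[:, 1, 1]]   # shape (n_pairs, 3)
    # Compare each channel independently, with no grayscale conversion.
    return (first < second).astype(np.uint8).ravel()

pairs = sample_pairs(patch_size=32, n_pairs=128)
rng = np.random.default_rng(1)
patch = rng.integers(0, 256, size=(32, 32, 3), dtype=np.uint8)
descriptor = color_binary_descriptor(patch, pairs)
```

With 128 pairs and three channels the sketch yields a 384-bit descriptor, which can be matched with the Hamming distance like any other binary descriptor.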
Human activity recognition from mobile inertial sensors using recurrence plots
Inertial sensors are present in most mobile devices nowadays and such devices
are used by people during most of their daily activities. In this paper, we
present an approach for human activity recognition based on inertial sensors by
employing recurrence plots (RP) and visual descriptors. The pipeline of the
proposed approach is the following: compute RPs from sensor data, compute
visual features from RPs and use them in a machine learning protocol. As RPs
generate texture visual patterns, we transform the problem of sensor data
classification to a problem of texture classification. Experiments for
classifying human activities based on accelerometer data showed that the
proposed approach obtains the highest accuracies, outperforming time- and
frequency-domain features directly extracted from sensor data. The best results
are obtained when using RGB RPs, in which each RGB channel corresponds to the
RP of an independent accelerometer axis.
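The first pipeline step, turning a window of three-axis accelerometer data into an RGB recurrence plot, can be sketched as follows. This is a minimal sketch under stated assumptions: it uses an unthresholded recurrence plot (pairwise absolute differences scaled to [0, 1]), and the function names and the synthetic window are hypothetical rather than taken from the paper.

```python
import numpy as np

def recurrence_plot(signal):
    """Unthresholded recurrence plot of a 1-D signal, scaled to [0, 1].

    Entry (i, j) is the absolute difference between samples i and j.
    """
    x = np.asarray(signal, dtype=np.float64)
    dist = np.abs(x[:, None] - x[None, :])
    peak = dist.max()
    return dist / peak if peak > 0 else dist

def rgb_recurrence_plot(acc_xyz):
    """Stack the recurrence plots of the x, y and z axes as R, G and B channels.

    acc_xyz: array of shape (n_samples, 3).
    Returns an array of shape (n_samples, n_samples, 3).
    """
    acc = np.asarray(acc_xyz, dtype=np.float64)
    return np.stack([recurrence_plot(acc[:, k]) for k in range(3)], axis=-1)

# Synthetic accelerometer window, used only to exercise the functions.
t = np.linspace(0.0, 4.0 * np.pi, 50)
window = np.column_stack([np.sin(t), np.cos(t), 0.1 * t])
image = rgb_recurrence_plot(window)
```

The resulting `image` can then be passed to any texture descriptor or image classifier, which is the reduction to texture classification that the abstract describes.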
Feature Fusion using Extended Jaccard Graph and Stochastic Gradient Descent for Robot
Robot vision is fundamental to human-robot interaction and complex robot
tasks. In this paper, we use a Kinect sensor and propose feature graph fusion
(FGF) for robot recognition. Our feature fusion uses RGB and depth
information from the Kinect to construct a fused feature. FGF uses multi-Jaccard
similarity to compute a robust graph and a word embedding method to
enhance the recognition results. We also collect the DUT RGB-D face dataset and a
benchmark dataset to evaluate the effectiveness and efficiency of our method.
The experimental results show that FGF is robust and effective on face and
object datasets in robot applications.
Comment: Assembly Automation
Direct Visual Odometry using Bit-Planes
Feature descriptors, such as SIFT and ORB, are well-known for their
robustness to illumination changes, which has made them popular for
feature-based VSLAM. However, in degraded imaging conditions such as low
light, low texture, blur and specular reflections, feature extraction is often
unreliable. In contrast, direct VSLAM methods which estimate the camera pose by
minimizing the photometric error using raw pixel intensities are often more
robust to low textured environments and blur. Nonetheless, at the core of
direct VSLAM is the reliance on a consistent photometric appearance across
images, otherwise known as the brightness constancy assumption. Unfortunately,
brightness constancy seldom holds in real world applications.
In this work, we overcome brightness constancy by incorporating feature
descriptors into a direct visual odometry framework. This combination results
in an efficient algorithm that combines the strengths of both feature-based
algorithms and direct methods. Namely, we achieve robustness to arbitrary
photometric variations while operating in low-textured and poorly lit
environments. Our approach utilizes an efficient binary descriptor, which we
call Bit-Planes, and we show how it can be used in the gradient-based optimization
required by direct methods. Moreover, we show that the squared Euclidean
distance between Bit-Planes is equivalent to the Hamming distance. Hence, the
descriptor may be used in least squares optimization without sacrificing its
photometric invariance. Finally, we present empirical results that demonstrate
the robustness of the approach in poorly lit underground environments.
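The stated equivalence between the squared Euclidean distance and the Hamming distance holds for any pair of 0/1 vectors, because each differing position contributes exactly 1 to both sums. A small numerical check, with hypothetical function names and random bit vectors standing in for real descriptors:

```python
import numpy as np

def hamming_distance(a, b):
    """Number of positions at which two binary vectors differ."""
    return int(np.count_nonzero(a != b))

def squared_euclidean(a, b):
    """Squared Euclidean distance, computed in floating point."""
    diff = a.astype(np.float64) - b.astype(np.float64)
    return float(diff @ diff)

rng = np.random.default_rng(0)
bits_a = rng.integers(0, 2, size=64, dtype=np.uint8)
bits_b = rng.integers(0, 2, size=64, dtype=np.uint8)

# Each differing bit gives (1 - 0)^2 = 1, so the two distances coincide.
same = hamming_distance(bits_a, bits_b) == squared_euclidean(bits_a, bits_b)
```

This is why a least-squares solver, which minimizes squared differences, can operate on such bits when each one is stored as its own 0/1 channel.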
3D Object Instance Recognition and Pose Estimation Using Triplet Loss with Dynamic Margin
In this paper, we address the problem of 3D object instance recognition and
pose estimation of localized objects in cluttered environments using
convolutional neural networks. Inspired by the descriptor learning approach of
Wohlhart et al., we propose a method that introduces a dynamic margin in the
manifold learning triplet loss function. Such a loss function is designed to
map images of different objects under different poses to a lower-dimensional,
similarity-preserving descriptor space on which efficient nearest neighbor
search algorithms can be applied. Introducing the dynamic margin allows for
faster training times and better accuracy of the resulting low-dimensional
manifolds. Furthermore, we contribute the following: adding in-plane rotations
(ignored by the baseline method) to the training, proposing new background
noise types that help to better mimic realistic scenarios and improve accuracy
with respect to clutter, adding surface normals as another powerful image
modality representing an object surface leading to better performance than
merely depth, and finally implementing an efficient online batch generation
that allows for better variability during the training phase. We perform an
exhaustive evaluation to demonstrate the effects of our contributions.
Additionally, we assess the performance of the algorithm on the large BigBIRD
dataset to demonstrate good scalability properties of the pipeline with respect
to the number of models.
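A triplet loss whose margin varies per triplet can be sketched as below. The abstract does not give the exact formulation, so everything here is an assumption for illustration: the standard hinge form of the loss, a margin equal to the angle between two unit-quaternion poses when both images show the same object, a hypothetical constant `other_object_margin` otherwise, and the function names themselves.

```python
import numpy as np

def dynamic_margin(q_anchor, q_negative, same_object, other_object_margin=4.0):
    """Assumed margin rule: pose angle for the same object, a constant otherwise.

    q_anchor, q_negative: unit quaternions describing the two poses.
    """
    if not same_object:
        return other_object_margin
    # Angle between two rotations given as unit quaternions.
    dot = np.clip(abs(float(np.dot(q_anchor, q_negative))), 0.0, 1.0)
    return 2.0 * float(np.arccos(dot))

def triplet_loss(anchor, positive, negative, margin):
    """Hinge-style triplet loss on squared Euclidean descriptor distances."""
    d_pos = float(np.sum((anchor - positive) ** 2))
    d_neg = float(np.sum((anchor - negative) ** 2))
    return max(0.0, d_pos - d_neg + margin)

# Toy 2-D descriptors: the positive is close to the anchor, the negative is not.
anchor = np.array([0.0, 0.0])
positive = np.array([1.0, 0.0])
negative = np.array([0.0, 2.0])

identity = np.array([1.0, 0.0, 0.0, 0.0])
half_turn = np.array([0.0, 1.0, 0.0, 0.0])   # 180 degree rotation about x
loss = triplet_loss(anchor, positive, negative,
                    dynamic_margin(identity, half_turn, same_object=True))
```

Under this assumed rule, negatives with a similar pose get a small margin and negatives from other objects get a large one, so the descriptor space is pushed to separate objects more strongly than nearby poses of the same object.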
Learning to Align Images using Weak Geometric Supervision
Image alignment tasks require accurate pixel correspondences, which are
usually recovered by matching local feature descriptors. Such descriptors are
often derived using supervised learning on existing datasets with ground truth
correspondences. However, the cost of creating such datasets is usually
prohibitive. In this paper, we propose a new approach to align two images
related by an unknown 2D homography where the local descriptor is learned from
scratch from the images and the homography is estimated simultaneously. Our key
insight is that a siamese convolutional neural network can be trained jointly
while iteratively updating the homography parameters by optimizing a single
loss function. Our method is currently weakly supervised because the input
images need to be roughly aligned.
We have used this method to align images of different modalities such as RGB
and near-infra-red (NIR) without using any prior labeled data. Images
automatically aligned by our method were then used to train descriptors that
generalize to new images. We also evaluated our method on RGB images. On the
HPatches benchmark, our method achieves comparable accuracy to deep local
descriptors that were trained offline in a supervised setting.
Comment: Accepted in 3DV 201