
    Pose Estimation using Local Structure-Specific Shape and Appearance Context

    We address the problem of estimating the alignment pose between two models using structure-specific local descriptors. Our descriptors are generated using a combination of 2D image data and 3D contextual shape data, resulting in a set of semi-local descriptors containing rich appearance and shape information for both edge and texture structures. This is achieved by defining feature space relations which describe the neighborhood of a descriptor. Through quantitative evaluations, we show that our descriptors provide high discriminative power compared to state-of-the-art approaches. In addition, we show how to utilize this for the estimation of the alignment pose between two point sets. We present experiments in both controlled and real-life scenarios to validate our approach.
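    A minimal sketch of the final alignment step, assuming descriptor matching has already produced point correspondences: the closed-form Kabsch algorithm recovers the rigid pose between the two point sets (the paper's own estimation procedure may differ).

    ```python
    import numpy as np

    def kabsch_pose(src, dst):
        """Least-squares rigid transform (R, t) mapping src onto dst.

        src, dst: (N, 3) arrays of corresponding 3D points.
        """
        mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
        H = (src - mu_s).T @ (dst - mu_d)          # 3x3 cross-covariance
        U, _, Vt = np.linalg.svd(H)
        d = np.sign(np.linalg.det(Vt.T @ U.T))     # guard against reflections
        R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
        t = mu_d - R @ mu_s
        return R, t

    # Usage: recover a known pose from noiseless correspondences.
    rng = np.random.default_rng(0)
    src = rng.normal(size=(100, 3))
    R_true = np.linalg.qr(rng.normal(size=(3, 3)))[0]
    if np.linalg.det(R_true) < 0:                  # ensure a proper rotation
        R_true[:, 0] *= -1
    dst = src @ R_true.T + np.array([0.5, -1.0, 2.0])
    R, t = kabsch_pose(src, dst)
    print(np.allclose(R, R_true, atol=1e-8))       # -> True
    ```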

    CSIFT Based Locality-constrained Linear Coding for Image Classification

    In the past decade, the SIFT descriptor has emerged as one of the most robust local invariant feature descriptors and is widely used in various vision tasks. Most traditional image classification systems depend on luminance-based SIFT descriptors, which only analyze the gray-level variations of an image; misclassification may occur because color content is ignored. In this article, we concentrate on improving the performance of existing image classification algorithms by adding color information. To this end, different kinds of colored SIFT (CSIFT) descriptors are introduced and implemented. Locality-constrained Linear Coding (LLC), a state-of-the-art sparse coding technique, is employed to construct the image classification system for the evaluation. Experiments are carried out on several benchmarks. With the color SIFT enhancements, the proposed system obtains approximately 3% higher classification accuracy on the Caltech-101 dataset and approximately 4% higher accuracy on the Caltech-256 dataset.
    Comment: 9 pages, 5 figures
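    For concreteness, a sketch of the approximated LLC coding step from Wang et al.'s formulation, which this system applies to (color) SIFT descriptors; the codebook and descriptor below are random stand-ins.

    ```python
    import numpy as np

    def llc_code(x, B, k=5, beta=1e-4):
        """Locality-constrained linear coding of one descriptor.

        x: (d,) local descriptor; B: (M, d) codebook. Returns an (M,)
        sparse code supported on the k nearest codebook atoms.
        """
        d2 = ((B - x) ** 2).sum(axis=1)
        idx = np.argsort(d2)[:k]                 # locality: k nearest atoms
        z = B[idx] - x                           # shift atoms to origin at x
        C = z @ z.T                              # local covariance (k x k)
        C += beta * np.trace(C) * np.eye(k)      # regularize for stability
        w = np.linalg.solve(C, np.ones(k))
        w /= w.sum()                             # sum-to-one constraint
        code = np.zeros(len(B))
        code[idx] = w
        return code

    rng = np.random.default_rng(0)
    B = rng.normal(size=(1024, 128))             # e.g. a 1024-atom SIFT codebook
    x = rng.normal(size=128)
    c = llc_code(x, B)
    print((c != 0).sum(), round(c.sum(), 6))     # -> 5 nonzeros, sums to 1.0
    ```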

    3D Pose Estimation and 3D Model Retrieval for Objects in the Wild

    We propose a scalable, efficient and accurate approach to retrieve 3D models for objects in the wild. Our contribution is twofold. We first present a 3D pose estimation approach for object categories which significantly outperforms the state of the art on Pascal3D+. Second, we use the estimated pose as a prior to retrieve 3D models which accurately represent the geometry of objects in RGB images. For this purpose, we render depth images from 3D models under our predicted pose and match learned image descriptors of RGB images against those of rendered depth images using a CNN-based multi-view metric learning approach. In this way, we are the first to report quantitative results for 3D model retrieval on Pascal3D+, where our method chooses the same models as human annotators for 50% of the validation images on average. In addition, we show that our method, which was trained purely on Pascal3D+, retrieves rich and accurate 3D models from ShapeNet given RGB images of objects in the wild.
    Comment: Accepted to the Conference on Computer Vision and Pattern Recognition (CVPR) 201
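    A minimal sketch of the retrieval step only, assuming descriptors have already been extracted: each candidate 3D model is scored by the best cosine similarity between the query RGB descriptor and the descriptors of its rendered depth views. The names, dimensions and data below are illustrative stand-ins.

    ```python
    import numpy as np

    def retrieve_model(rgb_desc, rendered_descs):
        """rendered_descs: dict model_id -> (n_views, d) rendering descriptors."""
        q = rgb_desc / np.linalg.norm(rgb_desc)
        best_id, best_sim = None, -np.inf
        for model_id, D in rendered_descs.items():
            D = D / np.linalg.norm(D, axis=1, keepdims=True)
            sim = (D @ q).max()                  # best-matching view of this model
            if sim > best_sim:
                best_id, best_sim = model_id, sim
        return best_id, best_sim

    rng = np.random.default_rng(0)
    db = {f"model_{i}": rng.normal(size=(12, 64)) for i in range(5)}
    query = db["model_3"][4] + 0.05 * rng.normal(size=64)   # near one rendering
    print(retrieve_model(query, db)[0])                     # -> model_3
    ```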

    Robust Depth-based Person Re-identification

    Person re-identification (re-id) aims to match people across non-overlapping camera views. So far, RGB-based appearance has been widely used in most existing works. However, when people appear under extreme illumination or change their clothes, RGB appearance-based re-id methods tend to fail. To overcome this problem, we propose to exploit depth information, which provides body shape and skeleton information that is invariant to illumination and color change. More specifically, we exploit a depth voxel covariance descriptor and further propose a locally rotation-invariant depth shape descriptor, the Eigen-depth feature, to describe pedestrian body shape. We prove that the distance between any two covariance matrices on the Riemannian manifold is equivalent to the Euclidean distance between the corresponding Eigen-depth features. Furthermore, we propose a kernelized implicit feature transfer scheme to estimate the Eigen-depth feature from an RGB image when depth information is not available. We find that combining the estimated depth features with RGB-based appearance features can sometimes help reduce the visual ambiguities of appearance features caused by illumination and similar clothes. The effectiveness of our models was validated on publicly available depth pedestrian datasets in comparison with related person re-identification methods.
    Comment: IEEE Transactions on Image Processing, Early Access
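    The equivalence claim has the flavor of the log-Euclidean framework, where the manifold distance between two covariance matrices equals the plain Euclidean distance between their vectorized matrix logarithms; the sketch below illustrates that identity (the paper's actual Eigen-depth construction may differ in detail).

    ```python
    import numpy as np
    from scipy.linalg import logm

    def log_feature(C):
        """Vectorize the matrix logarithm of an SPD covariance matrix."""
        return np.real(logm(C)).ravel()

    def rand_spd(d, rng):
        A = rng.normal(size=(d, d))
        return A @ A.T + d * np.eye(d)           # symmetric positive definite

    rng = np.random.default_rng(0)
    C1, C2 = rand_spd(4, rng), rand_spd(4, rng)
    d_manifold = np.linalg.norm(np.real(logm(C1)) - np.real(logm(C2)), ord="fro")
    d_euclid = np.linalg.norm(log_feature(C1) - log_feature(C2))
    print(np.isclose(d_manifold, d_euclid))      # -> True
    ```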

    U-CATCH: Using Color ATtribute of image patCHes in binary descriptors

    In this study, we propose a simple yet very effective method for extracting color information within a binary feature description framework. Our method expands the dimension of the binary comparisons into the RGB and YCbCr spaces, yielding more than 100% matching improvement over non-color binary descriptors for a wide range of hard-to-match cases. The proposed method is general and can be applied to any binary descriptor to make it color sensitive. It is faster than classical binary descriptors for RGB sampling, since the grayscale conversion is no longer needed, and has almost identical complexity (insignificant compared to the smoothing operation) for YCbCr sampling.
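    A minimal sketch of the core idea rather than the exact U-CATCH layout: a BRIEF-style binary test compares smoothed intensities at sampled pixel pairs, and color sensitivity comes from repeating each test in every channel of the chosen color space.

    ```python
    import numpy as np
    from scipy.ndimage import gaussian_filter

    def color_binary_descriptor(img, pairs, sigma=2.0):
        """img: (H, W, 3) array (RGB or YCbCr); pairs: (N, 2, 2) pixel pairs.

        Returns an (N * 3,) binary descriptor: one bit per test per channel.
        """
        smoothed = gaussian_filter(img.astype(float), sigma=(sigma, sigma, 0))
        bits = []
        for (y1, x1), (y2, x2) in pairs:
            bits.extend(smoothed[y1, x1] < smoothed[y2, x2])   # 3 channel tests
        return np.array(bits, dtype=np.uint8)

    rng = np.random.default_rng(0)
    patch = rng.integers(0, 256, size=(32, 32, 3))
    pairs = rng.integers(0, 32, size=(256, 2, 2))
    desc = color_binary_descriptor(patch, pairs)
    print(desc.shape)                                          # -> (768,)
    ```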

    Human activity recognition from mobile inertial sensors using recurrence plots

    Inertial sensors are present in most mobile devices nowadays, and such devices are used by people during most of their daily activities. In this paper, we present an approach for human activity recognition based on inertial sensors, employing recurrence plots (RPs) and visual descriptors. The pipeline of the proposed approach is the following: compute RPs from sensor data, compute visual features from the RPs, and use them in a machine learning protocol. As RPs generate texture-like visual patterns, we transform the problem of sensor data classification into a problem of texture classification. Experiments on classifying human activities from accelerometer data show that the proposed approach obtains the highest accuracies, outperforming time- and frequency-domain features extracted directly from the sensor data. The best results are obtained when using RGB RPs, in which each RGB channel corresponds to the RP of an independent accelerometer axis.
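    A minimal sketch of the first pipeline stage under common recurrence-plot conventions (the paper's exact thresholding and embedding choices may differ): each accelerometer axis yields one binary RP, and the three RPs are stacked as the R, G and B channels of one image.

    ```python
    import numpy as np

    def recurrence_plot(x, eps):
        """x: (T,) 1-D signal; returns (T, T) binary recurrence matrix."""
        D = np.abs(x[:, None] - x[None, :])      # pairwise distances
        return (D <= eps).astype(np.uint8)

    def rgb_recurrence_plot(acc, eps):
        """acc: (T, 3) accelerometer samples; returns (T, T, 3) RGB image."""
        return np.stack([recurrence_plot(acc[:, i], eps) for i in range(3)],
                        axis=-1)

    rng = np.random.default_rng(0)
    t = np.linspace(0, 8 * np.pi, 128)
    acc = np.stack([np.sin(t), np.cos(t), np.sin(2 * t)], axis=1)
    acc += 0.05 * rng.normal(size=acc.shape)     # sensor-like noise
    rp = rgb_recurrence_plot(acc, eps=0.2)
    print(rp.shape)                              # -> (128, 128, 3)
    ```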

    Feature Fusion using Extended Jaccard Graph and Stochastic Gradient Descent for Robot

    Robot vision is fundamental to human-robot interaction and complex robot tasks. In this paper, we use the Kinect and propose a feature graph fusion (FGF) method for robot recognition. Our feature fusion utilizes RGB and depth information from the Kinect to construct a fused feature. FGF involves a multi-Jaccard similarity to compute a robust graph and utilizes a word embedding method to enhance the recognition results. We also collect a DUT RGB-D face dataset and a benchmark dataset to evaluate the effectiveness and efficiency of our method. The experimental results illustrate that FGF is robust and effective on face and object datasets in robot applications.
    Comment: Assembly Automation
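    One plausible reading of the graph-construction step, sketched below with assumed details (the abstract does not fully specify the extended multi-Jaccard construction): edge weights measure the Jaccard overlap between samples' k-nearest-neighbor sets, which is more robust than raw pairwise distances.

    ```python
    import numpy as np

    def jaccard_knn_graph(X, k=5):
        """X: (N, d) fused RGB-D features; returns (N, N) Jaccard edge weights."""
        D = np.linalg.norm(X[:, None] - X[None, :], axis=-1)
        np.fill_diagonal(D, np.inf)              # exclude self-matches
        knn = [set(np.argsort(row)[:k]) for row in D]
        N = len(X)
        W = np.zeros((N, N))
        for i in range(N):
            for j in range(N):
                W[i, j] = len(knn[i] & knn[j]) / len(knn[i] | knn[j])
        return W

    rng = np.random.default_rng(0)
    X = rng.normal(size=(20, 16))
    print(jaccard_knn_graph(X).shape)            # -> (20, 20)
    ```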

    Direct Visual Odometry using Bit-Planes

    Feature descriptors such as SIFT and ORB are well known for their robustness to illumination changes, which has made them popular for feature-based VSLAM. However, in degraded imaging conditions such as low light, low texture, blur and specular reflections, feature extraction is often unreliable. In contrast, direct VSLAM methods, which estimate the camera pose by minimizing the photometric error using raw pixel intensities, are often more robust in low-textured environments and under blur. Nonetheless, at the core of direct VSLAM is the reliance on a consistent photometric appearance across images, otherwise known as the brightness constancy assumption. Unfortunately, brightness constancy seldom holds in real-world applications. In this work, we overcome brightness constancy by incorporating feature descriptors into a direct visual odometry framework. This combination results in an efficient algorithm that combines the strengths of both feature-based and direct methods: we achieve robustness to arbitrary photometric variations while operating in low-textured and poorly lit environments. Our approach utilizes an efficient binary descriptor, which we call Bit-Planes, and we show how it can be used in the gradient-based optimization required by direct methods. Moreover, we show that the squared Euclidean distance between Bit-Planes descriptors is equivalent to the Hamming distance, so the descriptor may be used in least-squares optimization without sacrificing its photometric invariance. Finally, we present empirical results that demonstrate the robustness of the approach in poorly lit underground environments.
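    The claimed identity is easy to verify: for vectors with entries in {0, 1}, each squared coordinate difference equals its absolute difference, so the squared Euclidean distance is exactly the Hamming distance.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    a = rng.integers(0, 2, size=1000).astype(float)
    b = rng.integers(0, 2, size=1000).astype(float)

    sq_euclidean = ((a - b) ** 2).sum()          # least-squares-friendly form
    hamming = (a != b).sum()                     # bit-count form
    print(sq_euclidean == hamming)               # -> True
    ```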

    3D Object Instance Recognition and Pose Estimation Using Triplet Loss with Dynamic Margin

    In this paper, we address the problem of 3D object instance recognition and pose estimation for localized objects in cluttered environments using convolutional neural networks. Inspired by the descriptor learning approach of Wohlhart et al., we propose a method that introduces a dynamic margin into the manifold-learning triplet loss function. Such a loss function is designed to map images of different objects under different poses to a lower-dimensional, similarity-preserving descriptor space in which efficient nearest neighbor search algorithms can be applied. Introducing the dynamic margin allows for faster training times and better accuracy of the resulting low-dimensional manifolds. Furthermore, we contribute the following: adding in-plane rotations (ignored by the baseline method) to the training; proposing new background noise types that better mimic realistic scenarios and improve accuracy with respect to clutter; adding surface normals as another powerful image modality representing the object surface, leading to better performance than depth alone; and implementing an efficient online batch generation scheme that allows for better variability during the training phase. We perform an exhaustive evaluation to demonstrate the effects of our contributions. Additionally, we assess the performance of the algorithm on the large BigBIRD dataset to demonstrate the good scalability of the pipeline with respect to the number of models.
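    A minimal sketch of a triplet loss with a triplet-dependent margin, in the spirit of the abstract; the specific margin rule (pose-angle gap for same-object negatives, a larger constant for different objects) and all values below are assumptions, not the paper's exact formulation.

    ```python
    import numpy as np

    def dynamic_margin_triplet_loss(f_a, f_p, f_n, same_object, pose_angle_gap,
                                    margin_other=1.0):
        """f_a, f_p, f_n: (d,) descriptors of anchor, positive and negative.

        The margin grows with the pose gap for same-object negatives and is a
        larger constant when the negative shows a different object.
        """
        m = pose_angle_gap if same_object else margin_other
        d_pos = ((f_a - f_p) ** 2).sum()
        d_neg = ((f_a - f_n) ** 2).sum()
        return max(0.0, d_pos - d_neg + m)       # hinge on the margin

    rng = np.random.default_rng(0)
    f_a, f_p, f_n = rng.normal(size=(3, 32))
    print(dynamic_margin_triplet_loss(f_a, f_p, f_n,
                                      same_object=True, pose_angle_gap=0.3))
    ```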

    Learning to Align Images using Weak Geometric Supervision

    Image alignment tasks require accurate pixel correspondences, which are usually recovered by matching local feature descriptors. Such descriptors are often derived using supervised learning on existing datasets with ground-truth correspondences. However, the cost of creating such datasets is usually prohibitive. In this paper, we propose a new approach to align two images related by an unknown 2D homography in which the local descriptor is learned from scratch from the images themselves and the homography is estimated simultaneously. Our key insight is that a siamese convolutional neural network can be trained jointly while iteratively updating the homography parameters by optimizing a single loss function. Our method is weakly supervised in that the input images need to be roughly aligned. We have used this method to align images of different modalities, such as RGB and near-infrared (NIR), without using any prior labeled data. Images automatically aligned by our method were then used to train descriptors that generalize to new images. We also evaluated our method on RGB images: on the HPatches benchmark, our method achieves accuracy comparable to deep local descriptors that were trained offline in a supervised setting.
    Comment: Accepted to 3DV 201
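    For the geometric half of the problem, a sketch of the classical DLT baseline that estimates a homography from point correspondences; the paper instead learns descriptors and refines the homography jointly, so this stands in only for the "estimate H" step.

    ```python
    import numpy as np

    def dlt_homography(src, dst):
        """src, dst: (N, 2) corresponding points, N >= 4; returns 3x3 H."""
        A = []
        for (x, y), (u, v) in zip(src, dst):
            A.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
            A.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
        _, _, Vt = np.linalg.svd(np.array(A))    # null vector = flattened H
        H = Vt[-1].reshape(3, 3)
        return H / H[2, 2]

    # Usage: recover a known homography from exact correspondences.
    H_true = np.array([[1.2, 0.1, 5.0], [-0.05, 0.9, -3.0], [1e-4, 2e-4, 1.0]])
    rng = np.random.default_rng(0)
    src = rng.uniform(0, 100, size=(8, 2))
    pts = np.c_[src, np.ones(8)] @ H_true.T      # project through H_true
    dst = pts[:, :2] / pts[:, 2:]
    H = dlt_homography(src, dst)
    print(np.allclose(H, H_true, atol=1e-6))     # -> True
    ```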