
    Simultaneous Feature Learning and Hash Coding with Deep Neural Networks

    Similarity-preserving hashing is a widely used method for nearest neighbour search in large-scale image retrieval tasks. In most existing hashing methods, an image is first encoded as a vector of hand-engineered visual features, followed by a separate projection or quantization step that generates binary codes. However, such visual feature vectors may not be optimally compatible with the coding process, and thus produce sub-optimal hash codes. In this paper, we propose a deep architecture for supervised hashing, in which images are mapped into binary codes via carefully designed deep neural networks. The pipeline of the proposed deep architecture consists of three building blocks: 1) a sub-network with a stack of convolution layers that produces effective intermediate image features; 2) a divide-and-encode module that divides the intermediate image features into multiple branches, each encoded into one hash bit; and 3) a triplet ranking loss designed to capture the constraint that one image is more similar to a second image than to a third one. Extensive evaluations on several benchmark image datasets show that the proposed simultaneous feature learning and hash coding pipeline brings substantial improvements over other state-of-the-art supervised or unsupervised hashing methods. Comment: This paper has been accepted to IEEE International Conference on Pattern Recognition and Computer Vision (CVPR), 201
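
The triplet ranking idea in the abstract above can be illustrated with a minimal sketch. This is not the paper's implementation: the hinge form, the squared Euclidean distance, the margin of 1.0, the 0.5 binarization threshold, and all function names are assumptions made for illustration.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length code vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def triplet_ranking_loss(query, similar, dissimilar, margin=1.0):
    """Hinge-style triplet loss (assumed form, margin is hypothetical).

    The loss is zero once `similar` is closer to `query` than `dissimilar`
    is, by at least `margin`; otherwise it grows with the violation.
    """
    gap = squared_distance(query, similar) - squared_distance(query, dissimilar)
    return max(0.0, margin + gap)


def binarize(code, threshold=0.5):
    """Turn relaxed codes in [0, 1] into hash bits (threshold is an assumption)."""
    return [1 if value >= threshold else 0 for value in code]


# Relaxed 4-bit codes for a query, a similar image, and a dissimilar image.
query = [0.9, 0.1, 0.8, 0.2]
similar = [0.8, 0.2, 0.9, 0.1]
dissimilar = [0.1, 0.9, 0.2, 0.8]

loss_good = triplet_ranking_loss(query, similar, dissimilar)  # ranking satisfied
loss_bad = triplet_ranking_loss(query, dissimilar, similar)   # ranking violated
bits = binarize(query)
```

With the correct ordering the loss vanishes, while swapping the similar and dissimilar images yields a positive penalty, which is the signal a network would be trained to reduce.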

    Feature fusion, feature selection and local n-ary patterns for object recognition and image classification

    University of Technology Sydney. Faculty of Engineering and Information Technology. Object recognition is one of the most fundamental topics in computer vision. In past years, it has attracted interest from both academics working in computer science and professionals working in the information technology (IT) industry. Its popularity is reflected in the sophisticated theories it has motivated in science and in its widespread applications in industry. Nowadays, with more powerful machine learning tools (both hardware and software) and a huge amount of information (data) readily available, higher expectations are placed on object recognition. In its early stage in the 1990s, the task of object recognition could be as simple as differentiating between objects of interest and non-objects of interest in a single still image. Currently, the task may also include the segmentation and labeling of different image regions (i.e., assigning each segmented image region a meaningful label based on the objects that appear in it), and then using computer programs to infer the scene of the overall image from those segmented regions. The original two-class classification problem is becoming more complex as it evolves toward a multi-class classification problem. In this thesis, contributions to object recognition are made in two aspects: improvements using feature fusion and improvements using feature selection. Three examples are given to illustrate three different feature fusion methods: descriptor concatenation (low-level fusion), confidence value escalation (mid-level fusion) and the coarse-to-fine framework (high-level fusion). Two examples are provided to demonstrate feature selection, namely optimal descriptor selection and improved classifier selection.
Feature extraction plays a key role in object recognition because it is the first and also the most important step. In the overall object recognition process, machine learning tools serve the purpose of finding distinctive features in the visual data. Given distinctive features, object recognition is readily achievable (e.g., a simple threshold function can be used to classify feature descriptors). The proposed Local N-ary Pattern (LNP) texture features contribute to both feature extraction and texture classification. The distinctive LNP feature generalizes the texture feature extraction process and improves texture classification. Concretely, the local binary pattern (LBP) is the special case of LNP with n = 2, and the texture spectrum is the special case of LNP with n = 3. The proposed LNP representation is reported to outperform the popular LBP and one of LBP's most successful extensions, the local ternary pattern (LTP), in texture classification.
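
The two special cases named above (n = 2 and n = 3) can be sketched as follows. The abstract does not give the general LNP definition, so this sketch only covers those two cases, using the commonly cited LBP and texture-spectrum encodings; the function name, the neighbour ordering, and the `tolerance` parameter are assumptions.

```python
def local_nary_code(patch, n, tolerance=0):
    """Encode a 3x3 patch as a base-n number built from its 8 neighbours.

    n == 2: binary pattern, digit 1 when the neighbour >= the centre pixel.
    n == 3: texture-spectrum style, digit 0 / 1 / 2 when the neighbour is
            below / equal to (within `tolerance`) / above the centre pixel.
    Only these two cases are covered; general n is not defined here.
    """
    if n not in (2, 3):
        raise ValueError("this sketch only covers n = 2 and n = 3")
    centre = patch[1][1]
    # Neighbours visited clockwise from the top-left corner (assumed order).
    coords = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
    code = 0
    for position, (row, col) in enumerate(coords):
        diff = patch[row][col] - centre
        if n == 2:
            digit = 1 if diff >= 0 else 0
        elif diff < -tolerance:
            digit = 0
        elif diff > tolerance:
            digit = 2
        else:
            digit = 1
        code += digit * n ** position
    return code


patch = [
    [6, 5, 2],
    [7, 6, 1],
    [9, 8, 7],
]
binary_code = local_nary_code(patch, n=2)   # LBP-style code, range 0..255
ternary_code = local_nary_code(patch, n=3)  # texture-spectrum code, range 0..6560
```

A texture descriptor would then be a histogram of such codes over all pixels of an image; moving from n = 2 to n = 3 enlarges the code alphabet from 256 to 6561 patterns.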

    Image Reconstruction from Bag-of-Visual-Words

    The objective of this work is to reconstruct an original image from its Bag-of-Visual-Words (BoVW) representation. Image reconstruction from features can be a means of identifying the characteristics of those features. Additionally, it enables us to generate novel images via features. Although BoVW is the de facto standard feature for image recognition and retrieval, successful image reconstruction from BoVW has not yet been reported. What complicates this task is that BoVW lacks spatial information about the visual words it contains. To estimate the original arrangement, we propose an evaluation function that incorporates the naturalness of local adjacency and the global position, together with a method to obtain the related parameters using an external image database. To evaluate the performance of our method, we reconstruct images of 101 kinds of objects. Additionally, we apply our method to analyze object classifiers and to generate novel images via BoVW.
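
The loss of spatial information mentioned above can be made concrete with a small sketch. It is an illustration only, not the paper's pipeline: the toy two-dimensional descriptors, the three-word codebook, and the function names are all hypothetical.

```python
def nearest_word(descriptor, codebook):
    """Index of the codebook entry closest to `descriptor` (squared Euclidean)."""
    return min(
        range(len(codebook)),
        key=lambda k: sum((d - w) ** 2 for d, w in zip(descriptor, codebook[k])),
    )


def bovw_histogram(keypoints, codebook):
    """Count visual-word occurrences; keypoint positions are deliberately ignored."""
    histogram = [0] * len(codebook)
    for _position, descriptor in keypoints:
        histogram[nearest_word(descriptor, codebook)] += 1
    return histogram


# Hypothetical codebook of three visual words.
codebook = [(0.0, 0.0), (1.0, 1.0), (0.0, 1.0)]

# Each keypoint is ((x, y) position, descriptor). The two images contain the
# same descriptors but at different positions.
image_a = [((10, 10), (0.1, 0.0)), ((50, 20), (0.9, 1.1)), ((30, 40), (0.1, 0.9))]
image_b = [((30, 40), (0.1, 0.0)), ((10, 10), (0.9, 1.1)), ((50, 20), (0.1, 0.9))]

hist_a = bovw_histogram(image_a, codebook)
hist_b = bovw_histogram(image_b, codebook)
```

Both images map to the same histogram even though their layouts differ, which is why reconstruction has to estimate the arrangement of visual words from other cues.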

    Learning Adaptive Discriminative Correlation Filters via Temporal Consistency Preserving Spatial Feature Selection for Robust Visual Tracking

    With efficient appearance learning models, the Discriminative Correlation Filter (DCF) has proven very successful in recent video object tracking benchmarks and competitions. However, the existing DCF paradigm suffers from two major issues, namely the spatial boundary effect and temporal filter degradation. To mitigate these issues, we propose a new DCF-based tracking method. The key innovations of the proposed method are adaptive spatial feature selection and temporal consistency constraints, with which the new tracker enables joint spatial-temporal filter learning in a lower-dimensional discriminative manifold. More specifically, we apply structured spatial sparsity constraints to multi-channel filters. Consequently, the process of learning spatial filters can be approximated by lasso regularisation. To encourage temporal consistency, the filter model is restricted to lie around its historical value and is updated locally to preserve the global structure in the manifold. Finally, a unified optimisation framework is proposed to jointly select temporal-consistency-preserving spatial features and learn discriminative filters with the augmented Lagrangian method. Qualitative and quantitative evaluations have been conducted on a number of well-known benchmarking datasets such as OTB2013, OTB50, OTB100, Temple-Colour, UAV123 and VOT2018. The experimental results demonstrate the superiority of the proposed method over state-of-the-art approaches.
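
One way to picture how a structured sparsity penalty selects spatial features is group soft-thresholding, the standard proximal step for a group-lasso penalty. The sketch below is an assumption about how such a step could look, not the paper's solver: the grouping of channels per spatial location, the value of `lam`, and the function name are hypothetical.

```python
import math


def group_soft_threshold(filters, lam):
    """Shrink multi-channel filter coefficients location by location.

    filters[c][i] is the coefficient of channel c at spatial location i.
    The cross-channel vector at each location is scaled by
    max(0, 1 - lam / norm), so locations with a small norm are zeroed in
    every channel at once (the spatial feature is deselected).
    """
    n_channels = len(filters)
    n_locations = len(filters[0])
    result = [[0.0] * n_locations for _ in range(n_channels)]
    for i in range(n_locations):
        norm = math.sqrt(sum(filters[c][i] ** 2 for c in range(n_channels)))
        if norm > lam:
            scale = 1.0 - lam / norm
            for c in range(n_channels):
                result[c][i] = scale * filters[c][i]
    return result


# Two channels, three spatial locations (toy values).
filters = [
    [3.0, 0.1, 0.0],
    [4.0, 0.2, 0.3],
]
selected = group_soft_threshold(filters, lam=1.0)
```

Only the first location survives the shrinkage here; the other two are set to zero across both channels, which mirrors the idea of keeping a small set of spatial features shared by all channels.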