3,377 research outputs found

    Deep Shape Matching

    Full text link
    We cast shape matching as metric learning with convolutional networks. We break the end-to-end process of image representation into two parts. First, well-established, efficient methods are chosen to turn the images into edge maps. Second, the network is trained with edge maps of landmark images, which are automatically obtained by a structure-from-motion pipeline. The learned representation is evaluated on a range of different tasks, providing improvements on challenging cases of domain generalization, generic sketch-based image retrieval, and its fine-grained counterpart. In contrast to other methods that learn a different model per task, object category, or domain, we use the same network throughout all our experiments, achieving state-of-the-art results on multiple benchmarks. Comment: ECCV 201
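    As a rough illustration of the two-stage pipeline the abstract describes, the sketch below (assuming PyTorch and OpenCV) uses Canny as a stand-in edge detector and a contrastive loss; the backbone, thresholds, and margin are illustrative choices, not the authors' exact configuration.

```python
import cv2
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision.models as models

def to_edge_map(image_path):
    """Stage 1: turn an image into an edge map with an off-the-shelf detector."""
    gray = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    edges = cv2.Canny(gray, 100, 200)  # Canny stands in for the paper's edge extractor
    return torch.from_numpy(edges).float().unsqueeze(0) / 255.0

class EdgeEmbedder(nn.Module):
    """Stage 2: a CNN trained on edge maps to produce an L2-normalized descriptor."""
    def __init__(self, dim=128):
        super().__init__()
        backbone = models.resnet18(weights=None)
        # adapt the first layer to single-channel edge-map input
        backbone.conv1 = nn.Conv2d(1, 64, 7, stride=2, padding=3, bias=False)
        backbone.fc = nn.Linear(backbone.fc.in_features, dim)
        self.net = backbone

    def forward(self, x):
        return F.normalize(self.net(x), dim=-1)

def contrastive_loss(a, b, label, margin=0.7):
    """label=1 for edge maps of the same landmark (SfM-matched), 0 otherwise."""
    d = (a - b).pow(2).sum(-1).sqrt()
    return (label * d.pow(2) + (1 - label) * F.relu(margin - d).pow(2)).mean()
```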

    Invariant properties of a locally salient dither pattern with a spatial-chromatic histogram

    Full text link
    Compacted Dither Pattern Code (CDPC) is a recently introduced feature that has been successful in visual depiction based on irregular shapes. The locally salient dither pattern feature is an attempt to extend CDPC to visual depiction based on both regular and irregular shapes. This paper presents an analysis of the rotational and scale invariance properties of the locally salient dither pattern feature with a two-dimensional spatial-chromatic histogram, which widens the applicability of the visual feature. Experiments were conducted to demonstrate the rotational and scale invariance of the feature by combining a linear Support Vector Machine (SVM) classifier with the new feature. The experimental results revealed that the locally salient dither pattern feature with the spatial-chromatic histogram is rotationally and scale invariant.
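    The CDPC construction itself is not specified in the abstract, so the sketch below only illustrates the surrounding evaluation setup: a two-dimensional spatial-chromatic histogram (here a hypothetical spatial-cell-by-hue-bin definition) classified with a linear SVM via scikit-learn.

```python
import numpy as np
from sklearn.svm import LinearSVC

def spatial_chromatic_histogram(hsv_image, grid=4, hue_bins=16):
    """Histogram over (spatial cell, hue bin); rows index image cells, columns hue bins."""
    h, w = hsv_image.shape[:2]
    hist = np.zeros((grid * grid, hue_bins))
    for gy in range(grid):
        for gx in range(grid):
            cell = hsv_image[gy*h//grid:(gy+1)*h//grid, gx*w//grid:(gx+1)*w//grid, 0]
            hist[gy*grid + gx], _ = np.histogram(cell, bins=hue_bins, range=(0, 180))
    return hist.ravel() / max(hist.sum(), 1)  # normalization helps scale invariance

# X: one flattened histogram per image, y: class labels
# clf = LinearSVC().fit(X_train, y_train); clf.score(X_test, y_test)
```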

    Deep Learning Representation using Autoencoder for 3D Shape Retrieval

    Full text link
    We study the problem of how to build a deep learning representation for 3D shapes. Deep learning has been shown to be very effective in a variety of visual applications, such as image classification and object detection. However, it has not been successfully applied to 3D shape recognition, because 3D shapes have complex structure in 3D space and only a limited number of 3D shapes are available for feature learning. To address these problems, we project 3D shapes into 2D space and use an autoencoder for feature learning on the 2D images. High-accuracy 3D shape retrieval performance is obtained by aggregating the features learned on the 2D images. In addition, we show that the proposed deep learning feature is complementary to conventional local image descriptors. By combining the global deep learning representation and the local descriptor representation, our method obtains state-of-the-art performance on 3D shape retrieval benchmarks. Comment: 6 pages, 7 figures, 2014ICSPA
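    A minimal sketch of the core idea, assuming PyTorch: 3D shapes are first rendered to small 2D projections (rendering code omitted), and a plain autoencoder learns a compact code from them; the layer sizes are illustrative.

```python
import torch
import torch.nn as nn

class ShapeAutoencoder(nn.Module):
    """Autoencoder over 2D projections (e.g. 32x32 depth or silhouette images)."""
    def __init__(self, side=32, code=64):
        super().__init__()
        d = side * side
        self.encoder = nn.Sequential(nn.Flatten(), nn.Linear(d, 256), nn.ReLU(),
                                     nn.Linear(256, code))
        self.decoder = nn.Sequential(nn.Linear(code, 256), nn.ReLU(),
                                     nn.Linear(256, d), nn.Sigmoid())

    def forward(self, x):
        z = self.encoder(x)                       # the learned code used for retrieval
        return self.decoder(z).view_as(x), z

# Retrieval: encode every projection of a shape, aggregate the codes
# (e.g. average them), and rank shapes by distance between aggregated codes.
```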

    Recent Advance in Content-based Image Retrieval: A Literature Survey

    Full text link
    The explosive growth and ubiquitous accessibility of visual data on the Web have led to a prosperity of research activity in image search and retrieval. By ignoring visual content as a ranking clue, methods that rely on text search techniques for visual retrieval may suffer from inconsistency between the text words and the visual content. Content-based image retrieval (CBIR), which uses a representation of the visual content to identify relevant images, has attracted sustained attention over the past two decades. The problem is challenging due to the intention gap and the semantic gap. Numerous techniques have been developed for content-based image retrieval in the last decade. The purpose of this paper is to categorize and evaluate the algorithms proposed during the period from 2003 to 2016. We conclude with several promising directions for future research. Comment: 22 page
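    To make the contrast with text-based search concrete, a toy content-based retrieval loop might look as follows; the color histogram here is only an illustrative stand-in for the many descriptors the survey covers.

```python
import numpy as np

def descriptor(image):
    """Global color histogram of an HxWx3 uint8 image, L2-normalized."""
    hist, _ = np.histogramdd(image.reshape(-1, 3), bins=(8, 8, 8),
                             range=((0, 256),) * 3)
    v = hist.ravel()
    return v / max(np.linalg.norm(v), 1e-9)

def retrieve(query, database):
    """database: list of (name, image); rank purely by visual content."""
    q = descriptor(query)
    scores = [(name, float(descriptor(img) @ q)) for name, img in database]
    return sorted(scores, key=lambda s: -s[1])  # cosine similarity, best first
```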

    Content-Based Image Retrieval Based on Late Fusion of Binary and Local Descriptors

    Full text link
    One of the challenges in Content-Based Image Retrieval (CBIR) is to reduce the semantic gap between low-level features and high-level semantic concepts. In CBIR, images are represented in a feature space, and retrieval performance depends on the type of feature representation selected. Late fusion, also known as visual word integration, is applied to enhance the performance of image retrieval. Recent advances in image retrieval have shifted the focus of research towards binary descriptors, which are reported to be computationally efficient. In this paper, we investigate the late fusion of Fast Retina Keypoint (FREAK) and Scale Invariant Feature Transform (SIFT) descriptors. This combination of a binary and a local descriptor is selected because, among binary descriptors, FREAK has shown good results in classification-based problems, while SIFT is robust to translation, scaling, rotation, and small distortions. The late fusion of FREAK and SIFT integrates the strengths of both feature descriptors for effective image retrieval. Experimental results and comparisons show that the proposed late fusion enhances image retrieval performance.
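    A minimal sketch of the descriptor-extraction side, assuming opencv-contrib-python; vocabulary construction and the exact fusion scheme are omitted and would follow the paper.

```python
import cv2
import numpy as np

img = cv2.imread("query.jpg", cv2.IMREAD_GRAYSCALE)

sift = cv2.SIFT_create()
keypoints, sift_desc = sift.detectAndCompute(img, None)   # 128-D float descriptors

freak = cv2.xfeatures2d.FREAK_create()
# FREAK computes binary descriptors on the same keypoints (it may drop some,
# so the two descriptor sets are not guaranteed to align row-for-row).
keypoints2, freak_desc = freak.compute(img, keypoints)

# Late fusion: quantize each descriptor set against its own visual vocabulary,
# build two bag-of-visual-words histograms, and concatenate them, e.g.:
# hist = np.concatenate([bow_hist(sift_desc, sift_vocab),
#                        bow_hist(freak_desc, freak_vocab)])
```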

    Diffusion framework for geometric and photometric data fusion in non-rigid shape analysis

    Full text link
    In this paper, we explore the use of the diffusion geometry framework for the fusion of geometric and photometric information in local and global shape descriptors. Our construction is based on the definition of a diffusion process on the shape manifold embedded into a high-dimensional space in which the embedding coordinates represent the photometric information. Experimental results show that such data fusion is useful in coping with challenges of shape analysis where purely geometric and purely photometric methods fail.
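    In outline, using standard diffusion-geometry notation (the joint embedding below and the balancing weight are assumed forms consistent with the abstract, not the paper's exact construction):

```latex
% Fused embedding: geometric coordinates x_g, photometric coordinates x_p,
% with an assumed weight \alpha balancing the two modalities.
\[
  x \;\mapsto\; \bigl(x_{\mathrm{g}},\; \alpha\, x_{\mathrm{p}}\bigr)
\]
% Diffusion is governed by the Laplace--Beltrami operator of this embedding,
% with eigenpairs (\lambda_i, \phi_i). The heat kernel and diffusion distance
\[
  h_t(x,y) \;=\; \sum_{i \ge 0} e^{-\lambda_i t}\,\phi_i(x)\,\phi_i(y),
  \qquad
  d_t(x,y)^2 \;=\; \sum_{i \ge 0} e^{-2\lambda_i t}\,\bigl(\phi_i(x)-\phi_i(y)\bigr)^2
\]
% then yield local descriptors (e.g. the heat kernel signature h_t(x,x)) and
% global ones that mix geometric and photometric information.
```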

    Learning Local Shape Descriptors from Part Correspondences With Multi-view Convolutional Networks

    Full text link
    We present a new local descriptor for 3D shapes, directly applicable to a wide range of shape analysis problems such as point correspondences, semantic segmentation, affordance prediction, and shape-to-scan matching. The descriptor is produced by a convolutional network that is trained to embed geometrically and semantically similar points close to one another in descriptor space. The network processes surface neighborhoods around points on a shape, captured at multiple scales by a succession of progressively zoomed-out views taken from carefully selected camera positions. We leverage two extremely large sources of data to train our network. First, since our network processes rendered views in the form of 2D images, we repurpose architectures pre-trained on massive image datasets. Second, we automatically generate a synthetic dense point correspondence dataset by non-rigid alignment of corresponding shape parts in a large collection of segmented 3D models. As a result of these design choices, our network effectively encodes multi-scale local context and fine-grained surface detail. Our network can be trained to produce either category-specific descriptors or more generic descriptors by learning from multiple shape categories. Once trained, the network extracts local descriptors for shapes at test time without requiring any part segmentation as input. Our method can produce effective local descriptors even for shapes whose category is unknown or different from those used during training. We demonstrate through several experiments that our learned local descriptors are more discriminative than state-of-the-art alternatives and are effective in a variety of shape analysis applications.
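    A minimal sketch of the architectural pattern described above, assuming PyTorch; the backbone, view count, and pooling choice are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision.models as models

class MultiViewPointDescriptor(nn.Module):
    """Embeds a surface point from K progressively zoomed-out rendered views."""
    def __init__(self, dim=128):
        super().__init__()
        cnn = models.resnet18(weights=None)   # stand-in for an image-pretrained backbone
        cnn.fc = nn.Identity()
        self.cnn = cnn
        self.head = nn.Linear(512, dim)

    def forward(self, views):                 # views: (B, K, 3, H, W)
        b, k = views.shape[:2]
        feats = self.cnn(views.flatten(0, 1)).view(b, k, -1)
        pooled = feats.max(dim=1).values      # pool across the multi-scale views
        return F.normalize(self.head(pooled), dim=-1)

# Training then pulls corresponding points (from the synthetic dense
# correspondences) together in descriptor space, e.g. with a contrastive
# or triplet loss.
```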

    GIFT: A Real-time and Scalable 3D Shape Search Engine

    Full text link
    Projective analysis is an important approach to 3D shape retrieval, since human visual perception of 3D shapes relies on various 2D observations from different viewpoints. Although multiple informative and discriminative views are utilized, most projection-based retrieval systems suffer from heavy computational cost and thus cannot satisfy the basic scalability requirement for search engines. In this paper, we present a real-time 3D shape search engine based on the projective images of 3D shapes. The real-time property of our search engine results from the following aspects: (1) efficient projection and view feature extraction using GPU acceleration; (2) the first inverted file, referred to as F-IF, is utilized to speed up the procedure of multi-view matching; (3) the second inverted file (S-IF), which captures the local distribution of 3D shapes in the feature manifold, is adopted for efficient context-based re-ranking. As a result, each query retrieval task can be finished within one second, including the necessary I/O overhead. We name the proposed 3D shape search engine, which combines GPU acceleration and the Inverted File Twice, GIFT. Besides its high efficiency, GIFT also significantly outperforms state-of-the-art methods in retrieval accuracy on various shape benchmarks and competitions. Comment: accepted by CVPR16, achieved the first place in Shrec2016 competition: Large-Scale 3D Shape Retrieval under the perturbed cas
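    The inverted-file idea that underlies both F-IF and S-IF can be sketched generically as follows (identifiers are illustrative, not from the paper's code):

```python
from collections import defaultdict

class InvertedFile:
    """Visual word -> shape-id postings; makes multi-view matching sublinear."""
    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, shape_id, words):
        """words: quantized view features of one 3D shape."""
        for w in words:
            self.postings[w].add(shape_id)

    def query(self, words):
        votes = defaultdict(int)
        for w in words:
            for shape_id in self.postings[w]:
                votes[shape_id] += 1          # shapes sharing words with the query
        return sorted(votes, key=votes.get, reverse=True)

# GIFT applies this structure twice: once over view features (F-IF), and once
# over the local neighborhood structure in feature space for re-ranking (S-IF).
```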

    Efficient Multimedia Similarity Measurement Using Similar Elements

    Full text link
    Online social networking techniques and large-scale multimedia systems are developing rapidly, which has not only brought great convenience to our daily lives but also led to large-scale multimedia data being generated, collected, and stored. This trend has placed higher requirements and greater challenges on massive multimedia data retrieval. In this paper, we investigate the problem of image similarity measurement, which is used in many applications. We first propose a definition of image similarity measurement and the related notions. Based on it, we present a novel basic method of similarity measurement named SMIN. To improve the performance of the calculation, we propose a novel indexing structure called the SMI Temp Index (SMII for short). In addition, we establish an off-line index of potentially similar visual words to solve the problem that the index cannot be reused. Experimental evaluations on two real image datasets demonstrate that our solution outperforms state-of-the-art methods. Comment: 17 pages. arXiv admin note: text overlap with arXiv:1808.0961
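    The abstract does not fully specify SMIN or SMII, so the snippet below is only a generic illustration of measuring similarity through shared "similar elements" (here, quantized local descriptors two images have in common), not the paper's method.

```python
def element_similarity(elements_a, elements_b):
    """Jaccard overlap of two images' element sets (quantized descriptors)."""
    a, b = set(elements_a), set(elements_b)
    return len(a & b) / max(len(a | b), 1)

# An index mapping each element to the images containing it then lets a
# query touch only images that share at least one element with it.
```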

    Discriminative Map Retrieval Using View-Dependent Map Descriptor

    Full text link
    Map retrieval, the problem of similarity search over a large collection of 2D pointset maps previously built by mobile robots, is crucial for autonomous navigation in indoor and outdoor environments. Bag-of-words (BoW) methods constitute a popular approach to map retrieval; however, these methods have extremely limited descriptive ability because they ignore the spatial layout information of the local features. The main contribution of this paper is an extension of the bag-of-words map retrieval method to enable the use of spatial information from local features. Our strategy is to explicitly model a unique viewpoint of an input local map; the pose of the local feature is defined with respect to this unique viewpoint, and can be viewed as an additional invariant feature for discriminative map retrieval. Specifically, we wish to determine a unique viewpoint that is invariant to moving objects, clutter, occlusions, and actual viewpoints. Hence, we perform scene parsing to analyze the scene structure, and consider the "center" of the scene structure to be the unique viewpoint. Our scene parsing is based on a Manhattan world grammar that imposes a quasi-Manhattan world constraint to enable the robust detection of a scene structure that is invariant to clutter and moving objects. Experimental results using the publicly available radish dataset validate the efficacy of the proposed approach. Comment: Technical Report, 8 pages, 9 figure
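    A minimal sketch of the key idea, treating 2D map features as (x, y, angle) tuples: feature poses are re-expressed relative to a scene-level reference viewpoint. The centroid below is only a placeholder for the paper's Manhattan-grammar scene parsing.

```python
import math

def reference_viewpoint(points):
    """Placeholder: the centroid stands in for the parsed scene-structure 'center'."""
    xs, ys = zip(*[(p[0], p[1]) for p in points])
    return sum(xs) / len(xs), sum(ys) / len(ys)

def relative_pose(feature, ref):
    """Express a local feature's pose w.r.t. the reference viewpoint."""
    x, y, theta = feature
    rx, ry = x - ref[0], y - ref[1]
    return math.hypot(rx, ry), math.atan2(ry, rx) - theta  # range, bearing offset

# Each map descriptor then pairs a visual word with its viewpoint-relative
# pose, making the bag-of-words index aware of spatial layout.
```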