6,253 research outputs found

    Embedding based on function approximation for large scale image search

    The objective of this paper is to design an embedding method that maps local features describing an image (e.g., SIFT) to a higher-dimensional representation useful for the image retrieval problem. First, motivated by the relationship between the linear approximation of a nonlinear function in a high-dimensional space and the state-of-the-art feature representation used in image retrieval, i.e., VLAD, we propose a new approach for the approximation. The embedded vectors resulting from the function approximation process are then aggregated to form a single representation for image retrieval. Second, to make the proposed embedding method applicable to large-scale problems, we further derive a fast version in which the embedded vectors can be computed efficiently, i.e., in closed form. We compare the proposed embedding methods with the state of the art in the context of image search under various settings: when the images are represented by medium-length vectors, short vectors, or binary vectors. The experimental results show that the proposed embedding methods outperform the existing state of the art on the standard public image retrieval benchmarks. Comment: Accepted to TPAMI 2017. The implementation and precomputed features of the proposed F-FAemb are released at the following link: http://tinyurl.com/F-FAem
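
For orientation, the VLAD baseline that the abstract refers to can be sketched as below. This is the standard VLAD aggregation (hard assignment to the nearest centroid, residual accumulation, L2 normalization), not the proposed F-FAemb embedding, whose details the abstract does not give. The function name `vlad` and the toy inputs are illustrative assumptions.

```python
import math

def vlad(descriptors, centroids):
    """Aggregate local descriptors into one L2-normalized VLAD vector.

    Standard VLAD only; this is NOT the embedding proposed in the paper.
    """
    k, d = len(centroids), len(centroids[0])
    acc = [[0.0] * d for _ in range(k)]
    for x in descriptors:
        # Hard-assign the descriptor to its nearest centroid (squared L2).
        j = min(
            range(k),
            key=lambda c: sum((xi - ci) ** 2 for xi, ci in zip(x, centroids[c])),
        )
        # Accumulate the residual x - c_j in that centroid's slot.
        for t in range(d):
            acc[j][t] += x[t] - centroids[j][t]
    # Concatenate the k residual sums and L2-normalize the result.
    flat = [v for row in acc for v in row]
    norm = math.sqrt(sum(v * v for v in flat))
    return [v / norm for v in flat] if norm > 0 else flat

# Toy data (hypothetical): two centroids, three 2-D descriptors.
centroids = [[0.0, 0.0], [10.0, 10.0]]
descriptors = [[1.0, 0.0], [0.0, 1.0], [11.0, 10.0]]
v = vlad(descriptors, centroids)
```

The output has length k * d, so the representation grows with the codebook size rather than with the number of local features in the image.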

    Supervised Hashing with End-to-End Binary Deep Neural Network

    Image hashing is a popular technique applied to large-scale content-based visual retrieval due to its compact and efficient binary codes. Our work proposes a new end-to-end deep network architecture for supervised hashing which directly learns binary codes from input images and maintains good properties over binary codes such as similarity preservation, independence, and balancing. Furthermore, we propose a new learning scheme that can cope with the binary constrained loss function. The proposed algorithm is not only scalable for learning over large-scale datasets but also outperforms state-of-the-art supervised hashing methods, as illustrated through extensive experiments on various image retrieval benchmarks. Comment: Accepted to IEEE ICIP 201
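
The code properties named in the abstract (balancing, independence) can be checked on a set of codes roughly as follows. This is a minimal sketch under common hashing conventions, assuming codes in {-1, +1}: a balanced bit has zero mean over the dataset, and independent bits give (1/n) BᵀB equal to the identity. The abstract does not state the paper's exact loss terms, so the helper names and the toy codes are assumptions.

```python
def bit_balance(codes):
    """Mean of each bit over the dataset; 0.0 means a perfectly balanced bit."""
    n = len(codes)
    return [sum(col) / n for col in zip(*codes)]

def bit_correlation(codes):
    """(1/n) * B^T B for codes in {-1, +1}; identity means uncorrelated bits."""
    n = len(codes)
    cols = list(zip(*codes))
    return [[sum(a * b for a, b in zip(ci, cj)) / n for cj in cols] for ci in cols]

def hamming(a, b):
    """Number of differing bits, the distance used at retrieval time."""
    return sum(1 for x, y in zip(a, b) if x != y)

# Toy codes (hypothetical): four samples, two bits each.
codes = [[1, 1], [1, -1], [-1, 1], [-1, -1]]
balance = bit_balance(codes)
corr = bit_correlation(codes)
```

For these toy codes both properties hold exactly; learned codes would typically satisfy them only approximately.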

    Selective Deep Convolutional Features for Image Retrieval

    Convolutional Neural Networks (CNNs) are a very powerful approach for extracting discriminative local descriptors for effective image search. Recent work adopts fine-tuning strategies to further improve the discriminative power of the descriptors. Taking a different approach, in this paper, we propose a novel framework to achieve competitive retrieval performance. Firstly, we propose various masking schemes, namely SIFT-mask, SUM-mask, and MAX-mask, to select a representative subset of local convolutional features and remove a large number of redundant features. We demonstrate that this can effectively address the burstiness issue and improve retrieval accuracy. Secondly, we propose to employ recent embedding and aggregating methods to further enhance feature discriminability. Extensive experiments demonstrate that our proposed framework achieves state-of-the-art retrieval accuracy. Comment: Accepted to ACM MM 201
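
To illustrate what selecting a subset of local convolutional features might look like, here is a hedged sketch of two masks. The abstract does not define the masking rules, so both are assumptions: `sum_mask` keeps locations whose channel-wise sum exceeds the median over all locations, and `max_mask` keeps locations that hold the maximum of at least one channel. SIFT-mask is omitted because it would need a keypoint detector.

```python
from statistics import median

def sum_mask(fmap):
    """fmap[y][x] is a list of K channel activations at location (y, x).

    Assumed rule: keep a location if its channel sum exceeds the median sum.
    """
    sums = [[sum(cell) for cell in row] for row in fmap]
    thresh = median(v for row in sums for v in row)
    return [[v > thresh for v in row] for row in sums]

def max_mask(fmap):
    """Assumed rule: keep a location if it is the argmax of some channel."""
    h, w, k = len(fmap), len(fmap[0]), len(fmap[0][0])
    mask = [[False] * w for _ in range(h)]
    for c in range(k):
        y, x = max(
            ((y, x) for y in range(h) for x in range(w)),
            key=lambda p: fmap[p[0]][p[1]][c],
        )
        mask[y][x] = True
    return mask

# Toy 2 x 2 feature map with K = 2 channels (hypothetical values).
fmap = [[[1, 0], [0, 5]],
        [[2, 2], [0, 0]]]
s_mask = sum_mask(fmap)
m_mask = max_mask(fmap)
```

Only the local features at locations marked `True` would be passed on to the embedding and aggregation stage.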

    Volumetric 3D Point Cloud Attribute Compression: Learned polynomial bilateral filter for prediction

    We extend a previous study on a 3D point cloud attribute compression scheme that uses a volumetric approach: given a target volumetric attribute function $f : \mathbb{R}^3 \mapsto \mathbb{R}$, we quantize and encode parameters $\theta$ that characterize $f$ at the encoder, for reconstruction $f_{\hat{\theta}}(\mathbf{x})$ at known 3D points $\mathbf{x}$ at the decoder. Specifically, parameters $\theta$ are quantized coefficients of B-spline basis vectors $\mathbf{\Phi}_l$ (for order $p \geq 2$) that span the function space $\mathcal{F}_l^{(p)}$ at a particular resolution $l$, which are coded from coarse to fine resolutions for scalability. In this work, we focus on the prediction of finer-grained coefficients given coarser-grained ones by learning parameters of a polynomial bilateral filter (PBF) from data. PBF is a pseudo-linear filter that is signal-dependent with a graph spectral interpretation common in the graph signal processing (GSP) field. We demonstrate PBF's predictive performance over a linear predictor inspired by MPEG standardization over a wide range of point cloud datasets.

    Learned Nonlinear Predictor for Critically Sampled 3D Point Cloud Attribute Compression

    We study 3D point cloud attribute compression via a volumetric approach: assuming point cloud geometry is known at both encoder and decoder, parameters $\theta$ of a continuous attribute function $f: \mathbb{R}^3 \mapsto \mathbb{R}$ are quantized to $\hat{\theta}$ and encoded, so that discrete samples $f_{\hat{\theta}}(\mathbf{x}_i)$ can be recovered at known 3D points $\mathbf{x}_i \in \mathbb{R}^3$ at the decoder. Specifically, we consider a nested sequence of function subspaces $\mathcal{F}^{(p)}_{l_0} \subseteq \cdots \subseteq \mathcal{F}^{(p)}_L$, where $\mathcal{F}_l^{(p)}$ is a family of functions spanned by B-spline basis functions of order $p$, $f_l^*$ is the projection of $f$ on $\mathcal{F}_l^{(p)}$ and encoded as low-pass coefficients $F_l^*$, and $g_l^*$ is the residual function in orthogonal subspace $\mathcal{G}_l^{(p)}$ (where $\mathcal{G}_l^{(p)} \oplus \mathcal{F}_l^{(p)} = \mathcal{F}_{l+1}^{(p)}$) and encoded as high-pass coefficients $G_l^*$. In this paper, to improve coding performance over [1], we study predicting $f_{l+1}^*$ at level $l+1$ given $f_l^*$ at level $l$ and encoding of $G_l^*$ for the $p=1$ case (RAHT($1$)). For the prediction, we formalize RAHT(1) linear prediction in MPEG-PCC in a theoretical framework, and propose a new nonlinear predictor using a polynomial of a bilateral filter. We derive equations to efficiently compute the critically sampled high-pass coefficients $G_l^*$ amenable to encoding. We optimize parameters in our resulting feed-forward network on a large training set of point clouds by minimizing a rate-distortion Lagrangian. Experimental results show that our improved framework outperformed the MPEG G-PCC predictor by $11$ to $12\%$ in bit rate reduction.
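
As background for the low-pass/high-pass split mentioned above, the sketch below shows the weighted two-point butterfly commonly used to describe the region-adaptive hierarchical transform (RAHT). How this maps onto the paper's RAHT(1) formulation and its learned predictor is not stated in the abstract, so treat the function names and the correspondence as assumptions. The predictor itself is not shown.

```python
import math

def raht_merge(a, wa, b, wb):
    """Merge two neighbouring coefficients a, b with weights wa, wb.

    Returns (low, high, merged_weight). The 2x2 transform is orthonormal,
    so the energy a^2 + b^2 is preserved in low^2 + high^2.
    """
    s = math.sqrt(wa + wb)
    ca, cb = math.sqrt(wa) / s, math.sqrt(wb) / s
    low = ca * a + cb * b
    high = -cb * a + ca * b
    return low, high, wa + wb

def raht_split(low, high, wa, wb):
    """Inverse of raht_merge: apply the transposed 2x2 matrix."""
    s = math.sqrt(wa + wb)
    ca, cb = math.sqrt(wa) / s, math.sqrt(wb) / s
    a = ca * low - cb * high
    b = cb * low + ca * high
    return a, b

# Toy values (hypothetical): attributes 3.0 and 5.0 with weights 1 and 3.
low, high, w = raht_merge(3.0, 1, 5.0, 3)
a, b = raht_split(low, high, 1, 3)
```

When the two inputs are equal and equally weighted, the high-pass output is zero, which is why good prediction of the finer level leaves little residual to encode.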