Embedding based on function approximation for large scale image search

Authors: Cheung Ngai-Man; Do Thanh-Toan
Publication date: 03/04/2017

The objective of this paper is to design an embedding method that maps local features describing an image (e.g., SIFT) to a higher-dimensional representation useful for the image retrieval problem. First, motivated by the relationship between the linear approximation of a nonlinear function in high-dimensional space and the state-of-the-art feature representation used in image retrieval, i.e., VLAD, we propose a new approach for the approximation. The embedded vectors resulting from the function approximation process are then aggregated to form a single representation for image retrieval. Second, to make the proposed embedding method applicable to large-scale problems, we further derive a fast version in which the embedded vectors can be computed efficiently, i.e., in closed form. We compare the proposed embedding methods with the state of the art in the context of image search under various settings: when the images are represented by medium-length vectors, short vectors, or binary vectors. The experimental results show that the proposed embedding methods outperform the existing state of the art on the standard public image retrieval benchmarks.

Comment: Accepted to TPAMI 2017. The implementation and precomputed features of the proposed F-FAemb are released at the following link: http://tinyurl.com/F-FAem
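For readers unfamiliar with VLAD, the baseline representation the abstract refers to, the following is a minimal sketch of standard VLAD aggregation in plain Python. It is not the FAemb or F-FAemb method from the paper. The function name `vlad`, the toy centroids, and the toy descriptors are all hypothetical and chosen only for illustration; a real pipeline would learn the centroids with k-means on SIFT descriptors.

```python
import math

def vlad(descriptors, centroids):
    """Aggregate local descriptors into one L2-normalized VLAD vector."""
    dim = len(centroids[0])
    # One accumulated residual vector per centroid.
    residual_sums = [[0.0] * dim for _ in centroids]
    for x in descriptors:
        # Hard-assign the descriptor to its nearest centroid
        # (squared Euclidean distance).
        k = min(
            range(len(centroids)),
            key=lambda j: sum((a - b) ** 2 for a, b in zip(x, centroids[j])),
        )
        # Accumulate the residual between descriptor and centroid.
        for d in range(dim):
            residual_sums[k][d] += x[d] - centroids[k][d]
    # Concatenate the per-centroid residual sums and L2-normalize.
    flat = [v for row in residual_sums for v in row]
    norm = math.sqrt(sum(v * v for v in flat))
    return [v / norm for v in flat] if norm > 0 else flat

# Hypothetical toy data: two 2-D centroids, three 2-D local descriptors.
centroids = [[0.0, 0.0], [10.0, 10.0]]
descriptors = [[1.0, 0.0], [0.0, 1.0], [11.0, 10.0]]
image_vector = vlad(descriptors, centroids)
```

The output has length (number of centroids) x (descriptor dimension), so each image is represented by one fixed-length vector regardless of how many local features it contains.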

arXiv.org e-Print Archive

Adelaide Research & Scholarship

Supervised Hashing with End-to-End Binary Deep Neural Network

Authors: Cheung Ngai-Man; Do Thanh-Toan; Tan Dang-Khoa Le
Publication date: 27/10/2018

Image hashing is a popular technique applied to large-scale content-based visual retrieval due to its compact and efficient binary codes. Our work proposes a new end-to-end deep network architecture for supervised hashing that directly learns binary codes from input images and maintains desirable properties of the binary codes, such as similarity preservation, independence, and balance. Furthermore, we propose a new learning scheme that can cope with the binary-constrained loss function. The proposed algorithm is not only scalable to learning over large-scale datasets but also outperforms state-of-the-art supervised hashing methods, as illustrated through extensive experiments on various image retrieval benchmarks.

Comment: Accepted to IEEE ICIP 201

arXiv.org e-Print Archive

Crossref

Selective Deep Convolutional Features for Image Retrieval

Authors: Cheung Ngai-Man; Do Thanh-Toan; Hoang Tuan; Tan Dang-Khoa Le
Publication date: 27/11/2017

Convolutional neural networks (CNNs) are a powerful approach for extracting discriminative local descriptors for effective image search. Recent work adopts fine-tuning strategies to further improve the discriminative power of the descriptors. Taking a different approach, in this paper we propose a novel framework to achieve competitive retrieval performance. First, we propose various masking schemes, namely SIFT-mask, SUM-mask, and MAX-mask, to select a representative subset of local convolutional features and remove a large number of redundant features. We demonstrate that this can effectively address the burstiness issue and improve retrieval accuracy. Second, we propose to employ recent embedding and aggregating methods to further enhance feature discriminability. Extensive experiments demonstrate that our proposed framework achieves state-of-the-art retrieval accuracy.

Comment: Accepted to ACM MM 201

arXiv.org e-Print Archive

Crossref

Volumetric 3D Point Cloud Attribute Compression: Learned polynomial bilateral filter for prediction

Authors: Cheung Gene; Chou Philip A.; Do Tam Thuc
Publication date: 22/11/2023

We extend a previous study on a 3D point cloud attribute compression scheme that uses a volumetric approach: given a target volumetric attribute function $f : \mathbb{R}^3 \mapsto \mathbb{R}$, we quantize and encode parameters $\theta$ that characterize $f$ at the encoder, for reconstruction $f_{\hat{\theta}}(\mathbf{x})$ at known 3D points $\mathbf{x}$ at the decoder. Specifically, parameters $\theta$ are quantized coefficients of B-spline basis vectors $\mathbf{\Phi}_l$ (for order $p \geq 2$) that span the function space $\mathcal{F}_l^{(p)}$ at a particular resolution $l$, which are coded from coarse to fine resolutions for scalability. In this work, we focus on the prediction of finer-grained coefficients given coarser-grained ones by learning parameters of a polynomial bilateral filter (PBF) from data. PBF is a pseudo-linear filter that is signal-dependent with a graph spectral interpretation common in the graph signal processing (GSP) field. We demonstrate PBF's predictive performance over a linear predictor inspired by MPEG standardization over a wide range of point cloud datasets.

arXiv.org e-Print Archive

Learned Nonlinear Predictor for Critically Sampled 3D Point Cloud Attribute Compression

Authors: Cheung Gene; Chou Philip A.; Do Tam Thuc
Publication date: 22/11/2023

We study 3D point cloud attribute compression via a volumetric approach: assuming point cloud geometry is known at both encoder and decoder, parameters $\theta$ of a continuous attribute function $f: \mathbb{R}^3 \mapsto \mathbb{R}$ are quantized to $\hat{\theta}$ and encoded, so that discrete samples $f_{\hat{\theta}}(\mathbf{x}_i)$ can be recovered at known 3D points $\mathbf{x}_i \in \mathbb{R}^3$ at the decoder. Specifically, we consider a nested sequence of function subspaces $\mathcal{F}^{(p)}_{l_0} \subseteq \cdots \subseteq \mathcal{F}^{(p)}_L$, where $\mathcal{F}_l^{(p)}$ is a family of functions spanned by B-spline basis functions of order $p$, $f_l^*$ is the projection of $f$ onto $\mathcal{F}_l^{(p)}$ and encoded as low-pass coefficients $F_l^*$, and $g_l^*$ is the residual function in the orthogonal subspace $\mathcal{G}_l^{(p)}$ (where $\mathcal{G}_l^{(p)} \oplus \mathcal{F}_l^{(p)} = \mathcal{F}_{l+1}^{(p)}$) and encoded as high-pass coefficients $G_l^*$. In this paper, to improve coding performance over [1], we study predicting $f_{l+1}^*$ at level $l+1$ given $f_l^*$ at level $l$ and encoding of $G_l^*$ for the $p=1$ case (RAHT(1)). For the prediction, we formalize RAHT(1) linear prediction in MPEG-PCC in a theoretical framework, and propose a new nonlinear predictor using a polynomial of bilateral filter. We derive equations to efficiently compute the critically sampled high-pass coefficients $G_l^*$ amenable to encoding. We optimize parameters in our resulting feed-forward network on a large training set of point clouds by minimizing a rate-distortion Lagrangian. Experimental results show that our improved framework outperformed the MPEG G-PCC predictor by $11$ to $12\%$ in bit rate reduction.
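The multiresolution structure described in this abstract can be written out explicitly. The following is a sketch that follows from the stated direct-sum relation, assuming $g_l^*$ denotes the difference between the projections at consecutive levels; the inner-product notation $\langle \cdot, \cdot \rangle$ is an assumption, as the abstract does not name it.

```latex
% Level-(l+1) projection splits into the coarser projection plus an orthogonal residual
f_{l+1}^* = f_l^* + g_l^*, \qquad
f_l^* \in \mathcal{F}_l^{(p)}, \quad
g_l^* \in \mathcal{G}_l^{(p)}, \quad
\langle f_l^*, g_l^* \rangle = 0.

% Applying the split repeatedly from the coarsest level l_0 up to the finest level L
f_L^* = f_{l_0}^* + \sum_{l=l_0}^{L-1} g_l^*.
```

Under this reading, a predictor of $f_{l+1}^*$ from $f_l^*$ only needs to account for the residual $g_l^*$, which is why a better prediction leaves less information to encode in the high-pass coefficients $G_l^*$.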

arXiv.org e-Print Archive

FAemb: a function approximation-based embedding method for image retrieval

Authors: Cheung Ngai-Man; Do Thanh-Toan; Tran Quang D
Publication date: 01/01/2015

University of Liverpool Repository

Crossref