Search CORE

Fast embedding for image classification & retrieval and its application to the hostel industry

Author: Ammatmanee Chanattra
Publication venue: Brunel University London
Publication date: 01/01/2022

This thesis was submitted for the award of Doctor of Philosophy and was awarded by Brunel University London. Content-based image classification and retrieval are the automatic processes of taking an unseen input image and extracting features that represent it. For the classification task, this mathematically measured input is then categorized according to criteria established on the server, and the resulting category is shown as the output. For the retrieval task, on the other hand, the features extracted from an unseen query image are sent to the server, which searches for the images most visually similar to the query and returns them as the result. Although image features can be represented by classical descriptors, artificial-intelligence-based features, specifically those from Convolutional Neural Networks (CNNs), have become powerful tools in the field. Nonetheless, high-dimensional CNN features remain a challenge, particularly for applications on mobile or Internet of Things devices. Therefore, this thesis explores and proposes several fast embeddings to overcome the constraints of low memory, bandwidth, and power. Furthermore, the first hostel image database is created with three datasets: the hostel image dataset, containing 13,908 interior and exterior images of hostels across the world, and the Hostels-900 and Hostels-2K datasets, containing 972 and 2,380 images, respectively, of 20 London hostel buildings. The results demonstrate that the proposed fast embeddings, such as the application of the GHM-Rand operator, the GHM-Fix operator, and binary feature vectors, outperform or are competitive with state-of-the-art methods while using far fewer computational resources.
Additionally, the findings from a ten-year literature review of content-based image retrieval (CBIR) studies in the tourism industry depict the relevant research activity of the past decade, which benefits not only the hostel industry and the tourism sector but also the computer science and engineering research communities, given the potential real-life applications of existing and developing technologies in the field.
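The abstract lists binary feature vectors among the proposed fast embeddings. The following is a minimal sketch of that general idea, not the thesis's own method: real-valued descriptors (random vectors stand in for CNN features here) are thresholded at zero, packed into bits, and compared by Hamming distance. The zero threshold, the 512-dimensional features, and the helper names `binarize`, `hamming_distances`, and `retrieve` are hypothetical choices for illustration.

```python
import numpy as np

def binarize(features: np.ndarray) -> np.ndarray:
    """Map real-valued feature rows to packed binary codes.

    Assumption (hypothetical): each dimension is thresholded at zero.
    The thesis may binarize differently.
    """
    bits = (features > 0).astype(np.uint8)
    # 1 bit per dimension instead of 32 bits for float32 storage.
    return np.packbits(bits, axis=1)

def hamming_distances(query_code: np.ndarray, db_codes: np.ndarray) -> np.ndarray:
    """Hamming distance between one packed code and every row of db_codes."""
    xor = np.bitwise_xor(db_codes, query_code)   # differing bits set to 1
    return np.unpackbits(xor, axis=1).sum(axis=1)

def retrieve(query_feat: np.ndarray, db_codes: np.ndarray, k: int = 5) -> np.ndarray:
    """Indices of the k database codes closest to the query in Hamming space."""
    q_code = binarize(query_feat[None, :])[0]
    dists = hamming_distances(q_code, db_codes)
    return np.argsort(dists, kind="stable")[:k]

# Usage with hypothetical stand-in descriptors.
rng = np.random.default_rng(0)
db_feats = rng.standard_normal((100, 512))
db_codes = binarize(db_feats)            # shape (100, 64): 512 bits per image
query = db_feats[42] + 0.01 * rng.standard_normal(512)
top = retrieve(query, db_codes, k=3)
```

Packing one bit per dimension cuts storage 32-fold relative to `float32`, and the distance reduces to XOR plus a bit count, which is the kind of saving that matters on memory- and power-limited devices.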

Brunel University Research Archive

Hashing for Similarity Search: A Survey

Authors: Ji Jianqiu, Shen Heng Tao, Song Jingkuan, Wang Jingdong
Publication date: 13/08/2014

Similarity search (nearest neighbor search) is the problem of finding, in a large database, the data items whose distances to a query item are the smallest. Various methods have been developed to address this problem, and recently much effort has been devoted to approximate search. In this paper, we present a survey of one of the main solutions, hashing, which has been widely studied since the pioneering work on locality sensitive hashing. We divide the hashing algorithms into two main categories: locality sensitive hashing, which designs hash functions without exploring the data distribution, and learning to hash, which learns hash functions according to the data distribution. We review them from various aspects, including hash function design, distance measure, and search scheme in the hash coding space.
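As a concrete instance of the first category (hash functions designed without looking at the data), the sketch below implements random-hyperplane LSH for cosine similarity, a well-known LSH family chosen here only for illustration; the survey covers many others. The class name `HyperplaneLSH` and the parameters `n_bits` and `n_tables` are hypothetical.

```python
import numpy as np

class HyperplaneLSH:
    """Random-hyperplane LSH for angular (cosine) similarity.

    The hyperplanes are drawn at random, independent of the data, which is
    what distinguishes LSH from learning to hash.
    """

    def __init__(self, dim: int, n_bits: int = 8, n_tables: int = 4, seed: int = 0):
        rng = np.random.default_rng(seed)
        # One (n_bits x dim) matrix of random hyperplane normals per table.
        self.planes = rng.standard_normal((n_tables, n_bits, dim))
        self.tables = [dict() for _ in range(n_tables)]
        self.data = None

    def _keys(self, x: np.ndarray) -> list:
        # Sign pattern of the projections gives one bucket key per table.
        bits = np.einsum("tbd,d->tb", self.planes, x) > 0
        return [row.tobytes() for row in bits]

    def index(self, data: np.ndarray) -> None:
        self.data = np.asarray(data, dtype=float)
        for i, x in enumerate(self.data):
            for table, key in zip(self.tables, self._keys(x)):
                table.setdefault(key, []).append(i)

    def query(self, q: np.ndarray, k: int = 1) -> list:
        # Candidates: items colliding with the query in at least one table.
        cand = set()
        for table, key in zip(self.tables, self._keys(q)):
            cand.update(table.get(key, []))
        if not cand:
            return []
        idx = np.fromiter(cand, dtype=int)
        # Re-rank the (small) candidate set by exact cosine similarity.
        c = self.data[idx]
        sims = c @ q / (np.linalg.norm(c, axis=1) * np.linalg.norm(q))
        return idx[np.argsort(-sims)[:k]].tolist()

# Usage on hypothetical random data.
rng = np.random.default_rng(0)
data = rng.standard_normal((1000, 64))
lsh = HyperplaneLSH(dim=64, n_bits=8, n_tables=4)
lsh.index(data)
result = lsh.query(data[7], k=1)
```

More bits per table make buckets more selective; more tables raise the chance that a true neighbour collides with the query in at least one of them. A learning-to-hash method would instead fit the projections to the data distribution.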

arXiv.org e-Print Archive

CiteSeerX

Small Width, Low Distortions: Quantized Random Embeddings of Low-complexity Sets

Author: Jacques Laurent
Publication date: 14/11/2016

Under which conditions and with which distortions can we preserve the pairwise distances of low-complexity vectors, e.g., for structured sets such as the set of sparse vectors or the set of low-rank matrices, when these are mapped into a finite set of vectors? This work addresses this general question through the specific use of a quantized and dithered random linear mapping which combines, in the following order, a sub-Gaussian random projection in $\mathbb R^M$ of vectors in $\mathbb R^N$, a random translation, or "dither", of the projected vectors, and a uniform scalar quantizer of resolution $\delta > 0$ applied componentwise. Thanks to this quantized mapping, we are first able to show that, with high probability, an embedding of a bounded set $\mathcal K \subset \mathbb R^N$ in $\delta \mathbb Z^M$ can be achieved when distances in the quantized and in the original domains are measured with the $\ell_1$- and $\ell_2$-norm, respectively, provided the number of quantized observations $M$ is large compared to the square of the "Gaussian mean width" of $\mathcal K$. In this case, we show that the embedding is actually "quasi-isometric" and suffers only from multiplicative and additive distortions whose magnitudes decrease as $M^{-1/5}$ for general sets, and as $M^{-1/2}$ for structured sets, when $M$ increases. Second, when one is only interested in characterizing the maximal distance separating two elements of $\mathcal K$ mapped to the same quantized vector, i.e., the "consistency width" of the mapping, we show that for a similar number of measurements and with high probability this width decays as $M^{-1/4}$ for general sets and as $1/M$ for structured ones when $M$ increases. Finally, as an important aspect of our work, we also establish how the non-Gaussianity of the mapping impacts the class of vectors that can be embedded or whose consistency width provably decays when $M$ increases.

Comment: Keywords: quantization, restricted isometry property, compressed sensing, dimensionality reduction. 31 pages, 1 figure
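The mapping described above can be sketched as $A(x) = Q_\delta(\Phi x + \xi)$ with $Q_\delta(t) = \delta \lfloor t/\delta \rfloor$ applied componentwise. The code below assumes a standard Gaussian $\Phi$ (one sub-Gaussian choice) and a dither $\xi$ uniform on $[0, \delta)^M$; the function name `quantized_embedding` and all parameter values are hypothetical. In this Gaussian case with uniform dither, the expected normalized distance $\frac{1}{M}\|A(x) - A(y)\|_1$ equals $\sqrt{2/\pi}\,\|x - y\|_2$, so the computed ratio should approach 1 as $M$ grows. The sketch does not reproduce the paper's distortion bounds.

```python
import numpy as np

def quantized_embedding(X: np.ndarray, M: int, delta: float, seed: int = 0) -> np.ndarray:
    """Apply A(x) = Q_delta(Phi x + xi) to each row of X.

    Assumptions (hypothetical choices for illustration):
      Phi: M x N matrix with i.i.d. standard Gaussian entries.
      xi:  dither with i.i.d. entries uniform on [0, delta).
      Q_delta(t) = delta * floor(t / delta), componentwise.
    All rows share the same Phi and xi, so distances between them are meaningful.
    """
    rng = np.random.default_rng(seed)
    N = X.shape[1]
    Phi = rng.standard_normal((M, N))
    xi = rng.uniform(0.0, delta, size=M)
    return delta * np.floor((X @ Phi.T + xi) / delta)

# Compare the normalized l1 distance after embedding with the l2 distance before.
rng = np.random.default_rng(1)
x = rng.standard_normal(50)
y = x + 0.3 * rng.standard_normal(50)
A = quantized_embedding(np.stack([x, y]), M=20000, delta=0.5)
d_quant = np.abs(A[0] - A[1]).sum() / A.shape[1]
d_orig = np.sqrt(2 / np.pi) * np.linalg.norm(x - y)
ratio = d_quant / d_orig
print(f"ratio = {ratio:.3f}")
```

The dither is what makes the quantizer unbiased on average: without it, two nearby points falling in the same cell would always be mapped to the same value, regardless of how far apart they are within that cell.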

arXiv.org e-Print Archive

DIAL UCLouvain

k-Nearest Neighbour Classifiers: 2nd Edition (with Python examples)

Authors: Cunningham Padraig, Delany Sarah Jane
Publication venue: Association for Computing Machinery (ACM)
Publication date: 29/04/2020

Perhaps the most straightforward classifier in the arsenal of machine learning techniques is the Nearest Neighbour Classifier: classification is achieved by identifying the nearest neighbours to a query example and using those neighbours to determine the class of the query. This approach to classification is of particular importance because poor run-time performance is no longer a major problem, given the computational power available today. This paper presents an overview of techniques for Nearest Neighbour classification, focusing on mechanisms for assessing similarity (distance), computational issues in identifying nearest neighbours, and mechanisms for reducing the dimension of the data. This paper is the second edition of a paper previously published as a technical report. Sections on similarity measures for time series, retrieval speed-up, and intrinsic dimensionality have been added. An Appendix is included providing access to Python code for the key methods.

Comment: 22 pages, 15 figures: An updated edition of an older tutorial on kNN
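The core procedure described above (find the nearest neighbours, then let them vote) can be sketched in a few lines. This is a brute-force illustration using Euclidean distance and a simple majority vote; it is not the Python code from the paper's appendix, and the function name `knn_predict` is hypothetical.

```python
from collections import Counter

import numpy as np

def knn_predict(X_train: np.ndarray, y_train: np.ndarray,
                X_query: np.ndarray, k: int = 3) -> np.ndarray:
    """Majority-vote k-NN classifier with Euclidean distance (brute force)."""
    # Pairwise squared Euclidean distances, shape (n_query, n_train).
    d2 = ((X_query[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    # Indices of the k nearest training examples for each query.
    nn = np.argsort(d2, axis=1, kind="stable")[:, :k]
    preds = []
    for row in nn:
        # The most frequent label among the k neighbours wins.
        votes = Counter(y_train[row].tolist())
        preds.append(votes.most_common(1)[0][0])
    return np.array(preds)

# Usage: two well-separated toy clusters.
X_train = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], dtype=float)
y_train = np.array([0, 0, 0, 1, 1, 1])
X_query = np.array([[0.5, 0.5], [10.5, 10.5]])
pred = knn_predict(X_train, y_train, X_query, k=3)
```

The brute-force distance computation costs time and memory proportional to (number of queries) × (training set size) × (dimension), which is exactly the run-time concern that retrieval speed-up and dimension-reduction techniques aim to reduce.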

arXiv.org e-Print Archive

Arrow@TUDublin