4,921 research outputs found
Fast embedding for image classification & retrieval and its application to the hostel industry
This thesis was submitted for the award of Doctor of Philosophy and was awarded by Brunel University London.

Content-based image classification and retrieval are automatic processes that take an unseen input image and extract features representing it. For the classification task, this numerical representation is then categorized according to criteria established on the server, and the category is returned as the output. For the retrieval task, on the other hand, the extracted features of an unseen query image are sent to the server, which searches for the images most visually similar to the query and retrieves them as the result. Although images can be represented by classical hand-crafted features, artificial-intelligence-based features, Convolutional Neural Network (CNN) features to be precise, have become powerful tools in the field. Nonetheless, high-dimensional CNN features remain a challenge, in particular for applications on mobile or Internet of Things devices. Therefore, in this thesis, several fast
embeddings are explored and proposed to overcome the constraints of low memory,
bandwidth, and power. Furthermore, the first hostel image database is created, comprising three datasets: a hostel image dataset containing 13,908 interior and exterior images of hostels across the world, and the Hostels-900 and Hostels-2K datasets containing 972 and 2,380 images, respectively, of 20 London hostel buildings. The results
demonstrate that the proposed fast embeddings, such as the application of the GHM-Rand operator, the GHM-Fix operator, and binary feature vectors, outperform or give results competitive with state-of-the-art methods while using far fewer computational resources. Additionally, the findings from a ten-year literature review of CBIR studies in the tourism industry depict the relevant research activities of the past decade, which are beneficial not only to the hostel industry and tourism sector but also to the computer science and engineering research communities for potential real-life applications of existing and developing technologies in the field.
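The GHM-Rand and GHM-Fix operators are specific to the thesis and are not reproduced here. As a generic, hypothetical sketch of the binary feature vectors it also evaluates, the following pure-Python snippet binarizes real-valued image features with random sign projections and ranks a toy database by Hamming distance (the dimensions and data are made up for illustration):

```python
import random

def sign_binarize(feature, projections):
    """Binarize a real-valued feature vector: one sign bit per random projection."""
    bits = []
    for p in projections:
        s = sum(f * w for f, w in zip(feature, p))
        bits.append(1 if s >= 0 else 0)
    return bits

def hamming(a, b):
    """Hamming distance between two equal-length bit vectors."""
    return sum(x != y for x, y in zip(a, b))

random.seed(0)
dim, n_bits = 8, 16  # feature dimension and code length (illustrative values)
projections = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(n_bits)]

# Hypothetical CNN feature vectors for a query image and a 50-image database.
query = [random.gauss(0, 1) for _ in range(dim)]
database = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(50)]

q_code = sign_binarize(query, projections)
ranked = sorted(range(len(database)),
                key=lambda i: hamming(q_code, sign_binarize(database[i], projections)))
print(ranked[:5])  # indices of the 5 most similar images under Hamming distance
```

Because each image is reduced to a few bits and comparison is a bit count, both memory and distance computation are far cheaper than with full floating-point CNN features, which is the motivation the abstract gives for low-resource devices.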
Hashing for Similarity Search: A Survey
Similarity search (nearest neighbor search) is the problem of finding, in a large database, the data items whose distances to a query item are the smallest. Various methods have been developed to address this problem, and recently much effort has been devoted to approximate search. In this paper, we present a survey of one of the main solutions, hashing, which has been widely studied since the pioneering work on locality-sensitive hashing. We divide the hashing algorithms into two main categories: locality-sensitive hashing, which designs hash functions without exploring the data distribution, and learning to hash, which learns hash functions according to the data distribution. We review them from various aspects, including hash function design, distance measures, and search schemes in the hash coding space.
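As a minimal sketch of the locality-sensitive branch the survey describes, random-hyperplane hashing (SimHash) builds hash codes without looking at the data distribution; vectors separated by a small angle tend to collide in the same bucket, so only that bucket needs to be scanned at query time. The data and parameters below are illustrative, not from the paper:

```python
import random

def simhash(v, hyperplanes):
    """Random-hyperplane LSH: one bit per hyperplane (the sign of the projection)."""
    return tuple(1 if sum(x * w for x, w in zip(v, h)) >= 0 else 0
                 for h in hyperplanes)

random.seed(1)
dim, n_planes = 4, 8  # illustrative sizes
planes = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(n_planes)]

# Index: bucket each database item by its hash code.
db = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(100)]
buckets = {}
for i, v in enumerate(db):
    buckets.setdefault(simhash(v, planes), []).append(i)

# Query: only items in the query's bucket are candidate nearest neighbours.
q = db[0]
candidates = buckets[simhash(q, planes)]
print(0 in candidates)  # True: the query's own copy shares its bucket
```

A learning-to-hash method would instead fit the hyperplanes (or a more general hash function) to the data distribution; the lookup structure stays the same.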
Small Width, Low Distortions: Quantized Random Embeddings of Low-complexity Sets
Under which conditions, and with which distortions, can we preserve the
pairwise distances of low-complexity vectors, e.g., for structured sets such as
the set of sparse vectors or that of low-rank matrices, when these are
mapped into a finite set of vectors? This work addresses this general question
through the specific use of a quantized and dithered random linear mapping
which combines, in the following order, a sub-Gaussian random projection in
$\mathbb{R}^M$ of vectors in $\mathbb{R}^N$, a random translation, or "dither",
of the projected vectors, and a uniform scalar quantizer of resolution $\delta > 0$
applied componentwise. Thanks to this quantized mapping we are first
able to show that, with high probability, an embedding of a bounded set
$\mathcal{K} \subset \mathbb{R}^N$ in $\delta\mathbb{Z}^M$ can be achieved when
distances in the quantized and in the original domains are measured with the
$\ell_1$- and $\ell_2$-norm, respectively, provided the number of quantized
observations $M$ is large compared to the square of the "Gaussian mean width" of
$\mathcal{K}$. In this case, we show that the embedding is actually
"quasi-isometric" and suffers only multiplicative and additive
distortions whose magnitudes decrease polynomially as $M$ increases, more
slowly for general sets than for structured ones. Second, when one is only
interested in characterizing the maximal distance separating two elements of
$\mathcal{K}$ mapped to the same quantized vector, i.e., the "consistency width"
of the mapping, we show that for a similar number of measurements and with high
probability this width also decays polynomially as $M$ increases, again faster for
structured sets than for general ones. Finally, as an important aspect of our
work, we also establish how the non-Gaussianity of the mapping impacts the
class of vectors that can be embedded or whose consistency width provably
decays as $M$ increases.

Comment: Keywords: quantization, restricted isometry property, compressed sensing, dimensionality reduction. 31 pages, 1 figure
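A toy version of the mapping described above, with the paper's structure but arbitrary small dimensions: a Gaussian random projection, a uniform random dither, then a uniform scalar quantizer of resolution delta applied componentwise. This is an illustrative sketch, not the paper's code, and the distances printed at the end only informally exhibit the l1-vs-l2 correspondence the abstract proves:

```python
import math
import random

def quantized_embedding(x, A, dither, delta):
    """Map x to delta * floor((Ax + dither) / delta), componentwise:
    random projection, random dither, uniform scalar quantization."""
    y = [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]
    return [delta * math.floor((y_i + u_i) / delta) for y_i, u_i in zip(y, dither)]

random.seed(2)
N, M, delta = 6, 64, 0.5  # illustrative dimensions and resolution
A = [[random.gauss(0, 1) for _ in range(N)] for _ in range(M)]
dither = [random.uniform(0, delta) for _ in range(M)]

x1 = [random.gauss(0, 1) for _ in range(N)]
x2 = [xi + 0.01 * random.gauss(0, 1) for xi in x1]  # a nearby vector

q1 = quantized_embedding(x1, A, dither, delta)
q2 = quantized_embedding(x2, A, dither, delta)
l1 = sum(abs(a - b) for a, b in zip(q1, q2)) / M        # normalized l1, quantized domain
l2 = math.sqrt(sum((a - b) ** 2 for a, b in zip(x1, x2)))  # l2, original domain
print(l1, l2)  # the two distances track each other up to bounded distortion
```

Note that every output coordinate is a multiple of delta, so the image of the mapping is the finite (for bounded inputs) set $\delta\mathbb{Z}^M$ restricted to the reachable codes, which is exactly what makes the embedding "quantized".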
k-Nearest Neighbour Classifiers: 2nd Edition (with Python examples)
Perhaps the most straightforward classifier in the arsenal of machine
learning techniques is the Nearest Neighbour classifier: classification is
achieved by identifying the nearest neighbours to a query example and using
those neighbours to determine the class of the query. This approach to
classification is of particular importance because issues of poor run-time
performance are not such a problem these days with the computational power that
is available. This paper presents an overview of techniques for Nearest
Neighbour classification, focusing on: mechanisms for assessing similarity
(distance), computational issues in identifying nearest neighbours, and
mechanisms for reducing the dimension of the data.
This paper is the second edition of a paper previously published as a
technical report. Sections on similarity measures for time-series, retrieval
speed-up and intrinsic dimensionality have been added. An Appendix is included
providing access to Python code for the key methods.

Comment: 22 pages, 15 figures: an updated edition of an older tutorial on kNN
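The basic scheme the paper surveys can be sketched in a few lines of pure Python; the training points and the choice of k below are illustrative, not taken from the paper:

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Classify query by majority vote among the k nearest training examples
    (Euclidean distance)."""
    nearest = sorted(train, key=lambda xy: math.dist(xy[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Two hypothetical, well-separated classes in 2-D.
train = [([1.0, 1.0], "a"), ([1.2, 0.9], "a"), ([0.8, 1.1], "a"),
         ([5.0, 5.0], "b"), ([5.2, 4.8], "b"), ([4.9, 5.1], "b")]

print(knn_predict(train, [1.1, 1.0]))  # -> a
print(knn_predict(train, [5.1, 5.0]))  # -> b
```

The brute-force sort here is O(n log n) per query; the retrieval speed-up and dimensionality-reduction sections of the paper address exactly this cost for large training sets.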