From BoW to CNN: Two Decades of Texture Representation for Texture Classification
Texture is a fundamental characteristic of many types of images, and texture
representation is one of the essential and challenging problems in computer
vision and pattern recognition which has attracted extensive research
attention. Since 2000, texture representations based on Bag of Words (BoW) and
on Convolutional Neural Networks (CNNs) have been extensively studied with
impressive performance. Given this period of remarkable evolution, this paper
aims to present a comprehensive survey of advances in texture representation
over the last two decades. More than 200 major publications are cited in this
survey covering different aspects of the research, which includes (i) problem
description; (ii) recent advances in the broad categories of BoW-based,
CNN-based and attribute-based methods; and (iii) evaluation issues,
specifically benchmark datasets and state-of-the-art results. Reflecting on
what has been achieved so far, the survey discusses open challenges and
directions for future research.
Comment: Accepted by IJC
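The BoW pipeline the survey covers can be sketched in a few lines: local texture features are quantized against a learned codebook of visual words and pooled into a normalized histogram. The toy k-means and function names below are illustrative, not any specific method from the survey.

```python
import numpy as np

def build_codebook(features, k, iters=10, seed=0):
    """Toy k-means to learn a visual-word codebook from local texture features."""
    rng = np.random.default_rng(seed)
    centers = features[rng.choice(len(features), k, replace=False)]
    for _ in range(iters):
        # assign each feature to its nearest center, then recompute centers
        d = np.linalg.norm(features[:, None] - centers[None], axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = features[labels == j].mean(axis=0)
    return centers

def bow_histogram(features, centers):
    """Quantize features against the codebook; return an L1-normalized histogram."""
    d = np.linalg.norm(features[:, None] - centers[None], axis=2)
    hist = np.bincount(d.argmin(axis=1), minlength=len(centers)).astype(float)
    return hist / hist.sum()
```

The resulting histogram is the image's BoW texture representation, which a downstream classifier consumes.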
Coarse2Fine: Two-Layer Fusion For Image Retrieval
This paper addresses the problem of large-scale image retrieval. We propose a
two-layer fusion method which takes advantage of global and local cues and
ranks database images from coarse to fine (C2F). Unlike previous methods that
fuse multiple image descriptors simultaneously, C2F features a layered
procedure composed of filtering and refining. In particular, C2F
consists of three components. 1) Distractor filtering. With holistic
representations, noise images are filtered out from the database, so the number
of candidate images to be used for comparison with the query can be greatly
reduced. 2) Adaptive weighting. For a given query, the similarity of
candidate images is estimated from holistic similarity scores as a complement
to the local ones. 3) Candidate refining. Accurate retrieval is
conducted via local features, combined with the pre-computed adaptive weights.
Experiments are presented on two benchmarks, i.e., the Holidays and Ukbench
datasets. We show that our method outperforms recent fusion methods in terms of
storage consumption and computational complexity, and that its accuracy is
competitive with the state of the art.
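The three components above can be condensed into one illustrative function: global (holistic) scores filter distractors, then a weighted combination of global and local scores re-ranks the survivors. All names and the fixed weight `alpha` are assumptions for the sketch, not the paper's adaptive scheme.

```python
import numpy as np

def coarse_to_fine(query_g, query_l, db_g, db_l, keep=5, alpha=0.5):
    """Hypothetical C2F-style retrieval: filter with global scores,
    then refine the surviving candidates with fused global+local scores."""
    # 1) distractor filtering: keep only the top-`keep` images by global similarity
    g_scores = db_g @ query_g
    cand = np.argsort(-g_scores)[:keep]
    # 2)+3) weighting and refining: fuse local scores with the global ones
    l_scores = db_l[cand] @ query_l
    fused = alpha * g_scores[cand] + (1 - alpha) * l_scores
    return cand[np.argsort(-fused)]
```

Because the expensive local comparison runs only on the filtered candidates, storage and computation stay bounded by `keep` rather than by the database size.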
SYQ: Learning Symmetric Quantization For Efficient Deep Neural Networks
Inference for state-of-the-art deep neural networks is computationally
expensive, making them difficult to deploy on constrained hardware
environments. An efficient way to reduce this complexity is to quantize the
weight parameters and/or activations during training by approximating their
distributions with a limited entry codebook. For very low-precisions, such as
binary or ternary networks with 1-8-bit activations, the information loss from
quantization leads to significant accuracy degradation due to large gradient
mismatches between the forward and backward functions. In this paper, we
introduce a quantization method to reduce this loss by learning a symmetric
codebook for particular weight subgroups. These subgroups are determined based
on their locality in the weight matrix, such that the hardware simplicity of
the low-precision representations is preserved. Empirically, we show that
symmetric quantization can substantially improve accuracy for networks with
extremely low-precision weights and activations. We also demonstrate that this
representation adds minimal or no hardware cost compared to more
coarse-grained approaches. Source code is available at
https://www.github.com/julianfaraone/SYQ.
Comment: Published as a conference paper at the 2018 IEEE Conference on
Computer Vision and Pattern Recognition (CVPR
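The core idea of a symmetric per-subgroup codebook can be illustrated with ternary weights: each subgroup (here, hypothetically, each row of the weight matrix) gets its own scale alpha, and its weights map to the symmetric codebook {-alpha, 0, +alpha}. The threshold and subgroup choice below are assumptions for the sketch, not SYQ's trained parameters.

```python
import numpy as np

def symmetric_quantize(W, t=0.05):
    """Sketch of symmetric subgroup quantization: row-wise scales,
    ternary codebook {-alpha, 0, +alpha} per row."""
    mask = (np.abs(W) > t).astype(W.dtype)   # which weights survive as nonzero
    sign = np.sign(W) * mask                 # ternary support {-1, 0, +1}
    # per-subgroup scale: mean magnitude of the surviving weights in each row
    alpha = (np.abs(W) * mask).sum(axis=1) / np.maximum(mask.sum(axis=1), 1)
    return sign * alpha[:, None]
```

Keeping the codebook symmetric means hardware only needs a sign flip plus one multiplier per subgroup, which is why the representation stays cheap.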
Vector Quantization by Minimizing Kullback-Leibler Divergence
This paper proposes a new method for vector quantization by minimizing the
Kullback-Leibler Divergence between the class label distributions over the
quantization inputs, which are original vectors, and the output, which is the
quantization subsets of the vector set. In this way, the vector quantization
output can preserve as much of the class-label information as possible. An
objective function is constructed, and an iterative algorithm is developed
to minimize it. The new method is evaluated on a bag-of-features-based image
classification problem.
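A toy version of the idea: represent each input by a smoothed class-label distribution, and iteratively reassign inputs to the quantization subset whose label distribution is closest in KL divergence. This is a loose illustration of label-preserving VQ, not the paper's objective or algorithm.

```python
import numpy as np

def kl(p, q, eps=1e-9):
    """KL divergence between two discrete distributions."""
    p, q = p + eps, q + eps
    return np.sum(p * np.log(p / q))

def kl_vector_quantize(labels, n_classes, k, iters=5, seed=0):
    """Group inputs into k subsets so that each subset's label distribution
    stays close (in KL) to its members' label distributions."""
    rng = np.random.default_rng(seed)
    assign = rng.integers(0, k, size=len(labels))
    onehot = np.eye(n_classes)[labels] * 0.9 + 0.1 / n_classes  # smoothed labels
    for _ in range(iters):
        # current label distribution of each subset (uniform if empty)
        dists = np.stack([
            onehot[assign == j].mean(axis=0) if np.any(assign == j)
            else np.full(n_classes, 1.0 / n_classes)
            for j in range(k)])
        # reassign each input to the subset minimizing KL divergence
        assign = np.array([np.argmin([kl(onehot[i], dists[j]) for j in range(k)])
                           for i in range(len(labels))])
    return assign
```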
Image Annotation using Multi-Layer Sparse Coding
Automatic annotation of images with descriptive words is a challenging
problem with vast applications in the areas of image search and retrieval. This
problem can be viewed as a label-assignment problem by a classifier dealing
with a very large set of labels, i.e., the vocabulary set. We propose a novel
annotation method that employs two layers of sparse coding and performs
coarse-to-fine labeling. Themes extracted from the training data are treated as
coarse labels. Each theme is a set of training images that share a common
subject in their visual and textual contents. Our system extracts coarse labels
for training and test images without requiring any prior knowledge. Vocabulary
words are the fine labels to be associated with images. Most of the annotation
methods achieve low recall due to the large number of available fine labels,
i.e., vocabulary words. These systems also tend to achieve high precision for
highly frequent words only while relatively rare words are more important for
search and retrieval purposes. Our system not only outperforms various
previously proposed annotation systems, but also achieves a symmetric response
in terms of precision and recall. It attains and maintains high precision
for words with a wide range of frequencies. Such behavior is achieved by
intelligently reducing the number of candidate fine labels, or words, for each
image based on the coarse labels assigned to it.
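The coarse-to-fine labeling step can be sketched as: pick the nearest theme first, then score only that theme's vocabulary words. All names here (`theme_protos`, `word_scores`, etc.) are illustrative placeholders, not the paper's sparse-coding machinery.

```python
import numpy as np

def annotate(image_feat, theme_protos, theme_vocab, word_scores):
    """Two-layer coarse-to-fine labeling sketch: coarse theme selection,
    then fine word scoring restricted to that theme's vocabulary."""
    # coarse layer: nearest theme prototype to the image feature
    theme = int(np.argmin(np.linalg.norm(theme_protos - image_feat, axis=1)))
    # fine layer: only the chosen theme's words remain candidate labels
    words = theme_vocab[theme]
    scores = {w: word_scores(image_feat, w) for w in words}
    return theme, sorted(scores, key=scores.get, reverse=True)
```

Restricting the fine label set this way is exactly what lifts recall: rare but theme-relevant words no longer compete with the full vocabulary.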
Efficient Inferencing of Compressed Deep Neural Networks
The large number of weights in deep neural networks makes the models difficult
to deploy in low-memory environments such as mobile phones, IoT edge devices,
and "inferencing as a service" environments in the cloud. Prior work has
considered reducing the size of the models through compression techniques
such as pruning, quantization, and Huffman encoding. However, efficient
inferencing using the compressed models has received little attention,
especially with Huffman encoding in place. In this paper, we propose
efficient parallel algorithms for inferencing on single images and batches,
under various memory constraints. Our experimental results show that our
approach of using a variable batch size for inferencing achieves a 15-25%
improvement in inference throughput for AlexNet, while maintaining memory
and latency constraints.
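The Huffman-decoding step that makes this inferencing problem awkward is simple to state: weights arrive as a prefix-free bit stream and must be expanded before (or while) they are used. A minimal decoder, with an illustrative code table, looks like this:

```python
def huffman_decode(bits, code_table):
    """Decode a Huffman-encoded weight stream. `code_table` maps a quantized
    weight value to its prefix-free bit string (illustrative layout)."""
    inv = {code: value for value, code in code_table.items()}
    out, cur = [], ""
    for b in bits:
        cur += b
        if cur in inv:          # a complete codeword has been read
            out.append(inv[cur])
            cur = ""
    return out
```

Because decoding is inherently sequential per stream, parallel inferencing schemes like the paper's must partition the work (e.g., across layers or batch elements) rather than within one bit stream.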
Recent Advance in Content-based Image Retrieval: A Literature Survey
The explosive increase and ubiquitous accessibility of visual data on the Web
have led to the prosperity of research activity in image search or retrieval.
With the ignorance of visual content as a ranking clue, methods with text
search techniques for visual retrieval may suffer inconsistency between the
text words and visual content. Content-based image retrieval (CBIR), which
makes use of the representation of visual content to identify relevant images,
has attracted sustained attention in recent two decades. Such a problem is
challenging due to the intention gap and the semantic gap problems. Numerous
techniques have been developed for content-based image retrieval in the last
decade. The purpose of this paper is to categorize and evaluate those
algorithms proposed during the period of 2003 to 2016. We conclude with several
promising directions for future research.
Comment: 22 page
Non-Vacuous Generalization Bounds at the ImageNet Scale: A PAC-Bayesian Compression Approach
Modern neural networks are highly overparameterized, with capacity to
substantially overfit to training data. Nevertheless, these networks often
generalize well in practice. It has also been observed that trained networks
can often be "compressed" to much smaller representations. The purpose of this
paper is to connect these two empirical observations. Our main technical result
is a generalization bound for compressed networks based on the compressed size.
Combined with off-the-shelf compression algorithms, the bound leads to state of
the art generalization guarantees; in particular, we provide the first
non-vacuous generalization guarantees for realistic architectures applied to
the ImageNet classification problem. As additional evidence connecting
compression and generalization, we show that compressibility of models that
tend to overfit is limited: We establish an absolute limit on expected
compressibility as a function of expected generalization error, where the
expectations are over the random choice of training examples. The bounds are
complemented by empirical results that show an increase in overfitting implies
an increase in the number of bits required to describe a trained network.
Comment: 16 pages, 1 figure. Accepted at ICLR 201
A Neural Network Architecture for Learning Word-Referent Associations in Multiple Contexts
This article proposes a biologically inspired neurocomputational architecture
which learns associations between words and referents in different contexts,
considering evidence collected from the literature of Psycholinguistics and
Neurolinguistics. The multi-layered architecture takes as input raw images of
objects (referents) and streams of words' phonemes (labels), builds an adequate
representation, recognizes the current context, and associates labels with
referents incrementally by employing a Self-Organizing Map that creates new
association nodes (prototypes) as required, adjusts the existing prototypes to
better represent the input stimuli, and removes prototypes that become
obsolete or unused. The model takes the current context into account to retrieve
the correct meaning of words with multiple meanings. Simulations show that the
model can reach up to 78% word-referent association accuracy in ambiguous
situations and approximates well the learning rates of humans reported by
three different authors in five Cross-Situational Word Learning experiments,
also displaying similar learning patterns under the different learning
conditions.
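The create/adjust/remove dynamic of the map can be sketched as a growing prototype store: a stimulus either nudges its best-matching prototype or spawns a new node when nothing is close enough. The thresholds and class layout below are illustrative assumptions, not the paper's architecture.

```python
import numpy as np

class GrowingMap:
    """Toy growing self-organizing map: creates a new prototype when no
    existing one is within `dist_thresh`, otherwise moves the best match
    toward the stimulus by learning rate `lr`."""
    def __init__(self, dist_thresh=1.0, lr=0.2):
        self.protos, self.dist_thresh, self.lr = [], dist_thresh, lr

    def present(self, x):
        x = np.asarray(x, dtype=float)
        if self.protos:
            d = [np.linalg.norm(p - x) for p in self.protos]
            best = int(np.argmin(d))
            if d[best] < self.dist_thresh:
                # adjust the winning prototype toward the stimulus
                self.protos[best] += self.lr * (x - self.protos[best])
                return best
        # no prototype is close enough: create a new association node
        self.protos.append(x.copy())
        return len(self.protos) - 1
```

A pruning pass that drops prototypes unused for many presentations would complete the obsolete/unused removal the abstract describes.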
A Survey on Methods and Theories of Quantized Neural Networks
Deep neural networks are the state-of-the-art methods for many real-world
tasks, such as computer vision, natural language processing and speech
recognition. For all their popularity, deep neural networks are also
criticized for consuming a lot of memory and draining the battery life of
devices during training and inference. This makes it hard to deploy these
models on mobile or
embedded devices which have tight resource constraints. Quantization is
recognized as one of the most effective approaches to satisfy the extreme
memory requirements that deep neural network models demand. Instead of adopting
32-bit floating point format to represent weights, quantized representations
store weights using more compact formats such as integers or even binary
numbers. Despite a possible degradation in predictive performance, quantization
provides a potential solution to greatly reduce the model size and the energy
consumption. In this survey, we give a thorough review of different aspects of
quantized neural networks. Current challenges and trends of quantized neural
networks are also discussed.
Comment: 17 pages, 8 figure
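The compact formats the survey describes can be illustrated with the simplest case, symmetric uniform quantization of float weights to 8-bit integers; the function and its scale convention are a minimal sketch, not any specific scheme from the survey.

```python
import numpy as np

def quantize_uniform(w, bits=8):
    """Map float weights to `bits`-bit signed integers with one shared scale;
    dequantize with q * scale."""
    qmax = 2 ** (bits - 1) - 1
    wmax = np.abs(w).max()
    scale = wmax / qmax if wmax > 0 else 1.0
    q = np.clip(np.round(w / scale), -qmax - 1, qmax)
    return q.astype(np.int8 if bits <= 8 else np.int32), scale
```

Storing `q` instead of 32-bit floats cuts memory 4x at 8 bits, at the cost of the rounding error the survey's "possible degradation in predictive performance" refers to.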