Coping with Change: Learning Invariant and Minimum Sufficient Representations for Fine-Grained Visual Categorization
Fine-grained visual categorization (FGVC) is a challenging task due to the
similar visual appearances of various species. Previous studies always
implicitly assume that the training and test data have the same underlying
distribution, and that features extracted by modern backbone architectures
remain discriminative and generalize well to unseen test data. However, we
empirically show that these conditions do not always hold on benchmark
datasets. To this end, we combine the merits of invariant risk minimization
(IRM) and the information bottleneck (IB) principle to learn invariant and minimum
sufficient (IMS) representations for FGVC, such that the overall model can
always discover the most succinct and consistent fine-grained features. We
apply the matrix-based Rényi's α-order entropy to simplify and
stabilize the training of IB; we also design a "soft" environment partition
scheme to make IRM applicable to the FGVC task. To the best of our knowledge, we
are the first to address the problem of FGVC from a generalization perspective
and to develop a new information-theoretic solution accordingly. Extensive
experiments demonstrate the consistent performance gain offered by our IMS.
Comment: Manuscript accepted by CVIU; code is available on GitHub.
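The matrix-based Rényi's α-order entropy avoids explicit density estimation: it is computed from the eigenvalues of a unit-trace Gram matrix A as S_α(A) = log2(Σ_i λ_i(A)^α) / (1 - α). The pure-Python sketch below assumes a Gaussian kernel with a hand-picked width `sigma` and restricts α to integers ≥ 2, where Σ_i λ_i^α equals tr(A^α), so no eigendecomposition is needed. The joint entropy (normalized Hadamard product) and mutual information follow the usual definitions of this estimator; all function names are hypothetical, and this is not the IMS training code.

```python
import math


def gaussian_gram(xs, sigma=1.0):
    """Gaussian-kernel Gram matrix for a list of equal-length sample vectors."""
    n = len(xs)
    return [[math.exp(-sum((a - b) ** 2 for a, b in zip(xs[i], xs[j]))
                      / (2.0 * sigma ** 2))
             for j in range(n)] for i in range(n)]


def normalize(gram):
    """Unit-trace normalization: A_ij = K_ij / (n * sqrt(K_ii * K_jj))."""
    n = len(gram)
    return [[gram[i][j] / (n * math.sqrt(gram[i][i] * gram[j][j]))
             for j in range(n)] for i in range(n)]


def matmul(a, b):
    """Plain square matrix product."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def renyi_entropy(a, alpha=2):
    """S_alpha(A) = log2(tr(A^alpha)) / (1 - alpha) for integer alpha >= 2.

    For symmetric A, tr(A^alpha) equals the sum of its eigenvalues raised to
    alpha, so integer alpha needs no eigendecomposition.
    """
    if not (isinstance(alpha, int) and alpha >= 2):
        raise ValueError("this sketch supports integer alpha >= 2 only")
    power = a
    for _ in range(alpha - 1):
        power = matmul(power, a)
    trace = sum(power[i][i] for i in range(len(a)))
    return math.log2(trace) / (1 - alpha)


def joint_entropy(a, b, alpha=2):
    """Joint entropy from the Hadamard product A * B, renormalized to unit trace."""
    n = len(a)
    h = [[a[i][j] * b[i][j] for j in range(n)] for i in range(n)]
    tr = sum(h[i][i] for i in range(n))
    return renyi_entropy([[v / tr for v in row] for row in h], alpha)


def mutual_information(a, b, alpha=2):
    """I(A; B) = S(A) + S(B) - S(A, B)."""
    return renyi_entropy(a, alpha) + renyi_entropy(b, alpha) - joint_entropy(a, b, alpha)
```

For four samples spaced far apart relative to `sigma`, A is close to I/4 and the entropy is close to log2 4 = 2 bits; four identical samples give 0. An IB-style objective would evaluate such mutual-information terms on Gram matrices built from a mini-batch.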
A survey of parallel algorithms for fractal image compression
This paper presents a short survey of the key research work that has been undertaken in the application of parallel algorithms to fractal image compression. The interest in fractal image compression techniques stems from their ability to achieve high compression ratios whilst maintaining very high quality in the reconstructed image. The main drawback of this compression method is the very high computational cost associated with the encoding phase. Consequently, there has been significant interest in exploiting parallel computing architectures to speed up this phase, whilst still maintaining the advantageous features of the approach. This paper presents a brief introduction to fractal image compression, including the iterated function system theory upon which it is based, and then reviews the different techniques that have been, and can be, applied to parallelize the compression algorithm.
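To make the cost of the encoding phase concrete, the sketch below shows a bare-bones partitioned-IFS encoder: every range block is compared against every domain block (shrunk by 2×2 averaging) under a least-squares contrast `s` and brightness `o`. Because each range block's search is independent, the outer loop is the natural unit of parallel work, so the sketch accepts any `map`-like callable. Assumptions of this sketch: a square grayscale image stored as nested lists with a side that is a multiple of twice the range-block size, non-overlapping domain blocks, no isometric transforms of the domains, and a hypothetical contrast bound `S_MAX = 0.9` to keep each map contractive. Decoding is omitted.

```python
S_MAX = 0.9  # hypothetical contrast bound keeping each map contractive


def block(img, r, c, size):
    """Return the size x size block whose top-left corner is (r, c)."""
    return [row[c:c + size] for row in img[r:r + size]]


def shrink(blk):
    """Average each 2x2 cell so a 2s x 2s domain block matches an s x s range block."""
    s = len(blk) // 2
    return [[(blk[2 * i][2 * j] + blk[2 * i][2 * j + 1]
              + blk[2 * i + 1][2 * j] + blk[2 * i + 1][2 * j + 1]) / 4.0
             for j in range(s)] for i in range(s)]


def fit(domain, rng):
    """Least-squares s, o minimizing ||s*D + o - R||^2, with s clamped to [-S_MAX, S_MAX]."""
    d = [v for row in domain for v in row]
    r = [v for row in rng for v in row]
    n = len(d)
    sd, sr = sum(d), sum(r)
    sdd = sum(x * x for x in d)
    sdr = sum(x * y for x, y in zip(d, r))
    denom = n * sdd - sd * sd
    s = 0.0 if denom == 0 else (n * sdr - sd * sr) / denom
    s = max(-S_MAX, min(S_MAX, s))
    o = (sr - s * sd) / n  # best brightness offset for the (possibly clamped) s
    err = sum((s * x + o - y) ** 2 for x, y in zip(d, r))
    return s, o, err


def encode_range(job):
    """Exhaustive domain search for one range block; returns (r, c, dr, dc, s, o, err)."""
    img, r, c, size, domains = job
    rng = block(img, r, c, size)
    best = None
    for (dr, dc), dom in domains:
        s, o, err = fit(dom, rng)
        if best is None or err < best[-1]:
            best = (r, c, dr, dc, s, o, err)
    return best


def encode(img, size=4, mapper=map):
    """Encode a square image; `mapper` can be the builtin map or an executor's map."""
    n, big = len(img), 2 * size
    domains = [((dr, dc), shrink(block(img, dr, dc, big)))
               for dr in range(0, n - big + 1, big)
               for dc in range(0, n - big + 1, big)]
    jobs = [(img, r, c, size, domains)
            for r in range(0, n, size) for c in range(0, n, size)]
    # Range blocks are independent, so this map is where workers can be plugged in.
    return list(mapper(encode_range, jobs))
```

Because the work functions are top-level, `concurrent.futures.ProcessPoolExecutor().map` can be passed as `mapper`; in CPython a thread pool would not speed up this pure-Python, CPU-bound search.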
Mathematical modeling for partial object detection
From a computer vision point of view, an image is a scene consisting of objects of interest and a background represented by everything else in the image. The relations and interactions among these objects are the key factors for scene understanding. In this dissertation, a mathematical model is designed for the detection of partially occluded faces captured in unconstrained real-life conditions. The novelty of the proposed model comes from explicitly considering certain objects that commonly occlude faces and embedding them in the face model. This enables the detection of faces in difficult settings and provides more information to subsequent analysis in addition to the bounding box of the face. In the proposed Selective Part Models (SPM), the face is modelled as a collection of parts that can be selected from the visible regular facial parts and some of the occluding objects that commonly interact with faces, such as sunglasses, caps, hands, shoulders, and other faces. Since face detection is the first step in the face recognition pipeline, the proposed model not only detects partially occluded faces efficiently but also suggests which occluded parts should be excluded from the subsequent recognition step. The model was tested on several recent face detection databases and benchmarks and achieved state-of-the-art performance. In addition, a detailed analysis of the performance with respect to different types of occlusion is provided. Moreover, a new database was collected for evaluating face detectors, focusing on the partial occlusion problem. This dissertation highlights the importance of explicitly handling the partial occlusion problem in face detection and shows its effectiveness in enhancing both the face detection performance and the subsequent recognition performance for partially occluded faces. The broader impact of the proposed detector extends beyond common security applications to human-robot interaction.
The humanoid robot Nao is used to help teach children with autism, and the proposed detector is used to achieve natural interaction between the robot and the children by detecting their faces, which can then be used for recognition or, more interestingly, for adaptive interaction by analyzing their expressions.
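The selection idea can be illustrated with a toy scoring rule: each facial part slot is explained either by the visible part or by an occluder allowed to cover it, whichever scores higher, and slots won by occluders are reported so that recognition can skip them. This is a hypothetical simplification, not the dissertation's SPM: the part and occluder scores are assumed to come from some upstream detectors, the spatial deformation terms a real part-based model would include are omitted, and all names (`select_parts`, `detect`, `can_cover`) are illustrative.

```python
from dataclasses import dataclass


@dataclass
class SlotChoice:
    slot: str        # facial part slot, e.g. "eyes"
    source: str      # the part or occluder that best explains the slot
    score: float
    occluded: bool   # True when an occluder won the slot


def select_parts(part_scores, occluder_scores, can_cover):
    """Pick the best explanation per slot; return (total score, choices)."""
    choices = []
    for slot, score in part_scores.items():
        best = SlotChoice(slot, slot, score, False)
        for occ in can_cover.get(slot, ()):
            occ_score = occluder_scores.get(occ, float("-inf"))
            if occ_score > best.score:
                best = SlotChoice(slot, occ, occ_score, True)
        choices.append(best)
    return sum(c.score for c in choices), choices


def detect(part_scores, occluder_scores, can_cover, threshold):
    """Return (is_face, slots to exclude from the recognition step)."""
    total, choices = select_parts(part_scores, occluder_scores, can_cover)
    return total >= threshold, [c.slot for c in choices if c.occluded]
```

For example, with sunglasses scoring higher than weakly visible eyes, the hypothesis can pass the threshold while `eyes` is returned as a slot to exclude; without the occluder it may fall below the threshold.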
ULSAM: Ultra-Lightweight Subspace Attention Module for Compact Convolutional Neural Networks
The capability of the self-attention mechanism to model the long-range
dependencies has catapulted its deployment in vision models. Unlike convolution
operators, self-attention offers an infinite receptive field and enables
compute-efficient modeling of global dependencies. However, the existing
state-of-the-art attention mechanisms incur high compute and/or parameter
overheads and are hence unfit for compact convolutional neural networks (CNNs). In
this work, we propose a simple yet effective "Ultra-Lightweight Subspace
Attention Mechanism" (ULSAM), which infers different attention maps for each
feature map subspace. We argue that learning separate attention maps for each
feature subspace enables multi-scale and multi-frequency feature
representation, which is more desirable for fine-grained image classification.
Our method of subspace attention is orthogonal and complementary to the
existing state-of-the-art attention mechanisms used in vision models. ULSAM is
end-to-end trainable and can be deployed as a plug-and-play module in the
pre-existing compact CNNs. Notably, our work is the first attempt that uses a
subspace attention mechanism to increase the efficiency of compact CNNs. To
show the efficacy of ULSAM, we perform experiments with MobileNet-V1 and
MobileNet-V2 as backbone architectures on ImageNet-1K and three fine-grained
image classification datasets. We achieve approximately 13% and 25%
reductions in both the FLOPs and parameter counts of MobileNet-V2, with 0.27%
and more than 1% improvements in top-1 accuracy, on the ImageNet-1K and
fine-grained image classification datasets, respectively. Code and trained
models are available at https://github.com/Nandan91/ULSAM.
Comment: Accepted as a conference paper in 2020 IEEE Winter Conference on
Applications of Computer Vision (WACV)
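A minimal sketch of the "one attention map per subspace" idea, assuming a C × H × W feature map held in nested lists: channels are split into `groups` subspaces, each subspace is collapsed to one logit per spatial position by a per-group weight vector (a hypothetical stand-in for the learned convolutions, which are omitted), a softmax over the H·W positions gives that subspace's attention map, and the map rescales its subspace with a residual-style combination F·A + F (also an assumption of this sketch). It is not the layer stack from the ULSAM repository.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a flat list."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def subspace_attention(fmap, groups, weights):
    """fmap: C x H x W nested lists; weights[g]: C/groups mixing weights for group g.

    Each channel subspace gets its own spatial attention map; the output keeps
    the input shape and combines features as F * A + F.
    """
    channels = len(fmap)
    if channels % groups:
        raise ValueError("channel count must be divisible by groups")
    per = channels // groups
    h, w = len(fmap[0]), len(fmap[0][0])
    out = []
    for g in range(groups):
        sub = fmap[g * per:(g + 1) * per]
        # Collapse the subspace to a single logit per spatial position.
        logits = [sum(weights[g][k] * sub[k][i][j] for k in range(per))
                  for i in range(h) for j in range(w)]
        attn = softmax(logits)  # sums to 1 over the H*W positions
        for ch in sub:
            out.append([[ch[i][j] * attn[i * w + j] + ch[i][j] for j in range(w)]
                        for i in range(h)])
    return out
```

With `groups=1` the sketch reduces to a single spatial map shared by all channels; larger `groups` lets different channel subspaces attend to different locations while adding only C mixing weights in total.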
Recent Advances of Local Mechanisms in Computer Vision: A Survey and Outlook of Recent Work
Inspired by the fact that human brains can emphasize discriminative parts of
the input and suppress irrelevant ones, a substantial number of local mechanisms have been
designed to boost the development of computer vision. They can not only focus
on target parts to learn discriminative local representations, but also process
information selectively to improve efficiency. Local mechanisms have different
characteristics depending on the application scenario and paradigm. In
this survey, we provide a systematic review of local mechanisms for various
computer vision tasks and approaches, including fine-grained visual
recognition, person re-identification, few-/zero-shot learning, multi-modal
learning, self-supervised learning, Vision Transformers, and so on.
We summarize the categorization of local mechanisms in each field. Then, the
advantages and disadvantages of every category are analyzed in depth, leaving
room for exploration. Finally, we discuss future research directions for local
mechanisms that may benefit future work. To the best of
our knowledge, this is the first survey about local mechanisms in computer
vision. We hope that this survey can shed light on future research in the
computer vision field.
DeepKSPD: Learning Kernel-matrix-based SPD Representation for Fine-grained Image Recognition
Being symmetric positive-definite (SPD), the covariance matrix has traditionally
been used to represent a set of local descriptors in visual recognition. A recent
study shows that the kernel matrix can give a considerably better representation by
modelling the nonlinearity in the local descriptor set. Nevertheless, neither
the descriptors nor the kernel matrix is deeply learned. Worse, they are
considered separately, hindering the pursuit of an optimal SPD representation.
This work proposes a deep network that jointly learns local descriptors,
kernel-matrix-based SPD representation, and the classifier via an end-to-end
training process. We derive the derivatives for the mapping from a local
descriptor set to the SPD representation to carry out backpropagation. Also, we
exploit the Daleckii-Krein formula in operator theory to give a concise and
unified result on differentiating SPD matrix functions, including the matrix
logarithm to handle the Riemannian geometry of the kernel matrix. Experiments not
only show the superiority of kernel-matrix-based SPD representation with deep
local descriptors, but also verify the advantage of the proposed deep network
in pursuing better SPD representations for fine-grained image recognition
tasks.
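Two ingredients named above can be illustrated directly: a kernel-matrix SPD representation of a descriptor set, and the Daleckii-Krein formula for differentiating a spectral matrix function such as the matrix logarithm. Writing A = U diag(λ) Uᵀ, the derivative of f(A) along a symmetric dA is U (Γ ∘ (Uᵀ dA U)) Uᵀ, with Γ_ij = (f(λ_i) - f(λ_j)) / (λ_i - λ_j) for λ_i ≠ λ_j and Γ_ii = f'(λ_i). The pure-Python sketch below uses a cyclic Jacobi eigensolver in place of a library routine and assumes, as in the covariance case, that entry (i, j) of the kernel matrix compares feature dimensions i and j across the descriptors, with a Gaussian kernel, a hand-picked `sigma`, and a small hypothetical `ridge`. It shows the forward map and its derivative only and is not the DeepKSPD implementation.

```python
import math


def kernel_matrix(descriptors, sigma=1.0, ridge=1e-6):
    """d x d Gaussian kernel matrix between feature dimensions of n descriptors."""
    d = len(descriptors[0])
    cols = [[desc[i] for desc in descriptors] for i in range(d)]
    k = [[math.exp(-sum((a - b) ** 2 for a, b in zip(cols[i], cols[j]))
                   / (2.0 * sigma ** 2)) for j in range(d)] for i in range(d)]
    for i in range(d):
        k[i][i] += ridge  # small ridge keeps the matrix safely positive definite
    return k


def jacobi_eigh(a, tol=1e-22, max_sweeps=100):
    """Symmetric eigendecomposition: returns (lam, U) with A = U diag(lam) U^T."""
    n = len(a)
    a = [row[:] for row in a]
    u = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        if sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j) < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-300:
                    continue
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                # A <- J^T A J and U <- U J, with J the (p, q) rotation zeroing a[p][q].
                for k in range(n):
                    a[k][p], a[k][q] = c * a[k][p] - s * a[k][q], s * a[k][p] + c * a[k][q]
                for k in range(n):
                    a[p][k], a[q][k] = c * a[p][k] - s * a[q][k], s * a[p][k] + c * a[q][k]
                for k in range(n):
                    u[k][p], u[k][q] = c * u[k][p] - s * u[k][q], s * u[k][p] + c * u[k][q]
    return [a[i][i] for i in range(n)], u


def matrix_function(a, f):
    """Spectral matrix function f(A) = U diag(f(lam)) U^T for symmetric A."""
    lam, u = jacobi_eigh(a)
    n = len(a)
    return [[sum(u[i][k] * f(lam[k]) * u[j][k] for k in range(n)) for j in range(n)]
            for i in range(n)]


def frechet_derivative(a, da, f, fprime, eps=1e-10):
    """Daleckii-Krein: Df(A)[dA] = U (Gamma o (U^T dA U)) U^T for symmetric A, dA."""
    lam, u = jacobi_eigh(a)
    n = len(a)
    # Express the perturbation in the eigenbasis of A.
    m = [[sum(u[k][i] * da[k][q] * u[q][j] for k in range(n) for q in range(n))
          for j in range(n)] for i in range(n)]
    for i in range(n):
        for j in range(n):
            if abs(lam[i] - lam[j]) > eps:
                gamma = (f(lam[i]) - f(lam[j])) / (lam[i] - lam[j])
            else:
                gamma = fprime(lam[i])  # divided difference degenerates to f'
            m[i][j] *= gamma
    # Rotate back: U M U^T.
    return [[sum(u[i][k] * m[k][q] * u[j][q] for k in range(n) for q in range(n))
             for j in range(n)] for i in range(n)]
```

For backpropagation the adjoint of this map is needed; for symmetric A, Γ is symmetric, so the map is self-adjoint under the Frobenius inner product and the same routine applied to a symmetric upstream gradient gives the gradient with respect to A. A central finite difference of `matrix_function(·, math.log)` is a quick way to check it.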