9,921 research outputs found
Brain-mediated Transfer Learning of Convolutional Neural Networks
The human brain can effectively learn a new task from a small number of
samples, which indicate that the brain can transfer its prior knowledge to
solve tasks in different domains. This function is analogous to transfer
learning (TL) in the field of machine learning. TL uses a well-trained feature
space in a specific task domain to improve performance in new tasks with
insufficient training data. TL with rich feature representations, such as
features of convolutional neural networks (CNNs), shows high generalization
ability across different task domains. However, such TL is still insufficient
in making machine learning attain generalization ability comparable to that of
the human brain. To examine if the internal representation of the brain could
be used to achieve more efficient TL, we introduce a method for TL mediated by
human brains. Our method transforms feature representations of audiovisual
inputs in CNNs into those in activation patterns of individual brains via their
association learned ahead using measured brain responses. Then, to estimate
labels reflecting human cognition and behavior induced by the audiovisual
inputs, the transformed representations are used for TL. We demonstrate that
our brain-mediated TL (BTL) shows higher performance in the label estimation
than the standard TL. In addition, we illustrate that the estimations mediated
by different brains vary from brain to brain, and the variability reflects the
individual variability in perception. Thus, our BTL provides a framework to
improve the generalization ability of machine-learning feature representations
and enable machine learning to estimate human-like cognition and behavior,
including individual variability
Cross-convolutional-layer Pooling for Image Recognition
Recent studies have shown that a Deep Convolutional Neural Network (DCNN)
pretrained on a large image dataset can be used as a universal image
descriptor, and that doing so leads to impressive performance for a variety of
image classification tasks. Most of these studies adopt activations from a
single DCNN layer, usually the fully-connected layer, as the image
representation. In this paper, we proposed a novel way to extract image
representations from two consecutive convolutional layers: one layer is
utilized for local feature extraction and the other serves as guidance to pool
the extracted features. By taking different viewpoints of convolutional layers,
we further develop two schemes to realize this idea. The first one directly
uses convolutional layers from a DCNN. The second one applies the pretrained
CNN on densely sampled image regions and treats the fully-connected activations
of each image region as convolutional feature activations. We then train
another convolutional layer on top of that as the pooling-guidance
convolutional layer. By applying our method to three popular visual
classification tasks, we find our first scheme tends to perform better on the
applications which need strong discrimination on subtle object patterns within
small regions while the latter excels in the cases that require discrimination
on category-level patterns. Overall, the proposed method achieves superior
performance over existing ways of extracting image representations from a DCNN.Comment: Fixed typos. Journal extension of arXiv:1411.7466. Accepted to IEEE
Transactions on Pattern Analysis and Machine Intelligenc
Understanding deep features with computer-generated imagery
We introduce an approach for analyzing the variation of features generated by
convolutional neural networks (CNNs) with respect to scene factors that occur
in natural images. Such factors may include object style, 3D viewpoint, color,
and scene lighting configuration. Our approach analyzes CNN feature responses
corresponding to different scene factors by controlling for them via rendering
using a large database of 3D CAD models. The rendered images are presented to a
trained CNN and responses for different layers are studied with respect to the
input scene factors. We perform a decomposition of the responses based on
knowledge of the input scene factors and analyze the resulting components. In
particular, we quantify their relative importance in the CNN responses and
visualize them using principal component analysis. We show qualitative and
quantitative results of our study on three CNNs trained on large image
datasets: AlexNet, Places, and Oxford VGG. We observe important differences
across the networks and CNN layers for different scene factors and object
categories. Finally, we demonstrate that our analysis based on
computer-generated imagery translates to the network representation of natural
images
Deep filter banks for texture recognition, description, and segmentation
Visual textures have played a key role in image understanding because they
convey important semantics of images, and because texture representations that
pool local image descriptors in an orderless manner have had a tremendous
impact in diverse applications. In this paper we make several contributions to
texture understanding. First, instead of focusing on texture instance and
material category recognition, we propose a human-interpretable vocabulary of
texture attributes to describe common texture patterns, complemented by a new
describable texture dataset for benchmarking. Second, we look at the problem of
recognizing materials and texture attributes in realistic imaging conditions,
including when textures appear in clutter, developing corresponding benchmarks
on top of the recently proposed OpenSurfaces dataset. Third, we revisit classic
texture representations, including bag-of-visual-words and the Fisher vectors,
in the context of deep learning and show that these have excellent efficiency
and generalization properties if the convolutional layers of a deep model are
used as filter banks. We obtain in this manner state-of-the-art performance in
numerous datasets well beyond textures, an efficient method to apply deep
features to image regions, as well as benefit in transferring features from one
domain to another.Comment: 29 pages; 13 figures; 8 table
Aggregated Deep Local Features for Remote Sensing Image Retrieval
Remote Sensing Image Retrieval remains a challenging topic due to the special
nature of Remote Sensing Imagery. Such images contain various different
semantic objects, which clearly complicates the retrieval task. In this paper,
we present an image retrieval pipeline that uses attentive, local convolutional
features and aggregates them using the Vector of Locally Aggregated Descriptors
(VLAD) to produce a global descriptor. We study various system parameters such
as the multiplicative and additive attention mechanisms and descriptor
dimensionality. We propose a query expansion method that requires no external
inputs. Experiments demonstrate that even without training, the local
convolutional features and global representation outperform other systems.
After system tuning, we can achieve state-of-the-art or competitive results.
Furthermore, we observe that our query expansion method increases overall
system performance by about 3%, using only the top-three retrieved images.
Finally, we show how dimensionality reduction produces compact descriptors with
increased retrieval performance and fast retrieval computation times, e.g. 50%
faster than the current systems.Comment: Published in Remote Sensing. The first two authors have equal
contributio
- …