Multicolumn Networks for Face Recognition
The objective of this work is set-based face recognition, i.e. to decide if
two sets of images of a face are of the same person or not. Conventionally, the
set-wise feature descriptor is computed as an average of the descriptors from
individual face images within the set. In this paper, we design a neural
network architecture that learns to aggregate based on both "visual" quality
(resolution, illumination), and "content" quality (relative importance for
discriminative classification). To this end, we propose a Multicolumn Network
(MN) that takes a set of images (the number in the set can vary) as input, and
learns to compute a fixed-size feature descriptor for the entire set. To
encourage high-quality representations, each individual input image is first
weighted by its "visual" quality, determined by a self-quality assessment
module, and followed by a dynamic recalibration based on "content" qualities
relative to the other images within the set. Both of these qualities are learnt
implicitly during training for set-wise classification. Compared with the
previous state-of-the-art architectures trained on the same dataset
(VGGFace2), our Multicolumn Networks show an improvement of 2-6% on the
IARPA IJB face recognition benchmarks, and exceed the state of the art for all
methods on these benchmarks.
Comment: To appear in BMVC201
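The abstract above contrasts plain averaging of per-image descriptors with a learned, quality-aware aggregation. The following is only a minimal sketch of that contrast, not the paper's architecture: the quality scores are hypothetical hand-set numbers rather than outputs of a learned module, and a single softmax stands in for the two-stage ("visual" then "content") weighting the abstract describes. The function names are illustrative assumptions.

```python
import math

def mean_pool(descriptors):
    """Baseline: plain average of the per-image descriptors."""
    n = len(descriptors)
    dim = len(descriptors[0])
    return [sum(d[i] for d in descriptors) / n for i in range(dim)]

def softmax(scores):
    """Turn raw scores into weights that sum to 1 across the set."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def quality_weighted_pool(descriptors, quality_scores):
    """Weighted average of descriptors; higher score means more influence.

    Assumption: quality_scores are given. In the paper they would be
    learned implicitly during training.
    """
    weights = softmax(quality_scores)
    dim = len(descriptors[0])
    return [sum(w * d[i] for w, d in zip(weights, descriptors))
            for i in range(dim)]

# A set of three 2-D descriptors; the set size may vary,
# but the pooled descriptor always has the descriptor dimension.
face_set = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
scores = [2.0, 0.0, -1.0]  # hypothetical per-image quality scores
pooled = quality_weighted_pool(face_set, scores)
```

With equal scores the weighted pool reduces to the plain average, so the baseline is a special case of the weighted form.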
Aggregated Deep Local Features for Remote Sensing Image Retrieval
Remote Sensing Image Retrieval remains a challenging topic due to the special
nature of Remote Sensing Imagery. Such images contain many different
semantic objects, which clearly complicates the retrieval task. In this paper,
we present an image retrieval pipeline that uses attentive, local convolutional
features and aggregates them using the Vector of Locally Aggregated Descriptors
(VLAD) to produce a global descriptor. We study various system parameters such
as the multiplicative and additive attention mechanisms and descriptor
dimensionality. We propose a query expansion method that requires no external
inputs. Experiments demonstrate that even without training, the local
convolutional features and global representation outperform other systems.
After system tuning, we can achieve state-of-the-art or competitive results.
Furthermore, we observe that our query expansion method increases overall
system performance by about 3%, using only the top-three retrieved images.
Finally, we show how dimensionality reduction produces compact descriptors with
increased retrieval performance and fast retrieval computation times, e.g. 50%
faster than the current systems.
Comment: Published in Remote Sensing. The first two authors have equal
contributio
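The retrieval abstract above aggregates local features into a global descriptor with VLAD. As a rough illustration of the standard VLAD computation (nearest-centroid assignment, residual accumulation, concatenation, L2 normalization), here is a small sketch. The codebook is a hypothetical hand-written one, whereas in practice it is usually learned with k-means, and the attention weighting mentioned in the abstract is omitted.

```python
import math

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def vlad(local_descriptors, centroids):
    """Aggregate local descriptors into one global VLAD vector.

    Each descriptor is assigned to its nearest centroid, and the
    residual (descriptor minus centroid) is accumulated per centroid.
    The per-centroid sums are concatenated and L2-normalized.
    """
    k = len(centroids)
    dim = len(centroids[0])
    residuals = [[0.0] * dim for _ in range(k)]
    for x in local_descriptors:
        nearest = min(range(k),
                      key=lambda j: squared_distance(x, centroids[j]))
        for i in range(dim):
            residuals[nearest][i] += x[i] - centroids[nearest][i]
    flat = [v for row in residuals for v in row]
    norm = math.sqrt(sum(v * v for v in flat))
    return [v / norm for v in flat] if norm > 0 else flat

# Hypothetical codebook with two centroids in 2-D.
codebook = [[0.0, 0.0], [10.0, 10.0]]
features = [[1.0, 0.0], [0.0, 1.0], [10.0, 12.0]]
global_descriptor = vlad(features, codebook)  # length k * dim = 4
```

The output length depends only on the codebook size and the descriptor dimension, not on how many local features the image produced, which is what makes the result usable as a global descriptor.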
An Unsupervised Autoregressive Model for Speech Representation Learning
This paper proposes a novel unsupervised autoregressive neural model for
learning generic speech representations. In contrast to other speech
representation learning methods that aim to remove noise or speaker
variabilities, ours is designed to preserve information for a wide range of
downstream tasks. In addition, the proposed model does not require any phonetic
or word boundary labels, allowing the model to benefit from large quantities of
unlabeled data. Speech representations learned by our model significantly
improve performance on both phone classification and speaker verification over
the surface features and other supervised and unsupervised approaches. Further
analysis shows that different levels of speech information are captured by our
model at different layers. In particular, the lower layers tend to be more
discriminative for speakers, while the upper layers provide more phonetic
content.
Comment: Accepted to Interspeech 2019. Code available at:
https://github.com/iamyuanchung/Autoregressive-Predictive-Codin
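The speech abstract above describes an autoregressive model trained without labels, meaning the training signal comes from the audio itself. One way to picture this is sketched below, under assumptions the abstract does not state: each frame is paired with the frame a fixed number of steps later, and a prediction is scored with an L1 distance. The `shift` parameter, the L1 loss, and both function names are illustrative choices, not details confirmed by the abstract.

```python
def make_future_pairs(frames, shift):
    """Pair each frame with the frame `shift` steps later.

    No phonetic or word labels are needed: the target for position t
    is simply the frame at position t + shift.
    """
    if shift < 1:
        raise ValueError("shift must be at least 1")
    return list(zip(frames[:-shift], frames[shift:]))

def l1_loss(predicted, target):
    """Mean absolute difference between two feature vectors."""
    return sum(abs(p - t) for p, t in zip(predicted, target)) / len(target)

# Four toy one-dimensional "frames" standing in for acoustic features.
frames = [[0.0], [1.0], [2.0], [3.0]]
pairs = make_future_pairs(frames, shift=2)
```

A model would be trained to map the past frames up to each input position onto the paired target, with the loss above (or another distance) measuring the error.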
View Independent Vehicle Make, Model and Color Recognition Using Convolutional Neural Network
This paper describes the details of Sighthound's fully automated vehicle
make, model and color recognition system. The backbone of our system is a deep
convolutional neural network that is not only computationally inexpensive, but
also provides state-of-the-art results on several competitive benchmarks.
Additionally, our deep network is trained on a large dataset of several million
images which are labeled through a semi-automated process. Finally, we test our
system on several public datasets as well as our own internal test dataset. Our
results show that we outperform other methods on all benchmarks by significant
margins. Our model is available to developers through the Sighthound Cloud API
at https://www.sighthound.com/products/cloud
Comment: 7 Page