SFD: Single Shot Scale-invariant Face Detector
This paper presents a real-time face detector, named Single Shot
Scale-invariant Face Detector (SFD), which performs superiorly on various
scales of faces with a single deep neural network, especially for small faces.
Specifically, we address the common problem that the performance of
anchor-based detectors deteriorates dramatically as objects become smaller. We make contributions
in the following three aspects: 1) proposing a scale-equitable face detection
framework to handle different scales of faces well. We tile anchors on a wide
range of layers to ensure that all scales of faces have enough features for
detection. Besides, we design anchor scales based on the effective receptive
field and a proposed equal proportion interval principle; 2) improving the
recall rate of small faces by a scale compensation anchor matching strategy; 3)
reducing the false positive rate of small faces via a max-out background label.
As a consequence, our method achieves state-of-the-art detection performance on
all the common face detection benchmarks, including the AFW, PASCAL face, FDDB
and WIDER FACE datasets, and can run at 36 FPS on a Nvidia Titan X (Pascal) for
VGA-resolution images.

Comment: Accepted by ICCV 2017 + its supplementary materials; updated with the latest results on WIDER FACE.
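The anchor design described above can be illustrated with a minimal sketch. This is a simplified, hypothetical rendering of the idea that each detection layer receives one anchor scale tied to its stride so that scales of adjacent layers grow in equal proportion; the constants and function name here are illustrative, not taken from the paper's released code.

```python
def anchor_scales(strides, ratio=4):
    """One square anchor scale per detection layer, set to ratio * stride.

    Because the strides of adjacent layers grow by a constant factor (2x),
    the anchor scales also grow in equal proportion between layers, which
    is the gist of an equal-proportion interval design.
    """
    return [s * ratio for s in strides]

# Illustrative feature-map strides for six tiled detection layers.
strides = [4, 8, 16, 32, 64, 128]
print(anchor_scales(strides))  # [16, 32, 64, 128, 256, 512]
```

With the smallest stride being 4, the smallest anchor covers faces down to roughly 16 pixels, which is why tiling anchors on early, high-resolution layers matters for small faces.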
Deep Learning Features at Scale for Visual Place Recognition
The success of deep learning techniques in the computer vision domain has
triggered a range of initial investigations into their utility for visual place
recognition, all using generic features from networks that were trained for
other types of recognition tasks. In this paper, we train, at large scale, two
CNN architectures for the specific place recognition task and employ a
multi-scale feature encoding method to generate condition- and
viewpoint-invariant features. To enable this training to occur, we have
developed a massive Specific PlacEs Dataset (SPED) with hundreds of examples of
place appearance change at thousands of different places, as opposed to the
semantic place type datasets currently available. This new dataset enables us
to set up a training regime that interprets place recognition as a
classification problem. We comprehensively evaluate our trained networks on
several challenging benchmark place recognition datasets and demonstrate that
they achieve an average 10% increase in performance over other place
recognition algorithms and pre-trained CNNs. By analyzing the network responses
and their differences from pre-trained networks, we provide insights into what
a network learns when training for place recognition, and what these results
signify for future research in this area.

Comment: 8 pages, 10 figures. Accepted by the International Conference on Robotics and Automation (ICRA) 2017. This is the submitted version; the final published version may be slightly different.
Neural Nearest Neighbors Networks
Non-local methods exploiting the self-similarity of natural signals have been
well studied, for example in image analysis and restoration. Existing
approaches, however, rely on k-nearest neighbors (KNN) matching in a fixed
feature space. The main hurdle in optimizing this feature space w.r.t.
application performance is the non-differentiability of the KNN selection rule.
To overcome this, we propose a continuous deterministic relaxation of KNN
selection that maintains differentiability w.r.t. pairwise distances, but
retains the original KNN as the limit of a temperature parameter approaching
zero. To exploit our relaxation, we propose the neural nearest neighbors block
(N3 block), a novel non-local processing layer that leverages the principle of
self-similarity and can be used as a building block in modern neural network
architectures. We show its effectiveness for the set reasoning task of
correspondence classification as well as for image restoration, including image
denoising and single image super-resolution, where we outperform strong
convolutional neural network (CNN) baselines and recent non-local models that
rely on KNN selection in hand-chosen feature spaces.

Comment: To appear at NIPS*2018; code available at https://github.com/visinf/n3net
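The core relaxation described in this abstract can be sketched with a softmax over negative pairwise distances: for temperature t > 0 the neighbor weights are differentiable with respect to the distances, and as t approaches zero they concentrate on the true nearest neighbor, recovering hard selection. This is a minimal, simplified illustration of that limiting behavior, not the N3 block itself (which extends the idea to selecting k neighbors).

```python
import math

def soft_nn_weights(dists, t):
    """Continuous relaxation of 1-nearest-neighbor selection.

    Returns softmax(-d / t) over the distances: smooth and differentiable
    w.r.t. the distances for t > 0, approaching a one-hot indicator of the
    nearest neighbor as t -> 0.
    """
    logits = [-d / t for d in dists]
    m = max(logits)                       # subtract max for numerical stability
    exps = [math.exp(l - m) for l in logits]
    z = sum(exps)
    return [e / z for e in exps]

dists = [0.9, 0.1, 0.5]
print(soft_nn_weights(dists, t=1.0))   # smooth weights across all candidates
print(soft_nn_weights(dists, t=0.01))  # nearly one-hot at the nearest (index 1)
```

Because the weights stay differentiable for t > 0, gradients can flow back through the selection into the feature space that produced the distances, which is exactly the optimization hurdle the paper targets.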
LIFT: Learned Invariant Feature Transform
We introduce a novel Deep Network architecture that implements the full
feature point handling pipeline, that is, detection, orientation estimation,
and feature description. While previous works have successfully tackled each
one of these problems individually, we show how to learn to do all three in a
unified manner while preserving end-to-end differentiability. We then
demonstrate that our Deep pipeline outperforms state-of-the-art methods on a
number of benchmark datasets, without the need for retraining.

Comment: Accepted to ECCV 2016 (spotlight)
- …