Detail-Preserving Pooling in Deep Networks
Most convolutional neural networks use some method for gradually downscaling
the size of the hidden layers. This is commonly referred to as pooling, and is
applied to reduce the number of parameters, improve invariance to certain
distortions, and increase the receptive field size. Since pooling by nature is
a lossy process, it is crucial that each such layer maintains the portion of
the activations that is most important for the network's discriminability. Yet,
simple maximization or averaging over blocks, max or average pooling, or plain
downsampling in the form of strided convolutions are the standard. In this
paper, we aim to leverage recent results on image downscaling for the purposes
of deep learning. Inspired by the human visual system, which focuses on local
spatial changes, we propose detail-preserving pooling (DPP), an adaptive
pooling method that magnifies spatial changes and preserves important
structural detail. Importantly, its parameters can be learned jointly with the
rest of the network. We analyze some of its theoretical properties and show its
empirical benefits on several datasets and networks, where DPP consistently
outperforms previous pooling approaches.
Comment: To appear at CVPR 201
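To make the baselines named in this abstract concrete, the following is a minimal sketch of max pooling, average pooling, and plain strided subsampling over non-overlapping blocks. It does not implement DPP itself; the function name `pool2d`, its `mode` argument, and the toy `feature_map` are assumptions introduced only for illustration. The `"stride"` mode keeps the top-left sample of each block, which corresponds to a strided convolution whose kernel has weight 1 at the top-left position and 0 elsewhere.

```python
def pool2d(x, size=2, mode="max"):
    """Downscale a 2D list of numbers using non-overlapping size x size blocks.

    Hypothetical helper for illustration; not taken from the paper.
    """
    rows, cols = len(x), len(x[0])
    out = []
    for r in range(0, rows - size + 1, size):
        out_row = []
        for c in range(0, cols - size + 1, size):
            # Flatten the current block in row-major order.
            block = [x[r + i][c + j] for i in range(size) for j in range(size)]
            if mode == "max":
                out_row.append(max(block))
            elif mode == "avg":
                out_row.append(sum(block) / len(block))
            elif mode == "stride":
                # Plain subsampling: keep only the top-left activation.
                out_row.append(block[0])
            else:
                raise ValueError("unknown mode: " + mode)
        out.append(out_row)
    return out


# Toy 4x4 activation map (made-up values).
feature_map = [
    [1, 2, 0, 0],
    [3, 4, 0, 8],
    [5, 5, 1, 1],
    [5, 5, 1, 1],
]

max_pooled = pool2d(feature_map, mode="max")
avg_pooled = pool2d(feature_map, mode="avg")
strided = pool2d(feature_map, mode="stride")
```

In the top-right block, the single strong activation (8) survives max pooling, is diluted to 2.0 by averaging, and is discarded entirely by strided subsampling. This is the kind of information loss the abstract refers to.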
Exploring Different Dimensions of Attention for Uncertainty Detection
Neural networks with attention have proven effective for many natural
language processing tasks. In this paper, we develop attention mechanisms for
uncertainty detection. In particular, we generalize standardly used attention
mechanisms by introducing external attention and sequence-preserving attention.
These novel architectures differ from standard approaches in that they use
external resources to compute attention weights and preserve sequence
information. We compare them to other configurations along different dimensions
of attention. Our novel architectures set a new state of the art on a
Wikipedia benchmark dataset and perform similarly to the state-of-the-art model
on a biomedical benchmark which uses a large set of linguistic features.
Comment: accepted at EACL 201
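As a point of reference for the "standardly used attention mechanisms" mentioned in this abstract, the sketch below shows one common form: dot-product scores between hidden states and a query vector, normalized with a softmax and used to form a weighted sum. It does not reproduce the external or sequence-preserving attention proposed in the paper; the names `softmax`, `attend`, `hidden_states`, and `query` are assumptions chosen for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(hidden_states, query):
    """Return (weights, context) for dot-product attention.

    hidden_states: list of equal-length vectors, one per token.
    query: vector of the same length (a learned parameter in practice).
    """
    # One scalar score per token: dot product with the query.
    scores = [sum(h_k * q_k for h_k, q_k in zip(h, query)) for h in hidden_states]
    weights = softmax(scores)
    dim = len(hidden_states[0])
    # The context vector is the attention-weighted sum of hidden states.
    context = [
        sum(w * h[k] for w, h in zip(weights, hidden_states)) for k in range(dim)
    ]
    return weights, context


# Toy input: three token vectors and a query (made-up values).
states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights, context = attend(states, query=[2.0, 0.0])
```

Note that the weighted sum collapses the token sequence into a single vector, so ordering information is not retained in `context`.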
Learned Perceptual Image Enhancement
Learning a typical image enhancement pipeline involves minimization of a loss
function between enhanced and reference images. While L1 and L2 losses are
perhaps the most widely used functions for this purpose, they do not
necessarily lead to perceptually compelling results. In this paper, we show
that adding a learned no-reference image quality metric to the loss can
significantly improve enhancement operators. This metric is implemented using a
CNN (convolutional neural network) trained on a large-scale dataset labelled
with aesthetic preferences of human raters. This loss allows us to conveniently
perform back-propagation in our learning framework to simultaneously optimize
for similarity to a given ground truth reference and perceptual quality. This
perceptual loss is only used to train parameters of image processing operators,
and does not impose any extra complexity at inference time. Our experiments
demonstrate that this loss can be effective for tuning a variety of operators
such as local tone mapping and dehazing.
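The idea of adding a quality term to a reference-based loss can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's formulation: the quality score is assumed to lie in [0, 1] with higher meaning better, the `weight` value is arbitrary, and `toy_quality` is a hypothetical stand-in for the learned CNN metric described in the abstract.

```python
def l1_loss(enhanced, reference):
    """Mean absolute difference between two equal-length pixel lists."""
    return sum(abs(e - r) for e, r in zip(enhanced, reference)) / len(enhanced)


def combined_loss(enhanced, reference, quality_fn, weight=0.1):
    """Fidelity term plus a weighted no-reference quality penalty.

    quality_fn: callable returning a score in [0, 1] (assumption);
    the penalty is small when the enhanced image scores well.
    """
    fidelity = l1_loss(enhanced, reference)
    quality_penalty = 1.0 - quality_fn(enhanced)
    return fidelity + weight * quality_penalty


def toy_quality(pixels):
    """Hypothetical placeholder metric: dynamic range clipped to [0, 1].

    A real system would use a trained network here.
    """
    return min(1.0, max(pixels) - min(pixels))


reference = [0.0, 0.5, 1.0]
loss_exact = combined_loss([0.0, 0.5, 1.0], reference, toy_quality)
loss_flat = combined_loss([0.5, 0.5, 0.5], reference, toy_quality)
```

Because the quality term only contributes to the training objective, an operator tuned with such a loss runs unchanged at inference time, which matches the abstract's remark about no extra inference cost.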