A Dilated Inception Network for Visual Saliency Prediction
Recently, with the advent of deep convolutional neural networks (DCNN), the
improvements in visual saliency prediction research are impressive. One
possible direction to approach the next improvement is to fully characterize
the multi-scale saliency-influential factors with a computationally-friendly
module in DCNN architectures. In this work, we propose an end-to-end dilated
inception network (DINet) for visual saliency prediction. It captures
multi-scale contextual features effectively with very limited extra parameters.
Instead of utilizing parallel standard convolutions with different kernel sizes
as in the existing inception module, our proposed dilated inception module
(DIM) uses parallel dilated convolutions with different dilation rates, which
significantly reduces the computational load while enriching the diversity of
receptive fields in feature maps. Moreover, the performance of our saliency
model is further improved by using a set of linear normalization-based
probability distribution distance metrics as loss functions. As such, we can
formulate saliency prediction as a probability distribution prediction task for
global saliency inference instead of a typical pixel-wise regression problem.
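The distribution-prediction formulation above can be illustrated with a minimal sketch. The paper's exact set of linear normalization-based metrics is not given here; Kullback-Leibler divergence between globally normalized saliency maps is one common distribution-distance loss and is used purely as an assumed example:

```python
import numpy as np

def saliency_kl_loss(pred, gt, eps=1e-8):
    """KL divergence between predicted and ground-truth saliency maps,
    each linearly normalized into a probability distribution over all
    pixels. This is a sketch of one plausible distribution-distance
    loss, not necessarily the paper's exact metric."""
    p = pred / (pred.sum() + eps)   # predicted distribution
    q = gt / (gt.sum() + eps)       # ground-truth distribution
    return float(np.sum(q * np.log((q + eps) / (p + eps))))
```

Because the loss compares whole-map distributions rather than individual pixel values, it penalizes global misallocation of saliency mass, which is the point of treating prediction as distribution matching instead of pixel-wise regression.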
Experimental results on several challenging saliency benchmark datasets
demonstrate that our DINet with proposed loss functions can achieve
state-of-the-art performance with shorter inference time.

Comment: Accepted by IEEE Transactions on Multimedia. The source code is
available at https://github.com/ysyscool/DINe
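The dilated inception module described above can be sketched in a toy single-channel form. The fusion rule (summation) and the dilation rates are assumptions for illustration; the key property is that every branch reuses the same small kernel while the dilation rate stretches its receptive field:

```python
import numpy as np

def dilated_conv2d_same(x, kernel, rate):
    """Single-channel dilated convolution with 'same' padding (toy sketch).
    A 3x3 kernel at rate r samples the input at spacing r, covering a
    (2r+1)x(2r+1) receptive field with only 9 weights."""
    k = kernel.shape[0]                      # assume square, odd kernel
    pad = rate * (k - 1) // 2
    xp = np.pad(x, pad, mode="constant")
    out = np.zeros_like(x, dtype=float)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            patch = xp[i:i + rate * (k - 1) + 1:rate,
                       j:j + rate * (k - 1) + 1:rate]
            out[i, j] = np.sum(patch * kernel)
    return out

def dilated_inception_module(x, kernels, rates=(1, 2, 4)):
    """Parallel dilated-convolution branches sharing the kernel size but
    differing in dilation rate; branch outputs are fused here by summation
    (the fusion choice is an assumption of this sketch)."""
    return sum(dilated_conv2d_same(x, k, r) for k, r in zip(kernels, rates))
```

Compared with an inception module built from 3x3, 5x5, and 9x9 standard convolutions, the three 3x3 branches here reach the same receptive-field diversity with a fixed 9-weight budget per branch, which is the parameter saving the abstract refers to.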
Learning Fully Dense Neural Networks for Image Semantic Segmentation
Semantic segmentation is pixel-wise classification that retains critical
spatial information. "Feature map reuse" has been commonly adopted in
CNN-based approaches to take advantage of feature maps in the early layers
for later spatial reconstruction. Along this direction, we go a step further by
proposing a fully dense neural network with an encoder-decoder structure that
we abbreviate as FDNet. For each stage in the decoder module, feature maps of
all the previous blocks are adaptively aggregated and fed forward as input. On
the one hand, this reconstructs spatial boundaries accurately; on the other, it
eases gradient backpropagation and thus makes learning more efficient. In
addition, we propose a boundary-aware loss function to
focus more attention on pixels near object boundaries, which improves the
labeling of these "hard examples". FDNet achieves the best performance over
previous works on two benchmark datasets, PASCAL VOC 2012 and NYUDv2, when
training on additional datasets is not considered.
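One plausible realization of a boundary-aware loss is to upweight the pixel-wise cross-entropy near label boundaries. The weight value and the boundary definition below are assumptions of this sketch, not FDNet's published formulation:

```python
import numpy as np

def boundary_weight_map(labels, w_boundary=2.0):
    """Per-pixel weights emphasizing label-boundary pixels. A pixel is
    marked as boundary if its class differs from its right or bottom
    neighbour (an assumed, simple boundary definition)."""
    dy = np.zeros_like(labels, dtype=bool)
    dx = np.zeros_like(labels, dtype=bool)
    dy[:-1, :] = labels[:-1, :] != labels[1:, :]
    dx[:, :-1] = labels[:, :-1] != labels[:, 1:]
    return np.where(dy | dx, w_boundary, 1.0)

def boundary_aware_ce(probs, labels, w_boundary=2.0, eps=1e-8):
    """Boundary-weighted pixel-wise cross-entropy. probs is an (H, W, C)
    softmax output; labels is an (H, W) integer class map."""
    H, W, _ = probs.shape
    w = boundary_weight_map(labels, w_boundary)
    # probability assigned to the true class at each pixel
    p_true = probs[np.arange(H)[:, None], np.arange(W)[None, :], labels]
    return float(np.mean(w * -np.log(p_true + eps)))
```

Since boundary pixels contribute more to the average, the optimizer is pushed to sharpen predictions exactly where segmentation masks are typically blurriest, matching the "hard examples" motivation in the abstract.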