What Catches the Eye? Visualizing and Understanding Deep Saliency Models
Deep convolutional neural networks have demonstrated high performance for fixation prediction in recent years. How they achieve this, however, is less explored, and they remain black-box models. Here, we attempt to shed light on the internal structure of deep saliency models and study what features they extract for fixation prediction. Specifically, we use a simple yet powerful architecture, consisting of only one CNN and a single-resolution input, combined with a new loss function for pixel-wise fixation prediction during free viewing of natural scenes. We show that our simple method is on par with or better than complicated state-of-the-art saliency models. Furthermore, we propose a method, related to saliency model evaluation metrics, to visualize deep models for fixation prediction. Our method reveals the inner representations of deep models for fixation prediction and provides evidence that saliency, as experienced by humans, is likely to involve high-level semantic knowledge in addition to low-level perceptual cues. Our results can be useful for measuring the gap between current saliency models and the human inter-observer model and for building new models to close this gap.
Engineering and Physical Sciences Research Council (EPSRC)
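The visualization method above builds on standard saliency evaluation metrics. As a hedged illustration only (not the authors' exact procedure), one widely used metric of this family, Normalized Scanpath Saliency (NSS), scores a predicted map by its standardized values at human fixation locations:

```python
import numpy as np

def nss(saliency_map, fixation_map):
    """Normalized Scanpath Saliency: the mean of the standardized
    (zero-mean, unit-variance) saliency values at fixated pixels.
    Higher is better; chance level is approximately 0."""
    s = (saliency_map - saliency_map.mean()) / (saliency_map.std() + 1e-8)
    return s[fixation_map.astype(bool)].mean()
```

A model that assigns high saliency exactly where humans fixate receives a large positive NSS, while a uniform or unrelated map scores near zero.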
Multi-scale Interactive Network for Salient Object Detection
Deep-learning-based salient object detection methods have made great progress.
However, the variable scales and unknown categories of salient objects remain
great challenges. These are closely related to the utilization of
multi-level and multi-scale features. In this paper, we propose aggregate
interaction modules to integrate features from adjacent levels; these introduce
less noise because only small up-/down-sampling rates are used.
To obtain more efficient multi-scale features from the integrated features,
self-interaction modules are embedded in each decoder unit. In addition, the
class-imbalance issue caused by scale variation weakens the effect of the
binary cross-entropy loss and results in spatially inconsistent predictions.
Therefore, we exploit the consistency-enhanced loss to highlight the
fore-/back-ground difference and preserve intra-class consistency.
Experimental results on five benchmark datasets demonstrate that the proposed
method without any post-processing performs favorably against 23
state-of-the-art approaches. The source code will be publicly available at
https://github.com/lartpang/MINet.
Comment: Accepted by CVPR 2020
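The consistency-enhanced loss is specified in the paper and the linked source code; as a rough sketch only, a Dice-style region term of the kind it resembles (this exact formulation is our assumption, not a quotation of the paper) penalizes soft false positives and false negatives over the whole map, which discourages spatially inconsistent fore-/back-ground predictions:

```python
import numpy as np

def region_consistency_loss(pred, gt, eps=1e-8):
    """Sketch of a region-level loss: `pred` holds probabilities in
    [0, 1], `gt` is the binary ground truth. Unlike per-pixel BCE, the
    loss is computed over region statistics, so it is less sensitive
    to the fore-/back-ground class imbalance caused by scale variation."""
    tp = (pred * gt).sum()         # soft true positives
    fp = (pred * (1 - gt)).sum()   # soft false positives
    fn = ((1 - pred) * gt).sum()   # soft false negatives
    return (fp + fn) / (tp + fp + fn + eps)
```

The loss is 0 for a perfect prediction and approaches 1 as the prediction and ground truth stop overlapping, independent of how small the salient object is.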
The effect of downsampling-upsampling strategy on foreground detection algorithms
Publisher's Bespoke License
Definitive version available at the indicated DOI.
Molina-Cabello, M. A., Garcia-Gonzalez, J., Luque-Baena, R. M., & López-Rubio, E. (2020). The effect of downsampling–upsampling strategy on foreground detection algorithms. Artificial Intelligence Review, 53, 4935-4965.
In video surveillance systems with stationary cameras, the first phase, moving-object detection, is crucial for correctly modelling the behavior of these objects, and it is also the most complex in terms of execution time. Many algorithms provide a reliable and adequate segmentation mask, achieving real-time rates for reduced image sizes. However, given the increased performance of camera hardware, applying these methods to sequences with higher resolutions (from 640x480 to 1920x1080) no longer runs in real time, compromising their use in real video surveillance systems. In this paper we propose a methodology to reduce the computational requirements of the algorithms, consisting of a reduction of the input frame and, subsequently, an interpolation of the segmentation mask of each method to recover the original frame size. In addition, the viability of this meta-model is analyzed together with the different selected algorithms, evaluating the quality of the resulting segmentation and its gain in computation time.
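The reduce-then-interpolate meta-model described above can be sketched as follows. The nearest-neighbour resizing and the `detector` interface are illustrative assumptions for a minimal, self-contained example; a real system would use a proper interpolation method and one of the foreground detection algorithms evaluated in the paper:

```python
import numpy as np

def downsample(frame, factor):
    """Nearest-neighbour downsampling by an integer factor (stand-in
    for the interpolation a production system would use)."""
    return frame[::factor, ::factor]

def upsample_mask(mask, factor):
    """Nearest-neighbour upsampling of a binary segmentation mask
    back to the original frame resolution."""
    return np.repeat(np.repeat(mask, factor, axis=0), factor, axis=1)

def detect_foreground_reduced(frame, background, detector, factor=2):
    """Meta-model: run a (hypothetical) foreground `detector` on the
    reduced frame, then interpolate its mask to full size. Detection
    cost scales with pixel count, hence roughly by 1/factor**2."""
    small_mask = detector(downsample(frame, factor),
                          downsample(background, factor))
    return upsample_mask(small_mask, factor)

# Usage with a toy thresholding detector (illustrative only):
detector = lambda f, b: (np.abs(f.astype(int) - b.astype(int)) > 50).astype(np.uint8)
frame = np.zeros((8, 8), np.uint8); frame[2:6, 2:6] = 255  # moving object
background = np.zeros((8, 8), np.uint8)
mask = detect_foreground_reduced(frame, background, detector, factor=2)
```

With a downsampling factor of 2, the detector processes a quarter of the pixels, which is the source of the computation-time gain the paper quantifies against the loss in segmentation quality.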