83 research outputs found
Simple vs complex temporal recurrences for video saliency prediction
This paper investigates extending an existing neural network architecture for static saliency prediction with two types of recurrences that integrate information from the temporal domain. The first modification is the addition of a ConvLSTM within the architecture, while the second is a conceptually simple exponential moving average of an internal convolutional state. We use weights pre-trained on the SALICON dataset and fine-tune our model on DHF1K. Our results show that both modifications achieve state-of-the-art performance and produce similar saliency maps. Source code is available at https://git.io/fjPiB
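The second, simpler recurrence described above is an exponential moving average (EMA) of an internal convolutional state. A minimal NumPy sketch of the idea, assuming a fixed smoothing factor `alpha` (in the actual model this runs over internal feature maps of the network, and the decay value may be chosen differently):

```python
import numpy as np

def ema_recurrence(features, alpha=0.1):
    """Exponential moving average over a sequence of feature maps.

    `features`: iterable of (H, W, C) arrays, e.g. activations from a
    static saliency backbone; `alpha` is an illustrative smoothing
    factor, not the paper's exact value.
    """
    state = None
    outputs = []
    for x in features:
        # Blend the new frame's features into the running state.
        state = x if state is None else alpha * x + (1 - alpha) * state
        outputs.append(state)
    return outputs
```

Unlike a ConvLSTM, this recurrence adds no learnable parameters beyond the (optional) decay, which is what makes the comparison in the paper interesting.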
Unified Image and Video Saliency Modeling
Visual saliency modeling for images and videos is treated as two independent
tasks in recent computer vision literature. While image saliency modeling is a
well-studied problem and progress on benchmarks like SALICON and MIT300 is
slowing, video saliency models have shown rapid gains on the recent DHF1K
benchmark. Here, we take a step back and ask: Can image and video saliency
modeling be approached via a unified model, with mutual benefit? We identify
different sources of domain shift between image and video saliency data and
between different video saliency datasets as a key challenge for effective
joint modeling. To address this, we propose four novel domain adaptation
techniques - Domain-Adaptive Priors, Domain-Adaptive Fusion, Domain-Adaptive
Smoothing and Bypass-RNN - in addition to an improved formulation of learned
Gaussian priors. We integrate these techniques into a simple and lightweight
encoder-RNN-decoder-style network, UNISAL, and train it jointly with image and
video saliency data. We evaluate our method on the video saliency datasets
DHF1K, Hollywood-2 and UCF-Sports, and the image saliency datasets SALICON and
MIT300. With one set of parameters, UNISAL achieves state-of-the-art
performance on all video saliency datasets and is on par with the
state-of-the-art for image saliency datasets, despite a faster runtime and a 5-
to 20-fold smaller model size compared to all competing deep methods. We provide
retrospective analyses and ablation studies which confirm the importance of the
domain shift modeling. The code is available at
https://github.com/rdroste/unisal
Comment: Presented at the European Conference on Computer Vision (ECCV) 2020.
R. Droste and J. Jiao contributed equally to this work. v3: Updated Fig. 5a)
and added new MIT300 benchmark results to supp. material.
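Among the techniques listed in the abstract, the learned Gaussian priors are easy to sketch: a spatial prior map rendered from a few parameters, which in UNISAL would be learned and kept separate per dataset/domain (domain-adaptive priors). The parameter values below are illustrative assumptions, not the paper's:

```python
import numpy as np

def gaussian_prior(h, w, mu_x=0.5, mu_y=0.5, sigma_x=0.25, sigma_y=0.25):
    """Render a 2-D Gaussian prior map over an h x w grid.

    Parameters are in normalized [0, 1] image coordinates. A model can
    keep one learned (mu, sigma) set per domain and add or multiply the
    resulting map into its saliency decoder output.
    """
    ys, xs = np.mgrid[0:h, 0:w]
    xs = (xs + 0.5) / w  # pixel-center coordinates in [0, 1]
    ys = (ys + 0.5) / h
    g = np.exp(-((xs - mu_x) ** 2 / (2 * sigma_x ** 2)
                 + (ys - mu_y) ** 2 / (2 * sigma_y ** 2)))
    return g / g.sum()  # normalize to a probability map
```

The default center position reflects the well-known center bias of human fixations on standard image and video datasets.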
Problems with Saliency Maps
Despite the popularity that saliency models have gained in the computer vision community, they are most often conceived, exploited and benchmarked without taking heed of a number of problems and subtle issues they bring about. When saliency maps are used as proxies for the likelihood of fixating a location in a viewed scene, one such issue is the temporal dimension of visual attention deployment. Through a simple simulation, it is shown how neglecting this dimension leads to results that, at best, cast shadows on the predictive performance of a model and its assessment via benchmarking procedures.
WinDB: HMD-free and Distortion-free Panoptic Video Fixation Learning
To date, the widely-adopted way to perform fixation collection in panoptic
video is based on a head-mounted display (HMD), where participants' fixations
are collected while wearing an HMD to explore the given panoptic scene freely.
However, this widely-used data collection method is insufficient for training
deep models to accurately predict which regions in a given panoptic scene are
most important when it contains intermittent salient events. The main reason is
there always exist "blind zooms" when using HMD to collect fixations since the
participants cannot keep spinning their heads to explore the entire panoptic
scene all the time. Consequently, the collected fixations tend to be trapped in
some local views, leaving the remaining areas to be the "blind zooms".
Therefore, fixation data collected using HMD-based methods that accumulate
local views cannot accurately represent the overall global importance of
complex panoramic scenes. This paper introduces the auxiliary Window with
Dynamic Blurring (WinDB) fixation collection approach for panoptic video, which
requires no HMD and is free of blind zooms. Thus, the collected fixations can
faithfully reflect the region-wise importance of the scene. Using our WinDB
approach, we have released a new PanopticVideo-300 dataset, containing 300
panoptic clips covering over 225 categories. In addition, we present a simple
baseline design that takes full advantage of PanopticVideo-300 to handle the
fixation shifting problem induced by its blind-zoom-free attribute.
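The core idea of an auxiliary window with dynamic blurring can be sketched as follows: keep a clear window around the current gaze point and blur the rest of the panoramic frame, so a viewer on a flat display is guided like an HMD wearer yet can still reach every region. The box blur and the circular window below are simplifying assumptions, not the authors' implementation:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def windb_frame(frame, cx, cy, radius, blur_k=9):
    """Keep a clear circular window around gaze point (cx, cy) and
    blur the remainder of a single-channel panoramic frame.

    `blur_k` is an odd box-blur kernel size; all values here are
    illustrative, not the parameters used in the WinDB paper.
    """
    h, w = frame.shape
    pad = blur_k // 2
    padded = np.pad(frame, pad, mode='edge')
    # Box blur: mean over each blur_k x blur_k neighborhood.
    blurred = sliding_window_view(padded, (blur_k, blur_k)).mean(axis=(-1, -2))
    ys, xs = np.mgrid[0:h, 0:w]
    # Circular clear window around the gaze point.
    mask = (ys - cy) ** 2 + (xs - cx) ** 2 <= radius ** 2
    return np.where(mask, frame, blurred)
```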
How to look next? A data-driven approach for scanpath prediction
By and large, current visual attention models mostly rely, when considering static stimuli, on the following procedure. Given an image, a saliency map is computed, which, in turn, might serve to predict a sequence of gaze shifts, namely a scanpath instantiating the dynamics of visual attention deployment. The temporal pattern of attention unfolding is thus confined to the scanpath generation stage, whilst salience is conceived as a static map, at best conflating a number of factors (bottom-up information, top-down cues, spatial biases, etc.). In this note we propose a novel sequential scheme consisting of three processing stages that rely on a center-bias model, a context/layout model, and an object-based model, respectively. Each stage contributes, at different times, to the sequential sampling of the final scanpath. We compare the method against classic scanpath generation that exploits a state-of-the-art static saliency model. Results show that accounting for the structure of the temporal unfolding leads to gaze dynamics close to human gaze behaviour.
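The three-stage sequential sampling can be sketched as mixing the three stage maps with time-varying weights: early fixations lean on the center bias, later ones on the context/layout and object maps. The weighting schedule below is an illustrative assumption, not the authors' exact formulation:

```python
import numpy as np

def sample_scanpath(center_map, layout_map, object_map, n_fix=5, rng=None):
    """Sample a scanpath of n_fix fixations from three stage maps.

    Each map is an (H, W) non-negative array; the quadratic weighting
    schedule over the scanpath is a hypothetical choice for this sketch.
    """
    rng = np.random.default_rng(rng)
    maps = np.stack([m / m.sum() for m in (center_map, layout_map, object_map)])
    h, w = center_map.shape
    path = []
    for t in range(n_fix):
        s = t / max(n_fix - 1, 1)  # progress 0 -> 1 over the scanpath
        # Early: center bias; middle: context/layout; late: objects.
        weights = np.array([(1 - s) ** 2, 2 * s * (1 - s), s ** 2])
        prob = np.tensordot(weights, maps, axes=1).ravel()
        prob /= prob.sum()
        idx = rng.choice(h * w, p=prob)
        path.append((idx // w, idx % w))
    return path
```

Sampling from a probability map, rather than taking argmax, is what gives the generated scanpaths the stochastic variability observed in human gaze data.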
- …