Spatiotemporal Knowledge Distillation for Efficient Estimation of Aerial Video Saliency
Video saliency estimation techniques have advanced significantly along with
the rapid development of Convolutional Neural Networks (CNNs). However,
devices like cameras and drones may have limited computational capability and
storage space, making the direct deployment of complex deep saliency models
infeasible. To address this problem, this
paper proposes a dynamic saliency estimation approach for aerial videos via
spatiotemporal knowledge distillation. This approach involves five
components: two teachers, two students, and the desired spatiotemporal model.
The knowledge of spatial and temporal saliency is first transferred
separately from the two complex, redundant teachers to their simple, compact
students, and the input scenes are also degraded from high resolution to low
resolution to remove probable data redundancy and greatly speed up feature
extraction. After that, the desired spatiotemporal model
is further trained by distilling and encoding the spatial and temporal
saliency knowledge of the two students into a unified network. In this
manner, the
inter-model redundancy can be further removed for the effective estimation of
dynamic saliency on aerial videos. Experimental results show that the
proposed approach outperforms ten state-of-the-art models in estimating
visual saliency on aerial videos, while running at up to 28,738 FPS on a GPU.
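As a rough illustration of the two-stage distillation idea described above
(teacher-to-student transfer on downscaled inputs, then fusing the two
students' knowledge into one spatiotemporal network), here is a minimal
PyTorch sketch. The temperature, losses, and function names are illustrative
assumptions, not the paper's implementation:

```python
import torch
import torch.nn.functional as F

def distill_step(teacher, student, frames, optimizer, t=4.0):
    """One teacher-to-student step: the student mimics the teacher's softened
    saliency output on downscaled input. The temperature t and MSE loss are
    conventional distillation choices, assumed here for illustration."""
    with torch.no_grad():
        target = torch.sigmoid(teacher(frames) / t)
    small = F.interpolate(frames, scale_factor=0.5, mode='bilinear',
                          align_corners=False)
    pred = torch.sigmoid(student(small) / t)
    loss = F.mse_loss(pred, F.interpolate(target, size=pred.shape[-2:],
                                          mode='bilinear', align_corners=False))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

def fuse_step(spatial_student, temporal_student, unified, frames, optimizer):
    """Second stage: the unified spatiotemporal model distills from the
    spatial and temporal students simultaneously."""
    with torch.no_grad():
        s_map = torch.sigmoid(spatial_student(frames))
        t_map = torch.sigmoid(temporal_student(frames))
    pred = torch.sigmoid(unified(frames))
    loss = F.mse_loss(pred, s_map) + F.mse_loss(pred, t_map)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```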
Bottom-up Attention, Models of
In this review, we examine the recent progress in saliency prediction and
propose several avenues for future research. In spite of tremendous efforts
and huge progress, there is still room for improvement in terms of
finer-grained analysis of deep saliency models, evaluation measures,
datasets, annotation methods, cognitive studies, and new applications. This
chapter will appear in the Encyclopedia of Computational Neuroscience.
Comment: arXiv admin note: substantial text overlap with arXiv:1810.0371
Saliency Prediction in the Deep Learning Era: Successes, Limitations, and Future Challenges
Visual saliency models have enjoyed a big leap in performance in recent
years, thanks to advances in deep learning and large-scale annotated data.
Despite enormous effort and huge breakthroughs, however, models still fall
short of human-level accuracy. In this work, I explore the landscape of the
field, emphasizing new deep saliency models, benchmarks, and datasets.
A large number of image and video saliency models are reviewed and compared
over two image benchmarks and two large scale video datasets. Further, I
identify factors that contribute to the gap between models and humans and
discuss remaining issues that need to be addressed to build the next generation
of more powerful saliency models. Some specific questions that are addressed
include: in what ways current models fail, how to remedy them, what can be
learned from cognitive studies of attention, how explicit saliency judgments
relate to fixations, how to conduct fair model comparison, and what the
emerging applications of saliency models are.
Salient Object Detection in the Deep Learning Era: An In-Depth Survey
As an essential problem in computer vision, salient object detection (SOD)
has attracted an increasing amount of research attention over the years. Recent
advances in SOD are predominantly led by deep learning-based solutions (named
deep SOD). To enable in-depth understanding of deep SOD, in this paper, we
provide a comprehensive survey covering various aspects, ranging from algorithm
taxonomy to unsolved issues. In particular, we first review deep SOD algorithms
from different perspectives, including network architecture, level of
supervision, learning paradigm, and object-/instance-level detection. Following
that, we summarize and analyze existing SOD datasets and evaluation metrics.
Then, we benchmark a large group of representative SOD models, and provide
detailed analyses of the comparison results. Moreover, we study the performance
of SOD algorithms under different attribute settings, which has not been
thoroughly explored previously, by constructing a novel SOD dataset with rich
attribute annotations covering various salient object types, challenging
factors, and scene categories. We further analyze, for the first time in the
field, the robustness of SOD models to random input perturbations and
adversarial attacks. We also look into the generalization and difficulty of
existing SOD datasets. Finally, we discuss several open issues of SOD and
outline future research directions.
Comment: Published in IEEE TPAMI. All the saliency prediction maps, our
constructed dataset with annotations, and codes for evaluation are publicly
available at https://github.com/wenguanwang/SODsurvey
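Surveys of this kind typically benchmark models with metrics such as mean
absolute error (MAE) and the F-measure. For reference, here is a minimal
NumPy sketch of these two standard SOD metrics; the adaptive threshold and
beta^2 = 0.3 weighting are the conventional choices in the SOD literature,
not details taken from this paper:

```python
import numpy as np

def mae(pred, gt):
    """Mean absolute error between a predicted saliency map and the ground
    truth mask, both arrays scaled to [0, 1]."""
    return float(np.abs(pred - gt).mean())

def f_measure(pred, gt, beta2=0.3):
    """F-measure at an adaptive threshold (twice the mean saliency), with the
    beta^2 = 0.3 weighting conventional in SOD evaluation."""
    thresh = min(2.0 * pred.mean(), 1.0)
    binary = pred >= thresh
    tp = np.logical_and(binary, gt > 0.5).sum()
    precision = tp / max(binary.sum(), 1)
    recall = tp / max((gt > 0.5).sum(), 1)
    denom = beta2 * precision + recall
    return (1 + beta2) * precision * recall / denom if denom > 0 else 0.0
```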
SG-FCN: A Motion and Memory-Based Deep Learning Model for Video Saliency Detection
Data-driven saliency detection has attracted strong interest as a result of
applying convolutional neural networks to the detection of eye fixations.
Although a number of image-based salient object and fixation detection models
have been proposed, video fixation detection still requires more exploration.
Different from image analysis, motion and temporal information is a crucial
factor affecting human attention when viewing video sequences. Although
existing models based on local contrast and low-level features have been
extensively researched, they fail to simultaneously consider interframe
motion and temporal information across neighboring video frames, leading to
unsatisfactory performance when handling complex scenes. To this end, we
propose a novel and efficient video eye fixation detection model to improve the
saliency detection performance. By simulating the memory mechanism and visual
attention mechanism of human beings when watching a video, we propose a
step-gained fully convolutional network that combines the memory information on
the time axis with the motion information on the space axis while storing the
saliency information of the current frame. The model is obtained through
hierarchical training, which ensures the accuracy of the detection. Extensive
experiments in comparison with 11 state-of-the-art methods are carried out, and
the results show that our proposed model outperforms all 11 methods across a
number of publicly available datasets.
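To make the time-axis/space-axis fusion concrete, the following is a toy
PyTorch sketch of the general idea of combining a carried-over saliency
memory with a motion cue from consecutive frames. The layer sizes and module
names are illustrative assumptions, not the SG-FCN architecture:

```python
import torch
import torch.nn as nn

class MemoryMotionSaliency(nn.Module):
    """Toy model: fuse the previous frame's saliency map (memory, time axis)
    with a motion cue from a stacked frame pair (space axis). Illustrative
    only; not the SG-FCN architecture."""
    def __init__(self, ch=32):
        super().__init__()
        self.spatial = nn.Conv2d(3, ch, 3, padding=1)  # current-frame appearance
        self.motion = nn.Conv2d(6, ch, 3, padding=1)   # stacked frame pair as a motion cue
        self.memory = nn.Conv2d(2 * ch + 1, ch, 3, padding=1)
        self.head = nn.Conv2d(ch, 1, 1)

    def forward(self, prev_frame, cur_frame, prev_saliency):
        s = torch.relu(self.spatial(cur_frame))
        m = torch.relu(self.motion(torch.cat([prev_frame, cur_frame], dim=1)))
        fused = torch.relu(self.memory(torch.cat([s, m, prev_saliency], dim=1)))
        return torch.sigmoid(self.head(fused))  # carried forward as memory for the next frame
```

At the first frame, prev_saliency can simply be initialized to zeros.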
Deep Visual Attention Prediction
In this work, we aim to predict human eye fixation with view-free scenes
based on an end-to-end deep learning architecture. Although Convolutional
Neural Networks (CNNs) have brought substantial improvements to human
attention prediction, CNN-based attention models can still be improved by
efficiently leveraging multi-scale features. Our visual attention network is
proposed to capture hierarchical saliency information from deep, coarse layers
with global saliency information to shallow, fine layers with local saliency
response. Our model is based on a skip-layer network structure, which predicts
human attention from multiple convolutional layers with various receptive
fields. Final saliency prediction is achieved via the cooperation of those
global and local predictions. Our model is learned in a deep supervision
manner, where supervision is directly fed into multi-level layers, instead of
previous approaches of providing supervision only at the output layer and
propagating this supervision back to earlier layers. Our model thus
incorporates multi-level saliency predictions within a single network, which
significantly decreases the redundancy of previous approaches of learning
multiple network streams with different input scales. Extensive experimental
analysis on various challenging benchmark datasets demonstrates that our
method yields state-of-the-art performance with competitive inference time.
Comment: W. Wang and J. Shen. Deep visual attention prediction. IEEE TIP,
27(5):2368-2378, 2018. Code and results can be found at
https://github.com/wenguanwang/deepattentio
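The multi-level deep supervision scheme described above can be illustrated
with a short PyTorch sketch: each backbone stage emits its own saliency map,
supervision is applied to every map, and the maps are fused for the final
prediction. The backbone, channel widths, and loss below are assumptions for
brevity, not the paper's exact network:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SkipLayerSaliency(nn.Module):
    """Toy skip-layer network: every stage predicts its own saliency map so
    that supervision can be fed directly into multiple levels, and the side
    outputs are fused. Illustrative; not the paper's exact architecture."""
    def __init__(self):
        super().__init__()
        self.stages = nn.ModuleList([
            nn.Sequential(nn.Conv2d(cin, cout, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2))
            for cin, cout in [(3, 16), (16, 32), (32, 64)]
        ])
        self.heads = nn.ModuleList([nn.Conv2d(c, 1, 1) for c in (16, 32, 64)])
        self.fuse = nn.Conv2d(3, 1, 1)

    def forward(self, x):
        size = x.shape[-2:]
        side, h = [], x
        for stage, head in zip(self.stages, self.heads):
            h = stage(h)
            side.append(F.interpolate(head(h), size=size, mode='bilinear',
                                      align_corners=False))
        fused = self.fuse(torch.cat(side, dim=1))
        return side, fused

def deep_supervision_loss(side, fused, gt):
    # Supervise every side output plus the fused map, not just the final layer.
    maps = side + [fused]
    return sum(F.binary_cross_entropy_with_logits(m, gt) for m in maps) / len(maps)
```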
cvpaper.challenge in 2016: Futuristic Computer Vision through 1,600 Papers Survey
The paper presents the futuristic challenges discussed in the
cvpaper.challenge. In 2015 and 2016, we thoroughly studied 1,600+ papers in
several conferences/journals such as CVPR/ICCV/ECCV/NIPS/PAMI/IJCV.
Recurrent Mixture Density Network for Spatiotemporal Visual Attention
In many computer vision tasks, the relevant information for solving the
problem at hand is mixed with irrelevant, distracting information. This has
motivated
researchers to design attentional models that can dynamically focus on parts of
images or videos that are salient, e.g., by down-weighting irrelevant pixels.
In this work, we propose a spatiotemporal attentional model that learns where
to look in a video directly from human fixation data. We model visual attention
with a mixture of Gaussians at each frame. This distribution is used to express
the probability of saliency for each pixel. Time consistency in videos is
modeled hierarchically by: 1) deep 3D convolutional features to represent
spatial and short-term time relations and 2) a long short-term memory network
on top that aggregates the clip-level representation of sequential clips and
therefore expands the temporal domain from a few frames to seconds. The
parameters of the proposed model are optimized via maximum likelihood
estimation using human fixations as training data, without knowledge of the
action in each video. Our experiments on Hollywood2 show state-of-the-art
performance on saliency prediction for video. We also show that our attentional
model trained on Hollywood2 generalizes well to UCF101 and it can be leveraged
to improve action classification accuracy on both datasets.
Comment: ICLR 201
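The per-frame mixture-of-Gaussians model trained by maximum likelihood on
fixations can be sketched briefly in PyTorch. In the sketch below, the
feature dimension, number of components, and isotropic-covariance
simplification are assumptions for brevity, not the paper's exact
parameterization:

```python
import math
import torch
import torch.nn as nn

class MixtureDensityHead(nn.Module):
    """Toy mixture-density head: maps a clip-level feature vector to the
    parameters of a K-component Gaussian mixture over normalized (x, y)
    fixation locations. Isotropic covariances are an illustrative
    simplification."""
    def __init__(self, feat_dim=512, k=20):
        super().__init__()
        self.k = k
        # Per component: weight logit, mean x, mean y, log std dev.
        self.params = nn.Linear(feat_dim, k * 4)

    def forward(self, feat):
        p = self.params(feat).view(-1, self.k, 4)
        log_pi = torch.log_softmax(p[..., 0], dim=-1)  # mixture weights
        mu = torch.sigmoid(p[..., 1:3])                # means in [0, 1] image coords
        sigma = torch.exp(p[..., 3]).clamp(min=1e-3)   # isotropic std dev
        return log_pi, mu, sigma

def fixation_nll(log_pi, mu, sigma, fix):
    """Negative log-likelihood of observed fixations (batch, 2) under the
    mixture; minimizing this is the maximum-likelihood training objective."""
    d2 = ((fix.unsqueeze(1) - mu) ** 2).sum(-1)  # squared distance to each mean
    log_prob = (log_pi - d2 / (2 * sigma ** 2)
                - 2 * torch.log(sigma) - math.log(2 * math.pi))
    return -torch.logsumexp(log_prob, dim=-1).mean()
```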
A Review of Co-saliency Detection Technique: Fundamentals, Applications, and Challenges
Co-saliency detection is a newly emerging and rapidly growing research area
in the computer vision community. As a novel branch of visual saliency,
co-saliency
detection refers to the discovery of common and salient foregrounds from two or
more relevant images, and can be widely used in many computer vision tasks. The
existing co-saliency detection algorithms mainly consist of three components:
extracting effective features to represent the image regions, exploring the
informative cues or factors to characterize co-saliency, and designing
effective computational frameworks to formulate co-saliency. Although numerous
methods have been developed, the literature is still lacking a deep review and
evaluation of co-saliency detection techniques. In this paper, we aim at
providing a comprehensive review of the fundamentals, challenges, and
applications of co-saliency detection. Specifically, we provide an overview of
some related computer vision works, review the history of co-saliency
detection, summarize and categorize the major algorithms in this research area,
discuss some open issues in this area, present the potential applications of
co-saliency detection, and finally point out some unsolved challenges and
promising future works. We expect this review to be beneficial to both new
and senior researchers in this field, and to give researchers in other
related areas insights into the utility of co-saliency detection algorithms.
Comment: 28 pages, 12 figures, 3 tables
cvpaper.challenge in 2015 - A review of CVPR2015 and DeepSurvey
The "cvpaper.challenge" is a group composed of members from AIST, Tokyo Denki
Univ. (TDU), and Univ. of Tsukuba that aims to systematically summarize papers
on computer vision, pattern recognition, and related fields. For this
particular review, we focused on reading all 602 conference papers presented
at CVPR2015, the premier annual computer vision event held in June 2015, in
order to grasp the trends in the field. Further, we propose "DeepSurvey" as a
mechanism embodying the entire process, from reading all the papers, through
the generation of ideas, to the writing of papers.
Comment: Survey Paper