841 research outputs found

Saliency Prediction in the Deep Learning Era: Successes, Limitations, and Future Challenges

Author: Borji Ali
Publication date: 24/05/2019

Visual saliency models have enjoyed a big leap in performance in recent years, thanks to advances in deep learning and large-scale annotated data. Despite enormous effort and huge breakthroughs, however, models still fall short of human-level accuracy. In this work, I explore the landscape of the field with an emphasis on new deep saliency models, benchmarks, and datasets. A large number of image and video saliency models are reviewed and compared over two image benchmarks and two large-scale video datasets. Further, I identify factors that contribute to the gap between models and humans and discuss remaining issues that need to be addressed to build the next generation of more powerful saliency models. Some specific questions that are addressed include: in what ways current models fail, how to remedy them, what can be learned from cognitive studies of attention, how explicit saliency judgments relate to fixations, how to conduct fair model comparison, and what the emerging applications of saliency models are.

arXiv.org e-Print Archive

Spatio-Temporal Saliency Networks for Dynamic Saliency Prediction

Author: Bak Cagdas
Erdem Aykut
Erdem Erkut
Kocak Aysun
Publication date: 15/11/2017

Computational saliency models for still images have gained significant popularity in recent years. Saliency prediction from videos, on the other hand, has received relatively little interest from the community. Motivated by this, we study the use of deep learning for dynamic saliency prediction and propose the so-called spatio-temporal saliency networks. The key to our models is a two-stream network architecture, in which we investigate different fusion mechanisms to integrate spatial and temporal information. We evaluate our models on the DIEM and UCF-Sports datasets and present highly competitive results against the existing state-of-the-art models. We also carry out some experiments on a number of still images from the MIT300 dataset by exploiting the optical flow maps predicted from these images. Our results show that considering inherent motion information in this way can be helpful for static saliency estimation.

arXiv.org e-Print Archive

Object-based visual attention for computer vision

Author: Fisher Robert
Sun Yaoru
Publication venue: Elsevier Science B.V.
Publication date: 31/05/2003

In this paper, a novel model of object-based visual attention extending Duncan's Integrated Competition Hypothesis [Phil. Trans. R. Soc. London B 353 (1998) 1307–1317] is presented. In contrast to the attention mechanisms used in most previous machine vision systems, which drive attention based on the spatial location hypothesis, the mechanisms which direct visual attention in our system are object-driven as well as feature-driven. The competition to gain visual attention occurs not only within an object but also between objects. For this purpose, two new mechanisms in the proposed model are described and analyzed in detail. The first mechanism computes the visual salience of objects and groupings; the second one implements the hierarchical selectivity of attentional shifts. The results of the new approach on synthetic and natural images are reported.

Semantic segmentation priors for object discovery

Author: Behnke Sven
Frintrop Simone
Husain Syed Farzad
Martín García Germán
Schulz Hannes
Torras Carme
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2016

Reliable object discovery in realistic indoor scenes is a necessity for many computer vision and service robot applications. In these scenes, semantic segmentation methods have made huge advances in recent years. Such methods can provide useful prior information for object discovery by removing false positives and by delineating object boundaries. We propose a novel method that combines bottom-up object discovery and semantic priors for producing generic object candidates in RGB-D images. We use a deep learning method for semantic segmentation to classify colour and depth superpixels into meaningful categories. Separately for each category, we use saliency to estimate the location and scale of objects, and superpixels to find their precise boundaries. Finally, object candidates of all categories are combined and ranked. We evaluate our approach on the NYU Depth V2 dataset and show that we outperform other state-of-the-art object discovery methods in terms of recall. Peer Reviewed. Postprint (author's final draft)

Monocular visual scene analysis: saliency detection and 3D face reconstruction using GAN

Author: Cai Xiaoxu
Publication date: 01/01/2021

A model of saliency-based auditory attention to environmental sound

Author: Botteldooren Dick
De Coensel Bert
Publication date: 01/01/2010

Microcalcification Detection in Digitized Mammograms: A Neurobiologically-Inspired Approach

Author: David F. Ramirez-Moreno
Juan F. Ramirez-Villegas
Publication venue: 'IntechOpen'
Publication date: 11/01/2012

Segmentation of Skin Lesions and their Attributes Using Multi-Scale Convolutional Neural Networks and Domain Specific Augmentations

Author: Gooya Ali
Jahanifar Mostafa
Koohbanani Navid Alemi
Rajpoot Nasir
Tajeddin Neda Zamani
Publication date: 29/03/2019

Computer-aided diagnosis systems for classifying different types of skin lesions have been an active field of research in recent decades. It has been shown that introducing masks of lesions and their attributes into the lesion classification pipeline can greatly improve performance. In this paper, we propose a framework that incorporates transfer learning for segmenting lesions and their attributes with convolutional neural networks. The proposed framework is based on an encoder-decoder architecture that utilizes a variety of pre-trained networks in the encoding path and generates the prediction map by combining multi-scale information in the decoding path in a pyramid pooling manner. To address the lack of training data and improve the generalization of the proposed model, an extensive set of novel domain-specific augmentation routines has been applied to simulate the real variations in dermoscopy images. Finally, through broad experiments on three different data sets obtained from the International Skin Imaging Collaboration archive (the ISIC2016, ISIC2017, and ISIC2018 challenge data sets), we show that the proposed method outperforms other state-of-the-art approaches on the ISIC2016 and ISIC2017 segmentation tasks and achieved the first rank on the leaderboard of the ISIC2018 attribute detection task. Comment: 18 page

arXiv.org e-Print Archive

Explainable and Advisable Learning for Self-driving Vehicles

Author: Kim Jinkyu
Publication venue: eScholarship, University of California
Publication date: 01/01/2019

Deep neural perception and control networks are likely to be a key component of self-driving vehicles. These models need to be explainable - they should provide easy-to-interpret rationales for their behavior - so that passengers, insurance companies, law enforcement, developers, etc., can understand what triggered a particular behavior. Explanations may be triggered by the neural controller, namely introspective explanations, or informed by the neural controller's output, namely rationalizations. Our work has focused on the challenge of generating introspective explanations of deep models for self-driving vehicles. In Chapter 3, we begin by exploring the use of visual explanations. These explanations take the form of real-time highlighted regions of an image that causally influence the network's output (steering control). In the first stage, we use a visual attention model to train a convolution network end-to-end from images to steering angle. The attention model highlights image regions that potentially influence the network's output. Some of these are true influences, but some are spurious. We then apply a causal filtering step to determine which input regions actually influence the output. This produces more succinct visual explanations and more accurately exposes the network's behavior. In Chapter 4, we add an attention-based video-to-text model to produce textual explanations of model actions, e.g. "the car slows down because the road is wet". The attention maps of controller and explanation model are aligned so that explanations are grounded in the parts of the scene that mattered to the controller. We explore two approaches to attention alignment, strong- and weak-alignment. These explainable systems represent an externalization of tacit knowledge. The network's opaque reasoning is simplified to a situation-specific dependence on a visible object in the image. This makes them brittle and potentially unsafe in situations that do not match training data. 
In Chapter 5, we propose to address this issue by augmenting training data with natural language advice from a human. Advice includes guidance about what to do and where to attend. We present the first step toward advice-giving, where we train an end-to-end vehicle controller that accepts advice. The controller adapts the way it attends to the scene (visual attention) and the control (steering and speed). Further, in Chapter 6, we propose a new approach that learns vehicle control with the help of long-term (global) human advice. Specifically, our system learns to summarize its visual observations in natural language, predict an appropriate action response (e.g. "I see a pedestrian crossing, so I stop"), and predict the controls accordingly.

eScholarship - University of California

SBNet: Sparse Blocks Network for Fast Inference

Author: Pokrovsky Andrei
Ren Mengye
Urtasun Raquel
Yang Bin
Publication date: 07/06/2018

Conventional deep convolutional neural networks (CNNs) apply convolution operators uniformly in space across all feature maps for hundreds of layers - this incurs a high computational cost for real-time applications. For many problems such as object detection and semantic segmentation, we are able to obtain a low-cost computation mask, either from a priori problem knowledge, or from a low-resolution segmentation network. We show that such computation masks can be used to reduce computation in the high-resolution main network. Variants of sparse activation CNNs have previously been explored on small-scale tasks and showed no degradation in terms of object classification accuracy, but often measured gains in terms of theoretical FLOPs without realizing a practical speed-up when compared to highly optimized dense convolution implementations. In this work, we leverage the sparsity structure of computation masks and propose a novel tiling-based sparse convolution algorithm. We verified the effectiveness of our sparse CNN on LiDAR-based 3D object detection, and we report significant wall-clock speed-ups compared to dense convolution without noticeable loss of accuracy. Comment: 10 pages, CVPR 201

arXiv.org e-Print Archive