Salient Object Detection Techniques in Computer Vision-A Survey.
Detection and localization of image regions that attract immediate human visual attention is currently an intensive area of research in computer vision. The capability to automatically identify and segment such salient image regions has immediate consequences for applications in computer vision, computer graphics, and multimedia. A large number of salient object detection (SOD) methods have been devised to effectively mimic the capability of the human visual system to detect salient regions in images. These methods can be broadly divided into two categories based on their feature engineering mechanism: conventional and deep learning-based. In this survey, the most influential advances in image-based SOD from both the conventional and deep learning-based categories are reviewed in detail. Relevant saliency modeling trends, along with key issues, core techniques, and the scope for future research, are discussed in the context of difficulties often faced in salient object detection. Results are presented for various challenging cases on several large-scale public datasets. The different metrics used to assess the performance of state-of-the-art SOD models are also covered, and some future directions for SOD are presented towards the end.
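The evaluation metrics the survey covers can be illustrated with a minimal sketch. The two measures below, mean absolute error (MAE) and the weighted F-measure, are standard in SOD benchmarking; the fixed threshold and toy arrays here are illustrative assumptions, not taken from the survey:

```python
import numpy as np

def mae(saliency, ground_truth):
    """Mean absolute error between a predicted saliency map and a binary mask."""
    return float(np.abs(saliency - ground_truth).mean())

def f_measure(saliency, ground_truth, threshold=0.5, beta2=0.3):
    """F-measure with beta^2 = 0.3, the weighting commonly used in SOD benchmarks."""
    pred = saliency >= threshold
    gt = ground_truth.astype(bool)
    tp = np.logical_and(pred, gt).sum()
    precision = tp / max(pred.sum(), 1)
    recall = tp / max(gt.sum(), 1)
    if precision + recall == 0:
        return 0.0
    return (1 + beta2) * precision * recall / (beta2 * precision + recall)

# toy example: a perfect prediction scores MAE 0 and F-measure 1
gt = np.array([[0.0, 1.0], [1.0, 0.0]])
print(mae(gt, gt))        # 0.0
print(f_measure(gt, gt))  # 1.0
```

In practice benchmarks sweep the threshold over [0, 255] and report the maximum or mean F-measure rather than a single fixed cut.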
OSA-HCIM: On-The-Fly Saliency-Aware Hybrid SRAM CIM with Dynamic Precision Configuration
Computing-in-Memory (CIM) has shown great potential for enhancing efficiency
and performance for deep neural networks (DNNs). However, the lack of
flexibility in CIM leads to an unnecessary expenditure of computational
resources on less critical operations, and a diminished Signal-to-Noise Ratio
(SNR) when handling more complex tasks, significantly hindering the overall
performance. Hence, we focus on the integration of CIM with Saliency-Aware
Computing -- a paradigm that dynamically tailors computing precision based on
the importance of each input. We propose On-the-fly Saliency-Aware Hybrid CIM
(OSA-HCIM) offering three primary contributions: (1) On-the-fly Saliency-Aware
(OSA) precision configuration scheme, which dynamically sets the precision of
each MAC operation based on its saliency, (2) Hybrid CIM Array (HCIMA), which
enables simultaneous operation of digital-domain CIM (DCIM) and analog-domain
CIM (ACIM) via split-port 6T SRAM, and (3) an integrated framework combining
OSA and HCIMA to fulfill diverse accuracy and power demands.
Implemented on a 65nm CMOS process, OSA-HCIM demonstrates an exceptional
balance between accuracy and resource utilization. Notably, it is the first CIM
design to incorporate a dynamic digital-to-analog boundary, providing
unprecedented flexibility for saliency-aware computing. OSA-HCIM achieves a
1.95x enhancement in energy efficiency, while maintaining minimal accuracy loss
compared to DCIM when tested on the CIFAR-100 dataset.
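The core idea of saliency-aware precision configuration can be sketched in software. The code below is a behavioral analogy only, not the paper's circuit-level OSA scheme: each multiply-accumulate quantizes its operands at a high or low bit width depending on a per-input saliency score (the threshold and bit widths are assumed values for illustration):

```python
import numpy as np

def quantize(x, bits):
    """Uniform symmetric quantization of x in [-1, 1] to the given bit width."""
    levels = 2 ** (bits - 1) - 1
    return np.round(x * levels) / levels

def saliency_aware_mac(weights, activations, saliency, threshold=0.5,
                       hi_bits=8, lo_bits=4):
    """Accumulate products, spending high precision only on salient inputs.

    Software analogy of per-MAC precision selection: operands whose saliency
    meets the threshold are quantized at hi_bits, the rest at lo_bits.
    """
    acc = 0.0
    for w, a, s in zip(weights, activations, saliency):
        bits = hi_bits if s >= threshold else lo_bits
        acc += quantize(w, bits) * quantize(a, bits)
    return acc
```

In the actual design the low-precision path maps to the analog CIM array and the high-precision path to the digital one; here both are emulated by bit-width choice alone.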
Computational principles for an autonomous active vision system
Vision research has uncovered computational principles that generalize across species and brain area. However, these biological mechanisms are not frequently implemented in computer vision algorithms. In this thesis, models suitable for application in computer vision were developed to address the benefits of two biologically-inspired computational principles: multi-scale sampling and active, space-variant, vision.
The first model investigated the role of multi-scale sampling in motion integration. It is known that receptive fields of different spatial and temporal scales exist in the visual cortex; however, models addressing how this basic principle is exploited by species are sparse and do not adequately explain the data. The developed model showed that the solution to a classical problem in motion integration, the aperture problem, can be reframed as an emergent property of multi-scale sampling facilitated by fast, parallel, bi-directional connections at different spatial resolutions.
Humans and most other mammals actively move their eyes to sample a scene (active vision); moreover, the resolution of detail in this sampling process is not uniform across spatial locations (space-variant). It is known that these eye-movements are not simply guided by image saliency, but are also influenced by factors such as spatial attention, scene layout, and task-relevance. However, it is seldom questioned how previous eye movements shape how one learns and recognizes an object in a continuously-learning system. To explore this question, a model (CogEye) was developed that integrates active, space-variant sampling with eye-movement selection (the where visual stream), and object recognition (the what visual stream). The model hypothesizes that a signal from the recognition system helps the where stream select fixation locations that best disambiguate object identity between competing alternatives.
The third study used eye-tracking coupled with an object disambiguation psychophysics experiment to validate the second model, CogEye. While humans outperformed the model in recognition accuracy, when the model used information from the recognition pathway to help select future fixations, it was more similar to human eye movement patterns than when the model relied on image saliency alone.
Taken together, these results show that computational principles in the mammalian visual system can be used to improve computer vision models.
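The space-variant sampling principle described above can be sketched as a foveated sampler: sampling density is highest at the fixation point and falls off with eccentricity. This is a generic illustration, not the CogEye model's actual sampling scheme; the ring counts and growth factor are assumed parameters:

```python
import numpy as np

def foveated_sample(image, fixation, n_rings=5, base_radius=1.0, growth=2.0):
    """Space-variant sampling: take samples on rings around a fixation point,
    with ring radius growing geometrically so the periphery is sampled more
    coarsely than the fovea."""
    fy, fx = fixation
    samples = [image[fy, fx]]  # foveal sample at the fixation point itself
    for k in range(n_rings):
        r = base_radius * growth ** k
        for theta in np.linspace(0, 2 * np.pi, 8, endpoint=False):
            y = int(round(fy + r * np.sin(theta)))
            x = int(round(fx + r * np.cos(theta)))
            if 0 <= y < image.shape[0] and 0 <= x < image.shape[1]:
                samples.append(image[y, x])
    return np.array(samples)
```

Moving the fixation point and resampling is the software counterpart of an eye movement: the same scene yields a new, differently weighted set of measurements.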
Gradient-free activation maximization for identifying effective stimuli
A fundamental question for understanding brain function is what types of
stimuli drive neurons to fire. In visual neuroscience, this question has also
been posed as characterizing the receptive field of a neuron. The search for
effective stimuli has traditionally been based on a combination of insights
from previous studies, intuition, and luck. Recently, the same question has
emerged in the study of units in convolutional neural networks (ConvNets), and
together with this question a family of solutions were developed that are
generally referred to as "feature visualization by activation maximization."
We sought to bring in tools and techniques developed for studying ConvNets to
the study of biological neural networks. However, one key difference that
impedes direct translation of tools is that gradients can be obtained from
ConvNets using backpropagation, but such gradients are not available from the
brain. To circumvent this problem, we developed a method for gradient-free
activation maximization by combining a generative neural network with a genetic
algorithm. We termed this method XDream (EXtending DeepDream with real-time
evolution for activation maximization), and we have shown that this method can
reliably create strong stimuli for neurons in the macaque visual cortex (Ponce
et al., 2019). In this paper, we describe extensive experiments characterizing
the XDream method by using ConvNet units as in silico models of neurons. We
show that XDream is applicable across network layers, architectures, and
training sets; examine design choices in the algorithm; and provide practical
guides for choosing hyperparameters in the algorithm. XDream is an efficient
algorithm for uncovering neuronal tuning preferences in black-box networks
using a vast and diverse stimulus space.
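The gradient-free optimization at the heart of this approach can be sketched with a minimal genetic algorithm. This is not the actual XDream implementation (which couples the search to a generative image network); it only illustrates maximizing a black-box score, standing in for a neuron's firing rate, without gradients. All parameter values are assumptions:

```python
import numpy as np

def gradient_free_maximize(score_fn, dim, pop_size=20, n_gens=50,
                           sigma=0.5, elite_frac=0.25, seed=0):
    """Minimal genetic algorithm for black-box maximization: keep the
    top-scoring 'elite' codes each generation and mutate them with Gaussian
    noise. score_fn is only ever evaluated, never differentiated."""
    rng = np.random.default_rng(seed)
    pop = rng.normal(size=(pop_size, dim))
    n_elite = max(1, int(pop_size * elite_frac))
    for _ in range(n_gens):
        scores = np.array([score_fn(x) for x in pop])
        elite = pop[np.argsort(scores)[-n_elite:]]          # best individuals
        parents = elite[rng.integers(n_elite, size=pop_size)]
        pop = parents + sigma * rng.normal(size=(pop_size, dim))
        pop[:n_elite] = elite                               # elitism
    scores = np.array([score_fn(x) for x in pop])
    return pop[int(np.argmax(scores))]

# toy "neuron": responds most strongly to codes near a preferred vector
target = np.array([1.0, -2.0, 0.5])
best = gradient_free_maximize(lambda x: -np.sum((x - target) ** 2), dim=3)
```

In the real setting the code vector is decoded into an image by the generative network before being shown to the neuron, so the search runs in the generator's latent space rather than pixel space.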
Region Refinement Network for Salient Object Detection
Albeit intensively studied, false prediction and unclear boundaries are still
major issues of salient object detection. In this paper, we propose a Region
Refinement Network (RRN), which recurrently filters redundant information and
explicitly models boundary information for saliency detection. Different from
existing refinement methods, we propose a Region Refinement Module (RRM) that
optimizes salient region prediction by incorporating supervised attention masks
in the intermediate refinement stages. The module only brings a minor increase
in model size and yet significantly reduces false predictions from the
background. To further refine boundary areas, we propose a Boundary Refinement
Loss (BRL) that adds extra supervision for better distinguishing foreground
from background. BRL is parameter free and easy to train. We further observe
that BRL helps retain the integrity in prediction by refining the boundary.
Extensive experiments on saliency detection datasets show that our refinement
module and loss bring significant improvement to the baseline and can be easily
applied to different frameworks. We also demonstrate that our proposed model
generalizes well to portrait segmentation and shadow detection tasks.
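The idea of adding extra supervision near object boundaries can be sketched as a boundary-weighted loss. This is an assumption-laden stand-in, not the paper's actual BRL (which is parameter-free); here a hypothetical `boundary_weight` up-weights a binary cross-entropy at pixels adjacent to the ground-truth boundary:

```python
import numpy as np

def boundary_weighted_bce(pred, gt, boundary_weight=5.0, eps=1e-7):
    """Binary cross-entropy with extra weight on ground-truth boundary pixels.

    The boundary mask is a simple 4-neighbour gradient of the binary mask;
    boundary_weight is an illustrative hyperparameter, not from the paper.
    """
    pad = np.pad(gt, 1, mode='edge')
    boundary = ((pad[1:-1, 1:-1] != pad[:-2, 1:-1]) |
                (pad[1:-1, 1:-1] != pad[2:, 1:-1]) |
                (pad[1:-1, 1:-1] != pad[1:-1, :-2]) |
                (pad[1:-1, 1:-1] != pad[1:-1, 2:]))
    weights = 1.0 + boundary_weight * boundary
    p = np.clip(pred, eps, 1 - eps)
    bce = -(gt * np.log(p) + (1 - gt) * np.log(1 - p))
    return float((weights * bce).sum() / weights.sum())
```

The effect is that errors inside large homogeneous regions cost little extra, while blurred or misplaced boundaries dominate the loss, pushing the model toward crisper object contours.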