Object-based visual attention for computer vision
In this paper, a novel model of object-based visual attention extending Duncan's Integrated Competition Hypothesis [Phil. Trans. R. Soc. London B 353 (1998) 1307–1317] is presented. In contrast to the attention mechanisms used in most previous machine vision systems, which drive attention based on the spatial location hypothesis, the mechanisms which direct visual attention in our system are object-driven as well as feature-driven. The competition to gain visual attention occurs not only within an object but also between objects. For this purpose, two new mechanisms in the proposed model are described and analyzed in detail. The first mechanism computes the visual salience of objects and groupings; the second one implements the hierarchical selectivity of attentional shifts. The results of the new approach on synthetic and natural images are reported.
A computer vision model for visual-object-based attention and eye movements
This is the post-print version of the final paper published in Computer Vision and Image Understanding. The published article is available from the link below. Changes resulting from the publishing process, such as peer review, editing, corrections, structural formatting, and other quality control mechanisms, may not be reflected in this document. Changes may have been made to this work since it was submitted for publication. Copyright © 2008 Elsevier B.V.
This paper presents a new computational framework for modelling visual-object-based attention and attention-driven eye movements within an integrated, biologically inspired system. Attention operates at multiple levels of visual selection (space, feature, object and group) depending on the nature of the targets and visual tasks. Attentional shifts and gaze shifts are built on common processing circuits and control mechanisms but are separated by their different functional roles, and they work together to fulfil flexible visual selection tasks in complicated visual environments. The framework integrates the important aspects of human visual attention and eye movements, resulting in sophisticated performance in complicated natural scenes. The proposed approach aims at exploring a useful visual selection system for computer vision, especially for use in cluttered natural visual environments.
National Natural Science Foundation of China
Attention Mechanisms in Computer Vision: A Survey
Humans can naturally and effectively find salient regions in complex scenes.
Motivated by this observation, attention mechanisms were introduced into
computer vision with the aim of imitating this aspect of the human visual
system. Such an attention mechanism can be regarded as a dynamic weight
adjustment process based on features of the input image. Attention mechanisms
have achieved great success in many visual tasks, including image
classification, object detection, semantic segmentation, video understanding,
image generation, 3D vision, multi-modal tasks and self-supervised learning. In
this survey, we provide a comprehensive review of various attention mechanisms
in computer vision and categorize them according to approach, such as channel
attention, spatial attention, temporal attention and branch attention; a
related repository https://github.com/MenghaoGuo/Awesome-Vision-Attentions is
dedicated to collecting related work. We also suggest future directions for
attention mechanism research.
Comment: 27 pages, 9 figures
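The description of attention as "a dynamic weight adjustment process based on features of the input image" can be made concrete with a small channel-attention sketch. This is a simplified gate in the spirit of squeeze-and-excitation, not code from the survey: the function name `channel_attention` and the parameters `gate_weights` and `gate_bias` are hypothetical, and a single linear layer is assumed where published designs typically use a small bottleneck network.

```python
import math


def sigmoid(x):
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def channel_attention(feature_map, gate_weights, gate_bias):
    """Reweight channels by weights computed from the input itself.

    feature_map:  list of C channels, each an H x W list of lists.
    gate_weights: C x C matrix (hypothetical learned parameters).
    gate_bias:    length-C vector (hypothetical learned parameters).
    """
    # Squeeze: global average pooling turns each channel into one number.
    descriptors = [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in feature_map
    ]
    # Excite: a linear layer plus sigmoid yields one weight per channel.
    weights = [
        sigmoid(sum(w * d for w, d in zip(row, descriptors)) + b)
        for row, b in zip(gate_weights, gate_bias)
    ]
    # Reweight: scale every value in a channel by that channel's weight.
    reweighted = [
        [[w * value for value in row] for row in channel]
        for channel, w in zip(feature_map, weights)
    ]
    return weights, reweighted


# Two 2x2 channels: one strongly active, one silent.
features = [[[4.0, 4.0], [4.0, 4.0]], [[0.0, 0.0], [0.0, 0.0]]]
identity = [[1.0, 0.0], [0.0, 1.0]]
weights, reweighted = channel_attention(features, identity, [0.0, 0.0])
```

Because the weights are recomputed from each input, a different image would produce a different emphasis over the same channels, which is the sense in which the adjustment is dynamic.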
Salient Object Detection Techniques in Computer Vision-A Survey.
Detection and localization of image regions that attract immediate human visual attention is currently an intensive area of research in computer vision. The capability of automatic identification and segmentation of such salient image regions has immediate consequences for applications in the fields of computer vision, computer graphics, and multimedia. A large number of salient object detection (SOD) methods have been devised to effectively mimic the capability of the human visual system to detect the salient regions in images. These methods can be broadly divided into two categories based on their feature engineering mechanism: conventional or deep learning-based. In this survey, most of the influential advances in image-based SOD from both the conventional and the deep learning-based categories are reviewed in detail. Relevant saliency modeling trends, with key issues, core techniques, and the scope for future research work, are discussed in the context of difficulties often faced in salient object detection. Results are presented for various challenging cases on some large-scale public datasets. Different metrics considered for assessing the performance of state-of-the-art salient object detection models are also covered. Some future directions for SOD are presented towards the end.
Toward a Taxonomy and Computational Models of Abnormalities in Images
The human visual system can spot an abnormal image, and reason about what
makes it strange. This task has not received enough attention in computer
vision. In this paper we study various types of atypicalities in images in a
more comprehensive way than has been done before. We propose a new dataset of
abnormal images showing a wide range of atypicalities. We design human subject
experiments to discover a coarse taxonomy of the reasons for abnormality. Our
experiments reveal three major categories of abnormality: object-centric,
scene-centric, and contextual. Based on this taxonomy, we propose a
comprehensive computational model that can predict all different types of
abnormality in images and outperform prior art in abnormality recognition.
Comment: To appear in the Thirtieth AAAI Conference on Artificial Intelligence
(AAAI 2016)
HyT-NAS: Hybrid Transformers Neural Architecture Search for Edge Devices
Vision Transformers have enabled recent attention-based Deep Learning (DL)
architectures to achieve remarkable results in Computer Vision (CV) tasks.
However, due to the extensive computational resources required, these
architectures are rarely implemented on resource-constrained platforms. Current
research investigates hybrid handcrafted convolution-based and attention-based
models for CV tasks such as image classification and object detection. In this
paper, we propose HyT-NAS, an efficient Hardware-aware Neural Architecture
Search (HW-NAS) including hybrid architectures targeting vision tasks on tiny
devices. HyT-NAS improves state-of-the-art HW-NAS by enriching the search space
and enhancing the search strategy as well as the performance predictors. Our
experiments show that HyT-NAS achieves a similar hypervolume with less than ~5x
training evaluations. Our resulting architecture outperforms MLPerf MobileNetV1
by 6.3% in accuracy with 3.5x fewer parameters on Visual Wake
Words.
Comment: CODAI 2022 Workshop - Embedded System Week (ESWeek)
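The hypervolume mentioned above is a standard way to score a set of multi-objective search results. A minimal two-objective sketch follows, assuming both objectives are minimized (for example error rate and parameter count) and measured against a reference point. The abstract does not state the objectives or reference point used by HyT-NAS, so `hypervolume_2d` and its inputs are illustrative only.

```python
def hypervolume_2d(points, reference):
    """Area dominated by `points` and bounded by `reference`.

    Both objectives are assumed to be minimized, so a larger area
    means a better set of trade-offs.
    """
    ref_x, ref_y = reference
    # Only points that beat the reference in both objectives contribute.
    candidates = sorted((x, y) for x, y in points if x < ref_x and y < ref_y)
    area = 0.0
    best_y = ref_y
    for x, y in candidates:
        # Scanning by increasing x, a point adds area only if it
        # improves on the best second objective seen so far.
        if y < best_y:
            area += (ref_x - x) * (best_y - y)
            best_y = y
    return area


front = [(1, 3), (2, 2), (3, 1)]
front_area = hypervolume_2d(front, (4, 4))
# A dominated point such as (3, 3) leaves the area unchanged.
same_area = hypervolume_2d(front + [(3, 3)], (4, 4))
```

Two searches that reach a similar hypervolume have found comparably good trade-off fronts, which is why the number of training evaluations needed to get there is a meaningful cost comparison.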
Texture Segregation By Visual Cortex: Perceptual Grouping, Attention, and Learning
A neural model is proposed of how laminar interactions in the visual cortex may learn and recognize object texture and form boundaries. The model brings together five interacting processes: region-based texture classification, contour-based boundary grouping, surface filling-in, spatial attention, and object attention. The model shows how form boundaries can determine regions in which surface filling-in occurs; how surface filling-in interacts with spatial attention to generate a form-fitting distribution of spatial attention, or attentional shroud; how the strongest shroud can inhibit weaker shrouds; and how the winning shroud regulates learning of texture categories, and thus the allocation of object attention. The model can discriminate abutted textures with blurred boundaries and is sensitive to texture boundary attributes like discontinuities in orientation and texture flow curvature as well as to relative orientations of texture elements. The model quantitatively fits a large set of human psychophysical data on orientation-based textures. Object boundary output of the model is compared to computer vision algorithms using a set of human segmented photographic images. The model classifies textures and suppresses noise using a multiple scale oriented filterbank and a distributed Adaptive Resonance Theory (dART) classifier. The matched signal between the bottom-up texture inputs and top-down learned texture categories is utilized by oriented competitive and cooperative grouping processes to generate texture boundaries that control surface filling-in and spatial attention. Top-down modulatory attentional feedback from boundary and surface representations to early filtering stages results in enhanced texture boundaries and more efficient learning of texture within attended surface regions. Surface-based attention also provides a self-supervising training signal for learning new textures.
Importance of the surface-based attentional feedback in texture learning and classification is tested using a set of textured images from the Brodatz micro-texture album. Benchmark studies vary from 95.1% to 98.6% with attention, and from 90.6% to 93.2% without attention.
Air Force Office of Scientific Research (F49620-01-1-0397, F49620-01-1-0423); National Science Foundation (SBE-0354378); Office of Naval Research (N00014-01-1-0624)
AIC-AB NET: A Neural Network for Image Captioning with Spatial Attention and Text Attributes
Image captioning is a significant field across computer vision and natural
language processing. We propose and present AIC-AB NET, a novel
Attribute-Information-Combined Attention-Based Network that combines spatial
attention architecture and text attributes in an encoder-decoder. For caption
generation, adaptive spatial attention determines which image region best
represents the image and whether to attend to the visual features or the visual
sentinel. Text attribute information is synchronously fed into the decoder to
help image recognition and reduce uncertainty. We have tested and evaluated our
AIC-AB NET on the MS COCO dataset and a newly proposed Fashion dataset. The
Fashion dataset is employed as a benchmark of single-object images. The results
show the superior performance of the proposed model compared to the
state-of-the-art baseline and ablated models on both the images from MS COCO and
our single-object images. Our AIC-AB NET outperforms the baseline adaptive
attention network by 0.017 (CIDEr score) on the MS COCO dataset and 0.095
(CIDEr score) on the Fashion dataset.
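The choice between attending to visual features and attending to the visual sentinel can be sketched as one softmax taken over the image regions plus one extra sentinel slot. This follows the general adaptive-attention idea rather than AIC-AB NET's actual implementation: the name `adaptive_context` is hypothetical, and the attention scores are passed in directly here, whereas a real model would compute them from the decoder state.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def adaptive_context(region_feats, region_scores, sentinel, sentinel_score):
    """Mix region features and a sentinel vector into one context vector.

    beta is the share of attention given to the sentinel; the remaining
    1 - beta is spread over the image regions.
    """
    # One softmax over all regions plus the sentinel slot.
    probs = softmax(list(region_scores) + [sentinel_score])
    region_weights, beta = probs[:-1], probs[-1]
    context = [
        beta * sentinel[d]
        + sum(w * feat[d] for w, feat in zip(region_weights, region_feats))
        for d in range(len(sentinel))
    ]
    return region_weights, beta, context


regions = [[1.0, 0.0], [0.0, 1.0]]
region_weights, beta, context = adaptive_context(
    regions, [0.0, 0.0], [2.0, 2.0], 0.0
)
```

With equal scores each slot receives one third of the attention. Raising `sentinel_score` pushes `beta` towards 1, so the decoder relies on what it has already generated rather than on the image.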