218,024 research outputs found

    Object-based visual attention for computer vision

    Get PDF
    AbstractIn this paper, a novel model of object-based visual attention extending Duncan's Integrated Competition Hypothesis [Phil. Trans. R. Soc. London B 353 (1998) 1307–1317] is presented. In contrast to the attention mechanisms used in most previous machine vision systems which drive attention based on the spatial location hypothesis, the mechanisms which direct visual attention in our system are object-driven as well as feature-driven. The competition to gain visual attention occurs not only within an object but also between objects. For this purpose, two new mechanisms in the proposed model are described and analyzed in detail. The first mechanism computes the visual salience of objects and groupings; the second one implements the hierarchical selectivity of attentional shifts. The results of the new approach on synthetic and natural images are reported

    A computer vision model for visual-object-based attention and eye movements

    Get PDF
    This is the post-print version of the final paper published in Computer Vision and Image Understanding. The published article is available from the link below. Changes resulting from the publishing process, such as peer review, editing, corrections, structural formatting, and other quality control mechanisms may not be reflected in this document. Changes may have been made to this work since it was submitted for publication. Copyright @ 2008 Elsevier B.V.This paper presents a new computational framework for modelling visual-object-based attention and attention-driven eye movements within an integrated system in a biologically inspired approach. Attention operates at multiple levels of visual selection by space, feature, object and group depending on the nature of targets and visual tasks. Attentional shifts and gaze shifts are constructed upon their common process circuits and control mechanisms but also separated from their different function roles, working together to fulfil flexible visual selection tasks in complicated visual environments. The framework integrates the important aspects of human visual attention and eye movements resulting in sophisticated performance in complicated natural scenes. The proposed approach aims at exploring a useful visual selection system for computer vision, especially for usage in cluttered natural visual environments.National Natural Science of Founda- tion of Chin

    Attention Mechanisms in Computer Vision: A Survey

    Full text link
    Humans can naturally and effectively find salient regions in complex scenes. Motivated by this observation, attention mechanisms were introduced into computer vision with the aim of imitating this aspect of the human visual system. Such an attention mechanism can be regarded as a dynamic weight adjustment process based on features of the input image. Attention mechanisms have achieved great success in many visual tasks, including image classification, object detection, semantic segmentation, video understanding, image generation, 3D vision, multi-modal tasks and self-supervised learning. In this survey, we provide a comprehensive review of various attention mechanisms in computer vision and categorize them according to approach, such as channel attention, spatial attention, temporal attention and branch attention; a related repository https://github.com/MenghaoGuo/Awesome-Vision-Attentions is dedicated to collecting related work. We also suggest future directions for attention mechanism research.Comment: 27 pages, 9 figure

    Salient Object Detection Techniques in Computer Vision-A Survey.

    Full text link
    Detection and localization of regions of images that attract immediate human visual attention is currently an intensive area of research in computer vision. The capability of automatic identification and segmentation of such salient image regions has immediate consequences for applications in the field of computer vision, computer graphics, and multimedia. A large number of salient object detection (SOD) methods have been devised to effectively mimic the capability of the human visual system to detect the salient regions in images. These methods can be broadly categorized into two categories based on their feature engineering mechanism: conventional or deep learning-based. In this survey, most of the influential advances in image-based SOD from both conventional as well as deep learning-based categories have been reviewed in detail. Relevant saliency modeling trends with key issues, core techniques, and the scope for future research work have been discussed in the context of difficulties often faced in salient object detection. Results are presented for various challenging cases for some large-scale public datasets. Different metrics considered for assessment of the performance of state-of-the-art salient object detection models are also covered. Some future directions for SOD are presented towards end

    Toward a Taxonomy and Computational Models of Abnormalities in Images

    Full text link
    The human visual system can spot an abnormal image, and reason about what makes it strange. This task has not received enough attention in computer vision. In this paper we study various types of atypicalities in images in a more comprehensive way than has been done before. We propose a new dataset of abnormal images showing a wide range of atypicalities. We design human subject experiments to discover a coarse taxonomy of the reasons for abnormality. Our experiments reveal three major categories of abnormality: object-centric, scene-centric, and contextual. Based on this taxonomy, we propose a comprehensive computational model that can predict all different types of abnormality in images and outperform prior arts in abnormality recognition.Comment: To appear in the Thirtieth AAAI Conference on Artificial Intelligence (AAAI 2016

    HyT-NAS: Hybrid Transformers Neural Architecture Search for Edge Devices

    Full text link
    Vision Transformers have enabled recent attention-based Deep Learning (DL) architectures to achieve remarkable results in Computer Vision (CV) tasks. However, due to the extensive computational resources required, these architectures are rarely implemented on resource-constrained platforms. Current research investigates hybrid handcrafted convolution-based and attention-based models for CV tasks such as image classification and object detection. In this paper, we propose HyT-NAS, an efficient Hardware-aware Neural Architecture Search (HW-NAS) including hybrid architectures targeting vision tasks on tiny devices. HyT-NAS improves state-of-the-art HW-NAS by enriching the search space and enhancing the search strategy as well as the performance predictors. Our experiments show that HyT-NAS achieves a similar hypervolume with less than ~5x training evaluations. Our resulting architecture outperforms MLPerf MobileNetV1 by 6.3% accuracy improvement with 3.5x less number of parameters on Visual Wake Words.Comment: CODAI 2022 Workshop - Embedded System Week (ESWeek

    Texture Segregation By Visual Cortex: Perceptual Grouping, Attention, and Learning

    Get PDF
    A neural model is proposed of how laminar interactions in the visual cortex may learn and recognize object texture and form boundaries. The model brings together five interacting processes: region-based texture classification, contour-based boundary grouping, surface filling-in, spatial attention, and object attention. The model shows how form boundaries can determine regions in which surface filling-in occurs; how surface filling-in interacts with spatial attention to generate a form-fitting distribution of spatial attention, or attentional shroud; how the strongest shroud can inhibit weaker shrouds; and how the winning shroud regulates learning of texture categories, and thus the allocation of object attention. The model can discriminate abutted textures with blurred boundaries and is sensitive to texture boundary attributes like discontinuities in orientation and texture flow curvature as well as to relative orientations of texture elements. The model quantitatively fits a large set of human psychophysical data on orientation-based textures. Object boundar output of the model is compared to computer vision algorithms using a set of human segmented photographic images. The model classifies textures and suppresses noise using a multiple scale oriented filterbank and a distributed Adaptive Resonance Theory (dART) classifier. The matched signal between the bottom-up texture inputs and top-down learned texture categories is utilized by oriented competitive and cooperative grouping processes to generate texture boundaries that control surface filling-in and spatial attention. Topdown modulatory attentional feedback from boundary and surface representations to early filtering stages results in enhanced texture boundaries and more efficient learning of texture within attended surface regions. Surface-based attention also provides a self-supervising training signal for learning new textures. Importance of the surface-based attentional feedback in texture learning and classification is tested using a set of textured images from the Brodatz micro-texture album. Benchmark studies vary from 95.1% to 98.6% with attention, and from 90.6% to 93.2% without attention.Air Force Office of Scientific Research (F49620-01-1-0397, F49620-01-1-0423); National Science Foundation (SBE-0354378); Office of Naval Research (N00014-01-1-0624

    AIC-AB NET: A Neural Network for Image Captioning with Spatial Attention and Text Attributes

    Full text link
    Image captioning is a significant field across computer vision and natural language processing. We propose and present AIC-AB NET, a novel Attribute-Information-Combined Attention-Based Network that combines spatial attention architecture and text attributes in an encoder-decoder. For caption generation, adaptive spatial attention determines which image region best represents the image and whether to attend to the visual features or the visual sentinel. Text attribute information is synchronously fed into the decoder to help image recognition and reduce uncertainty. We have tested and evaluated our AICAB NET on the MS COCO dataset and a new proposed Fashion dataset. The Fashion dataset is employed as a benchmark of single-object images. The results show the superior performance of the proposed model compared to the state-of-the-art baseline and ablated models on both the images from MSCOCO and our single-object images. Our AIC-AB NET outperforms the baseline adaptive attention network by 0.017 (CIDEr score) on the MS COCO dataset and 0.095 (CIDEr score) on the Fashion dataset
    • …
    corecore