
    Unsupervised image saliency detection with Gestalt-laws guided optimization and visual attention based refinement

    Visual attention is a fundamental cognitive capability that allows humans to focus on regions of interest (ROIs) in complex natural environments. Which ROIs we attend to depends mainly on two distinct types of attentional mechanism. The bottom-up mechanism guides our detection of salient objects and regions through externally driven factors, e.g. color and location, whilst the top-down mechanism biases our attention based on prior knowledge and cognitive strategies provided by the visual cortex. However, how to practically use and fuse both attentional mechanisms for salient object detection has not been sufficiently explored. To this end, we propose in this paper an integrated framework consisting of bottom-up and top-down attention mechanisms that enable attention to be computed at the level of salient objects and/or regions. Within our framework, the bottom-up model is guided by the Gestalt laws of perception. We interpret the Gestalt laws of homogeneity, similarity, proximity, and figure and ground in terms of color and spatial contrast at the level of regions and objects to produce a feature contrast map. The top-down model uses a formal computational model to describe the background connectivity of attention and produces a priority map. Integrating both mechanisms and applying them to salient object detection, our results demonstrate that the proposed method consistently outperforms a number of existing unsupervised approaches on five challenging and complicated datasets, achieving higher precision and recall rates, AP (average precision), and AUC (area under the curve) values.
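The bottom-up half of this pipeline turns Gestalt cues (color similarity, spatial proximity) into a region-level feature contrast map. The sketch below is a generic global region-contrast computation in that spirit, not the paper's Gestalt-guided model. It assumes each region is supplied as a hypothetical `(mean_color, centroid, pixel_count)` tuple with values in [0, 1]; the Gaussian spatial weighting and `sigma_s` are illustrative choices.

```python
import math

def region_contrast_saliency(regions, sigma_s=0.4):
    """Score each region by its color contrast to all other regions.

    regions: list of (mean_color, centroid, pixel_count) tuples (hypothetical format).
    Nearby and large regions contribute more (proximity); a larger color
    distance means the region stands out more (it is dissimilar to its context).
    """
    scores = []
    for i, (color_i, center_i, _) in enumerate(regions):
        total = 0.0
        for j, (color_j, center_j, count_j) in enumerate(regions):
            if i == j:
                continue
            color_dist = math.dist(color_i, color_j)
            # Gaussian falloff with spatial distance between region centroids
            spatial_weight = math.exp(-math.dist(center_i, center_j) ** 2 / sigma_s ** 2)
            total += count_j * spatial_weight * color_dist
        scores.append(total)
    # Min-max normalize to [0, 1] to obtain a feature contrast map over regions
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0] * len(scores)
    return [(s - lo) / (hi - lo) for s in scores]

# Three greyish background regions and one small reddish region near the center
regions = [
    ((0.50, 0.50, 0.50), (0.1, 0.1), 100),
    ((0.52, 0.50, 0.50), (0.9, 0.1), 100),
    ((0.50, 0.50, 0.48), (0.5, 0.9), 100),
    ((0.90, 0.10, 0.10), (0.5, 0.5), 40),
]
saliency = region_contrast_saliency(regions)
print([round(s, 2) for s in saliency])
```

The reddish region receives the top score because it differs in color from every large neighbor, while the grey regions mostly see each other.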

    A framework for saliency estimation over multiple iterations in different image domains

    Advisor: Alexandre Xavier Falcão. Dissertation (master's), Universidade Estadual de Campinas, Instituto de Computação.
    Abstract: Salient object detection estimates the objects that most stand out in an image. Available unsupervised saliency estimators rely on a predetermined set of assumptions about how humans perceive saliency to create discriminating features of salient objects. Because these methods fix their pre-selected assumptions as an integral part of their models, they cannot be easily extended to specific settings or other image domains. We therefore propose a superpixel-based ITerative Saliency Estimation fLexible Framework (ITSELF) that allows user-defined assumptions to be added to the model when required, so that it better represents the user's scenario. Thanks to recent advances in superpixel segmentation algorithms, saliency maps can be used to improve superpixel delineation. By combining a saliency-based superpixel algorithm with a superpixel-based saliency estimator, we propose a novel saliency/superpixel self-improving loop that enhances saliency maps iteratively. We compare ITSELF to two state-of-the-art saliency estimators on five metrics and six datasets, four of them with natural images and two with biomedical images. Experiments show that our approach is more robust than the compared methods, presenting competitive results on natural image datasets and outperforming them on biomedical image datasets.
    Master's degree; Computer Science; Master of Science in Computer Science; 134659/2018-0; CNP

    Visual Saliency Estimation and Its Applications

    The human visual system automatically emphasizes some parts of an image or scene and ignores the others. Visual Saliency Estimation (VSE) aims to imitate this functionality by estimating the degree of human attention attracted by different image regions and locating the salient object. Studying VSE helps us explore how the human visual system extracts objects from an image. It has wide applications, such as robot navigation, video surveillance, object tracking, and self-driving. Current VSE approaches on natural images model generic visual stimuli based on lower-level image features, e.g., locations, local/global contrast, and feature correlation. However, existing models still suffer from some drawbacks. First, these methods fail when objects are near the image borders. Second, due to imperfect model assumptions, many methods cannot achieve good results when images have complicated backgrounds. In this work, I focus on solving these challenges on natural images by proposing a new framework with more robust task-related priors, and I apply the framework to low-quality biomedical images. The new framework formulates VSE on natural images as a quadratic programming (QP) problem. First, it proposes an adaptive center-based bias hypothesis to replace the most common image-center-based center bias; this hypothesis remains robust even when objects are far away from the image center. Second, it models a new smoothness term that forces similar colors to have similar saliency statistics, which is more robust than a term based on region dissimilarity when the image has a complicated background or low contrast. The new approach achieves the best performance among 11 recent methods on three public datasets.
Building on this framework, three approaches that integrate both high-level domain knowledge and robust low-level saliency assumptions are used to imitate radiologists' attention when detecting breast tumors in breast ultrasound images.
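The color-smoothness idea can be illustrated with an unconstrained quadratic objective. As an assumption (the abstract does not give the exact formulation), take a per-region prior p_i, for example from a center-bias term, and hypothetical color-similarity weights w_ij = exp(-||c_i - c_j||² / σ²), then minimize Σ_i (s_i - p_i)² + λ Σ_{i<j} w_ij (s_i - s_j)². Setting the gradient to zero gives the linear system (I + λL)s = p, where L = D - W is the graph Laplacian; the sketch solves it with Gauss-Seidel sweeps.

```python
import math

def smooth_saliency(prior, colors, lam=1.0, sigma=0.2, iters=200):
    """Minimize sum_i (s_i - p_i)^2 + lam * sum_{i<j} w_ij (s_i - s_j)^2.

    prior:  per-region prior saliency p_i in [0, 1] (assumed given).
    colors: per-region mean colors used for the hypothetical weights w_ij.
    The minimizer satisfies (1 + lam * d_i) s_i - lam * sum_j w_ij s_j = p_i,
    i.e. (I + lam * L) s = p, which is strictly diagonally dominant, so
    Gauss-Seidel iteration converges.
    """
    n = len(prior)
    w = [[0.0 if i == j else math.exp(-math.dist(colors[i], colors[j]) ** 2 / sigma ** 2)
          for j in range(n)] for i in range(n)]
    degree = [sum(row) for row in w]  # d_i = sum_j w_ij
    s = list(prior)
    for _ in range(iters):
        for i in range(n):
            neighbor_sum = sum(w[i][j] * s[j] for j in range(n))  # w_ii = 0
            s[i] = (prior[i] + lam * neighbor_sum) / (1.0 + lam * degree[i])
    return s

# Two red regions with conflicting priors, two grey regions with a low prior
colors = [(1.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.5, 0.5, 0.5), (0.5, 0.5, 0.5)]
prior = [0.9, 0.3, 0.1, 0.1]
s = smooth_saliency(prior, colors)
print([round(v, 3) for v in s])
```

The two identically colored red regions are pulled toward each other (from 0.9 and 0.3 to about 0.7 and 0.5), while the dissimilar grey regions are essentially unaffected. Because every row of I + λL sums to 1 and its inverse is nonnegative, each s_i is a convex combination of the priors, so box constraints 0 ≤ s_i ≤ 1 would be inactive here whenever the priors lie in [0, 1].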

    Representations and representation learning for image aesthetics prediction and image enhancement

    With the continual improvement in cell phone cameras and in the connectivity of mobile devices, we have seen an exponential increase in the number of images that are captured, stored and shared on social media. For example, as of July 1st 2017 Instagram had over 715 million registered users, who had posted just shy of 35 billion images. This represented approximately a seven-fold and a nine-fold increase, respectively, in the number of users and photos on Instagram since 2012. Whether the images are stored on personal computers or reside on social networks (e.g. Instagram, Flickr), the sheer number of images calls for methods that determine various image properties, such as object presence or appeal, for automatic image management and curation. One of the central problems in consumer photography is determining the aesthetic appeal of an image, which motivates us to explore questions related to understanding aesthetic preferences, image enhancement and the possibility of using such models on devices with constrained resources. In this dissertation, we present our work on exploring representations and representation learning approaches for aesthetic inference, composition ranking and its application to image enhancement. First, we discuss early representations that mainly consisted of expert features, and their potential to enhance Convolutional Neural Networks (CNNs). Second, we discuss the ability of resource-constrained CNNs, and the effect of different architecture choices (input size and layer depth), in solving various aesthetic inference tasks: binary classification, regression, and image cropping. We show that when trained for fine-grained aesthetics inference, such models can rival the cropping performance of other aesthetics-based croppers; however, they fall short in comparison to models trained for composition ranking.
Lastly, we discuss our work on exploring and identifying the design choices in training composition ranking functions, with the goal of using them for image composition enhancement.
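One common way to train a composition ranking function is a pairwise margin (hinge) loss over (better, worse) crop pairs. The abstract does not state which loss the dissertation uses, so the sketch below is an assumption: a linear scorer over hypothetical hand-crafted crop features, trained with subgradient steps on max(0, margin - (f(better) - f(worse))).

```python
def score(weights, features):
    """Linear composition score f(x) = w . x."""
    return sum(wi * xi for wi, xi in zip(weights, features))

def pairwise_hinge_loss(weights, pairs, margin=1.0):
    """Mean of max(0, margin - (f(better) - f(worse))) over (better, worse) pairs."""
    return sum(max(0.0, margin - (score(weights, better) - score(weights, worse)))
               for better, worse in pairs) / len(pairs)

def train_ranker(pairs, dim, lr=0.1, epochs=100, margin=1.0):
    """Subgradient descent on the pairwise hinge loss, one pair at a time."""
    weights = [0.0] * dim
    for _ in range(epochs):
        for better, worse in pairs:
            if margin - (score(weights, better) - score(weights, worse)) > 0:
                # Violated pair: raise the better crop's score, lower the worse one's
                weights = [wi + lr * (b - c) for wi, b, c in zip(weights, better, worse)]
    return weights

# Hypothetical crop features: [subject placement quality, background clutter]
pairs = [
    ([0.9, 0.1], [0.2, 0.8]),
    ([0.7, 0.3], [0.4, 0.6]),
    ([0.8, 0.2], [0.8, 0.7]),
]
w = train_ranker(pairs, dim=2)
print(w, pairwise_hinge_loss(w, pairs))
```

A ranking loss only constrains score differences within each pair, which matches crop selection: a cropper needs to order candidate crops of the same image, not predict an absolute aesthetic rating.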

    The Role of Early Recurrence in Improving Visual Representations

    This dissertation proposes a computational model of early vision with recurrence, termed early recurrence. The idea is motivated by research on primate vision. Specifically, the proposed model relies on the following four observations: 1) the primate visual system includes two main visual pathways, the dorsal pathway and the ventral pathway; 2) the two pathways respond to different visual features; 3) neurons of the dorsal pathway conduct visual information faster than those of the ventral pathway; 4) there are lower-level feedback connections from the dorsal pathway to the ventral pathway. As such, the primate visual system may implement a recurrent mechanism to improve the visual representations of the ventral pathway. Our work starts with a comprehensive review of the literature, on the basis of which a conceptualization of early recurrence is proposed. Early recurrence manifests itself as a form of surround suppression. We propose that early recurrence is capable of refining ventral processing using the results of dorsal processing. Our work further defines a set of computational components to formalize early recurrence. Although we do not intend to model the true nature of biology, to verify that the proposed computation is biologically consistent, we applied the model to simulate a neurophysiological bar-and-checkerboard experiment and a psychological experiment involving a moving contour illusion. Simulation results indicate that the proposed computation behaviourally reproduces the original observations. The ultimate goal of this work is to investigate whether the proposal can improve computer vision applications. To this end, we applied the model to a variety of applications, including visual saliency and contour detection.
Based on comparisons against the state of the art, we conclude that the proposed model of early recurrence points to a generally applicable yet lightweight approach to boosting real-life application performance.
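As a toy illustration of fast dorsal responses suppressing ventral responses through the surround, the sketch below subtracts a scaled annular average of a dorsal response map from a ventral response map in 1-D and rectifies the result. The annulus radii, the gain `alpha`, the subtractive form, and the function names are all hypothetical; the dissertation's actual computational components are not given in the abstract.

```python
def surround_mean(signal, i, inner=1, outer=3):
    """Mean of signal over the annulus inner < |j - i| <= outer (clipped at the borders)."""
    values = [signal[j]
              for j in range(max(0, i - outer), min(len(signal), i + outer + 1))
              if abs(j - i) > inner]
    return sum(values) / len(values) if values else 0.0

def early_recurrence(ventral, dorsal, alpha=0.8, inner=1, outer=3):
    """Refine ventral responses by subtracting dorsal surround activity (rectified)."""
    return [max(0.0, v - alpha * surround_mean(dorsal, i, inner, outer))
            for i, v in enumerate(ventral)]

# An isolated feature at index 3 and a uniform texture from index 8 onward
ventral = [0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1]
dorsal = list(ventral)  # assume the dorsal pathway sees the same coarse pattern
refined = early_recurrence(ventral, dorsal)
print([round(r, 2) for r in refined])
```

The isolated feature keeps its full response, the interior of the uniform texture is strongly suppressed, and the texture's boundary survives better than its interior, which is the qualitative signature of surround suppression.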

    Salient Object Detection via Structured Matrix Decomposition

    Low-rank recovery models have shown potential for salient object detection, where a matrix is decomposed into a low-rank matrix representing the image background and a sparse matrix identifying salient objects. Two deficiencies, however, remain. First, previous work typically assumes that the elements in the sparse matrix are mutually independent, ignoring the spatial and pattern relations of image regions. Second, when the low-rank and sparse matrices are relatively coherent, e.g., when the salient objects resemble the background or when the background is complicated, it is difficult for previous models to disentangle them. To address these problems, we propose a novel structured matrix decomposition model with two structural regularizations: (1) a tree-structured sparsity-inducing regularization that captures the image structure and enforces patches from the same object to have similar saliency values, and (2) a Laplacian regularization that enlarges the gaps between salient objects and the background in feature space. Furthermore, high-level priors are integrated to guide the matrix decomposition and boost detection. We evaluate our model for salient object detection on five challenging datasets including single-object, multiple-object and complex-scene images, and show competitive results compared with 24 state-of-the-art methods in terms of seven performance metrics.
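The low-rank recovery baseline that this model extends treats a feature matrix (one column per image patch) as a low-rank background plus a sparse salient part. The sketch below is a deliberately simplified stand-in for that baseline, not the proposed structured model: it alternates a rank-1 background fit (power iteration) with elementwise soft-thresholding, omits the tree-structured and Laplacian regularizers and the high-level priors, and uses hypothetical toy features. A full low-rank recovery solver would instead use singular value thresholding inside an augmented Lagrangian scheme.

```python
import math

def rank1_approx(M, iters=50):
    """Best rank-1 approximation u (u^T M) via power iteration on M M^T."""
    rows, cols = len(M), len(M[0])
    u = [1.0 / math.sqrt(rows)] * rows
    for _ in range(iters):
        mt_u = [sum(u[i] * M[i][j] for i in range(rows)) for j in range(cols)]  # M^T u
        v = [sum(M[i][j] * mt_u[j] for j in range(cols)) for i in range(rows)]   # M M^T u
        norm = math.sqrt(sum(x * x for x in v))
        u = [x / norm for x in v]
    coeffs = [sum(u[i] * M[i][j] for i in range(rows)) for j in range(cols)]
    return [[u[i] * coeffs[j] for j in range(cols)] for i in range(rows)]

def soft_threshold(x, tau):
    """Shrink x toward zero by tau (the proximal operator of the L1 norm)."""
    return math.copysign(max(abs(x) - tau, 0.0), x)

def low_rank_plus_sparse(D, tau=0.15, outer_iters=20):
    """Alternate: L = rank-1 fit of (D - S), then S = soft_threshold(D - L)."""
    rows, cols = len(D), len(D[0])
    S = [[0.0] * cols for _ in range(rows)]
    for _ in range(outer_iters):
        L = rank1_approx([[D[i][j] - S[i][j] for j in range(cols)] for i in range(rows)])
        S = [[soft_threshold(D[i][j] - L[i][j], tau) for j in range(cols)]
             for i in range(rows)]
    return L, S

def patch_saliency(S):
    """Per-patch saliency = L1 norm of that patch's column in the sparse part."""
    return [sum(abs(S[i][j]) for i in range(len(S))) for j in range(len(S[0]))]

# 3 features x 6 patches: five similar background patches, one distinct patch (last column)
D = [
    [0.5, 0.5, 0.5, 0.5, 0.5, 0.1],
    [0.4, 0.4, 0.4, 0.4, 0.4, 0.9],
    [0.3, 0.3, 0.3, 0.3, 0.3, 0.1],
]
L, S = low_rank_plus_sparse(D)
saliency = patch_saliency(S)
print([round(s, 3) for s in saliency])
```

Because the background patches are nearly collinear, the rank-1 fit absorbs them and their sparse columns shrink to zero, leaving the distinct patch as the only salient one. Note that each entry of S is thresholded independently, which is exactly the independence assumption the abstract criticizes.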

    Visual saliency computation for image analysis

    Visual saliency computation is about detecting and understanding salient regions and elements in a visual scene. Algorithms for visual saliency computation can give clues to where people will look in images, which objects are visually prominent in a scene, etc. Such algorithms could be useful in a wide range of applications in computer vision and graphics. In this thesis, we study the following visual saliency computation problems. 1) Eye Fixation Prediction. Eye fixation prediction aims to predict where people look in a visual scene. For this problem, we propose a Boolean Map Saliency (BMS) model that leverages the global surroundedness cue using a Boolean map representation. We draw a theoretical connection between BMS and the Minimum Barrier Distance (MBD) transform to provide insight into our algorithm. Experimental results show that BMS compares favorably with state-of-the-art methods on seven benchmark datasets. 2) Salient Region Detection. Salient region detection entails computing a saliency map that highlights the regions of dominant objects in a scene. We propose a salient region detection method based on the MBD transform. We present a fast approximate MBD transform algorithm with an error bound analysis. Powered by this fast MBD transform algorithm, our method can run at about 80 FPS and achieve state-of-the-art performance on four benchmark datasets. 3) Salient Object Detection. Salient object detection aims to localize each salient object instance in an image. We propose a method using a Convolutional Neural Network (CNN) model for proposal generation and a novel subset optimization formulation for bounding box filtering. In experiments, our subset optimization formulation consistently outperforms heuristic bounding box filtering baselines, such as Non-maximum Suppression, and our method substantially outperforms previous methods on three challenging datasets. 4) Salient Object Subitizing.
We propose a new visual saliency computation task, called Salient Object Subitizing, which aims to predict the existence and the number of salient objects in an image using holistic cues. To this end, we present a dataset of about 14K everyday images annotated using an online crowdsourcing marketplace. We show that an end-to-end trained CNN subitizing model can achieve promising performance without requiring any localization process. We further propose a method to improve the training of the CNN subitizing model by leveraging synthetic images. 5) Top-down Saliency Detection. Unlike the aforementioned tasks, top-down saliency detection entails generating task-specific saliency maps. We propose a weakly supervised top-down saliency detection approach by modeling the top-down attention of a CNN image classifier. We propose Excitation Backprop and the concept of contrastive attention to generate highly discriminative top-down saliency maps. Our top-down saliency detection method achieves superior performance in weakly supervised localization tasks on challenging datasets. The usefulness of our method is further validated on the text-to-region association task, where our method provides state-of-the-art performance using only weakly labeled web images for training.
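The Minimum Barrier Distance of a pixel is the smallest, over all paths connecting it to a background seed, of the path's barrier (maximum minus minimum intensity along the path). Below is a sketch of a raster-scan approximation of the kind the abstract describes. It assumes a single-channel image given as a list of rows, uses the image border as the seed set, and picks the number of passes arbitrarily; it is not the thesis implementation and includes no error-bound analysis.

```python
def fast_mbd(img, passes=3):
    """Approximate Minimum Barrier Distance to the image border by alternating
    forward and backward raster scans over a single-channel image (list of rows)."""
    h, w = len(img), len(img[0])
    inf = float("inf")
    dist = [[inf] * w for _ in range(h)]
    upper = [row[:] for row in img]  # max intensity along each pixel's current path
    lower = [row[:] for row in img]  # min intensity along each pixel's current path
    for y in range(h):
        for x in range(w):
            if y in (0, h - 1) or x in (0, w - 1):
                dist[y][x] = 0.0  # border pixels are background seeds

    def relax(y, x, ny, nx):
        # Try to extend the neighbor's current best path to (y, x)
        if 0 <= ny < h and 0 <= nx < w and dist[ny][nx] < inf:
            hi = max(upper[ny][nx], img[y][x])
            lo = min(lower[ny][nx], img[y][x])
            if hi - lo < dist[y][x]:
                dist[y][x] = hi - lo
                upper[y][x], lower[y][x] = hi, lo

    for p in range(passes):
        if p % 2 == 0:  # forward scan: use upper and left neighbors
            for y in range(h):
                for x in range(w):
                    relax(y, x, y - 1, x)
                    relax(y, x, y, x - 1)
        else:  # backward scan: use lower and right neighbors
            for y in reversed(range(h)):
                for x in reversed(range(w)):
                    relax(y, x, y + 1, x)
                    relax(y, x, y, x + 1)
    return dist

img = [[0.2] * 7 for _ in range(7)]
for y in range(2, 5):
    for x in range(2, 5):
        img[y][x] = 0.9  # bright 3x3 object in the middle
saliency = fast_mbd(img)
```

Every path from the border into the bright block must pass from 0.2 to 0.9, so the block's barrier distance is 0.7 while background pixels reach the border with zero barrier. Each pass touches every pixel a constant number of times, which is what makes raster-scan approximations fast.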