95 research outputs found

    Feature combination strategies for saliency-based visual attention systems

    Bottom-up or saliency-based visual attention allows primates to detect nonspecific conspicuous targets in cluttered scenes. A classical metaphor, derived from electrophysiological and psychophysical studies, describes attention as a rapidly shiftable “spotlight.” We use a model that reproduces the attentional scan paths of this spotlight. Simple multi-scale “feature maps” detect local spatial discontinuities in intensity, color, and orientation, and are combined into a unique “master” or “saliency” map. The saliency map is sequentially scanned, in order of decreasing saliency, by the focus of attention. Here we study the problem of combining feature maps, from different visual modalities (such as color and orientation), into a unique saliency map. Four combination strategies are compared using three databases of natural color images: (1) simple normalized summation, (2) linear combination with learned weights, (3) global nonlinear normalization followed by summation, and (4) local nonlinear competition between salient locations followed by summation. Performance was measured as the number of false detections before the most salient target was found. Strategy (1) always yielded the poorest performance and strategy (2) the best, with a threefold to eightfold improvement in time to find a salient target. However, (2) yielded specialized systems with poor generalization. Interestingly, strategy (4) and its simplified, computationally efficient approximation (3) yielded significantly better performance than (1), with up to fourfold improvement, while preserving generality.
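
    The global nonlinear normalization of strategy (3) is compact enough to sketch. Below is a minimal Python/NumPy illustration of such an operator, in the spirit of the description above: each map is rescaled to a fixed range and then weighted by how far its global maximum stands above the average local maximum. The neighborhood size, value range, and function names are assumptions for illustration, not values from the paper.

```python
from scipy.ndimage import maximum_filter

def normalize_map(fmap, m_range=1.0, local_size=5):
    """Global nonlinear normalization N(.) -- a sketch of strategy (3).

    Rescales the map (a 2D NumPy array) to [0, m_range], then multiplies
    it by (M - m_bar)^2, where M is the global maximum and m_bar the
    average of the local maxima.  Maps with one dominant peak are
    promoted; maps with many comparable peaks are suppressed.
    `local_size` is an assumed neighborhood size, not from the paper.
    """
    fmap = fmap - fmap.min()
    if fmap.max() == 0:                      # flat map: nothing salient
        return fmap
    fmap = fmap * (m_range / fmap.max())     # rescale to [0, m_range]
    # Local maxima: nonzero pixels equal to their neighborhood maximum.
    peaks = fmap[(fmap == maximum_filter(fmap, size=local_size)) & (fmap > 0)]
    M, m_bar = peaks.max(), peaks.mean()     # m_bar approximates the mean
    return fmap * (M - m_bar) ** 2           # of the (other) local maxima

def saliency_map(intensity, color, orientation):
    """Sum the normalized feature maps into a single saliency map."""
    return sum(normalize_map(m) for m in (intensity, color, orientation))
```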


    Personalization of Saliency Estimation

    Most existing saliency models use low-level features or task descriptions when generating attention predictions. However, the link between observer characteristics and gaze patterns is rarely investigated. We present a novel saliency prediction technique which takes viewers' identities and personal traits into consideration when modeling human attention. Instead of only computing image salience for the average observer, we consider the interpersonal variation in the viewing behaviors of observers with different personal traits and backgrounds. We present an enriched derivative of a generative adversarial network (GAN), which is able to generate personalized saliency predictions when fed with image stimuli and specific information about the observer. Our model contains a generator which generates grayscale saliency heat maps based on the image and an observer label. The generator is paired with an adversarial discriminator which learns to distinguish generated salience from ground truth salience. The discriminator also takes the observer label as an input, which contributes to the personalization ability of our approach. We evaluate the performance of our personalized salience model by comparison with a benchmark model along with other un-personalized predictions, and illustrate improvements in prediction accuracy for all tested observer groups.
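
    The conditioning described above (both networks receive the observer label) can be sketched as a small conditional GAN. The PyTorch sketch below is illustrative only: the layer sizes, the broadcast one-hot observer encoding, and the class names are assumptions, not the authors' architecture.

```python
import torch
import torch.nn as nn

class PersonalizedSaliencyG(nn.Module):
    """Generator: image + observer label -> grayscale saliency map."""
    def __init__(self, n_observers):
        super().__init__()
        self.net = nn.Sequential(
            # The one-hot observer label is broadcast to constant planes
            # and stacked onto the RGB image, so every convolution sees
            # the observer identity at every spatial location.
            nn.Conv2d(3 + n_observers, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 1, 3, padding=1), nn.Sigmoid(),
        )

    def forward(self, image, label):
        b, _, h, w = image.shape
        planes = label.view(b, -1, 1, 1).expand(b, label.size(1), h, w)
        return self.net(torch.cat([image, planes], dim=1))

class PersonalizedSaliencyD(nn.Module):
    """Discriminator: (image, observer label, saliency map) -> score."""
    def __init__(self, n_observers):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(4 + n_observers, 32, 4, stride=2, padding=1),
            nn.LeakyReLU(0.2),
            nn.Conv2d(32, 64, 4, stride=2, padding=1),
            nn.LeakyReLU(0.2),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, 1),
        )

    def forward(self, image, label, saliency):
        b, _, h, w = image.shape
        planes = label.view(b, -1, 1, 1).expand(b, label.size(1), h, w)
        return self.net(torch.cat([image, saliency, planes], dim=1))
```

    Training would follow the usual conditional-GAN recipe: (image, label, ground-truth map) triples are scored as real, (image, label, generated map) triples as fake.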

    Multi-scale Deep Learning Architectures for Person Re-identification

    Person Re-identification (re-id) aims to match people across non-overlapping camera views in a public space. It is a challenging problem because many people captured in surveillance videos wear similar clothes. Consequently, the differences in their appearance are often subtle and detectable only at the right locations and scales. Existing re-id models, particularly the recently proposed deep learning based ones, match people at a single scale. In contrast, in this paper a novel multi-scale deep learning model is proposed. Our model is able to learn deep discriminative feature representations at different scales and automatically determine the most suitable scales for matching. The importance of different spatial locations for extracting discriminative features is also learned explicitly. Experiments are carried out to demonstrate that the proposed model outperforms the state of the art on a number of benchmarks.
    Comment: 9 pages, 3 figures, accepted by ICCV 2017
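
    One way to read "learn representations at different scales and automatically determine the most suitable scales" is as parallel branches over resized inputs with learned fusion weights. The PyTorch sketch below shows that reading; the branch design, scale factors, and embedding size are assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiScaleEmbed(nn.Module):
    """Illustrative multi-scale matcher: one small CNN branch per input
    resolution, plus a learned softmax weighting over scales."""
    def __init__(self, scales=(1.0, 0.5, 0.25), dim=128):
        super().__init__()
        self.scales = scales
        self.branches = nn.ModuleList(
            nn.Sequential(
                nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, dim),
            ) for _ in scales
        )
        # One logit per scale: the softmax below decides how much each
        # scale contributes, so scale selection is learned, not fixed.
        self.scale_logits = nn.Parameter(torch.zeros(len(scales)))

    def forward(self, x):
        feats = []
        for s, branch in zip(self.scales, self.branches):
            xs = x if s == 1.0 else F.interpolate(
                x, scale_factor=s, mode='bilinear', align_corners=False)
            feats.append(branch(xs))
        w = torch.softmax(self.scale_logits, dim=0)
        return sum(wi * f for wi, f in zip(w, feats))  # fused embedding
```

    Two detections would then be matched by the distance between their fused embeddings, with the learned weights determining how much each scale contributes.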

    Dynamic Visual Attention: competitive versus motion priority scheme

    Defined as the attentive process in the presence of visual sequences, dynamic visual attention responds to both static and motion features. For a computer model, a straightforward way to integrate these features is to combine them in a competitive scheme: the saliency map contains a contribution from each feature, static and motion. Another way of integration is to combine the features in a motion priority scheme: in the presence of motion, the saliency map is computed as the motion map, and in the absence of motion, as the static map. In this paper, four models are considered: two based on a competitive scheme and two based on a motion priority scheme. The models are evaluated experimentally by comparing them with the eye movement patterns of human subjects viewing a set of video sequences. Qualitative and quantitative evaluations, performed on simple synthetic video sequences, show that the motion priority scheme outperforms the competitive scheme.
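
    The two integration schemes are simple enough to state in code. A minimal NumPy sketch, assuming the static and motion saliency maps are arrays and an assumed motion-detection threshold:

```python
import numpy as np

def competitive(static_map, motion_map, w_static=0.5, w_motion=0.5):
    """Competitive scheme: both features always contribute to saliency."""
    return w_static * static_map + w_motion * motion_map

def motion_priority(static_map, motion_map, threshold=0.1):
    """Motion priority scheme: with motion present, saliency is the
    motion map alone; without motion, it is the static map.
    `threshold` is an assumed motion-detection criterion."""
    return motion_map if np.max(motion_map) > threshold else static_map
```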

    Learning regions of interest from low level maps in virtual microscopy

    Virtual microscopy can improve the workflow of modern pathology laboratories, a goal limited by the large size of virtual slides (VS). Lately, determination of Regions of Interest (RoIs) has been shown to be useful for navigation and compression tasks. This work presents a novel method for establishing RoIs in VS, based on a relevance score calculated from example images selected by the pathologist. The process starts by splitting the VS into a grid of blocks, each represented by a set of low-level features which aim to capture the very basic visual properties, namely color, intensity, orientation, and texture. The expert then selects two blocks, i.e., a typical relevant (irrelevant) instance. Different similarity (dissimilarity) maps are then constructed using these positive (negative) examples. The obtained maps are integrated by a normalization process that promotes maps whose global similarity maximum largely exceeds the average of the local maxima. Each block is thus assigned a relevance score, established by the number of closest positive (negative) example blocks. Evaluation was carried out using 8 VS from different tissues, upon which a group of three pathologists had navigated. Precision-recall measurements were calculated at each step of every actual navigation, obtaining an average precision of 55% and a recall of about 38% when using the available set of navigations.
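
    The scoring step ("the number of closest positive (negative) blocks") reads naturally as a k-nearest-neighbor vote in the low-level feature space. A minimal NumPy sketch under that reading; `k` and the Euclidean metric are assumptions, not necessarily the paper's exact rule:

```python
import numpy as np

def block_scores(block_feats, pos_feats, neg_feats, k=5):
    """Relevance score per block: among its k nearest example blocks
    (positive + negative, Euclidean distance over the low-level
    features), the fraction that are positive.

    block_feats: (n_blocks, d) array of per-block feature vectors.
    pos_feats / neg_feats: example blocks chosen by the pathologist.
    """
    examples = np.vstack([pos_feats, neg_feats])
    labels = np.array([1] * len(pos_feats) + [0] * len(neg_feats))
    # Pairwise distances from every block to every example block.
    d = np.linalg.norm(block_feats[:, None, :] - examples[None, :, :], axis=2)
    nearest = np.argsort(d, axis=1)[:, :k]      # indices of k closest examples
    return labels[nearest].mean(axis=1)          # fraction of positive neighbors
```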