
    Color and Shape Recognition

    The objects "car" and "cat" can be easily distinguished by humans, but how are these labels assigned? Grouping such images into different categories is easy for a person, but it is very tedious for a computer. Hence, an object recognition system finds objects in the real world from an image. Object recognition algorithms rely on matching, learning, or pattern recognition algorithms using appearance-based or feature-based techniques. In this thesis, the use of color and shape attributes as explicit color and shape representations for object detection is proposed. Color attributes are dense, computationally efficient, and, when combined with traditional shape features, provide pleasing results for object detection. The procedure of shape detection is a natural extension of the task of edge detection at the pixel level to the problem of global contour detection. This filtering scheme provides a tool for a systematic analysis of edge-based shape detection, enabling us to distinguish objects based on color and shape.
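    As an illustration of the kind of combined color-and-shape representation the abstract describes, here is a minimal sketch: a quantized color histogram concatenated with an edge-orientation histogram. All function names and parameter choices are our own illustrative assumptions, not the thesis's.

```python
import numpy as np

def color_histogram(image, bins=8):
    """Quantized RGB histogram: a dense, cheap color attribute."""
    hist, _ = np.histogramdd(image.reshape(-1, 3),
                             bins=(bins, bins, bins),
                             range=((0, 256),) * 3)
    hist = hist.ravel()
    return hist / hist.sum()

def edge_orientation_histogram(gray, bins=9):
    """Histogram of gradient orientations: a simple edge-based shape cue."""
    gy, gx = np.gradient(gray.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), np.pi)  # orientation in [0, pi)
    hist, _ = np.histogram(ang, bins=bins, range=(0, np.pi), weights=mag)
    return hist / (hist.sum() + 1e-9)

def combined_descriptor(image):
    """Concatenate color and shape attributes into one feature vector."""
    gray = image.mean(axis=2)
    return np.concatenate([color_histogram(image),
                           edge_orientation_histogram(gray)])

# toy usage: a flat red patch vs. the same patch with a vertical edge
red = np.zeros((16, 16, 3), dtype=np.uint8)
red[..., 0] = 200
edged = red.copy()
edged[:, 8:] = 50
d1, d2 = combined_descriptor(red), combined_descriptor(edged)
```

    A classifier or matcher would then operate on these concatenated vectors; the two toy patches above share color statistics but differ in the shape (edge) part of the descriptor.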

    Attention to attributes and objects in working memory

    It has been debated, on the basis of change-detection procedures, whether visual working memory is limited by the number of objects, task-relevant attributes within those objects, or bindings between attributes. This debate, however, has been hampered by several limitations, including the use of conditions that vary between studies and the absence of appropriate mathematical models to estimate the number of items in working memory in different stimulus conditions. We re-examined working memory limits in two experiments with a wide array of conditions involving color and shape attributes, relying on a set of new models to fit various stimulus situations. In Experiment 2, a new procedure allowed identical retrieval conditions across different conditions of attention at encoding. The results show that multiple attributes compete for attention, but that retaining the binding between attributes is accomplished only by retaining the attributes themselves. We propose a theoretical account in which a fixed object capacity limit contains within it the possibility of the incomplete retention of object attributes, depending on the direction of attention.
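    The paper's own models are not reproduced here, but as a point of reference, a widely used capacity estimate for single-probe change detection is Cowan's K, sketched below. The cap at the set size is a common convention, not something stated in this abstract.

```python
def cowan_k(set_size, hit_rate, false_alarm_rate):
    """Cowan's K for single-probe change detection:
    K = N * (H - FA), clamped to [0, N]."""
    k = set_size * (hit_rate - false_alarm_rate)
    return max(0.0, min(float(set_size), k))

# e.g. set size 6, 80% hits, 20% false alarms
k = cowan_k(6, 0.8, 0.2)  # -> 3.6 items in memory
```

    Model-fitting approaches like those in the paper generalize this idea to conditions where attributes, rather than whole objects, are probed.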

    Spectral salient object detection

    © 2014 IEEE. Many existing methods for salient object detection work by over-segmenting images into non-overlapping regions, which facilitate local/global color statistics for saliency computation. In this paper, we propose a new approach, spectral salient object detection, which benefits from selected attributes of the normalized cut, better retaining holistic salient objects compared with conventionally employed pre-segmentation techniques. The proposed saliency detection method recursively bi-partitions the region that renders the lowest cut cost in each iteration, resulting in a binary spanning-tree structure. Each segmented region is then evaluated under criteria that fit Gestalt laws and a statistical prior. The final result is obtained by integrating multiple intermediate saliency maps. Experimental results on three benchmark datasets demonstrate the effectiveness of the proposed method against 13 state-of-the-art approaches to salient object detection.
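    To make the cut-cost criterion concrete, here is a toy sketch of the normalized-cut cost on a small affinity graph, with the minimizing bipartition found by brute force. The paper uses a spectral relaxation rather than enumeration; this brute-force search and all names here are illustrative only.

```python
import numpy as np

def ncut_cost(W, mask):
    """Normalized-cut cost of the bipartition (A, B) of affinity matrix W:
    cut(A,B)/assoc(A,V) + cut(A,B)/assoc(B,V)."""
    A, B = mask, ~mask
    cut = W[np.ix_(A, B)].sum()
    return cut / W[A].sum() + cut / W[B].sum()

def best_bipartition(W):
    """Brute-force the lowest-cost cut (fine for toy graphs only)."""
    n = W.shape[0]
    best_cost, best_mask = np.inf, None
    for bits in range(1, 2 ** (n - 1)):  # fix node n-1 in B to avoid duplicates
        mask = np.array([(bits >> i) & 1 == 1 for i in range(n)])
        c = ncut_cost(W, mask)
        if c < best_cost:
            best_cost, best_mask = c, mask
    return best_mask, best_cost

# two dense pairs weakly connected: the lowest-cost cut separates them
W = np.array([[0, 9, 1, 0],
              [9, 0, 0, 1],
              [1, 0, 0, 9],
              [0, 1, 9, 0]], dtype=float)
mask, cost = best_bipartition(W)
```

    Recursively applying this bipartition to the lowest-cost region, as the abstract describes, yields the binary spanning-tree of segments whose saliency maps are then integrated.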

    Data-Side Efficiencies for Lightweight Convolutional Neural Networks

    We examine how the choice of data-side attributes for two important visual tasks, image classification and object detection, can aid in the choice or design of lightweight convolutional neural networks. We show by experimentation how four data attributes (number of classes, object color, image resolution, and object scale) affect neural network model size and efficiency. Intra- and inter-class similarity metrics, based on metric learning, are defined to guide the evaluation of these attributes toward achieving lightweight models. Evaluations made using these metrics are shown to require 30x less computation than running full inference tests. We provide, as an example, applying the metrics and methods to choose a lightweight model for a robot path planning application, achieving a computation reduction of 66% and an accuracy gain of 3.5% over the pre-method model. Comment: 10 pages, 5 figures, 6 tables
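    A minimal sketch of what intra- and inter-class similarity metrics on embeddings might look like, assuming the embeddings come from a metric-learning model; this is a hypothetical simplification of the paper's metrics, and the function name is our own.

```python
import numpy as np

def class_similarity_metrics(embeddings, labels):
    """Mean intra-class distance (samples to their own class centroid)
    and mean inter-class distance (between class centroids)."""
    classes = np.unique(labels)
    centroids = np.stack([embeddings[labels == c].mean(axis=0)
                          for c in classes])
    # intra: average distance of samples to their own centroid
    intra = np.mean([np.linalg.norm(embeddings[labels == c] - centroids[i],
                                    axis=1).mean()
                     for i, c in enumerate(classes)])
    # inter: average pairwise distance between class centroids
    d = np.linalg.norm(centroids[:, None] - centroids[None, :], axis=-1)
    inter = d[np.triu_indices(len(classes), k=1)].mean()
    return intra, inter

# toy data: two tight, well-separated classes in 2-D
emb = np.array([[0.0, 0.0], [0.2, 0.0], [10.0, 0.0], [10.2, 0.0]])
labels = np.array([0, 0, 1, 1])
intra, inter = class_similarity_metrics(emb, labels)
```

    Intuitively, a small intra/inter ratio (classes tight and far apart) suggests an easier dataset for which a lighter model may suffice, without running full inference tests.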

    Attribute Pair-Based Visual Recognition and Memory

    Background: In the human visual system, different attributes of an object, such as shape, color, and motion, are processed separately in different areas of the brain. This raises a fundamental question of how these attributes are integrated to produce a unified perception and a specific response. This "binding problem" is computationally difficult because all attributes are assumed to be bound together to form a single object representation. However, there is no firm evidence to confirm that such representations exist for general objects. Methodology/Principal Findings: Here we propose a paired-attribute model in which cognitive processes are based on multiple representations of paired attributes. In line with the model's prediction, we found that multiattribute stimuli can produce an illusory perception of a multiattribute object arising from erroneous integration of attribute pairs, implying that object recognition is based on parallel perception of paired attributes. Moreover, in a change-detection task, a feature change in a single attribute frequently caused an illusory perception of change in another attribute, suggesting that multiple pairs of attributes are stored in memory. Conclusions/Significance: The paired-attribute model can account for some novel illusions and controversial findings on binocular rivalry and short-term memory. Our results suggest that many cognitive processes are performed at the level of paired attributes rather than integrated objects, which greatly facilitates the binding problem and provides simple

    Automatic Recognition of Film Genres

    Film genres in digital video can be detected automatically. In a three-step approach, we first analyze the syntactic properties of digital films: color statistics, cut detection, camera motion, object motion, and audio. In a second step, we use these statistics to derive film-style attributes at a more abstract level, such as camera panning and zooming, speech, and music. These are distinguishing properties for film genres, e.g. newscasts vs. sports vs. commercials. In the third and final step, we map the detected style attributes to film genres. Algorithms for the three steps are presented in detail, and we report on initial experience with real videos. Our goal is to automatically classify the large body of existing video for easier access in digital video-on-demand databases.
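    The third step, mapping detected style attributes to genres, can be sketched as a simple rule-based classifier. The attribute names, thresholds, and rules below are hypothetical stand-ins; steps 1 and 2 (feature extraction and attribute derivation) are assumed to have already run.

```python
def classify_genre(style):
    """style: dict of derived film-style attributes, e.g. fraction of
    speech/music in the audio track, cut rate (cuts per minute), and a
    normalized camera-motion level. Thresholds are illustrative only."""
    if style["speech"] > 0.7 and style["cut_rate"] < 10:
        return "newscast"       # speech-dominated, few cuts
    if style["camera_motion"] > 0.6 and style["cut_rate"] > 20:
        return "sports"         # heavy panning/zooming, frequent cuts
    if style["music"] > 0.5 and style["cut_rate"] > 30:
        return "commercial"     # music-driven, very rapid editing
    return "unknown"

news = {"speech": 0.9, "music": 0.05, "cut_rate": 4, "camera_motion": 0.1}
```

    In practice, the mapping would be learned from labeled videos rather than hand-written, but the pipeline shape (syntactic features, then style attributes, then genre) is the same.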

    FACET: Fairness in Computer Vision Evaluation Benchmark

    Computer vision models have known performance disparities across attributes such as gender and skin tone. This means that during tasks such as classification and detection, model performance differs for certain classes based on the demographics of the people in the image. These disparities have been shown to exist, but until now there has not been a unified approach to measure these differences for common use-cases of computer vision models. We present a new benchmark named FACET (FAirness in Computer Vision EvaluaTion), a large, publicly available evaluation set of 32k images for some of the most common vision tasks: image classification, object detection, and segmentation. For every image in FACET, we hired expert reviewers to manually annotate person-related attributes such as perceived skin tone and hair type, manually draw bounding boxes, and label fine-grained person-related classes such as disc jockey or guitarist. In addition, we use FACET to benchmark state-of-the-art vision models and present a deeper understanding of potential performance disparities and challenges across sensitive demographic attributes. With the exhaustive annotations collected, we probe models using single demographic attributes as well as multiple attributes using an intersectional approach (e.g. hair color and perceived skin tone). Our results show that classification, detection, segmentation, and visual grounding models exhibit performance disparities across demographic attributes and intersections of attributes. These harms suggest that not all people represented in datasets receive fair and equitable treatment in these vision tasks. We hope current and future results using our benchmark will contribute to fairer, more robust vision models. FACET is available publicly at https://facet.metademolab.com
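    The basic quantity such a fairness benchmark reports, per-group performance and the gap between groups, can be sketched as follows. This is an illustrative computation, not FACET's actual evaluation code or metric definitions.

```python
import numpy as np

def performance_disparity(correct, groups):
    """Per-group accuracy and the worst-case gap between groups.
    correct: 0/1 array of per-example outcomes; groups: array of
    demographic-group labels aligned with `correct`."""
    accs = {g: float(correct[groups == g].mean())
            for g in np.unique(groups)}
    gap = max(accs.values()) - min(accs.values())
    return accs, gap

# toy outcomes for two groups of four examples each
correct = np.array([1, 1, 1, 0, 1, 0, 0, 1])
groups = np.array(["a", "a", "a", "a", "b", "b", "b", "b"])
accs, gap = performance_disparity(correct, groups)
```

    Intersectional probing, as in the abstract, would simply form groups from combinations of attributes (e.g. hair color crossed with perceived skin tone) before computing the same per-group statistics.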