67,084 research outputs found
Color and Shape Recognition
Humans can easily distinguish objects such as a "car" and a "cat", but how are these labels assigned? Grouping such images into different categories is easy for a person but very tedious for a computer; an object recognition system therefore locates objects in the real world from an image. Object recognition algorithms rely on matching, learning, or pattern recognition using appearance-based or feature-based techniques. This thesis proposes the use of color and shape attributes as explicit color and shape representations, respectively, for object detection. Color attributes are compact, computationally efficient, and, when combined with traditional shape features, yield good object detection results. Shape detection is a natural extension of pixel-level edge detection to the problem of global contour detection, and the filtering scheme provides a tool for the systematic analysis of edge-based shape detection. Together, these cues enable objects to be distinguished by both color and shape.
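The combination of color and shape cues described above can be illustrated with a minimal descriptor that concatenates a per-channel color histogram with an edge-orientation histogram. This is a sketch only; the function names and bin counts are illustrative assumptions, not the thesis's actual representation:

```python
import numpy as np

def color_histogram(img, bins=8):
    """Per-channel color histogram, normalized to sum to 1."""
    hists = [np.histogram(img[..., c], bins=bins, range=(0, 256))[0]
             for c in range(img.shape[-1])]
    h = np.concatenate(hists).astype(float)
    return h / h.sum()

def edge_orientation_histogram(gray, bins=9):
    """Magnitude-weighted histogram of gradient orientations,
    a crude edge-based shape descriptor."""
    gy, gx = np.gradient(gray.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.arctan2(gy, gx) % np.pi      # orientations in [0, pi)
    hist, _ = np.histogram(ang, bins=bins, range=(0, np.pi), weights=mag)
    return hist / (hist.sum() + 1e-12)

def describe(img):
    """Joint color + shape descriptor for one RGB image array."""
    gray = img.mean(axis=-1)
    return np.concatenate([color_histogram(img),
                           edge_orientation_histogram(gray)])
```

Two images can then be compared by any histogram distance (e.g. L1) over their `describe` vectors, separating a red car from a gray cat by either cue.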
Attention to attributes and objects in working memory
It has been debated on the basis of change-detection procedures whether visual working memory is limited by the number of objects, task-relevant attributes within those objects, or bindings between attributes. This debate, however, has been hampered by several limitations, including the use of conditions that vary between studies and the absence of appropriate mathematical models to estimate the number of items in working memory in different stimulus conditions. We re-examined working memory limits in two experiments with a wide array of conditions involving color and shape attributes, relying on a set of new models to fit various stimulus situations. In Experiment 2, a new procedure allowed identical retrieval conditions across different conditions of attention at encoding. The results show that multiple attributes compete for attention, but that retaining the binding between attributes is accomplished only by retaining the attributes themselves. We propose a theoretical account in which a fixed object capacity limit contains within it the possibility of the incomplete retention of object attributes, depending on the direction of attention.
Spectral salient object detection
© 2014 IEEE. Many existing methods for salient object detection over-segment images into non-overlapping regions and then use local/global color statistics for saliency computation. In this paper, we propose a new approach, spectral salient object detection, which benefits from selected attributes of the normalized cut and retains holistic salient objects better than conventionally employed pre-segmentation techniques. The proposed saliency detection method recursively bi-partitions the region that renders the lowest cut cost in each iteration, resulting in a binary spanning-tree structure. Each segmented region is then evaluated under criteria that fit Gestalt laws and a statistical prior, and the final result is obtained by integrating multiple intermediate saliency maps. Experimental results on three benchmark datasets demonstrate the effectiveness of the proposed method against 13 state-of-the-art approaches to salient object detection.
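The bi-partitioning step can be illustrated with the standard spectral relaxation behind the normalized cut: split samples by the sign of the Fiedler vector of the normalized graph Laplacian. A minimal sketch, in which the Gaussian affinity and `sigma` are assumptions rather than the paper's exact formulation:

```python
import numpy as np

def ncut_bipartition(features, sigma=1.0):
    """Split samples into two groups using the eigenvector of the
    second-smallest eigenvalue (Fiedler vector) of the symmetric
    normalized Laplacian, the relaxation behind normalized cut."""
    # Pairwise Gaussian affinities from squared feature distances.
    d2 = np.sum((features[:, None] - features[None, :]) ** 2, axis=-1)
    W = np.exp(-d2 / (2 * sigma ** 2))
    deg = W.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(deg))
    L_sym = np.eye(len(W)) - D_inv_sqrt @ W @ D_inv_sqrt
    vals, vecs = np.linalg.eigh(L_sym)    # eigenvalues in ascending order
    fiedler = vecs[:, 1]
    return fiedler >= 0                   # boolean partition mask
```

Applying this recursively to the sub-region with the lowest cut cost yields the binary tree of regions the abstract describes.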
Data-Side Efficiencies for Lightweight Convolutional Neural Networks
We examine how the choice of data-side attributes for two important visual
tasks of image classification and object detection can aid in the choice or
design of lightweight convolutional neural networks. We show by experimentation
how four data attributes - number of classes, object color, image resolution,
and object scale - affect neural network model size and efficiency. Intra- and
inter-class similarity metrics, based on metric learning, are defined to guide
the evaluation of these attributes toward achieving lightweight models.
Evaluations made using these metrics are shown to require 30x less computation
than running full inference tests. We provide, as an example, applying the
metrics and methods to choose a lightweight model for a robot path planning
application and achieve computation reduction of 66% and accuracy gain of 3.5%
over the pre-method model.
Comment: 10 pages, 5 figures, 6 tables
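The intra-/inter-class similarity idea can be sketched as mean cosine similarity within each class versus between class centroids, computed once over a set of embeddings rather than via full inference runs. This is illustrative only; the paper's metric-learning-based definitions may differ:

```python
import numpy as np

def class_similarity_metrics(embeddings, labels):
    """Intra-class: mean cosine similarity among samples of each class.
    Inter-class: mean cosine similarity between class centroids.
    High intra and low inter suggest an easier task, where a
    lighter model may suffice."""
    X = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    classes = np.unique(labels)
    intra = np.mean([(X[labels == c] @ X[labels == c].T).mean()
                     for c in classes])
    centroids = np.stack([X[labels == c].mean(axis=0) for c in classes])
    C = centroids / np.linalg.norm(centroids, axis=1, keepdims=True)
    sim = C @ C.T
    inter = sim[~np.eye(len(classes), dtype=bool)].mean()
    return intra, inter
```

Comparing these two numbers across candidate datasets or attribute choices is far cheaper than running a full inference benchmark for each configuration.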
Attribute Pair-Based Visual Recognition and Memory
Background: In the human visual system, different attributes of an object, such as shape, color, and motion, are processed separately in different areas of the brain. This raises a fundamental question: how are these attributes integrated to produce a unified perception and a specific response? This "binding problem" is computationally difficult because all attributes are assumed to be bound together to form a single object representation. However, there is no firm evidence to confirm that such representations exist for general objects. Methodology/Principal Findings: Here we propose a paired-attribute model in which cognitive processes are based on multiple representations of paired attributes. In line with the model's prediction, we found that multiattribute stimuli can produce an illusory perception of a multiattribute object arising from erroneous integration of attribute pairs, implying that object recognition is based on parallel perception of paired attributes. Moreover, in a change-detection task, a feature change in a single attribute frequently caused an illusory perception of change in another attribute, suggesting that multiple pairs of attributes are stored in memory. Conclusions/Significance: The paired-attribute model can account for some novel illusions and controversial findings on binocular rivalry and short-term memory. Our results suggest that many cognitive processes are performed at the level of paired attributes rather than integrated objects, which greatly simplifies the binding problem.
Automatic Recognition of Film Genres
Film genres in digital video can be detected automatically. In a three-step approach, we first analyze the syntactic properties of digital films: color statistics, cut detection, camera motion, object motion, and audio. In a second step, we use these statistics to derive, at a more abstract level, film style attributes such as camera panning and zooming, speech, and music. These are distinguishing properties for film genres, e.g. newscasts vs. sports vs. commercials. In the third and final step, we map the detected style attributes to film genres. Algorithms for the three steps are presented in detail, and we report on initial experience with real videos. Our goal is to automatically classify the large body of existing video for easier access in digital video-on-demand databases.
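The first-step cut detection can be sketched as a histogram-difference test between consecutive frames: a hard cut shows up as a large jump in the gray-level color statistics. A minimal illustration, with `bins` and `threshold` as assumed parameters rather than the authors' exact algorithm:

```python
import numpy as np

def detect_cuts(frames, bins=16, threshold=0.5):
    """Flag frame index i as a hard cut when the L1 distance between
    the normalized gray-level histograms of frames i-1 and i exceeds
    `threshold` (L1 distance of normalized histograms lies in [0, 2])."""
    hists = []
    for f in frames:
        h, _ = np.histogram(f, bins=bins, range=(0, 256))
        hists.append(h / h.sum())
    return [i for i in range(1, len(hists))
            if np.abs(hists[i] - hists[i - 1]).sum() > threshold]
```

Counting such cuts per minute gives one of the syntactic statistics that later feed the genre mapping (commercials tend to cut far more often than newscasts).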
FACET: Fairness in Computer Vision Evaluation Benchmark
Computer vision models have known performance disparities across attributes
such as gender and skin tone. This means during tasks such as classification
and detection, model performance differs for certain classes based on the
demographics of the people in the image. These disparities have been shown to
exist, but until now there has not been a unified approach to measure these
differences for common use-cases of computer vision models. We present a new
benchmark named FACET (FAirness in Computer Vision EvaluaTion), a large,
publicly available evaluation set of 32k images for some of the most common
vision tasks - image classification, object detection and segmentation. For
every image in FACET, we hired expert reviewers to manually annotate
person-related attributes such as perceived skin tone and hair type, manually
draw bounding boxes, and label fine-grained person-related classes such as disc
jockey or guitarist. In addition, we use FACET to benchmark state-of-the-art
vision models and present a deeper understanding of potential performance
disparities and challenges across sensitive demographic attributes. With the
exhaustive annotations collected, we probe models using single demographic
attributes as well as multiple attributes using an intersectional approach
(e.g. hair color and perceived skin tone). Our results show that
classification, detection, segmentation, and visual grounding models exhibit
performance disparities across demographic attributes and intersections of
attributes. These harms suggest that not all people represented in datasets
receive fair and equitable treatment in these vision tasks. We hope current and
future results using our benchmark will contribute to fairer, more robust
vision models. FACET is available publicly at https://facet.metademolab.com
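The single-attribute and intersectional probing described above amounts to grouping per-image results by one or more annotated attributes and comparing group accuracies. A minimal sketch, where the `records` schema and field names are hypothetical, not FACET's actual data format:

```python
from collections import defaultdict

def disparity(records, attrs):
    """records: dicts with a boolean 'correct' plus attribute fields.
    Groups records by the given attributes (one attribute gives
    single-axis groups; several give intersectional groups) and
    returns the max accuracy gap across groups, plus per-group accuracy."""
    groups = defaultdict(list)
    for r in records:
        key = tuple(r[a] for a in attrs)
        groups[key].append(r["correct"])
    accs = {k: sum(v) / len(v) for k, v in groups.items()}
    return max(accs.values()) - min(accs.values()), accs
```

Calling `disparity(records, ["skin_tone"])` probes a single axis, while `disparity(records, ["skin_tone", "hair_color"])` probes the intersection of two attributes, as in the abstract's example.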