Insignificant shadow detection for video segmentation
To prevent moving cast shadows from being mistaken for parts of moving objects in change-detection-based video segmentation, this paper proposes a novel approach to cast shadow detection based on edge and region information in multiple frames. First, an initial change detection mask containing moving objects and cast shadows is obtained. Then a Canny edge map is generated. After that, the shadow region is detected and removed through multiframe integration, edge matching, and region growing. Finally, a post-processing procedure is used to eliminate noise and refine the object boundaries. Our approach can be used for video segmentation in indoor environments. The experimental results demonstrate its good performance.
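The abstract names the steps but not their exact rules. As a rough sketch only, assuming a per-pixel luminance-ratio test for shadow seeds, equality of binary Canny edge maps as the "edge matching" criterion, and 4-connected region growing (all thresholds and function names are hypothetical; multiframe integration and post-processing are omitted), the single-frame core of such a pipeline could look like this:

```python
from collections import deque

# Hypothetical parameters, not from the paper: a cast shadow darkens the
# background by a roughly constant factor, so the frame/background luminance
# ratio of shadow pixels should fall inside a band.
STRICT_BAND = (0.5, 0.85)   # picks confident seed pixels
RELAXED_BAND = (0.4, 0.95)  # used while growing regions from the seeds


def _ratio(frame, background, r, c):
    bg = background[r][c]
    return frame[r][c] / bg if bg > 0 else 0.0


def _in_band(x, band):
    return band[0] <= x <= band[1]


def detect_shadow(frame, background, change_mask, frame_edges, bg_edges):
    """Return a boolean shadow mask restricted to the change-detection mask."""
    rows, cols = len(frame), len(frame[0])
    shadow = [[False] * cols for _ in range(rows)]
    queue = deque()
    # Seeds: changed pixels that are darker by a bounded ratio and whose edge
    # label matches the background (a moving object tends to bring new edges,
    # a shadow mostly does not).
    for r in range(rows):
        for c in range(cols):
            if (change_mask[r][c]
                    and _in_band(_ratio(frame, background, r, c), STRICT_BAND)
                    and frame_edges[r][c] == bg_edges[r][c]):
                shadow[r][c] = True
                queue.append((r, c))
    # Region growing: absorb 4-connected changed neighbours whose ratio lies in
    # the relaxed band, e.g. pixels on the shadow boundary that carry an edge.
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if (0 <= nr < rows and 0 <= nc < cols
                    and change_mask[nr][nc] and not shadow[nr][nc]
                    and _in_band(_ratio(frame, background, nr, nc), RELAXED_BAND)):
                shadow[nr][nc] = True
                queue.append((nr, nc))
    return shadow


def object_mask(change_mask, shadow):
    """Moving-object mask = change mask minus the detected shadow."""
    return [[m and not s for m, s in zip(m_row, s_row)]
            for m_row, s_row in zip(change_mask, shadow)]
```

Using a strict band for seeds and a looser band for growth is one way to let region growing recover shadow pixels that fail the edge test, without letting strongly darkened object pixels leak into the shadow.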
What Can Help Pedestrian Detection?
Aggregating extra features has been considered an effective approach to
boost traditional pedestrian detection methods. However, there is still a lack
of studies on whether and how CNN-based pedestrian detectors can benefit from
these extra features. The first contribution of this paper is to explore this
issue by aggregating extra features into a CNN-based pedestrian detection
framework. Through extensive experiments, we evaluate the effects of different
kinds of extra features quantitatively. Moreover, we propose a novel network
architecture, namely HyperLearner, to jointly learn pedestrian detection as
well as the given extra feature. Through multi-task training, HyperLearner is able
to utilize the information of given features and improve detection performance
without extra inputs at inference. The experimental results on multiple
pedestrian benchmarks validate the effectiveness of the proposed HyperLearner.
Comment: Accepted to IEEE International Conference on Computer Vision and Pattern Recognition (CVPR) 201
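The multi-task idea can be sketched as a weighted sum of a detection loss and an auxiliary loss for predicting the given extra feature map. The binary cross-entropy detection term, the per-pixel squared-error feature term, and the `weight` parameter below are assumptions for illustration, not the paper's exact formulation:

```python
import math


def binary_cross_entropy(probs, labels, eps=1e-7):
    """Mean binary cross-entropy over candidate detections (assumed loss)."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(probs)


def feature_loss(pred_feature, target_feature):
    """Per-pixel mean squared error between predicted and given feature maps."""
    diffs = [(p - t) ** 2
             for p_row, t_row in zip(pred_feature, target_feature)
             for p, t in zip(p_row, t_row)]
    return sum(diffs) / len(diffs)


def multitask_loss(det_probs, det_labels, pred_feature, target_feature, weight=1.0):
    """Training objective: detection loss plus weighted auxiliary feature loss.
    `weight` is a hypothetical balancing hyperparameter."""
    return (binary_cross_entropy(det_probs, det_labels)
            + weight * feature_loss(pred_feature, target_feature))
```

At inference only the detection branch would be evaluated, so the extra feature map is needed as a target during training but not as an input afterwards.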
Edge- and region-based processes of 2nd-order vision
The human visual system is sensitive to 2nd-order image properties (often called texture properties). Spatial gradients in certain 2nd-order properties are edge-based, in that contours are effortlessly perceived through a rapid segmentation process. Others, however, are region-based, in that they require regional integration in order to be discriminated. The five studies reported in this thesis explore these mechanisms of 2nd-order vision, referred to respectively as segmentation and discrimination. Study one compares the segmentation and discrimination of 2nd-order stimuli and uses flicker-defined form to demonstrate that the former may be subserved by phase-insensitive mechanisms. In study two, through testing of a neuropsychological patient, it is shown that 2nd-order segmentation is achieved relatively early in the visual system and, contrary to some claims, does not require the region termed human “V4”. Study three demonstrates, through selective adaptation aftereffects, that orientation variance (a 2nd-order regional property) is encoded by a dedicated mechanism tuned broadly to high and low variance and insensitive to low-level pattern information. Furthermore, the findings that the variance-specific aftereffect is limited to a retinotopic (not spatiotopic) reference frame, and that a neuropsychological patient with mid- to high-level visual cortical damage retains some sensitivity to variance, suggest that this regional property may be encoded at an earlier cortical site than previously assumed. Study four examines how cues from different 2nd-order channels are temporally integrated to allow cue-invariant segmentation. Results from testing a patient with bilateral lateral occipital damage and from selective visual field testing in normal observers suggest that this is achieved prior to the level of the lateral occipital complex, but at least at the level of V2. The final study demonstrates that objects segmented rapidly by 2nd-order channels are processed at a cortical level high enough to allow object-based attention without those objects ever reaching awareness.
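Orientation variance is a property of a region, not of a single element. One common way to quantify it, offered here as an assumption (the thesis may parameterize its stimuli differently, for example by the standard deviation of element orientations), is the circular variance of axial orientations. Because an orientation and its 180° rotation are the same, angles are doubled before averaging:

```python
import cmath
import math


def orientation_variance(orientations_deg):
    """Circular variance (0 = all aligned, 1 = maximally spread) of axial
    orientations given in degrees. Doubling each angle maps theta and
    theta + 180 degrees to the same point on the unit circle."""
    if not orientations_deg:
        raise ValueError("need at least one orientation")
    mean_vector = sum(cmath.exp(2j * math.radians(t))
                      for t in orientations_deg) / len(orientations_deg)
    return 1.0 - abs(mean_vector)
```

For instance, a texture of elements at 10° and 190° has zero variance (they are the same orientation), while elements at 0° and 90° give the maximum value of 1.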
Visual Saliency Based on Multiscale Deep Features
Visual saliency is a fundamental problem in both cognitive and computational
sciences, including computer vision. In this CVPR 2015 paper, we discover that
a high-quality visual saliency model can be trained with multiscale features
extracted using a popular deep learning architecture, convolutional neural
networks (CNNs), which have had many successes in visual recognition tasks. For
learning such saliency models, we introduce a neural network architecture,
which has fully connected layers on top of CNNs responsible for extracting
features at three different scales. We then propose a refinement method to
enhance the spatial coherence of our saliency results. Finally, aggregating
multiple saliency maps computed for different levels of image segmentation can
further boost the performance, yielding saliency maps better than those
generated from a single segmentation. To promote further research and
evaluation of visual saliency models, we also construct a new large database of
4447 challenging images and their pixelwise saliency annotation. Experimental
results demonstrate that our proposed method is capable of achieving
state-of-the-art performance on all public benchmarks, improving the F-Measure
by 5.0% and 13.2% respectively on the MSRA-B dataset and our new dataset
(HKU-IS), and lowering the mean absolute error by 5.7% and 35.1% respectively
on these two datasets.
Comment: To appear in CVPR 201
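The reported gains are in F-measure and mean absolute error (MAE). As a hedged illustration of how these two metrics are typically computed for a saliency map in [0, 1] against a binary ground-truth mask, assuming a single fixed binarization threshold of 0.5 and the weighting β² = 0.3 common in the saliency literature (benchmarks differ, e.g. adaptive or maximum-over-thresholds F-measure, and the paper's exact protocol is not stated here):

```python
def mean_absolute_error(saliency, ground_truth):
    """MAE between a saliency map with values in [0, 1] and a binary mask."""
    pairs = [(s, g) for s_row, g_row in zip(saliency, ground_truth)
             for s, g in zip(s_row, g_row)]
    return sum(abs(s - g) for s, g in pairs) / len(pairs)


def f_measure(saliency, ground_truth, threshold=0.5, beta_sq=0.3):
    """F-beta score of the binarized saliency map (assumed fixed threshold)."""
    tp = fp = fn = 0
    for s_row, g_row in zip(saliency, ground_truth):
        for s, g in zip(s_row, g_row):
            predicted = s >= threshold
            if predicted and g:
                tp += 1
            elif predicted:
                fp += 1
            elif g:
                fn += 1
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    # beta_sq < 1 weights precision more heavily than recall.
    return (1 + beta_sq) * precision * recall / (beta_sq * precision + recall)
```

Higher F-measure and lower MAE are better, which matches the direction of the improvements quoted in the abstract.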
The role of terminators and occlusion cues in motion integration and segmentation: a neural network model
The perceptual interaction of terminators and occlusion cues with the functional processes of motion integration and segmentation is examined using a computational model. Integration is necessary to overcome noise and the inherent ambiguity in locally measured motion direction (the aperture problem). Segmentation is required to detect the presence of motion discontinuities and to prevent spurious integration of motion signals between objects with different trajectories. Terminators are used for motion disambiguation, while occlusion cues are used to suppress motion noise at points where objects intersect. The model illustrates how competitive and cooperative interactions among cells carrying out these functions can account for a number of perceptual effects, including the chopsticks illusion and the occluded diamond illusion. Possible links to the neurophysiology of the middle temporal visual area (MT) are suggested.
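The paper's model is a neural network; the sketch below is not that model. It swaps in the classic intersection-of-constraints computation purely to show why integration resolves the aperture problem: a local measurement on a straight contour reveals only the velocity component along the contour's normal, and two differently oriented measurements together pin down the full 2-D velocity. Function names are illustrative:

```python
import math


def local_measurement(true_velocity, contour_orientation_deg):
    """What a small aperture on a straight contour can report: only the speed
    along the contour's unit normal (the aperture problem)."""
    theta = math.radians(contour_orientation_deg + 90.0)
    normal = (math.cos(theta), math.sin(theta))
    speed = true_velocity[0] * normal[0] + true_velocity[1] * normal[1]
    return normal, speed


def intersect_constraints(m1, m2):
    """Integrate two local measurements by solving v . n1 = s1, v . n2 = s2
    with Cramer's rule. Fails when the two contours are parallel."""
    (a, b), s1 = m1
    (c, d), s2 = m2
    det = a * d - b * c
    if abs(det) < 1e-12:
        raise ValueError("parallel contours: motion stays ambiguous")
    return (s1 * d - b * s2) / det, (a * s2 - c * s1) / det
```

A terminator (line end), by contrast, carries the full 2-D velocity directly, which is why terminators can disambiguate motion on their own.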
Asynchrony in image analysis: using the luminance-to-response-latency relationship to improve segmentation
We deal with the problem of segmenting static images, a procedure known to be difficult in the case of very
noisy patterns. The proposed approach rests on the transformation of a static image into a data flow in which
the first image points to be processed are the brighter ones. This solution, inspired by human perception, in
which strong luminances elicit reactions from the visual system before weaker ones, has led to the notion of
asynchronous processing. The asynchronous processing of image points has required the design of a specific
architecture that exploits time differences in the processing of information. The results obtained when very
noisy images are segmented demonstrate the strengths of this architecture; they also suggest extensions of
the approach to other computer vision problems.
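The abstract does not specify the architecture, so the following only illustrates the ordering idea under stated assumptions: a hypothetical `latency` function that decreases with luminance, a sequential simulation of the resulting event stream, and a simple region-nucleation rule with a hypothetical `tolerance` parameter standing in for the authors' actual processing:

```python
def latency(luminance, t0=1.0, k=255.0):
    """Hypothetical luminance-to-latency mapping: brighter points fire earlier."""
    return t0 + k / (1.0 + luminance)


def asynchronous_segmentation(image, tolerance=20):
    """Label pixels in order of arrival time (brightest first).

    Each region remembers the luminance of the pixel that founded it; since
    pixels arrive from bright to dark, the founder is the region's brightest
    point. An arriving pixel joins the already-labelled 4-neighbour region
    whose founder luminance is closest, if the gap is within `tolerance`;
    otherwise it founds a new region.
    """
    rows, cols = len(image), len(image[0])
    labels = [[None] * cols for _ in range(rows)]
    founder = []  # founder[label] = luminance of that region's first pixel
    events = sorted((latency(image[r][c]), r, c)
                    for r in range(rows) for c in range(cols))
    for _, r, c in events:
        value = image[r][c]
        best = None  # (gap, label) of the closest compatible neighbour region
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and labels[nr][nc] is not None:
                label = labels[nr][nc]
                gap = abs(founder[label] - value)
                if gap <= tolerance and (best is None or gap < best[0]):
                    best = (gap, label)
        if best is None:
            founder.append(value)
            labels[r][c] = len(founder) - 1
        else:
            labels[r][c] = best[1]
    return labels
```

Because regions nucleate at the earliest (brightest) events and darker pixels attach later, the processing order, not just the final pixel values, shapes the result, which is the sense in which this toy version is "asynchronous".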