    Insignificant shadow detection for video segmentation

    To prevent moving cast shadows from being mistaken for part of moving objects in change-detection-based video segmentation, this paper proposes a novel approach to cast shadow detection based on edge and region information in multiple frames. First, an initial change detection mask containing moving objects and cast shadows is obtained. Then a Canny edge map is generated. After that, the shadow region is detected and removed through multiframe integration, edge matching, and region growing. Finally, a post-processing procedure is used to eliminate noise and tune the boundaries of the objects. Our approach can be used for video segmentation in indoor environments. The experimental results demonstrate its good performance.
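
    The abstract does not say how the initial change detection mask is computed. As a minimal sketch, assuming plain thresholded frame differencing on grayscale frames (the function name `change_mask` and the threshold of 25 are hypothetical choices, not taken from the paper), the first step could look like this:

```python
def change_mask(prev_frame, curr_frame, threshold=25):
    """Return a binary mask: 1 where two grayscale frames differ by more than threshold.

    Assumption: simple absolute frame differencing; the paper may use another method.
    """
    return [
        [1 if abs(c - p) > threshold else 0 for p, c in zip(prev_row, curr_row)]
        for prev_row, curr_row in zip(prev_frame, curr_frame)
    ]


# Toy 3x4 frames (values 0-255). Background is 120, a bright object (220)
# enters on the right, and its darker cast shadow (80) falls on the bottom row.
prev_frame = [[120, 120, 120, 120], [120, 120, 120, 120], [120, 120, 120, 120]]
curr_frame = [[120, 120, 220, 220], [120, 120, 220, 220], [120, 118, 80, 80]]

mask = change_mask(prev_frame, curr_frame)
for row in mask:
    print(row)
```

    In this toy example the mask marks the shadow pixels as changed along with the object pixels, which is the ambiguity that the later edge-matching and region-growing steps are meant to resolve.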

    What Can Help Pedestrian Detection?

    Aggregating extra features has been considered an effective approach to boost traditional pedestrian detection methods. However, there is still a lack of studies on whether and how CNN-based pedestrian detectors can benefit from these extra features. The first contribution of this paper is exploring this issue by aggregating extra features into a CNN-based pedestrian detection framework. Through extensive experiments, we evaluate the effects of different kinds of extra features quantitatively. Moreover, we propose a novel network architecture, namely HyperLearner, to jointly learn pedestrian detection as well as the given extra feature. By multi-task training, HyperLearner is able to utilize the information of given features and improve detection performance without extra inputs in inference. The experimental results on multiple pedestrian benchmarks validate the effectiveness of the proposed HyperLearner. Comment: Accepted to IEEE International Conference on Computer Vision and Pattern Recognition (CVPR) 201

    Edge- and region-based processes of 2nd-order vision

    The human visual system is sensitive to 2nd-order image properties (often called texture properties). Spatial gradients in certain 2nd-order properties are edge-based, in that contours are effortlessly perceived through a rapid segmentation process. Others, however, are region-based, in that they require regional integration in order to be discriminated. The five studies reported in this thesis explore these mechanisms of 2nd-order vision, referred to respectively as segmentation and discrimination. Study one compares the segmentation and discrimination of 2nd-order stimuli and uses flicker-defined-form to demonstrate that the former may be subserved by phase-insensitive mechanisms. In study two, through testing of a neuropsychological patient, it is shown that 2nd-order segmentation is achieved relatively early in the visual system and, contrary to some claims, does not require the region termed human “V4”. Study three demonstrates, through selective adaptation aftereffects, that orientation variance (a 2nd-order regional property) is encoded by a dedicated mechanism tuned broadly to high and low variance and insensitive to low-level pattern information. Furthermore, the finding that the variance-specific aftereffect is limited to a retinotopic (not spatiotopic) reference frame, and that a neuropsychological patient with mid- to high-level visual cortical damage retains some sensitivity to variance, suggests that this regional property may be encoded at an earlier cortical site than previously assumed. Study four examines how cues from different 2nd-order channels are temporally integrated to allow cue-invariant segmentation. Results from testing a patient with bilateral lateral occipital damage and from selective visual field testing in normal observers suggest that this is achieved prior to the level of lateral occipital complex, but at least at the level of V2. 
    The final study demonstrates that objects that are segmented rapidly by 2nd-order channels are processed at a sufficiently high cortical level as to allow object-based attention without those objects ever reaching awareness.

    Visual Saliency Based on Multiscale Deep Features

    Visual saliency is a fundamental problem in both cognitive and computational sciences, including computer vision. In this CVPR 2015 paper, we discover that a high-quality visual saliency model can be trained with multiscale features extracted using a popular deep learning architecture, convolutional neural networks (CNNs), which have had many successes in visual recognition tasks. For learning such saliency models, we introduce a neural network architecture, which has fully connected layers on top of CNNs responsible for extracting features at three different scales. We then propose a refinement method to enhance the spatial coherence of our saliency results. Finally, aggregating multiple saliency maps computed for different levels of image segmentation can further boost the performance, yielding saliency maps better than those generated from a single segmentation. To promote further research and evaluation of visual saliency models, we also construct a new large database of 4447 challenging images and their pixelwise saliency annotation. Experimental results demonstrate that our proposed method is capable of achieving state-of-the-art performance on all public benchmarks, improving the F-Measure by 5.0% and 13.2% respectively on the MSRA-B dataset and our new dataset (HKU-IS), and lowering the mean absolute error by 5.7% and 35.1% respectively on these two datasets. Comment: To appear in CVPR 201
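
    The two metrics named in the abstract can be sketched as follows. This is an illustration under stated assumptions, not the paper's evaluation code: the fixed binarization threshold of 0.5 and the weighting beta squared = 0.3 (a common convention in saliency evaluation) are assumptions, and the helper names are hypothetical.

```python
def mean_absolute_error(saliency, ground_truth):
    """Average per-pixel absolute difference between a saliency map in [0, 1]
    and a binary ground-truth mask."""
    diffs = [
        abs(s - g)
        for s_row, g_row in zip(saliency, ground_truth)
        for s, g in zip(s_row, g_row)
    ]
    return sum(diffs) / len(diffs)


def f_measure(saliency, ground_truth, threshold=0.5, beta_sq=0.3):
    """Weighted F-measure after binarizing the saliency map.

    Assumptions: fixed threshold and beta_sq = 0.3; the paper's exact
    protocol is not given in the abstract.
    """
    tp = fp = fn = 0
    for s_row, g_row in zip(saliency, ground_truth):
        for s, g in zip(s_row, g_row):
            predicted = s >= threshold
            if predicted and g == 1:
                tp += 1
            elif predicted:
                fp += 1
            elif g == 1:
                fn += 1
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return (1 + beta_sq) * precision * recall / (beta_sq * precision + recall)


# Toy 2x2 example: two salient pixels on the top row, one false positive below.
saliency = [[0.9, 0.8], [0.6, 0.1]]
ground_truth = [[1, 1], [0, 0]]

mae = mean_absolute_error(saliency, ground_truth)
fm = f_measure(saliency, ground_truth)
print(f"MAE = {mae:.3f}, F-measure = {fm:.3f}")
```

    A lower MAE and a higher F-measure both indicate a saliency map closer to the annotation, which is the direction of the improvements reported above.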

    The role of terminators and occlusion cues in motion integration and segmentation: a neural network model

    The perceptual interaction of terminators and occlusion cues with the functional processes of motion integration and segmentation is examined using a computational model. Integration is necessary to overcome noise and the inherent ambiguity in locally measured motion direction (the aperture problem). Segmentation is required to detect the presence of motion discontinuities and to prevent spurious integration of motion signals between objects with different trajectories. Terminators are used for motion disambiguation, while occlusion cues are used to suppress motion noise at points where objects intersect. The model illustrates how competitive and cooperative interactions among cells carrying out these functions can account for a number of perceptual effects, including the chopsticks illusion and the occluded diamond illusion. Possible links to the neurophysiology of the middle temporal visual area (MT) are suggested.
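
    The aperture problem mentioned above can be made concrete with a small numerical sketch. It is a textbook illustration (often called intersection of constraints), not the neural network model of the paper: a local detector viewing a straight edge can measure only the speed along the edge normal, and two edges with different orientations are enough to recover the full velocity. All names below are hypothetical.

```python
import math


def normal_speed(velocity, normal_angle):
    """Speed component along the edge normal: all a local detector can measure."""
    vx, vy = velocity
    return vx * math.cos(normal_angle) + vy * math.sin(normal_angle)


def integrate_two(angle_a, speed_a, angle_b, speed_b):
    """Recover (vx, vy) from two normal-speed measurements by solving the
    2x2 linear system with Cramer's rule."""
    a11, a12 = math.cos(angle_a), math.sin(angle_a)
    a21, a22 = math.cos(angle_b), math.sin(angle_b)
    det = a11 * a22 - a12 * a21
    if abs(det) < 1e-12:
        raise ValueError("parallel edges: the motion remains ambiguous")
    vx = (speed_a * a22 - a12 * speed_b) / det
    vy = (a11 * speed_b - a21 * speed_a) / det
    return vx, vy


true_velocity = (3.0, 1.0)
angle_a, angle_b = math.pi / 4, 3 * math.pi / 4  # normals of two differently oriented edges

# Each local measurement alone is a single number, compatible with many velocities.
speed_a = normal_speed(true_velocity, angle_a)
speed_b = normal_speed(true_velocity, angle_b)

recovered = integrate_two(angle_a, speed_a, angle_b, speed_b)
print(recovered)
```

    The parallel-edge failure case shows why integration must pool signals from differently oriented contours, and why pooling across objects with different trajectories would give a spurious answer.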

    Asynchrony in image analysis: using the luminance-to-response-latency relationship to improve segmentation

    We deal with the problem of segmenting static images, a procedure known to be difficult in the case of very noisy patterns. The proposed approach rests on the transformation of a static image into a data flow in which the first image points to be processed are the brighter ones. This solution, inspired by human perception, in which strong luminances elicit reactions from the visual system before weaker ones, has led to the notion of asynchronous processing. The asynchronous processing of image points has required the design of a specific architecture that exploits time differences in the processing of information. The results obtained when very noisy images are segmented demonstrate the strengths of this architecture; they also suggest extensions of the approach to other computer vision problems.
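
    The idea of turning a static image into a luminance-ordered data flow can be sketched in a few lines. This shows only the ordering step, assuming that a brighter pixel simply arrives earlier; the specific architecture described in the abstract is not reproduced, and the function name is hypothetical.

```python
def luminance_ordered_stream(image):
    """Yield (row, col, luminance) tuples, brightest pixels first.

    Assumption: response latency decreases with luminance, modeled here
    as a plain descending sort. Ties keep raster order (stable sort).
    """
    pixels = [
        (value, r, c)
        for r, row in enumerate(image)
        for c, value in enumerate(row)
    ]
    pixels.sort(key=lambda p: -p[0])
    for value, r, c in pixels:
        yield r, c, value


image = [[12, 200], [90, 200]]
stream = list(luminance_ordered_stream(image))
for item in stream:
    print(item)
```

    A downstream segmentation stage would consume this stream incrementally, so that decisions about bright, high-contrast points are available before the weaker and typically noisier ones arrive.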