    What Can Help Pedestrian Detection?

    Aggregating extra features has been considered an effective approach to boost traditional pedestrian detection methods. However, there is still a lack of studies on whether and how CNN-based pedestrian detectors can benefit from these extra features. The first contribution of this paper is exploring this issue by aggregating extra features into a CNN-based pedestrian detection framework. Through extensive experiments, we evaluate the effects of different kinds of extra features quantitatively. Moreover, we propose a novel network architecture, namely HyperLearner, to jointly learn pedestrian detection as well as the given extra feature. By multi-task training, HyperLearner is able to utilize the information of given features and improve detection performance without extra inputs at inference. The experimental results on multiple pedestrian benchmarks validate the effectiveness of the proposed HyperLearner. Comment: Accepted to IEEE International Conference on Computer Vision and Pattern Recognition (CVPR) 201

    Visual Saliency Based on Multiscale Deep Features

    Visual saliency is a fundamental problem in both cognitive and computational sciences, including computer vision. In this CVPR 2015 paper, we discover that a high-quality visual saliency model can be trained with multiscale features extracted using a popular deep learning architecture, convolutional neural networks (CNNs), which have had many successes in visual recognition tasks. For learning such saliency models, we introduce a neural network architecture, which has fully connected layers on top of CNNs responsible for extracting features at three different scales. We then propose a refinement method to enhance the spatial coherence of our saliency results. Finally, aggregating multiple saliency maps computed for different levels of image segmentation can further boost the performance, yielding saliency maps better than those generated from a single segmentation. To promote further research and evaluation of visual saliency models, we also construct a new large database of 4447 challenging images and their pixelwise saliency annotation. Experimental results demonstrate that our proposed method is capable of achieving state-of-the-art performance on all public benchmarks, improving the F-Measure by 5.0% and 13.2% respectively on the MSRA-B dataset and our new dataset (HKU-IS), and lowering the mean absolute error by 5.7% and 35.1% respectively on these two datasets. Comment: To appear in CVPR 2015.
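
    The two evaluation metrics named in this abstract, mean absolute error and F-measure, can be illustrated with a minimal sketch. This is not the authors' evaluation code: the function names, the fixed binarization threshold of 0.5, and the weighting beta_sq = 0.3 are assumptions chosen for illustration, and maps are represented as nested lists of values in [0, 1].

```python
def mean_absolute_error(saliency, ground_truth):
    """Average per-pixel absolute difference between two maps in [0, 1]."""
    total = 0.0
    count = 0
    for s_row, g_row in zip(saliency, ground_truth):
        for s, g in zip(s_row, g_row):
            total += abs(s - g)
            count += 1
    return total / count


def f_measure(saliency, ground_truth, threshold=0.5, beta_sq=0.3):
    """Weighted F-measure of a saliency map binarized at `threshold`.

    `threshold` and `beta_sq` are illustrative assumptions, not values
    taken from the abstract.
    """
    tp = fp = fn = 0
    for s_row, g_row in zip(saliency, ground_truth):
        for s, g in zip(s_row, g_row):
            predicted = s >= threshold
            actual = g >= 0.5
            if predicted and actual:
                tp += 1
            elif predicted and not actual:
                fp += 1
            elif actual and not predicted:
                fn += 1
    if tp == 0:
        # No correctly detected salient pixel: precision and recall are zero.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return (1 + beta_sq) * precision * recall / (beta_sq * precision + recall)


if __name__ == "__main__":
    predicted_map = [[0.9, 0.2], [0.6, 0.1]]
    annotation = [[1.0, 0.0], [0.0, 0.0]]
    print(mean_absolute_error(predicted_map, annotation))
    print(f_measure(predicted_map, annotation))
```

    A lower mean absolute error and a higher F-measure both indicate a saliency map closer to the pixelwise annotation.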

    The role of terminators and occlusion cues in motion integration and segmentation: a neural network model

    The perceptual interaction of terminators and occlusion cues with the functional processes of motion integration and segmentation is examined using a computational model. Integration is necessary to overcome noise and the inherent ambiguity in locally measured motion direction (the aperture problem). Segmentation is required to detect the presence of motion discontinuities and to prevent spurious integration of motion signals between objects with different trajectories. Terminators are used for motion disambiguation, while occlusion cues are used to suppress motion noise at points where objects intersect. The model illustrates how competitive and cooperative interactions among cells carrying out these functions can account for a number of perceptual effects, including the chopsticks illusion and the occluded diamond illusion. Possible links to the neurophysiology of the middle temporal visual area (MT) are suggested.

    The aceToolbox: low-level audiovisual feature extraction for retrieval and classification

    In this paper we present an overview of a software platform that has been developed within the aceMedia project, termed the aceToolbox, that provides global and local low-level feature extraction from audio-visual content. The toolbox is based on the MPEG-7 eXperimental Model (XM), with extensions to provide descriptor extraction from arbitrarily shaped image segments, thereby supporting local descriptors reflecting real image content. We describe the architecture of the toolbox and provide an overview of the descriptors supported to date. We also briefly describe the segmentation algorithm provided. We then demonstrate the usefulness of the toolbox in the context of two different content processing scenarios: similarity-based retrieval in large collections and scene-level classification of still images.

    Analysis of Amoeba Active Contours

    The subject of this paper is the theoretical analysis of structure-adaptive median filter algorithms that approximate curvature-based PDEs for image filtering and segmentation. These so-called morphological amoeba filters are based on a concept introduced by Lerallut et al. They achieve results similar to those of the well-known geodesic active contour and self-snakes PDEs. In the present work, the PDE approximated by amoeba active contours is derived for a general geometric situation and general amoeba metric. This PDE is structurally similar but not identical to the geodesic active contour equation. It reproduces the previous PDE approximation results for amoeba median filters as special cases. Furthermore, modifications of the basic amoeba active contour algorithm are analysed that are related to the morphological force terms frequently used with geodesic active contours. Experiments demonstrate the basic behaviour of amoeba active contours and its similarity to geodesic active contours. Comment: Revised version with several improvements for clarity, slightly extended experiments and discussion. Accepted for publication in Journal of Mathematical Imaging and Vision.

    Image analysis by integration of disparate information

    Image analysis often starts with some preliminary segmentation which provides a representation of the scene needed for further interpretation. Segmentation can be performed in several ways, which are categorized as pixel-based, edge-based, and region-based. Each of these approaches is affected differently by various factors, and the final result may be improved by integrating several or all of these methods, thus taking advantage of their complementary nature. In this paper, we propose an approach that integrates pixel-based and edge-based results by utilizing an iterative relaxation technique. This approach has been implemented on a massively parallel computer and tested on some remotely sensed imagery from the Landsat-Thematic Mapper (TM) sensor.

    Insignificant shadow detection for video segmentation

    To prevent moving cast shadows from being mistaken for part of moving objects in change-detection-based video segmentation, this paper proposes a novel approach to cast shadow detection based on the edge and region information in multiple frames. First, an initial change detection mask containing moving objects and cast shadows is obtained. Then a Canny edge map is generated. After that, the shadow region is detected and removed through multiframe integration, edge matching, and region growing. Finally, a post-processing procedure is used to eliminate noise and tune the boundaries of the objects. Our approach can be used for video segmentation in indoor environments. The experimental results demonstrate its good performance.
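
    Two of the building blocks named in this abstract, the initial change detection mask and region growing, can be sketched as follows. This is a generic illustration, not the paper's algorithm: simple differencing against a background frame, the threshold of 25 grey levels, 4-connectivity, and the function names change_mask and grow_region are all assumptions. The Canny edge map, multiframe integration, and edge matching steps are not shown.

```python
from collections import deque


def change_mask(frame, background, threshold=25):
    """Mark pixels whose grey level differs from the background frame.

    Plain frame differencing with a fixed threshold is an assumption
    made for illustration.
    """
    return [
        [abs(p - b) > threshold for p, b in zip(f_row, b_row)]
        for f_row, b_row in zip(frame, background)
    ]


def grow_region(mask, seed):
    """Collect the 4-connected set of True pixels reachable from `seed`."""
    rows, cols = len(mask), len(mask[0])
    r0, c0 = seed
    if not mask[r0][c0]:
        return set()
    region = {seed}
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and mask[nr][nc] and (nr, nc) not in region:
                region.add((nr, nc))
                queue.append((nr, nc))
    return region


if __name__ == "__main__":
    background = [[10, 10, 10], [10, 10, 10]]
    frame = [[10, 80, 90], [10, 10, 85]]
    mask = change_mask(frame, background)
    print(sorted(grow_region(mask, (0, 1))))
```

    In a full pipeline of the kind the abstract describes, the mask would contain both objects and their cast shadows, and the later edge-based steps would decide which grown regions are shadow.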