146,368 research outputs found

    Convolutional Feature Masking for Joint Object and Stuff Segmentation

    Semantic segmentation has progressed considerably thanks to the powerful features learned by convolutional neural networks (CNNs). The current leading approaches for semantic segmentation exploit shape information by extracting CNN features from masked image regions. This strategy introduces artificial boundaries into the images and may degrade the quality of the extracted features. Moreover, operating on the raw image domain requires thousands of network computations on a single image, which is time-consuming. In this paper, we propose to exploit shape information by masking convolutional features. The proposal segments (e.g., super-pixels) are treated as masks on the convolutional feature maps. The CNN features of segments are directly masked out from these maps and used to train classifiers for recognition. We further propose a joint method to handle objects and "stuff" (e.g., grass, sky, water) in the same framework. State-of-the-art results are demonstrated on the PASCAL VOC and new PASCAL-CONTEXT benchmarks, with a compelling computational speed. Comment: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 201
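
The idea of treating a segment as a mask on the convolutional feature maps can be illustrated with a minimal sketch. It assumes the segment mask has already been resized to the feature-map resolution, and it uses global max pooling to obtain a fixed-length descriptor; both `mask_features` and `global_max_pool` are hypothetical helper names, and the pooling choice is an assumption rather than the paper's exact design.

```python
def mask_features(feature_map, segment_mask):
    """Zero out feature activations that fall outside a segment.

    feature_map: list of channels, each an H x W list of floats.
    segment_mask: H x W list of 0/1 values, assumed to be already
    resized to the feature-map resolution (resizing is not shown).
    """
    return [
        [[value * keep for value, keep in zip(row, mask_row)]
         for row, mask_row in zip(channel, segment_mask)]
        for channel in feature_map
    ]


def global_max_pool(feature_map):
    """Collapse each channel to a single number (an assumed pooling choice)."""
    return [max(max(row) for row in channel) for channel in feature_map]


features = [
    [[1.0, 2.0], [3.0, 4.0]],   # channel 0
    [[0.5, 0.1], [0.2, 0.9]],   # channel 1
]
mask = [[1, 0], [1, 0]]          # the segment covers the left column only

masked = mask_features(features, mask)
descriptor = global_max_pool(masked)   # one value per channel
```

Because the feature maps are computed once per image, each additional segment costs only an element-wise multiplication and a pooling step instead of a full network pass.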

    Effective Use of Dilated Convolutions for Segmenting Small Object Instances in Remote Sensing Imagery

    Thanks to recent advances in CNNs, solid improvements have been made in semantic segmentation of high-resolution remote sensing imagery. However, most previous works have not fully taken into account the specific difficulties of remote sensing tasks. One such difficulty is that objects in remote sensing imagery are small and crowded. To tackle this challenge, we propose a novel architecture called the local feature extraction (LFE) module, attached on top of a dilated front-end module. The LFE module is based on our finding that aggressively increasing dilation factors fails to aggregate local features because of the sparsity of the kernel, and is detrimental to small objects. The proposed LFE module solves this problem by aggregating local features with decreasing dilation factors. We tested our network on three remote sensing datasets and obtained remarkably good results on all of them, especially for small objects.
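
The kernel-sparsity argument can be made concrete with a small one-dimensional sketch. The functions `dilated_conv1d` and `kernel_density` are hypothetical helpers written for illustration, not part of the paper's code; the example only shows how a large dilation factor samples a small object sparsely.

```python
def dilated_conv1d(signal, kernel, dilation):
    """'Valid' 1-D dilated convolution (cross-correlation form, no padding)."""
    span = (len(kernel) - 1) * dilation + 1
    return [
        sum(w * signal[start + k * dilation] for k, w in enumerate(kernel))
        for start in range(len(signal) - span + 1)
    ]


def kernel_density(kernel_size, dilation):
    """Fraction of positions inside the kernel's span that are sampled."""
    span = (kernel_size - 1) * dilation + 1
    return kernel_size / span


signal = [0, 0, 0, 0, 1, 1, 0, 0, 0, 0]   # a small two-sample "object"
box = [1, 1, 1]                            # 3-tap summing kernel

dense = dilated_conv1d(signal, box, dilation=1)    # [0, 0, 1, 2, 2, 1, 0, 0]
sparse = dilated_conv1d(signal, box, dilation=4)   # [1, 1]
```

With dilation 1 some outputs see both samples of the object (value 2), whereas with dilation 4 no output ever does, since neighbouring taps are four positions apart and the kernel covers only a third of its span.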

    Fusion of aerial images and sensor data from a ground vehicle for improved semantic mapping

    This work investigates the use of semantic information to link ground-level occupancy maps and aerial images. A ground-level semantic map, which shows open ground and indicates the probability of cells being occupied by walls of buildings, is obtained by a mobile robot equipped with an omnidirectional camera, GPS and a laser range finder. This semantic information is used for local and global segmentation of an aerial image. The result is a map in which the semantic information has been extended beyond the range of the robot's sensors and which predicts where the mobile robot can find buildings and potentially drivable ground.

    Enhancement of dense urban digital surface models from VHR optical satellite stereo data by pre-segmentation and object detection

    The generation of digital surface models (DSMs) of urban areas from very high resolution (VHR) stereo satellite imagery requires advanced methods. In the classical approach to DSM generation from stereo satellite imagery, interest points are extracted and correlated between the stereo mates using area-based matching followed by a least-squares sub-pixel refinement step. After region growing, the 3D point list is triangulated to produce the DSM. In urban areas this approach fails because of the size of the correlation window, which smooths out the steep edges typical of buildings. In addition, missing correlations, for example in areas partly occluded in one or both of the images, are simply interpolated in the triangulation step. The classical approach therefore yields a very smooth urban DSM that lacks steep walls, narrow streets and courtyards. To overcome these problems, algorithms from computer vision are introduced and adapted to satellite imagery. Rather than optimizing locally, as area-based matching does, these algorithms try to optimize a (semi-)global cost function. Analysis shows that dynamic programming approaches based on epipolar images, such as dynamic line warping or semiglobal matching, yield the best results in terms of accuracy and processing time. These algorithms can also detect occlusions, that is, areas not visible in one or both of the stereo images. Furthermore, the time- and memory-consuming step of handling and triangulating large point lists can be omitted, because the algorithms operate directly on epipolar images and directly generate a so-called disparity image that fits exactly onto the first of the stereo images. This disparity image, which already represents a kind of dense DSM, contains for each pixel the distance measured in pixels in the epipolar direction (or a no-data value for a detected occlusion).
    Despite the global optimization of the cost function, many outliers, mismatches and erroneously detected occlusions remain, especially if only one stereo pair is available. To enhance this dense DSM (the disparity image), a pre-segmentation approach is presented in this paper. Since the disparity image fits exactly onto the first of the two stereo partners (transformed beforehand to epipolar geometry), a direct correspondence between image pixels and derived heights (the disparities) exists. This property of the disparity image is exploited to integrate additional knowledge from the image into the DSM. This is done by segmenting the stereo image, transferring the segmentation information to the DSM and performing a statistical analysis on each of the resulting DSM segments. Based on this analysis and on spectral information, a coarse object detection and classification can be performed, and in turn the DSM can be enhanced. After the description of the proposed method, some results are shown and discussed.
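
One way to picture the per-segment statistical analysis is the sketch below. It assumes the simplest possible statistic, the segment median, and replaces no-data values and strong outliers with it; the abstract does not specify which statistics are used, so the function name `smooth_disparity_by_segment` and the threshold `max_dev` are hypothetical.

```python
from collections import defaultdict
from statistics import median


def smooth_disparity_by_segment(disparity, labels, max_dev=2.0):
    """Replace no-data (None) and outlier disparities with the segment median.

    max_dev is a hypothetical threshold in pixels, not a value from the paper.
    """
    # Collect the valid disparities of every segment.
    by_segment = defaultdict(list)
    for d_row, l_row in zip(disparity, labels):
        for d, label in zip(d_row, l_row):
            if d is not None:
                by_segment[label].append(d)
    medians = {label: median(values) for label, values in by_segment.items()}

    result = []
    for d_row, l_row in zip(disparity, labels):
        out_row = []
        for d, label in zip(d_row, l_row):
            m = medians.get(label)  # None if the whole segment is no-data
            if m is not None and (d is None or abs(d - m) > max_dev):
                d = m
            out_row.append(d)
        result.append(out_row)
    return result


labels = [[0, 0, 1],
          [0, 0, 1],
          [0, 1, 1]]
disparity = [[5.0, 5.5, 12.0],
             [None, 19.0, 12.5],   # None = detected occlusion, 19.0 = outlier
             [5.0, 12.0, None]]

cleaned = smooth_disparity_by_segment(disparity, labels)
```

The segmentation can be transferred this directly only because the disparity image is pixel-aligned with the first stereo image, which is the property the abstract exploits.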

    Emergence of Object Segmentation in Perturbed Generative Models

    We introduce a novel framework to build a model that can learn how to segment objects from a collection of images without any human annotation. Our method builds on the observation that the location of object segments can be perturbed locally relative to a given background without affecting the realism of a scene. Our approach is to first train a generative model of a layered scene. The layered representation consists of a background image, a foreground image and the mask of the foreground. A composite image is then obtained by overlaying the masked foreground image onto the background. The generative model is trained in an adversarial fashion against a discriminator, which forces the generative model to produce realistic composite images. To force the generator to learn a representation where the foreground layer corresponds to an object, we perturb the output of the generative model by introducing a random shift of both the foreground image and mask relative to the background. Because the generator is unaware of the shift before computing its output, it must produce layered representations that are realistic for any such random perturbation. Finally, we learn to segment an image by defining an autoencoder consisting of an encoder, which we train, and the pre-trained generator as the decoder, which we freeze. The encoder maps an image to a feature vector, which is fed as input to the generator to give a composite image matching the original input image. Because the generator outputs an explicit layered representation of the scene, the encoder learns to detect and segment objects. We demonstrate this framework on real images of several object categories. Comment: 33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Spotlight presentatio
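
The layered composition and the random shift can be sketched on tiny grayscale grids. This is an illustration under stated assumptions only: the helper names `shift`, `composite` and `perturbed_composite`, the zero fill at the borders and the integer shift range are hypothetical, and the generator and discriminator themselves are not modelled.

```python
import random


def shift(image, dx, dy, fill=0.0):
    """Translate a 2-D list by (dx, dy), filling uncovered cells with `fill`."""
    h, w = len(image), len(image[0])
    return [
        [image[y - dy][x - dx] if 0 <= y - dy < h and 0 <= x - dx < w else fill
         for x in range(w)]
        for y in range(h)
    ]


def composite(background, foreground, mask):
    """Per-pixel blend: mask * foreground + (1 - mask) * background."""
    return [
        [m * f + (1 - m) * b for b, f, m in zip(b_row, f_row, m_row)]
        for b_row, f_row, m_row in zip(background, foreground, mask)
    ]


def perturbed_composite(background, foreground, mask, max_shift=1, rng=random):
    """Shift foreground and mask by the same random offset, then composite."""
    dx = rng.randint(-max_shift, max_shift)
    dy = rng.randint(-max_shift, max_shift)
    return composite(background, shift(foreground, dx, dy), shift(mask, dx, dy))


background = [[0.1, 0.1, 0.1], [0.1, 0.1, 0.1]]
foreground = [[0.9, 0.0, 0.0], [0.0, 0.0, 0.0]]
mask = [[1.0, 0.0, 0.0], [0.0, 0.0, 0.0]]

# Deterministic example: move the foreground object one pixel to the right.
moved = composite(background, shift(foreground, 1, 0), shift(mask, 1, 0))
```

Shifting the foreground and its mask together keeps the object intact while moving it over the background, which is the perturbation the composite image must remain realistic under.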