12,255 research outputs found

    Recurrent Segmentation for Variable Computational Budgets

    Full text link
    State-of-the-art systems for semantic image segmentation use feed-forward pipelines with fixed computational costs. Building an image segmentation system that works across a range of computational budgets is challenging and time-intensive as new architectures must be designed and trained for every computational setting. To address this problem we develop a recurrent neural network that successively improves prediction quality with each iteration. Importantly, the RNN may be deployed across a range of computational budgets by merely running the model for a variable number of iterations. We find that this architecture is uniquely suited for efficiently segmenting videos. By exploiting the segmentation of past frames, the RNN can perform video segmentation at similar quality but reduced computational cost compared to state-of-the-art image segmentation methods. When applied to static images in the PASCAL VOC 2012 and Cityscapes segmentation datasets, the RNN traces out a speed-accuracy curve that saturates near the performance of state-of-the-art segmentation methods

    Learning Video Object Segmentation with Visual Memory

    Get PDF
    This paper addresses the task of segmenting moving objects in unconstrained videos. We introduce a novel two-stream neural network with an explicit memory module to achieve this. The two streams of the network encode spatial and temporal features in a video sequence respectively, while the memory module captures the evolution of objects over time. The module to build a "visual memory" in video, i.e., a joint representation of all the video frames, is realized with a convolutional recurrent unit learned from a small number of training video sequences. Given a video frame as input, our approach assigns each pixel an object or background label based on the learned spatio-temporal features as well as the "visual memory" specific to the video, acquired automatically without any manually-annotated frames. The visual memory is implemented with convolutional gated recurrent units, which allows to propagate spatial information over time. We evaluate our method extensively on two benchmarks, DAVIS and Freiburg-Berkeley motion segmentation datasets, and show state-of-the-art results. For example, our approach outperforms the top method on the DAVIS dataset by nearly 6%. We also provide an extensive ablative analysis to investigate the influence of each component in the proposed framework

    DA-RNN: Semantic Mapping with Data Associated Recurrent Neural Networks

    Full text link
    3D scene understanding is important for robots to interact with the 3D world in a meaningful way. Most previous works on 3D scene understanding focus on recognizing geometrical or semantic properties of the scene independently. In this work, we introduce Data Associated Recurrent Neural Networks (DA-RNNs), a novel framework for joint 3D scene mapping and semantic labeling. DA-RNNs use a new recurrent neural network architecture for semantic labeling on RGB-D videos. The output of the network is integrated with mapping techniques such as KinectFusion in order to inject semantic information into the reconstructed 3D scene. Experiments conducted on a real world dataset and a synthetic dataset with RGB-D videos demonstrate the ability of our method in semantic 3D scene mapping.Comment: Published in RSS 201
    corecore