
    PanDA: Panoptic Data Augmentation

    The recently proposed panoptic segmentation task presents a significant challenge for image understanding in computer vision by unifying the semantic segmentation and instance segmentation tasks. In this paper we present an efficient and novel panoptic data augmentation (PanDA) method which operates exclusively in pixel space, requires no additional data or training, and is computationally cheap to implement. By retraining original state-of-the-art models on PanDA-augmented datasets generated with a single frozen set of parameters, we show robust performance gains in panoptic segmentation, instance segmentation, and detection across models, backbones, dataset domains, and scales. Finally, the effectiveness of unrealistic-looking training images synthesized by PanDA suggests that one should rethink the need for image realism for efficient data augmentation.

    Augmented Reality Meets Computer Vision : Efficient Data Generation for Urban Driving Scenes

    The success of deep learning in computer vision is based on the availability of large annotated datasets. To lower the need for hand-labeled images, virtually rendered 3D worlds have recently gained popularity. Creating realistic 3D content is challenging on its own and requires significant human effort. In this work, we propose an alternative paradigm which combines real and synthetic data for learning semantic instance segmentation and object detection models. Exploiting the fact that not all aspects of the scene are equally important for this task, we propose to augment real-world imagery with virtual objects of the target category. Capturing real-world images at large scale is easy and cheap, and directly provides real background appearances without the need for creating complex 3D models of the environment. We present an efficient procedure to augment real images with virtual objects. This allows us to create realistic composite images which exhibit both realistic background appearance and a large number of complex object arrangements. In contrast to modeling complete 3D environments, our augmentation approach requires only a few user interactions in combination with 3D shapes of the target object. Through extensive experimentation, we determine the right set of parameters to produce augmented data which can maximally enhance the performance of instance segmentation models. Further, we demonstrate the utility of our approach on training standard deep models for semantic instance segmentation and object detection of cars in outdoor driving scenes. We test the models trained on our augmented data on the KITTI 2015 dataset, which we have annotated with pixel-accurate ground truth, and on the Cityscapes dataset. Our experiments demonstrate that models trained on augmented imagery generalize better than those trained on synthetic data or models trained on a limited amount of annotated real data.

    On the Importance of Visual Context for Data Augmentation in Scene Understanding

    Performing data augmentation for learning deep neural networks is known to be important for training visual recognition systems. By artificially increasing the number of training examples, it helps reduce overfitting and improves generalization. While simple image transformations can already improve predictive performance in most vision tasks, larger gains can be obtained by leveraging task-specific prior knowledge. In this work, we consider object detection, semantic and instance segmentation and augment the training images by blending objects in existing scenes, using instance segmentation annotations. We observe that randomly pasting objects on images hurts the performance, unless the object is placed in the right context. To resolve this issue, we propose an explicit context model by using a convolutional neural network, which predicts whether an image region is suitable for placing a given object or not. In our experiments, we show that our approach is able to improve object detection, semantic and instance segmentation on the PASCAL VOC12 and COCO datasets, with significant gains in a limited annotation scenario, i.e. when only one category is annotated. We also show that the method is not limited to datasets that come with expensive pixel-wise instance annotations and can be used when only bounding boxes are available, by employing weakly-supervised learning for instance mask approximation. Comment: Updated the experimental section. arXiv admin note: substantial text overlap with arXiv:1807.0742

    Modeling Camera Effects to Improve Visual Learning from Synthetic Data

    Recent work has focused on generating synthetic imagery to increase the size and variability of training data for learning visual tasks in urban scenes. This includes increasing the occurrence of occlusions or varying environmental and weather effects. However, few have addressed modeling variation in the sensor domain. Sensor effects can degrade real images, limiting generalizability of network performance on visual tasks trained on synthetic data and tested in real environments. This paper proposes an efficient, automatic, physically-based augmentation pipeline to vary sensor effects (chromatic aberration, blur, exposure, noise, and color cast) for synthetic imagery. In particular, this paper illustrates that augmenting synthetic training datasets with the proposed pipeline reduces the domain gap between synthetic and real domains for the task of object detection in urban driving scenes.
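
The sensor effects named in the last abstract can be illustrated with a toy sketch. This is not the paper's pipeline, which is described as physically based. The function name `augment_sensor_effects`, its parameters, and the default values are all hypothetical, chosen only to show what varying exposure, color cast, noise, and a crude chromatic aberration might look like on a tiny RGB image. Blur is omitted for brevity.

```python
import random


def clamp(value):
    """Keep a channel value inside the displayable range [0, 1]."""
    return max(0.0, min(1.0, value))


def augment_sensor_effects(image, gain=1.2, cast=(1.0, 0.95, 0.9),
                           noise_std=0.02, shift=1, seed=0):
    """Apply toy sensor effects to an image.

    image: list of rows, each row a list of (r, g, b) floats in [0, 1].
    All parameters are illustrative assumptions, not values from the paper.
    """
    rng = random.Random(seed)
    width = len(image[0])
    out = []
    for row_in in image:
        row_out = []
        for x in range(width):
            # Crude chromatic aberration: read red and blue from
            # horizontally shifted neighbours, green from the pixel itself.
            r = row_in[min(width - 1, x + shift)][0]
            g = row_in[x][1]
            b = row_in[max(0, x - shift)][2]
            pixel = []
            for value, channel_cast in zip((r, g, b), cast):
                value = value * gain * channel_cast   # exposure and color cast
                value += rng.gauss(0.0, noise_std)    # additive Gaussian noise
                pixel.append(clamp(value))
            row_out.append(tuple(pixel))
        out.append(row_out)
    return out
```

With `gain=1.0`, `cast=(1.0, 1.0, 1.0)`, `noise_std=0.0`, and `shift=0`, the function returns the image unchanged, which makes it easy to switch on each effect separately and inspect it in isolation.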