13 research outputs found
On the Importance of Visual Context for Data Augmentation in Scene Understanding
Performing data augmentation for learning deep neural networks is known to be
important for training visual recognition systems. By artificially increasing
the number of training examples, it helps reducing overfitting and improves
generalization. While simple image transformations can already improve
predictive performance in most vision tasks, larger gains can be obtained by
leveraging task-specific prior knowledge. In this work, we consider object
detection, semantic and instance segmentation and augment the training images
by blending objects in existing scenes, using instance segmentation
annotations. We observe that randomly pasting objects on images hurts the
performance, unless the object is placed in the right context. To resolve this
issue, we propose an explicit context model by using a convolutional neural
network, which predicts whether an image region is suitable for placing a given
object or not. In our experiments, we show that our approach is able to improve
object detection, semantic and instance segmentation on the PASCAL VOC12 and
COCO datasets, with significant gains in a limited annotation scenario, i.e.
when only one category is annotated. We also show that the method is not
limited to datasets that come with expensive pixel-wise instance annotations
and can be used when only bounding boxes are available, by employing
weakly-supervised learning for instance masks approximation.Comment: Updated the experimental section. arXiv admin note: substantial text
overlap with arXiv:1807.0742
PanDA: Panoptic Data Augmentation
The recently proposed panoptic segmentation task presents a significant challenge of image understanding with computer vision by unifying semantic segmentation and instance segmentation tasks. In this paper we present an efficient and novel panoptic data augmentation (PanDA) method which operates exclusively in pixel space, requires no additional data or training, and is computationally cheap to implement. By retraining original state-of-the-art models on PanDA augmented datasets generated with a single frozen set of parameters, we show robust performance gains in panoptic segmentation, instance segmentation, as well as detection across models, backbones, dataset domains, and scales. Finally, the effectiveness of unrealistic-looking training images synthesized by PanDA suggest that one should rethink the need for image realism for efficient data augmentation
PanDA: Panoptic Data Augmentation
The recently proposed panoptic segmentation task presents a significant challenge of image understanding with computer vision by unifying semantic segmentation and instance segmentation tasks. In this paper we present an efficient and novel panoptic data augmentation (PanDA) method which operates exclusively in pixel space, requires no additional data or training, and is computationally cheap to implement. By retraining original state-of-the-art models on PanDA augmented datasets generated with a single frozen set of parameters, we show robust performance gains in panoptic segmentation, instance segmentation, as well as detection across models, backbones, dataset domains, and scales. Finally, the effectiveness of unrealistic-looking training images synthesized by PanDA suggest that one should rethink the need for image realism for efficient data augmentation
Contextualizing Multiple Tasks via Learning to Decompose
One single instance could possess multiple portraits and reveal diverse
relationships with others according to different contexts. Those ambiguities
increase the difficulty of learning a generalizable model when there exists one
concept or mixed concepts in a task. We propose a general approach Learning to
Decompose Network (LeadNet) for both two cases, which contextualizes a model
through meta-learning multiple maps for concepts discovery -- the
representations of instances are decomposed and adapted conditioned on the
contexts. Through taking a holistic view over multiple latent components over
instances in a sampled pseudo task, LeadNet learns to automatically select the
right concept via incorporating those rich semantics inside and between
objects. LeadNet demonstrates its superiority in various applications,
including exploring multiple views of confusing tasks, out-of-distribution
recognition, and few-shot image classification