On Rendering Synthetic Images for Training an Object Detector
We propose a novel approach to synthesizing images that are effective for
training object detectors. Starting from a small set of real images, our
algorithm estimates the rendering parameters required to synthesize similar
images given a coarse 3D model of the target object. These parameters can then
be reused to generate an unlimited number of training images of the object of
interest in arbitrary 3D poses, which in turn improve detection performance.
A key insight of our approach is that the synthetically generated images
should be similar to real images, not in terms of image quality, but rather in
terms of features used during the detector training. We show in the context of
drone, plane, and car detection that using such synthetically generated images
yields significantly better performance than simply perturbing real images or
even synthesizing images in such a way that they look very realistic, as is
often done when only limited amounts of training data are available.
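The core step described above, reusing a small set of estimated rendering parameters to sample unlimited training views in arbitrary 3D poses, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the parameter names, the uniform pose sampling, and the jitter ranges are all hypothetical choices.

```python
import numpy as np

def sample_render_params(estimated, n, rng=None):
    """Sample n sets of rendering parameters around values estimated
    from a few real images (hypothetical parameterization)."""
    rng = rng or np.random.default_rng(0)
    params = []
    for _ in range(n):
        params.append({
            # arbitrary 3D pose: yaw/pitch/roll drawn uniformly
            "rotation": rng.uniform(-np.pi, np.pi, size=3),
            # camera distance and lighting jittered around the estimates
            "distance": estimated["distance"] * rng.uniform(0.8, 1.2),
            "light_intensity": estimated["light"] * rng.uniform(0.9, 1.1),
        })
    return params

# placeholder values standing in for parameters estimated from real images
estimated = {"distance": 5.0, "light": 1.0}
batch = sample_render_params(estimated, n=4)
```

Each sampled dictionary would then be passed to a renderer together with the coarse 3D model to produce one synthetic training image.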
Synthetic Data-based Detection of Zebras in Drone Imagery
Nowadays, there is a wide availability of datasets that enable the training
of common object detectors or human detectors. These come in the form of
labelled real-world images and require either substantial human effort, with a
high risk of errors such as missing labels, or very constrained capture setups,
e.g. VICON systems. Data for uncommon scenarios, such as aerial views, for
animals, such as wild zebras, or for difficult-to-obtain information, such as
human body shapes, is hardly available. To overcome this,
synthetic data generation with realistic rendering technologies has recently
gained traction and advanced research areas such as target tracking and human
pose estimation. However, subjects such as wild animals are still usually not
well represented in such datasets. In this work, we first show that a
pre-trained YOLO detector cannot identify zebras in real images recorded from
aerial viewpoints. To solve this, we present an approach for training an animal
detector using only synthetic data. We start by generating a novel synthetic
zebra dataset using GRADE, a state-of-the-art framework for data generation.
The dataset includes RGB, depth, skeletal joint locations, pose, shape and
instance segmentations for each subject. We use this to train a YOLO detector
from scratch. Through extensive evaluations of our model with real-world data
from i) limited datasets available on the internet and ii) a new one collected
and manually labelled by us, we show that we can detect zebras by using only
synthetic data during training. The code, results, trained models, and both the
generated and training data are provided as open-source at
https://eliabntt.github.io/grade-rr.
Comment: 8 pages, 7 figures, 3 tables. Published in IEEE ECMR 202
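Training YOLO from scratch on the generated data requires converting the synthetic annotations (e.g. per-subject instance segmentations) into YOLO-format label lines. A minimal sketch of that conversion is below; the function name and the assumption of a binary per-instance mask are illustrative, not taken from the GRADE codebase.

```python
import numpy as np

def mask_to_yolo_label(mask, class_id=0):
    """Convert a binary instance mask (H x W) into a YOLO-format label
    line: class x_center y_center width height, all normalized to [0, 1]."""
    ys, xs = np.nonzero(mask)
    if xs.size == 0:
        return None  # no instance present in this mask
    h, w = mask.shape
    x0, x1 = xs.min(), xs.max() + 1
    y0, y1 = ys.min(), ys.max() + 1
    xc = (x0 + x1) / 2 / w   # normalized box center
    yc = (y0 + y1) / 2 / h
    bw = (x1 - x0) / w       # normalized box size
    bh = (y1 - y0) / h
    return f"{class_id} {xc:.6f} {yc:.6f} {bw:.6f} {bh:.6f}"

# toy 8x8 mask with one instance occupying a 4-row by 2-column region
mask = np.zeros((8, 8), dtype=bool)
mask[2:6, 3:5] = True
label = mask_to_yolo_label(mask)  # -> "0 0.500000 0.500000 0.250000 0.500000"
```

One such line per instance, written to a `.txt` file next to each image, is the standard YOLO training layout.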
Augmented Reality Meets Computer Vision: Efficient Data Generation for Urban Driving Scenes
The success of deep learning in computer vision is based on the availability of
large annotated datasets. To lower the need for hand-labeled images, virtually
rendered 3D worlds have recently gained popularity. Creating realistic 3D
content is challenging on its own and requires significant human effort. In
this work, we propose an alternative paradigm which combines real and synthetic
data for learning semantic instance segmentation and object detection models.
Exploiting the fact that not all aspects of the scene are equally important for
this task, we propose to augment real-world imagery with virtual objects of the
target category. Capturing real-world images at large scale is easy and cheap,
and directly provides real background appearances without the need for creating
complex 3D models of the environment. We present an efficient procedure to
augment real images with virtual objects. This allows us to create realistic
composite images which exhibit both realistic background appearance and a large
number of complex object arrangements. In contrast to modeling complete 3D
environments, our augmentation approach requires only a few user interactions
in combination with 3D shapes of the target object. Through extensive
experimentation, we determine the right set of parameters to produce augmented
data that maximally enhances the performance of instance segmentation
models. Further, we demonstrate the utility of our approach on training
standard deep models for semantic instance segmentation and object detection of
cars in outdoor driving scenes. We test the models trained on our augmented
data on the KITTI 2015 dataset, which we have annotated with pixel-accurate
ground truth, and on the Cityscapes dataset. Our experiments demonstrate that
models trained on augmented imagery generalize better than those trained on
synthetic data or on limited amounts of annotated real data.
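The augmentation the abstract describes amounts to compositing a rendered target object onto a captured real background. A minimal sketch of the blending step is shown below, assuming the renderer provides an alpha mask for the virtual object; the function and array shapes are illustrative, not the paper's pipeline.

```python
import numpy as np

def composite(background, rendering, alpha):
    """Alpha-blend a rendered object onto a real background image.
    background, rendering: float arrays (H, W, 3); alpha: (H, W) in [0, 1]."""
    a = alpha[..., None]  # broadcast mask over the color channels
    return a * rendering + (1.0 - a) * background

bg = np.full((4, 4, 3), 0.2)   # real background appearance
obj = np.full((4, 4, 3), 0.9)  # rendered virtual object (e.g. a car)
alpha = np.zeros((4, 4))
alpha[1:3, 1:3] = 1.0          # object occupies a 2x2 region
img = composite(bg, obj, alpha)
```

In practice, per-pixel alpha values between 0 and 1 along the object boundary soften the seam between rendered and real content, which matters for the realism of the composite.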
What is Holding Back Convnets for Detection?
Convolutional neural networks have recently shown excellent results in
general object detection and many other tasks. Albeit very effective, they
involve many user-defined design choices. In this paper we want to better
understand these choices by inspecting two key aspects: "what did the network
learn?" and "what can the network learn?". We exploit new annotations
(Pascal3D+) to enable a new empirical analysis of the R-CNN detector. Despite
common belief, our results indicate that existing state-of-the-art convnet
architectures are not invariant to various appearance factors. In fact, all
considered networks have similar weak points which cannot be mitigated by
simply increasing the training data (architectural changes are needed). We show
that overall performance can improve when using image renderings for data
augmentation. We report the best known results on the Pascal3D+ detection and
viewpoint estimation tasks.