We propose a novel approach to synthesizing images that are effective for
training object detectors. Starting from a small set of real images, our
algorithm estimates the rendering parameters required to synthesize similar
images given a coarse 3D model of the target object. These parameters can then
be reused to generate an unlimited number of training images of the object of
interest in arbitrary 3D poses, which can in turn be used to improve
classification performance.
  A key insight of our approach is that the synthetically generated images
should be similar to real images, not in terms of image quality, but rather in
terms of the features used during detector training. We show in the context of
drone, plane, and car detection that using such synthetically generated images
yields significantly better performance than simply perturbing real images or
even synthesizing images in such a way that they look very realistic, as is often
done when only limited amounts of training data are available.

Fua, Pascal

Lepetit, Vincent

Rozantsev, Artem

English

arXiv

Infoscience - École polytechnique fédérale de Lausanne

On Rendering Synthetic Images for Training an Object Detector

We propose a novel approach to synthesizing images that are effective for training object detectors. Starting from a small set of real images, our algorithm estimates the rendering parameters required to synthesize similar images given a coarse 3D model of the target object. These parameters can then be reused to generate an unlimited number of training images of the object of interest in arbitrary 3D poses, which can in turn be used to improve classification performance. A key insight of our approach is that the synthetically generated images should be similar to real images, not in terms of image quality, but rather in terms of the features used during classifier training. We demonstrate the benefits of using such synthetically generated images in the context of drone detection, where only a limited amount of training data is available.

CVLA

On Rendering Synthetic Images for Training an Object Detector
Supplementary Material

Anonymous ECCV submission, Paper ID 1421

1 Influences of the Similarity Measures

As discussed in the main submission, the similarity measure based on the same family of image features as the detection framework yields the best performance. Table 1 of the main submission illustrates the influence of different similarity measures on the detection accuracy of various classifiers. We provide the corresponding plots here.

[Fig. 1: three panels, (a) DPM, (b) AdaBoost, (c) CNN]
Fig. 1. The similarity measure has a strong influence on the final performance. We achieve better performance when using the similarity measure that relies on the same family of image features as the detection framework. (best seen in color)

[Fig. 2: three panels, DPM, AdaBoost, CNN]
Fig. 2. Performance when using the Euclidean distance as a similarity measure. (best seen in color)

Fig. 1 shows the precision-recall curves for all possible combinations of the detection methods and the similarity measures we consider, and confirms that using the similarity measure that relies on the same family of image features as the detection framework yields better performance.

As shown in Fig. 2, relying on the Euclidean distance as a similarity measure to optimize the rendering parameters actually degrades the final performance for all the detectors.

2 Importance of the Rendering Parameters

To check whether all the rendering effects have a positive influence, and to assess the importance of optimizing the synthetic data generation parameters, we performed a set of evaluations in addition to those presented in Section 5.4 and Fig. 8 of the main submission.

We fixed all the capture parameters in Θ, setting their values to 0, which effectively means completely discarding the influence of the corresponding effect. We then varied only one of them and repeated this experiment with different values of the parameter, and for each of the capture parameters.

Average precision per classification method:

Boundaries blurring    No effects   σs = 1            σs = 1.5          σs = 2
DPM                    0.78         0.77              0.84              0.75
AdaBoost               0.65         0.73              0.79              0.79
CNN                    0.86         0.89              0.85              0.86

Motion blurring        No effects   σmu = σmv = 0.3   σmu = σmv = 0.5   σmu = σmv = 1
DPM                    0.78         0.84              0.81              0.79
AdaBoost               0.65         0.72              0.79              0.79
CNN                    0.86         0.87              0.88              0.89

Random noise           No effects   σn = 0.5          σn = 0.9          σn = 1.1
DPM                    0.78         0.83              0.81              0.83
AdaBoost               0.65         0.75              0.79              0.75
CNN                    0.86         0.89              0.86              0.85

Material properties    No effects   wd = 0.5          wd = 1            wd = 2
DPM                    0.78         0.83              0.84              0.86
AdaBoost               0.65         0.70              0.76              0.66
CNN                    0.86         0.88              0.89              0.81

Table 1. Influence of various post-processing effects on the detection accuracy of different detectors.

The results are shown in Table 1 and Fig. 3. This shows that all the classifiers benefit from the application of every single post-processing effect.

To further highlight the effectiveness of the rendering parameters, in Fig. 4 we also compared the performance of every detector when trained on real data and on synthetic data generated without any of the post-processing steps against its performance when the synthetic data was generated using all the post-processing steps, with the parameters optimized using the appropriate similarity measures.

3 Importance of the Optimization over the Rendering Parameters

To show the importance of optimizing over the rendering parameters Θ, in Fig. 5 we compared the final performance obtained using optimized parameters with the final performance obtained with random parameters drawn from a uniform distribution. The minimum and maximum values for the random parameters were taken as the minimum and maximum values of the optimized parameters.

4 Rendering Parameters Distribution

Our method computes the capture parameters for each available real image. To show that these parameters are correlated in practice, Fig. 6 shows the distribution of each possible pair of parameters, for all the similarity measures we consider.

[Fig. 3: rows Boundaries blurring (BB), Motion blurring (MB), Random noise (RN), Material properties (MP); columns DPM, AdaBoost, CNN]
Fig. 3. Evaluation of the synthetic data generation effects. We fixed one capture parameter in Θ and then optimized the other parameters using the best similarity measure for each classification method. Each effect clearly has a positive influence on the quality of the synthetic data; however, the impact is different for every classification method. (best seen in color)

[Fig. 4: three panels, DPM, AdaBoost, CNN]
Fig. 4. Performance when using real data only, synthetic data without any post-processing effects, and synthetic data with all the introduced effects, with the Θ parameters optimized according to the similarity measure appropriate for each detector. (best seen in color)

[Fig. 5: three panels, DPM, AdaBoost, CNN]

Average precision:

Classification method   Using real images only   Random parameters   Optimized parameters
DPM                     0.84                     0.82                0.93
AdaBoost                0.80                     0.82                0.92
CNN                     0.85                     0.87                0.89

Fig. 5. Comparison of the performance of different classifiers trained on real and synthetic data generated using the corresponding similarity measures with that obtained when the capture parameters are randomly selected. The optimized parameters always yield better performance. (best seen in color)

[Fig. 6: one row of plots per similarity measure: dHoG(., .), dLWL(., .), dRWL(., .), dCNN(., .), dEucl(., .)]
Fig. 6. Joint distributions of each possible pair of capture parameters, optimized using different similarity measures. The different parameters are clearly correlated, in a complex way. (best seen in color)
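
The single-effect protocol of Section 2 (all capture parameters in Θ set to 0, then one effect varied at a time) can be sketched as a list of configurations. This is only an illustration under stated assumptions, not the authors' code: the dictionary keys are hypothetical names for the symbols σs, σmu, σmv, σn and wd, the value grids are copied from Table 1, and varying σmu and σmv together is read off the layout of that table.

```python
# Hypothetical key names standing in for the symbols used in Table 1.
BASELINE = {"sigma_s": 0.0, "sigma_mu": 0.0, "sigma_mv": 0.0,
            "sigma_n": 0.0, "w_d": 0.0}

# Value grids copied from Table 1. Motion blur changes both of its
# parameters at once, as the table columns suggest (an assumption).
SWEEPS = {
    "boundaries_blurring": [{"sigma_s": v} for v in (1.0, 1.5, 2.0)],
    "motion_blurring": [{"sigma_mu": v, "sigma_mv": v} for v in (0.3, 0.5, 1.0)],
    "random_noise": [{"sigma_n": v} for v in (0.5, 0.9, 1.1)],
    "material_properties": [{"w_d": v} for v in (0.5, 1.0, 2.0)],
}


def single_effect_configs():
    """Yield (effect, config) pairs in which only one effect is switched on."""
    for effect, overrides in SWEEPS.items():
        for override in overrides:
            # Start from the all-zero baseline and enable a single effect.
            yield effect, {**BASELINE, **override}


configs = list(single_effect_configs())
for effect, config in configs:
    print(effect, config)
```

Each configuration would then be passed to a renderer and a detector training run, neither of which is shown here; the "No effects" column of Table 1 corresponds to `BASELINE` itself.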
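
The random baseline of Section 3 draws each parameter uniformly between the smallest and largest value that the optimization produced for it. A minimal sketch of that sampling step follows, assuming the optimized parameters are stored as one row per real image. The function name `sample_random_parameters` and the example numbers are hypothetical placeholders, not values from the paper.

```python
import numpy as np


def sample_random_parameters(optimized, n_samples, seed=0):
    """Draw parameter vectors uniformly within the range of the optimized ones.

    optimized: array of shape (n_images, n_params), one vector per real image.
    Returns an array of shape (n_samples, n_params).
    """
    optimized = np.asarray(optimized, dtype=float)
    low = optimized.min(axis=0)    # per-parameter minimum over all images
    high = optimized.max(axis=0)   # per-parameter maximum over all images
    rng = np.random.default_rng(seed)
    # low and high broadcast across the rows of the requested output shape.
    return rng.uniform(low, high, size=(n_samples, optimized.shape[1]))


# Placeholder values, for illustration only.
optimized_example = np.array([[1.0, 0.3, 0.5],
                              [1.5, 0.5, 0.9],
                              [2.0, 1.0, 1.1]])
random_params = sample_random_parameters(optimized_example, n_samples=5)
print(random_params.shape)
```

Sampling each parameter independently discards the correlations between parameters that Fig. 6 illustrates, which is one plausible reason such a baseline could underperform the optimized parameters.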
