Activity Driven Weakly Supervised Object Detection
Weakly supervised object detection aims at reducing the amount of supervision
required to train detection models. Such models are traditionally learned from
images/videos labelled only with the object class and not the object bounding
box. In our work, we try to leverage not only the object class labels but also
the action labels associated with the data. We show that the action depicted in
the image/video can provide strong cues about the location of the associated
object. We learn a spatial prior for the object dependent on the action (e.g.
"ball" is closer to "leg of the person" in "kicking ball"), and incorporate
this prior to simultaneously train a joint object detection and action
classification model. We conducted experiments on both video datasets and image
datasets to evaluate the performance of our weakly supervised object detection
model. Our approach outperformed the current state-of-the-art (SOTA) method by
more than 6% in mAP on the Charades video dataset.
Comment: CVPR'19 camera read
Generating All the Roads to Rome: Road Layout Randomization for Improved Road Marking Segmentation
Road markings provide guidance to traffic participants and enforce safe
driving behaviour; understanding their semantic meaning is therefore paramount
in (automated) driving. However, producing the vast quantities of road marking
labels required for training state-of-the-art deep networks is costly,
time-consuming, and simply infeasible for every domain and condition. In
addition, training data retrieved from virtual worlds often lack the richness
and complexity of the real world and consequently cannot be used directly. In
this paper, we provide an alternative approach in which new road marking
training pairs are automatically generated. To this end, we apply principles of
domain randomization to the road layout and synthesize new images from altered
semantic labels. We demonstrate that training on these synthetic pairs improves
mIoU of the segmentation of rare road marking classes during real-world
deployment in complex urban environments by more than 12 percentage points,
while performance for other classes is retained. This framework can easily be
scaled to all domains and conditions to generate large-scale road marking
datasets, while avoiding manual labelling effort.
Comment: presented at ITSC 201
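The mIoU figure quoted in the abstract above is a mean of per-class
intersection-over-union scores. A minimal sketch of the standard definition
follows, assuming flattened per-pixel label lists and skipping classes absent
from both prediction and ground truth. The function name and the toy labels are
illustrative assumptions, not the paper's evaluation code.

```python
def mean_iou(pred, target, num_classes):
    """Mean intersection-over-union over classes present in pred or target.

    pred and target are equal-length sequences of integer class labels,
    one per pixel (a hypothetical flattened representation).
    """
    ious = []
    for c in range(num_classes):
        # Pixels where both prediction and ground truth are class c.
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        # Pixels where either prediction or ground truth is class c.
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:  # skip classes that appear in neither labelling
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


# Toy example with three classes and six pixels.
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 0]
score = mean_iou(pred, target, num_classes=3)
```

An improvement of "12 percentage points" then means an absolute difference of
0.12 between two such scores, not a 12% relative change.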
Generative Model with Coordinate Metric Learning for Object Recognition Based on 3D Models
Given a large amount of real photos for training, convolutional neural networks
show excellent performance on object recognition tasks. However, collecting
data is tedious and the available backgrounds are limited, which makes it hard
to establish a perfect database. In this paper, our generative model, trained
with synthetic images rendered from 3D models, reduces the workload of data
collection and the limitations of capture conditions. Our structure is composed
of two sub-networks: a semantic foreground object reconstruction network based
on Bayesian inference, and a classification network based on a multi-triplet
cost function. The latter is designed to avoid over-fitting on monotone
surfaces and to fully utilize pose information by establishing a sphere-like
distribution of descriptors in each category, which helps recognition on
regular photos according to the poses, lighting conditions, background and
category information of the rendered images. Firstly, our conjugate structure,
called generative model with metric learning, utilizes additional foreground
object channels generated from Bayesian rendering as the joint of the two
sub-networks. A multi-triplet cost function based on poses for object
recognition is used for metric learning, which makes it possible to train a
category classifier purely on synthetic data. Secondly, we design a coordinate
training strategy with the help of adaptive noise acting as corruption on the
input images, to help both sub-networks benefit from each other and avoid
inharmonious parameter tuning due to the different convergence speeds of the
two sub-networks. Our structure achieves state-of-the-art accuracy of over 50%
on the ShapeNet database under the data migration obstacle from synthetic
images to real photos. This pipeline makes it feasible to perform recognition
on real images based only on 3D models.
Comment: 14 page
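For orientation, the metric-learning idea mentioned in the abstract above can
be illustrated with the generic single-triplet margin loss. This is a sketch
under assumptions: it is not the paper's pose-based multi-triplet cost
function, and the descriptor values, the margin and the function names are
hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Generic triplet margin loss.

    The loss is zero once the negative is at least `margin` farther from
    the anchor than the positive is; otherwise it grows linearly.
    """
    d_pos = euclidean(anchor, positive)
    d_neg = euclidean(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


# Hypothetical 2-D descriptors: the positive shares the anchor's category.
anchor = [0.0, 0.0]
positive = [0.0, 1.0]
easy_negative = [3.0, 4.0]   # far away, so the constraint is satisfied
hard_negative = [0.0, 1.5]   # close to the anchor, so the loss is positive

loss_easy = triplet_loss(anchor, positive, easy_negative)
loss_hard = triplet_loss(anchor, positive, hard_negative)
```

Minimizing this quantity over many triplets pulls descriptors of the same
category together and pushes other categories apart.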