Play and Learn: Using Video Games to Train Computer Vision Models
Video games are a compelling source of annotated data as they can readily
provide fine-grained ground truth for diverse tasks. However, it is not clear
whether synthetically generated data resembles real-world images closely
enough to improve the performance of computer vision models in
practice. We present experiments assessing the effectiveness on real-world data
of systems trained on synthetic RGB images that are extracted from a video
game. We collected over 60,000 synthetic samples from a modern video game under
conditions similar to those of the real-world CamVid and Cityscapes datasets. We provide
several experiments to demonstrate that the synthetically generated RGB images
can be used to improve the performance of deep neural networks on both image
segmentation and depth estimation. These results show that a convolutional
network trained on synthetic data achieves a similar test error to a network
that is trained on real-world data for dense image classification. Furthermore,
the synthetically generated RGB images can provide similar or better results
compared to the real-world datasets if a simple domain adaptation technique is
applied. Our results suggest that collaboration with game developers for an
accessible interface to gather data is potentially a fruitful direction for
future work in computer vision.
Comment: To appear in the British Machine Vision Conference (BMVC), September 2016. v2: fixed a typo in the reference.
Double Refinement Network for Efficient Indoor Monocular Depth Estimation
Monocular depth estimation is the task of obtaining a measure of distance for
each pixel using a single image. It is an important problem in computer vision
and is usually solved using neural networks. Though recent works in this area
have shown significant improvement in accuracy, the state-of-the-art methods
tend to require massive amounts of memory and time to process an image. The
main purpose of this work is to reduce the memory and time cost of the latest
solutions with no decrease in accuracy. To this end, we introduce the Double Refinement
Network architecture. The proposed method achieves state-of-the-art results on
the standard benchmark RGB-D dataset NYU Depth v2, while its frames per second
rate is significantly higher (up to 18 times speedup per image at batch size 1)
and the RAM usage per image is lower
- …