Monocular depth estimation is the task of obtaining a measure of distance for
each pixel using a single image. It is an important problem in computer vision
and is usually solved using neural networks. Though recent works in this area
have shown significant improvement in accuracy, the state-of-the-art methods
tend to require massive amounts of memory and time to process an image. The
main purpose of this work is to improve the performance of the latest solutions
with no decrease in accuracy. To this end, we introduce the Double Refinement
Network architecture. The proposed method achieves state-of-the-art results on
the standard benchmark RGB-D dataset NYU Depth v2, while its frames per second
rate is significantly higher (up to 18 times speedup per image at batch size 1)
and the RAM usage per image is lower