24 research outputs found
Double Refinement Network for Efficient Indoor Monocular Depth Estimation
Monocular depth estimation is the task of obtaining a measure of distance for
each pixel using a single image. It is an important problem in computer vision
and is usually solved using neural networks. Though recent works in this area
have shown significant improvement in accuracy, the state-of-the-art methods
tend to require massive amounts of memory and time to process an image. The
main purpose of this work is to improve the performance of the latest solutions
with no decrease in accuracy. To this end, we introduce the Double Refinement
Network architecture. The proposed method achieves state-of-the-art results on
the standard benchmark RGB-D dataset NYU Depth v2, while its frames per second
rate is significantly higher (up to 18 times speedup per image at batch size 1)
and the RAM usage per image is lower
Independent Component Alignment for Multi-Task Learning
In a multi-task learning (MTL) setting, a single model is trained to tackle a
diverse set of tasks jointly. Despite rapid progress in the field, MTL remains
challenging due to optimization issues such as conflicting and dominating
gradients. In this work, we propose using a condition number of a linear system
of gradients as a stability criterion of an MTL optimization. We theoretically
demonstrate that a condition number reflects the aforementioned optimization
issues. Accordingly, we present Aligned-MTL, a novel MTL optimization approach
based on the proposed criterion, that eliminates instability in the training
process by aligning the orthogonal components of the linear system of
gradients. While many recent MTL approaches guarantee convergence to a minimum,
task trade-offs cannot be specified in advance. In contrast, Aligned-MTL
provably converges to an optimal point with pre-defined task-specific weights,
which provides more control over the optimization result. Through experiments,
we show that the proposed approach consistently improves performance on a
diverse set of MTL benchmarks, including semantic and instance segmentation,
depth estimation, surface normal estimation, and reinforcement learning. The
source code is publicly available at https://github.com/SamsungLabs/MTL