Deep learning with 3D and label geometry

Abstract

A fine-grained understanding of an image is twofold: visual understanding and semantic understanding. The former strives to understand the intrinsic properties of the objects in the image, whereas the latter aims to associate those objects with semantics. Together, these form the basis of an in-depth understanding of images. Today’s default deep convolutional network architectures have already shown a remarkable ability to capture the 2D visual appearance of images and then map visual content to semantic classes. However, fine-grained image understanding, such as inferring intrinsic 3D information and more structured semantics, remains less explored. In this thesis, we approach these problems by asking "How to better utilize geometry for better image understanding?" In the first part, we study visual image understanding with 3D geometry. We show that it is possible to automatically explain a variety of visual content in an image with texture-free 3D shapes. Furthermore, we develop a deep learning framework that reliably recovers a set of 3D geometric attributes, such as the pose of an object and the surface normals of its shape, from a 2D image. In the second part, we explore label geometry for semantic image understanding. We find that several image classification problems have geometrically similar probability spaces. We therefore introduce label geometry, which unifies one-vs.-rest classification, multi-label classification, and out-of-distribution classification in one framework. Moreover, we show that learned hierarchical label geometries can balance the accuracy and specificity of an image classifier.
