Learning generative models of mid-level structure in natural images
Natural images arise from complicated processes involving many factors of variation.
They reflect the wealth of shapes and appearances of objects in our three-dimensional
world, but they are also affected by factors such as distortions due to perspective, occlusions,
and illumination, giving rise to structure with regularities at many different
levels. Prior knowledge about these regularities and suitable representations that allow
efficient reasoning about the properties of a visual scene are important for many image
processing and computer vision tasks. This thesis focuses on models of image structure
at intermediate levels of complexity, of the kind required, for instance, for image
inpainting or segmentation. It aims to develop generative, probabilistic models of this
kind of structure and, in particular, to devise strategies for learning such models from
data in a largely unsupervised manner.
One hallmark of natural images is that they can often be decomposed into regions
with very different visual characteristics. The main approach of this thesis is therefore
to represent an image as a composition of many regions, each characterized by its shape
and appearance. We explore ways to learn the appearance of regions, to learn region
shapes, and to combine several regions into a full image. To this end, we draw on
ideas for unsupervised learning from the literature on models of low-level image
structure and from the “deep learning” literature. These models serve as building blocks
for more structured model formulations that incorporate additional prior knowledge of
how images are formed.
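The idea of composing an image from regions, each with its own shape and appearance, can be illustrated with a small sketch. The Python below is a hypothetical illustration, not the model developed in the thesis: `compose_regions` is an invented helper, region shapes are assumed to be binary masks, appearances are per-pixel values, and the order of the layer list stands in for relative depth, so each pixel takes the appearance of the front-most region that covers it.

```python
from typing import List, Sequence, Tuple

Grid = List[List[float]]


def compose_regions(background: Grid, layers: Sequence[Tuple[Grid, Grid]]) -> Grid:
    """Compose an image from regions listed back to front (hypothetical helper).

    Each layer is a (mask, appearance) pair with the same size as the image.
    Painting the layers in depth order means a pixel ends up with the
    appearance of the front-most region whose binary mask covers it
    (occlusion); pixels covered by no region keep the background value.
    """
    image = [row[:] for row in background]  # copy so the input is not modified
    for mask, appearance in layers:  # later layers occlude earlier ones
        for r, (mask_row, app_row) in enumerate(zip(mask, appearance)):
            for c, (m, a) in enumerate(zip(mask_row, app_row)):
                if m:
                    image[r][c] = a
    return image


if __name__ == "__main__":
    bg = [[0.0] * 4 for _ in range(2)]
    # Region A covers the left three columns, region B the right three.
    mask_a = [[1, 1, 1, 0], [1, 1, 1, 0]]
    app_a = [[0.5] * 4 for _ in range(2)]
    mask_b = [[0, 1, 1, 1], [0, 1, 1, 1]]
    app_b = [[1.0] * 4 for _ in range(2)]
    # B is in front of A, so B's appearance wins in the overlapping columns.
    print(compose_regions(bg, [(mask_a, app_a), (mask_b, app_b)]))
    # prints [[0.5, 1.0, 1.0, 1.0], [0.5, 1.0, 1.0, 1.0]]
```

Reversing the layer order puts region A in front, so the overlapping columns take A's appearance instead; the same masks and appearances yield a different image depending only on relative depth.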
The thesis makes the following contributions. First, we investigate how well a popular
MRF-based prior of natural image structure, the Field of Experts, can model image
textures, and we propose an extended formulation that is considerably more successful
at this task. This formulation gives rise to a fully parametric,
translation-invariant probabilistic generative model of image textures. We illustrate
how this model can be used as a component of a more comprehensive model of images
comprising multiple textured regions. Second, we develop a model of region shape.
This work is an extension of the “Masked Restricted Boltzmann Machine” proposed by
Le Roux et al. (2011), and it allows explicit reasoning about the independent shapes and
relative depths of occluding objects. We develop inference and unsupervised learning
schemes and demonstrate how this shape model, in combination with the masked
RBM, gives rise to a good model of natural image patches. Finally, we demonstrate how
this model of region shape can be extended to model shapes in large images. The
result is a generative model of large images formed by composing many small, partially
overlapping and occluding objects.
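For readers unfamiliar with the Field of Experts, the sketch below computes the unnormalized energy of the standard formulation due to Roth and Black, in which Student-t experts phi(y) = (1 + y^2/2)^(-alpha) are applied to the responses of linear filters at every patch position, and p(x) is proportional to exp(-E(x)). It is not the extended texture model proposed in the thesis, whose details this abstract does not give; the helper `foe_energy`, the filter, and the alpha value used below are illustrative assumptions, not learned parameters.

```python
import math
from typing import List, Sequence

Grid = List[List[float]]


def foe_energy(image: Grid, filters: Sequence[Grid], alphas: Sequence[float]) -> float:
    """Unnormalized Field-of-Experts energy with Student-t experts (sketch).

    E(x) = sum over patch positions k and filters i of
           alpha_i * log(1 + 0.5 * (J_i . x_k)^2),
    so that p(x) is proportional to exp(-E(x)). Applying the same filters
    at every patch position is what makes the prior translation-invariant.
    """
    if len(filters) != len(alphas):
        raise ValueError("need one alpha per filter")
    height, width = len(image), len(image[0])
    energy = 0.0
    for f, alpha in zip(filters, alphas):
        fh, fw = len(f), len(f[0])
        # Slide the filter over every fully contained patch ("valid" positions).
        for r in range(height - fh + 1):
            for c in range(width - fw + 1):
                response = sum(
                    f[i][j] * image[r + i][c + j]
                    for i in range(fh)
                    for j in range(fw)
                )
                # Negative log of the Student-t expert for this filter response.
                energy += alpha * math.log1p(0.5 * response * response)
    return energy


if __name__ == "__main__":
    derivative = [[1.0, -1.0]]  # illustrative horizontal-derivative filter
    print(foe_energy([[2.0, 2.0, 2.0]], [derivative], [1.0]))  # constant image
    print(foe_energy([[0.0, 0.0, 1.0, 1.0]], [derivative], [1.0]))  # step edge
```

With the derivative filter and alpha = 1, a constant image has energy 0, while the step image [[0, 0, 1, 1]] has a single nonzero response (-1) at the patch straddling the edge, giving energy log(1.5). The heavy-tailed experts penalize large responses only logarithmically, which is what lets such priors tolerate occasional sharp edges.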