Representing and modeling images and videos is an important topic in computer vision and image processing. An image model provides an abstraction of the large amount of data contained in an image and enables the systematic
development of algorithms for a particular image-related task, whether an analysis task such as detection, recognition, or segmentation, or a synthesis task such as inpainting, summarization, or colorization.
Since an image is usually composed of millions of pixels, developing models directly in such a high-dimensional space is not always feasible.
One of the most popular ways of modeling images is to break them into patches. Doing so not only reduces the dimensionality but also makes similarity easier to define, since patches experience less distortion than whole images do. Patch-based image models are often more flexible in modeling appearance because they exploit redundancies in images and videos. By adjusting the patch size, these models trade off the strengths of the two ends of the spectrum: the discriminative power of whole images and the representational power of pixel histograms.
When an image is broken into a collection of patches, two kinds of information must be modeled in order to describe the image completely. On the one hand, the appearance of each patch must be captured by a statistical model; on the other hand, further statistics are needed to describe how the patches are organized within the image. We call the first kind the "appearance model" and the second the "layout model".
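As an illustration only, the split between appearance and layout can be sketched as follows. This is a minimal sketch and not the method developed in this thesis: the function name extract_patches, the square patch shape, the fixed stride, and the toy 8x8 image are all assumptions made for the example. Each patch is returned as a flat vector of pixel values, the kind of input an appearance model would consume, together with the position of its top-left corner, the kind of input a layout model would consume.

```python
def extract_patches(image, patch_size, stride):
    """Split a 2-D grayscale image (a list of rows) into square patches.

    Returns (patches, positions). Each patch is a flat list of pixel values
    (what an appearance model would describe); each position is the
    (row, col) of the patch's top-left corner (what a layout model would
    describe). The name and signature are hypothetical.
    """
    height, width = len(image), len(image[0])
    patches, positions = [], []
    for r in range(0, height - patch_size + 1, stride):
        for c in range(0, width - patch_size + 1, stride):
            # Flatten the patch_size x patch_size window in row-major order.
            patch = [image[r + dr][c + dc]
                     for dr in range(patch_size)
                     for dc in range(patch_size)]
            patches.append(patch)
            positions.append((r, c))
    return patches, positions


# Toy 8x8 image (an assumption): each pixel value encodes its coordinates.
image = [[10 * r + c for c in range(8)] for r in range(8)]
patches, positions = extract_patches(image, patch_size=4, stride=4)
```

In this toy case the 64-pixel image becomes four 16-dimensional patch vectors plus four positions. A smaller patch_size gives lower-dimensional but less discriminative patches, which is the trade-off described above.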
In this thesis, we describe the progress made over the past decade, starting from patch-based appearance models that ignore layout information and moving on to how spatial modeling improves performance and enables applications in analysis tasks such as recognition, detection, and segmentation, as well as synthesis tasks such as colorization. We do so by presenting our own work from the past three years. This thesis proposes both a discriminative formulation and a generative formulation for describing patch layouts. The algorithm built on the discriminative framework achieves state-of-the-art results on the problem of joint detection and subcategory recognition. Algorithms developed for these models are also discussed along the way, with results and examples.