This thesis concerns models for visual object classes that exhibit a reasonable amount of regularity,
such as faces, pedestrians, cells and human brains. Such models are useful for making
“within-object” inferences, such as determining an object’s individual characteristics or establishing
its identity. For example, the model could be used to predict the identity of a face, the pose
of a pedestrian or the phenotype of a cell, or to segment the parts of a human brain.
Existing object modelling techniques have several limitations. First, most current methods
have targeted the above tasks individually, using object-specific representations; therefore, they
cannot be applied to other problems without major alterations. Second, most methods have been
designed to work with small databases that do not contain the variations in pose, illumination,
occlusion and background clutter seen in ‘real-world’ images. Consequently, many existing
algorithms fail when tested on unconstrained databases. Finally, the complexity of the training
procedures in these methods makes it impractical to learn from large datasets.
In this thesis, we investigate patch-based models for object classes. Our models are capable
of exploiting very large databases of objects captured in uncontrolled environments. We
represent a test image as a regular grid of patches drawn from a library of images of the same
object class. All the domain-specific information is held in this library: we use one set of images
of the object class to help draw inferences about other instances. In each experimental chapter
we investigate a different within-object inference task; in particular, we develop models for
classification, regression, semantic segmentation and identity recognition. On each task, we
achieve results that are comparable to or better than the state of the art. We conclude that the
patch-based representation can be applied successfully to all of the above tasks, and that it shows
promise for other applications such as generation and localization.
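
To make the representation concrete, the sketch below shows one way such a patch-grid encoding might be computed: the test image is divided into a regular grid of non-overlapping patches, and each patch is matched to its nearest neighbour in a library of patches pooled from training images of the same class. This is a minimal illustrative sketch, not the implementation developed in the thesis; the patch size, the squared-Euclidean distance and the helper names (`extract_grid_patches`, `build_library`, `encode`) are assumptions for exposition.

```python
# Illustrative sketch only: a nearest-neighbour patch-grid encoding.
# Patch size, distance measure and helper names are assumptions,
# not the models described in the thesis.
import numpy as np

PATCH = 8  # assumed patch side length in pixels

def extract_grid_patches(image):
    """Split a grayscale image into a regular grid of PATCH x PATCH patches."""
    h, w = image.shape
    rows, cols = h // PATCH, w // PATCH
    patches = []
    for r in range(rows):
        for c in range(cols):
            p = image[r * PATCH:(r + 1) * PATCH, c * PATCH:(c + 1) * PATCH]
            patches.append(p.ravel().astype(np.float64))
    return np.array(patches), (rows, cols)

def build_library(training_images):
    """Pool the grid patches of all training images into one patch library."""
    return np.vstack([extract_grid_patches(img)[0] for img in training_images])

def encode(image, library):
    """Represent a test image as, per grid cell, the index of the closest
    library patch under squared Euclidean distance."""
    patches, grid = extract_grid_patches(image)
    # Pairwise squared distances: (n_test_patches, n_library_patches).
    d2 = ((patches[:, None, :] - library[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1).reshape(grid)

# Toy usage: encode a random "test" image against a two-image library.
rng = np.random.default_rng(0)
train = [rng.random((32, 32)) for _ in range(2)]
test = rng.random((32, 32))
print(encode(test, build_library(train)))  # 4x4 grid of library indices
```

Because all class-specific knowledge lives in the library, the same encoding can in principle feed any of the downstream tasks (classification, regression, segmentation, recognition) without changing the representation itself.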