An Unsupervised Approach to Modelling Visual Data

Abstract

For very large visual datasets, producing expert ground-truth data for training supervised algorithms can represent a substantial human effort. In these situations there is scope for the use of unsupervised approaches that can model collections of images and automatically summarise their content. The primary motivation for this thesis comes from the problem of labelling large visual datasets of the seafloor obtained by an Autonomous Underwater Vehicle (AUV) for ecological analysis. It is expensive to label this data, as taxonomical experts for the specific region are required, whereas automatically generated summaries can be used to focus the efforts of experts, and inform decisions on additional sampling. The contributions in this thesis arise from modelling this visual data in entirely unsupervised ways to obtain comprehensive visual summaries. Firstly, popular unsupervised image feature learning approaches are adapted to work with large datasets and unsupervised clustering algorithms. Next, using Bayesian models the performance of rudimentary scene clustering is boosted by sharing clusters between multiple related datasets, such as regular photo albums or AUV surveys. These Bayesian scene clustering models are extended to simultaneously cluster sub-image segments to form unsupervised notions of “objects” within scenes. The frequency distribution of these objects within scenes is used as the scene descriptor for simultaneous scene clustering. Finally, this simultaneous clustering model is extended to make use of whole image descriptors, which encode rudimentary spatial information, as well as object frequency distributions to describe scenes. This is achieved by unifying the previously presented Bayesian clustering models, and in so doing rectifies some of their weaknesses and limitations. Hence, the final contribution of this thesis is a practical unsupervised algorithm for modelling images from the super-pixel to album levels, and is applicable to large datasets

    Similar works