Exploring geometrical structures in high-dimensional computer vision data

Abstract

In computer vision, objects such as local features, images and video sequences are often represented as high-dimensional data points, although it is commonly believed that low-dimensional geometrical structures underlie the data set. This low-dimensional geometric information gives a better understanding of high-dimensional data sets and is useful in solving computer vision problems. In this thesis, the geometrical structures are investigated from different perspectives according to different computer vision applications. For spectral clustering, the distribution of data points in each local region is summarised by a covariance matrix, which is used to define a Mahalanobis distance. For the action recognition problem, we extract subspace information for each action class, and the query video sequence is labelled according to its distance to the subspaces of the corresponding video classes. Three new algorithms are introduced for hashing-based approaches to approximate nearest neighbour (ANN) search problems: NOKMeans relaxes the orthogonality condition on the encoding functions in previous quantisation-error-based methods by representing data points in a new feature space; Auto-JacoBin uses a robust auto-encoder model to preserve the geometric information of the original space in the binary codes; and AGreedy assigns to any set of encoding functions a score that reflects its ability to preserve the order information in local regions, and uses an alternating greedy method to find a locally optimal solution. Geometric information has the potential to bring better solutions to computer vision problems. As shown in our experiments, the benefits include increased clustering accuracy, reduced computation for recognising actions in videos and improved retrieval performance for ANN problems.
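The idea of labelling a query by its distance to per-class subspaces can be illustrated with a minimal sketch. The abstract does not specify how the subspaces are built or which distance is used, so everything below is an assumption made for illustration: each class subspace is fitted by PCA (via SVD), the distance is the Euclidean norm of the residual after projection, and the names `fit_subspace`, `distance_to_subspace` and `classify` are hypothetical.

```python
import numpy as np

def fit_subspace(samples, dim):
    """Fit a dim-dimensional affine subspace to the rows of samples (assumed: PCA)."""
    mean = samples.mean(axis=0)
    # Rows of vt are orthonormal principal directions of the centred data.
    _, _, vt = np.linalg.svd(samples - mean, full_matrices=False)
    return mean, vt[:dim]

def distance_to_subspace(x, mean, basis):
    """Norm of the part of (x - mean) that lies outside the span of basis."""
    centred = x - mean
    projection = basis.T @ (basis @ centred)
    return float(np.linalg.norm(centred - projection))

def classify(x, subspaces):
    """Return the label of the class whose subspace is nearest to x."""
    return min(subspaces, key=lambda label: distance_to_subspace(x, *subspaces[label]))

rng = np.random.default_rng(0)
# Toy data: class "a" lies along the x-axis, class "b" along the y-axis at height z = 5.
class_a = np.outer(rng.normal(size=50), [1.0, 0.0, 0.0])
class_b = np.outer(rng.normal(size=50), [0.0, 1.0, 0.0]) + np.array([0.0, 0.0, 5.0])
subspaces = {"a": fit_subspace(class_a, 1), "b": fit_subspace(class_b, 1)}

query = np.array([2.0, 0.1, 0.0])
label = classify(query, subspaces)
```

In this toy setting the query sits close to the x-axis, so it is assigned to class "a". A practical reason to prefer this kind of scheme is that the query is compared against one compact model per class rather than against every training sample.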
