
Spatial Deep Networks for Outdoor Scene Classification

Abstract

Scene classification has become an increasingly popular topic in computer vision. Scene classification techniques are also widely applicable to other tasks, such as object detection, action recognition, and content-based image retrieval. Recently, the stationary property of images has been leveraged in conjunction with convolutional networks to perform classification tasks. In the existing approach, one random patch is extracted from each training image to learn filters for the convolutional processes. However, learning features from only one random patch per image is not robust, because patches selected from different areas of an image may contain distinct scene objects, giving their features different descriptive power. In this dissertation, focusing on deep learning techniques, we propose a multi-scale network that uses multiple random patches and different patch dimensions to learn feature representations for images, improving on the existing approach. Although the multi-scale network performs much better than the existing approach, the lack of local features and spatial layout information remains a core limitation of both methods. We therefore propose a novel Spatial Deep Network (SDN) that further improves on the existing approach by exploiting the spatial layout of the image: random patch extraction is constrained to different areas of the image so that the patches capture the characteristics of those areas. In this way, SDN yields compact but discriminative features that incorporate both global descriptors and local spatial information. Experimental results show that SDN considerably outperforms the existing approach and the multi-scale network, and achieves performance competitive with widely used classification techniques on the OT dataset (developed by Oliva and Torralba). To evaluate the robustness of the proposed SDN, we also apply it to content-based image retrieval on the Holidays dataset, where our features attain much better retrieval performance at a much lower feature dimensionality than other state-of-the-art descriptors.
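To make the spatially constrained, multi-scale patch sampling concrete, the sketch below gives one plausible reading of the idea: the image is divided into a grid of regions, and a random patch is drawn from each region at several patch sizes. The grid size, patch sizes, and function name here are illustrative assumptions, not values taken from the thesis.

import numpy as np

def extract_spatial_patches(image, grid=(2, 2), patch_sizes=(8, 16), rng=None):
    """Draw one random square patch per grid cell at each patch size.

    Illustrative sketch only; the grid and patch sizes are assumptions,
    not the configuration used in the dissertation.
    """
    rng = rng or np.random.default_rng()
    h, w = image.shape[:2]
    cell_h, cell_w = h // grid[0], w // grid[1]
    patches = []
    for row in range(grid[0]):
        for col in range(grid[1]):
            for size in patch_sizes:
                # Sample the top-left corner so the patch stays inside its
                # cell, tying each patch to a fixed region of the scene.
                y0 = row * cell_h + rng.integers(0, max(cell_h - size, 1))
                x0 = col * cell_w + rng.integers(0, max(cell_w - size, 1))
                patches.append(image[y0:y0 + size, x0:x0 + size])
    return patches

# A 2x2 grid with two patch sizes yields 8 patches per image.
patches = extract_spatial_patches(np.zeros((128, 128, 3)))
print(len(patches), patches[0].shape)  # 8 (8, 8, 3)

Sampling one patch per cell, rather than anywhere in the image, is what lets the learned filters reflect the spatial layout: each patch is guaranteed to describe a distinct area of the scene.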
