Methods for Understanding and Improving Deep Learning Classification Models

Abstract

Recently proposed deep learning systems achieve superior performance compared with methods based on hand-crafted features on a broad range of tasks, from object recognition and detection to medical image analysis and game control. These advances can be credited in part to the rapid development of computation hardware and to the availability of large-scale public datasets. Training deep learning models remains challenging because of the large number of parameters involved, which in turn requires large annotated training sets. A number of recent works have tried to explain the behaviour of deep learning models during training and testing, but the field still has a limited understanding of how these models function. In this thesis, we aim to develop methods that allow for a better understanding of the behaviour of deep learning models. With such methods, we attempt to improve the performance of deep learning models in several applications and to reveal promising directions to explore, supported by empirical evidence.

Our first method is a novel nonlinear hierarchical classifier that uses off-the-shelf convolutional neural network (CNN) features. It is a tree-structured classifier with linear classifiers as tree nodes. Experiments suggest that the proposed nonlinear hierarchical classifier achieves better results than linear classifiers.

In our second method, we use the Maxout activation function in place of the common rectified linear unit (ReLU) to increase model capacity. We find that Maxout can lead to an ill-conditioned training problem when the input data are not properly normalised, and we show how to mitigate this problem by incorporating Batch Normalisation. This method allows us to build a deep learning model that surpasses several state-of-the-art methods.

In the third method, we explore introducing multiple-size filters into deep learning models. Our design includes up to four different filter sizes, which provide candidate spatial patterns, and a max pooling function that selects the maximum response as the unit's output. As an outcome of this work, we combine the multiple-size filters with the batch-normalised Maxout unit from the second method to achieve automatic spatial pattern selection within the activation unit. This research shows significant improvements over the state of the art on five publicly available computer vision datasets, including the ImageNet 2012 dataset.

Finally, we propose two novel measurements derived from the eigenvalues of the approximate empirical Fisher matrix, which can be calculated efficiently within the stochastic gradient descent (SGD) iteration, even for recent state-of-the-art deep residual networks. We show how to use these measurements to help select training hyper-parameters such as mini-batch size, model structure, learning rate and stochastic depth rate. Using these tools, we discover a new way to schedule dynamic sampling and dynamic stochastic depth, which improves the performance of deep learning models. The proposed training approach reaches competitive classification results on the CIFAR-10 and CIFAR-100 datasets with models that have significantly lower capacity compared with the current state of the art.
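As a rough illustration of the first contribution, the sketch below builds a two-level tree whose root and leaf nodes are all linear classifiers operating on fixed CNN features. The class grouping (class_groups), the use of scikit-learn logistic regression as the linear node, and integer class labels are all assumptions made for the sketch, not the thesis's verified design.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

class TwoLevelLinearTree:
    """Two-level tree of linear classifiers over fixed CNN features.

    Hypothetical sketch: the class grouping is assumed to be given, e.g.
    from clustering the confusion matrix of a flat linear classifier.
    Each group must contain at least two classes.
    """

    def __init__(self, class_groups):
        self.class_groups = class_groups                  # e.g. [[0, 1], [2, 3, 4]]
        self.root = LogisticRegression(max_iter=1000)     # routes samples to a group
        self.leaves = [LogisticRegression(max_iter=1000)  # one node per group
                       for _ in class_groups]

    def fit(self, X, y):
        X, y = np.asarray(X), np.asarray(y)
        # The root node learns to send each sample to its label's group.
        group_of = {c: g for g, cs in enumerate(self.class_groups) for c in cs}
        groups = np.array([group_of[label] for label in y])
        self.root.fit(X, groups)
        # Each leaf node discriminates only among the classes of its group.
        for g, leaf in enumerate(self.leaves):
            leaf.fit(X[groups == g], y[groups == g])
        return self

    def predict(self, X):
        X = np.asarray(X)
        routed = self.root.predict(X)
        preds = np.empty(len(X), dtype=int)               # assumes integer labels
        for g, leaf in enumerate(self.leaves):
            mask = routed == g
            if mask.any():
                preds[mask] = leaf.predict(X[mask])
        return preds
```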
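The second contribution pairs Maxout with Batch Normalisation. The PyTorch module below is a minimal sketch of one such unit; in particular, normalising the k linear pieces before taking their elementwise maximum is one plausible placement of Batch Normalisation, assumed here rather than taken from the thesis.

```python
import torch
import torch.nn as nn

class BatchNormMaxout(nn.Module):
    """Maxout over k linear pieces, with the pieces batch-normalised.

    Sketch only: placing BatchNorm before the max is an assumption.
    """

    def __init__(self, in_channels, out_channels, k=2):
        super().__init__()
        self.k, self.out_channels = k, out_channels
        # One 1x1 convolution produces all k linear pieces at once.
        self.pieces = nn.Conv2d(in_channels, out_channels * k, kernel_size=1)
        # Normalising the pieces keeps their scales comparable, which
        # mitigates the ill-conditioning described in the abstract.
        self.bn = nn.BatchNorm2d(out_channels * k)

    def forward(self, x):
        z = self.bn(self.pieces(x))                     # (N, k*C, H, W)
        n, _, h, w = z.shape
        z = z.view(n, self.k, self.out_channels, h, w)  # separate the k pieces
        return z.max(dim=1).values                      # elementwise max over pieces
```

With this placement, BatchNormMaxout(64, 64, k=2) acts as a drop-in replacement for a Conv-BN-ReLU stage with the same channel count.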
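The third contribution selects among spatial patterns of different sizes. The sketch below follows that description literally: parallel branches with different filter sizes, batch-normalised, with the elementwise maximum over branches as the unit's output. The specific sizes (1, 3, 5, 7) and the per-branch Batch Normalisation are assumptions for the sketch.

```python
import torch
import torch.nn as nn

class MultiSizeMaxUnit(nn.Module):
    """Parallel filters of several sizes; the output is the elementwise
    max over the branches, selecting the strongest spatial pattern."""

    def __init__(self, in_channels, out_channels, sizes=(1, 3, 5, 7)):
        super().__init__()
        # 'Same' padding keeps all branches spatially aligned for the max.
        self.branches = nn.ModuleList(
            nn.Sequential(
                nn.Conv2d(in_channels, out_channels, kernel_size=s, padding=s // 2),
                nn.BatchNorm2d(out_channels),
            )
            for s in sizes
        )

    def forward(self, x):
        responses = torch.stack([branch(x) for branch in self.branches])
        return responses.max(dim=0).values  # max response across filter sizes
```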
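For the final contribution, the two measurements come from the eigenvalues of an approximate empirical Fisher matrix, but the abstract does not specify the estimator. As a cheap, hedged stand-in, the function below uses the rank-one approximation F ≈ g gᵀ built from the mini-batch gradient g, whose single non-zero eigenvalue is ‖g‖²; it can be read off within an ordinary SGD step, after loss.backward() and before optimizer.step().

```python
import torch

def fisher_proxy_eigenvalue(model):
    """Non-zero eigenvalue of the rank-one Fisher approximation g g^T,
    i.e. the squared norm of the current mini-batch gradient.

    A stand-in for the thesis's measurements, which the abstract does
    not specify in detail. Call after loss.backward().
    """
    total = 0.0
    for p in model.parameters():
        if p.grad is not None:
            total += p.grad.detach().pow(2).sum().item()
    return total
```

Logging this value once per iteration adds one pass over the gradients and no extra backward computation, which is consistent with the abstract's claim that the measurements remain affordable even for deep residual networks.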
Thesis (Ph.D.) -- University of Adelaide, School of Computer Science, 201
