670 research outputs found

    On the Importance of Normalisation Layers in Deep Learning with Piecewise Linear Activation Units

    Full text link
    Deep feedforward neural networks with piecewise linear activations are currently producing state-of-the-art results on several public datasets. The combination of deep learning models and piecewise linear activation functions allows for the estimation of exponentially complex functions with the use of a large number of subnetworks specialized in the classification of similar input examples. During the training process, these subnetworks avoid overfitting with an implicit regularization scheme based on the fact that they must share their parameters with other subnetworks. Using this framework, we have made an empirical observation that can further improve the performance of such models. We notice that these models assume a balanced initial distribution of data points with respect to the domain of the piecewise linear activation function. If that assumption is violated, then the piecewise linear activation units can degenerate into purely linear activation units, which can result in a significant reduction of their capacity to learn complex functions. Furthermore, as the number of model layers increases, this unbalanced initial distribution makes the model ill-conditioned. Therefore, we propose the introduction of batch normalisation units into deep feedforward neural networks with piecewise linear activations, which drives a more balanced use of these activation units, where each region of the activation function is trained with a relatively large proportion of training samples. This batch normalisation also promotes the pre-conditioning of very deep learning models. We show that introducing maxout and batch normalisation units into the network in network model results in a model that produces classification results that are better than or comparable to the current state of the art on the CIFAR-10, CIFAR-100, MNIST, and SVHN datasets.
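
    Below is a minimal sketch, not the authors' code, of the combination the abstract describes: a convolution whose pre-activations are batch-normalised before a maxout (piecewise linear) unit, so that each linear piece is trained on a balanced share of the data. The PyTorch API, channel counts, and number of pieces are illustrative assumptions.

```python
import torch
import torch.nn as nn

class BatchNormMaxoutConv(nn.Module):
    """Conv -> BatchNorm -> maxout over `pieces` candidate linear responses."""
    def __init__(self, in_channels, out_channels, pieces=2, kernel_size=3):
        super().__init__()
        # Produce `pieces` candidate linear responses per output channel.
        self.conv = nn.Conv2d(in_channels, out_channels * pieces,
                              kernel_size, padding=kernel_size // 2)
        # Normalise the candidate responses so each piece of the piecewise
        # linear activation sees a balanced pre-activation distribution.
        self.bn = nn.BatchNorm2d(out_channels * pieces)
        self.out_channels = out_channels
        self.pieces = pieces

    def forward(self, x):
        y = self.bn(self.conv(x))
        n, _, h, w = y.shape
        # Maxout: element-wise maximum over the candidate pieces.
        y = y.view(n, self.out_channels, self.pieces, h, w)
        return y.max(dim=2).values

# Illustrative usage with a batch of CIFAR-sized inputs.
block = BatchNormMaxoutConv(3, 32, pieces=2)
out = block(torch.randn(8, 3, 32, 32))   # -> shape (8, 32, 32, 32)
```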

    Methods for Understanding and Improving Deep Learning Classification Models

    Get PDF
    Recently proposed deep learning systems can achieve superior performance with respect to methods based on hand-crafted features on a broad range of tasks, not limited to object recognition/detection but also including medical image analysis and game control applications. These advances can be credited in part to the rapid development of computing hardware and the availability of large-scale public datasets. The training process of deep learning models is a challenging task because of the large number of parameters involved, which requires large annotated training sets. A number of recent works have tried to explain the behaviour of deep learning models during training and testing, but the field still has a limited understanding of the functionality of deep learning models. In this thesis, we aim to develop methods that allow for a better understanding of the behaviour of deep learning models. With such methods, we attempt to improve the performance of deep learning models in several applications and reveal promising directions to explore with empirical evidence. Our first method is a novel nonlinear hierarchical classifier that uses off-the-shelf convolutional neural network (CNN) features. This nonlinear classifier is a tree-structured classifier that uses linear classifiers as tree nodes. Experiments suggest that our proposed nonlinear hierarchical classifier achieves better results than the linear classifiers. In our second method, we use the Maxout activation function to replace the common rectified linear unit (ReLU) function to increase the capacity of deep learning models. We found that this can lead to an ill-conditioned training problem, given that the input data is generally not properly normalised. We show how to mitigate this problem by incorporating Batch Normalisation. This method allows us to build a deep learning model that surpasses the performance of several state-of-the-art methods. In the third method, we explore the possibility of introducing multiple-size features into deep learning models. Our design includes up to four different filter sizes to provide different spatial pattern candidates, and a max pooling function that selects the maximum response to represent the unit's output. As an outcome of this work, we combine the multiple-size filters and the Batch-normalised Maxout activation unit from the second work to achieve automatic spatial pattern selection within the activation unit. The result of this research shows significant improvements over the state of the art on five publicly available computer vision datasets, including the ImageNet 2012 dataset. Finally, we propose two novel measurements derived from the eigenvalues of the approximate empirical Fisher matrix, which can be efficiently calculated within the stochastic gradient descent (SGD) iteration. These measurements can be obtained efficiently even for the recent state-of-the-art deep residual networks. We show how to use these measurements to help select training hyper-parameters such as mini-batch size, model structure, learning rate and stochastic depth rate. Using these tools, we discover a new way to schedule dynamic sampling and dynamic stochastic depth, which leads to performance improvements of deep learning models. We show that the proposed training approach reaches competitive classification results on the CIFAR-10 and CIFAR-100 datasets with models that have significantly lower capacity compared to the current state of the art in the field.
    Thesis (Ph.D.) -- University of Adelaide, School of Computer Science, 201
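
    The following sketch illustrates the multiple-size filter design described in the abstract, assuming a PyTorch implementation: parallel same-padded convolutions with different kernel sizes, each batch-normalised, with the unit's output taken as the maximum response across the candidates. The kernel sizes and channel counts are placeholders, not the thesis configuration.

```python
import torch
import torch.nn as nn

class MultiSizeMaxUnit(nn.Module):
    """Parallel convolutions with different kernel sizes; the unit outputs the
    element-wise maximum response across the candidate filter sizes."""
    def __init__(self, in_channels, out_channels, kernel_sizes=(1, 3, 5, 7)):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Sequential(
                # Same-padding so every branch yields identically shaped maps.
                nn.Conv2d(in_channels, out_channels, k, padding=k // 2),
                nn.BatchNorm2d(out_channels),
            )
            for k in kernel_sizes
        ])

    def forward(self, x):
        # Stack the branch responses and keep the maximum at each position,
        # i.e. the spatial pattern size is selected inside the activation unit.
        responses = torch.stack([branch(x) for branch in self.branches], dim=0)
        return responses.max(dim=0).values

# Illustrative usage.
unit = MultiSizeMaxUnit(3, 32)
out = unit(torch.randn(8, 3, 32, 32))    # -> shape (8, 32, 32, 32)
```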

    Adaptive Activation Function Generation Through Fuzzy Inference for Grooming Text Categorisation

    Get PDF
    The activation function is introduced to determine the output of neural networks by mapping the resulting values of neurons into a specific range. Activation functions often suffer from ‘gradient vanishing’, ‘non zero-centred function outputs’, ‘exploding gradients’, and ‘dead neurons’, which may lead to deterioration in classification performance. This paper proposes an activation function generation approach using Takagi-Sugeno-Kang inference in an effort to address such challenges. In addition, the proposed method further optimises the coefficients in the activation function using a genetic algorithm so that the activation function can adapt to different applications. This approach has been applied to a digital forensics application of online grooming detection. The evaluations confirm the superiority of the proposed activation function for online grooming detection using an imbalanced data set.
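
    A minimal sketch of a first-order Takagi-Sugeno-Kang style activation, assuming Gaussian rule memberships over the pre-activation value and linear rule consequents; the number of rules and the coefficients below are illustrative placeholders, whereas the paper optimises them with a genetic algorithm.

```python
import numpy as np

def tsk_activation(x, centres, widths, slopes, intercepts):
    """First-order TSK-style activation applied element-wise.

    Each rule i has a Gaussian membership over the pre-activation value and a
    linear consequent slopes[i] * x + intercepts[i]; the output is the
    firing-strength weighted average of the consequents.
    """
    x = np.asarray(x, dtype=float)[..., None]                     # (..., 1)
    firing = np.exp(-((x - centres) ** 2) / (2 * widths ** 2))    # (..., rules)
    consequents = slopes * x + intercepts                         # (..., rules)
    return (firing * consequents).sum(-1) / (firing.sum(-1) + 1e-12)

# Illustrative coefficients only; in the paper they are tuned per application
# by a genetic algorithm.
centres    = np.array([-2.0, 0.0, 2.0])
widths     = np.array([1.0, 1.0, 1.0])
slopes     = np.array([0.1, 1.0, 1.0])
intercepts = np.array([0.0, 0.0, 0.5])

print(tsk_activation(np.linspace(-3, 3, 7), centres, widths, slopes, intercepts))
```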

    StochasticNet in StochasticNet

    Get PDF
    Deep neural networks have been shown to outperform conventional state-of-the-art approaches in several structured prediction applications. While high-performance computing devices such as GPUs have made developing very powerful deep neural networks possible, it is not feasible to run these networks on low-cost, low-power computing devices such as embedded CPUs or even embedded GPUs. As such, there has been a lot of recent interest in producing efficient deep neural network architectures that can be run on small computing devices. Motivated by this, the idea of StochasticNets was introduced, where deep neural networks are formed by leveraging random graph theory. It has been shown that StochasticNets can form new networks with 2X or 3X architectural efficiency while maintaining modeling accuracy. Motivated by these promising results, here we investigate the idea of StochasticNet in StochasticNet (SiS), where highly efficient deep neural networks with Network in Network (NiN) architectures are formed in a stochastic manner. Such networks have an intertwining structure composed of convolutional layers and micro neural networks to boost the modeling accuracy. The experimental results show that SiS can form deep neural networks with NiN architectures that have 4X greater architectural efficiency with only a 2% drop in accuracy for the CIFAR10 dataset. The results are even more promising for the SVHN dataset, where SiS formed deep neural networks with NiN architectures that have 11.5X greater architectural efficiency with only a 1% decrease in modeling accuracy.
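
    The sketch below gives one illustrative reading of forming a network from a random graph, assuming a Bernoulli connectivity model: each output-to-input channel connection of a convolution is kept with probability p, and unsampled connections are masked to zero. It is a sketch under those assumptions, not the StochasticNet reference implementation.

```python
import torch
import torch.nn as nn

class StochasticConv2d(nn.Module):
    """Convolution whose channel-to-channel connections are sampled once from
    a random graph: each (output, input) channel pair is connected with
    probability `p`; missing connections are masked to zero."""
    def __init__(self, in_channels, out_channels, kernel_size=3, p=0.5):
        super().__init__()
        self.conv = nn.Conv2d(in_channels, out_channels, kernel_size,
                              padding=kernel_size // 2)
        # Bernoulli(p) adjacency between channels, fixed after construction.
        mask = (torch.rand(out_channels, in_channels) < p).float()
        self.register_buffer("mask", mask[:, :, None, None])

    def forward(self, x):
        # Apply the random-graph mask so only sampled connections contribute.
        return nn.functional.conv2d(x, self.conv.weight * self.mask,
                                    self.conv.bias, padding=self.conv.padding)

# Illustrative usage: roughly half the channel connections are kept.
layer = StochasticConv2d(16, 32, p=0.5)
out = layer(torch.randn(4, 16, 32, 32))   # -> shape (4, 32, 32, 32)
```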