    To go deep or wide in learning?

    To achieve acceptable performance on AI tasks, one can either use sophisticated feature-extraction methods as the first layer of a two-layered supervised learning model, or learn the features directly using a deep (multi-layered) model. While the first approach is very problem-specific, the second incurs computational overhead in learning multiple layers and fine-tuning the model. In this paper, we propose an approach called wide learning, based on arc-cosine kernels, that learns a single layer of infinite width. We propose exact and inexact learning strategies for wide learning and show that wide learning with a single layer outperforms both single-layer and deep architectures of finite width on some benchmark datasets.
    Comment: 9 pages, 1 figure. Accepted for publication in the Seventeenth International Conference on Artificial Intelligence and Statistics.
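
    As a sketch of the kernel the paper builds on: the degree-1 arc-cosine kernel of Cho and Saul mimics an infinitely wide single layer of ReLU-like units. The toy data and the choice of a precomputed-kernel SVM below are illustrative assumptions, not the paper's exact setup.

```python
import numpy as np
from sklearn.svm import SVC

def arc_cosine_kernel(X, Y):
    """Degree-1 arc-cosine kernel:
    k(x, y) = (1/pi) * ||x|| * ||y|| * (sin(t) + (pi - t) * cos(t)),
    where t is the angle between x and y."""
    nx = np.linalg.norm(X, axis=1, keepdims=True)        # shape (n, 1)
    ny = np.linalg.norm(Y, axis=1, keepdims=True).T      # shape (1, m)
    cos_t = np.clip(X @ Y.T / (nx * ny + 1e-12), -1.0, 1.0)
    t = np.arccos(cos_t)
    return (nx * ny / np.pi) * (np.sin(t) + (np.pi - t) * np.cos(t))

# Illustrative usage: a precomputed-kernel SVM on toy data.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(100, 5))
y_train = (X_train[:, 0] > 0).astype(int)
clf = SVC(kernel="precomputed").fit(arc_cosine_kernel(X_train, X_train), y_train)
```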

    Practical recommendations for gradient-based training of deep architectures

    Learning algorithms related to artificial neural networks, and for Deep Learning in particular, may seem to involve many bells and whistles, called hyper-parameters. This chapter is meant as a practical guide with recommendations for some of the most commonly used hyper-parameters, in particular in the context of learning algorithms based on back-propagated gradients and gradient-based optimization. It also discusses how to deal with the fact that more interesting results can be obtained when one is allowed to adjust many hyper-parameters. Overall, it describes elements of the practice used to successfully and efficiently train and debug large-scale and often deep multi-layer neural networks. It closes with open questions about the training difficulties observed with deeper architectures.
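
    One concrete recommendation from the chapter is that the learning rate is the most important hyper-parameter and is best searched on a log scale. A minimal sketch, assuming a toy quadratic objective as a stand-in for a real training run:

```python
import numpy as np

def train_and_validate(lr, steps=100):
    """Toy stand-in for a full training run: SGD on f(w) = ||w||^2 / 2.
    Returns the final loss, playing the role of a validation score."""
    w = np.ones(10)
    for _ in range(steps):
        w -= lr * w              # gradient of the toy objective is w itself
    return float(w @ w)

rng = np.random.default_rng(0)
# Sample candidate learning rates log-uniformly in [1e-4, 1e0], since the
# natural scale of this hyper-parameter is multiplicative.
candidates = 10 ** rng.uniform(-4, 0, size=20)
best_lr = min(candidates, key=train_and_validate)
print(f"best learning rate found: {best_lr:.4g}")
```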

    An Efficient Learning Procedure for Deep Boltzmann Machines

    We present a new learning algorithm for Boltzmann Machines that contain many layers of hidden variables. Data-dependent statistics are estimated using a variational approximation that tends to focus on a single mode, and data-independent statistics are estimated using persistent Markov chains. The use of two quite different techniques for estimating the two types of statistics that enter into the gradient of the log-likelihood makes it practical to learn Boltzmann Machines with multiple hidden layers and millions of parameters. The learning can be made more efficient by using a layer-by-layer "pre-training" phase that initializes the weights sensibly. The pre-training also allows the variational inference to be initialized sensibly with a single bottom-up pass. We present results on the MNIST and NORB datasets showing that Deep Boltzmann Machines learn very good generative models of handwritten digits and 3-D objects. We also show that the features discovered by Deep Boltzmann Machines are a very effective way to initialize the hidden layers of feed-forward neural nets, which are then discriminatively fine-tuned.
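
    A minimal single-layer sketch of the persistent-chain idea: data-dependent statistics come from the training data, while data-independent ("model") statistics come from Markov chains that persist across parameter updates. A real DBM has several hidden layers and uses a mean-field variational pass for the data-dependent term; the RBM-style model, toy data, and sizes below are simplifying assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

n_vis, n_hid, n_chains, lr = 20, 10, 8, 0.05
W = 0.01 * rng.normal(size=(n_vis, n_hid))
# Persistent "fantasy particles": visible states carried across updates.
v_model = rng.integers(0, 2, size=(n_chains, n_vis)).astype(float)
data = rng.integers(0, 2, size=(64, n_vis)).astype(float)  # toy binary data

for step in range(100):
    # Data-dependent statistics (exact for this single layer; a DBM would
    # estimate them with a mode-seeking variational approximation).
    pos = data.T @ sigmoid(data @ W) / len(data)

    # Data-independent statistics: one Gibbs step on the persistent chains.
    h = (sigmoid(v_model @ W) > rng.random((n_chains, n_hid))).astype(float)
    v_model = (sigmoid(h @ W.T) > rng.random((n_chains, n_vis))).astype(float)
    neg = v_model.T @ sigmoid(v_model @ W) / n_chains

    W += lr * (pos - neg)  # stochastic gradient ascent on the log-likelihood
```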

    Design and Evolution of Deep Convolutional Neural Networks in Image Classification – A Review

    The Convolutional Neural Network (CNN) is a well-known computer vision approach successfully applied to various classification and recognition problems. It has an outstanding ability to identify patterns in 1-D and 2-D data. Though invented in the 1980s, it became hugely successful after LeCun's work on digit identification. Several CNN-based models have been developed that record splendid performance on ImageNet and other databases. The CNN's ability to learn complex features at different levels of hierarchy from the data has made it the most successful of the deep learning algorithms. Innovative architectural designs and hyperparameter optimization have greatly improved the efficiency of CNNs in pattern recognition. This review focuses mainly on the evolution and history of CNN models. Landmark CNN architectures are discussed and categorized according to various parameters. In addition, the review explores the architectural details of the different layers, activation functions, optimizers, and other hyperparameters used by CNNs. It concludes by shedding light on applications and on observations to be considered while designing the network.
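
    As a minimal sketch of the building blocks such reviews cover (convolution, activation, pooling, fully connected layers), here is a LeNet-style network in PyTorch; the layer sizes are illustrative and assume 28x28 single-channel inputs such as MNIST digits.

```python
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(1, 6, kernel_size=5, padding=2),  # 1 -> 6 feature maps, 28x28
    nn.ReLU(),
    nn.MaxPool2d(2),                            # 28x28 -> 14x14
    nn.Conv2d(6, 16, kernel_size=5),            # 14x14 -> 10x10
    nn.ReLU(),
    nn.MaxPool2d(2),                            # 10x10 -> 5x5
    nn.Flatten(),
    nn.Linear(16 * 5 * 5, 120),                 # fully connected classifier head
    nn.ReLU(),
    nn.Linear(120, 10),                         # logits for 10 classes
)

logits = model(torch.randn(1, 1, 28, 28))       # one dummy image
print(logits.shape)                             # torch.Size([1, 10])
```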