
    Intelligent gradient amplification for deep neural networks

    Deep learning models offer superior performance compared to other machine learning techniques for a variety of tasks and domains, but pose their own challenges. In particular, deep learning models require longer training times as the depth of a model increases, and they suffer from vanishing gradients. Several solutions address these problems independently, but there have been minimal efforts to identify an integrated solution that both improves the performance of a model by addressing vanishing gradients and accelerates training so that higher performance is reached at larger learning rates. In this work, we intelligently determine which layers of a deep learning model to apply gradient amplification to, using a formulated approach that analyzes the gradient fluctuations of layers during training. Detailed experiments are performed on simpler and deeper neural networks using two different intelligent measures and two different thresholds that determine the amplification layers, together with a training strategy in which gradients are amplified only during certain epochs. Results show that our amplification approach offers better performance than the original models and achieves accuracy improvements of around 2.5% on the CIFAR-10 and around 4.5% on the CIFAR-100 datasets, even when the models are trained with higher learning rates.
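
    The abstract does not include the implementation, but the general idea of layer-selective gradient amplification can be illustrated with a short, hypothetical PyTorch sketch. The fluctuation measure (coefficient of variation of recent gradient norms), the threshold, the amplification factor, and the epoch range below are illustrative assumptions, not the authors' settings.

```python
# Hypothetical sketch (not the authors' code): select layers whose gradient norms
# fluctuate strongly during training and scale their gradients by a fixed factor.
import torch
import torch.nn as nn

AMP_FACTOR = 2.0            # amplification factor (assumed value)
THRESHOLD = 0.1             # fluctuation threshold for selecting layers (assumed)
AMP_EPOCHS = range(10, 40)  # epochs during which amplification is active (assumed)

model = nn.Sequential(nn.Linear(3072, 512), nn.ReLU(),
                      nn.Linear(512, 256), nn.ReLU(),
                      nn.Linear(256, 10))

grad_history = {name: [] for name, _ in model.named_parameters()}

def record_gradients():
    """Store the gradient norm of every parameter for later fluctuation analysis."""
    for name, p in model.named_parameters():
        if p.grad is not None:
            grad_history[name].append(p.grad.norm().item())

def select_amplified_layers():
    """Pick parameters whose gradient norms fluctuate strongly (std / mean)."""
    selected = set()
    for name, norms in grad_history.items():
        if len(norms) > 1:
            t = torch.tensor(norms)
            if (t.std() / (t.mean() + 1e-12)) > THRESHOLD:
                selected.add(name)
    return selected

def amplify_gradients(selected):
    """Scale the gradients of the selected parameters before the optimizer step."""
    for name, p in model.named_parameters():
        if name in selected and p.grad is not None:
            p.grad.mul_(AMP_FACTOR)
```

    In a training loop, one would call record_gradients() after loss.backward() and, during the epochs in AMP_EPOCHS, amplify_gradients(select_amplified_layers()) just before optimizer.step().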

    Advances in scaling deep learning algorithms

    Deep learning algorithms are a new set of powerful methods for machine learning. The general idea is to combine layers of latent factors into hierarchies. This usually leads to a higher computational cost and more parameters to tune. Thus, scaling to larger problems requires not only reducing the computational cost but also improving regularization and optimization. This thesis investigates scaling from these three perspectives. We first study the problem of reducing the computational cost of some deep learning algorithms. We propose methods to scale restricted Boltzmann machines (RBMs) and denoising auto-encoders (DAEs) to very high-dimensional sparse distributions, which is important for applications of deep learning to natural language processing. Both methods (Dauphin et al., 2011; Dauphin and Bengio, 2013) rely on importance sampling to subsample the learning objective of these models. We show that this greatly reduces training time, leading to speedups of two orders of magnitude on several benchmark datasets without losses in the quality of the model. Second, we introduce a powerful regularization method for deep neural networks. Experiments have shown that proper regularization is in many cases crucial to obtaining good performance from larger networks (Hinton et al., 2012). In Rifai et al. (2011), we propose a new regularizer that combines unsupervised learning and tangent propagation (Simard et al., 1992). The method exploits several geometrical insights and was able, at the time of publication, to reach state-of-the-art results on competitive benchmarks. Finally, we consider the problem of optimizing over high-dimensional non-convex loss surfaces like those found in deep neural networks. Traditionally, the main difficulty in these problems has been considered to be the abundance of local minima. In Dauphin et al. (2014a) we argue, based on results from statistical physics, random matrix theory, neural network theory, and empirical evidence, that a deeper difficulty comes from the proliferation of saddle points: the vast majority of critical points are saddle points, not local minima. We also propose a new optimization method for non-convex optimization.
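
    As a rough illustration of the importance-sampling idea (this is an assumed toy estimator, not the exact scheme of Dauphin et al., which relies on a more carefully chosen proposal), the reconstruction loss over a very high-dimensional sparse target can be computed exactly on the few non-zero dimensions and estimated on the many zero dimensions from a small reweighted sample:

```python
# Toy sketch (assumed, not the thesis' exact estimator): subsampled squared-error
# reconstruction loss for a very high-dimensional, sparse target vector.
import numpy as np

rng = np.random.default_rng(0)

def sampled_reconstruction_loss(pred, target, n_samples=128):
    """Exact loss on the few non-zero target dimensions; the contribution of the
    many zero dimensions is estimated from a small uniform sample, reweighted so
    that the estimate stays unbiased."""
    nonzero = np.flatnonzero(target)
    zero = np.setdiff1d(np.arange(target.size), nonzero)

    # Exact term over the non-zero dimensions.
    loss = np.sum((pred[nonzero] - target[nonzero]) ** 2)

    # Sampled term over the zero dimensions, reweighted by |zero| / |sample|.
    if zero.size > 0:
        idx = rng.choice(zero, size=min(n_samples, zero.size), replace=False)
        loss += (zero.size / idx.size) * np.sum(pred[idx] ** 2)
    return loss
```

    The general idea is that the per-update cost then scales with the number of sampled dimensions rather than with the full input size, which is where the reported speedups come from.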

    Control and identification of non-linear systems using neural networks and reinforcement learning

    Master's dissertation, Universidade de Brasília, Faculdade de Tecnologia, Departamento de Engenharia Elétrica, 2018. This work proposes an adaptive controller using neural networks and reinforcement learning to deal with non-linearities and time variance. To test the controller, a fourth-order liquid-level system was chosen because of its wide range of time constants and the possibility of varying its parameters. The system was identified with neural networks to predict future states, in order to compensate for the delay and improve the controller's performance. Several tests were carried out with different neural networks to decide which network would be assigned to each task pertinent to the controller. The controller's parameters were tuned and tested so that it could satisfy arbitrary performance specifications. The controller was tested against a conventional PI controller used as a reference, showed adaptive behaviour and improved its performance over time, and, in addition, needs no prior information about the system in order to be designed. Supported by the Fundação de Apoio a Pesquisa do Distrito Federal (FAP-DF).
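
    The abstract does not give the network architectures, but the system-identification step can be sketched, under assumed dimensions and layer sizes, as a one-step-ahead model x_{k+1} = f(x_k, u_k) that is rolled forward to predict future levels and offset the dead time:

```python
# Minimal sketch with assumed dimensions and layer sizes (the dissertation's
# actual networks are not given in the abstract): a one-step-ahead model of the
# level system, rolled forward to predict future states ahead of the delay.
import torch
import torch.nn as nn

N_STATES, N_INPUTS = 4, 1   # fourth-order liquid-level system, one pump input (assumed)

model = nn.Sequential(nn.Linear(N_STATES + N_INPUTS, 32), nn.Tanh(),
                      nn.Linear(32, N_STATES))

def predict_ahead(x0, controls):
    """Iterate x_{k+1} = f(x_k, u_k) over a planned control sequence and return
    the predicted state trajectory used by the controller."""
    x, trajectory = x0, []
    for u in controls:                      # each u is a shape-(1,) tensor
        x = model(torch.cat([x, u]))
        trajectory.append(x)
    return torch.stack(trajectory)
```

    Such a model would be fit by minimising the mean squared error between predicted and recorded next states on logged (state, input, next state) tuples.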

    Advances in Deep Learning through Gradient Amplification and Applications

    Deep neural networks currently play a prominent role in solving problems across a wide variety of disciplines. Improving the performance of deep learning models and reducing their training times are among the ongoing challenges. Increasing the depth of a network improves performance but suffers from the problem of vanishing gradients and increased training times. In this research, we design methods to address these challenges in deep neural networks and demonstrate deep learning applications in several domains. We propose a gradient amplification based approach to train deep neural networks, which improves their training and testing accuracies, addresses vanishing gradients, and reduces training time by reaching higher accuracies even at higher learning rates. We also develop an integrated training strategy to enable or disable amplification at certain epochs. Detailed analysis is performed on different neural networks using random amplification, where the layers to be amplified are selected randomly. The implications of gradient amplification for the number of layers, types of layers, amplification factors, training strategies and learning rates are studied in detail. With this knowledge, effective ways to update gradients are designed to perform amplification at the layer level and also at the neuron level. Lastly, we provide applications of deep learning methods to some challenging problems in the areas of smart grids and bioinformatics. Deep neural networks with feed-forward architectures are used to address data integrity attacks in smart grids. We propose an image-based preprocessing method to convert heterogeneous genomic sequences into images, which are then classified to detect Hepatitis C virus (HCV) infection stages. In summary, this research advances deep learning techniques and their applications to real-world problems.
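
    As an illustration of amplification at the neuron level combined with an epoch-gated training strategy (the random neuron mask, factor, ratio and epoch window below are assumptions for the sketch, not the dissertation's settings), a hypothetical PyTorch layer might scale the gradients of selected output neurons only while amplification is enabled:

```python
# Illustrative sketch only (assumed details): neuron-level gradient amplification
# toggled on for a chosen range of epochs via a backward hook on each linear layer.
import torch
import torch.nn as nn

class AmplifiedLinear(nn.Linear):
    """Linear layer whose selected output neurons get their gradients scaled."""
    def __init__(self, in_f, out_f, amp_factor=2.0, amp_ratio=0.5):
        super().__init__(in_f, out_f)
        mask = (torch.rand(out_f) < amp_ratio).float()     # neurons to amplify (assumed: random)
        self.scale = 1.0 + (amp_factor - 1.0) * mask       # per-neuron gradient scale
        self.enabled = False                                # toggled by the training loop

    def forward(self, x):
        y = super().forward(x)
        if self.enabled and y.requires_grad:
            # The hook multiplies each output neuron's incoming gradient by its scale.
            y.register_hook(lambda g: g * self.scale.to(g.device))
        return y

def set_amplification(model, epoch, start=10, stop=40):
    """Training-strategy gate: enable amplification only during selected epochs."""
    for m in model.modules():
        if isinstance(m, AmplifiedLinear):
            m.enabled = start <= epoch < stop
```

    The training loop would call set_amplification(model, epoch) at the start of each epoch, so gradients flow unmodified outside the chosen window.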

    Forecasting the stock market index using artificial intelligence techniques

    The weak form of the Efficient Market Hypothesis (EMH) states that it is impossible to forecast the future price of an asset based on the information contained in its historical prices. This means that the market behaves as a random walk and, as a result, makes forecasting impossible. Furthermore, financial forecasting is a difficult task due to the intrinsic complexity of the financial system. The objective of this work was to use artificial intelligence (AI) techniques to model and predict the future price of a stock market index. Three artificial intelligence techniques, namely neural networks (NN), support vector machines and neuro-fuzzy systems, are implemented in forecasting the future price of a stock market index based on its historical price information. Artificial intelligence techniques have the ability to take into consideration financial system complexities and are used as financial time series forecasting tools. Two techniques are used to benchmark the AI techniques, namely the Autoregressive Moving Average (ARMA), which is a linear modelling technique, and the random walk (RW) technique. The experimentation was performed on data obtained from the Johannesburg Stock Exchange. The data used was a series of past closing prices of the All Share Index. The results showed that the three techniques have the ability to predict the future price of the Index with an acceptable accuracy. All three artificial intelligence techniques outperformed the linear model; however, the random walk method outperformed all the other techniques. Although these techniques show an ability to predict the future price, because of the transaction costs of trading in the market it is not possible to show that they can disprove the weak form of market efficiency. The results also show that the performance ranking of support vector machines, neuro-fuzzy systems and multilayer perceptron neural networks depends on the accuracy measure used.
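
    For illustration only (the price series below is synthetic and the MLP stands in for the three AI models; the thesis also evaluates support vector machines and neuro-fuzzy systems on All Share Index closing prices), the lagged-price formulation and the random-walk benchmark can be set up as follows:

```python
# Illustrative sketch (assumed setup, not the thesis's exact models): compare a
# random-walk benchmark with an MLP trained on lagged closing prices.
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.metrics import mean_squared_error

def make_lagged(prices, n_lags=5):
    """Build (X, y) where each row of X holds the previous n_lags closing prices."""
    X = np.array([prices[i - n_lags:i] for i in range(n_lags, len(prices))])
    y = prices[n_lags:]
    return X, y

prices = np.cumsum(np.random.randn(1000)) + 100.0   # placeholder series; substitute ALSI closes
X, y = make_lagged(prices)
split = int(0.8 * len(y))

mlp = MLPRegressor(hidden_layer_sizes=(32,), max_iter=2000, random_state=0)
mlp.fit(X[:split], y[:split])

rw_pred = X[split:, -1]                 # random walk: tomorrow's price = today's price
nn_pred = mlp.predict(X[split:])

print("RW  RMSE:", mean_squared_error(y[split:], rw_pred) ** 0.5)
print("MLP RMSE:", mean_squared_error(y[split:], nn_pred) ** 0.5)
```

    Under the weak-form EMH the random-walk RMSE is expected to be hard to beat, which is consistent with the finding that the random walk outperformed the other techniques.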
