Deep Learning: Our Miraculous Year 1990-1991

Author: Schmidhuber Juergen
Publication date: 12/05/2020

In 2020, we will celebrate that many of the basic ideas behind the deep learning revolution were published three decades ago within fewer than 12 months in our "Annus Mirabilis" or "Miraculous Year" 1990-1991 at TU Munich. Back then, few people were interested, but a quarter century later, neural networks based on these ideas were on over 3 billion devices such as smartphones, and used many billions of times per day, consuming a significant fraction of the world's compute.

Comment: 37 pages, 188 references, based on work of 4 Oct 201

arXiv.org e-Print Archive

IEEE Access special section editorial: Artificial intelligence enabled networking

Author: Imran Muhammad Ali
Ni Qiang
Qadir Junaid
Vasilakos Athanasios V.
Yau Kok-Lim Alvin
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2015

With today’s computer networks becoming increasingly dynamic, heterogeneous, and complex, there is great interest in deploying artificial intelligence (AI) based techniques for optimization and management of computer networks. AI techniques, which subsume multidisciplinary techniques from machine learning, optimization theory, game theory, control theory, and meta-heuristics, have long been applied to optimize computer networks in many diverse settings. Such an approach is gaining increased traction with the emergence of novel networking paradigms that promise to simplify network management (e.g., cloud computing, network functions virtualization, and software-defined networking) and provide intelligent services (e.g., future 5G mobile networks). Looking ahead, greater integration of AI into networking architectures can help develop a future vision of cognitive networks that will show network-wide intelligent behavior to solve problems of network heterogeneity, performance, and quality of service (QoS).

Crossref

Enlighten

Shakeout: A New Approach to Regularized Deep Neural Network Training

Author: Kang Guoliang
Li Jun
Tao Dacheng
Publication date: 01/05/2018

Recent years have witnessed the success of deep neural networks in dealing with many practical problems. Dropout has played an essential role in many successful deep neural networks by inducing regularization in model training. In this paper, we present a new regularized training approach: Shakeout. Instead of randomly discarding units as Dropout does at the training stage, Shakeout randomly chooses to enhance or reverse each unit's contribution to the next layer. This minor modification of Dropout has a useful statistical trait: the regularizer induced by Shakeout adaptively combines L_0, L_1, and L_2 regularization terms. Our classification experiments with representative deep architectures on the image datasets MNIST, CIFAR-10 and ImageNet show that Shakeout deals with over-fitting effectively and outperforms Dropout. We empirically demonstrate that Shakeout leads to sparser weights under both unsupervised and supervised settings. Shakeout also leads to a grouping effect of the input units in a layer. Because the resulting weights better reflect the importance of connections, Shakeout is superior to Dropout in this respect, which is valuable for deep model compression. Moreover, we demonstrate that Shakeout can effectively reduce the instability of the training process of deep architectures.

Comment: Appears at T-PAMI 201
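The abstract above describes the enhance-or-reverse idea without giving the update rule. The following is a minimal sketch of one unbiased perturbation of that kind, not a statement of the authors' exact formulation. The hyperparameters `tau` (probability of reversing a unit) and `c` (strength of the sign term), and every function name, are assumptions introduced here for illustration.

```python
import random


def sign(w):
    """Return -1, 0, or 1 according to the sign of w."""
    return (w > 0) - (w < 0)


def shakeout_weights(weights, keep_mask, tau, c):
    """Perturb one layer's weights given a per-input-unit mask.

    weights[i][j] connects input unit j to output unit i.
    keep_mask[j] is 1 (enhance) or 0 (reverse) for input unit j.
    tau and c are assumed hyperparameters, not taken from the abstract.
    """
    out = []
    for row in weights:
        new_row = []
        for w, keep in zip(row, keep_mask):
            s = sign(w)
            if keep:
                # Enhance: push the weight away from zero, then rescale.
                new_row.append((w + c * tau * s) / (1.0 - tau))
            else:
                # Reverse: replace the weight by a constant of opposite sign.
                new_row.append(-c * s)
        out.append(new_row)
    return out


def sample_mask(n_units, tau, rng):
    """Draw a mask with P(keep) = 1 - tau for each input unit."""
    return [1 if rng.random() >= tau else 0 for _ in range(n_units)]


def expected_weights(weights, tau, c):
    """Exact expectation of the perturbed weights over the random mask."""
    n = len(weights[0])
    kept = shakeout_weights(weights, [1] * n, tau, c)
    flipped = shakeout_weights(weights, [0] * n, tau, c)
    return [[(1 - tau) * k + tau * r for k, r in zip(k_row, r_row)]
            for k_row, r_row in zip(kept, flipped)]


if __name__ == "__main__":
    W = [[0.5, -2.0, 0.0], [1.5, 0.25, -0.75]]
    mask = sample_mask(3, tau=0.3, rng=random.Random(0))
    print(shakeout_weights(W, mask, tau=0.3, c=0.1))
    print(expected_weights(W, tau=0.3, c=0.1))
```

Under these assumptions the perturbed weights equal the original weights in expectation, and setting `c = 0` reduces the rule to inverted Dropout, where a reversed unit contributes nothing and a kept unit is scaled by `1 / (1 - tau)`.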

arXiv.org e-Print Archive

OPUS - University of Technology Sydney

Multilevel Artificial Neural Network Training for Spatially Correlated Learning

Author: Mjolsness Eric
Scott C. B.
Publication date: 01/01/2019

Multigrid modeling algorithms are a technique used to accelerate relaxation models running on a hierarchy of similar graphlike structures. We introduce and demonstrate a new method for training neural networks which uses multilevel methods. Using an objective function derived from a graph-distance metric, we perform orthogonally-constrained optimization to find optimal prolongation and restriction maps between graphs. We compare and contrast several methods for performing this numerical optimization, and additionally present some new theoretical results on upper bounds of this type of objective function. Once calculated, these optimal maps between graphs form the core of Multiscale Artificial Neural Network (MsANN) training, a new procedure we present which simultaneously trains a hierarchy of neural network models of varying spatial resolution. Parameter information is passed between members of this hierarchy according to standard coarsening and refinement schedules from the multiscale modelling literature. In our machine learning experiments, these models are able to learn faster than default training, achieving a comparable level of error with an order of magnitude fewer training examples.

Comment: Manuscript (24 pages) and Supplementary Material (4 pages). Updated January 2019 to reflect new formulation of MsANN structure and new training procedur

arXiv.org e-Print Archive

eScholarship - University of California