
    Pushing Stochastic Gradient towards Second-Order Methods -- Backpropagation Learning with Transformations in Nonlinearities

    Recently, we proposed to transform the outputs of each hidden neuron in a multi-layer perceptron network to have zero output and zero slope on average, and to use separate shortcut connections to model the linear dependencies instead. We continue that work, first by introducing a third transformation that normalizes the scale of each hidden neuron's output, and second by analyzing the connections to second-order optimization methods. We show, both in theory and in experiments, that the transformations make simple stochastic gradient descent behave more like a second-order optimization method and thus speed up learning. The experiments on the third transformation show that while it further increases the speed of learning, it can also hurt performance by converging to a worse local optimum, in which both the inputs and outputs of many hidden neurons are close to zero.

    Comment: 10 pages, 5 figures, ICLR201
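    To make the idea concrete, here is a minimal sketch of the three transformations applied to a batch of tanh activations (not code from the paper; the batch-based estimation and the 1e-8 stabilizer are assumptions): a linear term cancels the average slope, a constant cancels the average output, and a final factor normalizes the scale.

    ```python
    import numpy as np

    def transformed_tanh(x):
        """Apply the three output transformations to a batch of
        pre-activations x: cancel the average slope, cancel the
        average output, and normalize the output scale."""
        g, g_prime = np.tanh(x), 1.0 - np.tanh(x) ** 2
        a = -g_prime.mean()            # zero slope on average
        b = -(g + a * x).mean()        # zero output on average
        y = g + a * x + b
        c = 1.0 / (y.std() + 1e-8)     # unit output scale (third transformation)
        return c * y

    # The linear dependencies removed here would be modelled by
    # separate shortcut connections in the network itself.
    x = np.random.randn(1000)
    y = transformed_tanh(x)
    print(y.mean(), y.std())           # approximately 0 and 1
    ```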

    A Survey on Bayesian Deep Learning

    A comprehensive artificial intelligence system needs not only to perceive the environment with different 'senses' (e.g., seeing and hearing) but also to infer the world's conditional (or even causal) relations and the corresponding uncertainty. The past decade has seen major advances in many perception tasks, such as visual object recognition and speech recognition, using deep learning models. For higher-level inference, however, probabilistic graphical models, with their Bayesian nature, remain more powerful and flexible. In recent years, Bayesian deep learning has emerged as a unified probabilistic framework that tightly integrates deep learning and Bayesian models. In this general framework, the perception of text or images using deep learning can boost the performance of higher-level inference, and in turn the feedback from the inference process can enhance the perception of text or images. This survey provides a comprehensive introduction to Bayesian deep learning and reviews its recent applications to recommender systems, topic models, control, and more. We also discuss the relationship and differences between Bayesian deep learning and other related topics, such as the Bayesian treatment of neural networks.

    Comment: To appear in ACM Computing Surveys (CSUR) 202
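    As one concrete instance of treating a deep model probabilistically (an illustration of the territory the survey covers, not the survey's own method), the sketch below uses Monte Carlo dropout, a standard approximation to Bayesian inference in neural networks, to obtain a predictive mean and an uncertainty estimate; the layer sizes, weights, and sample count are arbitrary assumptions.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)

    # A tiny two-layer network with fixed (here random, stand-in) weights.
    W1, b1 = rng.normal(size=(16, 1)), np.zeros(16)
    W2, b2 = rng.normal(size=(1, 16)), np.zeros(1)

    def forward(x, p_drop=0.5):
        """One stochastic forward pass: keeping dropout on at test time
        approximates sampling from a posterior over the weights."""
        h = np.maximum(0.0, W1 @ x + b1)                  # ReLU hidden layer
        mask = rng.random(h.shape) > p_drop               # Bernoulli dropout mask
        h = h * mask / (1.0 - p_drop)
        return W2 @ h + b2

    x = np.array([0.3])
    samples = np.stack([forward(x) for _ in range(200)])  # 200 MC samples
    print("predictive mean:", samples.mean())
    print("predictive std :", samples.std())              # uncertainty proxy
    ```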

    Parallelization of Support Vector Machines

    The support vector machine (SVM) is a machine learning method used for data classification. The binary classification problem consists in finding a function or model that can predict which class a given point x belongs to; the model is fitted on training data. In this work we compared iterative and parallel implementations of support vector machine algorithms. As expected, we found that the parallel algorithms run much faster than the iterative ones, while the number of misclassified points does not increase.

    One of the techniques used for data classification is the support vector machine (SVM). SVM takes binary classification as the fundamental problem and follows a geometrically intuitive approach: find a hyperplane that divides the objects into two separate classes. Training an SVM aims both to maximize the width of the margin that surrounds the separating hyperplane and to minimize the occurrence of classification errors. The goal of this thesis is to study the performance gains obtained by solving the SVM problem in parallel, comparing the proposed parallelization techniques in accuracy and computation speed. See the sketch below for one common parallelization scheme.
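    To illustrate one common way such parallel SVM training can be organized (a sketch of the general cascade-style idea, not necessarily the implementations compared in the thesis; the dataset and partition count are assumptions), the snippet below trains sub-SVMs on data partitions in parallel and then retrains a final SVM on the pooled support vectors.

    ```python
    import numpy as np
    from joblib import Parallel, delayed
    from sklearn.datasets import make_classification
    from sklearn.svm import SVC

    def fit_partition(X, y):
        """Train an SVM on one data partition and return the indices
        of its support vectors (relative to the partition)."""
        clf = SVC(kernel="linear").fit(X, y)
        return clf.support_

    X, y = make_classification(n_samples=4000, n_features=20, random_state=0)
    parts = np.array_split(np.arange(len(X)), 4)   # four data partitions

    # Stage 1: fit the partitions in parallel; only support vectors survive.
    sv_idx = Parallel(n_jobs=4)(
        delayed(fit_partition)(X[p], y[p]) for p in parts
    )
    keep = np.concatenate([p[idx] for p, idx in zip(parts, sv_idx)])

    # Stage 2: retrain a single SVM on the pooled support vectors.
    final = SVC(kernel="linear").fit(X[keep], y[keep])
    print("final accuracy:", final.score(X, y))
    ```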

    Phase space techniques in neural network models
