Identifying and attacking the saddle point problem in high-dimensional non-convex optimization
A central challenge to many fields of science and engineering involves
minimizing non-convex error functions over continuous, high dimensional spaces.
Gradient descent or quasi-Newton methods are almost ubiquitously used to
perform such minimizations, and it is often thought that a main source of
difficulty for these local methods to find the global minimum is the
proliferation of local minima with much higher error than the global minimum.
Here we argue, based on results from statistical physics, random matrix theory,
neural network theory, and empirical evidence, that a deeper and more profound
difficulty originates from the proliferation of saddle points, not local
minima, especially in high dimensional problems of practical interest. Such
saddle points are surrounded by high error plateaus that can dramatically slow
down learning, and give the illusory impression of the existence of a local
minimum. Motivated by these arguments, we propose a new approach to
second-order optimization, the saddle-free Newton method, that can rapidly
escape high dimensional saddle points, unlike gradient descent and quasi-Newton
methods. We apply this algorithm to deep or recurrent neural network training,
and provide numerical evidence for its superior optimization performance.
Comment: The theoretical review and analysis in this article draw heavily from
arXiv:1405.4604 [cs.LG]
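A minimal sketch of the idea behind the saddle-free Newton step, assuming the commonly described form in which the gradient is rescaled by |H| (the Hessian with its eigenvalues replaced by their absolute values) instead of H. The toy function f(x, y) = x^2 - y^2, the helper names, and the closed-form 2x2 eigendecomposition are illustrative assumptions, not taken from the paper, and the sketch omits any damping or low-rank approximation a practical version would need.

```python
import math

def eig_sym_2x2(a, b, c):
    """Eigenpairs (value, unit vector) of the symmetric matrix [[a, b], [b, c]]."""
    if b == 0.0:
        return [(a, (1.0, 0.0)), (c, (0.0, 1.0))]
    mean = 0.5 * (a + c)
    dev = math.hypot(0.5 * (a - c), b)
    pairs = []
    for lam in (mean + dev, mean - dev):
        vx, vy = lam - c, b  # unnormalised eigenvector for eigenvalue lam
        norm = math.hypot(vx, vy)
        pairs.append((lam, (vx / norm, vy / norm)))
    return pairs

def rescaled_step(grad, hess, use_abs):
    """Return -H^{-1} g (use_abs=False) or -|H|^{-1} g (use_abs=True).

    Assumes a 2x2 symmetric Hessian with no zero eigenvalues.
    """
    (a, b), (_, c) = hess
    step = [0.0, 0.0]
    for lam, v in eig_sym_2x2(a, b, c):
        scale = abs(lam) if use_abs else lam
        coeff = (v[0] * grad[0] + v[1] * grad[1]) / scale
        step[0] -= coeff * v[0]
        step[1] -= coeff * v[1]
    return tuple(step)

# Hypothetical toy problem with a saddle point at the origin.
def f(p):
    return p[0] ** 2 - p[1] ** 2

def grad_f(p):
    return (2.0 * p[0], -2.0 * p[1])

HESS = ((2.0, 0.0), (0.0, -2.0))  # constant Hessian of f

start = (1.0, 0.1)
g = grad_f(start)
newton = tuple(s + d for s, d in zip(start, rescaled_step(g, HESS, use_abs=False)))
saddle_free = tuple(s + d for s, d in zip(start, rescaled_step(g, HESS, use_abs=True)))

print("Newton lands at", newton, "with f =", f(newton))
print("Saddle-free lands at", saddle_free, "with f =", f(saddle_free))
```

On this toy problem the plain Newton step jumps onto the saddle, while the rescaled step descends along the positive-curvature direction and moves away from the saddle along the negative-curvature direction.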
Practical Gauss-Newton Optimisation for Deep Learning
We present an efficient block-diagonal approximation to the Gauss-Newton
matrix for feedforward neural networks. Our resulting algorithm is
competitive against state-of-the-art first-order optimisation methods, with
sometimes significant improvement in optimisation performance. Unlike
first-order methods, for which hyperparameter tuning of the optimisation
parameters is often a laborious process, our approach can provide good
performance even when used with default settings. A side result of our work
is that for piecewise linear transfer functions, the network objective
function can have no differentiable local maxima, which may partially explain
why such transfer functions facilitate effective optimisation.
Comment: ICML 201
Metaheuristic design of feedforward neural networks: a review of two decades of research
Over the past two decades, feedforward neural network (FNN) optimization has been a key interest among researchers and practitioners in multiple disciplines. FNN optimization is often viewed from various perspectives: the optimization of weights, network architecture, activation nodes, learning parameters, learning environment, etc. Researchers adopted these different viewpoints mainly to improve the FNN's generalization ability. Gradient-descent algorithms such as backpropagation have been widely applied to optimize FNNs, and their success is evident from the FNN's application to numerous real-world problems. However, due to the limitations of gradient-based optimization methods, metaheuristic algorithms, including evolutionary algorithms and swarm intelligence, are still being widely explored by researchers aiming to obtain a generalized FNN for a given problem. This article attempts to summarize a broad spectrum of FNN optimization methodologies, including conventional and metaheuristic approaches. It also tries to connect various research directions that have emerged from FNN optimization practices, such as evolving neural networks (NN), cooperative coevolution NN, complex-valued NN, deep learning, extreme learning machines, quantum NN, etc. Additionally, it presents interesting research challenges for future research to cope with the present information processing era.
Training feedforward neural networks using orthogonal iteration of the Hessian eigenvectors
Introduction
Training algorithms for Multilayer Perceptrons optimize the set of W weights and biases, w, so as to minimize an
error function, E, applied to a set of N training patterns. The well-known back propagation algorithm combines an
efficient method of estimating the gradient of the error function in weight space, $\nabla E = g$, with a simple gradient
descent procedure to adjust the weights, $\Delta w = -\eta g$. More efficient algorithms maintain the gradient estimation
procedure, but replace the update step with a faster non-linear optimization strategy [1].
Efficient non-linear optimization algorithms are based upon second order approximation [2]. When sufficiently
close to a minimum the error surface is approximately quadratic, the shape being determined by the Hessian matrix.
Bishop [1] presents a detailed discussion of the properties and significance of the Hessian matrix. In principle, if
sufficiently close to a minimum it is possible to move directly to the minimum using the Newton step, $-H^{-1} g$.
In practice, the Newton step is not used as $H^{-1}$ is very expensive to evaluate; in addition, when not sufficiently close
to a minimum, the Newton step may cause a disastrously poor step to be taken. Second order algorithms either build
up an approximation to $H^{-1}$, or construct a search strategy that implicitly exploits its structure without evaluating it;
they also either take precautions to prevent steps that lead to a deterioration in error, or explicitly reject such steps.
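To make the contrast between the gradient descent update and the Newton step concrete, the following sketch applies both to an exactly quadratic error surface. The 2x2 matrix H, the vector b, the starting point, and the learning rate are hypothetical values chosen for illustration; on a real network the surface is only approximately quadratic near a minimum.

```python
import math

# Hypothetical quadratic error E(w) = 0.5 * w^T H w - b^T w with positive definite H.
H = ((3.0, 1.0), (1.0, 2.0))
b = (1.0, 1.0)

def gradient(w):
    """Gradient g = H w - b of the quadratic error."""
    return (H[0][0] * w[0] + H[0][1] * w[1] - b[0],
            H[1][0] * w[0] + H[1][1] * w[1] - b[1])

def newton_step(g):
    """Newton step -H^{-1} g, using the closed-form inverse of a 2x2 matrix."""
    det = H[0][0] * H[1][1] - H[0][1] * H[1][0]
    return (-(H[1][1] * g[0] - H[0][1] * g[1]) / det,
            -(-H[1][0] * g[0] + H[0][0] * g[1]) / det)

def gradient_descent_step(g, eta):
    """Plain gradient descent step -eta * g."""
    return (-eta * g[0], -eta * g[1])

w0 = (2.0, -1.0)

# A single Newton step reaches the minimum of a quadratic surface.
step = newton_step(gradient(w0))
w_newton = (w0[0] + step[0], w0[1] + step[1])

# Gradient descent needs many small steps to reach the same point.
w_gd, gd_steps = w0, 0
while math.hypot(*gradient(w_gd)) > 1e-6:
    step = gradient_descent_step(gradient(w_gd), eta=0.1)
    w_gd = (w_gd[0] + step[0], w_gd[1] + step[1])
    gd_steps += 1

print("Newton result after one step:", w_newton)
print("Gradient descent result:", w_gd, "after", gd_steps, "steps")
```

The explicit inverse is affordable only because W = 2 here; the cost of forming it for large W is the reason the text turns to approximations.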
In applying non-linear optimization algorithms to neural networks, a key consideration is the high-dimensional
nature of the search space. Neural networks with thousands of weights are not uncommon. Some algorithms have
$O(W^2)$ or $O(W^3)$ memory or execution times, and are hence impracticable in such cases. It is desirable to identify
algorithms that have limited memory requirements, particularly algorithms where one may trade memory usage
against convergence speed.
The paper describes a new training algorithm that has scalable memory requirements, which may range from $O(W)$
to $O(W^2)$, although in practice the useful range is limited to lower complexity levels. The algorithm is based upon a
novel iterative estimation of the principal eigen-subspace of the Hessian, together with a quadratic step estimation
procedure.
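As background for the eigen-subspace estimation mentioned above, here is a sketch of textbook orthogonal (subspace) iteration, which estimates the k eigenvectors of largest-magnitude eigenvalue using only matrix-vector products. It is not the paper's algorithm: the small dense matrix stands in for a Hessian, and the function names, starting vectors, and iteration count are assumptions made for illustration. Storing k vectors of length W is what lets memory scale between $O(W)$ and $O(W^2)$ as k grows.

```python
import math

def matvec(A, v):
    """Product A v; with a real network this would be a Hessian-vector product."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def orthonormalise(vectors):
    """Gram-Schmidt orthonormalisation of linearly independent vectors."""
    basis = []
    for v in vectors:
        w = list(v)
        for q in basis:
            proj = dot(q, w)
            w = [a - proj * b for a, b in zip(w, q)]
        norm = math.sqrt(dot(w, w))
        basis.append([a / norm for a in w])
    return basis

def orthogonal_iteration(A, k, iters=200):
    """Estimate the k dominant eigenpairs of a symmetric matrix A."""
    n = len(A)
    # Start from the first k coordinate axes (assumed not orthogonal
    # to the dominant eigen-subspace).
    Q = [[1.0 if i == j else 0.0 for i in range(n)] for j in range(k)]
    for _ in range(iters):
        Q = orthonormalise([matvec(A, q) for q in Q])
    eigvals = [dot(q, matvec(A, q)) for q in Q]  # Rayleigh quotients
    return eigvals, Q

# Hypothetical 3x3 symmetric matrix standing in for a Hessian.
A = [[4.0, 1.0, 1.0],
     [1.0, 3.0, 0.0],
     [1.0, 0.0, 2.0]]

eigvals, Q = orthogonal_iteration(A, k=2)
residuals = [
    math.sqrt(sum((av - lam * qv) ** 2 for av, qv in zip(matvec(A, q), q)))
    for lam, q in zip(eigvals, Q)
]
print("estimated eigenvalues:", eigvals)
print("residual norms:", residuals)
```

Raising k captures more curvature directions at the cost of more stored vectors, which mirrors the memory versus convergence trade-off described in the text.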
It is shown that the new algorithm has convergence time comparable to conjugate gradient descent, and may be
preferable if early stopping is used as it converges more quickly during the initial phases.
Section 2 overviews the principles of second order training algorithms. Section 3 introduces the new algorithm.
Section 4 discusses some experiments to confirm the algorithm's performance; Section 5 concludes the paper.
Causative factors of construction and demolition waste generation in Iraq Construction Industry
The construction industry has harmed the environment through the waste generated during
construction activities. This calls for serious measures to determine the causative
factors of construction waste generation. There are limited studies on factors causing
construction and demolition (C&D) waste generation, and these limited studies focused only
on the quantification of construction waste. This study took the opportunity to
identify the causative factors of C&D waste generation, to determine the
risk level of each causal factor, and to identify the most important minimization methods for
avoiding waste generation. The study was carried out using a quantitative
approach. A total of 39 factors that cause construction waste generation were
identified from the literature review and clustered into 4
groups. The questionnaire was improved by 38 construction experts (consultants,
contractors and clients) during the pilot study. The actual survey was conducted with
a total of 380 questionnaires, with a response rate of 83.3%. Data analysis
was performed using SPSS software. Ranking analysis using the mean score approach
found the five most significant causative factors to be poor site management, poor
planning, lack of experience, rework and poor controlling. The results also indicated
that the majority of the identified factors have a high risk level and that the best
minimization method is environmental awareness. A structural model was developed
based on the 4 groups of causative factors using the Partial Least Squares Structural
Equation Modelling (PLS-SEM) technique. The model was found to fit, with a
goodness of fit of 0.658 (GOF ≥ 0.36, substantial). Based on the outcome of this study,
39 factors were relevant to the generation of construction and demolition waste in Iraq.
These groups of factors should be avoided during construction works to reduce the
waste generated. The findings of this study are helpful to authorities and stakeholders
in formulating laws and regulations. Furthermore, they provide opportunities for future
researchers to conduct additional research on the factors that contribute to
construction waste generation.
Preconditioned Stochastic Gradient Langevin Dynamics for Deep Neural Networks
Effective training of deep neural networks suffers from two main issues. The
first is that the parameter spaces of these models exhibit pathological
curvature. Recent methods address this problem by using adaptive
preconditioning for Stochastic Gradient Descent (SGD). These methods improve
convergence by adapting to the local geometry of parameter space. A second
issue is overfitting, which is typically addressed by early stopping. However,
recent work has demonstrated that Bayesian model averaging mitigates this
problem. The posterior can be sampled by using Stochastic Gradient Langevin
Dynamics (SGLD). However, the rapidly changing curvature renders default SGLD
methods inefficient. Here, we propose combining adaptive preconditioners with
SGLD. In support of this idea, we give theoretical properties on asymptotic
convergence and predictive risk. We also provide empirical results for Logistic
Regression, Feedforward Neural Nets, and Convolutional Neural Nets,
demonstrating that our preconditioned SGLD method gives state-of-the-art
performance on these models.
Comment: AAAI 201
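The combination of an adaptive preconditioner with SGLD can be sketched in one dimension as follows. This assumes an RMSprop-style preconditioner built from a running average of squared gradients, a constant step size, a full (not mini-batch) gradient, and it drops the correction term that accounts for how the preconditioner changes with the parameters. The Gaussian target, the function names, and all hyperparameter values are hypothetical choices for illustration and are not taken from the paper.

```python
import math
import random

def psgld_samples(grad_log_post, theta0, n_steps,
                  eps=0.1, alpha=0.99, lam=1e-5, seed=0):
    """Draw samples with a simplified, one-dimensional preconditioned SGLD.

    grad_log_post: gradient of the log posterior (assumed exact here).
    eps: constant step size; alpha: decay of the squared-gradient average;
    lam: small constant keeping the preconditioner finite.
    """
    rng = random.Random(seed)
    theta = theta0
    v = 0.0
    samples = []
    for _ in range(n_steps):
        g = grad_log_post(theta)
        v = alpha * v + (1.0 - alpha) * g * g      # running average of g^2
        precond = 1.0 / (lam + math.sqrt(v))       # RMSprop-style preconditioner
        noise = rng.gauss(0.0, math.sqrt(eps * precond))
        theta = theta + 0.5 * eps * precond * g + noise
        samples.append(theta)
    return samples

# Hypothetical target posterior: a Gaussian with mean 3 and unit variance.
TARGET_MEAN = 3.0

def grad_log_gaussian(theta):
    return -(theta - TARGET_MEAN)

samples = psgld_samples(grad_log_gaussian, theta0=0.0, n_steps=20000)
kept = samples[2000:]  # discard burn-in
posterior_mean = sum(kept) / len(kept)
print("estimated posterior mean:", posterior_mean)
```

The injected noise is scaled by the same preconditioner as the gradient step, which is what distinguishes this sampler from a preconditioned optimizer that simply converges to the mode.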