
    Identifying and attacking the saddle point problem in high-dimensional non-convex optimization

    A central challenge to many fields of science and engineering involves minimizing non-convex error functions over continuous, high-dimensional spaces. Gradient descent or quasi-Newton methods are almost ubiquitously used to perform such minimizations, and it is often thought that a main source of difficulty for these local methods to find the global minimum is the proliferation of local minima with much higher error than the global minimum. Here we argue, based on results from statistical physics, random matrix theory, neural network theory, and empirical evidence, that a deeper and more profound difficulty originates from the proliferation of saddle points, not local minima, especially in high-dimensional problems of practical interest. Such saddle points are surrounded by high error plateaus that can dramatically slow down learning, and give the illusory impression of the existence of a local minimum. Motivated by these arguments, we propose a new approach to second-order optimization, the saddle-free Newton method, that can rapidly escape high-dimensional saddle points, unlike gradient descent and quasi-Newton methods. We apply this algorithm to deep or recurrent neural network training, and provide numerical evidence for its superior optimization performance. Comment: The theoretical review and analysis in this article draw heavily from arXiv:1405.4604 [cs.LG]
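
    The key step the abstract alludes to is replacing the Newton step −H⁻¹g with −|H|⁻¹g, where |H| takes the absolute value of each Hessian eigenvalue, so the update is repelled by saddle points rather than attracted to them. Below is a minimal numpy sketch of that rescaling on a toy saddle, not the authors' full algorithm (which approximates the Hessian in a low-dimensional subspace rather than via an exact eigendecomposition):

```python
import numpy as np

def saddle_free_newton_step(grad, hessian, damping=1e-3):
    """Rescale the gradient by |H|^-1, where |H| takes the absolute value
    of each Hessian eigenvalue. Unlike the plain Newton step -H^-1 g,
    this direction moves away from saddle points instead of towards them."""
    eigvals, eigvecs = np.linalg.eigh(hessian)
    abs_h = eigvecs @ np.diag(np.abs(eigvals) + damping) @ eigvecs.T
    return -np.linalg.solve(abs_h, grad)

# Toy saddle: f(x, y) = x^2 - y^2 has a saddle point at the origin.
w = np.array([0.1, 0.1])
grad = np.array([2 * w[0], -2 * w[1]])
hess = np.diag([2.0, -2.0])
print(saddle_free_newton_step(grad, hess))  # x shrinks towards 0, y grows away from the saddle
```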

    Practical Gauss-Newton Optimisation for Deep Learning

    We present an efficient block-diagonal approximation to the Gauss-Newton matrix for feedforward neural networks. Our resulting algorithm is competitive against state-of-the-art first-order optimisation methods, with sometimes significant improvement in optimisation performance. Unlike first-order methods, for which hyperparameter tuning of the optimisation parameters is often a laborious process, our approach can provide good performance even when used with default settings. A side result of our work is that for piecewise linear transfer functions, the network objective function can have no differentiable local maxima, which may partially explain why such transfer functions facilitate effective optimisation. Comment: ICML 2017
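
    For a squared-error loss the Gauss-Newton matrix is G = JᵀJ, where J is the Jacobian of the network outputs with respect to the parameters, and a block-diagonal approximation keeps only the blocks belonging to each layer's own parameters. A minimal numpy sketch of that structure on a tiny network follows; the finite-difference Jacobian and the two-layer layout are illustrative assumptions, and the paper derives the layer blocks recursively rather than forming the full matrix:

```python
import numpy as np

def model(params, x):
    """Tiny two-layer tanh network; params holds W1 (2x3) then W2 (3x1), flattened."""
    W1 = params[:6].reshape(2, 3)
    W2 = params[6:].reshape(3, 1)
    return np.tanh(x @ W1) @ W2

def jacobian(params, x, eps=1e-6):
    """Finite-difference Jacobian of the model outputs w.r.t. the parameters."""
    base = model(params, x).ravel()
    J = np.zeros((base.size, params.size))
    for i in range(params.size):
        p = params.copy()
        p[i] += eps
        J[:, i] = (model(p, x).ravel() - base) / eps
    return J

rng = np.random.default_rng(0)
params = rng.normal(size=9)
x = rng.normal(size=(5, 2))

J = jacobian(params, x)
G = J.T @ J                            # Gauss-Newton matrix for a squared-error loss
blocks = [slice(0, 6), slice(6, 9)]    # one block per layer's parameters
G_block = np.zeros_like(G)
for b in blocks:
    G_block[b, b] = G[b, b]            # keep only the per-layer diagonal blocks
```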

    Metaheuristic design of feedforward neural networks: a review of two decades of research

    Over the past two decades, feedforward neural network (FNN) optimization has been a key interest among researchers and practitioners of multiple disciplines. FNN optimization is often viewed from various perspectives: the optimization of weights, network architecture, activation nodes, learning parameters, learning environment, etc. Researchers adopted these different viewpoints mainly to improve the FNN's generalization ability. Gradient-descent algorithms such as backpropagation have been widely applied to optimize FNNs, and their success is evident from the FNN's application to numerous real-world problems. However, due to the limitations of gradient-based optimization methods, metaheuristic algorithms, including evolutionary algorithms, swarm intelligence, etc., are still widely explored by researchers aiming to obtain a generalized FNN for a given problem. This article attempts to summarize a broad spectrum of FNN optimization methodologies, including conventional and metaheuristic approaches. It also tries to connect various research directions that emerged out of FNN optimization practices, such as evolving neural networks (NNs), cooperative coevolution NNs, complex-valued NNs, deep learning, extreme learning machines, quantum NNs, etc. Additionally, it outlines research challenges for future work to cope with the present information-processing era.
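
    As a toy illustration of the gradient-free, metaheuristic side of this spectrum, the sketch below tunes the weights of a small FNN with a simple (1+λ) evolution strategy on a synthetic regression task. The network size, weight layout and hyperparameters are assumptions made for the example and do not correspond to any specific algorithm surveyed in the article:

```python
import numpy as np

def forward(w, X):
    """Tiny 2-4-1 feedforward net; w is a flat weight vector (assumed layout)."""
    W1, b1 = w[:8].reshape(2, 4), w[8:12]
    W2, b2 = w[12:16].reshape(4, 1), w[16]
    return np.tanh(X @ W1 + b1) @ W2 + b2

def mse(w, X, y):
    return float(np.mean((forward(w, X).ravel() - y) ** 2))

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(64, 2))
y = X[:, 0] * X[:, 1]                          # toy regression target

# (1 + lambda) evolution strategy: keep the best of the parent and its mutated offspring.
w = rng.normal(scale=0.5, size=17)
sigma, n_offspring = 0.1, 20
best = mse(w, X, y)
for _ in range(200):
    candidates = w + sigma * rng.normal(size=(n_offspring, w.size))
    losses = [mse(c, X, y) for c in candidates]
    i = int(np.argmin(losses))
    if losses[i] < best:
        w, best = candidates[i], losses[i]
print(best)
```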

    Training feedforward neural networks using orthogonal iteration of the Hessian eigenvectors

    Introduction. Training algorithms for Multilayer Perceptrons optimize the set of W weights and biases, w, so as to minimize an error function, E, applied to a set of N training patterns. The well-known backpropagation algorithm combines an efficient method of estimating the gradient of the error function in weight space, ∇E = g, with a simple gradient descent procedure to adjust the weights, Δw = −ηg. More efficient algorithms maintain the gradient estimation procedure, but replace the update step with a faster non-linear optimization strategy [1]. Efficient non-linear optimization algorithms are based upon second order approximation [2]. When sufficiently close to a minimum the error surface is approximately quadratic, the shape being determined by the Hessian matrix. Bishop [1] presents a detailed discussion of the properties and significance of the Hessian matrix. In principle, if sufficiently close to a minimum it is possible to move directly to the minimum using the Newton step, −H⁻¹g. In practice, the Newton step is not used as H⁻¹ is very expensive to evaluate; in addition, when not sufficiently close to a minimum, the Newton step may cause a disastrously poor step to be taken. Second order algorithms either build up an approximation to H⁻¹, or construct a search strategy that implicitly exploits its structure without evaluating it; they also either take precautions to prevent steps that lead to a deterioration in error, or explicitly reject such steps. In applying non-linear optimization algorithms to neural networks, a key consideration is the high-dimensional nature of the search space. Neural networks with thousands of weights are not uncommon. Some algorithms have O(W²) or O(W³) memory or execution times, and are hence impracticable in such cases. It is desirable to identify algorithms that have limited memory requirements, particularly algorithms where one may trade memory usage against convergence speed. The paper describes a new training algorithm that has scalable memory requirements, which may range from O(W) to O(W²), although in practice the useful range is limited to lower complexity levels. The algorithm is based upon a novel iterative estimation of the principal eigen-subspace of the Hessian, together with a quadratic step estimation procedure. It is shown that the new algorithm has convergence time comparable to conjugate gradient descent, and may be preferable if early stopping is used as it converges more quickly during the initial phases. Section 2 overviews the principles of second order training algorithms. Section 3 introduces the new algorithm. Section 4 discusses some experiments to confirm the algorithm's performance; Section 5 concludes the paper.
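
    The core numerical ingredient, estimating the principal eigen-subspace of the Hessian, can be sketched with orthogonal (subspace) iteration using only Hessian-vector products. The toy quadratic error surface and the subspace-restricted quadratic step below are illustrative assumptions, not the paper's exact step-estimation procedure:

```python
import numpy as np

def principal_subspace(hess_vec, dim, k, iters=50, seed=0):
    """Orthogonal (subspace) iteration: estimate the k leading eigenvectors of the
    Hessian using only Hessian-vector products hess_vec(v)."""
    rng = np.random.default_rng(seed)
    V = np.linalg.qr(rng.normal(size=(dim, k)))[0]
    for _ in range(iters):
        V = np.linalg.qr(np.column_stack([hess_vec(v) for v in V.T]))[0]
    eigvals = np.array([v @ hess_vec(v) for v in V.T])   # Rayleigh quotients
    return eigvals, V

# Toy quadratic error surface E(w) = 0.5 w^T A w with known Hessian A.
A = np.diag([10.0, 5.0, 1.0, 0.5, 0.1])
eigvals, V = principal_subspace(lambda v: A @ v, dim=5, k=2)
print(eigvals)                           # approximately the two largest eigenvalues, 10 and 5

# A Newton-like quadratic step restricted to the estimated subspace (assumed form).
g = A @ np.ones(5)                       # gradient at w = (1, ..., 1)
step = -V @ ((V.T @ g) / eigvals)
```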

    Causative factors of construction and demolition waste generation in Iraq Construction Industry

    The construction industry harms the environment through the waste generated during construction activities, which calls for serious measures to determine the causative factors of the waste generated. Studies on the factors causing construction and demolition (C&D) waste generation are limited, and they focus mainly on the quantification of construction waste. This study therefore set out to identify the causative factors of C&D waste generation, to determine the risk level of each factor, and to identify the most important minimization methods for avoiding waste generation. The study was carried out using a quantitative approach. A total of 39 factors causing construction waste generation, identified from the literature review, were considered and then clustered into 4 groups. The questionnaire was refined by 38 construction experts (consultants, contractors and clients) during the pilot study. The main survey involved a total of 380 questionnaires, with a response rate of 83.3%. Data analysis was performed using SPSS software. Ranking analysis using the mean score approach found the five most significant causative factors to be poor site management, poor planning, lack of experience, rework and poor controlling. The results also indicated that the majority of the identified factors have a high risk level, and that the best minimization method is environmental awareness. A structural model was developed based on the 4 groups of causative factors using the Partial Least Squares-Structural Equation Modelling (PLS-SEM) technique; the model showed a substantial goodness of fit (GOF = 0.658, above the 0.36 threshold for substantial fit). Based on the outcome of this study, 39 factors are relevant to the generation of construction and demolition waste in Iraq, and these groups of factors should be avoided during construction works to reduce the waste generated. The findings are helpful to authorities and stakeholders in formulating laws and regulations, and they provide opportunities for future researchers to conduct additional research on the factors that contribute to construction waste generation.

    Preconditioned Stochastic Gradient Langevin Dynamics for Deep Neural Networks

    Effective training of deep neural networks suffers from two main issues. The first is that the parameter spaces of these models exhibit pathological curvature. Recent methods address this problem by using adaptive preconditioning for Stochastic Gradient Descent (SGD); these methods improve convergence by adapting to the local geometry of parameter space. A second issue is overfitting, which is typically addressed by early stopping. However, recent work has demonstrated that Bayesian model averaging mitigates this problem. The posterior can be sampled by using Stochastic Gradient Langevin Dynamics (SGLD), but the rapidly changing curvature renders default SGLD methods inefficient. Here, we propose combining adaptive preconditioners with SGLD. In support of this idea, we give theoretical properties on asymptotic convergence and predictive risk. We also provide empirical results for Logistic Regression, Feedforward Neural Nets, and Convolutional Neural Nets, demonstrating that our preconditioned SGLD method gives state-of-the-art performance on these models. Comment: AAAI 2016
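
    A minimal sketch of one preconditioned SGLD update with an RMSProp-style diagonal preconditioner, broadly following the idea described above; the step size, decay constants and the omission of the curvature-correction term Γ(θ) are simplifying assumptions made for illustration:

```python
import numpy as np

def psgld_step(theta, grad_log_post, v, eps=1e-3, alpha=0.99, lam=1e-5, rng=None):
    """One preconditioned SGLD update.
    grad_log_post: stochastic (minibatch) gradient of the log-posterior at theta.
    v: running average of squared gradients, carried between steps.
    The curvature-correction term from the paper is omitted, as is common in practice."""
    rng = rng or np.random.default_rng()
    v = alpha * v + (1 - alpha) * grad_log_post ** 2
    G = 1.0 / (lam + np.sqrt(v))                    # diagonal preconditioner
    noise = rng.normal(size=theta.shape) * np.sqrt(eps * G)
    theta = theta + 0.5 * eps * G * grad_log_post + noise
    return theta, v

# Toy usage: sample from a standard normal posterior, where grad log p(theta) = -theta.
theta, v = np.ones(3), np.zeros(3)
for _ in range(1000):
    theta, v = psgld_step(theta, -theta, v)
print(theta)
```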