Metaheuristic design of feedforward neural networks: a review of two decades of research
Over the past two decades, feedforward neural network (FNN) optimization has been a key interest for researchers and practitioners across multiple disciplines. FNN optimization is often viewed from various perspectives: the optimization of weights, network architecture, activation nodes, learning parameters, learning environment, etc. Researchers adopted these different viewpoints mainly to improve the FNN's generalization ability. Gradient-descent algorithms such as backpropagation have been widely applied to optimize FNNs, and their success is evident from the FNN's application to numerous real-world problems. However, because of the limitations of gradient-based optimization methods, metaheuristic algorithms, including evolutionary algorithms and swarm intelligence, are still widely explored by researchers aiming to obtain a well-generalized FNN for a given problem. This article attempts to summarize a broad spectrum of FNN optimization methodologies, including conventional and metaheuristic approaches. It also tries to connect various research directions that emerged from FNN optimization practices, such as evolving neural networks (NN), cooperative coevolution NN, complex-valued NN, deep learning, extreme learning machines, quantum NN, etc. Additionally, it outlines research challenges for future work to cope with the present information-processing era.
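As a minimal illustration of the metaheuristic route the review surveys, the sketch below trains the weights of a tiny single-hidden-layer FNN with global-best particle swarm optimization (PSO), a swarm-intelligence method, instead of backpropagation. The XOR dataset, the 2-3-1 tanh/sigmoid architecture, the PSO coefficients, and the helper names are assumptions chosen for illustration, not settings taken from the article.

```python
import numpy as np

# Toy dataset (assumption): XOR, which a single hidden layer can fit.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 0], dtype=float)

N_IN, N_HID = 2, 3
N_W = N_IN * N_HID + N_HID + N_HID + 1  # hidden weights, hidden biases, output weights, output bias


def unpack(w):
    """Split a flat parameter vector into the FNN's weights and biases."""
    k = N_IN * N_HID
    W1 = w[:k].reshape(N_IN, N_HID)
    b1 = w[k:k + N_HID]
    W2 = w[k + N_HID:k + 2 * N_HID]
    b2 = w[k + 2 * N_HID]
    return W1, b1, W2, b2


def mse(w):
    """Fitness of a particle: mean squared error of the 2-3-1 FNN encoded by w."""
    W1, b1, W2, b2 = unpack(w)
    h = np.tanh(X @ W1 + b1)
    z = np.clip(h @ W2 + b2, -50.0, 50.0)  # clip to avoid overflow in exp
    out = 1.0 / (1.0 + np.exp(-z))
    return float(np.mean((out - y) ** 2))


def pso(n_particles=30, iters=300, inertia=0.7298, c1=1.49618, c2=1.49618, seed=0):
    """Global-best PSO over flat FNN weight vectors; returns best weights and best-error history."""
    rng = np.random.default_rng(seed)
    pos = rng.uniform(-2.0, 2.0, (n_particles, N_W))
    vel = np.zeros_like(pos)
    pbest = pos.copy()
    pbest_f = np.array([mse(p) for p in pos])
    gbest = pbest[np.argmin(pbest_f)].copy()
    history = [float(pbest_f.min())]
    for _ in range(iters):
        r1 = rng.random(pos.shape)
        r2 = rng.random(pos.shape)
        # Each particle is pulled toward its own best position and the swarm's best position.
        vel = inertia * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos = pos + vel
        f = np.array([mse(p) for p in pos])
        improved = f < pbest_f
        pbest[improved] = pos[improved]
        pbest_f[improved] = f[improved]
        gbest = pbest[np.argmin(pbest_f)].copy()
        history.append(float(pbest_f.min()))
    return gbest, history


best_w, hist = pso()
print(f"best MSE after PSO: {hist[-1]:.4f}")
```

Because a particle's personal best changes only when it improves, the recorded best error never increases; how close it gets to zero depends on the seed, swarm size, and iteration budget.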
Preconditioned Stochastic Gradient Langevin Dynamics for Deep Neural Networks
Effective training of deep neural networks suffers from two main issues. The first is that the parameter spaces of these models exhibit pathological curvature. Recent methods address this problem by using adaptive preconditioning for Stochastic Gradient Descent (SGD); these methods improve convergence by adapting to the local geometry of parameter space. The second issue is overfitting, which is typically addressed by early stopping. However, recent work has demonstrated that Bayesian model averaging mitigates this problem. The posterior can be sampled using Stochastic Gradient Langevin Dynamics (SGLD), but the rapidly changing curvature renders default SGLD methods inefficient. Here, we propose combining adaptive preconditioners with SGLD. In support of this idea, we give theoretical properties on asymptotic convergence and predictive risk. We also provide empirical results for logistic regression, feedforward neural nets, and convolutional neural nets, demonstrating that our preconditioned SGLD method gives state-of-the-art performance on these models.
Comment: AAAI 201
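The idea of combining an adaptive preconditioner with SGLD can be sketched as follows. We assume an RMSprop-style diagonal preconditioner: a running average V of squared mean minibatch gradients gives G = 1/(λ + √V), which scales both the gradient step and the variance of the injected Gaussian noise. This is a minimal NumPy sketch under stated assumptions, not the authors' code: it omits any curvature-correction term, uses a constant step size, and targets a toy one-dimensional Gaussian-mean posterior; the function name `psgld` and all hyperparameter values are illustrative.

```python
import numpy as np


def psgld(grad_log_prior, grad_log_lik, data, theta0, n_iter=6000, batch=50,
          eps=1e-4, alpha=0.99, lam=1e-5, seed=0):
    """Preconditioned SGLD sketch with an RMSprop-style diagonal preconditioner (assumption).

    grad_log_lik(theta, x_batch) must return per-example gradients of shape (batch, dim).
    No curvature-correction term is included (assumption).
    """
    rng = np.random.default_rng(seed)
    N = len(data)
    theta = np.asarray(theta0, dtype=float).copy()
    V = np.zeros_like(theta)
    samples = np.empty((n_iter, theta.size))
    for t in range(n_iter):
        idx = rng.choice(N, size=batch, replace=False)
        g_lik = grad_log_lik(theta, data[idx])
        g_bar = g_lik.mean(axis=0)
        # Running average of squared mean gradients tracks the local gradient scale.
        V = alpha * V + (1.0 - alpha) * g_bar * g_bar
        G = 1.0 / (lam + np.sqrt(V))
        # Stochastic estimate of the full-data log-posterior gradient.
        drift = grad_log_prior(theta) + (N / batch) * g_lik.sum(axis=0)
        # Preconditioned Langevin step: the noise variance eps * G matches the step scaling.
        theta = theta + 0.5 * eps * G * drift + np.sqrt(eps * G) * rng.normal(size=theta.shape)
        samples[t] = theta
    return samples


# Toy example (assumption): infer the mean of N(mu, 1) data under a N(0, 10^2) prior.
rng = np.random.default_rng(1)
data = rng.normal(2.0, 1.0, size=1000)
samples = psgld(lambda th: -th / 100.0,
                lambda th, x: x[:, None] - th,
                data, theta0=[0.0])
post_mean = samples[1000:, 0].mean()
print(f"posterior mean estimate: {post_mean:.3f} (data mean {data.mean():.3f})")
```

After discarding the first 1000 draws as burn-in, the sample mean should sit close to the data mean. With a constant step size and no correction for minibatch gradient noise, the spread of the samples is wider than that of the exact posterior, so this toy run checks only the location of the chain.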
On-the-fly adaptivity for nonlinear twoscale simulations using artificial neural networks and reduced order modeling
A multi-fidelity surrogate model for highly nonlinear multiscale problems is proposed. It is based on the introduction of two different surrogate models and an adaptive on-the-fly switching between them. The two concurrent surrogates are built incrementally, starting from a moderate set of evaluations of the full-order model. From these, a reduced order model (ROM) is generated. Using a hybrid ROM-preconditioned FE solver, additional effective stress-strain data are simulated, while the number of samples is kept to a moderate level by a dedicated, physics-guided sampling technique. Machine learning (ML) is subsequently used to build the second surrogate by means of artificial neural networks (ANN). Different ANN architectures are explored, and the features used as inputs of the ANN are fine-tuned to improve the overall quality of the ML model. Additional ANN surrogates for the stress errors are generated, and conservative design guidelines for these error surrogates are presented by adapting the loss functions of the ANN training in pure regression or pure classification settings. The error surrogates can be used as quality indicators to adaptively select the appropriate (i.e., efficient yet accurate) surrogate. Two strategies for the on-the-fly switching are investigated, and a practicable and robust algorithm is proposed that eliminates relevant technical difficulties attributed to model switching. The provided algorithms and ANN design guidelines can easily be adopted for different problem settings and thereby enable generalization of the machine learning techniques used to a wide range of applications. The resulting hybrid surrogate is employed in challenging multilevel FE simulations of a three-phase composite with pseudo-plastic micro-constituents. Numerical examples highlight the performance of the proposed approach.
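The switching logic and the notion of a "conservative" error surrogate can be illustrated with a small sketch. Assumptions: the cheap surrogate (`ann`), the fallback (`rom`), and the error indicator (`err_surrogate`) are hypothetical callables standing in for the trained models, and the asymmetric squared loss with the hypothetical parameter `under_weight` is one illustrative way to make an error regressor conservative, i.e. to penalize underestimating the error more than overestimating it. It is not necessarily the loss or switching rule used in the paper.

```python
from typing import Callable, Tuple

import numpy as np

Model = Callable[[float], float]


def asymmetric_loss(pred_err, true_err, under_weight=10.0):
    """Squared loss that weights underestimated errors more heavily (conservative error surrogate)."""
    r = np.asarray(pred_err, dtype=float) - np.asarray(true_err, dtype=float)
    w = np.where(r < 0.0, under_weight, 1.0)  # r < 0 means the error was underestimated
    return float(np.mean(w * r ** 2))


def hybrid_stress(strain: float, ann: Model, rom: Model,
                  err_surrogate: Model, tol: float) -> Tuple[float, str]:
    """Use the cheap ANN when its predicted error is within tol, else fall back to the ROM."""
    if err_surrogate(strain) <= tol:
        return ann(strain), "ann"
    return rom(strain), "rom"


# Usage with toy stand-in models (hypothetical).
stress, used = hybrid_stress(0.1, ann=lambda e: 2.0 * e, rom=lambda e: 3.0 * e,
                             err_surrogate=abs, tol=0.5)
print(stress, used)  # prints: 0.2 ann
```

Training the error surrogate with a loss that punishes underestimation pushes its predictions upward, so the selector errs on the side of calling the more accurate model.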
Incremental construction of LSTM recurrent neural network
Long Short-Term Memory (LSTM) is a recurrent neural network that uses structures called memory blocks to allow the net to remember significant events distant in the past input sequence, in order to solve long-time-lag tasks where other RNN approaches fail. In this work, we performed experiments using LSTM networks extended with growing abilities, which we call GLSTM. Four methods of training growing LSTMs have been compared. These methods include cascade and fully connected hidden layers, as well as two different levels of freezing previous weights in the cascade case. GLSTM has been applied to a forecasting problem in a biomedical domain, where the input/output behavior of five controllers involved in Central Nervous System control has to be modelled. We have compared the growing LSTM results against other neural network approaches and against our work applying conventional LSTM to the same task.
Postprint (published version)
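To make the memory-block mechanism and the idea of growing concrete, the sketch below implements one step of a standard LSTM layer (input, forget, and output gates around a memory cell) and a `grow` helper that appends new memory cells to the layer while marking the previous weights as frozen through a boolean mask. This is an assumed, simplified fully connected growth scheme written for illustration; it does not reproduce the paper's cascade variants or training procedure, and the names `init_lstm`, `lstm_step`, and `grow` are hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def init_lstm(n_in, n_cells, rng):
    """One weight matrix per gate (i, f, o) and candidate (c), each acting on [x, h_prev, 1]."""
    return {g: rng.normal(0.0, 0.1, (n_cells, n_in + n_cells + 1)) for g in "ifoc"}


def lstm_step(params, x, h, c):
    """One time step of an LSTM layer; c is the memory-cell state."""
    z = np.concatenate([x, h, [1.0]])
    i = sigmoid(params["i"] @ z)  # input gate: what to write
    f = sigmoid(params["f"] @ z)  # forget gate: what to keep
    o = sigmoid(params["o"] @ z)  # output gate: what to expose
    g = np.tanh(params["c"] @ z)  # candidate cell content
    c_new = f * c + i * g         # additive cell update lets information persist over long lags
    h_new = o * np.tanh(c_new)
    return h_new, c_new


def grow(params, n_in, n_new, rng):
    """Append n_new memory cells; mask[g] is False where weights are frozen (the old ones)."""
    n_old = params["i"].shape[0]
    n_tot = n_old + n_new
    rows = np.arange(n_old)
    # Old column layout [x, h_old, bias] mapped into the new layout [x, h_old, h_new, bias].
    old_cols = np.concatenate([np.arange(n_in + n_old), [n_in + n_tot]])
    new_params, trainable = {}, {}
    for g, W in params.items():
        W_new = rng.normal(0.0, 0.1, (n_tot, n_in + n_tot + 1))
        # New cells feed the old cells with zero weights, so the old cells behave as before growth.
        W_new[:n_old, n_in + n_old:n_in + n_tot] = 0.0
        W_new[np.ix_(rows, old_cols)] = W
        mask = np.ones(W_new.shape, dtype=bool)
        mask[np.ix_(rows, old_cols)] = False
        new_params[g], trainable[g] = W_new, mask
    return new_params, trainable
```

In a training loop, each update would then be applied only to trainable entries, for example `W -= lr * grad * mask[g]`. The paper compares different freezing levels; this sketch shows just one choice, in which all pre-existing weights stay fixed.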
A New Optimization Algorithm for Single Hidden Layer Feedforward Neural Networks
Feedforward neural networks are the most commonly used function approximation technique among neural networks. By the universal approximation theorem, a single-hidden-layer feedforward neural network (FNN) is sufficient to approximate the desired outputs arbitrarily closely. Some researchers use genetic algorithms (GAs) to search for the globally optimal FNN structure. However, using a GA to train an FNN is rather time-consuming. In this paper, we propose a new optimization algorithm for single-hidden-layer FNNs. The method is based on a convex combination algorithm for massaging information in the hidden layer. This technique explores a continuum idea that combines the classic mutation and crossover strategies of GAs. The proposed method has an advantage over GA, which requires substantial preprocessing to break the data down into sequences of binary codes before learning or mutation can be applied. We also set up a new error function to measure the performance of the FNN and obtain the optimal choice of connection weights, so that the nonlinear optimization problem can be solved directly. Several computational experiments illustrate the proposed algorithm, which has good exploration and exploitation capabilities in the search for the optimal weights of single-hidden-layer FNNs.
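The contrast with binary-coded GAs can be sketched on real-valued weights: an offspring is a convex combination λa + (1 − λ)b of two parent weight vectors (a crossover-like move along the segment between them), optionally perturbed by Gaussian noise (a mutation-like move). The sketch below applies this generic blend operator with greedy replacement to a single-hidden-layer tanh FNN fitted to sin(x). The operator, replacement rule, toy target, hidden-layer size, and all names are assumptions for illustration, not the paper's exact algorithm or error function.

```python
import numpy as np

H = 5  # hidden units (assumption)
x = np.linspace(-3.0, 3.0, 40)
t = np.sin(x)  # toy target (assumption)


def error(w):
    """MSE of a 1-input, H-hidden-unit (tanh), 1-output FNN with flat weight vector w."""
    w1, b1, w2, b2 = w[:H], w[H:2 * H], w[2 * H:3 * H], w[3 * H]
    out = np.tanh(np.outer(x, w1) + b1) @ w2 + b2
    return float(np.mean((out - t) ** 2))


def convex_child(a, b, rng, sigma=0.0):
    """Offspring lam*a + (1 - lam)*b on the segment between parents, plus optional Gaussian noise."""
    lam = rng.random()
    return lam * a + (1.0 - lam) * b + sigma * rng.normal(size=a.shape)


def evolve(pop_size=20, gens=200, sigma=0.05, seed=0):
    """Real-valued evolution: no binary encoding, just convex recombination and perturbation."""
    rng = np.random.default_rng(seed)
    pop = rng.normal(0.0, 1.0, (pop_size, 3 * H + 1))
    fit = np.array([error(w) for w in pop])
    best = [float(fit.min())]
    for _ in range(gens):
        for k in range(pop_size):
            j = rng.integers(pop_size)
            child = convex_child(pop[k], pop[j], rng, sigma)
            e = error(child)
            if e < fit[k]:  # greedy replacement: a slot only changes when the error drops
                pop[k] = child
                fit[k] = e
        best.append(float(fit.min()))
    return pop[np.argmin(fit)], best


w_best, curve = evolve()
print(f"best MSE: {curve[-1]:.4f}")
```

Working directly on floating-point weight vectors avoids the encode/decode step a binary GA needs; with `sigma=0` every offspring lies exactly between its two parents.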