A new Sigma-Pi-Sigma neural network based on L1 and L2 regularization and applications
As one type of the important higher-order neural networks developed in the last decade, the Sigma-Pi-Sigma neural network has more powerful nonlinear mapping capabilities than other popular neural networks. This paper is concerned with a new Sigma-Pi-Sigma neural network based on an L1 and L2 regularization batch gradient method, and numerical experiments on classification and regression problems show that the proposed algorithm is effective and has better properties compared with other classical penalization methods. The proposed model combines the sparse-solution tendency of the L1 norm with the efficiency benefits of the L2 norm, which regulates the complexity of a network and prevents overfitting. Moreover, the numerical oscillation induced by the non-differentiability of the L1 plus L2 regularization term at the origin is eliminated by a smoothing technique that approximates the objective function.
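The smoothing idea in the abstract above can be illustrated with a minimal sketch. The abstract does not give the paper's exact smoothing function, so the common choice sqrt(w^2 + eps^2) is assumed here as a stand-in, and the penalty weights `lam1` and `lam2` are hypothetical parameters:

```python
import numpy as np

def smoothed_l1(w, eps=1e-3):
    # Smooth approximation of |w|: sqrt(w^2 + eps^2) is differentiable
    # everywhere and converges to |w| as eps -> 0.
    return np.sqrt(w ** 2 + eps ** 2)

def penalty(w, lam1=1e-4, lam2=1e-4, eps=1e-3):
    # Combined penalty: smoothed L1 term (encourages sparsity)
    # plus an L2 term (shrinks weights, improves conditioning).
    return lam1 * np.sum(smoothed_l1(w, eps)) + lam2 * np.sum(w ** 2)

def penalty_grad(w, lam1=1e-4, lam2=1e-4, eps=1e-3):
    # The gradient is well defined at w = 0, which is what removes the
    # numerical oscillation caused by the subgradient of |w| there.
    return lam1 * w / np.sqrt(w ** 2 + eps ** 2) + 2 * lam2 * w
```

In a batch gradient method, `penalty_grad` would simply be added to the loss gradient at each weight update.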
Theoretical Explanation of Activation Sparsity through Flat Minima and Adversarial Robustness
A recent empirical observation of activation sparsity in MLP layers offers an
opportunity to drastically reduce computation costs for free. Despite several
works attributing it to training dynamics, the theoretical explanation of
activation sparsity's emergence is restricted to shallow networks, small
training steps, and modified training, even though the sparsity has been
found in deep models trained by vanilla protocols for large steps. To fill these
three gaps, we propose the notion of gradient sparsity as the source of
activation sparsity, together with a theoretical explanation in which
gradient sparsity, and then activation sparsity, emerge as necessary steps
toward adversarial robustness w.r.t. hidden features and parameters, which is
approximately the flatness of minima for well-learned models. The theory
applies to standardly trained LayerNorm-ed pure MLPs, and further to
Transformers or other architectures if noises are added to weights during
training. To eliminate other sources of flatness when arguing sparsities'
necessity, we discover the phenomenon of spectral concentration, i.e., the
ratio between the largest and the smallest non-zero singular values of weight
matrices is small. We utilize random matrix theory (RMT) as a powerful
theoretical tool to analyze stochastic gradient noises and discuss the
emergence of spectral concentration. With these insights, we propose two
plug-and-play modules for both training from scratch and sparsity finetuning,
as well as one radical modification that only applies to from-scratch training.
Another module, still under testing, that targets both sparsity and flatness
also follows directly from our theories. Validation experiments are conducted
to verify our explanation, and practical experiments demonstrate the
modifications' improvement in sparsity, indicating further cost reduction in
both training and inference.
Tree-Structure Expectation Propagation for LDPC Decoding over the BEC
We present the tree-structure expectation propagation (Tree-EP) algorithm to
decode low-density parity-check (LDPC) codes over discrete memoryless channels
(DMCs). EP generalizes belief propagation (BP) in two ways. First, it can be
used with any exponential family distribution over the cliques in the graph.
Second, it can impose additional constraints on the marginal distributions. We
use this second property to impose pair-wise marginal constraints over pairs of
variables connected to a check node of the LDPC code's Tanner graph. Thanks to
these additional constraints, the Tree-EP marginal estimates for each variable
in the graph are more accurate than those provided by BP. We also reformulate
the Tree-EP algorithm for the binary erasure channel (BEC) as a peeling-type
algorithm (TEP) and we show that it has the same computational
complexity as BP while decoding a higher fraction of erasures. We describe the
TEP decoding process by a set of differential equations that represents the
expected residual graph evolution as a function of the code parameters. The
solution of these equations is used to predict the TEP decoder performance in
both the asymptotic regime and the finite-length regime over the BEC. While the
asymptotic threshold of the TEP decoder is the same as the BP decoder for
regular and optimized codes, we propose a scaling law (SL) for finite-length
LDPC codes, which accurately approximates the TEP decoder's improved
performance and facilitates its optimization.
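For context on the peeling-type formulation mentioned above, the sketch below shows the standard BP peeling decoder over the BEC, which TEP extends with additional pairwise constraints (not implemented here). The toy code and its check structure are illustrative assumptions, not from the paper:

```python
def peel_decode(checks, received):
    # checks: list of parity checks, each a list of variable-node indices.
    # received: list with 0/1 for known bits, None for erasures.
    bits = list(received)
    progress = True
    while progress:
        progress = False
        for chk in checks:
            erased = [v for v in chk if bits[v] is None]
            if len(erased) == 1:
                # A check with exactly one erased neighbor determines that
                # bit: it must make the parity of the check even.
                bits[erased[0]] = sum(bits[v] for v in chk
                                      if bits[v] is not None) % 2
                progress = True
    return bits

# Toy code with checks x0+x1+x2 = 0 and x2+x3+x4 = 0;
# codeword [1,1,0,1,1] with bits 2 and 3 erased.
print(peel_decode([[0, 1, 2], [2, 3, 4]], [1, 1, None, None, 1]))
# -> [1, 1, 0, 1, 1]
```

Decoding stalls when every check has two or more erased neighbors (a stopping set); the residual-graph evolution of exactly this process is what the differential equations in the abstract track.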
Design and Management of Manufacturing Systems
Although the design and management of manufacturing systems have been explored in the literature for many years, they remain topical problems in current scientific research. Changing market trends, globalization, the constant pressure to reduce production costs, and technical and technological progress make it necessary to search for new manufacturing methods and ways of organizing them, and to modify manufacturing system design paradigms. This book presents current research in different areas connected with the design and management of manufacturing systems and covers such subject areas as: methods supporting the design of manufacturing systems, methods of improving maintenance processes in companies, the design and improvement of manufacturing processes, the control of production processes in modern manufacturing systems, production methods and techniques used in modern manufacturing systems, and environmental aspects of production and their impact on the design and management of manufacturing systems. The wide range of research findings reported in this book confirms that the design of manufacturing systems is a complex problem and that the achievement of goals set for modern manufacturing systems requires interdisciplinary knowledge and the simultaneous design of the product, process and system, as well as knowledge of modern manufacturing and organizational methods and techniques.
Fitness Landscape Analysis of Feed-Forward Neural Networks
Neural network training is a highly non-convex optimisation problem with poorly understood properties. Due to the inherent high dimensionality, neural network search spaces cannot be intuitively visualised, thus other means to establish search space properties have to be employed. Fitness landscape analysis encompasses a selection of techniques designed to estimate the properties of a search landscape associated with an optimisation problem. Applied to neural network training, fitness landscape analysis can be used to establish a link between the properties of the error landscape and various neural network hyperparameters. This study applies fitness landscape analysis to investigate the influence of the search space boundaries, regularisation parameters, loss functions, activation functions, and feed-forward neural network architectures on the properties of the resulting error landscape. A novel gradient-based sampling technique is proposed, together with a novel method to quantify and visualise stationary points and the associated basins of attraction in neural network error landscapes. Thesis (PhD)--University of Pretoria, 2019.
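As a flavour of the kind of metric fitness landscape analysis produces, the sketch below computes the first-order autocorrelation of error values along a random walk, one standard ruggedness estimator (not the thesis's gradient-based sampler). The toy quadratic error surface and the walk parameters are assumptions for illustration:

```python
import numpy as np

def autocorrelation(fitnesses, lag=1):
    # First-order autocorrelation of fitness values along a walk;
    # values near 1 indicate a smooth landscape, near 0 a rugged one.
    f = np.asarray(fitnesses, dtype=float)
    f = f - f.mean()
    return np.sum(f[:-lag] * f[lag:]) / np.sum(f * f)

rng = np.random.default_rng(1)
# Random walk in a toy 2-D weight space, evaluating a quadratic "error".
w = np.zeros(2)
fit = []
for _ in range(1000):
    w = w + rng.normal(scale=0.05, size=2)
    fit.append(np.sum(w ** 2))
print(autocorrelation(fit))  # near 1: a quadratic bowl is smooth
```

On a real network, `fit` would hold loss values sampled along a walk through weight space, with the step distribution chosen to respect the search space boundaries the study investigates.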
Adaptive control and neural network control of nonlinear discrete-time systems