Network insensitivity to parameter noise via adversarial regularization
Neuromorphic neural network processors, in the form of compute-in-memory
crossbar arrays of memristors, or in the form of subthreshold analog and
mixed-signal ASICs, promise enormous advantages in compute density and energy
efficiency for NN-based ML tasks. However, these technologies are prone to computational non-idealities due to process variation and intrinsic device physics, which introduce parameter noise into the deployed model and degrade its task performance. While it is
possible to calibrate each device, or train networks individually for each
processor, these approaches are expensive and impractical for commercial
deployment. Alternative methods are therefore needed to train networks that are
inherently robust against parameter variation, as a consequence of network
architecture and parameters. We present a new adversarial network optimisation
algorithm that attacks network parameters during training, and promotes robust
performance during inference in the face of parameter variation. Our approach
introduces a regularization term penalising the susceptibility of a network to
weight perturbation. We compare against previous approaches for producing
parameter insensitivity, such as dropout, weight smoothing, and introducing
parameter noise during training. We show that our approach produces models that
are more robust to targeted parameter variation, and equally robust to random
parameter variation. Our approach finds minima in flatter regions of the weight-loss landscape than other approaches do, indicating that the
networks found by our technique are less sensitive to parameter perturbation.
Our work provides an approach to deploy neural network architectures to
inference devices that suffer from computational non-idealities, with minimal
loss of performance.
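A minimal sketch of this style of training in PyTorch: an inner sign-gradient attack on the weights, with a step size proportional to each weight's magnitude, combined with a penalty on the loss increase the attack causes. The penalty form, the relative attack budget eps, and the trade-off weight beta are illustrative assumptions, not the paper's exact formulation.

import torch

def adversarial_weight_step(model, loss_fn, optimizer, x, y, eps=0.01, beta=0.5):
    """One step of L(w) + beta * (L(w + delta) - L(w)), where delta is an
    adversarial, weight-proportional sign-gradient perturbation (assumed form)."""
    optimizer.zero_grad()
    # The identity (1 - beta) * L(w) + beta * L(w + delta)
    #            = L(w) + beta * (L(w + delta) - L(w))
    # lets us accumulate the objective over two ordinary backward passes.
    nominal = loss_fn(model(x), y)
    (nominal * (1.0 - beta)).backward()
    # Attack the parameters: a sign-gradient step scaled by each weight's
    # magnitude, mimicking mismatch proportional to the programmed value.
    deltas = []
    with torch.no_grad():
        for p in model.parameters():
            d = torch.zeros_like(p) if p.grad is None else eps * p.abs() * p.grad.sign()
            p.add_(d)
            deltas.append(d)
    # Loss under the attacked weights; its gradients accumulate into .grad.
    perturbed = loss_fn(model(x), y)
    (perturbed * beta).backward()
    # Restore the nominal weights before the optimizer update.
    with torch.no_grad():
        for p, d in zip(model.parameters(), deltas):
            p.sub_(d)
    optimizer.step()
    return nominal.item(), perturbed.item()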
Computational Inversion with Wasserstein Distances and Neural Network Induced Loss Functions
This thesis presents a systematic computational investigation of loss functions for solving inverse problems of partial differential equations. The primary effort is devoted to understanding optimization-based computational inversion with loss functions defined through the Wasserstein metrics and through deep learning models. The scientific contributions of the thesis fall into two directions.
In the first part of this thesis, we investigate the general impact of different Wasserstein metrics and the properties of the approximate solutions to inverse problems obtained by minimizing loss functions based on such metrics. We contrast the results with those of classical computational inversion using loss functions based on the L² and H⁻¹ metrics. We identify critical parameters, both in the metrics and in the inverse problems to be solved, that control the performance of the reconstruction algorithms. We highlight the frequency disparity in reconstructions with the Wasserstein metrics as well as its consequences, for instance the pre-conditioning effect, the robustness against high-frequency noise, and the loss of resolution when the data contain random noise. We examine the impact of mass unbalance and conduct a comparative study of various unbalanced Wasserstein metrics, their differences, and the factors that drive their performance.
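For intuition about the metric itself: in one dimension, the quadratic Wasserstein distance between two signals that have been made nonnegative and normalized to unit mass reduces to an L² distance between the inverses of their cumulative distribution functions. A minimal NumPy sketch, where the nonnegativity and normalization preprocessing is the standard assumption required before the metric applies:

import numpy as np

def w2_1d(f, g, x):
    """Quadratic Wasserstein distance between two 1-D signals on a uniform
    grid x. Assumes f, g >= 0; both are normalized to unit mass here, since
    W2 compares probability densities."""
    dx = x[1] - x[0]
    f = f / (f.sum() * dx)
    g = g / (g.sum() * dx)
    F = np.cumsum(f) * dx          # cumulative distribution functions
    G = np.cumsum(g) * dx
    # W2^2 = integral over t in (0, 1) of (F^{-1}(t) - G^{-1}(t))^2,
    # with the inverse CDFs evaluated by linear interpolation.
    t = np.linspace(1e-6, 1.0 - 1e-6, 2048)
    Finv = np.interp(t, F, x)
    Ginv = np.interp(t, G, x)
    return np.sqrt(np.mean((Finv - Ginv) ** 2))

# A translated pulse: W2 grows linearly with the shift, whereas the L2
# misfit saturates once the two supports no longer overlap.
x = np.linspace(0.0, 10.0, 1000)
def pulse(c):
    return np.exp(-((x - c) ** 2))
for shift in (0.5, 1.0, 2.0):
    print(shift, w2_1d(pulse(3.0), pulse(3.0 + shift), x))

This linear growth with translation is the behaviour that connects to the pre-conditioning and noise-robustness effects discussed above.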
In the second part of the thesis, we propose loss functions formed on a novel offline-online computational strategy that couples classical least-squares computational inversion with modern deep learning approaches for full waveform inversion (FWI), achieving advantages that cannot be obtained with either component alone. In a nutshell, we develop an offline learning strategy to construct a robust approximation to the inverse operator, then use it both to produce a viable initial guess and to design a new loss function for the online inversion on a new dataset. We demonstrate, through both theoretical analysis and numerical simulations, that the neural network induced loss functions developed by this coupling strategy improve the loss landscape as well as the computational efficiency of FWI, with reliable offline training requiring only moderate computational resources in terms of both the size of the training dataset and the computational cost.
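A toy sketch of what such an offline-online coupling can look like, with a linear smoothing operator standing in for the PDE forward map and a regularized pseudo-inverse standing in for the offline-trained network; the specific online loss used here, a least-squares data misfit plus a penalty anchoring the model to the learned inverse's prediction, is an illustrative assumption rather than the thesis's exact construction.

import numpy as np

rng = np.random.default_rng(0)
n = 50
# Toy linear forward map: a Gaussian smoothing operator standing in for the
# wave-equation solver of FWI (an illustrative assumption).
i = np.arange(n)
A = np.exp(-0.5 * ((i[:, None] - i[None, :]) / 3.0) ** 2)
A /= A.sum(axis=1, keepdims=True)

# "Offline" stage: a Tikhonov-regularized pseudo-inverse plays the role of
# the trained network R that approximates the inverse operator.
R = np.linalg.solve(A.T @ A + 1e-2 * np.eye(n), A.T)

def online_inversion(d, lam=0.1, steps=500, lr=0.5):
    m0 = R @ d                    # viable initial guess from the offline stage
    m = m0.copy()
    for _ in range(steps):
        # Gradient of the assumed network-induced loss
        # 0.5 * ||A m - d||^2 + 0.5 * lam * ||m - m0||^2.
        m -= lr * (A.T @ (A @ m - d) + lam * (m - m0))
    return m

m_true = np.sin(np.linspace(0.0, 3.0 * np.pi, n))
d = A @ m_true + 0.01 * rng.standard_normal(n)
m_rec = online_inversion(d)
print(f"relative error: {np.linalg.norm(m_rec - m_true) / np.linalg.norm(m_true):.3f}")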