The Convex Landscape of Neural Networks: Characterizing Global Optima and Stationary Points via Lasso Models
Due to the non-convex nature of training Deep Neural Network (DNN) models,
their effectiveness relies on the use of non-convex optimization heuristics.
Traditional approaches to training DNNs often rely on costly empirical tuning to
produce successful models and lack a clear theoretical foundation. In
this study, we examine the use of convex optimization theory and sparse
recovery models to refine the training process of neural networks and provide a
better interpretation of their optimal weights. We focus on training two-layer
neural networks with piecewise linear activations and demonstrate that they can
be formulated as a finite-dimensional convex program. These programs include a
regularization term that promotes sparsity, which constitutes a variant of
group Lasso. We first utilize semi-infinite programming theory to prove strong
duality for finite-width neural networks, and then express these
architectures equivalently as high-dimensional convex sparse recovery models.
Remarkably, the worst-case complexity to solve the convex program is polynomial
in the number of samples and number of neurons when the rank of the data matrix
is bounded, which is the case in convolutional networks. To extend our method
to training data of arbitrary rank, we develop a novel polynomial-time
approximation scheme based on zonotope subsampling that comes with a guaranteed
approximation ratio. We also show that every stationary point of the nonconvex
training objective can be characterized as the global optimum of a subsampled
convex program. Unlike non-convex methods, our convex models can be trained
using standard convex solvers without resorting to heuristics or extensive
hyper-parameter tuning. Through extensive numerical experiments, we show that
convex models can outperform traditional non-convex methods and are not
sensitive to optimizer hyperparameters.
Comment: A preliminary version of part of this work was published at ICML 2020
with the title "Neural Networks are Convex Regularizers: Exact
Polynomial-time Convex Optimization Formulations for Two-layer Networks".
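As a concrete illustration of the kind of convex program described above, the sketch below sets up a subsampled, cone-constrained group-lasso problem for a two-layer scalar-output ReLU network in CVXPY. The function name `convex_relu_fit`, the random-hyperplane sampling of activation patterns, and the regularization strength are illustrative assumptions; the paper's exact constraint set and pattern-sampling scheme may differ.

```python
# Minimal sketch (not the paper's exact formulation): a subsampled group-lasso
# convex program for a two-layer scalar-output ReLU network. Activation
# patterns D_i = diag(1[X u_i >= 0]) are drawn from random hyperplanes.
import numpy as np
import cvxpy as cp

def convex_relu_fit(X, y, beta=1e-3, num_patterns=50, seed=0):
    n, d = X.shape
    rng = np.random.default_rng(seed)
    U = rng.standard_normal((d, num_patterns))
    D = (X @ U >= 0).astype(float)                 # n x P binary activation masks
    V = cp.Variable((d, num_patterns))             # weights for the positive branch
    W = cp.Variable((d, num_patterns))             # weights for the negative branch
    pred = cp.sum(cp.multiply(D, X @ (V - W)), axis=1)
    group_lasso = cp.sum(cp.norm(V, 2, axis=0)) + cp.sum(cp.norm(W, 2, axis=0))
    constraints = []
    for i in range(num_patterns):
        s = 2 * D[:, i] - 1                        # +1/-1 sign pattern
        # Cone constraints keep each pattern consistent with a ReLU activation.
        constraints += [cp.multiply(s, X @ V[:, i]) >= 0,
                        cp.multiply(s, X @ W[:, i]) >= 0]
    prob = cp.Problem(cp.Minimize(cp.sum_squares(pred - y) + beta * group_lasso),
                      constraints)
    prob.solve()
    return V.value, W.value
```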
Path Regularization: A Convexity and Sparsity Inducing Regularization for Parallel ReLU Networks
Understanding the fundamental principles behind the success of deep neural
networks is one of the most important open questions in the current literature.
To this end, we study the training problem of deep neural networks and
introduce an analytic approach to unveil hidden convexity in the optimization
landscape. We consider a deep parallel ReLU network architecture, which also
includes standard deep networks and ResNets as its special cases. We then show
that the path regularized training problem can be represented as an exact
convex optimization problem. We further prove that the equivalent convex
problem is regularized via a group sparsity inducing norm. Thus, a path
regularized parallel ReLU network can be viewed as a parsimonious convex model
in high dimensions. More importantly, we show that the equivalent convex
problem can be globally optimized in time that is fully polynomial in the
feature dimension and the number of samples. Therefore, we prove
polynomial-time trainability of path regularized ReLU networks with global
optimality guarantees. We also provide several numerical experiments
corroborating our theory.
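The sketch below shows one way a parallel ReLU architecture with a path-style penalty could look in PyTorch. The per-neuron penalty sum_j ||w_j||_2 |v_j| follows a common convention for path regularization and is an assumption here; the paper's deep parallel architecture and exact regularizer may differ.

```python
# Illustrative sketch only: a parallel network that sums K two-layer ReLU
# branches, trained with a path-style penalty per branch (one common
# convention; the paper's exact architecture and regularizer may differ).
import torch
import torch.nn as nn

class ParallelReLUNet(nn.Module):
    def __init__(self, d, m, num_branches):
        super().__init__()
        self.W = nn.Parameter(torch.randn(num_branches, m, d) / d ** 0.5)  # hidden weights
        self.v = nn.Parameter(torch.randn(num_branches, m) / m ** 0.5)     # output weights

    def forward(self, x):
        # Sum the branch outputs: f(x) = sum_k relu(x W_k^T) v_k.
        hidden = torch.relu(torch.einsum('nd,kmd->knm', x, self.W))
        return torch.einsum('knm,km->n', hidden, self.v)

    def path_penalty(self):
        # Path-style penalty: per-neuron product of incoming weight norm and
        # outgoing weight magnitude, summed over neurons and branches.
        return (self.W.norm(dim=2) * self.v.abs()).sum()

# Training objective (beta is a user-chosen regularization strength):
# loss = ((model(X) - y) ** 2).mean() + beta * model.path_penalty()
```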
Convexifying Transformers: Improving optimization and understanding of transformer networks
Understanding the fundamental mechanism behind the success of transformer
networks is still an open problem in the deep learning literature. Although
their remarkable performance has been mostly attributed to the self-attention
mechanism, the literature still lacks a solid analysis of these networks and
interpretation of the functions learned by them. To this end, we study the
training problem of attention/transformer networks and introduce a novel convex
analytic approach to improve the understanding and optimization of these
networks. Particularly, we first introduce a convex alternative to the
self-attention mechanism and reformulate the regularized training problem of
transformer networks with our alternative convex attention. Then, we cast the
reformulation as a convex optimization problem that is interpretable and easier
to optimize. Moreover, as a byproduct of our convex analysis, we reveal an
implicit regularization mechanism, which promotes sparsity across tokens.
Therefore, we not only improve the optimization of attention/transformer
networks but also provide a solid theoretical understanding of the functions
learned by them. We also demonstrate the effectiveness of our theory through
several numerical experiments.
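For reference, the sketch below shows a standard single-head softmax self-attention block, i.e. the non-convex component that this work proposes to replace; the paper's convex alternative attention is not reproduced here, and the class and parameter names are illustrative.

```python
# Standard single-head softmax self-attention (the non-convex block that the
# convex alternative replaces); shapes only, no masking or multi-head logic.
import torch
import torch.nn as nn

class SelfAttention(nn.Module):
    def __init__(self, d_model):
        super().__init__()
        self.q = nn.Linear(d_model, d_model, bias=False)
        self.k = nn.Linear(d_model, d_model, bias=False)
        self.v = nn.Linear(d_model, d_model, bias=False)

    def forward(self, x):                        # x: (batch, tokens, d_model)
        scores = self.q(x) @ self.k(x).transpose(-2, -1) / x.shape[-1] ** 0.5
        attn = torch.softmax(scores, dim=-1)     # softmax couples all token pairs
        return attn @ self.v(x)
```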
Fixing the NTK: From Neural Network Linearizations to Exact Convex Programs
Recently, theoretical analyses of deep neural networks have broadly focused
on two directions: 1) Providing insight into neural network training by SGD in
the limit of infinite hidden-layer width and infinitesimally small learning
rate (also known as gradient flow) via the Neural Tangent Kernel (NTK), and 2)
Globally optimizing the regularized training objective via cone-constrained
convex reformulations of ReLU networks. The latter research direction also
yielded an alternative formulation of the ReLU network, called a gated ReLU
network, that is globally optimizable via efficient unconstrained convex
programs. In this work, we interpret the convex program for this gated ReLU
network as a Multiple Kernel Learning (MKL) model with a weighted data masking
feature map and establish a connection to the NTK. Specifically, we show that
for a particular choice of mask weights that do not depend on the learning
targets, this kernel is equivalent to the NTK of the gated ReLU network on the
training data. A consequence of this lack of dependence on the targets is that
the NTK cannot perform better than the optimal MKL kernel on the training set.
By using iterative reweighting, we improve the weights induced by the NTK to
obtain the optimal MKL kernel which is equivalent to the solution of the exact
convex reformulation of the gated ReLU network. We also provide several
numerical simulations corroborating our theory. Additionally, we provide an
analysis of the prediction error of the resulting optimal kernel via
consistency results for the group lasso.
Comment: Accepted to NeurIPS 202
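A minimal sketch of the masked-data multiple kernel view described above: each gate pattern yields a kernel built from the masked data D_i X, and the model works with a weighted combination of these kernels whose weights can then be improved by iterative reweighting. The helper names and the random gate sampling are illustrative assumptions, not the paper's exact construction.

```python
# Minimal sketch of the masked-data MKL view: each gate pattern D_i yields a
# kernel K_i = (D_i X)(D_i X)^T, and the model uses a weighted sum of the K_i.
import numpy as np

def masked_kernels(X, num_gates=50, seed=0):
    n, d = X.shape
    rng = np.random.default_rng(seed)
    gates = (X @ rng.standard_normal((d, num_gates)) >= 0)   # n x G boolean masks
    kernels = []
    for i in range(num_gates):
        Z = X * gates[:, i:i + 1]          # masked data D_i X
        kernels.append(Z @ Z.T)            # kernel induced by this gate pattern
    return np.stack(kernels)               # shape (G, n, n)

def weighted_kernel(kernels, weights):
    # Combined MKL kernel K = sum_i w_i K_i; the weights would be tuned,
    # e.g. by an iterative reweighting scheme as described in the abstract.
    return np.tensordot(weights, kernels, axes=1)
```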
CalibFPA: A Focal Plane Array Imaging System based on Online Deep-Learning Calibration
Compressive focal plane arrays (FPA) enable cost-effective high-resolution
(HR) imaging by acquisition of several multiplexed measurements on a
low-resolution (LR) sensor. Multiplexed encoding of the visual scene is
typically performed via electronically controllable spatial light modulators
(SLM). An HR image is then reconstructed from the encoded measurements by
solving an inverse problem that involves the forward model of the imaging
system. To capture system non-idealities such as optical aberrations, a
mainstream approach is to conduct an offline calibration scan to measure the
system response for a point source at each spatial location on the imaging
grid. However, it is challenging to run calibration scans when using structured
SLMs as they cannot encode individual grid locations. In this study, we propose
a novel compressive FPA system based on online deep-learning calibration of
multiplexed LR measurements (CalibFPA). We introduce a piezo-stage that
translates a pre-printed, fixed coded aperture. A deep neural network is then
leveraged to correct for the influence of system non-idealities on the multiplexed
measurements without the need for offline calibration scans. Finally, a deep
plug-and-play algorithm is used to reconstruct images from corrected
measurements. On simulated and experimental datasets, we demonstrate that
CalibFPA outperforms state-of-the-art compressive FPA methods. We also report
analyses to validate the design elements in CalibFPA and assess computational
complexity.
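The sketch below gives a simplified multiplexed forward model for a compressive FPA: the HR scene is modulated by a shifted binary coded aperture and spatially binned onto the LR sensor. The shift list, bin size, and additive Gaussian noise are illustrative assumptions and do not reflect CalibFPA's calibrated physical operator (scene dimensions are assumed divisible by the bin size).

```python
# Simplified forward model for a compressive FPA (illustrative only): each
# measurement modulates the HR scene with a shifted binary coded aperture and
# bins the result onto a low-resolution sensor.
import numpy as np

def fpa_measurements(scene, mask, shifts, bin_size, noise_std=0.01, seed=0):
    rng = np.random.default_rng(seed)
    H, W = scene.shape
    measurements = []
    for dy, dx in shifts:                              # piezo-stage shifts of the aperture
        shifted_mask = np.roll(mask, (dy, dx), axis=(0, 1))
        coded = scene * shifted_mask                   # mask-modulated HR scene
        lr = coded.reshape(H // bin_size, bin_size,
                           W // bin_size, bin_size).sum(axis=(1, 3))
        measurements.append(lr + noise_std * rng.standard_normal(lr.shape))
    return np.stack(measurements)                      # (num_shifts, H/bin, W/bin)
```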