Proximal Mean-field for Neural Network Quantization
Compressing large Neural Networks (NN) by quantizing the parameters, while
maintaining performance, is highly desirable due to reduced memory and time
complexity. In this work, we cast NN quantization as a discrete labelling
problem, and by examining relaxations, we design an efficient iterative
optimization procedure that involves stochastic gradient descent followed by a
projection. We prove that our simple projected gradient descent approach is, in
fact, equivalent to a proximal version of the well-known mean-field method.
These findings would allow the decades-old and theoretically grounded research
on MRF optimization to be used to design better network quantization schemes.
Our experiments on standard classification datasets (MNIST, CIFAR10/100,
TinyImageNet) with convolutional and residual architectures show that our
algorithm obtains fully-quantized networks with accuracies very close to the
floating-point reference networks.
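
As a concrete illustration of the "SGD step followed by a projection" scheme
this abstract describes, here is a minimal sketch for binary {-1, +1}
quantization of a linear classifier: take a gradient step on the weights, then
project onto the discrete set. The toy logistic objective, learning rate, and
data are my assumptions for illustration, not the paper's setup.

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 16))             # toy features
    y = np.sign(X @ rng.normal(size=16))       # toy +/-1 labels

    w = np.sign(rng.normal(size=16))           # start from a feasible (quantized) point
    lr = 0.1
    for _ in range(50):
        p = 1.0 / (1.0 + np.exp(y * (X @ w)))  # sigmoid(-margin) for logistic loss
        grad = -(X.T @ (y * p)) / len(y)       # logistic-loss gradient
        w = np.sign(w - lr * grad)             # gradient step, then projection onto {-1, +1}
        w[w == 0] = 1.0                        # break rare ties away from zero

    print("train accuracy, fully binarized:", np.mean(np.sign(X @ w) == y))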
On Training and Evaluation of Neural Network Approaches for Model Predictive Control
The contribution of this paper is a framework for training and evaluation of
Model Predictive Control (MPC) implemented using constrained neural networks.
Recent studies have proposed to use neural networks with differentiable convex
optimization layers to implement model predictive controllers. The motivation
is to replace real-time optimization in safety-critical feedback control
systems with learnt mappings in the form of neural networks with optimization
layers. Such mappings take the state vector as input and predict the control
law as output. The learning takes place using training data
generated from off-line MPC simulations. However, a general framework for
characterization of learning approaches in terms of both model validation and
efficient training data generation is lacking in the literature. In this paper, we
take the first steps towards developing such a coherent framework. We discuss
how the learning problem has similarities with system identification, in
particular input design, model structure selection and model validation. We
consider the study of neural network architectures in PyTorch with the explicit
MPC constraints implemented as a differentiable optimization layer using CVXPY.
We propose an efficient approach to generating MPC input samples subject to
the MPC model constraints using a hit-and-run sampler. The corresponding true
outputs are generated by solving the MPC offline using OSQP. We propose
different metrics to validate the resulting approaches. Our study further aims
to explore the advantages of incorporating domain knowledge into the network
structure from a training and evaluation perspective. Different model
structures are numerically tested using the proposed framework in order to
obtain more insight into the properties of constrained neural network-based
MPC.
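
The differentiable optimization layer mentioned above can be sketched with
CVXPY and cvxpylayers. The one-step quadratic program below (dimensions,
dynamics matrices, and box constraints) is an assumed stand-in for a full MPC
problem, not the paper's model.

    import cvxpy as cp
    import numpy as np
    import torch
    from cvxpylayers.torch import CvxpyLayer

    n, m = 4, 2                                  # assumed state/input dimensions
    A = np.eye(n) + 0.1 * np.random.randn(n, n)  # assumed dynamics matrices
    B = 0.1 * np.random.randn(n, m)

    x = cp.Parameter(n)                          # current state: the layer's input
    u = cp.Variable(m)                           # control input: the layer's output
    objective = cp.Minimize(cp.sum_squares(A @ x + B @ u) + 0.1 * cp.sum_squares(u))
    constraints = [cp.abs(u) <= 1.0]             # actuator box constraints
    layer = CvxpyLayer(cp.Problem(objective, constraints),
                       parameters=[x], variables=[u])

    x_batch = torch.randn(8, n, requires_grad=True)
    u_star, = layer(x_batch)                     # solves the QP for each state
    u_star.sum().backward()                      # gradients flow through the solver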
Fast L1-Minimization Algorithm for Sparse Approximation Based on an Improved LPNN-LCA Framework
The aim of sparse approximation is to estimate a sparse signal according to
the measurement matrix and an observation vector. It is widely used in data
analytics, image processing, communication, etc. Up to now, a great deal of
research has been done in this area, and many off-the-shelf algorithms have
been proposed. However, most of them cannot offer a real-time solution. To some
extent, this shortcoming limits their application prospects. To address this
issue, we devise a novel sparse approximation algorithm based on Lagrange
programming neural network (LPNN), locally competitive algorithm (LCA), and
projection theorem. LPNN and LCA are both analog neural networks, which can
yield real-time solutions. The non-differentiable objective function is
handled via the concept of LCA. Utilizing the projection theorem, we further
modify the dynamics and propose a new system with global asymptotic stability.
Simulation results show that the proposed sparse approximation method achieves
real-time solutions with satisfactory MSEs.
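
For context, the classic LCA dynamics that this abstract builds on can be
simulated with forward-Euler steps, as in the sketch below. This shows the
baseline analog-network idea, not the paper's modified LPNN-LCA system; the
step size, threshold, and problem sizes are assumptions.

    import numpy as np

    rng = np.random.default_rng(1)
    n, m, k = 64, 256, 8                           # measurements, dictionary atoms, sparsity
    Phi = rng.normal(size=(n, m)) / np.sqrt(n)     # measurement matrix
    x_true = np.zeros(m)
    x_true[rng.choice(m, k, replace=False)] = rng.normal(size=k)
    b = Phi @ x_true                               # observation vector

    lam, dt = 0.05, 0.1                            # soft threshold and Euler step
    G = Phi.T @ Phi - np.eye(m)                    # lateral inhibition between units
    drive = Phi.T @ b
    u = np.zeros(m)                                # internal (membrane) state
    for _ in range(2000):
        a = np.sign(u) * np.maximum(np.abs(u) - lam, 0.0)  # soft-threshold activation
        u += dt * (drive - u - G @ a)              # LCA dynamics: du/dt = Phi^T b - u - G a

    a = np.sign(u) * np.maximum(np.abs(u) - lam, 0.0)
    print("recovery MSE:", np.mean((a - x_true) ** 2))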
Constrained Deep Learning using Conditional Gradient and Applications in Computer Vision
A number of results have recently demonstrated the benefits of incorporating
various constraints when training deep architectures in vision and machine
learning. The advantages range from guarantees for statistical generalization
to better accuracy to compression. But support for general constraints within
widely used libraries remains scarce and their broader deployment within many
applications that can benefit from them remains under-explored. Part of the
reason is that stochastic gradient descent (SGD), the workhorse for training
deep neural networks, does not natively deal with constraints with global scope
very well. In this paper, we revisit a classical first order scheme from
numerical optimization, Conditional Gradients (CG), which has thus far had
limited applicability in training deep models. We show via rigorous analysis
how various constraints can be naturally handled by modifications of this
algorithm. We provide convergence guarantees and show a suite of immediate
benefits that are possible -- from training ResNets with fewer layers but
better accuracy simply by substituting in our version of CG, to faster training
of GANs with 50% fewer epochs in image inpainting applications, to provably
better generalization guarantees using efficiently implementable forms of
recently proposed regularizers.
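
To make the conditional-gradient idea concrete, here is a minimal sketch of
Frank-Wolfe over an L1-ball constraint, the kind of global constraint that
projected SGD handles awkwardly. The toy least-squares objective and radius
are assumptions; the paper applies the scheme to deep networks.

    import numpy as np

    rng = np.random.default_rng(2)
    d, tau = 50, 5.0                       # dimension and L1-ball radius
    A = rng.normal(size=(80, d))
    b = rng.normal(size=80)

    w = np.zeros(d)                        # feasible start (inside the ball)
    for t in range(200):
        grad = A.T @ (A @ w - b)           # gradient of 0.5 * ||Aw - b||^2
        s = np.zeros(d)                    # linear minimization oracle over the
        i = np.argmax(np.abs(grad))        # L1 ball returns a signed vertex
        s[i] = -tau * np.sign(grad[i])
        gamma = 2.0 / (t + 2.0)            # classic Frank-Wolfe step size
        w = (1 - gamma) * w + gamma * s    # convex combination keeps w feasible

    print("||w||_1 <= tau:", np.abs(w).sum() <= tau + 1e-9)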
Learning Sparse Visual Representations with Leaky Capped Norm Regularizers
Sparsity-inducing regularization is an important component of learning
over-complete visual representations. Despite the popularity of convex
regularizers, in this paper we investigate the use of non-convex
regularization for this problem. Our contribution consists of three parts.
First, we propose the leaky capped norm regularization (LCNR), which allows
model weights below a certain threshold to be regularized more strongly as
opposed to those above, thereby imposing strong sparsity while introducing
only a controllable estimation bias. We propose a majorization-minimization
algorithm to optimize the joint objective function. Second, in our study of
monocular 3D shape recovery and neural networks, LCNR outperforms L1 and other
non-convex regularizations, achieving state-of-the-art performance and faster
convergence. Third, we prove a theoretical global convergence rate on the 3D
recovery problem. To the best of our knowledge, this is the first convergence
analysis of the 3D recovery problem.
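
One plausible reading of the leaky capped norm, sketched below, is a
piecewise-linear penalty: full slope alpha for magnitudes below a threshold
theta and a smaller "leaky" slope beta above it, so small weights are driven
hard to zero while large weights incur only bounded bias. The exact functional
form and constants here are my assumptions, not taken from the paper.

    import numpy as np

    def lcnr(w, theta=0.1, alpha=1.0, beta=0.05):
        """Assumed leaky capped norm: alpha*|w| below theta, leaky slope beta above."""
        a = np.abs(w)
        below = alpha * a
        above = alpha * theta + beta * (a - theta)   # continuous at |w| = theta
        return np.where(a <= theta, below, above).sum()

    w = np.array([0.02, -0.08, 0.5, -2.0])
    print(lcnr(w))   # small entries pay the full slope, large ones only the leak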
Rectified Factor Networks
We propose rectified factor networks (RFNs) to efficiently construct very
sparse, non-linear, high-dimensional representations of the input. RFN models
identify rare and small events in the input, have a low interference between
code units, have a small reconstruction error, and explain the data covariance
structure. RFN learning is a generalized alternating minimization algorithm
derived from the posterior regularization method which enforces non-negative
and normalized posterior means. We prove convergence and correctness of the RFN
learning algorithm. On benchmarks, RFNs are compared to other unsupervised
methods like autoencoders, RBMs, factor analysis, ICA, and PCA. In contrast to
previous sparse coding methods, RFNs yield sparser codes, capture the data's
covariance structure more precisely, and have a significantly smaller
reconstruction error. We test RFNs as a pretraining technique for deep networks
on different vision datasets, where RFNs were superior to RBMs and
autoencoders. On gene expression data from two pharmaceutical drug discovery
studies, RFNs detected small and rare gene modules that revealed highly
relevant new biological insights which were so far missed by other unsupervised
methods.
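
A much-simplified illustration of the alternating flavor of RFN learning:
alternate between a rectified, normalized code (standing in for the
non-negative, normalized posterior means) and a least-squares update of the
loadings. This is my sketch of the general idea, not the authors' generalized
alternating minimization algorithm.

    import numpy as np

    rng = np.random.default_rng(3)
    X = rng.normal(size=(100, 30))                 # data: samples x features
    h_dim = 50                                     # over-complete code size
    W = 0.1 * rng.normal(size=(30, h_dim))         # factor loadings

    for _ in range(50):
        H = np.maximum(X @ W, 0.0)                 # rectified codes (non-negative)
        H = H / (H.std(axis=0, keepdims=True) + 1e-8)  # normalize code units
        # least-squares loading update: minimize ||H W^T - X||^2 over W
        W = np.linalg.lstsq(H, X, rcond=None)[0].T

    sparsity = (np.maximum(X @ W, 0.0) == 0).mean()
    print(f"fraction of zero code units: {sparsity:.2f}")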
Controlling Neural Networks via Energy Dissipation
The last decade has seen tremendous success in solving various computer
vision problems with the help of deep learning techniques. Lately, many works
have demonstrated that learning-based approaches with suitable network
architectures even exhibit superior performance for the solution of (ill-posed)
image reconstruction problems such as deblurring, super-resolution, or medical
image reconstruction. The drawback of purely learning-based methods, however,
is that they cannot provide provable guarantees for the trained network to
follow a given data formation process during inference. In this work we propose
energy dissipating networks that iteratively compute a descent direction with
respect to a given cost function or energy at the currently estimated
reconstruction. Therefore, an adaptive step size rule such as a line search,
along with a suitable number of iterations, can guarantee the reconstruction to
follow a given data formation model encoded in the energy to arbitrary
precision, and hence control the model's behavior even during test time. We
prove that under standard assumptions, descent using the direction predicted by
the network converges (linearly) to the global minimum of the energy. We
illustrate the effectiveness of the proposed approach in experiments on
single-image super-resolution and computed tomography (CT) reconstruction, and
further
illustrate extensions to convex feasibility problems. (Published as a
conference paper at ICCV 2019, Seoul.)
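
The descent-direction-plus-line-search mechanism can be sketched as follows:
accept the network's proposed direction only if it is a descent direction for
the given energy, and choose the step by Armijo backtracking so the energy
provably decreases. The quadratic toy energy and the noisy gradient oracle
standing in for the network are assumptions.

    import numpy as np

    rng = np.random.default_rng(4)
    A = rng.normal(size=(20, 10))
    b = rng.normal(size=20)

    def energy(x):                        # data-formation energy E(x)
        return 0.5 * np.sum((A @ x - b) ** 2)

    def network_direction(x):             # stand-in for the learned predictor
        g = A.T @ (A @ x - b)
        return -(g + 0.1 * rng.normal(size=g.shape))

    x = np.zeros(10)
    for _ in range(100):
        g = A.T @ (A @ x - b)             # true energy gradient
        d = network_direction(x)
        if g @ d >= 0:                    # reject non-descent proposals
            d = -g
        t = 1.0
        while energy(x + t * d) > energy(x) + 1e-4 * t * (g @ d):
            t *= 0.5                      # Armijo backtracking: E must dissipate
        x = x + t * d

    print("final energy:", energy(x))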
Convergence Analyses of Online ADAM Algorithm in Convex Setting and Two-Layer ReLU Neural Network
Nowadays, online learning is an appealing learning paradigm, which is of
great interest in practice due to the recent emergence of large-scale
applications such as online advertising placement and online web ranking.
Standard online learning assumes a finite number of samples while in practice
data is streamed infinitely. In such a setting, gradient descent with a
diminishing learning rate does not work. We first introduce regret with rolling
window, a new performance metric for online streaming learning, which measures
the performance of an algorithm on every fixed number of contiguous samples. At
the same time, we propose a family of algorithms based on gradient descent with
a constant or adaptive learning rate and provide detailed analyses
establishing regret bounds for these algorithms. We cover the convex
setting, showing regret of the order of the square root of the window size in
the constant and dynamic learning rate scenarios. Our proof is
applicable also to the standard online setting where we provide the first
analysis of the same regret order (the previous proofs have flaws). We also
study a two-layer neural network setting with ReLU activation. In this case we
establish that if initial weights are close to a stationary point, the same
square root regret bound is attainable. We conduct computational experiments
demonstrating the superior performance of the proposed algorithms.
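
As I read the abstract, regret with rolling window compares the learner's
accumulated loss on every window of w contiguous rounds against the best fixed
decision in hindsight on that window, reporting the worst window. The sketch
below implements that reading on a toy drifting stream; the losses, predictor,
and window size are assumptions.

    import numpy as np

    def rolling_window_regret(learner_losses, loss_fn, data, decisions, w):
        """Worst regret over all length-w windows of the stream."""
        worst = -np.inf
        for s in range(len(data) - w + 1):
            window = data[s:s + w]
            learner = learner_losses[s:s + w].sum()
            best = min(sum(loss_fn(z, u) for z in window) for u in decisions)
            worst = max(worst, learner - best)
        return worst

    rng = np.random.default_rng(5)
    stream = np.concatenate([rng.normal(0, 1, 50), rng.normal(3, 1, 50)])   # drifting data
    preds = np.r_[0.0, np.cumsum(stream)[:-1] / np.arange(1, len(stream))]  # running mean
    losses = (preds - stream) ** 2
    grid = np.linspace(-1, 4, 51)                  # candidate fixed decisions
    print(rolling_window_regret(losses, lambda z, u: (z - u) ** 2, stream, grid, w=25))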
A Method Based on Convex Cone Model for Image-Set Classification with CNN Features
In this paper, we propose a method for image-set classification based on
convex cone models, focusing on the effectiveness of convolutional neural
network (CNN) features as inputs. CNN features have non-negative values when
using the rectified linear unit as an activation function. This naturally leads
us to model a set of CNN features by a convex cone and measure the geometric
similarity of convex cones for classification. To establish this framework, we
sequentially define multiple angles between two convex cones by repeating the
alternating least squares method and then define the geometric similarity
between the cones using the obtained angles. Moreover, to enhance our method,
we introduce a discriminant space that maximizes the between-class variance
(gaps) and minimizes the within-class variance of the convex cones projected
onto it, in the spirit of Fisher discriminant analysis. Finally,
classification is based on the similarity between projected convex cones. The
effectiveness of the proposed method was demonstrated experimentally using a
private, multi-view hand shape dataset and two public databases. (Accepted at
the International Joint Conference on Neural Networks, IJCNN.)
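
The smallest angle between two convex cones can be approximated by alternating
projections, each of which is a non-negative least-squares problem, in the
spirit of the alternating least squares the abstract mentions. The random cone
bases below are assumptions; the paper additionally defines multiple angles
and a discriminant space.

    import numpy as np
    from scipy.optimize import nnls

    rng = np.random.default_rng(6)
    A1 = rng.normal(size=(20, 5))        # cone 1 = {A1 c : c >= 0}
    A2 = rng.normal(size=(20, 5))        # cone 2

    def nearest_cone_direction(A, v):
        c, _ = nnls(A, v)                # min ||A c - v||  s.t.  c >= 0
        p = A @ c
        return p / (np.linalg.norm(p) + 1e-12)

    u = nearest_cone_direction(A1, rng.normal(size=20))
    for _ in range(100):
        v = nearest_cone_direction(A2, u)   # best cone-2 direction for u
        u = nearest_cone_direction(A1, v)   # best cone-1 direction for v

    cos = float(np.clip(u @ v, -1.0, 1.0))
    print("smallest cone angle (deg):", np.degrees(np.arccos(cos)))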
Memory Bounded Deep Convolutional Networks
In this work, we investigate the use of sparsity-inducing regularizers during
training of Convolutional Neural Networks (CNNs). These regularizers encourage
fewer connections in the convolutional and fully connected layers to take
non-zero values and in effect result in sparse connectivity between hidden
units in the deep network. This in turn reduces the memory and runtime cost
involved in deploying the learned CNNs. We show that training with such
regularization can still be performed using stochastic gradient descent
implying that it can be used easily in existing codebases. Experimental
evaluation of our approach on MNIST, CIFAR, and ImageNet datasets shows that
our regularizers can result in dramatic reductions in memory requirements. For
instance, when applied on AlexNet, our method can reduce the memory consumption
by a factor of four with minimal loss in accuracy.
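
The point that such regularization "can still be performed using stochastic
gradient descent" amounts to adding a penalty term to the ordinary training
loss. The sketch below does this in PyTorch with a plain L1 penalty as an
assumed stand-in for the paper's regularizers; the architecture and penalty
weight are toy choices.

    import torch
    import torch.nn as nn

    model = nn.Sequential(nn.Conv2d(1, 8, 3), nn.ReLU(),
                          nn.Flatten(), nn.Linear(8 * 26 * 26, 10))
    opt = torch.optim.SGD(model.parameters(), lr=0.01)
    l1_weight = 1e-4                               # assumed regularization strength

    x = torch.randn(32, 1, 28, 28)                 # toy batch
    y = torch.randint(0, 10, (32,))

    loss = nn.functional.cross_entropy(model(x), y)
    loss = loss + l1_weight * sum(p.abs().sum() for p in model.parameters())
    opt.zero_grad()
    loss.backward()
    opt.step()                                     # one ordinary SGD step, penalty included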