Magnification Control in Winner Relaxing Neural Gas
An important goal in neural map learning, which can conveniently be
accomplished by magnification control, is to achieve information optimal coding
in the sense of information theory. In the present contribution we consider the
winner relaxing approach for the neural gas network. Originally, winner
relaxing learning is a slight modification of the self-organizing map learning
rule that allows for adjustment of the magnification behavior by an a priori
chosen control parameter. We transfer this approach to the neural gas
algorithm. The magnification exponent can be calculated analytically for
arbitrary dimension from a continuum theory, and the entropy of the resulting
map is studied numerically, confirming the theoretical prediction. The
influence of a diagonal term, which can be added without impacting the
magnification, is studied numerically. This approach to maps of maximal mutual
information is interesting for applications as the winner relaxing term only
adds computational cost of same order and is easy to implement. In particular,
it is not necessary to estimate the generally unknown data probability density
as in other magnification control approaches. Comment: 14 pages, 2 figures
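
For orientation, the baseline that the winner relaxing term modifies can be sketched as follows. This is a minimal sketch of one plain neural gas update only, in which every unit moves toward the input by an amount that decays with its distance rank. It deliberately omits the winner relaxing term, whose exact form the abstract does not give. The function name `neural_gas_step` and the parameters `eps` (learning rate) and `lam` (neighborhood range) are illustrative assumptions, not names from the paper.

```python
import math


def neural_gas_step(weights, x, eps=0.1, lam=1.0):
    """One plain neural gas update (hypothetical helper, no winner relaxing term).

    weights: list of weight vectors (lists of floats)
    x:       one input vector (list of floats)
    """
    # Squared Euclidean distance of every unit to the input.
    dists = [sum((wi - xi) ** 2 for wi, xi in zip(w, x)) for w in weights]
    # Unit indices sorted from closest (rank 0) to farthest.
    order = sorted(range(len(weights)), key=lambda i: dists[i])
    updated = [list(w) for w in weights]
    for rank, i in enumerate(order):
        # Rank-based neighborhood function exp(-rank / lam).
        h = math.exp(-rank / lam)
        updated[i] = [wi + eps * h * (xi - wi) for wi, xi in zip(weights[i], x)]
    return updated


units = [[0.0, 0.0], [1.0, 1.0], [3.0, 3.0]]
new_units = neural_gas_step(units, [0.0, 0.0], eps=0.5, lam=1.0)
```

Because the update strength depends on rank rather than on a fixed lattice, the rule is independent of data dimension, which matches the abstract's remark that the neural gas results hold for arbitrary dimension.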
Magnification Control in Self-Organizing Maps and Neural Gas
We consider different ways to control the magnification in self-organizing
maps (SOM) and neural gas (NG). Starting from early approaches of magnification
control in vector quantization, we then concentrate on different approaches for
SOM and NG. We show that three structurally similar approaches can be applied
to both algorithms: localized learning, concave-convex learning, and winner
relaxing learning. Thereby, the approach of concave-convex learning in SOM is
extended to a more general description, whereas the concave-convex learning for
NG is new. In general, the control mechanisms generate only slightly different
behavior comparing both neural algorithms. However, we emphasize that the NG
results are valid for any data dimension, whereas in the SOM case the results
hold only for the one-dimensional case. Comment: 24 pages, 4 figures
Towards Robust Neural Networks via Random Self-ensemble
Recent studies have revealed the vulnerability of deep neural networks: A
small adversarial perturbation that is imperceptible to humans can easily make a
well-trained deep neural network misclassify. This makes it unsafe to apply
neural networks in security-critical applications. In this paper, we propose a
new defense algorithm called Random Self-Ensemble (RSE) by combining two
important concepts: randomness and ensemble. To protect a targeted
model, RSE adds random noise layers to the neural network to prevent the strong
gradient-based attacks, and ensembles the prediction over random noises to
stabilize the performance. We show that our algorithm is equivalent to ensembling
an infinite number of noisy models without any additional memory
overhead, and the proposed training procedure based on noisy stochastic
gradient descent can ensure the ensemble model has a good predictive
capability. Our algorithm significantly outperforms previous defense techniques
on real data sets. For instance, on CIFAR-10 with VGG network (which has 92%
accuracy without any attack), under the strong C&W attack within a certain
distortion tolerance, the accuracy of the unprotected model drops to less than
10%, the best previous defense technique has accuracy, while our method
still has prediction accuracy under the same level of attack. Finally,
our method is simple and easy to integrate into any neural network. Comment: ECCV 2018 camera ready
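
The idea of averaging predictions over random noise can be illustrated with a toy sketch. It is not the RSE implementation: the abstract describes noise layers inside a deep network, whereas this example only perturbs the input of a hypothetical linear scorer and averages the class scores over repeated noisy passes. The names `noisy_scores` and `ensemble_predict`, and the parameters `sigma`, `n_samples`, and `seed`, are assumptions made for illustration.

```python
import random


def noisy_scores(x, weights, sigma, rng):
    """Class scores of a toy linear model after adding Gaussian noise to the input."""
    noisy = [xi + rng.gauss(0.0, sigma) for xi in x]
    return [sum(w * v for w, v in zip(row, noisy)) for row in weights]


def ensemble_predict(x, weights, sigma=0.1, n_samples=50, seed=0):
    """Average the scores over n_samples noisy forward passes, then take the argmax.

    Only one set of weights is stored; each pass differs only in the sampled noise.
    """
    rng = random.Random(seed)
    totals = [0.0] * len(weights)
    for _ in range(n_samples):
        for k, score in enumerate(noisy_scores(x, weights, sigma, rng)):
            totals[k] += score
    means = [t / n_samples for t in totals]
    label = max(range(len(means)), key=lambda k: means[k])
    return label, means


# Hypothetical two-class model: class k scores the k-th input coordinate.
toy_weights = [[1.0, 0.0], [0.0, 1.0]]
label, means = ensemble_predict([2.0, -1.0], toy_weights, sigma=0.1, n_samples=200)
```

Since every pass reuses the same weights, the memory cost does not grow with the number of noise samples, which is the property the abstract points to when it mentions no additional memory overhead.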
BPGrad: Towards Global Optimality in Deep Learning via Branch and Pruning
Understanding the global optimality in deep learning (DL) has been attracting
more and more attention recently. Conventional DL solvers, however, have not
been developed intentionally to seek such global optimality. In this paper
we propose a novel approximation algorithm, BPGrad, towards optimizing deep
models globally via branch and pruning. Our BPGrad algorithm is based on the
assumption of Lipschitz continuity in DL, and as a result it can adaptively
determine the step size for current gradient given the history of previous
updates, wherein theoretically no smaller steps can achieve the global
optimality. We prove that, by repeating such branch-and-pruning procedure, we
can locate the global optimality within finite iterations. Empirically an
efficient solver based on BPGrad for DL is proposed as well, and it outperforms
conventional DL solvers such as Adagrad, Adadelta, RMSProp, and Adam in the
tasks of object recognition, detection, and segmentation.