Cross-Entropy Loss and Low-Rank Features Have Responsibility for Adversarial Examples
State-of-the-art neural networks are vulnerable to adversarial examples; they
can easily misclassify inputs that are imperceptibly different from their
training and test data. In this work, we establish that the use of the
cross-entropy loss function and the low-rank features of the training data are
responsible for the existence of these inputs. Based on this observation, we
suggest that addressing adversarial examples requires rethinking the use of the
cross-entropy loss function and looking for an alternative that is better suited
to minimization with low-rank features. In this direction, we present a
training scheme called differential training, which uses a loss function
defined on the differences between the features of points from opposite
classes. We show that differential training can ensure a large margin between
the decision boundary of the neural network and the points in the training
dataset. This larger margin increases the amount of perturbation needed to flip
the prediction of the classifier and makes it harder to find an adversarial
example with small perturbations. We test differential training on a binary
classification task with CIFAR-10 dataset and demonstrate that it radically
reduces the ratio of images for which an adversarial example could be found --
not only in the training dataset, but in the test dataset as well.
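The core idea, defining the loss on feature differences between opposite-class pairs so that the margin is pushed up, can be sketched with plain vectors standing in for network features. All names and values below are illustrative, not from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for penultimate-layer features of a binary classifier;
# the paper computes these with a neural network.
pos = rng.normal(+1.0, 0.5, size=(64, 8))   # features of class +1 points
neg = rng.normal(-1.0, 0.5, size=(64, 8))   # features of class -1 points

w = np.zeros(8)   # linear boundary w.x = 0 on top of the features
lr = 0.1

for _ in range(200):
    # differential training: the loss sees only *differences* between
    # features of points from opposite classes
    diff = (pos[:, None, :] - neg[None, :, :]).reshape(-1, 8)
    margin = diff @ w
    # logistic loss log(1 + exp(-margin)); its gradient weights each
    # pair by how poorly it is currently separated
    weight = 1.0 / (1.0 + np.exp(margin))
    w += lr * (diff * weight[:, None]).mean(axis=0)

# every pairwise difference, hence every training point, ends up on the
# correct side of the boundary with a positive margin
print((pos @ w).min() > 0 and (neg @ w).max() < 0)
```

Because every pairwise difference must have positive margin, the weakest pairs dominate the gradient, which is what enlarges the distance from the boundary to the nearest points of each class.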
A Dual-Dimer Method for Training Physics-Constrained Neural Networks with Minimax Architecture
Data sparsity is a common issue in training machine learning tools such as
neural networks for engineering and scientific applications, where experiments
and simulations are expensive. Recently, physics-constrained neural networks
(PCNNs) were developed to reduce the required amount of training data. However,
the weights of different losses from data and physical constraints are adjusted
empirically in PCNNs. In this paper, a new physics-constrained neural network
with the minimax architecture (PCNN-MM) is proposed so that the weights of
different losses can be adjusted systematically. Training the PCNN-MM amounts
to searching for high-order saddle points of the objective function. A novel
saddle point search algorithm called Dual-Dimer method is developed. It is
demonstrated that the Dual-Dimer method is computationally more efficient than
the gradient descent ascent method for nonconvex-nonconcave functions and
provides additional eigenvalue information to verify search results. A heat
transfer example also shows that the convergence of PCNN-MMs is faster than
that of traditional PCNNs.
Comment: 34 pages, 5 figures, accepted by Neural Networks
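The minimax weighting idea can be illustrated with plain gradient descent-ascent, which is the baseline the paper compares against; the Dual-Dimer method itself is not reproduced here, and the toy model, constraint, and constants are invented for illustration:

```python
import numpy as np

x = np.linspace(0.0, 1.0, 21)
y_data = 2.0 * x                 # toy "measurements" of y(x) = 2x

def data_loss(th):
    return np.mean((th * x - y_data) ** 2)

def physics_loss(th):
    # toy physics constraint: dy/dx should equal 2 everywhere
    return (th - 2.0) ** 2

th, lam, lr, eps = 0.0, 1.0, 0.05, 1e-5
for _ in range(500):
    # minimax training: descend in the model parameter th while the
    # constraint weight lam ascends, so the weighting is adjusted
    # systematically instead of being hand-tuned
    g = (data_loss(th + eps) + lam * physics_loss(th + eps)
         - data_loss(th - eps) - lam * physics_loss(th - eps)) / (2 * eps)
    th -= lr * g                      # descent step (finite-difference gradient)
    lam += lr * physics_loss(th)      # ascent step on the constraint weight

print(abs(th - 2.0) < 1e-2)
```

At the saddle point the model satisfies both the data and the physics term, with the weight having grown only as much as the remaining constraint violation demanded.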
Training Recurrent Neural Networks via Dynamical Trajectory-Based Optimization
This paper introduces a new method to train recurrent neural networks using
dynamical trajectory-based optimization. The optimization method utilizes a
projected gradient system (PGS) and a quotient gradient system (QGS) to
determine the feasible regions of an optimization problem and search the
feasible regions for local minima. By exploring the feasible regions, local
minima are identified and the local minimum with the lowest cost is chosen as
the global minimum of the optimization problem. Lyapunov theory is used to
prove the stability of the local minima and their robustness in the presence of
measurement errors. Numerical examples show that the new approach provides
better results than networks trained with genetic algorithms or error
backpropagation (EBP).
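The explore-then-select step, finding local minima in different regions and keeping the one of lowest cost, can be illustrated with plain gradient descent standing in for the PGS/QGS trajectory dynamics; the cost function and all values below are invented for illustration:

```python
def cost(w):
    # toy multimodal cost: global minimum at w = -1.2 and a higher
    # local minimum near w = 1.82
    return (w + 1.2) ** 2 * ((w - 2.0) ** 2 + 0.5)

def descend(w, lr=0.005, steps=2000, eps=1e-6):
    # plain gradient descent stands in for the PGS/QGS trajectories
    for _ in range(steps):
        g = (cost(w + eps) - cost(w - eps)) / (2 * eps)
        w -= lr * g
    return w

# explore several regions, then keep the local minimum of lowest cost
minima = [descend(w0) for w0 in (-3.0, -1.0, 0.0, 1.0, 3.0)]
best = min(minima, key=cost)
print(abs(best + 1.2) < 1e-3)
```

Starts at 1.0 and 3.0 settle in the higher basin near 1.82; the final comparison by cost is what selects the global minimum among the candidates.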
Statistical mechanics of complex neural systems and high dimensional data
Recent experimental advances in neuroscience have opened new vistas into the
immense complexity of neuronal networks. This proliferation of data challenges
us on two parallel fronts. First, how can we form adequate theoretical
frameworks for understanding how dynamical network processes cooperate across
widely disparate spatiotemporal scales to solve important computational
problems? And second, how can we extract meaningful models of neuronal systems
from high dimensional datasets? To aid in these challenges, we give a
pedagogical review of a collection of ideas and theoretical methods arising at
the intersection of statistical physics, computer science and neurobiology. We
introduce the interrelated replica and cavity methods, which originated in
statistical physics as powerful ways to quantitatively analyze large highly
heterogeneous systems of many interacting degrees of freedom. We also introduce
the closely related notion of message passing in graphical models, which
originated in computer science as a distributed algorithm capable of solving
large inference and optimization problems involving many coupled variables. We
then show how both the statistical physics and computer science perspectives
can be applied in a wide diversity of contexts to problems arising in
theoretical neuroscience and data analysis. Along the way we discuss spin
glasses, learning theory, illusions of structure in noise, random matrices,
dimensionality reduction, and compressed sensing, all within the unified
formalism of the replica method. Moreover, we review recent conceptual
connections between message passing in graphical models, and neural computation
and learning. Overall, these ideas illustrate how statistical physics and
computer science might provide a lens through which we can uncover emergent
computational functions buried deep within the dynamical complexities of
neuronal networks.
Comment: 72 pages, 8 figures, iopart.cls, to appear in JSTAT
Kernel-Based Training of Generative Networks
Generative adversarial networks (GANs) are designed with the help of min-max
optimization problems that are solved with stochastic gradient-type algorithms
which are known to be non-robust. In this work we revisit a non-adversarial
method based on kernels which relies on a pure minimization problem and propose
a simple stochastic gradient algorithm for the computation of its solution.
Using simplified tools from Stochastic Approximation theory we demonstrate that
batch versions of the algorithm or smoothing of the gradient do not improve
convergence. These observations allow for the development of a training
algorithm that enjoys reduced computational complexity and increased robustness
while exhibiting synthesis characteristics similar to those of classical GANs.
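One concrete kernel objective of this non-adversarial type is the (biased) squared maximum mean discrepancy; the sketch below fits a one-parameter generator by pure stochastic minimization of it. The generator, kernel bandwidth, and constants are illustrative, not the paper's setup:

```python
import numpy as np

rng = np.random.default_rng(0)

def rbf(a, b, s=1.0):
    # Gaussian kernel matrix between two 1-D samples
    d = a[:, None] - b[None, :]
    return np.exp(-d ** 2 / (2 * s ** 2))

def mmd2(x, y):
    # biased estimate of the squared maximum mean discrepancy
    return rbf(x, x).mean() + rbf(y, y).mean() - 2.0 * rbf(x, y).mean()

mu = 0.0                    # generator g(z) = z + mu (one parameter)
lr, eps = 0.5, 1e-4
for _ in range(300):
    data = rng.normal(2.0, 1.0, 256)    # target distribution N(2, 1)
    z = rng.normal(0.0, 1.0, 256)
    # pure minimization: no adversary is trained; reusing the same batch
    # on both sides keeps the finite-difference gradient low-noise
    g = (mmd2(data, z + mu + eps) - mmd2(data, z + mu - eps)) / (2 * eps)
    mu -= lr * g

print(abs(mu - 2.0) < 0.2)
```

Because the objective is a plain minimization, there is no inner maximization loop to destabilize training, which is the robustness advantage the abstract refers to.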
The Construction of High Order Convergent Look-Ahead Finite Difference Formulas for Zhang Neural Networks
Zhang Neural Networks rely on convergent 1-step ahead finite difference
formulas, of which very few are known. Those that are known were
constructed in ad hoc ways and suffer from low truncation error orders. This
paper develops a constructive method to find convergent look-ahead finite
difference schemes of higher truncation error orders. The method consists of
seeding the free variables of a linear system comprised of Taylor expansion
coefficients followed by a minimization algorithm for the maximal magnitude
root of the formula's characteristic polynomial. This helps us find new
convergent 1-step ahead finite difference formulas of any truncation error
order. Once a polynomial has been found with roots inside the complex unit
circle and no repeated roots on it, the associated look-ahead ZNN
discretization formula is convergent and can be used for solving any
discretized ZNN based model. Our method recreates and validates the few known
convergent formulas, all of which have truncation error orders at most 4. It
also creates new convergent 1-step ahead difference formulas with truncation
error orders 5 through 8.
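The root condition stated above is easy to check mechanically. The polynomials below are illustrative examples, not formulas from the paper:

```python
import numpy as np

def is_convergent(coeffs, tol=1e-7):
    """Root condition: every root of the characteristic polynomial lies
    in the closed unit disk, and no root on the unit circle is repeated."""
    roots = np.roots(coeffs)
    if np.any(np.abs(roots) > 1 + tol):
        return False                   # a root outside the unit circle
    on_circle = roots[np.abs(np.abs(roots) - 1.0) <= tol]
    for i in range(len(on_circle)):
        for j in range(i + 1, len(on_circle)):
            if abs(on_circle[i] - on_circle[j]) <= 10 * tol:
                return False           # repeated root on the unit circle
    return True

# Illustrative characteristic polynomials (coefficients are NOT from the paper):
good = [1.0, -0.5, -0.25, -0.25]   # (z - 1)(z^2 + 0.5 z + 0.25)
bad = [1.0, -2.0, 1.0]             # (z - 1)^2: repeated root on the circle
print(is_convergent(good), is_convergent(bad))
```

The `good` polynomial has a simple root at 1 and a complex pair of magnitude 0.5, so the associated formula would be convergent under the stated criterion; the `bad` one fails because its unit-circle root is repeated.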
A Hebbian/Anti-Hebbian Neural Network for Linear Subspace Learning: A Derivation from Multidimensional Scaling of Streaming Data
Neural network models of early sensory processing typically reduce the
dimensionality of streaming input data. Such networks learn the principal
subspace, in the sense of principal component analysis (PCA), by adjusting
synaptic weights according to activity-dependent learning rules. When derived
from a principled cost function these rules are nonlocal and hence biologically
implausible. At the same time, biologically plausible local rules have been
postulated rather than derived from a principled cost function. Here, to bridge
this gap, we derive a biologically plausible network for subspace learning on
streaming data by minimizing a principled cost function. In a departure from
previous work, where cost was quantified by the representation, or
reconstruction, error, we adopt a multidimensional scaling (MDS) cost function
for streaming data. The resulting algorithm relies only on biologically
plausible Hebbian and anti-Hebbian local learning rules. In a stochastic
setting, synaptic weights converge to a stationary state which projects the
input data onto the principal subspace. If the data are generated by a
nonstationary distribution, the network can track the principal subspace. Thus,
our result makes a step towards an algorithmic theory of neural computation.
Comment: Accepted for publication in Neural Computation
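A compact sketch of the kind of network described above: recurrent dynamics solved to their fixed point, Hebbian feedforward updates, anti-Hebbian lateral updates, and per-neuron learning rates set by cumulative output activity. The dimensions, data, and initialization are illustrative, and details are simplified relative to the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 4, 2                        # input and output dimensions

W = 0.1 * rng.normal(size=(k, n))  # feedforward (Hebbian) weights
M = np.zeros((k, k))               # lateral (anti-Hebbian) weights, zero diagonal
D = np.full(k, 0.1)                # cumulative squared output per neuron

for t in range(3000):
    x = rng.normal(size=n)
    x[:2] *= 5.0                   # principal subspace = span(e1, e2)
    # fixed point of the recurrent dynamics y = W x - M y
    y = np.linalg.solve(np.eye(k) + M, W @ x)
    D += y ** 2                    # activity-dependent learning rate
    W += (np.outer(y, x) - (y ** 2)[:, None] * W) / D[:, None]   # Hebbian
    M += (np.outer(y, y) - (y ** 2)[:, None] * M) / D[:, None]   # anti-Hebbian
    np.fill_diagonal(M, 0.0)

F = np.linalg.solve(np.eye(k) + M, W)   # effective input-to-output filter
inside = np.linalg.norm(F[:, :2]) ** 2  # filter energy in the principal subspace
print(inside > 0.9 * np.linalg.norm(F) ** 2)
```

Both update rules use only quantities available at the synapse (pre- and postsynaptic activity and the weight itself), which is the locality property the derivation is after.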
A linear approach for sparse coding by a two-layer neural network
Many approaches to transform classification problems from non-linear to
linear by feature transformation have been recently presented in the
literature. These notably include sparse coding methods and deep neural
networks. However, many of these approaches either require re-running a
learning process whenever unseen input vectors are presented, or involve large
numbers of parameters and hyper-parameters that must be chosen through
cross-validation, increasing running time dramatically.
In this paper, we propose and experimentally investigate a new approach for the
purpose of overcoming limitations of both kinds. The proposed approach makes
use of a linear auto-associative network (called SCNN) with just one hidden
layer. The combination of this architecture with a specific error function to
be minimized enables one to learn a linear encoder computing a sparse code
which turns out to be as similar as possible to the sparse coding that one
obtains by re-training the neural network. Importantly, the linearity of SCNN
and the choice of the error function allow one to achieve reduced running time
in the learning phase. The proposed architecture is evaluated on the basis of
two standard machine learning tasks. Its performance is compared with that
of recently proposed non-linear auto-associative neural networks. The overall
results suggest that linear encoders can be profitably used to obtain sparse
data representations in the context of machine learning problems, provided that
an appropriate error function is used during the learning phase.
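The ingredients, a linear encoder/decoder pair trained on an error function that combines reconstruction error with a sparsity penalty, can be sketched as follows. The architecture and the L1 penalty are generic stand-ins, not the specific SCNN error function:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))           # toy data, one row per sample

k, lam, lr = 6, 0.1, 0.05
E = 0.1 * rng.normal(size=(k, 8))       # linear encoder (sparse code = X @ E.T)
D = 0.1 * rng.normal(size=(8, k))       # linear decoder

for _ in range(500):
    C = X @ E.T                          # codes, shape (200, k)
    R = C @ D.T                          # linear reconstructions
    err = R - X
    # error function: squared reconstruction error + L1 penalty on the code
    gC = err @ D + lam * np.sign(C)      # gradient of the loss w.r.t. the code
    E -= lr * (gC.T @ X) / len(X)
    D -= lr * (err.T @ C) / len(X)

codes = X @ E.T
recon = codes @ D.T
print(np.mean((recon - X) ** 2) < 0.6 * np.mean(X ** 2))
```

Because both the encoder and decoder are linear, computing a code for a new input is a single matrix product, which is the running-time advantage over approaches that re-run an iterative learning process at test time.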
"Parallel Training Considered Harmful?": Comparing series-parallel and parallel feedforward network training
Neural network models for dynamic systems can be trained either in parallel
or in series-parallel configurations. Influenced by early arguments, several
papers justify the choice of series-parallel rather than parallel configuration
claiming it has a lower computational cost, better stability properties during
training and provides more accurate results. Other published results, on the
other hand, defend parallel training as being more robust and capable of
yielding more accurate long-term predictions. The main contribution of this
paper is to present a study comparing both methods under the same unified
framework. We focus on three aspects: i) robustness of the estimation in the
presence of noise; ii) computational cost; and, iii) convergence. A unifying
mathematical framework and simulation studies show situations where each
training method provides better validation results, with parallel training
performing better in what are arguably more realistic scenarios. An example
using measured data reinforces this claim. We also show, with a novel
complexity analysis and numerical examples, that both methods have similar
computational cost, with series-parallel training, however, being more
amenable to parallelization. Some informal discussion about stability and
convergence properties is presented and explored in the examples.
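The two configurations can be contrasted on a toy first-order linear system (the system and numbers are invented for illustration): series-parallel training regresses on measured past outputs, which for a linear model reduces to ordinary least squares, while parallel (free-run) evaluation feeds the model's own predictions back:

```python
import numpy as np

rng = np.random.default_rng(0)

# toy first-order system: y[t] = 0.8*y[t-1] + u[t-1] + noise
T = 200
u = rng.normal(size=T)
y = np.zeros(T)
for t in range(1, T):
    y[t] = 0.8 * y[t - 1] + u[t - 1] + 0.05 * rng.normal()

# series-parallel (one-step-ahead): regress y[t] on the *measured*
# y[t-1] and u[t-1] -- an ordinary least-squares problem
Phi = np.column_stack([y[:-1], u[:-1]])
a, b = np.linalg.lstsq(Phi, y[1:], rcond=None)[0]

# parallel (free-run): feed the model's own output back as the regressor
y_hat = np.zeros(T)
for t in range(1, T):
    y_hat[t] = a * y_hat[t - 1] + b * u[t - 1]

print(abs(a - 0.8) < 0.05 and abs(b - 1.0) < 0.05)
```

In parallel training the feedback loop makes the cost depend on the whole simulated trajectory, which is why its optimization is harder but its long-term predictions can be more faithful.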
An improved connectionist activation function for energy minimization
Symmetric networks that are based on energy minimization, such as Boltzmann machines or Hopfield nets, are used extensively for optimization, constraint satisfaction, and approximation of NP-hard problems. Nevertheless, finding a global minimum for the energy function is not guaranteed, and even a local minimum may take an exponential number of steps. We propose an improvement to the standard activation function used for such networks. The improved algorithm guarantees that a global minimum is found in linear time for tree-like subnetworks. The algorithm is uniform and does not assume that the network is a tree. It performs no worse than the standard algorithms for any network topology. In the case where there are trees growing from a cyclic subnetwork, the new algorithm performs better than the standard algorithms by avoiding local minima along the trees and by optimizing the free energy of these trees in linear time. The algorithm is self-stabilizing for trees (cycle-free undirected graphs) and remains correct under various scheduling demons. However, no uniform protocol exists to optimize trees under a pure distributed demon, and no such protocol exists for cyclic networks under a central demon.
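For reference, the standard activation scheme that the improved algorithm builds on can be sketched as asynchronous threshold updates on a symmetric network: the energy never increases, but the process may stop at a poor local minimum. The network size and weights below are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)

# small symmetric network (Hopfield-style) with zero self-connections
n = 8
W = rng.normal(size=(n, n))
W = (W + W.T) / 2
np.fill_diagonal(W, 0.0)
b = rng.normal(size=n)

def energy(s):
    return -0.5 * s @ W @ s - b @ s

s = rng.choice([-1.0, 1.0], size=n)
energies = [energy(s)]
for sweep in range(20):
    for i in range(n):
        # standard activation: align unit i with its local field
        s[i] = 1.0 if W[i] @ s + b[i] >= 0 else -1.0
    energies.append(energy(s))

# asynchronous updates never increase the energy, so the state settles
# into a (possibly only local) minimum
print(all(e2 <= e1 + 1e-9 for e1, e2 in zip(energies, energies[1:])))
```

With symmetric weights and zero diagonal, each single-unit flip can only lower the energy, which guarantees convergence but not globality; the proposed improvement targets exactly the tree-shaped parts of the network where this greedy descent gets trapped.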