
    Cross-Entropy Loss and Low-Rank Features Have Responsibility for Adversarial Examples

    State-of-the-art neural networks are vulnerable to adversarial examples; they can easily misclassify inputs that are imperceptibly different from their training and test data. In this work, we establish that the use of the cross-entropy loss function and the low-rank features of the training data are responsible for the existence of these inputs. Based on this observation, we suggest that addressing adversarial examples requires rethinking the use of the cross-entropy loss function and looking for an alternative that is better suited for minimization with low-rank features. In this direction, we present a training scheme called differential training, which uses a loss function defined on the differences between the features of points from opposite classes. We show that differential training can ensure a large margin between the decision boundary of the neural network and the points in the training dataset. This larger margin increases the amount of perturbation needed to flip the prediction of the classifier and makes it harder to find an adversarial example with small perturbations. We test differential training on a binary classification task with the CIFAR-10 dataset and demonstrate that it radically reduces the ratio of images for which an adversarial example could be found -- not only in the training dataset, but in the test dataset as well.
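
    The core idea is easy to caricature: train a classifier on differences between feature vectors of points from opposite classes, so every pair contributes a margin to be enlarged. The sketch below is a minimal numpy illustration under my own assumptions (fixed toy features, a linear read-out, a logistic surrogate); the paper's actual loss and architecture may differ.

    import numpy as np

    rng = np.random.default_rng(0)

    # Toy "features" for the two classes (stand-ins for penultimate-layer outputs).
    pos = rng.normal(loc=+1.0, scale=1.0, size=(64, 10))   # class +1
    neg = rng.normal(loc=-1.0, scale=1.0, size=(64, 10))   # class -1
    w = np.zeros(10)                                        # linear read-out on the features

    def diff_loss_grad(w, pos, neg):
        # Logistic loss on all pairwise feature differences; every pair is pushed
        # toward a large positive margin w @ (phi(x_pos) - phi(x_neg)).
        diffs = (pos[:, None, :] - neg[None, :, :]).reshape(-1, pos.shape[1])
        margins = diffs @ w
        loss = np.mean(np.log1p(np.exp(-margins)))
        grad = -(diffs / (1.0 + np.exp(margins))[:, None]).mean(axis=0)
        return loss, grad

    for _ in range(200):                                    # plain gradient descent
        loss, grad = diff_loss_grad(w, pos, neg)
        w -= 0.5 * grad
    print(loss)                                             # pairwise margins grow, loss shrinks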

    A Dual-Dimer Method for Training Physics-Constrained Neural Networks with Minimax Architecture

    Data sparsity is a common issue in training machine learning tools such as neural networks for engineering and scientific applications, where experiments and simulations are expensive. Recently, physics-constrained neural networks (PCNNs) were developed to reduce the required amount of training data. However, the weights of the different losses from data and physical constraints are adjusted empirically in PCNNs. In this paper, a new physics-constrained neural network with a minimax architecture (PCNN-MM) is proposed so that the weights of the different losses can be adjusted systematically. Training the PCNN-MM amounts to searching for the high-order saddle points of the objective function. A novel saddle-point search algorithm called the Dual-Dimer method is developed. It is demonstrated that the Dual-Dimer method is computationally more efficient than the gradient descent ascent method for nonconvex-nonconcave functions and provides additional eigenvalue information to verify search results. A heat transfer example also shows that the convergence of PCNN-MMs is faster than that of traditional PCNNs. Comment: 34 pages, 5 figures, accepted by Neural Networks.
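
    As a rough sketch of the minimax idea (not of the Dual-Dimer search itself), the toy below adjusts the weight of a physics-style penalty by gradient ascent while the model parameters are adjusted by gradient descent, i.e. the plain gradient descent-ascent baseline the abstract compares against. The objective, the "data" loss, and the "physics" residual are all made up for illustration.

    import numpy as np

    rng = np.random.default_rng(1)
    theta = rng.normal(size=3)                    # stand-in model parameters
    lam = 1.0                                     # weight of the physics-style penalty

    def L_data(theta):
        return np.sum((theta - 1.0) ** 2)         # made-up data misfit
    def L_phys(theta):
        return (theta.sum() - 3.0) ** 2           # made-up physics residual

    def grads(theta, lam):
        # Gradients of  L_data(theta) + lam * L_phys(theta)  with respect to theta and lam.
        g_theta = 2.0 * (theta - 1.0) + lam * 2.0 * (theta.sum() - 3.0) * np.ones_like(theta)
        g_lam = L_phys(theta)                     # ascent direction for the penalty weight
        return g_theta, g_lam

    for _ in range(2000):                         # gradient descent-ascent baseline
        g_t, g_l = grads(theta, lam)
        theta -= 1e-2 * g_t                       # descend in the model parameters
        lam += 1e-2 * g_l                         # ascend in the penalty weight

    print(theta, lam, L_phys(theta))              # theta -> 1, physics residual -> 0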

    Training Recurrent Neural Networks via Dynamical Trajectory-Based Optimization

    This paper introduces a new method to train recurrent neural networks using dynamical trajectory-based optimization. The optimization method uses a projected gradient system (PGS) and a quotient gradient system (QGS) to determine the feasible regions of an optimization problem and to search those regions for local minima. By exploring the feasible regions, local minima are identified, and the local minimum with the lowest cost is chosen as the global minimum of the optimization problem. Lyapunov theory is used to prove the stability of the local minima, including their stability in the presence of measurement errors. Numerical examples show that the new approach provides better results than networks trained with genetic algorithms or error backpropagation (EBP).
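
    The abstract's two ingredients can be caricatured on a toy constrained problem: a quotient-gradient-like phase that flows downhill on the constraint violation to land in a feasible region, followed by a projected-gradient-like phase that minimizes the cost while staying on the constraint. This is only a generic sketch of that pattern under my own toy problem, not the paper's PGS/QGS formulation for recurrent networks.

    import numpy as np

    # Toy constrained problem:  minimize f(x)  subject to  h(x) = 0.
    def f(x):       return x[0] ** 2 + 10.0 * x[1] ** 2
    def grad_f(x):  return np.array([2.0 * x[0], 20.0 * x[1]])
    def h(x):       return x[0] + x[1] - 1.0               # scalar equality constraint
    def grad_h(x):  return np.array([1.0, 1.0])

    x = np.array([3.0, 2.0])

    for _ in range(200):                                   # QGS-like phase: reach feasibility
        x -= 0.1 * 2.0 * h(x) * grad_h(x)                  # gradient flow on h(x)^2

    for _ in range(2000):                                  # PGS-like phase: minimize on the constraint
        g, n = grad_f(x), grad_h(x)
        g_tan = g - (g @ n) / (n @ n) * n                  # project out the normal component
        x -= 0.01 * g_tan

    print(x, h(x))                                         # near the constrained minimum (10/11, 1/11)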

    Statistical mechanics of complex neural systems and high dimensional data

    Recent experimental advances in neuroscience have opened new vistas into the immense complexity of neuronal networks. This proliferation of data challenges us on two parallel fronts. First, how can we form adequate theoretical frameworks for understanding how dynamical network processes cooperate across widely disparate spatiotemporal scales to solve important computational problems? And second, how can we extract meaningful models of neuronal systems from high dimensional datasets? To aid in these challenges, we give a pedagogical review of a collection of ideas and theoretical methods arising at the intersection of statistical physics, computer science and neurobiology. We introduce the interrelated replica and cavity methods, which originated in statistical physics as powerful ways to quantitatively analyze large highly heterogeneous systems of many interacting degrees of freedom. We also introduce the closely related notion of message passing in graphical models, which originated in computer science as a distributed algorithm capable of solving large inference and optimization problems involving many coupled variables. We then show how both the statistical physics and computer science perspectives can be applied in a wide diversity of contexts to problems arising in theoretical neuroscience and data analysis. Along the way we discuss spin glasses, learning theory, illusions of structure in noise, random matrices, dimensionality reduction, and compressed sensing, all within the unified formalism of the replica method. Moreover, we review recent conceptual connections between message passing in graphical models, and neural computation and learning. Overall, these ideas illustrate how statistical physics and computer science might provide a lens through which we can uncover emergent computational functions buried deep within the dynamical complexities of neuronal networks. Comment: 72 pages, 8 figures, iopart.cls, to appear in JSTAT.

    Kernel-Based Training of Generative Networks

    Generative adversarial networks (GANs) are designed with the help of min-max optimization problems that are solved with stochastic gradient-type algorithms, which are known to be non-robust. In this work, we revisit a non-adversarial method based on kernels which relies on a pure minimization problem and propose a simple stochastic gradient algorithm for the computation of its solution. Using simplified tools from stochastic approximation theory, we demonstrate that neither batch versions of the algorithm nor smoothing of the gradient improves convergence. These observations allow for the development of a training algorithm that enjoys reduced computational complexity and increased robustness while exhibiting synthesis characteristics similar to those of classical GANs.
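
    A concrete instance of a kernel-based, purely minimizing objective is the squared maximum mean discrepancy (MMD) between generated and real samples; the sketch below fits a trivial shift generator by stochastic gradient descent on that criterion. Whether this matches the paper's exact loss is an assumption on my part; the kernel, the generator, and the numerical gradient are all illustrative.

    import numpy as np

    rng = np.random.default_rng(2)

    def gaussian_kernel(a, b, sigma=2.0):
        d2 = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
        return np.exp(-d2 / (2.0 * sigma ** 2))

    def mmd2(x, y):
        # Squared maximum mean discrepancy: small when the two samples look alike.
        return (gaussian_kernel(x, x).mean() + gaussian_kernel(y, y).mean()
                - 2.0 * gaussian_kernel(x, y).mean())

    real = rng.normal(loc=2.0, scale=0.5, size=(256, 2))     # "real" data
    theta = np.zeros(2)                                      # toy generator: g(z) = z + theta

    for _ in range(400):                                     # plain stochastic gradient descent
        z = rng.normal(scale=0.5, size=(128, 2))             # fresh latent minibatch
        grad = np.zeros(2)
        for i in range(2):                                   # numerical gradient keeps this dependency-free
            e = np.zeros(2); e[i] = 1e-3
            grad[i] = (mmd2(z + theta + e, real) - mmd2(z + theta - e, real)) / 2e-3
        theta -= 1.0 * grad

    print(theta)                                             # drifts toward the data mean (2, 2)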

    The Construction of High Order Convergent Look-Ahead Finite Difference Formulas for Zhang Neural Networks

    Zhang Neural Networks rely on convergent 1-step-ahead finite difference formulas, of which very few are known. Those that are known have been constructed in ad hoc ways and suffer from low truncation error orders. This paper develops a constructive method to find convergent look-ahead finite difference schemes of higher truncation error order. The method consists of seeding the free variables of a linear system built from Taylor expansion coefficients, followed by a minimization algorithm for the maximal magnitude root of the formula's characteristic polynomial. This lets us find new convergent 1-step-ahead finite difference formulas of any truncation error order. Once a polynomial has been found whose roots lie inside the complex unit circle, with no repeated roots on the circle itself, the associated look-ahead ZNN discretization formula is convergent and can be used to solve any discretized ZNN-based model. Our method recreates and validates the few known convergent formulas, all of which have truncation error orders of at most 4. It also creates new convergent 1-step-ahead difference formulas with truncation error orders 5 through 8.
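
    The convergence test described at the end is straightforward to code: compute the roots of a candidate formula's characteristic polynomial and require that none lies outside the closed unit disc and that any root on the unit circle is simple. The polynomials below are made-up examples, not formulas from the paper.

    import numpy as np

    def is_convergent(char_poly_coeffs, tol=1e-9):
        # All roots must lie inside the closed unit disc, and any root on the
        # unit circle must be simple (not repeated).
        roots = np.roots(char_poly_coeffs)
        if np.any(np.abs(roots) > 1.0 + tol):
            return False
        on_circle = roots[np.abs(np.abs(roots) - 1.0) <= tol]
        for i, r in enumerate(on_circle):                  # repeated roots on the circle?
            if np.any(np.abs(np.delete(on_circle, i) - r) <= tol):
                return False
        return True

    # Illustrative polynomials (highest-degree coefficient first):
    print(is_convergent([1.0, -0.5, 0.06]))                # roots 0.3 and 0.2 -> convergent
    print(is_convergent([1.0, 0.0, -1.0]))                 # simple roots at +1 and -1 -> admissible
    print(is_convergent([1.0, -2.0, 1.0]))                 # double root at 1 -> not convergent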

    A Hebbian/Anti-Hebbian Neural Network for Linear Subspace Learning: A Derivation from Multidimensional Scaling of Streaming Data

    Neural network models of early sensory processing typically reduce the dimensionality of streaming input data. Such networks learn the principal subspace, in the sense of principal component analysis (PCA), by adjusting synaptic weights according to activity-dependent learning rules. When derived from a principled cost function, these rules are nonlocal and hence biologically implausible. At the same time, biologically plausible local rules have been postulated rather than derived from a principled cost function. Here, to bridge this gap, we derive a biologically plausible network for subspace learning on streaming data by minimizing a principled cost function. In a departure from previous work, where cost was quantified by the representation, or reconstruction, error, we adopt a multidimensional scaling (MDS) cost function for streaming data. The resulting algorithm relies only on biologically plausible Hebbian and anti-Hebbian local learning rules. In a stochastic setting, synaptic weights converge to a stationary state which projects the input data onto the principal subspace. If the data are generated by a nonstationary distribution, the network can track the principal subspace. Thus, our result makes a step towards an algorithmic theory of neural computation. Comment: Accepted for publication in Neural Computation.
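
    For flavour, here is a generic online Hebbian/anti-Hebbian subspace learner in the spirit of the abstract: Oja-style feedforward updates, anti-Hebbian lateral decorrelation, and outputs taken from the fixed point of the recurrent dynamics. The paper derives its specific local rules from an MDS cost, so the exact update equations below are an assumption-laden stand-in, not the published algorithm.

    import numpy as np

    rng = np.random.default_rng(3)
    basis = np.linalg.qr(rng.normal(size=(5, 2)))[0]        # true 2-D principal subspace in R^5
    def next_sample():
        return basis @ rng.normal(scale=3.0, size=2) + rng.normal(scale=0.1, size=5)

    n_out, eta = 2, 1e-3
    W = rng.normal(scale=0.1, size=(n_out, 5))              # feedforward weights (Hebbian)
    M = np.zeros((n_out, n_out))                            # lateral weights (anti-Hebbian)

    for _ in range(20000):
        x = next_sample()
        y = np.linalg.solve(np.eye(n_out) + M, W @ x)       # fixed point of y = Wx - My
        W += eta * (np.outer(y, x) - (y ** 2)[:, None] * W) # local, Oja-like Hebbian update
        M += eta * (np.outer(y, y) - (y ** 2)[:, None] * M) # local anti-Hebbian update
        np.fill_diagonal(M, 0.0)

    F = np.linalg.solve(np.eye(n_out) + M, W)               # effective input-output filter
    print(np.linalg.norm(F @ basis) / np.linalg.norm(F))    # ~1 if the principal subspace is found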

    A linear approach for sparse coding by a two-layer neural network

    Many approaches that transform classification problems from non-linear to linear by feature transformation have recently been presented in the literature. These notably include sparse coding methods and deep neural networks. However, many of these approaches require the repeated application of a learning process upon the presentation of unseen input vectors, or else involve large numbers of parameters and hyper-parameters that must be chosen through cross-validation, which increases running time dramatically. In this paper, we propose and experimentally investigate a new approach intended to overcome both kinds of limitation. The proposed approach makes use of a linear auto-associative network (called SCNN) with just one hidden layer. The combination of this architecture with a specific error function to be minimized enables one to learn a linear encoder that computes a sparse code as similar as possible to the sparse coding obtained by re-training the neural network. Importantly, the linearity of SCNN and the choice of the error function allow one to achieve reduced running time in the learning phase. The proposed architecture is evaluated on two standard machine learning tasks. Its performance is compared with that of recently proposed non-linear auto-associative neural networks. The overall results suggest that linear encoders can be profitably used to obtain sparse data representations in the context of machine learning problems, provided that an appropriate error function is used during the learning phase.
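
    As a caricature of the idea, the sketch below trains a one-hidden-layer linear auto-associator whose code is pushed toward sparsity by an L1 penalty, so that encoding new data is a single matrix multiplication. The paper's specific SCNN error function is not reproduced here; the reconstruction-plus-L1 objective is my own stand-in.

    import numpy as np

    rng = np.random.default_rng(4)
    X = rng.normal(size=(500, 20))                  # toy data, one input vector per row

    n_hid, lam, eta = 10, 0.1, 1e-3
    We = rng.normal(scale=0.1, size=(20, n_hid))    # linear encoder
    Wd = rng.normal(scale=0.1, size=(n_hid, 20))    # linear decoder

    for _ in range(500):
        H = X @ We                                   # code: one matmul, no iterative inference
        E = H @ Wd - X                               # reconstruction error
        # Gradients of  (1/N) * ( ||X We Wd - X||_F^2 + lam * sum|H| ),  L1 via subgradient.
        gWd = 2.0 * H.T @ E / len(X)
        gWe = (2.0 * X.T @ E @ Wd.T + lam * X.T @ np.sign(H)) / len(X)
        We -= eta * gWe
        Wd -= eta * gWd

    H = X @ We
    print(np.mean(np.abs(H)), np.mean((H @ Wd - X) ** 2))   # code magnitude vs. reconstruction error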

    "Parallel Training Considered Harmful?": Comparing series-parallel and parallel feedforward network training

    Neural network models for dynamic systems can be trained either in parallel or in series-parallel configurations. Influenced by early arguments, several papers justify the choice of the series-parallel rather than the parallel configuration, claiming that it has a lower computational cost, better stability properties during training, and more accurate results. Other published results, on the other hand, defend parallel training as being more robust and capable of yielding more accurate long-term predictions. The main contribution of this paper is a study comparing both methods under the same unified framework. We focus on three aspects: i) robustness of the estimation in the presence of noise; ii) computational cost; and iii) convergence. A unifying mathematical framework and simulation studies show situations where each training method provides better validation results, with parallel training performing better in what we believe are more realistic scenarios. An example using measured data seems to reinforce this claim. We also show, with a novel complexity analysis and numerical examples, that both methods have similar computational cost, although series-parallel training is more amenable to parallelization. Some informal discussion of stability and convergence properties is presented and explored in the examples.
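
    The distinction between the two configurations is easiest to see in how the regressor is built. The sketch below uses a first-order linear ARX toy model rather than a neural network (an intentional simplification): series-parallel prediction feeds the measured past output back into the model, while parallel (free-run) prediction feeds back the model's own previous output, which is what long-term simulation accuracy actually tests.

    import numpy as np

    rng = np.random.default_rng(5)

    # Toy first-order system:  y[k] = 0.8 y[k-1] + 0.5 u[k-1] + noise.
    N = 200
    u = rng.normal(size=N)
    y = np.zeros(N)
    for k in range(1, N):
        y[k] = 0.8 * y[k - 1] + 0.5 * u[k - 1] + 0.05 * rng.normal()

    def predict_series_parallel(theta, u, y):
        a, b = theta                              # regressor uses the measured past output
        return a * y[:-1] + b * u[:-1]

    def predict_parallel(theta, u, y0):
        a, b = theta                              # free run: feeds back its own past prediction
        yhat = np.zeros(len(u)); yhat[0] = y0
        for k in range(1, len(u)):
            yhat[k] = a * yhat[k - 1] + b * u[k - 1]
        return yhat

    theta = np.array([0.8, 0.5])                  # pretend these parameters were already estimated
    err_sp = y[1:] - predict_series_parallel(theta, u, y)
    err_p = y - predict_parallel(theta, u, y[0])
    print(np.mean(err_sp ** 2), np.mean(err_p ** 2))   # free-run error is the harsher long-term test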