32 research outputs found

    Combining Machine Learning and Physics to Understand Glassy Systems

    Our understanding of supercooled liquids and glasses has lagged significantly behind that of simple liquids and crystalline solids. This is due in part to the many possibly relevant degrees of freedom that arise from the disorder inherent to these systems, and in part to non-equilibrium effects that are difficult to treat in the standard framework of statistical physics. Together, these issues have left a field whose theories are under-constrained by experiment and whose fundamental questions remain unresolved. Mean field results have been successful in infinite dimensions, but they assume uniform local structure, and it is unclear to what extent they apply to realistic systems. At odds with this are theories premised on the existence of structural defects. Until recently, however, it has been impossible to find structural signatures that are predictive of dynamics. Here we summarize and recast the results of several recent papers offering a data-driven approach to building a phenomenological theory of disordered materials by combining machine learning with physical intuition.

    Mean Field Residual Networks: On the Edge of Chaos

    We study randomly initialized residual networks using mean field theory and the theory of difference equations. Classical feedforward neural networks, such as those with tanh activations, exhibit exponential behavior on average when propagating inputs forward or gradients backward. The exponential forward dynamics causes rapid collapse of the input space geometry, while the exponential backward dynamics causes drastic vanishing or exploding gradients. We show, in contrast, that by adding skip connections the network will, depending on the nonlinearity, adopt subexponential forward and backward dynamics, which in many cases are in fact polynomial. The exponents of these polynomials are obtained analytically, proved correct, and verified empirically. In terms of the "edge of chaos" hypothesis, these subexponential and polynomial laws allow residual networks to "hover over the boundary between stability and chaos," thus preserving the geometry of the input space and the flow of gradient information. In our experiments, for each activation function studied here, we initialize residual networks with different hyperparameters and train them on MNIST. Remarkably, our initialization-time theory can accurately predict test-time performance of these networks by tracking either the expected amount of gradient explosion or the expected squared distance between the images of two input vectors. Importantly, we show, theoretically as well as empirically, that common initializations such as the Xavier or He schemes are not optimal for residual networks, because the optimal initialization variances depend on the depth. Finally, we have made mathematical contributions by deriving several new identities for the kernels of powers of ReLU functions by relating them to the zeroth Bessel function of the second kind. Comment: NIPS 2017.
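
    To make the forward-dynamics claim concrete, here is a minimal numpy sketch (not the paper's code; the width, depth, nonlinearity, and weight scale are arbitrary illustrative choices) that propagates two nearby inputs through a random tanh feedforward network and through a random tanh residual network, tracking the squared distance between their images at each depth.

```python
# Toy numpy simulation: propagate two nearby inputs through a random tanh
# feedforward network and through a random tanh residual network, tracking the
# squared distance between their images at each depth.
import numpy as np

rng = np.random.default_rng(0)
width, depth, sigma_w = 500, 100, 0.9        # arbitrary illustrative settings

x1 = rng.normal(size=width)
x2 = x1 + 0.1 * rng.normal(size=width)       # a nearby second input

def propagate(residual):
    h1, h2 = x1.copy(), x2.copy()
    dists = []
    for _ in range(depth):
        W = rng.normal(scale=sigma_w / np.sqrt(width), size=(width, width))
        f1, f2 = np.tanh(W @ h1), np.tanh(W @ h2)
        if residual:
            h1, h2 = h1 + f1, h2 + f2        # skip connection: h <- h + f(h)
        else:
            h1, h2 = f1, f2                  # plain feedforward: h <- f(h)
        dists.append(np.sum((h1 - h2) ** 2))
    return np.array(dists)

# With these settings the feedforward distance shrinks exponentially with depth,
# while the residual distance changes far more slowly.
print(propagate(residual=False)[::20])
print(propagate(residual=True)[::20])
```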

    The Emergence of Spectral Universality in Deep Networks

    Recent work has shown that tight concentration of the entire spectrum of singular values of a deep network's input-output Jacobian around one at initialization can speed up learning by orders of magnitude. To guide important design choices, it is therefore essential to build a full theoretical understanding of the spectra of Jacobians at initialization. To this end, we leverage powerful tools from free probability theory to provide a detailed analytic understanding of how a deep network's Jacobian spectrum depends on various hyperparameters, including the nonlinearity, the weight and bias distributions, and the depth. For a variety of nonlinearities, our work reveals the emergence of new universal limiting spectral distributions that remain concentrated around one even as the depth goes to infinity. Comment: 17 pages, 4 figures. Appearing at the 21st International Conference on Artificial Intelligence and Statistics (AISTATS) 2018.
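
    The spectrum in question can be probed directly in a small simulation. The sketch below is not the paper's free-probability calculation; it uses Gaussian weights and an arbitrary width, depth, and weight scale, and simply builds the input-output Jacobian of a random tanh network as a product of per-layer Jacobians before printing summary statistics of its singular values.

```python
# Toy numpy sketch: build the input-output Jacobian of a random tanh network at
# initialization as a product of per-layer Jacobians D_l @ W_l and inspect its
# singular values.  Width, depth, and weight scale are arbitrary choices.
import numpy as np

rng = np.random.default_rng(1)
width, depth, sigma_w = 300, 30, 1.0

h = rng.normal(size=width)
J = np.eye(width)
for _ in range(depth):
    W = rng.normal(scale=sigma_w / np.sqrt(width), size=(width, width))
    h = np.tanh(W @ h)
    D = np.diag(1.0 - h ** 2)    # tanh'(preactivation) at this layer
    J = D @ W @ J                # chain rule: layer Jacobian times running product

sv = np.linalg.svd(J, compute_uv=False)
print("min / median / max singular value:", sv.min(), np.median(sv), sv.max())
# The paper characterizes how the shape of this spectrum depends on the weight
# distribution and nonlinearity, and identifies choices that keep it concentrated
# around one even as depth grows.
```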

    Predicting plasticity with soft vibrational modes: from dislocations to glasses

    We show that quasi-localized low-frequency modes in the vibrational spectrum can be used to construct soft spots, or regions vulnerable to rearrangement, which serve as a universal tool for identifying flow defects in solids. Soft spots encode not only spatial information, via their location, but also directional information, via directors for the particles within each soft spot. Single crystals with isolated dislocations exhibit low-frequency phonon modes that localize at the core, and their polarization pattern predicts the motion of atoms during elementary dislocation glide in exquisite detail. Even in polycrystals and disordered solids, we find that the directors associated with particles in soft spots are highly correlated with the direction of particle displacements in rearrangements.
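
    The basic recipe described here, diagonalize the system's dynamical matrix, keep the lowest-frequency nontrivial modes, and read off per-particle polarization vectors, can be sketched on a toy system. The example below uses a hypothetical 2D network of unstressed harmonic springs, not the crystals, polycrystals, or glasses studied in the paper.

```python
# Toy sketch on a hypothetical 2D harmonic spring network: assemble the Hessian,
# diagonalize it, and rank particles by their polarization in the lowest-frequency
# nontrivial modes -- candidate "soft spots" with directors given by the mode vectors.
import numpy as np

rng = np.random.default_rng(2)
N, cutoff, k = 60, 1.2, 1.0
pos = rng.uniform(0.0, 6.0, size=(N, 2))              # random particle positions

H = np.zeros((2 * N, 2 * N))
for i in range(N):
    for j in range(i + 1, N):
        d = pos[j] - pos[i]
        r = np.linalg.norm(d)
        if r < cutoff:
            n = d / r
            block = k * np.outer(n, n)                 # unstressed-spring bond Hessian
            H[2*i:2*i+2, 2*i:2*i+2] += block
            H[2*j:2*j+2, 2*j:2*j+2] += block
            H[2*i:2*i+2, 2*j:2*j+2] -= block
            H[2*j:2*j+2, 2*i:2*i+2] -= block

w, v = np.linalg.eigh(H)
modes = v[:, w > 1e-8][:, :5]                          # a few lowest nontrivial modes
polarization = np.linalg.norm(modes.reshape(N, 2, -1), axis=1).sum(axis=1)
print("candidate soft-spot particles:", np.argsort(polarization)[-10:])
```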

    Deep equilibrium networks are sensitive to initialization statistics

    Deep equilibrium networks (DEQs) are a promising way to construct models that trade off memory for compute. However, theoretical understanding of these models still lags behind that of traditional networks, in part because of the repeated application of a single set of weights. We show that DEQs are sensitive to the higher-order statistics of the matrix families from which they are initialized. In particular, initializing with orthogonal or symmetric matrices allows for greater stability in training. This gives a practical prescription for initializations that allow training over a broader range of initial weight scales.
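
    A toy numpy sketch of the effect being described (the DEQ-style cell, dimensions, and weight scale below are illustrative choices, not the models in the paper): iterate the fixed-point map z <- tanh(Wz + x) and compare the residuals when W is drawn from a Gaussian ensemble versus a scaled orthogonal ensemble.

```python
# Toy numpy sketch: iterate the DEQ-style fixed point z <- tanh(W z + x) and compare
# Gaussian versus scaled-orthogonal weight matrices at the same overall scale.
import numpy as np

rng = np.random.default_rng(3)
n, scale, steps = 200, 0.9, 200
x = rng.normal(size=n)

def final_residual(W):
    z = np.zeros(n)
    for _ in range(steps):
        z_new = np.tanh(W @ z + x)
        res = np.linalg.norm(z_new - z)
        z = z_new
    return res

W_gauss = rng.normal(scale=scale / np.sqrt(n), size=(n, n))   # i.i.d. Gaussian ensemble
Q, _ = np.linalg.qr(rng.normal(size=(n, n)))
W_orth = scale * Q                                            # scaled orthogonal ensemble

print("Gaussian init, final residual:  ", final_residual(W_gauss))
print("orthogonal init, final residual:", final_residual(W_orth))
# The paper's claim is that such higher-order statistics of the weight ensemble
# (orthogonal or symmetric versus i.i.d. Gaussian) control the stability of DEQs,
# especially as the initial weight scale is increased.
```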

    Disentangling Trainability and Generalization in Deep Neural Networks

    A longstanding goal in the theory of deep learning is to characterize the conditions under which a given neural network architecture will be trainable and, if so, how well it might generalize to unseen data. In this work, we provide such a characterization in the limit of very wide and very deep networks, for which the analysis simplifies considerably. For wide networks, the trajectory under gradient descent is governed by the Neural Tangent Kernel (NTK), and for deep networks the NTK itself maintains only weak data dependence. By analyzing the spectrum of the NTK, we formulate necessary conditions for trainability and generalization across a range of architectures, including Fully Connected Networks (FCNs) and Convolutional Neural Networks (CNNs). We identify large regions of hyperparameter space for which networks can memorize the training set but completely fail to generalize. We find that CNNs without global average pooling behave almost identically to FCNs, but that CNNs with pooling have markedly different and often better generalization performance. These theoretical results are corroborated experimentally on CIFAR10 for a variety of network architectures, and we include a Colab notebook that reproduces the essential results of the paper. Comment: 22 pages, 3 figures, ICML 2020. Associated Colab notebook at https://colab.research.google.com/github/google/neural-tangents/blob/master/notebooks/Disentangling_Trainability_and_Generalization.ipynb
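
    The abstract points to a Colab notebook built on the open-source neural-tangents library; the sketch below follows the same pattern, computing the analytic infinite-width NTK of a small fully connected architecture on random inputs and inspecting its spectrum. The widths, depth, and weight/bias standard deviations are arbitrary choices, not the paper's settings.

```python
# Sketch using the open-source neural-tangents library (as in the linked Colab):
# compute the analytic infinite-width NTK of a small fully connected architecture
# on random inputs and inspect its spectrum.
import numpy as np
from jax import random
from neural_tangents import stax

_, _, kernel_fn = stax.serial(
    stax.Dense(512, W_std=1.5, b_std=0.05), stax.Erf(),
    stax.Dense(512, W_std=1.5, b_std=0.05), stax.Erf(),
    stax.Dense(1, W_std=1.5, b_std=0.05),
)

x = random.normal(random.PRNGKey(0), (64, 16))   # 64 random "training" inputs
ntk = kernel_fn(x, x, 'ntk')                     # analytic infinite-width NTK
eigs = np.linalg.eigvalsh(np.array(ntk))
print("NTK condition number:", eigs[-1] / eigs[0])
# The paper ties trainability and generalization of wide, deep networks to the
# structure of this spectrum as depth and hyperparameters vary.
```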

    Dynamical Isometry and a Mean Field Theory of RNNs: Gating Enables Signal Propagation in Recurrent Neural Networks

    Recurrent neural networks have gained widespread use in modeling sequence data across various domains. While many successful recurrent architectures employ a notion of gating, the exact mechanism that enables such remarkable performance is not well understood. We develop a theory for signal propagation in recurrent networks after random initialization using a combination of mean field theory and random matrix theory. To simplify our discussion, we introduce a new RNN cell with a simple gating mechanism that we call the minimalRNN and compare it with vanilla RNNs. Our theory allows us to define a maximum timescale over which RNNs can remember an input. We show that this theory predicts trainability for both recurrent architectures, and that gated recurrent networks feature a much broader, more robust trainable region than vanilla RNNs, corroborating recent experimental findings. We then develop a closed-form critical initialization scheme that achieves dynamical isometry in both vanilla RNNs and minimalRNNs, and show that it leads to significantly improved training dynamics. Finally, we demonstrate that the minimalRNN achieves performance comparable to its more complex counterparts, such as LSTMs or GRUs, on a language modeling task. Comment: ICML 2018 Conference Proceedings.
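
    The kind of signal-propagation measurement this theory formalizes can be sketched numerically. The toy example below uses a plain vanilla RNN with arbitrary width and weight scale (it is not the minimalRNN cell or the paper's closed-form critical initialization): it propagates two slightly different hidden states through the same randomly initialized network and records how fast their distance decays, an empirical proxy for the memory timescale.

```python
# Toy numpy sketch with a plain vanilla RNN (not the paper's minimalRNN cell or its
# critical initialization): propagate two slightly different hidden states through
# the same randomly initialized network and record how fast their gap decays.
import numpy as np

rng = np.random.default_rng(4)
n, T, sigma_w = 256, 200, 1.1                    # arbitrary width, length, weight scale

W = rng.normal(scale=sigma_w / np.sqrt(n), size=(n, n))    # shared recurrent weights
U = rng.normal(scale=1.0 / np.sqrt(n), size=(n, n))        # shared input weights
xs = rng.normal(size=(T, n))                               # one fixed input sequence

h_a = rng.normal(size=n)
h_b = h_a + 1e-3 * rng.normal(size=n)            # small perturbation of the hidden state
gaps = []
for t in range(T):
    h_a = np.tanh(W @ h_a + U @ xs[t])
    h_b = np.tanh(W @ h_b + U @ xs[t])
    gaps.append(np.linalg.norm(h_a - h_b))

# The decay (or growth) rate of this gap is an empirical proxy for the memory
# timescale that the mean field theory computes analytically.
print(np.round(gaps[::40], 6))
```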

    Neural Message Passing for Quantum Chemistry

    Supervised learning on molecules has incredible potential to be useful in chemistry, drug discovery, and materials science. Luckily, several promising and closely related neural network models invariant to molecular symmetries have already been described in the literature. These models learn a message passing algorithm and aggregation procedure to compute a function of their entire input graph. At this point, the next step is to find a particularly effective variant of this general approach and apply it to chemical prediction benchmarks until we either solve them or reach the limits of the approach. In this paper, we reformulate existing models into a single common framework we call Message Passing Neural Networks (MPNNs) and explore additional novel variations within this framework. Using MPNNs, we demonstrate state-of-the-art results on an important molecular property prediction benchmark; these results are strong enough that we believe future work should focus on datasets with larger molecules or more accurate ground truth labels. Comment: 14 pages.
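
    A minimal sketch of the generic message passing pattern the abstract describes: in each round every node aggregates messages from its neighbors and updates its state, and a readout maps the final node states to a graph-level output. The specific message, update, and readout functions below are simple placeholders, not a model from the paper.

```python
# Minimal sketch of generic message passing on a small graph.  The message, update,
# and readout functions here are simple placeholders, not a model from the paper.
import numpy as np

rng = np.random.default_rng(5)
num_nodes, dim, rounds = 5, 8, 3
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]           # a small ring graph
h = rng.normal(size=(num_nodes, dim))                       # initial node features
W_msg = rng.normal(scale=1.0 / np.sqrt(dim), size=(dim, dim))
W_upd = rng.normal(scale=1.0 / np.sqrt(2 * dim), size=(2 * dim, dim))

for _ in range(rounds):
    messages = np.zeros_like(h)
    for v, w in edges:                                      # undirected: send both ways
        messages[v] += np.tanh(h[w] @ W_msg)                # message from w to v
        messages[w] += np.tanh(h[v] @ W_msg)                # message from v to w
    h = np.tanh(np.concatenate([h, messages], axis=1) @ W_upd)   # node-state update

graph_output = h.sum(axis=0)                                # sum readout over nodes
print(graph_output)
```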

    A structural approach to relaxation in glassy liquids

    When a liquid freezes, a change in the local atomic structure marks the transition to the crystal. When a liquid is cooled to form a glass, however, no noticeable structural change marks the glass transition. Indeed, characteristic features of glassy dynamics that appear below an onset temperature, T_0, are qualitatively captured by mean field theory, which assumes uniform local structure at all temperatures. Even studies of more realistic systems have found only weak correlations between structure and dynamics. This raises the question: is structure important to glassy dynamics in three dimensions? Here, we answer this question affirmatively by using machine learning methods to identify a new field, which we call softness, that characterizes local structure and is strongly correlated with rearrangement dynamics. We find that the onset of glassy dynamics at T_0 is marked by the onset of correlations between softness (i.e., structure) and dynamics. Moreover, we use softness to construct a simple model of slow glassy relaxation that is in excellent agreement with our simulation results, showing that a theory of the evolution of softness in time would constitute a theory of glassy dynamics.
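
    The supervised pipeline implied here can be sketched as follows, with synthetic placeholder data standing in for simulation snapshots and with the choice of descriptors and classifier made for illustration rather than taken verbatim from the paper: featurize each particle's local environment, label whether it soon rearranges, fit a linear classifier, and define softness as the signed distance to the decision boundary.

```python
# Sketch of the implied supervised pipeline, on synthetic placeholder data (the
# descriptors and classifier are illustrative assumptions): featurize each particle's
# local structure, label whether it soon rearranges, fit a linear classifier, and
# call the signed distance to the decision boundary "softness".
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(6)
num_particles, num_features = 1000, 20

X = rng.normal(size=(num_particles, num_features))     # placeholder structure descriptors
y = (X[:, 0] + 0.5 * rng.normal(size=num_particles) > 0).astype(int)   # placeholder labels

clf = LinearSVC(C=1.0, max_iter=10000).fit(X, y)
softness = clf.decision_function(X)     # signed distance to the hyperplane (up to scale)
print("rearranging particles with positive softness:", np.mean(softness[y == 1] > 0))
```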

    Stability of jammed packings II: the transverse length scale

    As a function of packing fraction at zero temperature and applied stress, an amorphous packing of spheres exhibits a jamming transition where the system is sensitive to boundary conditions even in the thermodynamic limit. Upon further compression, the system should become insensitive to boundary conditions provided it is sufficiently large. Here we explore the linear response to a large class of boundary perturbations in 2 and 3 dimensions. We consider each finite packing with periodic boundary conditions as the basis of an infinite square or cubic lattice and study the properties of vibrational modes at arbitrary wave vector. We find that the stability of such modes can be understood in terms of a competition between plane waves and the anomalous vibrational modes associated with the jamming transition; infinitesimal boundary perturbations become irrelevant for systems that are larger than a length scale that characterizes the transverse excitations. This previously identified length diverges at the jamming transition. Comment: 8 pages, 5 figures.
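
    The construction described, treating a finite periodic packing as the unit cell of an infinite lattice and studying vibrational modes at arbitrary wave vector, can be illustrated on a toy 1D chain of unit masses and springs (not the paper's 2D or 3D sphere packings), where the Bloch dynamical matrix and its dispersion are simple.

```python
# Toy 1D sketch (not the paper's 2D/3D sphere packings): a periodic chain of unit
# masses and unit springs treated as the unit cell of an infinite lattice.  The Bloch
# dynamical matrix D(q) gives the vibrational modes at arbitrary wave vector; here q
# is the Bloch phase accumulated across one periodic cell.
import numpy as np

N, k_spring = 8, 1.0                                # particles per cell, spring constant

def bloch_dynamical_matrix(q):
    D = np.zeros((N, N), dtype=complex)
    for i in range(N):
        D[i, i] += 2 * k_spring
        D[i, (i + 1) % N] -= k_spring * (np.exp(1j * q) if i == N - 1 else 1.0)
        D[i, (i - 1) % N] -= k_spring * (np.exp(-1j * q) if i == 0 else 1.0)
    return D

for q in (0.0, 0.1, np.pi):
    freqs = np.sqrt(np.abs(np.linalg.eigvalsh(bloch_dynamical_matrix(q))))
    print(f"q = {q:.2f}, lowest frequencies:", np.round(freqs[:3], 4))
# At q = 0 the chain has a zero (translation) mode; at small q the lowest branch is
# the acoustic plane wave whose stability the paper weighs against the anomalous
# modes associated with the jamming transition.
```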