
    Implicit Gradient Regularization

    Gradient descent can be surprisingly good at optimizing deep neural networks without overfitting and without explicit regularization. We find that the discrete steps of gradient descent implicitly regularize models by penalizing gradient descent trajectories that have large loss gradients. We call this Implicit Gradient Regularization (IGR) and we use backward error analysis to calculate the size of this regularization. We confirm empirically that implicit gradient regularization biases gradient descent toward flat minima, where test errors are small and solutions are robust to noisy parameter perturbations. Furthermore, we demonstrate that the implicit gradient regularization term can be used as an explicit regularizer, allowing us to control this gradient regularization directly. More broadly, our work indicates that backward error analysis is a useful theoretical approach to the perennial question of how learning rate, model size, and parameter regularization interact to determine the properties of overparameterized models optimized with gradient descent.
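    As a concrete reading of the last point, a gradient-norm penalty of the kind described above can be added to a loss by hand. The sketch below does this for a toy linear least-squares problem in plain NumPy; the penalty coefficient `lam`, the analytic-Hessian shortcut, and all names are illustrative assumptions, not the paper's implementation.

```python
# Minimal sketch (not the authors' code): using a gradient-norm penalty
# as an explicit regularizer on a linear least-squares problem.
# Assumption: the regularized loss is L(theta) + lam * ||grad L(theta)||^2,
# with lam a hand-picked coefficient.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))          # toy design matrix
y = X @ rng.normal(size=5) + 0.1 * rng.normal(size=100)

def loss_grad(theta):
    """Gradient of the mean-squared-error loss L(theta) = ||X theta - y||^2 / n."""
    return 2.0 * X.T @ (X @ theta - y) / len(y)

hessian = 2.0 * X.T @ X / len(y)       # constant Hessian for a quadratic loss

def regularized_grad(theta, lam):
    """Gradient of L(theta) + lam * ||grad L(theta)||^2.
    For this quadratic loss the penalty's gradient is 2 * lam * H @ grad L."""
    g = loss_grad(theta)
    return g + 2.0 * lam * hessian @ g

theta = np.zeros(5)
lr, lam = 0.05, 0.01                   # learning rate and penalty strength (illustrative)
for _ in range(500):
    theta -= lr * regularized_grad(theta, lam)

print("final gradient norm:", np.linalg.norm(loss_grad(theta)))
```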

    Why neural networks find simple solutions: the many regularizers of geometric complexity

    In many contexts, simpler models are preferable to more complex models, and the control of this model complexity is the goal of many methods in machine learning such as regularization, hyperparameter tuning, and architecture design. In deep learning, it has been difficult to understand the underlying mechanisms of complexity control, since many traditional measures are not naturally suitable for deep neural networks. Here we develop the notion of geometric complexity, which is a measure of the variability of the model function, computed using a discrete Dirichlet energy. Using a combination of theoretical arguments and empirical results, we show that many common training heuristics such as parameter norm regularization, spectral norm regularization, flatness regularization, implicit gradient regularization, noise regularization and the choice of parameter initialization all act to control geometric complexity, providing a unifying framework in which to characterize the behavior of deep learning models. Comment: Accepted as a NeurIPS 2022 paper.
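    One way to read the discrete Dirichlet energy mentioned above is as the average squared Frobenius norm of the model's input Jacobian over a data sample. The sketch below estimates such a quantity with finite differences; the toy model, the normalization, and all names are assumptions for illustration rather than the paper's code.

```python
# Minimal sketch (an interpretation, not the paper's code): estimating a
# discrete Dirichlet energy of a model function over a data sample as
# (1/N) * sum_i ||J_x f(x_i)||_F^2, with the input Jacobian approximated
# by central finite differences.
import numpy as np

def model(x):
    """Toy model function R^2 -> R^1 standing in for a trained network."""
    W1 = np.array([[1.0, -0.5], [0.3, 0.8], [0.2, 0.1]])
    W2 = np.array([[0.7, -0.2, 0.4]])
    return W2 @ np.tanh(W1 @ x)

def dirichlet_energy(f, xs, eps=1e-5):
    """Average squared Frobenius norm of the input Jacobian over the sample xs."""
    total = 0.0
    for x in xs:
        jac = np.zeros((f(x).size, x.size))
        for j in range(x.size):
            e = np.zeros_like(x)
            e[j] = eps
            jac[:, j] = (f(x + e) - f(x - e)) / (2 * eps)
        total += np.sum(jac ** 2)
    return total / len(xs)

rng = np.random.default_rng(1)
sample = rng.normal(size=(50, 2))
print("geometric complexity (estimate):", dirichlet_energy(model, sample))
```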

    Distinct Quantum States Can Be Compatible with a Single State of Reality

    Perhaps the quantum state represents information about reality, and not reality directly. Wave function collapse is then possibly no more mysterious than a Bayesian update of a probability distribution given new data. We consider models for quantum systems with measurement outcomes determined by an underlying physical state of the system, but where several quantum states are consistent with a single underlying state---i.e., probability distributions for distinct quantum states overlap. Significantly, we demonstrate by example that additional assumptions are always necessary to rule out such a model. Comment: 5 pages, 2 figures.
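    For readers who want the overlap condition in symbols, the following is a compact statement in standard ontological-model notation; the symbols (an underlying state λ, preparation distributions μ_ψ, response functions ξ) are ours and need not match the paper's.

```latex
% Compact statement of the overlap condition (our notation, not necessarily
% the paper's): preparations of distinct quantum states induce distributions
% over the underlying state \lambda that overlap on a set of nonzero measure,
\[
  \exists\, \Lambda_0 \subseteq \Lambda,\ \mu(\Lambda_0) > 0:\qquad
  \mu_\psi(\lambda) > 0 \ \text{and}\ \mu_\phi(\lambda) > 0
  \quad \text{for all } \lambda \in \Lambda_0 ,
\]
% while the usual quantum statistics are still reproduced by response functions,
\[
  \Pr(k \mid \psi, M)
  \;=\; \int_\Lambda \xi(k \mid \lambda, M)\, \mu_\psi(\lambda)\, d\lambda
  \;=\; \lvert \langle k \vert \psi \rangle \rvert^2 .
\]
```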

    Pseudospectral Calculation of the Wavefunction of Helium and the Negative Hydrogen Ion

    We study the numerical solution of the non-relativistic Schr\"{o}dinger equation for two-electron atoms in ground and excited S-states using pseudospectral (PS) methods of calculation. The calculation achieves convergence rates for the energy, Cauchy error in the wavefunction, and variance in local energy that are exponentially fast for all practical purposes. The method requires three separate subdomains to handle the wavefunction's cusp-like behavior near the two-particle coalescences. The use of three subdomains is essential to maintaining exponential convergence. A comparison of several different treatments of the cusps and the semi-infinite domain suggests that the simplest prescription is sufficient. For many purposes it proves unnecessary to handle the logarithmic behavior near the three-particle coalescence in a special way. The PS method has many virtues: no explicit assumptions need be made about the asymptotic behavior of the wavefunction near cusps or at large distances, the local energy is exactly equal to the calculated global energy at all collocation points, local errors go down everywhere with increasing resolution, the effective basis using Chebyshev polynomials is complete and simple, and the method is easily extensible to other bound states. This study serves as a proof-of-principle of the method for more general two- and possibly three-electron applications. Comment: 23 pages, 20 figures, 2 tables. Final refereed version: some references added, some stylistic changes, added a paragraph to the matrix methods section, added the last sentence to the abstract.
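    The two-electron calculation itself is beyond a short example, but the core pseudospectral ingredient, Chebyshev collocation with boundary conditions imposed by deleting boundary rows and columns, can be illustrated in one dimension. The sketch below solves a particle-in-a-box eigenproblem on [-1, 1] and compares against the exact spectrum; it uses the standard Chebyshev differentiation matrix and is not the authors' code.

```python
# Minimal 1D illustration of Chebyshev pseudospectral collocation
# (not the paper's two-electron solver).
import numpy as np

def cheb(N):
    """Chebyshev collocation points and differentiation matrix on [-1, 1]."""
    x = np.cos(np.pi * np.arange(N + 1) / N)
    c = np.hstack([2.0, np.ones(N - 1), 2.0]) * (-1.0) ** np.arange(N + 1)
    X = np.tile(x, (N + 1, 1)).T
    dX = X - X.T
    D = np.outer(c, 1.0 / c) / (dX + np.eye(N + 1))
    D -= np.diag(D.sum(axis=1))          # negative-sum trick for the diagonal
    return D, x

# Particle in a box on [-1, 1]: -u'' = E u with u(+-1) = 0.
N = 32
D, x = cheb(N)
D2 = D @ D
A = -D2[1:-1, 1:-1]                      # Dirichlet BCs: drop boundary rows/columns
E = np.sort(np.linalg.eigvals(A).real)
exact = (np.arange(1, 6) * np.pi / 2.0) ** 2
print("computed lowest eigenvalues:", E[:5])
print("exact eigenvalues:          ", exact)
```

    The lowest few computed eigenvalues agree with the exact values to near machine precision, which is the exponential convergence the abstract refers to; in the paper this machinery is applied per subdomain so that the cusp-like behavior sits on subdomain boundaries.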