Proximal Mean Field Learning in Shallow Neural Networks
We propose a custom learning algorithm for shallow over-parameterized neural
networks, i.e., networks with a single hidden layer of infinite width. The
infinite width of the hidden layer serves as an abstraction for the
over-parameterization. Building on the recent mean field interpretations of
learning dynamics in shallow neural networks, we realize mean field learning as
a computational algorithm, rather than as an analytical tool. Specifically, we
design a Sinkhorn regularized proximal algorithm to approximate the
distributional flow for the learning dynamics over weighted point clouds. In
this setting, a contractive fixed point recursion computes the time-varying
weights, numerically realizing the interacting Wasserstein gradient flow of the
parameter distribution supported over the neuronal ensemble. An appealing
aspect of the proposed algorithm is that the measure-valued recursions allow
meshless computation. We demonstrate the proposed computational framework of
interacting weighted particle evolution on binary and multi-class
classification. Our algorithm performs gradient descent on the free energy
associated with the risk functional.
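As a concrete illustration of such a measure-valued proximal recursion, the sketch below performs one Sinkhorn-regularized JKO step over a weighted one-dimensional point cloud, assuming the simple free energy F(rho) = <V, rho> + beta^{-1} <rho, log rho> in place of the paper's neural-network risk functional; the function name, defaults, and fixed-point form are illustrative assumptions, not the authors' exact algorithm.

    import numpy as np

    def sinkhorn_prox_step(x, mu, V, h=1e-2, eps=1e-2, beta=1.0, iters=500):
        # One entropic proximal (JKO) step for the free energy
        #   F(rho) = <V(x), rho> + (1/beta) <rho, log rho>
        # over a fixed 1-D point cloud x carrying probability weights mu.
        C = (x[:, None] - x[None, :]) ** 2      # pairwise squared distances
        K = np.exp(-C / eps)                    # Gibbs kernel of the Sinkhorn term
        gamma = h / (beta * eps)
        c = np.exp(-h * V(x) / eps - gamma)     # factor contributed by F
        w = np.ones_like(mu)
        for _ in range(iters):                  # contractive fixed-point recursion
            u = mu / (K @ w)
            w = (c / (K.T @ u) ** gamma) ** (1.0 / (1.0 + gamma))
        rho = w * (K.T @ (mu / (K @ w)))        # updated weights
        return rho / rho.sum()                  # renormalize for numerical safety

    # Example: repeated steps drive the weights toward the Gibbs measure
    # proportional to exp(-beta * V), all without a mesh.
    x = np.linspace(-3.0, 3.0, 200)
    mu = np.full(200, 1.0 / 200)
    for _ in range(50):
        mu = sinkhorn_prox_step(x, mu, V=lambda t: t ** 2 / 2)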
Gradient Flows for Sampling: Mean-Field Models, Gaussian Approximations and Affine Invariance
Sampling a probability distribution with an unknown normalization constant is
a fundamental problem in computational science and engineering. This task may
be cast as an optimization problem over all probability measures, and an
initial distribution can be evolved to the desired minimizer dynamically via
gradient flows. Mean-field models, whose law is governed by the gradient flow
in the space of probability measures, may also be identified; particle
approximations of these mean-field models form the basis of algorithms. The
gradient flow approach is also the basis of algorithms for variational
inference, in which the optimization is performed over a parameterized family
of probability distributions such as Gaussians, and the underlying gradient
flow is restricted to the parameterized family.
By choosing different energy functionals and metrics for the gradient flow,
different algorithms with different convergence properties arise. In this
paper, we concentrate on the Kullback-Leibler divergence after showing that, up
to scaling, it has the unique property that the gradient flows resulting from
this choice of energy do not depend on the normalization constant. For the
metrics, we focus on variants of the Fisher-Rao, Wasserstein, and Stein
metrics; we introduce the affine invariance property for gradient flows, and
their corresponding mean-field models, determine whether a given metric leads
to affine invariance, and modify it to make it affine invariant if it does not.
We study the resulting gradient flows in both probability density space and
Gaussian space. The flow in the Gaussian space may be understood as a Gaussian
approximation of the flow. We demonstrate that the Gaussian approximations
obtained via the metric and via moment closure coincide, establish connections
between them, and study their long-time convergence properties, showing the
advantages of affine invariance.
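For the Wasserstein metric in particular, the mean-field gradient flow of the KL divergence is approximated by particles following overdamped Langevin dynamics. The sketch below, with illustrative names and defaults, makes the normalization-constant independence concrete: only the gradient of the unnormalized log-density is needed.

    import numpy as np

    def langevin_particles(grad_log_pi, n=1000, d=2, steps=5000, dt=1e-3, seed=0):
        # Each particle follows dX = grad log pi(X) dt + sqrt(2) dW, the
        # particle approximation of the Wasserstein gradient flow of
        # KL(rho || pi); the normalization constant of pi drops out
        # because only grad log pi enters the drift.
        rng = np.random.default_rng(seed)
        X = rng.standard_normal((n, d))         # initial ensemble
        for _ in range(steps):
            X += dt * grad_log_pi(X) + np.sqrt(2 * dt) * rng.standard_normal(X.shape)
        return X

    # Example: standard Gaussian target, grad log pi(x) = -x
    samples = langevin_particles(lambda X: -X, steps=2000)

Preconditioning both the drift and the noise by the ensemble covariance gives an affine-invariant variant of the same particle system.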
Resolving transition metal chemical space: feature selection for machine learning and structure-property relationships
Machine learning (ML) of quantum mechanical properties shows promise for
accelerating chemical discovery. For transition metal chemistry where accurate
calculations are computationally costly and available training data sets are
small, the molecular representation becomes a critical ingredient in ML model
predictive accuracy. We introduce a series of revised autocorrelation functions
(RACs) that encode relationships between the heuristic atomic properties (e.g.,
size, connectivity, and electronegativity) on a molecular graph. We alter the
starting point, scope, and nature of the quantities evaluated in standard ACs
to make these RACs amenable to inorganic chemistry. On an organic molecule set,
we first demonstrate that standard ACs outperform other presently available
topological descriptors for ML model training, with mean
unsigned errors (MUEs) for atomization energies on set-aside test molecules as
low as 6 kcal/mol. For inorganic chemistry, our RACs yield 1 kcal/mol ML MUEs
for spin-state splitting on set-aside test molecules, compared to 15-20x
higher errors from feature sets that encode whole-molecule structural
information. Systematic feature selection methods including univariate
filtering, recursive feature elimination, and direct optimization (e.g., random
forest and LASSO) are compared. Random-forest- or LASSO-selected subsets 4-5x
smaller than RAC-155 produce sub- to 1-kcal/mol spin-splitting MUEs, with good
transferability to metal-ligand bond length prediction (0.004-0.005 Å MUE) and
redox potential on a smaller data set (0.2-0.3 eV MUE). Evaluation of feature
selection results across property sets reveals the relative importance of
local, electronic descriptors (e.g., electronegativity, atomic number) in
spin-splitting and distal, steric effects in redox potential and bond lengths.
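As a concrete sketch of the descriptor construction (requiring NumPy and SciPy; the function name, start-atom convention, and depth cutoff are illustrative assumptions, not a reproduction of the RAC-155 set), product- and difference-style autocorrelations can be computed directly from the molecular graph:

    import numpy as np
    from scipy.sparse.csgraph import shortest_path

    def racs(A, props, start=0, max_depth=3):
        # For each per-atom property vector P and graph distance d, compute:
        #   full-scope product ACs:  sum over pairs (i, j) with d(i, j) = d of P_i * P_j
        #   start-atom (e.g., metal-centered) ACs:  P_start * P_j and P_start - P_j
        #   summed over atoms j at distance d from the start atom.
        D = shortest_path(np.asarray(A, dtype=float), unweighted=True)
        feats = {}
        for name, P in props.items():
            P = np.asarray(P, dtype=float)
            for d in range(max_depth + 1):
                pair_mask = (D == d)
                feats[f"{name}_full_prod_{d}"] = float((np.outer(P, P) * pair_mask).sum())
                ring = np.where(D[start] == d)[0]   # atoms at depth d from start
                feats[f"{name}_start_prod_{d}"] = float((P[start] * P[ring]).sum())
                feats[f"{name}_start_diff_{d}"] = float((P[start] - P[ring]).sum())
        return feats

    # Example: electronegativity and atomic number on a 4-atom chain
    A = np.array([[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]])
    feats = racs(A, {"chi": [3.0, 2.5, 2.5, 3.4], "Z": [8, 6, 6, 9]})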
Kohn-Sham theory with paramagnetic currents: compatibility and functional differentiability
Recent work has established Moreau-Yosida regularization as a mathematical
tool to achieve rigorous functional differentiability in density-functional
theory. In this article, we extend this tool to paramagnetic
current-density-functional theory, the most common density-functional framework
for magnetic field effects. The extension includes a well-defined Kohn-Sham
iteration scheme with a partial convergence result. To this end, we rely on a
formulation of Moreau-Yosida regularization for reflexive and strictly convex
function spaces. The optimal L^p-characterization of the paramagnetic current
density is derived from the N-representability conditions.
A crucial prerequisite for the convex formulation of paramagnetic
current-density-functional theory, termed compatibility between function spaces
for the particle density and the current density, is pointed out and analyzed.
Several results about compatible function spaces are given, including their
recursive construction. The regularized, exact functionals are calculated
numerically for a Kohn-Sham iteration on a quantum ring, illustrating their
performance for different regularization parameters.
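The mechanism behind this regularization is easy to see in a finite-dimensional toy. The sketch below (illustrative names and a brute-force 1-D grid minimization, not the paper's reflexive-Banach-space setting) computes the Moreau-Yosida envelope and its gradient, which exists even where the original function is nonsmooth.

    import numpy as np

    def moreau_envelope(f, x, lam=0.1, grid=None):
        # Moreau-Yosida regularization on a 1-D grid:
        #   f_lam(x) = min over y of f(y) + (x - y)^2 / (2 * lam),
        # which is differentiable even when f is not, with gradient
        # (x - prox_{lam f}(x)) / lam.
        y = np.linspace(x - 5.0, x + 5.0, 20001) if grid is None else grid
        vals = f(y) + (x - y) ** 2 / (2.0 * lam)
        j = np.argmin(vals)                     # grid minimizer = proximal point
        return vals[j], (x - y[j]) / lam        # envelope value and its gradient

    # Example: f = |.| is nonsmooth at 0, but its envelope (the Huber
    # function) is smooth; for |x| <= lam the gradient is x / lam.
    val, grad = moreau_envelope(np.abs, 0.05, lam=0.1)   # grad == 0.5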