Topological Superconducting Phases of Weakly Coupled Quantum Wires
An array of quantum wires is a natural starting point in realizing
two-dimensional topological phases. We study a system of weakly coupled quantum
wires with Rashba spin-orbit coupling, proximity coupled to a conventional
s-wave superconductor. A variety of topological phases are found in this model.
These phases are characterized by "Strong" and "Weak" topological invariants
that capture the appearance of mid-gap Majorana modes (either chiral or
non-chiral) on edges along and perpendicular to the wires. In particular, a
phase with a single chiral Majorana edge mode (analogous to a $p_x + ip_y$
superconductor) can be realized. At special values of the magnetic field and
chemical potential, this edge mode is almost completely localized at the
outermost wires. In addition, a phase with two co-propagating chiral edge modes
is observed. We also consider ways to distinguish experimentally between the
different phases in tunneling experiments.
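The full coupled-wire model is beyond a short snippet, but its building block is standard. Below is a minimal sketch (Python/NumPy, illustrative parameters, not the paper's code) of the Bogoliubov-de Gennes spectrum of a single Rashba wire with a Zeeman field B and proximity-induced s-wave pairing Delta; the bulk gap closes at the topological transition B^2 = mu^2 + Delta^2, the criterion for Majorana end modes in one wire.

```python
# Minimal single-wire sketch (illustrative parameters, not the paper's
# multi-wire model): BdG Hamiltonian of a Rashba wire with Zeeman field B
# and proximity-induced s-wave pairing Delta.
import numpy as np

s0 = np.eye(2, dtype=complex)
sx = np.array([[0, 1], [1, 0]], dtype=complex)
sz = np.array([[1, 0], [0, -1]], dtype=complex)

def bdg(k, mu, alpha, B, Delta, m=0.5):
    """H(k) = xi(k) tau_z + alpha*k sigma_z tau_z + B sigma_x + Delta tau_x (4x4)."""
    xi = k**2 / (2 * m) - mu
    return (np.kron(sz, xi * s0 + alpha * k * sz)  # kinetic + Rashba in the tau_z block
            + np.kron(s0, B * sx)                  # Zeeman field
            + np.kron(sx, Delta * s0))             # s-wave pairing in the tau_x block

def bulk_gap(mu, alpha, B, Delta):
    """Minimum |E(k)| over k: vanishes at the transition B^2 = mu^2 + Delta^2."""
    return min(np.abs(np.linalg.eigvalsh(bdg(k, mu, alpha, B, Delta))).min()
               for k in np.linspace(-3, 3, 2001))

mu, alpha, Delta = 0.0, 1.0, 0.5
for B in (0.2, 0.5, 0.8):  # with mu = 0, the gap closes exactly at B = Delta = 0.5
    print(f"B = {B:.1f}: bulk gap = {bulk_gap(mu, alpha, B, Delta):.4f}")
```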
Optimal minimax rate of learning interaction kernels
Nonparametric estimation of nonlocal interaction kernels is crucial in
various applications involving interacting particle systems. The inference
challenge, situated at the nexus of statistical learning and inverse problems,
comes from the nonlocal dependency. A central question is whether the optimal
minimax rate of convergence for this problem aligns with the rate of
$M^{-\frac{2\beta}{2\beta+1}}$ in classical nonparametric regression, where $M$
is the sample size and $\beta$ represents the smoothness exponent of the radial
kernel. Our study confirms this alignment for systems with a finite number of
particles.
We introduce a tamed least squares estimator (tLSE) that attains the optimal
convergence rate for a broad class of exchangeable distributions. The tLSE
bridges the smallest eigenvalue of random matrices and Sobolev embedding. This
estimator relies on nonasymptotic estimates for the left tail probability of
the smallest eigenvalue of the normal matrix. The lower minimax rate is derived
using the Fano-Tsybakov hypothesis testing method. Our findings reveal that
provided the inverse problem in the large sample limit satisfies a coercivity
condition, the left tail probability does not alter the bias-variance tradeoff,
and the optimal minimax rate remains intact. Our tLSE method offers a
straightforward approach for establishing the optimal minimax rate for models
with either local or nonlocal dependency.
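As a rough illustration of the taming idea, here is a schematic sketch in a plain regression setting (an assumed simplification: the paper's setting is the interacting-particle inverse problem, and the polynomial basis, threshold, and target kernel below are illustrative). The least squares estimator is replaced by the zero estimator whenever the smallest eigenvalue of the normal matrix falls below a threshold, which is how the left tail of that eigenvalue is kept from inflating the variance.

```python
# Schematic "tamed" least squares (assumed simplification of the tLSE idea):
# refuse to invert the normal matrix when its smallest eigenvalue is too small.
import numpy as np

def tamed_lse(X, y, tau):
    """Least squares, 'tamed': return the zero estimator if lambda_min(X^T X / n) < tau."""
    n = X.shape[0]
    A = X.T @ X / n                      # normal matrix
    if np.linalg.eigvalsh(A)[0] < tau:   # left-tail event: fall back to zero
        return np.zeros(X.shape[1])
    return np.linalg.solve(A, X.T @ y / n)

rng = np.random.default_rng(0)
n, p = 200, 5
r = rng.uniform(0.1, 2.0, n)                 # stand-in for pairwise-distance samples
X = np.vander(r, p, increasing=True)         # polynomial basis (illustrative choice)
phi = lambda s: np.exp(-s)                   # radial "interaction kernel" to recover
y = phi(r) + 0.05 * rng.standard_normal(n)

coef = tamed_lse(X, y, tau=1e-8)
grid = np.linspace(0.2, 1.8, 50)
print("max abs error:", np.abs(np.vander(grid, p, increasing=True) @ coef - phi(grid)).max())
```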
Separation of Scales and a Thermodynamic Description of Feature Learning in Some CNNs
Deep neural networks (DNNs) are powerful tools for compressing and distilling
information. Their scale and complexity, often involving billions of
inter-dependent parameters, render direct microscopic analysis difficult. Under
such circumstances, a common strategy is to identify slow variables that
average out the erratic behavior of the fast microscopic variables. Here, we
identify a similar separation of scales occurring in fully trained finitely
over-parameterized deep convolutional neural networks (CNNs) and fully
connected networks (FCNs). Specifically, we show that DNN layers couple only
through the second moment (kernels) of their activations and pre-activations.
Moreover, the latter fluctuates in a nearly Gaussian manner. For infinite width
DNNs, these kernels are inert, while for finite ones they adapt to the data and
yield a tractable data-aware Gaussian Process. The resulting thermodynamic
theory of deep learning yields accurate predictions in various settings. In
addition, it provides new ways of analyzing and understanding DNNs in general.
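A minimal sketch of the objects this description tracks, under assumed settings (a random, untrained fully connected network with a tanh nonlinearity; widths and depth are illustrative): the per-layer second-moment kernels of the pre-activations, through which, per the result above, the layers couple.

```python
# Per-layer second-moment kernels K^(l)[x, x'] of pre-activations in a random
# FCN (untrained; widths, depth, and the tanh nonlinearity are assumptions).
import numpy as np

rng = np.random.default_rng(0)
n_samples, d_in, width, depth = 64, 16, 512, 3
X = rng.standard_normal((n_samples, d_in)) / np.sqrt(d_in)

h, kernels = X, []
for l in range(depth):
    W = rng.standard_normal((h.shape[1], width)) / np.sqrt(h.shape[1])
    pre = h @ W                              # pre-activations of layer l
    kernels.append(pre @ pre.T / width)      # data-by-data kernel K^(l)
    h = np.tanh(pre)                         # activations fed to the next layer

# At infinite width these kernels are deterministic ("inert"); at finite width
# they fluctuate and, after training, adapt to the data.
for l, K in enumerate(kernels):
    print(f"layer {l}: mean diagonal of K = {K.diagonal().mean():.3f}")
```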
Spectral-Bias and Kernel-Task Alignment in Physically Informed Neural Networks
Physically informed neural networks (PINNs) are a promising emerging method
for solving differential equations. As in many other deep learning approaches,
the choice of PINN design and training protocol requires careful craftsmanship.
Here, we suggest a comprehensive theoretical framework that sheds light on this
important problem. Leveraging an equivalence between infinitely
over-parameterized neural networks and Gaussian process regression (GPR), we
derive an integro-differential equation that governs PINN prediction in the
large data-set limit -- the Neurally-Informed Equation (NIE). This equation
augments the original one by a kernel term reflecting architecture choices and
allows quantifying implicit bias induced by the network via a spectral
decomposition of the source term in the original differential equation.
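A minimal sketch of the spectral-bias mechanism in the plain GPR picture (ordinary function-value regression with an RBF kernel, not the full PINN/NIE setting; lengthscale and noise level are illustrative): the posterior mean filters each kernel eigenmode of the target by lambda_i / (lambda_i + sigma^2), so components aligned with small-eigenvalue modes are suppressed.

```python
# Spectral bias in plain GPR (illustrative kernel and noise level): on the
# training set the posterior mean is V diag(lam/(lam+sigma2)) V^T y, so each
# eigenmode of the target is retained by the factor lam/(lam+sigma2).
import numpy as np

rng = np.random.default_rng(0)
n, sigma2 = 200, 0.1
x = np.sort(rng.uniform(-1, 1, n))
K = np.exp(-0.5 * (x[:, None] - x[None, :])**2 / 0.1**2)  # RBF Gram matrix
lam, V = np.linalg.eigh(K)                                # ascending eigenvalues

y = np.sin(2 * np.pi * x) + np.sin(12 * np.pi * x)        # low- plus high-frequency target
coef = V.T @ y                                            # target in the kernel eigenbasis
retained = lam / (lam + sigma2)                           # per-mode filter
for i in (-1, -5, -20, -80):                              # from the top of the spectrum down
    print(f"lambda = {lam[i]:9.2e}, |<v, y>| = {abs(coef[i]):6.2f}, retained = {retained[i]:.3f}")
```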
Grokking as a First Order Phase Transition in Two Layer Networks
A key property of deep neural networks (DNNs) is their ability to learn new
features during training. This intriguing aspect of deep learning stands out
most clearly in recently reported Grokking phenomena. While mainly reflected as
a sudden increase in test accuracy, Grokking is also believed to be a
phenomenon beyond lazy learning/Gaussian Processes (GP), one that involves
feature learning. Here
we apply a recent development in the theory of feature learning, the adaptive
kernel approach, to two teacher-student models with cubic-polynomial and
modular addition teachers. We provide analytical predictions on feature
learning and Grokking properties of these models and demonstrate a mapping
between Grokking and the theory of phase transitions. We show that after
Grokking, the state of the DNN is analogous to the mixed phase following a
first-order phase transition. In this mixed phase, the DNN generates useful
internal representations of the teacher that are sharply distinct from those
before the transition.
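A runnable grokking sketch under assumed hyperparameters (a two-layer ReLU network on modular addition with weight decay; this is the common minimal setup, not necessarily the paper's exact teacher-student model): for suitable train fractions, test accuracy jumps long after train accuracy saturates.

```python
# Minimal grokking setup (assumed hyperparameters): two-layer network on
# modular addition, full-batch AdamW with strong weight decay.
import torch

torch.manual_seed(0)
p, hidden, frac = 23, 256, 0.6
pairs = torch.cartesian_prod(torch.arange(p), torch.arange(p))
labels = (pairs[:, 0] + pairs[:, 1]) % p
X = torch.cat([torch.nn.functional.one_hot(pairs[:, 0], p),
               torch.nn.functional.one_hot(pairs[:, 1], p)], dim=1).float()
perm = torch.randperm(len(X))
n_train = int(frac * len(X))
tr, te = perm[:n_train], perm[n_train:]

model = torch.nn.Sequential(torch.nn.Linear(2 * p, hidden),
                            torch.nn.ReLU(),
                            torch.nn.Linear(hidden, p))
opt = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1.0)
loss_fn = torch.nn.CrossEntropyLoss()

for step in range(20001):
    opt.zero_grad()
    loss = loss_fn(model(X[tr]), labels[tr])
    loss.backward()
    opt.step()
    if step % 2000 == 0:
        with torch.no_grad():
            acc = lambda idx: (model(X[idx]).argmax(1) == labels[idx]).float().mean().item()
            print(f"step {step:5d}: train acc {acc(tr):.2f}, test acc {acc(te):.2f}")
```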
Hitting the High-Dimensional Notes: An ODE for SGD learning dynamics on GLMs and multi-index models
We analyze the dynamics of streaming stochastic gradient descent (SGD) in the
high-dimensional limit when applied to generalized linear models and
multi-index models (e.g. logistic regression, phase retrieval) with general
data-covariance. In particular, we demonstrate a deterministic equivalent of
SGD in the form of a system of ordinary differential equations that describes a
wide class of statistics, such as the risk and other measures of
sub-optimality. This equivalence holds with overwhelming probability when the
model parameter count grows proportionally to the number of data samples. This
framework allows us to obtain learning rate thresholds for stability of SGD as
well as convergence guarantees. In addition to the deterministic equivalent, we
introduce an SDE with a simplified diffusion coefficient (homogenized SGD)
which allows us to analyze the dynamics of general statistics of SGD iterates.
Finally, we illustrate this theory on some standard examples and show numerical
simulations which give an excellent match to the theory.
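As a sketch of the deterministic-equivalent idea in its simplest instance (an assumed special case: streaming SGD on least squares with identity data covariance and step size eta/d), the distance D(t) = ||w - w*||^2, with t = iterations/d, obeys dD/dt = (eta^2 - 2*eta) D + eta^2 sigma^2 in the high-dimensional limit, so SGD is stable for eta < 2 and D(t) -> eta*sigma^2/(2 - eta).

```python
# Streaming SGD on isotropic least squares vs. its limiting ODE (an assumed
# special case of the deterministic equivalent; parameters are illustrative).
import numpy as np

rng = np.random.default_rng(0)
d, eta, sigma = 2000, 0.5, 0.5
gamma = eta / d                              # per-step learning rate
w_star = rng.standard_normal(d) / np.sqrt(d)
w = np.zeros(d)

D0 = np.sum((w - w_star)**2)
D_inf = eta * sigma**2 / (2 - eta)           # stationary value of the ODE
for k in range(1, 8 * d + 1):
    x = rng.standard_normal(d)               # fresh sample each step (streaming)
    y = x @ w_star + sigma * rng.standard_normal()
    w -= gamma * (x @ w - y) * x             # SGD step on the squared loss
    if k % (2 * d) == 0:
        t = k / d
        D_ode = (D0 - D_inf) * np.exp((eta**2 - 2 * eta) * t) + D_inf
        print(f"t = {t:.0f}: SGD D = {np.sum((w - w_star)**2):.4f}, ODE D = {D_ode:.4f}")
```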
