When Can Nonconvex Optimization Problems be Solved with Gradient Descent? A Few Case Studies
Gradient descent and related algorithms are ubiquitously used to solve optimization problems arising in machine learning and signal processing. In many cases these problems are nonconvex, yet such simple algorithms are still effective. In an attempt to better understand this phenomenon, we study a number of nonconvex problems, proving that they can be solved efficiently with gradient descent. We first consider complete, orthogonal dictionary learning and present a geometric analysis that allows us to obtain efficient convergence rates for gradient descent that hold with high probability. We also show that similar geometric structure is present in other nonconvex problems, such as generalized phase retrieval.
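As a concrete illustration of the kind of problem studied here, the following minimal numerical sketch (not the analysis in the thesis; the dimensions, step size, and spectral initialization are illustrative choices) runs plain gradient descent on the standard nonconvex least-squares objective for real-valued generalized phase retrieval and recovers the signal up to a global sign.

# Minimal sketch: gradient descent on f(z) = (1/4m) * sum_i ((a_i^T z)^2 - y_i)^2,
# the nonconvex least-squares objective for real-valued phase retrieval.
import numpy as np

rng = np.random.default_rng(0)
n, m = 50, 400                      # signal dimension, number of measurements
x = rng.standard_normal(n)          # ground-truth signal
A = rng.standard_normal((m, n))     # measurement vectors a_i as rows
y = (A @ x) ** 2                    # phaseless measurements

# Spectral initialization: leading eigenvector of (1/m) * sum_i y_i a_i a_i^T,
# rescaled to the estimated signal norm.
Y = (A * y[:, None]).T @ A / m
eigvals, eigvecs = np.linalg.eigh(Y)
z = eigvecs[:, -1] * np.sqrt(y.mean())

step = 0.1 / y.mean()               # step size scaled by the estimated ||x||^2
for _ in range(500):
    Az = A @ z
    grad = (A * ((Az ** 2 - y) * Az)[:, None]).mean(axis=0)
    z -= step * grad

# Distance to the truth up to the global sign ambiguity; should be near zero.
print(min(np.linalg.norm(z - x), np.linalg.norm(z + x)))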
Turning next to neural networks, we also calculate conditions on certain classes of networks under which signals and gradients propagate through the network in a stable manner during the initial stages of training. Initialization schemes derived from these calculations allow recurrent networks to be trained on long-sequence tasks, and in the case of networks with low-precision activation functions they make explicit a tradeoff between the reduction in precision and the maximal depth of a model that can be trained with gradient descent.
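The following toy experiment (an illustrative sketch, not the calculation in the thesis; the network size, gain values, and number of steps are arbitrary) shows the kind of behavior such propagation conditions control: iterating a vanilla tanh recurrent update with differently scaled random weight matrices makes the hidden activations vanish, saturate, or stay in a stable range over many time steps.

# Toy signal-propagation check for a vanilla tanh RNN with zero input.
import numpy as np

def final_hidden_norm(W, steps=300, seed=1):
    """Iterate h <- tanh(W h) and return the hidden-state norm after `steps`."""
    rng = np.random.default_rng(seed)
    h = rng.standard_normal(W.shape[0]) * 0.1
    for _ in range(steps):
        h = np.tanh(W @ h)
    return np.linalg.norm(h)

n = 256
rng = np.random.default_rng(0)
inits = {
    "small gain": rng.standard_normal((n, n)) * 0.5 / np.sqrt(n),  # signal dies out
    "large gain": rng.standard_normal((n, n)) * 3.0 / np.sqrt(n),  # activations saturate
    "orthogonal": np.linalg.qr(rng.standard_normal((n, n)))[0],    # near-critical, stable
}
for name, W in inits.items():
    print(name, final_hidden_norm(W))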
We finally consider manifold classification with a deep feed-forward neural network, for a particularly simple configuration of the manifolds. We provide an end-to-end analysis of the training process, proving that under certain conditions on the architectural hyperparameters of the network, it can successfully classify any point on the manifolds with high probability in a timely manner, given a sufficient number of independent samples from the manifolds. Our analysis relates the depth and width of the network to its fitting capacity and statistical regularity, respectively, in the early stages of training.
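A toy instance of this setting (illustrative only; the choice of manifolds, width, learning rate, and sample sizes is hypothetical and not the configuration analyzed in the thesis): two concentric circles stand in for simple one-dimensional manifolds, and a one-hidden-layer ReLU network trained with plain gradient descent learns to classify independent samples from them.

# Classify points from two concentric circles with a one-hidden-layer ReLU net.
import numpy as np

rng = np.random.default_rng(0)

def sample_circle(n, radius, label):
    theta = rng.uniform(0, 2 * np.pi, n)
    pts = radius * np.stack([np.cos(theta), np.sin(theta)], axis=1)
    return pts, np.full(n, label, dtype=float)

X0, y0 = sample_circle(200, 1.0, 0.0)
X1, y1 = sample_circle(200, 2.0, 1.0)
X, y = np.vstack([X0, X1]), np.concatenate([y0, y1])

width, lr = 64, 0.5
W1 = rng.standard_normal((2, width)) / np.sqrt(2)
b1 = np.zeros(width)
w2 = rng.standard_normal(width) / np.sqrt(width)
b2 = 0.0

for _ in range(5000):
    Z = X @ W1 + b1                        # hidden pre-activations
    H = np.maximum(Z, 0)                   # ReLU features
    p = 1 / (1 + np.exp(-(H @ w2 + b2)))   # predicted class probabilities
    d_logit = (p - y) / len(y)             # gradient of mean cross-entropy w.r.t. logits
    dZ = np.outer(d_logit, w2) * (Z > 0)   # backprop through the ReLU layer
    W1 -= lr * (X.T @ dZ)
    b1 -= lr * dZ.sum(axis=0)
    w2 -= lr * (H.T @ d_logit)
    b2 -= lr * d_logit.sum()

print("train accuracy:", ((p > 0.5) == (y > 0.5)).mean())  # should approach 1.0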
On quantum backpropagation, information reuse, and cheating measurement collapse
The success of modern deep learning hinges on the ability to train neural networks at scale. Through clever reuse of intermediate information, backpropagation facilitates training through gradient computation at a total cost roughly proportional to running the function, rather than incurring an additional factor proportional to the number of parameters, which can now be in the trillions. Naively, one expects that quantum measurement collapse entirely rules out the reuse of quantum information as in backpropagation. But recent developments in shadow tomography, which assumes access to multiple copies of a quantum state, have challenged that notion. Here, we investigate whether parameterized quantum models can train as efficiently as classical neural networks. We show that achieving backpropagation scaling is impossible without access to multiple copies of a state. With this added ability, we introduce an algorithm with foundations in shadow tomography that matches backpropagation scaling in quantum resources while reducing classical auxiliary computational costs to open problems in shadow tomography. These results highlight the nuance of reusing quantum information for practical purposes and clarify the unique difficulties in training large quantum models, which could alter the course of quantum machine learning.
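To make the scaling gap concrete, the sketch below (a classical simulation of a toy single-qubit circuit, not the shadow-tomography-based algorithm of the paper) estimates gradients with the standard parameter-shift rule, which needs two extra circuit evaluations per parameter; classical backpropagation, by reusing intermediate state, obtains every component at roughly the cost of one forward and one backward pass, and it is this reuse that measurement collapse naively forbids.

# Parameter-shift gradients for a toy parameterized single-qubit circuit:
# cost scales as 2 * (number of parameters) circuit evaluations.
import numpy as np

def rx(theta):
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -1j * s], [-1j * s, c]])

def rz(theta):
    return np.array([[np.exp(-1j * theta / 2), 0], [0, np.exp(1j * theta / 2)]])

def expectation_z(params):
    """Apply alternating RX/RZ rotations to |0> and measure <Z>."""
    state = np.array([1.0 + 0j, 0.0])
    for i, p in enumerate(params):
        state = (rx(p) if i % 2 == 0 else rz(p)) @ state
    pauli_z = np.array([[1.0, 0.0], [0.0, -1.0]])
    return float(np.real(state.conj() @ pauli_z @ state))

def parameter_shift_grad(params):
    """Exact gradient via f'(t) = (f(t + pi/2) - f(t - pi/2)) / 2, per parameter."""
    grad = np.zeros_like(params)
    for i in range(len(params)):
        shift = np.zeros_like(params)
        shift[i] = np.pi / 2
        grad[i] = 0.5 * (expectation_z(params + shift) - expectation_z(params - shift))
    return grad

params = np.random.default_rng(0).uniform(0, 2 * np.pi, size=6)
print(parameter_shift_grad(params))  # 12 circuit evaluations for 6 parameters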
Dynamics of magnetization at infinite temperature in a Heisenberg spin chain
Understanding universal aspects of quantum dynamics is an unresolved problem in statistical mechanics. In particular, the spin dynamics of the 1D Heisenberg model were conjectured to belong to the Kardar-Parisi-Zhang (KPZ) universality class based on the scaling of the infinite-temperature spin-spin correlation function. In a chain of 46 superconducting qubits, we study the probability distribution, P(M), of the magnetization M transferred across the chain's center. The first two moments of P(M) show superdiffusive behavior, a hallmark of KPZ universality. However, the third and fourth moments rule out the KPZ conjecture and allow for evaluating other theories. Our results highlight the importance of studying higher moments in determining dynamic universality classes and provide key insights into universal behavior in quantum systems.
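For reference, the sketch below (with synthetic placeholder numbers, not the experimental data) computes the standardized moments of a sample of transferred-magnetization values: the growth of the variance with evolution time (expected to scale as t^(2/3) for KPZ-type superdiffusion, i.e. dynamical exponent z = 3/2) signals superdiffusion, while the skewness and excess kurtosis are the higher moments used to discriminate between candidate universality classes.

# Standardized moments of a transferred-magnetization sample.
import numpy as np

def standardized_moments(samples):
    """Return mean, variance, skewness, and excess kurtosis of the samples."""
    mean = samples.mean()
    centered = samples - mean
    var = centered.var()
    skewness = (centered ** 3).mean() / var ** 1.5
    excess_kurtosis = (centered ** 4).mean() / var ** 2 - 3.0
    return mean, var, skewness, excess_kurtosis

# Placeholder Gaussian samples standing in for measured values of M.
rng = np.random.default_rng(0)
samples = rng.normal(loc=0.0, scale=2.0, size=10_000)
print(standardized_moments(samples))  # Gaussian: skewness and excess kurtosis near 0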