
    Sampling is as easy as learning the score: theory for diffusion models with minimal data assumptions

    We provide theoretical convergence guarantees for score-based generative models (SGMs) such as denoising diffusion probabilistic models (DDPMs), which constitute the backbone of large-scale real-world generative models such as DALL·E 2. Our main result is that, assuming accurate score estimates, such SGMs can efficiently sample from essentially any realistic data distribution. In contrast to prior works, our results (1) hold for an $L^2$-accurate score estimate (rather than $L^\infty$-accurate); (2) do not require restrictive functional inequality conditions that preclude substantial non-log-concavity; (3) scale polynomially in all relevant problem parameters; and (4) match state-of-the-art complexity guarantees for discretization of the Langevin diffusion, provided that the score error is sufficiently small. We view this as strong theoretical justification for the empirical success of SGMs. We also examine SGMs based on the critically damped Langevin diffusion (CLD). Contrary to conventional wisdom, we provide evidence that the use of the CLD does not reduce the complexity of SGMs.
    Comment: 30 pages
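    As a toy illustration of the kind of sampler these guarantees cover, here is a minimal one-dimensional sketch (not the paper's algorithm): the forward noising process is an Ornstein-Uhlenbeck SDE, the score of each marginal is known in closed form for a Gaussian target (standing in for a learned, $L^2$-accurate score estimate), and samples are drawn by Euler-Maruyama discretization of the reverse-time SDE. All constants and names are illustrative.

    ```python
    import math, random

    random.seed(0)

    # Forward (noising) SDE: dx = -x dt + sqrt(2) dW, which carries the target
    # N(MU, 1) to N(MU * e^{-t}, 1). Its score is available exactly here,
    # standing in for a learned score estimate.
    MU = 2.0          # mean of the (hypothetical) data distribution N(MU, 1)
    T, H = 5.0, 0.01  # time horizon and Euler-Maruyama step size

    def score(x, t):
        # Exact score of p_t = N(MU * e^{-t}, 1): grad_x log p_t(x)
        return MU * math.exp(-t) - x

    def sample():
        x = random.gauss(0.0, 1.0)  # start from the N(0, 1) prior (close to p_T)
        t = T
        while t > 0:
            z = random.gauss(0.0, 1.0)
            # Reverse-time SDE, run forward in tau = T - t:
            # dx = [x + 2 * score(x, t)] dtau + sqrt(2) dW
            x += H * (x + 2.0 * score(x, t)) + math.sqrt(2.0 * H) * z
            t -= H
        return x

    xs = [sample() for _ in range(2000)]
    mean = sum(xs) / len(xs)
    var = sum((v - mean) ** 2 for v in xs) / len(xs)
    print(round(mean, 2), round(var, 2))  # should land near (MU, 1.0)
    ```

    With an accurate score, the empirical mean and variance of the samples approach those of the target N(MU, 1); the paper's point is that this continues to hold, with polynomial complexity, when the score is only approximately known.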

    Quantum advantage in learning from experiments

    Quantum technology has the potential to revolutionize how we acquire and process experimental data to learn about the physical world. An experimental setup that transduces data from a physical system to a stable quantum memory, and processes that data using a quantum computer, could have significant advantages over conventional experiments in which the physical system is measured and the outcomes are processed using a classical computer. We prove that, in various tasks, quantum machines can learn from exponentially fewer experiments than those required in conventional experiments. The exponential advantage holds in predicting properties of physical systems, performing quantum principal component analysis on noisy states, and learning approximate models of physical dynamics. In some tasks, the quantum processing needed to achieve the exponential advantage can be modest; for example, one can simultaneously learn about many noncommuting observables by processing only two copies of the system. Conducting experiments with up to 40 superconducting qubits and 1300 quantum gates, we demonstrate that a substantial quantum advantage can be realized using today's relatively noisy quantum processors. Our results highlight how quantum technology can enable powerful new strategies to learn about nature.
    Comment: 6 pages, 17 figures + 46-page appendix; open-source code available at https://github.com/quantumlib/ReCirq/tree/master/recirq/qml_lf

    Learning (Very) Simple Generative Models Is Hard

    Motivated by the recent empirical successes of deep generative models, we study the computational complexity of the following unsupervised learning problem. For an unknown neural network $F:\mathbb{R}^d\to\mathbb{R}^{d'}$, let $D$ be the distribution over $\mathbb{R}^{d'}$ given by pushing the standard Gaussian $\mathcal{N}(0,\mathrm{Id}_d)$ through $F$. Given i.i.d. samples from $D$, the goal is to output any distribution close to $D$ in statistical distance. We show under the statistical query (SQ) model that no polynomial-time algorithm can solve this problem even when the output coordinates of $F$ are one-hidden-layer ReLU networks with $\log(d)$ neurons. Previously, the best lower bounds for this problem simply followed from lower bounds for supervised learning and required at least two hidden layers and $\mathrm{poly}(d)$ neurons [Daniely-Vardi '21, Chen-Gollakota-Klivans-Meka '22]. The key ingredient in our proof is an ODE-based construction of a compactly supported, piecewise-linear function $f$ with polynomially bounded slopes such that the pushforward of $\mathcal{N}(0,1)$ under $f$ matches all low-degree moments of $\mathcal{N}(0,1)$.
    Comment: 24 pages, 2 figures
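    The hard instances in this abstract are pushforwards of a standard Gaussian through shallow ReLU networks. A minimal sketch of the generative model being learned, with arbitrary illustrative weights rather than the paper's moment-matching construction:

    ```python
    import math, random

    random.seed(1)

    # Toy instance of the family: each output coordinate of F is a
    # one-hidden-layer ReLU network applied to a standard Gaussian seed.
    # Dimensions and weights below are arbitrary illustrative choices.
    D_IN, D_OUT, WIDTH = 3, 2, 4

    W1 = [[random.uniform(-1, 1) for _ in range(D_IN)] for _ in range(WIDTH)]
    B1 = [random.uniform(-1, 1) for _ in range(WIDTH)]
    W2 = [[random.uniform(-1, 1) for _ in range(WIDTH)] for _ in range(D_OUT)]

    def relu(v):
        return max(0.0, v)

    def F(x):
        # One hidden ReLU layer, then a linear output layer.
        h = [relu(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(W1, B1)]
        return [sum(w * hi for w, hi in zip(row, h)) for row in W2]

    def draw():
        # One i.i.d. sample from D: push z ~ N(0, Id_d) through F.
        z = [random.gauss(0.0, 1.0) for _ in range(D_IN)]
        return F(z)

    samples = [draw() for _ in range(5)]
    ```

    The learner sees only such samples and must output a distribution close to $D$; the SQ lower bound says this is already intractable for networks of this shallow form.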

    Transfer learning algorithm for image classification task and its convergence analysis

    Theoretical analysis of transfer learning of deep neural networks (DNNs) is crucial for ensuring stability and convergence and for gaining a better understanding of the networks for further development. However, most current transfer learning methods are black-box approaches focused on empirical studies. This paper develops a transfer learning algorithm for deep convolutional neural networks (CNNs) with batch normalization layers. A convergence-guaranteed transfer learning algorithm is proposed to train the classifier of a deep CNN with pretrained convolutional layers. Two classification case studies, based on VGG11 with the MNIST and CIFAR10 datasets, are presented to demonstrate the performance of the proposed approach and to explore the effect of batch normalization layers on transfer learning.
    This work was supported by the Ministry of Education (MOE) Singapore, Academic Research Fund (AcRF) Tier 1, under Grant RG65/22.
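    The setup studied here, a frozen pretrained feature extractor feeding a trainable classifier head, can be sketched in miniature. The extractor, synthetic data, and plain full-batch gradient descent below are illustrative stand-ins, not the paper's VGG11/batch-normalization algorithm:

    ```python
    import math, random

    random.seed(2)

    # Frozen "pretrained" extractor: a fixed ReLU layer that is never updated.
    def features(x):
        return [max(0.0, 0.9 * x[0] - 0.4 * x[1]),
                max(0.0, 0.3 * x[0] + 0.8 * x[1])]

    # Synthetic binary data: label 1 iff x0 + x1 > 0.
    data = []
    for _ in range(200):
        x = [random.gauss(0, 1), random.gauss(0, 1)]
        data.append((features(x), 1 if x[0] + x[1] > 0 else 0))

    w, b, lr = [0.0, 0.0], 0.0, 0.5

    def loss():
        # Average logistic (cross-entropy) loss of the classifier head.
        total = 0.0
        for f, y in data:
            p = 1.0 / (1.0 + math.exp(-(w[0] * f[0] + w[1] * f[1] + b)))
            total -= y * math.log(p + 1e-12) + (1 - y) * math.log(1 - p + 1e-12)
        return total / len(data)

    start = loss()
    for _ in range(200):  # train only the head; the extractor stays frozen
        g0 = g1 = gb = 0.0
        for f, y in data:
            p = 1.0 / (1.0 + math.exp(-(w[0] * f[0] + w[1] * f[1] + b)))
            e = p - y
            g0 += e * f[0]; g1 += e * f[1]; gb += e
        n = len(data)
        w[0] -= lr * g0 / n; w[1] -= lr * g1 / n; b -= lr * gb / n
    end = loss()
    print(round(start, 3), round(end, 3))  # training loss decreases
    ```

    Because only the convex head is trained while the extractor is fixed, convergence of this reduced problem is far easier to analyze than end-to-end training, which is what makes convergence guarantees for transfer learning tractable.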

    Learning Mixtures of Linear Regressions in Subexponential Time via Fourier Moments
