
    Hybrid Stochastic-Deterministic Minibatch Proximal Gradient: Less-Than-Single-Pass Optimization with Nearly Optimal Generalization

    Stochastic variance-reduced gradient (SVRG) algorithms have been shown to work favorably in solving large-scale learning problems. Despite this remarkable success, the stochastic gradient complexity of SVRG-type algorithms usually scales linearly with the data size and can therefore still be expensive for huge datasets. To address this deficiency, we propose a hybrid stochastic-deterministic minibatch proximal gradient (HSDMPG) algorithm for strongly convex problems that enjoys provably improved, data-size-independent complexity guarantees. More precisely, for a quadratic loss $F(\theta)$ of $n$ components, we prove that HSDMPG can attain an $\epsilon$-optimization error $\mathbb{E}[F(\theta)-F(\theta^*)]\leq\epsilon$ within $\mathcal{O}\Big(\frac{\kappa^{1.5}\epsilon^{0.75}\log^{1.5}(\frac{1}{\epsilon})+1}{\epsilon}\wedge\Big(\kappa\sqrt{n}\log^{1.5}\big(\frac{1}{\epsilon}\big)+n\log\big(\frac{1}{\epsilon}\big)\Big)\Big)$ stochastic gradient evaluations, where $\kappa$ is the condition number. For generic strongly convex loss functions, we prove a nearly identical complexity bound, though at the cost of slightly larger logarithmic factors. For large-scale learning problems, our complexity bounds are superior to those of the prior state-of-the-art SVRG algorithms, with or without dependence on data size. In particular, in the case of $\epsilon=\mathcal{O}\big(1/\sqrt{n}\big)$, which is of the order of the intrinsic excess error bound of a learning model and hence sufficient for generalization, the stochastic gradient complexity bounds of HSDMPG for quadratic and generic loss functions are respectively $\mathcal{O}(n^{0.875}\log^{1.5}(n))$ and $\mathcal{O}(n^{0.875}\log^{2.25}(n))$, which, to the best of our knowledge, achieve for the first time optimal generalization in less than a single pass over the data. Extensive numerical results demonstrate the computational advantages of our algorithm over prior ones.
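    To see where the sub-linear $n^{0.875}$ rate comes from, consider the quadratic-loss bound at $\epsilon = n^{-1/2}$. The abstract does not state how $\kappa$ scales with $n$; assuming, as is common for regularized ERM with a regularization parameter of order $1/\sqrt{n}$, that $\kappa = \Theta(\sqrt{n})$, the first branch of the minimum evaluates to
    \[
    \frac{\kappa^{1.5}\epsilon^{0.75}\log^{1.5}(\frac{1}{\epsilon})+1}{\epsilon}
    = \kappa^{1.5}\,\epsilon^{-0.25}\log^{1.5}\Big(\frac{1}{\epsilon}\Big)+\frac{1}{\epsilon}
    = \mathcal{O}\big(n^{0.75}\cdot n^{0.125}\log^{1.5}(n)+n^{0.5}\big)
    = \mathcal{O}\big(n^{0.875}\log^{1.5}(n)\big),
    \]
    while the second branch becomes $\mathcal{O}(n\log^{1.5}(n))$ under the same substitution, so the minimum is attained by the first branch and the overall complexity indeed stays below one pass ($n$ gradient evaluations) over the data.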

    Understanding Generalization and Optimization Performance of Deep CNNs

    This work aims to provide an understanding of the remarkable success of deep convolutional neural networks (CNNs) by theoretically analyzing their generalization performance and establishing optimization guarantees for gradient-descent-based training algorithms. Specifically, for a CNN model consisting of $l$ convolutional layers and one fully connected layer, we prove that its generalization error is bounded by $\mathcal{O}(\sqrt{\theta\widetilde{\varrho}/n})$, where $\theta$ denotes the degree of freedom of the network parameters and $\widetilde{\varrho}=\mathcal{O}\big(\log\big(\prod_{i=1}^{l} r_{w_i}(k_i-s_i+1)/p\big)+\log(r_f)\big)$ encapsulates architecture parameters including the kernel size $k_i$, stride $s_i$, pooling size $p$, and parameter magnitude $r_{w_i}$. To the best of our knowledge, this is the first generalization bound that depends only on $\mathcal{O}(\log(\prod_{i=1}^{l+1} r_{w_i}))$, which is tighter than existing bounds that all involve an exponential term of the form $\mathcal{O}(\prod_{i=1}^{l+1} r_{w_i})$. Besides, we prove that for an arbitrary gradient descent algorithm, the approximate stationary point computed by minimizing the empirical risk is also an approximate stationary point of the population risk. This explains why gradient descent training algorithms usually perform sufficiently well in practice. Furthermore, we prove a one-to-one correspondence and convergence guarantees for the non-degenerate stationary points of the empirical and population risks. This implies that a computed local minimum of the empirical risk is also close to a local minimum of the population risk, thus ensuring the good generalization performance of CNNs.
    Comment: This paper was accepted by ICML; 38 pages.
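    A quick illustration (with hypothetical numbers, not taken from the paper) of why the logarithmic dependence on the per-layer parameter magnitudes is the key improvement: for a network with $l+1=11$ weight layers, each with magnitude $r_{w_i}=2$,
    \[
    \prod_{i=1}^{l+1} r_{w_i} = 2^{11} = 2048,
    \qquad
    \log\Big(\prod_{i=1}^{l+1} r_{w_i}\Big) = 11\log 2 \approx 7.6,
    \]
    so a bound governed by the logarithm of the product grows only linearly with depth, whereas one governed by the product itself grows exponentially.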

    Theoretical Deep Learning

    Deep learning has long been criticised as a black-box model for lacking sound theoretical explanation. During my PhD, I explored and established theoretical foundations for deep learning. In this thesis, I present my contributions, positioned within the existing literature: (1) analysing the generalizability of neural networks with residual connections via complexity- and capacity-based hypothesis complexity measures; (2) modeling stochastic gradient descent (SGD) by stochastic differential equations (SDEs) and their dynamics, and thereby further characterizing the generalizability of deep learning; (3) understanding the geometrical structure of the loss landscape that drives the trajectories of these dynamical systems, which sheds light on reconciling the over-representation and the excellent generalizability of deep learning; and (4) discovering the interplay between generalization, privacy preservation, and adversarial robustness, all of which are of rising concern in the deployment of deep learning.
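    Contribution (2) builds on the standard continuous-time view of SGD as a stochastic differential equation. Below is a minimal, self-contained sketch of that view (an illustration under assumed quantities, not the thesis's specific model): SGD on an assumed quadratic loss is read as an Euler-Maruyama discretization of $d\theta_t = -\nabla L(\theta_t)\,dt + \sqrt{\eta}\,\Sigma^{1/2}\,dW_t$, with a hand-picked noise covariance $\Sigma$ and learning rate $\eta$.

        import numpy as np

        # Sketch of the SDE view of SGD on a quadratic loss L(theta) = 0.5 * theta^T A theta.
        # The SGD iteration theta <- theta - eta * (grad + noise) is read as one
        # Euler-Maruyama step (dt = eta) of  d(theta) = -grad L dt + sqrt(eta) Sigma^{1/2} dW.
        rng = np.random.default_rng(0)
        A = np.diag([1.0, 10.0])          # assumed curvature of the quadratic loss
        Sigma_sqrt = 0.3 * np.eye(2)      # assumed square root of the gradient-noise covariance
        eta = 0.01                        # learning rate, also the SDE time step
        theta = np.array([2.0, -1.5])     # arbitrary initial point

        for _ in range(2000):
            grad = A @ theta                       # full-batch gradient of L
            z = rng.standard_normal(2)             # Gaussian surrogate for minibatch noise
            # Euler-Maruyama: drift -grad * dt plus diffusion sqrt(dt) * sqrt(eta) * Sigma^{1/2} z,
            # which with dt = eta collapses to the familiar noisy SGD update below.
            theta = theta - eta * grad + eta * (Sigma_sqrt @ z)

        # theta now fluctuates around the minimizer at the origin; the stationary spread
        # is controlled by eta and Sigma, the kind of dynamics studied in contribution (2).
        print(theta)

    The concrete loss, noise covariance, and step size above are assumptions chosen only to make the simulation runnable; the dynamics-to-generalization analysis itself is the subject of the thesis.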