

Agnostic Learning of Halfspaces with Gradient Descent via Soft Margins

Authors: Cao Yuan, Frei Spencer, Gu Quanquan
Publication date: 13/02/2021

We analyze the properties of gradient descent on convex surrogates for the zero-one loss for the agnostic learning of linear halfspaces. If $\mathsf{OPT}$ is the best classification error achieved by a halfspace, then, by appealing to the notion of soft margins, we show that gradient descent finds halfspaces with classification error $\tilde O(\mathsf{OPT}^{1/2}) + \varepsilon$ in $\mathrm{poly}(d,1/\varepsilon)$ time and sample complexity for a broad class of distributions that includes log-concave isotropic distributions as a subclass. Along the way we answer a question recently posed by Ji et al. (2020) on how the tail behavior of a loss function can affect sample complexity and runtime guarantees for gradient descent.

Comment: 25 pages, 1 table
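A minimal sketch of the setting, not the authors' code: full-batch gradient descent on the logistic loss (one example of a convex surrogate for the zero-one loss) to learn a halfspace. The data are a hypothetical stand-in: isotropic Gaussian inputs labeled by the halfspace $w^\star = e_1$, with each label flipped independently with probability 0.1. Random label flips are only a simple special case of the agnostic setting, and the function names, step size, and iteration count are our own assumptions.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def make_noisy_halfspace_data(n, d, flip_prob, seed=0):
    """Hypothetical data: x ~ N(0, I_d), y = sign(x[0]), each label flipped w.p. flip_prob."""
    rng = random.Random(seed)
    w_star = [1.0] + [0.0] * (d - 1)
    xs, ys = [], []
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in range(d)]
        y = 1 if x[0] >= 0 else -1
        if rng.random() < flip_prob:
            y = -y  # label noise: even w_star errs on roughly flip_prob of the points
        xs.append(x)
        ys.append(y)
    return xs, ys, w_star


def gradient_descent_logistic(xs, ys, steps=100, lr=0.5):
    """Full-batch GD on the average logistic loss (1/n) * sum log(1 + exp(-y <w, x>))."""
    n, d = len(xs), len(xs[0])
    w = [0.0] * d
    for _ in range(steps):
        grad = [0.0] * d
        for x, y in zip(xs, ys):
            # d/dm log(1 + exp(-m)) = -sigmoid(-m), where m = y <w, x> is the margin
            coef = -y * sigmoid(-y * dot(w, x)) / n
            for j in range(d):
                grad[j] += coef * x[j]
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w


def zero_one_error(w, xs, ys):
    """Fraction of points where sign(<w, x>) disagrees with y (ties count as errors)."""
    return sum(1 for x, y in zip(xs, ys) if y * dot(w, x) <= 0) / len(xs)


xs, ys, w_star = make_noisy_halfspace_data(n=500, d=5, flip_prob=0.1)
w = gradient_descent_logistic(xs, ys)
print("error of w_star:", zero_one_error(w_star, xs, ys))
print("error of GD output:", zero_one_error(w, xs, ys))
```

With a 10% flip rate both printed errors should sit near 0.1. This only illustrates the setting; it is not a check of the $\tilde O(\mathsf{OPT}^{1/2}) + \varepsilon$ guarantee.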

arXiv.org e-Print Archive

Sample-Optimal PAC Learning of Halfspaces with Malicious Noise

Author: Shen Jie
Publication date: 11/02/2021

We study efficient PAC learning of homogeneous halfspaces in $\mathbb{R}^d$ in the presence of the malicious noise of Valiant (1985). This is a challenging noise model, and only recently has a near-optimal noise tolerance bound been established, under the mild condition that the unlabeled data distribution is isotropic log-concave. However, it remains unsettled how to obtain the optimal sample complexity simultaneously. In this work, we present a new analysis for the algorithm of Awasthi et al. (2017) and show that it essentially achieves the near-optimal sample complexity bound of $\tilde{O}(d)$, improving the best known result of $\tilde{O}(d^2)$. Our main ingredient is a novel incorporation of a Matrix Chernoff-type inequality to bound the spectrum of an empirical covariance matrix for well-behaved distributions, in conjunction with a careful exploration of the localization schemes of Awasthi et al. (2017). We further extend the algorithm and analysis to the more general and stronger nasty noise model of Bshouty et al. (2002), showing that it is still possible to achieve near-optimal noise tolerance and sample complexity in polynomial time.

Comment: arXiv admin note: text overlap with arXiv:2006.0378
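As a hedged numerical illustration (our own, not taken from the paper) of the quantity being bounded: for isotropic Gaussian samples, a standard example of an isotropic log-concave distribution, the eigenvalues of the empirical second-moment matrix $\frac{1}{n}\sum_i x_i x_i^\top$ cluster around 1 once $n$ is large relative to $d$, and concentration results such as matrix Chernoff bounds make statements of this kind quantitative. The sketch computes the extreme eigenvalues in pure Python. The sample sizes, the seed, and the Jacobi eigenvalue routine are our assumptions, and nothing here reproduces the localization argument of Awasthi et al. (2017).

```python
import math
import random


def empirical_second_moment(samples):
    """(1/n) * sum x x^T; equals the empirical covariance for zero-mean data (no centering)."""
    n, d = len(samples), len(samples[0])
    cov = [[0.0] * d for _ in range(d)]
    for x in samples:
        for i in range(d):
            for j in range(d):
                cov[i][j] += x[i] * x[j] / n
    return cov


def jacobi_eigenvalues(a, tol=1e-12, max_sweeps=100):
    """Eigenvalues of a symmetric matrix via cyclic Jacobi rotations, sorted ascending."""
    n = len(a)
    a = [row[:] for row in a]
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                # Rotation chosen so that the (p, q) entry becomes zero.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A P (update columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = c * akp - s * akq
                    a[k][q] = s * akp + c * akq
                for k in range(n):  # A <- P^T A (update rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = c * apk - s * aqk
                    a[q][k] = s * apk + c * aqk
    return sorted(a[i][i] for i in range(n))


rng = random.Random(0)
d, n = 5, 2000
samples = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
eigs = jacobi_eigenvalues(empirical_second_moment(samples))
print("smallest / largest eigenvalue:", eigs[0], eigs[-1])
```

Rerunning with smaller `n` (for the same `d`) should show the extreme eigenvalues spreading further from 1, which is the behavior a spectral concentration bound has to control.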

arXiv.org e-Print Archive

On InstaHide, Phase Retrieval, and Sparse Matrix Factorization

Authors: Chen Sitan, Li Xiaoxiao, Song Zhao, Zhuo Danyang
Publication date: 24/03/2021

In this work, we examine the security of InstaHide, a scheme recently proposed by [Huang, Song, Li and Arora, ICML'20] for preserving the security of private datasets in the context of distributed learning. To generate a synthetic training example to be shared among the distributed learners, InstaHide takes a convex combination of private feature vectors and randomly flips the sign of each entry of the resulting vector with probability 1/2. A salient question is whether this scheme is secure in any provable sense, perhaps under a plausible hardness assumption and assuming the distributions generating the public and private data satisfy certain properties. We show that the answer appears to be quite subtle and closely related to the average-case complexity of a new multi-task, missing-data version of the classic problem of phase retrieval. Motivated by this connection, we design a provable algorithm that can recover private vectors using only the public vectors and synthetic vectors generated by InstaHide, under the assumption that the private and public vectors are isotropic Gaussian.

Comment: 30 pages, to appear in ICLR 2021, v2: updated discussion of follow-up work
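To make the encoding step concrete, here is a minimal sketch of forming one synthetic example as the abstract describes it: a convex combination of private feature vectors, followed by flipping the sign of each entry independently with probability 1/2. The function name `instahide_encode`, the number `k` of mixed vectors, and the way the mixing weights are drawn are hypothetical choices. The abstract's mention of public vectors suggests the full scheme of Huang et al. also draws on public data, which this sketch omits.

```python
import random


def instahide_encode(private_vectors, k, rng):
    """Hypothetical sketch of an InstaHide-style encoding, per the abstract's description.

    Assumptions (for illustration only): k vectors are drawn without replacement
    and mixed with random nonnegative weights normalized to sum to 1.
    """
    chosen = rng.sample(private_vectors, k)
    raw = [rng.random() for _ in range(k)]
    total = sum(raw)
    weights = [r / total for r in raw]  # nonnegative and sum to 1: a convex combination
    d = len(chosen[0])
    mixed = [sum(w * v[j] for w, v in zip(weights, chosen)) for j in range(d)]
    # Flip the sign of each entry independently with probability 1/2.
    signs = [-1.0 if rng.random() < 0.5 else 1.0 for _ in range(d)]
    synthetic = [s * m for s, m in zip(signs, mixed)]
    return synthetic, weights, chosen


data_rng = random.Random(1)
private = [[data_rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(10)]
synthetic, weights, chosen = instahide_encode(private, k=3, rng=random.Random(2))
# Sign flips erase signs but keep magnitudes, so |synthetic[j]| equals |mixed[j]|.
print([round(abs(v), 3) for v in synthetic])
```

Because each flip only changes the sign of an entry, the entrywise magnitudes of the mixture survive. This gives some intuition for the connection to phase retrieval (recovering a signal when signs are lost) that the abstract draws.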

arXiv.org e-Print Archive