3 research outputs found

    Agnostic Learning of Halfspaces with Gradient Descent via Soft Margins

    We analyze the properties of gradient descent on convex surrogates for the zero-one loss for the agnostic learning of linear halfspaces. If $\mathsf{OPT}$ is the best classification error achieved by a halfspace, then by appealing to the notion of soft margins we are able to show that gradient descent finds halfspaces with classification error $\tilde O(\mathsf{OPT}^{1/2}) + \varepsilon$ in $\mathrm{poly}(d, 1/\varepsilon)$ time and sample complexity for a broad class of distributions that includes log-concave isotropic distributions as a subclass. Along the way we answer a question recently posed by Ji et al. (2020) on how the tail behavior of a loss function can affect sample complexity and runtime guarantees for gradient descent.
    Comment: 25 pages, 1 table
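    The setup the abstract describes can be sketched in a few lines: run gradient descent on a convex surrogate of the zero-one loss (logistic loss here, as one standard choice) and read off the learned halfspace. This is a minimal illustrative sketch, not the paper's exact procedure or surrogate; the function name, step count, and learning rate are all assumptions.

    ```python
    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def gd_halfspace(X, y, steps=500, lr=0.1):
        """Gradient descent on the logistic surrogate of the zero-one loss.

        Illustrative sketch only (hypothetical helper, not the paper's
        algorithm). X is an (n, d) data matrix; y holds labels in {-1, +1}.
        """
        n, d = X.shape
        w = np.zeros(d)
        for _ in range(steps):
            margins = y * (X @ w)  # y_i * <w, x_i>
            # gradient of (1/n) * sum_i log(1 + exp(-margin_i))
            grad = -(X.T @ (y * sigmoid(-margins))) / n
            w -= lr * grad
        return w  # predict with sign(<w, x>)
    ```

    The zero-one loss itself is non-convex and flat almost everywhere, which is why the surrogate is needed; the paper's soft-margin analysis is what translates progress on the surrogate into the $\tilde O(\mathsf{OPT}^{1/2}) + \varepsilon$ zero-one guarantee.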

    Sample-Optimal PAC Learning of Halfspaces with Malicious Noise

    We study efficient PAC learning of homogeneous halfspaces in $\mathbb{R}^d$ in the presence of the malicious noise of Valiant (1985). This is a challenging noise model, and only recently has a near-optimal noise-tolerance bound been established, under the mild condition that the unlabeled data distribution is isotropic log-concave. However, it remained unsettled how to obtain the optimal sample complexity simultaneously. In this work, we present a new analysis of the algorithm of Awasthi et al. (2017) and show that it essentially achieves the near-optimal sample complexity bound of $\tilde{O}(d)$, improving the best known result of $\tilde{O}(d^2)$. Our main ingredient is a novel incorporation of a Matrix Chernoff-type inequality to bound the spectrum of an empirical covariance matrix for well-behaved distributions, in conjunction with a careful exploration of the localization schemes of Awasthi et al. (2017). We further extend the algorithm and analysis to the more general and stronger nasty noise model of Bshouty et al. (2002), showing that it is still possible to achieve near-optimal noise tolerance and sample complexity in polynomial time.
    Comment: arXiv admin note: text overlap with arXiv:2006.0378
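    The key quantity in the analysis above is the spectrum of the empirical covariance matrix. The concentration phenomenon that Matrix Chernoff-type inequalities capture can be seen numerically: for $n \gg d$ isotropic samples, the eigenvalues of $\frac{1}{n} X^\top X$ cluster tightly around 1. A minimal sketch (the function name is an assumption, for illustration only):

    ```python
    import numpy as np

    def covariance_spectrum(X):
        """Eigenvalues of the empirical covariance (1/n) * X^T X.

        For n >> d isotropic samples, Matrix Chernoff-type bounds say
        these eigenvalues concentrate around 1, with deviations on the
        order of sqrt(d/n).
        """
        n, _ = X.shape
        Sigma_hat = (X.T @ X) / n
        return np.linalg.eigvalsh(Sigma_hat)  # sorted ascending
    ```

    With $n = 5000$ and $d = 10$ standard Gaussian samples, all eigenvalues land well inside $[0.9, 1.1]$; it is this kind of spectral control, established for well-behaved distributions, that lets the analysis get by with $\tilde{O}(d)$ samples.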

    On InstaHide, Phase Retrieval, and Sparse Matrix Factorization

    In this work, we examine the security of InstaHide, a scheme recently proposed by [Huang, Song, Li and Arora, ICML'20] for preserving the security of private datasets in the context of distributed learning. To generate a synthetic training example to be shared among the distributed learners, InstaHide takes a convex combination of private feature vectors and randomly flips the sign of each entry of the resulting vector with probability 1/2. A salient question is whether this scheme is secure in any provable sense, perhaps under a plausible hardness assumption and assuming the distributions generating the public and private data satisfy certain properties. We show that the answer to this appears to be quite subtle and closely related to the average-case complexity of a new multi-task, missing-data version of the classic problem of phase retrieval. Motivated by this connection, we design a provable algorithm that can recover private vectors using only the public vectors and synthetic vectors generated by InstaHide, under the assumption that the private and public vectors are isotropic Gaussian.
    Comment: 30 pages, to appear in ICLR 2021, v2: updated discussion of follow-up work
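    The encoding step described above (convex combination plus independent sign flips) is simple enough to sketch directly. This is a simplified illustration of just that mechanism, not the full InstaHide pipeline (which also involves public data and label handling); the function name, the number of mixed vectors `k`, and the Dirichlet choice of mixing weights are assumptions for illustration.

    ```python
    import numpy as np

    def instahide_encode(private, k=2, rng=None):
        """Simplified sketch of the mix-and-flip encoding described above.

        Picks k private feature vectors, mixes them with random convex
        weights, then flips the sign of each coordinate independently
        with probability 1/2.
        """
        rng = np.random.default_rng() if rng is None else rng
        n, d = private.shape
        idx = rng.choice(n, size=k, replace=False)
        lam = rng.dirichlet(np.ones(k))          # convex combination weights
        mixed = lam @ private[idx]               # sum_j lam_j * x_{i_j}
        signs = rng.choice([-1.0, 1.0], size=d)  # Bernoulli(1/2) sign flips
        return signs * mixed
    ```

    The sign flips destroy the signs of the mixed vector but not its coordinate-wise magnitudes, which is exactly why the attack reduces to a phase-retrieval-style problem: the observer sees something like $|\langle \lambda, x \rangle|$ per coordinate and must recover the underlying vectors.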