
    Robust Estimators in High Dimensions without the Computational Intractability

    We study high-dimensional distribution learning in an agnostic setting where an adversary is allowed to arbitrarily corrupt an epsilon fraction of the samples. Such questions have a rich history spanning statistics, machine learning and theoretical computer science. Even in the most basic settings, the only known approaches are either computationally inefficient or lose dimension-dependent factors in their error guarantees. This raises the following question: Is high-dimensional agnostic distribution learning even possible, algorithmically? In this work, we obtain the first computationally efficient algorithms for agnostically learning several fundamental classes of high-dimensional distributions: (1) a single Gaussian, (2) a product distribution on the hypercube, (3) mixtures of two product distributions (under a natural balancedness condition), and (4) mixtures of k Gaussians with identical spherical covariances. All our algorithms achieve error that is independent of the dimension, and in many cases depends nearly linearly on the fraction of adversarially corrupted samples. Moreover, we develop a general recipe for detecting and correcting corruptions in high dimensions that may be applicable to many other problems.
    Funding: United States Office of Naval Research (Grant N00014-12-1-0999); National Science Foundation (U.S.) (CAREER Award CCF-1453261); National Science Foundation (U.S.) (CAREER Award CCF-0953960); Google (Faculty Research Award); National Science Foundation (U.S.) Graduate Research Fellowship Program; NEC Corporation
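
    To make the flavor of such algorithms concrete, here is a minimal sketch of a filtering-style robust mean estimator for data assumed to be drawn from N(mu, I) with an epsilon fraction of corruptions: it checks the top eigenvalue of the empirical covariance and strips points that project heavily onto the suspicious direction. The function name filter_mean and the thresholds are illustrative assumptions, not the paper's exact procedure.

        # Illustrative sketch (not the paper's exact algorithm): a filter-style
        # robust mean estimator for data assumed to be N(mu, I) with an eps
        # fraction of adversarial corruptions. Thresholds are heuristic.
        import numpy as np

        def filter_mean(X, eps, max_iter=20):
            # Iteratively remove points with large projection onto the top
            # eigenvector of the empirical covariance until its top eigenvalue
            # looks consistent with identity covariance.
            X = np.asarray(X, dtype=float)
            for _ in range(max_iter):
                mu = X.mean(axis=0)
                centered = X - mu
                cov = centered.T @ centered / len(X)
                eigvals, eigvecs = np.linalg.eigh(cov)
                top_val, top_vec = eigvals[-1], eigvecs[:, -1]
                # If no direction has abnormally large variance, accept the mean.
                if top_val <= 1 + 10 * eps:          # heuristic certificate
                    return mu
                # Otherwise, drop the tail of points with the largest scores
                # along the suspicious direction.
                scores = (centered @ top_vec) ** 2
                keep = scores <= np.quantile(scores, 1 - eps / 2)
                X = X[keep]
            return X.mean(axis=0)

        # Toy usage: 10% of points pushed far away in one direction.
        rng = np.random.default_rng(0)
        d, n, eps = 50, 5000, 0.1
        X = rng.normal(size=(n, d))
        X[: int(eps * n)] += 5.0                     # corruption
        print(np.linalg.norm(X.mean(axis=0)))        # naive mean error
        print(np.linalg.norm(filter_mean(X, eps)))   # filtered mean error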

    Confidence regions and minimax rates in outlier-robust estimation on the probability simplex

    We consider the problem of estimating the mean of a distribution supported on the k-dimensional probability simplex in the setting where an ε fraction of the observations are subject to adversarial corruption. A simple particular example is the problem of estimating the distribution of a discrete random variable. Assuming that the discrete variable takes k values, the unknown parameter θ is a k-dimensional vector belonging to the probability simplex. We first describe various settings of contamination and discuss the relations between these settings. We then establish minimax rates when the quality of estimation is measured by the total-variation distance, the Hellinger distance, or the L2-distance between two probability measures. We also provide confidence regions for the unknown mean that shrink at the minimax rate. Our analysis reveals that the minimax rates associated with these three distances are all different, but they are all attained by the sample average. Furthermore, we show that the latter is adaptive to the possible sparsity of the unknown vector. Some numerical experiments illustrating our theoretical findings are reported.
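
    As a quick illustration of the setting (not the paper's minimax constructions), the sketch below contaminates samples of a discrete distribution, estimates the simplex vector by the sample average of the empirical frequencies, and reports the total-variation, Hellinger, and L2 errors; the contamination scheme and parameter values are arbitrary choices made for the example.

        # Illustrative simulation of the contaminated-simplex setting: estimate
        # a discrete distribution by empirical frequencies and measure the
        # error in total variation, Hellinger, and L2 distance.
        import numpy as np

        rng = np.random.default_rng(1)
        k, n, eps = 20, 10_000, 0.05
        theta = rng.dirichlet(np.ones(k))            # unknown point on the simplex

        samples = rng.choice(k, size=n, p=theta)
        n_bad = int(eps * n)
        samples[:n_bad] = 0                          # adversary dumps mass on one atom

        counts = np.bincount(samples, minlength=k)
        theta_hat = counts / n                       # the sample average

        tv = 0.5 * np.abs(theta_hat - theta).sum()
        hellinger = np.sqrt(0.5 * ((np.sqrt(theta_hat) - np.sqrt(theta)) ** 2).sum())
        l2 = np.linalg.norm(theta_hat - theta)
        print(f"TV={tv:.4f}  Hellinger={hellinger:.4f}  L2={l2:.4f}")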

    Data-driven Inverse Optimization with Imperfect Information

    In data-driven inverse optimization, an observer aims to learn the preferences of an agent who solves a parametric optimization problem depending on an exogenous signal. Thus, the observer seeks the agent's objective function that best explains a historical sequence of signals and corresponding optimal actions. We focus here on situations where the observer has imperfect information, that is, where the agent's true objective function is not contained in the search space of candidate objectives, where the agent suffers from bounded rationality or implementation errors, or where the observed signal-response pairs are corrupted by measurement noise. We formalize this inverse optimization problem as a distributionally robust program minimizing the worst-case risk that the predicted decision (i.e., the decision implied by a particular candidate objective) differs from the agent's actual response to a random signal. We show that our framework offers rigorous out-of-sample guarantees for different loss functions used to measure prediction errors and that the emerging inverse optimization problems can be exactly reformulated as (or safely approximated by) tractable convex programs when a new suboptimality loss function is used. We show through extensive numerical tests that the proposed distributionally robust approach to inverse optimization often attains better out-of-sample performance than the state-of-the-art approaches.
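
    A toy Python sketch of the suboptimality idea, under assumptions made only for illustration (a small linear program as the agent's forward problem and a four-parameter linear cost): each observed signal-action pair charges a candidate objective the gap between the cost of the observed action and the cost of the best action under that candidate. The distributionally robust reformulation itself is not reproduced here; this is the plain sample-average version of the loss.

        # Toy sketch of a suboptimality-style loss in inverse optimization
        # (illustrative forward problem and parameterization). The agent solves
        #   min_x  c(theta, s)^T x   s.t.  A x <= b,  x >= 0,
        # and a candidate theta is scored by how suboptimal the observed action
        # is under the cost it induces, averaged over signal-action pairs.
        import numpy as np
        from scipy.optimize import linprog

        A = np.array([[1.0, 1.0]])                   # shared feasible set: x1 + x2 <= 1
        b = np.array([1.0])

        def cost(theta, s):
            # Assumed signal-dependent cost: a fixed part plus a signal-scaled part.
            return theta[:2] + s * theta[2:]

        def suboptimality_loss(theta, data):
            total = 0.0
            for s, x_obs in data:
                c = cost(theta, s)
                opt = linprog(c, A_ub=A, b_ub=b, bounds=[(0, None)] * 2)
                total += c @ x_obs - opt.fun         # >= 0 by construction
            return total / len(data)

        # Synthetic observations generated by a "true" theta, then scored.
        rng = np.random.default_rng(2)
        theta_true = np.array([1.0, 2.0, -3.0, 0.5])
        data = []
        for _ in range(50):
            s = rng.uniform(0, 1)
            res = linprog(cost(theta_true, s), A_ub=A, b_ub=b, bounds=[(0, None)] * 2)
            data.append((s, res.x))

        print(suboptimality_loss(theta_true, data))          # ~0 for the true theta
        print(suboptimality_loss(np.array([2.0, 1.0, 0.0, 0.0]), data))  # larger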

    Robust Estimators are Hard to Compute

    In modern statistics, the robust estimation of parameters of a regression hyperplane is a central problem. Robustness means that the estimation is not, or only slightly, affected by outliers in the data. In this paper, it is shown that the following robust estimators are hard to compute: LMS, LQS, LTS, LTA, MCD, MVE, the constrained M-estimator, projection depth (PD) and Stahel-Donoho. In addition, a data set is presented such that the ltsReg procedure of R has probability less than 0.0001 of finding a correct answer. Furthermore, it is described how to design new robust estimators.
    Keywords: computational statistics; complexity theory; robust statistics; algorithms; search heuristics
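
    The computational difficulty comes from the combinatorial choice of which observations to trim. The sketch below, assuming a standard linear-regression setup, evaluates the LTS objective (the sum of the h smallest squared residuals) and applies a naive elemental-subset heuristic in the spirit of, but not identical to, procedures such as ltsReg; like any such heuristic, it carries no guarantee of finding the exact optimum.

        # Sketch of the least trimmed squares (LTS) objective and a naive
        # random elemental-subset heuristic. Exact LTS must in effect search
        # over which h of the n residuals to keep, which is what makes it hard
        # to compute; heuristics like this one can miss the optimum.
        import numpy as np

        def lts_objective(beta, X, y, h):
            # Sum of the h smallest squared residuals for the fit beta.
            r2 = (y - X @ beta) ** 2
            return np.sort(r2)[:h].sum()

        def lts_elemental_search(X, y, h, n_trials=500, rng=None):
            # Fit exactly through many random p-point subsets and keep the
            # candidate with the smallest LTS objective on the full data.
            rng = rng or np.random.default_rng()
            n, p = X.shape
            best_beta, best_val = None, np.inf
            for _ in range(n_trials):
                idx = rng.choice(n, size=p, replace=False)
                beta, *_ = np.linalg.lstsq(X[idx], y[idx], rcond=None)
                val = lts_objective(beta, X, y, h)
                if val < best_val:
                    best_beta, best_val = beta, val
            return best_beta, best_val

        # Toy data with 15% gross outliers.
        rng = np.random.default_rng(3)
        n, p = 200, 3
        X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
        y = X @ np.array([1.0, 2.0, -1.0]) + 0.1 * rng.normal(size=n)
        y[:30] += 50.0                                # contaminate
        h = int(0.75 * n)
        beta_hat, val = lts_elemental_search(X, y, h, rng=rng)
        print(beta_hat)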

    Efficient Statistics, in High Dimensions, from Truncated Samples

    We provide an efficient algorithm for the classical problem, going back to Galton, Pearson, and Fisher, of estimating, with arbitrary accuracy, the parameters of a multivariate normal distribution from truncated samples. Truncation of samples from a d-variate normal N(μ, Σ) means that a sample is revealed only if it falls in some subset S ⊆ R^d; otherwise the sample is hidden, and the number of hidden samples in proportion to the revealed ones is also hidden. We show that the mean μ and covariance matrix Σ can be estimated with arbitrary accuracy in polynomial time, as long as we have oracle access to S and S has non-trivial measure under the unknown d-variate normal distribution. Additionally, we show that without oracle access to S, any non-trivial estimation is impossible.
    Comment: to appear at the 59th Annual IEEE Symposium on Foundations of Computer Science (FOCS), 2018
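
    A one-dimensional illustration of the underlying likelihood problem (the paper handles the d-variate case with oracle access to S): with S taken to be a known interval [a, b], the truncated Gaussian log-likelihood divides the usual density by the Gaussian mass of S, and maximizing it numerically recovers the mean and standard deviation. The specific interval, sample size, and optimizer below are assumptions made for the example.

        # One-dimensional illustration of maximum likelihood from truncated
        # samples, with S = [a, b] a known interval. We maximize the truncated
        # Gaussian log-likelihood numerically.
        import numpy as np
        from scipy.optimize import minimize
        from scipy.stats import norm, truncnorm

        mu_true, sigma_true, a, b = 2.0, 1.5, 1.0, 4.0

        # Draw samples that survive the truncation to S = [a, b].
        alpha, beta = (a - mu_true) / sigma_true, (b - mu_true) / sigma_true
        x = truncnorm.rvs(alpha, beta, loc=mu_true, scale=sigma_true,
                          size=5000, random_state=4)

        def neg_log_lik(params):
            mu, log_sigma = params
            sigma = np.exp(log_sigma)                # keep sigma positive
            mass = norm.cdf((b - mu) / sigma) - norm.cdf((a - mu) / sigma)
            return -(norm.logpdf(x, loc=mu, scale=sigma) - np.log(mass)).sum()

        res = minimize(neg_log_lik, x0=np.array([x.mean(), np.log(x.std())]),
                       method="Nelder-Mead")
        mu_hat, sigma_hat = res.x[0], np.exp(res.x[1])
        print(mu_hat, sigma_hat)                     # close to (2.0, 1.5)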

    Efficient computational strategies for doubly intractable problems with applications to Bayesian social networks

    Powerful ideas that recently appeared in the literature are adjusted and combined to design improved samplers for Bayesian exponential random graph models. Different forms of adaptive Metropolis-Hastings proposals (vertical, horizontal and rectangular) are tested and combined with the delayed rejection (DR) strategy, with the aim of reducing the variance of the resulting Markov chain Monte Carlo estimators for a given computational time. In the examples treated in this paper, the best combination, namely horizontal adaptation with delayed rejection, leads to a variance reduction that varies between 92% and 144% relative to the adaptive direction sampling approximate exchange algorithm of Caimo and Friel (2011). These results correspond to a performance increase of between 10% and 94% once simulation time is taken into account. The highest improvements are obtained when highly correlated posterior distributions are considered.
    Comment: 23 pages, 8 figures. Accepted to appear in Statistics and Computing.
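
    For readers unfamiliar with delayed rejection, the sketch below runs a generic two-stage delayed-rejection random-walk Metropolis sampler on a toy one-dimensional mixture target; it is not the paper's adaptive sampler for exponential random graph models, only an illustration of the second-stage acceptance ratio (following Tierney and Mira) that keeps the chain valid after a first rejection. The target and proposal scales are arbitrary choices.

        # Generic two-stage delayed-rejection random-walk Metropolis sketch on
        # a toy 1-D Gaussian-mixture target.
        import numpy as np
        from scipy.stats import norm

        def log_target(x):
            # Toy bimodal target density (unnormalized log scale).
            return np.logaddexp(norm.logpdf(x, -2.0, 1.0), norm.logpdf(x, 2.0, 0.7))

        def dr_metropolis(n_iter=20_000, s1=4.0, s2=0.5, seed=5):
            rng = np.random.default_rng(seed)
            x = 0.0
            chain = np.empty(n_iter)
            for t in range(n_iter):
                # Stage 1: bold symmetric proposal.
                y1 = x + s1 * rng.normal()
                a1 = min(1.0, np.exp(log_target(y1) - log_target(x)))
                if rng.uniform() < a1:
                    x = y1
                else:
                    # Stage 2: more timid proposal, with the delayed-rejection
                    # correction in the acceptance ratio.
                    y2 = x + s2 * rng.normal()
                    a1_rev = min(1.0, np.exp(log_target(y1) - log_target(y2)))
                    num = (np.exp(log_target(y2)) * norm.pdf(y1, loc=y2, scale=s1)
                           * (1.0 - a1_rev))
                    den = (np.exp(log_target(x)) * norm.pdf(y1, loc=x, scale=s1)
                           * (1.0 - a1))
                    if den > 0 and rng.uniform() < min(1.0, num / den):
                        x = y2
                chain[t] = x
            return chain

        chain = dr_metropolis()
        print(chain.mean(), chain.std())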

    Being Robust (in High Dimensions) Can Be Practical

    Robust estimation is much more challenging in high dimensions than it is in one dimension: most techniques either lead to intractable optimization problems or to estimators that can tolerate only a tiny fraction of errors. Recent work in theoretical computer science has shown that, in appropriate distributional models, it is possible to robustly estimate the mean and covariance with polynomial-time algorithms that can tolerate a constant fraction of corruptions, independent of the dimension. However, the sample and time complexity of these algorithms is prohibitively large for high-dimensional applications. In this work, we address both of these issues by establishing sample complexity bounds that are optimal, up to logarithmic factors, as well as by giving various refinements that allow the algorithms to tolerate a much larger fraction of corruptions. Finally, we show on both synthetic and real data that our algorithms have state-of-the-art performance and suddenly make high-dimensional robust estimation a realistic possibility.
    Comment: Appeared in ICML 2017.
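
    The dimension dependence that such guarantees avoid is easy to see numerically. The short sketch below, with an arbitrary corruption scheme chosen for illustration, shows the L2 error of the empirical mean (and of a coordinate-wise median baseline) growing with the dimension when a fixed fraction of the points is shifted.

        # Small synthetic illustration of why dimension-independent error
        # guarantees matter: with a fixed corruption fraction eps, the L2 error
        # of the empirical mean and of the coordinate-wise median both grow
        # with the dimension d, even though the true mean is 0 in every case.
        import numpy as np

        rng = np.random.default_rng(6)
        eps, n = 0.1, 20_000
        for d in (10, 100, 1000):
            X = rng.normal(size=(n, d))               # clean data: N(0, I_d)
            X[: int(eps * n)] += 2.0                   # shift an eps fraction
            err_mean = np.linalg.norm(X.mean(axis=0))
            err_med = np.linalg.norm(np.median(X, axis=0))
            print(f"d={d:5d}  mean error={err_mean:.2f}  "
                  f"coordinate-median error={err_med:.2f}")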

    Likelihood-based inference for max-stable processes

    The last decade has seen max-stable processes emerge as a common tool for the statistical modeling of spatial extremes. However, their application is complicated by the unavailability of the multivariate density function, and so likelihood-based methods remain far from providing a complete and flexible framework for inference. In this article, we develop inferentially practical, likelihood-based methods for fitting max-stable processes, derived from a composite-likelihood approach. The procedure is sufficiently reliable and versatile to permit the simultaneous modeling of marginal and dependence parameters in the spatial context at a moderate computational cost. The utility of this methodology is examined via simulation, and illustrated by the analysis of U.S. precipitation extremes.
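
    To illustrate the composite-likelihood mechanics (and only that), the sketch below uses the bivariate symmetric logistic extreme-value distribution with unit Fréchet margins as a stand-in for a max-stable model's bivariate density, sums its log-density over all pairs of sites, and maximizes over the dependence parameter alpha. The simulated data are independent across sites purely to keep the example short, so the fitted alpha should land near 1; the paper's spatial models use different bivariate densities.

        # Toy pairwise composite likelihood for an extreme-value model, using
        # the bivariate symmetric logistic distribution as a stand-in density.
        import itertools
        import numpy as np
        from scipy.optimize import minimize_scalar

        def logistic_log_density(z1, z2, alpha):
            # Log density of the bivariate symmetric logistic extreme-value
            # distribution with unit Frechet margins:
            #   F(z1, z2) = exp(-(z1**(-1/alpha) + z2**(-1/alpha))**alpha).
            s = z1 ** (-1.0 / alpha) + z2 ** (-1.0 / alpha)
            return (-(s ** alpha)
                    + (-1.0 / alpha - 1.0) * (np.log(z1) + np.log(z2))
                    + (alpha - 2.0) * np.log(s)
                    + np.log(s ** alpha + (1.0 - alpha) / alpha))

        def neg_composite_loglik(alpha, Z):
            # Pairwise composite log-likelihood: sum the bivariate log density
            # over every pair of sites and every observation.
            total = 0.0
            for i, j in itertools.combinations(range(Z.shape[1]), 2):
                total += logistic_log_density(Z[:, i], Z[:, j], alpha).sum()
            return -total

        # Simulated unit Frechet observations at 4 sites over 300 "years";
        # the sites are independent here, so the fit should approach alpha = 1.
        rng = np.random.default_rng(7)
        Z = -1.0 / np.log(rng.uniform(size=(300, 4)))

        res = minimize_scalar(neg_composite_loglik, args=(Z,),
                              bounds=(0.2, 1.0), method="bounded")
        print("fitted dependence parameter alpha:", res.x)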