
A Polynomial Time Algorithm for Lossy Population Recovery

Authors: Moitra, Ankur; Saks, Michael
Publication date: 01/01/2013

We give a polynomial time algorithm for the lossy population recovery problem. In this problem, the goal is to approximately learn an unknown distribution on binary strings of length $n$ from lossy samples: for some parameter $\mu$, each coordinate of the sample is preserved with probability $\mu$ and otherwise is replaced by a '?'. The running time and number of samples needed for our algorithm are polynomial in $n$ and $1/\varepsilon$ for each fixed $\mu > 0$. This improves on the algorithm of Wigderson and Yehudayoff, which runs in quasi-polynomial time for any $\mu > 0$, and on the polynomial time algorithm of Dvir et al., which was shown to work for $\mu \gtrapprox 0.30$ by Batman et al. In fact, our algorithm also works in the more general framework of Batman et al., in which there is no a priori bound on the size of the support of the distribution. The algorithm we analyze is implicit in previous work; our main contribution is to analyze the algorithm by showing (via linear programming duality and connections to complex analysis) that a certain matrix associated with the problem has a robust local inverse even though its condition number is exponentially small. A corollary of our result is the first polynomial time algorithm for learning DNFs in the restriction access model of Dvir et al.

arXiv.org e-Print Archive

CiteSeerX

Crossref
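The lossy sampling model described in the abstract above can be illustrated with a minimal sketch. This only generates lossy samples; it is not the recovery algorithm from the paper. The function names, the example support, and the weights are all hypothetical choices made for illustration.

```python
import random

def lossy_sample(x, mu, rng):
    """Keep each coordinate of x with probability mu; otherwise replace it with '?'."""
    return "".join(c if rng.random() < mu else "?" for c in x)

def draw_lossy_samples(support, weights, mu, count, seed=0):
    """Draw `count` strings from the (hypothetical) distribution, then erase
    coordinates independently, as in the lossy population recovery model."""
    rng = random.Random(seed)
    strings = rng.choices(support, weights=weights, k=count)
    return [lossy_sample(s, mu, rng) for s in strings]

# Illustrative distribution on binary strings of length n = 4 (assumed values).
samples = draw_lossy_samples(["0000", "1111", "1010"], [0.5, 0.3, 0.2], mu=0.4, count=5)
print(samples)
```

A learner sees only strings such as those in `samples` and must estimate the weights; smaller values of `mu` erase more coordinates and make that task harder.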

A performance index approach to aerodynamic design with the use of analysis codes only

Authors: Barger, Raymond L.; Moitra, Anutosh

A method is described for designing an aerodynamic configuration for a specified performance vector, based on results from several similar, but not identical, trial configurations, each defined by a geometry parameter vector. The theory shows that the method is effective provided that: (1) the results for the trial configurations provide sufficient variation so that a linear combination of them approximates the specified performance; and (2) the differences between the performance vectors (including the specified performance) are sufficiently small that the linearity assumption of sensitivity analysis applies to the differences. A computed example describes the design of a high supersonic Mach number missile wing-body configuration based on results from a set of four trial configurations.

NASA Technical Reports Server

Algorithms and Hardness for Robust Subspace Recovery

Authors: Hardt, Moritz; Moitra, Ankur
Publication date: 03/12/2013

We consider a fundamental problem in unsupervised learning called subspace recovery: given a collection of $m$ points in $\mathbb{R}^n$, if many but not necessarily all of these points are contained in a $d$-dimensional subspace $T$, can we find it? The points contained in $T$ are called inliers and the remaining points are outliers. This problem has received considerable attention in computer science and in statistics. Yet efficient algorithms from computer science are not robust to adversarial outliers, and the estimators from robust statistics are hard to compute in high dimensions. Are there algorithms for subspace recovery that are both robust to outliers and efficient? We give an algorithm that finds $T$ when it contains more than a $\frac{d}{n}$ fraction of the points. Hence for, say, $d = n/2$, this estimator is both easy to compute and well-behaved when there is a constant fraction of outliers. We prove that it is Small Set Expansion hard to find $T$ when the fraction of errors is any larger, thus giving evidence that our estimator is an optimal compromise between efficiency and robustness. As it turns out, this basic problem has a surprising number of connections to other areas, including small set expansion, matroid theory and functional analysis, that we make use of here.

Comment: Appeared in Proceedings of COLT 201

arXiv.org e-Print Archive

CiteSeerX
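The recovery condition in the subspace recovery abstract above is a simple threshold: the subspace must contain more than a $d/n$ fraction of the $m$ points. A minimal sketch of that check follows; the function name and the example numbers are hypothetical, and the check says nothing about how the subspace is actually found.

```python
from fractions import Fraction

def inlier_fraction_exceeds_threshold(num_inliers, m, d, n):
    """Return True when the inlier fraction num_inliers / m is strictly
    greater than d / n, the threshold stated in the abstract."""
    return Fraction(num_inliers, m) > Fraction(d, n)

# Illustrative case with d = n/2: the threshold is one half of the points.
print(inlier_fraction_exceeds_threshold(60, 100, 5, 10))  # 60% inliers, above 1/2
print(inlier_fraction_exceeds_threshold(50, 100, 5, 10))  # exactly 1/2, not strictly above
```

Exact rational arithmetic is used here so that the boundary case (a fraction exactly equal to $d/n$) is not misclassified by floating-point rounding.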

An Almost Optimal Algorithm for Computing Nonnegative Rank

Author: Moitra, Ankur
Publication venue: Society for Industrial & Applied Mathematics (SIAM)
Publication date: 01/10/2014

Here, we give an algorithm for deciding if the nonnegative rank of a matrix $M$ of dimension $m \times n$ is at most $r$ which runs in time $(nm)^{O(r^2)}$. This is the first exact algorithm that runs in time singly exponential in $r$. This algorithm, like earlier algorithms, is built on methods for finding a solution to a system of polynomial inequalities (if one exists). Notably, the best algorithms for this task run in time exponential in the number of variables but polynomial in all of the other parameters (the number of inequalities and the maximum degree). Hence, these algorithms motivate natural algebraic questions whose solutions have immediate algorithmic implications: how many variables do we need to represent the decision problem of whether $M$ has nonnegative rank at most $r$? A naive formulation uses $nr + mr$ variables and yields an algorithm that is exponential in $n$ and $m$ even for constant $r$. Arora et al. [Proceedings of STOC, 2012, pp. 145--162] recently reduced the number of variables to $2r^2 2^r$, and here we exponentially reduce the number of variables to $2r^2$, which yields our main algorithm. In fact, the algorithm that we obtain is nearly optimal (under the exponential time hypothesis), since an algorithm that runs in time $(nm)^{o(r)}$ would yield a subexponential algorithm for 3-SAT [Proceedings of STOC, 2012, pp. 145--162]. Our main result is based on establishing a normal form for nonnegative matrix factorization, which in turn allows us to exploit algebraic dependence among a large collection of linear transformations with variable entries. Additionally, we demonstrate that nonnegative rank cannot be certified by even a very large submatrix of $M$, and this property also follows from the intuition gained from viewing nonnegative rank through the lens of systems of polynomial inequalities.

Funding: National Science Foundation (U.S.) (Computing and Innovation Fellowship); National Science Foundation (U.S.) (grant DMS-0835373)

DSpace@MIT

Crossref
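To make the variable counts quoted in the nonnegative rank abstract above concrete, the following sketch evaluates the three formulas for given $n$, $m$ and $r$. The function name, dictionary keys, and example sizes are hypothetical; only the three formulas come from the abstract.

```python
def variable_counts(n, m, r):
    """Number of variables in each formulation of the decision problem
    'does M (of dimension m x n) have nonnegative rank at most r?'."""
    return {
        "naive": n * r + m * r,            # nr + mr, grows with n and m
        "arora_et_al": 2 * r**2 * 2**r,    # 2 r^2 2^r, independent of n and m
        "this_paper": 2 * r**2,            # 2 r^2, exponentially fewer in r
    }

# Illustrative sizes (assumed): a 100 x 100 matrix and target rank r = 3.
print(variable_counts(100, 100, 3))
```

Because the underlying solvers run in time exponential in the number of variables, the drop from $2r^2 2^r$ to $2r^2$ is what moves the running time from doubly to singly exponential in $r$.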

Provable ICA with Unknown Gaussian Noise, and Implications for Gaussian Mixtures and Autoencoders

Authors: Arora, Sanjeev; Ge, Rong; Moitra, Ankur; Sachdeva, Sushant
Publication date: 01/01/2012

We present a new algorithm for Independent Component Analysis (ICA) which has provable performance guarantees. In particular, suppose we are given samples of the form $y = Ax + \eta$, where $A$ is an unknown $n \times n$ matrix, $x$ is a random variable whose components are independent and have a fourth moment strictly less than that of a standard Gaussian random variable, and $\eta$ is an $n$-dimensional Gaussian random variable with unknown covariance $\Sigma$. We give an algorithm that provably recovers $A$ and $\Sigma$ up to an additive $\epsilon$ and whose running time and sample complexity are polynomial in $n$ and $1/\epsilon$. To accomplish this, we introduce a novel "quasi-whitening" step that may be useful in other contexts in which the covariance of Gaussian noise is not known in advance. We also give a general framework for finding all local optima of a function (given an oracle for approximately finding just one). This is a crucial step in our algorithm, one that has been overlooked in previous attempts, and it allows us to control the accumulation of error when we find the columns of $A$ one by one via local search.

arXiv.org e-Print Archive

CiteSeerX
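The sample model $y = Ax + \eta$ from the ICA abstract above can be simulated with a short sketch. This only generates data; it does not implement the quasi-whitening step or the recovery algorithm. The assumptions here are illustrative: components of $x$ are independent $\pm 1$ values (fourth moment 1, below the Gaussian value of 3), and the noise covariance is $\Sigma = LL^T$ for a hypothetical factor `noise_factor` playing the role of $L$.

```python
import random

def sample_noisy_ica(A, noise_factor, rng):
    """Draw one sample y = A x + eta.

    x:   independent +/-1 entries (assumed source distribution).
    eta: Gaussian with covariance L L^T, where L = noise_factor (assumed).
    Returns (y, x) so the hidden sources can be inspected in experiments.
    """
    n = len(A)
    x = [rng.choice((-1.0, 1.0)) for _ in range(n)]
    g = [rng.gauss(0.0, 1.0) for _ in range(n)]  # standard Gaussian vector
    eta = [sum(noise_factor[i][j] * g[j] for j in range(n)) for i in range(n)]
    y = [sum(A[i][j] * x[j] for j in range(n)) + eta[i] for i in range(n)]
    return y, x

# Illustrative 2 x 2 mixing matrix and noise factor (assumed values).
A = [[2.0, 1.0], [0.0, 3.0]]
L = [[0.5, 0.0], [0.2, 0.1]]
y, x = sample_noisy_ica(A, L, random.Random(0))
print(y, x)
```

In the setting of the abstract, an algorithm would observe only many draws of `y` and would have to estimate both `A` and the covariance of the noise.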