
    Sampling from large matrices: an approach through geometric functional analysis

    We study random submatrices of a large matrix A. We show how to approximately compute A from its random submatrix of the smallest possible size O(r log r) with a small error in the spectral norm, where r = ||A||_F^2 / ||A||_2^2 is the numerical rank of A. The numerical rank is always bounded by, and is a stable relaxation of, the rank of A. This yields an asymptotically optimal guarantee in an algorithm for computing low-rank approximations of A. We also prove asymptotically optimal estimates on the spectral norm and the cut-norm of random submatrices of A. The result for the cut-norm yields a slight improvement on the best known sample complexity for an approximation algorithm for MAX-2CSP problems. We use methods of Probability in Banach spaces, in particular the law of large numbers for operator-valued random variables. Comment: Our initial claim about MAX-2CSP problems is corrected. We give an exponential failure probability for the low-rank approximation algorithm. Proofs are explained in more detail.
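    The numerical rank and the submatrix-sampling step are easy to illustrate. Below is a minimal NumPy sketch of the quantities the abstract refers to; the uniform column sampling, sample size, and projection are illustrative assumptions standing in for the paper's scheme, not its algorithm or guarantees.

```python
import numpy as np

def numerical_rank(A):
    """r = ||A||_F^2 / ||A||_2^2, a stable relaxation of rank(A)."""
    return np.linalg.norm(A, "fro") ** 2 / np.linalg.norm(A, 2) ** 2

def sampled_low_rank_approx(A, seed=0):
    """Illustrative sketch: sample about r log r columns uniformly and project
    A onto their span. The paper's guarantees rely on a specific sampling and
    rescaling scheme; this only shows the shape of the computation."""
    rng = np.random.default_rng(seed)
    _, n = A.shape
    r = numerical_rank(A)
    k = min(n, max(1, int(np.ceil(r * np.log(max(r, 2.0))))))
    cols = rng.choice(n, size=k, replace=False)   # random column submatrix
    Q, _ = np.linalg.qr(A[:, cols])               # orthonormal basis for its span
    return Q @ (Q.T @ A)                          # project A onto that span

rng = np.random.default_rng(1)
A = np.outer(np.arange(1.0, 101.0), np.arange(1.0, 51.0)) + 0.01 * rng.standard_normal((100, 50))
A_hat = sampled_low_rank_approx(A)
print(numerical_rank(A), np.linalg.norm(A - A_hat, 2))
```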

    Streaming Lower Bounds for Approximating MAX-CUT

    We consider the problem of estimating the value of max cut in a graph in the streaming model of computation. At one extreme, there is a trivial $2$-approximation for this problem that uses only $O(\log n)$ space, namely, count the number of edges and output half of this value as the estimate for the max cut value. On the other extreme, if one allows $\tilde{O}(n)$ space, then a near-optimal solution to the max cut value can be obtained by storing an $\tilde{O}(n)$-size sparsifier that essentially preserves the max cut. An intriguing question is if poly-logarithmic space suffices to obtain a non-trivial approximation to the max-cut value (that is, beating the factor $2$). It was recently shown that the problem of estimating the size of a maximum matching in a graph admits a non-trivial approximation in poly-logarithmic space. Our main result is that any streaming algorithm that breaks the $2$-approximation barrier requires $\tilde{\Omega}(\sqrt{n})$ space even if the edges of the input graph are presented in random order. Our result is obtained by exhibiting a distribution over graphs which are either bipartite or $\frac{1}{2}$-far from being bipartite, and establishing that $\tilde{\Omega}(\sqrt{n})$ space is necessary to differentiate between these two cases. Thus as a direct corollary we obtain that $\tilde{\Omega}(\sqrt{n})$ space is also necessary to test if a graph is bipartite or $\frac{1}{2}$-far from being bipartite. We also show that for any $\epsilon > 0$, any streaming algorithm that obtains a $(1 + \epsilon)$-approximation to the max cut value when edges arrive in adversarial order requires $n^{1 - O(\epsilon)}$ space, implying that $\Omega(n)$ space is necessary to obtain an arbitrarily good approximation to the max cut value.
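    The trivial baseline mentioned above is short enough to state in code. The sketch below shows that $2$-approximation (count edges in one pass, output half); the example stream is illustrative and has nothing to do with the lower-bound construction.

```python
def streaming_maxcut_2approx(edge_stream):
    """Trivial 2-approximation in O(log n) space: count the edges and output
    half. Every graph has a cut with at least half of its edges, and no cut
    exceeds the total edge count, so m/2 is within a factor 2 of the max cut."""
    m = 0
    for _ in edge_stream:   # single pass; only the counter is stored
        m += 1
    return m / 2

# Illustrative stream: a 4-cycle, whose max cut value is 4.
edges = iter([(0, 1), (1, 2), (2, 3), (3, 0)])
print(streaming_maxcut_2approx(edges))   # 2.0
```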

    Near-Optimal Algorithms for Online Matrix Prediction

    In several online prediction problems of recent interest the comparison class is composed of matrices with bounded entries. For example, in the online max-cut problem, the comparison class is matrices which represent cuts of a given graph, and in online gambling the comparison class is matrices which represent permutations over n teams. Another important example is online collaborative filtering, in which a widely used comparison class is the set of matrices with a small trace norm. In this paper we isolate a property of matrices, which we call (beta,tau)-decomposability, and derive an efficient online learning algorithm that enjoys a regret bound of O*(sqrt(beta tau T)) for all problems in which the comparison class is composed of (beta,tau)-decomposable matrices. By analyzing the decomposability of cut matrices, triangular matrices, and low trace-norm matrices, we derive near-optimal regret bounds for online max-cut, online gambling, and online collaborative filtering. In particular, this resolves (in the affirmative) an open problem posed by Abernethy (2010) and Kleinberg et al. (2010). Finally, we derive lower bounds for the three problems and show that our upper bounds are optimal up to logarithmic factors. In particular, our lower bound for the online collaborative filtering problem resolves another open problem posed by Shamir and Srebro (2011). Comment: 25 pages.
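    As context for the regret bounds discussed above, the following sketch spells out the online matrix prediction protocol and how regret against a comparison class of bounded-entry matrices is measured. The tiny comparison class, the always-predict-zero learner, and the loss are illustrative placeholders, not the paper's (beta,tau)-decomposition-based algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)
n, T = 4, 200

# Toy comparison class: a couple of fixed sign matrices with entries in {-1, +1}.
comparison_class = [np.sign(rng.standard_normal((n, n))) for _ in range(2)]

learner_loss = 0.0
class_loss = np.zeros(len(comparison_class))
for t in range(T):
    i, j = rng.integers(n), rng.integers(n)    # an entry is queried each round
    prediction = 0.0                           # placeholder learner: always predict 0
    label = comparison_class[0][i, j]          # labels come from one matrix in the class
    learner_loss += abs(prediction - label)
    class_loss += [abs(M[i, j] - label) for M in comparison_class]

# Regret: learner's cumulative loss minus that of the best fixed matrix in hindsight.
print(learner_loss - class_loss.min())
```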

    Parallel Cross-Entropy Optimization

    The cross-entropy (CE) method is a modern and effective optimization method well suited to parallel implementations. There is a vast array of problems today, some of which are highly complex and can take weeks or even longer to solve using current optimization techniques. This paper presents a general method for designing parallel CE algorithms for multiple instruction multiple data (MIMD) distributed memory machines using the message passing interface (MPI) library routines. We provide examples of its performance for two well-known test cases: the (discrete) Max-Cut problem and the (continuous) Rosenbrock problem. Speedup factors and a comparison to sequential CE methods are reported.
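    The core CE loop is compact; a minimal sequential sketch for the continuous Rosenbrock test case follows. Population size, elite count, and iteration budget are illustrative choices, and the population evaluation is the embarrassingly parallel step that an MPI implementation would distribute across ranks.

```python
import numpy as np

def rosenbrock(x):
    return np.sum(100.0 * (x[1:] - x[:-1] ** 2) ** 2 + (1.0 - x[:-1]) ** 2)

def cross_entropy_minimize(f, dim=5, pop=200, n_elite=20, iters=100, seed=0):
    """Basic CE loop: sample a Gaussian population, keep the elite samples,
    refit the Gaussian to them, and repeat. Evaluating the population is the
    step a parallel (MPI) implementation distributes across processes."""
    rng = np.random.default_rng(seed)
    mu, sigma = np.zeros(dim), np.full(dim, 2.0)
    for _ in range(iters):
        X = rng.normal(mu, sigma, size=(pop, dim))       # sample population
        scores = np.array([f(x) for x in X])             # parallelizable evaluation
        elite = X[np.argsort(scores)[:n_elite]]          # best samples (minimization)
        mu, sigma = elite.mean(axis=0), elite.std(axis=0) + 1e-8
    return mu

# Rosenbrock's global minimum is at (1, ..., 1); the CE estimate should approach it.
print(cross_entropy_minimize(rosenbrock))
```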

    Efficient regularized isotonic regression with application to gene--gene interaction search

    Isotonic regression is a nonparametric approach for fitting monotonic models to data that has been widely studied from both theoretical and practical perspectives. However, this approach encounters computational and statistical overfitting issues in higher dimensions. To address both concerns, we present an algorithm, which we term Isotonic Recursive Partitioning (IRP), for isotonic regression based on recursively partitioning the covariate space through solution of progressively smaller "best cut" subproblems. This creates a regularized sequence of isotonic models of increasing model complexity that converges to the global isotonic regression solution. The models along the sequence are often more accurate than the unregularized isotonic regression model because of the complexity control they offer. We quantify this complexity control through estimation of degrees of freedom along the path. Success of the regularized models in prediction and IRP's favorable computational properties are demonstrated through a series of simulated and real data experiments. We discuss application of IRP to the problem of searching for gene--gene interactions and epistasis, and demonstrate it on data from genome-wide association studies of three common diseases. Comment: Published at http://dx.doi.org/10.1214/11-AOAS504 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org).
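    Plain one-dimensional isotonic regression, the unregularized building block that IRP partitions and regularizes, can be computed with the classic pool-adjacent-violators algorithm. The sketch below is that baseline only, not the recursive partitioning or the "best cut" subproblems of the paper.

```python
import numpy as np

def pava(y):
    """Pool-adjacent-violators: least-squares fit of a non-decreasing sequence
    to y (assumed ordered by its covariate, with unit weights)."""
    y = np.asarray(y, dtype=float)
    blocks = []                      # each block: [start index, block mean, block size]
    for i, v in enumerate(y):
        blocks.append([i, v, 1])
        # Merge backwards while adjacent block means violate monotonicity.
        while len(blocks) > 1 and blocks[-2][1] > blocks[-1][1]:
            s2, m2, n2 = blocks.pop()
            s1, m1, n1 = blocks.pop()
            blocks.append([s1, (m1 * n1 + m2 * n2) / (n1 + n2), n1 + n2])
    fit = np.empty_like(y)
    for start, mean, size in blocks:
        fit[start:start + size] = mean
    return fit

print(pava([1.0, 3.0, 2.0, 4.0, 3.5, 5.0]))   # [1.  2.5  2.5  3.75 3.75 5.]
```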