216,284 research outputs found
Sampling from large matrices: an approach through geometric functional analysis
We study random submatrices of a large matrix A. We show how to approximately
compute A from its random submatrix of the smallest possible size O(r log r)
with a small error in the spectral norm, where r = ||A||_F^2 / ||A||_2^2 is the
numerical rank of A. The numerical rank is always bounded by, and is a stable
relaxation of, the rank of A. This yields an asymptotically optimal guarantee
in an algorithm for computing low-rank approximations of A. We also prove
asymptotically optimal estimates on the spectral norm and the cut-norm of
random submatrices of A. The result for the cut-norm yields a slight
improvement on the best known sample complexity for an approximation algorithm
for MAX-2CSP problems. We use methods of Probability in Banach spaces, in
particular the law of large numbers for operator-valued random variables.
Comment: Our initial claim about Max-2-CSP problems is corrected. We put an
exponential failure probability for the algorithm for low-rank
approximations. Proofs are a little more explaine
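
To make the definition r = ||A||_F^2 / ||A||_2^2 from the abstract above concrete, here is a minimal sketch in plain Python. It is not the paper's sampling algorithm: it only evaluates the numerical rank of a small dense matrix. The function names are hypothetical, and the spectral norm is estimated by power iteration on A^T A, which assumes the all-ones start vector is not orthogonal to the top singular direction.

```python
import math

def frobenius_sq(A):
    """Squared Frobenius norm: sum of squared entries."""
    return sum(x * x for row in A for x in row)

def spectral_sq(A, iters=200):
    """Estimate ||A||_2^2 by power iteration on A^T A (illustrative only)."""
    n = len(A[0])
    v = [1.0 / math.sqrt(n)] * n          # unit start vector (assumption)
    lam = 0.0
    for _ in range(iters):
        Av = [sum(a * x for a, x in zip(row, v)) for row in A]
        w = [sum(A[i][j] * Av[i] for i in range(len(A))) for j in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return 0.0
        v = [x / norm for x in w]
        lam = norm                        # ||A^T A v|| for unit v
    return lam

def numerical_rank(A):
    """r = ||A||_F^2 / ||A||_2^2; undefined for the zero matrix."""
    s = spectral_sq(A)
    if s == 0.0:
        raise ValueError("numerical rank is undefined for the zero matrix")
    return frobenius_sq(A) / s

identity3 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
rank_one = [[1.0, 2.0], [2.0, 4.0]]
skewed = [[3.0, 0.0], [0.0, 1.0]]         # rank 2, numerical rank 10/9
```

The last matrix shows the sense in which the numerical rank is a relaxation: its rank is 2, but one singular value dominates, so r is close to 1.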
Streaming Lower Bounds for Approximating MAX-CUT
We consider the problem of estimating the value of max cut in a graph in the
streaming model of computation. At one extreme, there is a trivial
$2$-approximation for this problem that uses only $O(\log n)$ space, namely,
count the number of edges and output half of this value as the estimate for max
cut value. On the other extreme, if one allows $\tilde{O}(n)$ space, then a
near-optimal solution to the max cut value can be obtained by storing an
$\tilde{O}(n)$-size sparsifier that essentially preserves the max cut. An
intriguing question is if poly-logarithmic space suffices to obtain a
non-trivial approximation to the max-cut value (that is, beating the factor
$2$). It was recently shown that the problem of estimating the size of a
maximum matching in a graph admits a non-trivial approximation in
poly-logarithmic space.
Our main result is that any streaming algorithm that breaks the
$2$-approximation barrier requires $\tilde{\Omega}(\sqrt{n})$ space even if the
edges of the input graph are presented in random order. Our result is obtained
by exhibiting a distribution over graphs which are either bipartite or
$\frac{1}{2}$-far from being bipartite, and establishing that
$\tilde{\Omega}(\sqrt{n})$ space is necessary to differentiate between these
two cases. Thus as a direct corollary we obtain that $\tilde{\Omega}(\sqrt{n})$
space is also necessary to test if a graph is bipartite or $\frac{1}{2}$-far
from being bipartite.
We also show that for any $\epsilon > 0$, any streaming algorithm that
obtains a $(1 + \epsilon)$-approximation to the max cut value when edges arrive
in adversarial order requires $n^{1 - O(\epsilon)}$ space, implying that
$\Omega(n)$ space is necessary to obtain an arbitrarily good approximation to
the max cut value
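
The edge-counting baseline mentioned in the abstract above can be sketched in a few lines of Python. This is only an illustration of that trivial estimator, not of the paper's lower-bound construction; the function names and the brute-force checker are hypothetical additions. The sketch relies on the standard fact that a graph with m edges has a max cut between m/2 and m, so the estimate m/2 is off by at most a factor of 2.

```python
from itertools import product

def streaming_maxcut_estimate(edge_stream):
    """One pass, one counter: return m / 2 for a stream of m edges."""
    m = 0
    for _edge in edge_stream:
        m += 1
    return m / 2

def exact_maxcut(n, edges):
    """Brute-force max cut over all 2^n bipartitions (tiny graphs only)."""
    best = 0
    for side in product((0, 1), repeat=n):
        cut = sum(1 for u, v in edges if side[u] != side[v])
        best = max(best, cut)
    return best

triangle = [(0, 1), (1, 2), (0, 2)]
estimate = streaming_maxcut_estimate(iter(triangle))
optimum = exact_maxcut(3, triangle)
```

The estimator stores a single integer that never exceeds the number of edges, which is why its memory footprint is logarithmic in the size of the graph.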
Near-Optimal Algorithms for Online Matrix Prediction
In several online prediction problems of recent interest the comparison class
is composed of matrices with bounded entries. For example, in the online
max-cut problem, the comparison class is matrices which represent cuts of a
given graph and in online gambling the comparison class is matrices which
represent permutations over n teams. Another important example is online
collaborative filtering in which a widely used comparison class is the set of
matrices with a small trace norm. In this paper we isolate a property of
matrices, which we call (beta,tau)-decomposability, and derive an efficient
online learning algorithm that enjoys a regret bound of O*(sqrt(beta tau T))
for all problems in which the comparison class is composed of
(beta,tau)-decomposable matrices. By analyzing the decomposability of cut
matrices, triangular matrices, and low trace-norm matrices, we derive
near-optimal regret bounds for online max-cut, online gambling, and online
collaborative filtering. In particular, this resolves (in the affirmative) an
open problem posed by Abernethy (2010) and Kleinberg et al. (2010). Finally, we
derive lower bounds for the three problems and show that our upper bounds are
optimal up to logarithmic factors. In particular, our lower bound for the
online collaborative filtering problem resolves another open problem posed by
Shamir and Srebro (2011).
Comment: 25 page
Parallel Cross-Entropy Optimization
The cross-entropy (CE) method is a modern and effective optimization method well suited to parallel implementations. Many problems today are highly complex, and some can take weeks or even longer to solve using current optimization techniques. This paper presents a general method for designing parallel CE algorithms for multiple instruction multiple data (MIMD) distributed memory machines using the message passing interface (MPI) library routines. We provide examples of its performance for two well-known test cases: the (discrete) Max-Cut problem and the (continuous) Rosenbrock problem. Speedup factors and a comparison to sequential CE methods are reported
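
For readers unfamiliar with the CE method, the following is a minimal sequential sketch applied to the Rosenbrock function, written in plain Python. It is not the parallel MPI design the paper describes: it runs in a single process, and the Gaussian sampling family, the parameter values, and all function names are assumptions chosen for illustration.

```python
import random
import statistics

def rosenbrock(x):
    """Rosenbrock function; global minimum 0 at (1, ..., 1)."""
    return sum(100.0 * (x[i + 1] - x[i] ** 2) ** 2 + (1.0 - x[i]) ** 2
               for i in range(len(x) - 1))

def cross_entropy_minimize(f, mu, sigma, n_samples=200, n_elite=20,
                           iters=100, seed=0):
    """Sequential CE sketch with an independent Gaussian per coordinate."""
    rng = random.Random(seed)
    mu, sigma = list(mu), list(sigma)
    best_x, best_f = list(mu), f(mu)      # start from the initial mean
    for _ in range(iters):
        # 1. Sample candidates from the current distribution.
        samples = [[rng.gauss(m, s) for m, s in zip(mu, sigma)]
                   for _ in range(n_samples)]
        # 2. Keep the elite (lowest objective values).
        samples.sort(key=f)
        elite = samples[:n_elite]
        if f(elite[0]) < best_f:
            best_x, best_f = elite[0], f(elite[0])
        # 3. Refit mean and standard deviation to the elite set.
        for j in range(len(mu)):
            column = [x[j] for x in elite]
            mu[j] = statistics.fmean(column)
            sigma[j] = statistics.pstdev(column)
    return best_x, best_f

start = [-1.0, 2.0]
best_x, best_f = cross_entropy_minimize(rosenbrock, start, [2.0, 2.0])
```

The sampling and scoring in steps 1 and 2 are independent across candidates, which is the part a parallel implementation would naturally distribute across processes.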
Efficient regularized isotonic regression with application to gene--gene interaction search
Isotonic regression is a nonparametric approach for fitting monotonic models
to data that has been widely studied from both theoretical and practical
perspectives. However, this approach encounters computational and statistical
overfitting issues in higher dimensions. To address both concerns, we present
an algorithm, which we term Isotonic Recursive Partitioning (IRP), for isotonic
regression based on recursively partitioning the covariate space through
solution of progressively smaller "best cut" subproblems. This creates a
regularized sequence of isotonic models of increasing model complexity that
converges to the global isotonic regression solution. The models along the
sequence are often more accurate than the unregularized isotonic regression
model because of the complexity control they offer. We quantify this complexity
control through estimation of degrees of freedom along the path. Success of the
regularized models in prediction and IRP's favorable computational properties
are demonstrated through a series of simulated and real data experiments. We
discuss application of IRP to the problem of searching for gene--gene
interactions and epistasis, and demonstrate it on data from genome-wide
association studies of three common diseases.
Comment: Published at http://dx.doi.org/10.1214/11-AOAS504 in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org