Cyclic Block Coordinate Descent With Variance Reduction for Composite Nonconvex Optimization
Nonconvex optimization is central in solving many machine learning problems,
in which block-wise structure is commonly encountered. In this work, we propose
cyclic block coordinate methods for nonconvex optimization problems with
non-asymptotic gradient norm guarantees. Our convergence analysis is based on a
gradient Lipschitz condition with respect to a Mahalanobis norm, inspired by
recent progress on cyclic block coordinate methods. In deterministic settings,
our convergence guarantee matches the guarantee of (full-gradient) gradient
descent, but with the gradient Lipschitz constant being defined w.r.t. the
Mahalanobis norm. In stochastic settings, we use recursive variance reduction
to decrease the per-iteration cost and match the arithmetic operation
complexity of current optimal stochastic full-gradient methods, with a unified
analysis for both finite-sum and infinite-sum cases. We further prove the
faster, linear convergence of our methods when a Polyak-Łojasiewicz (PŁ)
condition holds for the objective function. To the best of our knowledge, our
work is the first to provide variance-reduced convergence guarantees for a
cyclic block coordinate method. Our experimental results demonstrate the
efficacy of the proposed variance-reduced cyclic scheme in training deep neural
networks.
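The deterministic variant described above can be illustrated with a minimal sketch: partition the variables into blocks and cyclically apply a gradient step to one block at a time. The objective, block partition, and step size below are illustrative choices, not taken from the paper, and the sketch omits the variance-reduction machinery.

```python
import numpy as np

# Illustrative separable nonconvex objective: f(x) = sum(0.25*x^4 - 0.5*x^2),
# whose coordinate-wise minimizers are at +/-1.
def grad_f(x):
    return x**3 - x

def cyclic_bcd(x0, blocks, step=0.1, iters=200):
    """Cyclic block coordinate gradient descent (deterministic sketch)."""
    x = x0.copy()
    for _ in range(iters):
        for idx in blocks:          # visit blocks in a fixed cyclic order
            g = grad_f(x)           # only the current block's entries are used
            x[idx] -= step * g[idx] # gradient step restricted to this block
    return x

x0 = np.array([0.5, -0.3, 0.8, -0.9])
blocks = [np.array([0, 1]), np.array([2, 3])]
x_star = cyclic_bcd(x0, blocks)
```

After enough cycles the gradient norm is driven to (numerical) zero, matching the non-asymptotic gradient-norm guarantees the abstract refers to, here on a toy problem.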
Nonnegative factorization and the maximum edge biclique problem
Nonnegative matrix factorization (NMF) is a data analysis technique based on the approximation of a nonnegative matrix with a product of two nonnegative factors, which allows compression and interpretation of nonnegative data. In this paper, we study the case of rank-one factorization and show that when the matrix to be factored is not required to be nonnegative, the corresponding problem (R1NF) becomes NP-hard. This sheds new light on the complexity of NMF since any algorithm for fixed-rank NMF must be able to solve, at least implicitly, such rank-one subproblems. Our proof relies on a reduction of the maximum edge biclique problem to R1NF. We also link stationary points of R1NF to feasible solutions of the biclique problem, which allows us to design a new type of biclique finding algorithm based on the application of a block-coordinate descent scheme to R1NF. We show that this algorithm, whose algorithmic complexity per iteration is proportional to the number of edges in the graph, is guaranteed to converge to a biclique and that it performs competitively with existing methods on random graphs and text mining datasets.
Keywords: nonnegative matrix factorization, rank-one factorization, maximum edge biclique problem, algorithmic complexity, biclique finding algorithm
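The block-coordinate scheme for the rank-one problem can be sketched as follows: to minimize ||M - u v^T||_F^2 over u >= 0 and v >= 0, alternate between the two blocks, each of which has a closed-form projected least-squares update. The function name, initialization, and fixed iteration count are illustrative assumptions, not the paper's exact algorithm or stopping rule.

```python
import numpy as np

def rank_one_bcd(M, iters=100, seed=0):
    """Alternating block updates for min ||M - u v^T||_F^2, u, v >= 0 (sketch).

    Each block subproblem is a nonnegative least-squares problem in one
    factor with the other fixed, solved exactly by a projected update.
    """
    rng = np.random.default_rng(seed)
    v = rng.random(M.shape[1])              # random positive initialization
    for _ in range(iters):
        u = np.maximum(0.0, M @ v) / (v @ v)    # exact update of block u
        v = np.maximum(0.0, M.T @ u) / (u @ u)  # exact update of block v
    return u, v

# On a matrix that is exactly a nonnegative rank-one product (here the
# biadjacency pattern of a small biclique), the scheme recovers it.
M = np.outer([1.0, 1.0, 0.0], [1.0, 0.0, 1.0])
u, v = rank_one_bcd(M)
```

Note that each iteration only needs the matrix-vector products `M @ v` and `M.T @ u`, which for a sparse biadjacency matrix costs time proportional to the number of edges, consistent with the per-iteration complexity stated in the abstract.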