
    Algorithms, Initializations, and Convergence for the Nonnegative Matrix Factorization

    It is well known that good initializations can improve the speed and accuracy of the solutions of many nonnegative matrix factorization (NMF) algorithms. Many NMF algorithms are sensitive to the initialization of $W$, $H$, or both. This is especially true of algorithms of the alternating least squares (ALS) type, including the two new ALS algorithms that we present in this paper. We compare the results of six initialization procedures (two standard and four new) on our ALS algorithms. Lastly, we discuss the practical issue of choosing an appropriate convergence criterion.
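
    As a rough illustration of the ALS family discussed above (not the paper's two new ALS algorithms), the sketch below alternates nonnegative least squares updates for $W$ and $H$, with a plain random initialization standing in for the initialization procedures being compared; the function name and parameter choices are illustrative.

        import numpy as np

        def als_nmf(A, r, n_iter=200, seed=0):
            """Basic alternating least squares NMF: A (m x n) ~ W (m x r) @ H (r x n)."""
            rng = np.random.default_rng(seed)
            m, n = A.shape
            W = rng.random((m, r))                     # stand-in for a smarter initialization
            for _ in range(n_iter):
                # Solve W @ H ~ A for H in least squares, then clip negatives to zero.
                H = np.maximum(np.linalg.lstsq(W, A, rcond=None)[0], 0)
                # Solve H.T @ W.T ~ A.T for W, then clip negatives to zero.
                W = np.maximum(np.linalg.lstsq(H.T, A.T, rcond=None)[0].T, 0)
            return W, H

        # Example: factor a random nonnegative 100 x 80 matrix with inner rank 10.
        A = np.abs(np.random.default_rng(1).normal(size=(100, 80)))
        W, H = als_nmf(A, r=10)
        print(np.linalg.norm(A - W @ H) / np.linalg.norm(A))   # relative reconstruction error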

    Fast Prediction with SVM Models Containing RBF Kernels

    We present an approximation scheme for support vector machine models that use an RBF kernel. A second-order Maclaurin series approximation is used for exponentials of inner products between support vectors and test instances. The approximation is applicable to all kernel methods featuring sums of kernel evaluations and makes no assumptions regarding data normalization. The prediction speed of approximated models no longer depends on the number of support vectors but is quadratic in the number of input dimensions. If the number of input dimensions is small compared to the number of support vectors, the approximated model is significantly faster in prediction and has a smaller memory footprint. An optimized C++ implementation was made to assess the gain in prediction speed in a set of practical tests. We additionally provide a method to verify the approximation accuracy, prior to training models or during run-time, to ensure the loss in accuracy remains acceptable and within known bounds. Comment: 9 pages, 1 figure, 3 tables.
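
    A minimal sketch of the kind of second-order Maclaurin approximation described above, assuming the usual factorization $\exp(-\gamma\|x-s\|^2) = \exp(-\gamma\|x\|^2)\,\exp(-\gamma\|s\|^2)\,\exp(2\gamma\langle x, s\rangle)$ and expanding the last factor to second order; the helper names are hypothetical and this is not the authors' optimized C++ implementation.

        import numpy as np

        def precompute(SV, dual_coef, gamma):
            """Collapse the support vectors into the degree-0/1/2 terms of the Maclaurin
            expansion of exp(2*gamma*<x, s_i>); done once, before any prediction."""
            beta = dual_coef * np.exp(-gamma * np.sum(SV**2, axis=1))
            c0 = beta.sum()                            # constant term
            c1 = 2.0 * gamma * beta @ SV               # linear term, length d
            c2 = 2.0 * gamma**2 * (SV.T * beta) @ SV   # quadratic term, d x d
            return c0, c1, c2

        def approx_decision(x, c0, c1, c2, intercept, gamma):
            """Approximate sum_i beta_i * exp(-gamma*||x - s_i||^2) + b; cost is O(d^2)."""
            return np.exp(-gamma * x @ x) * (c0 + c1 @ x + x @ c2 @ x) + intercept

        # Sanity check against the exact RBF evaluation on random data.
        rng = np.random.default_rng(0)
        SV, dual, gamma, b = rng.normal(size=(50, 5)), rng.normal(size=50), 0.1, 0.3
        c0, c1, c2 = precompute(SV, dual, gamma)
        x = 0.3 * rng.normal(size=5)
        exact = dual @ np.exp(-gamma * np.sum((SV - x)**2, axis=1)) + b
        print(exact, approx_decision(x, c0, c1, c2, b, gamma))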

    Sparse Hierarchical Regression with Polynomials

    We present a novel method for exact hierarchical sparse polynomial regression. Our regressor is the degree-$r$ polynomial that depends on at most $k$ inputs and contains at most $\ell$ monomial terms, and that minimizes the sum of the squares of its prediction errors. This hierarchical sparse specification aligns well with modern big data settings, where many inputs are not relevant for prediction and the functional complexity of the regressor needs to be controlled so as to avoid overfitting. We present a two-step approach to this hierarchical sparse regression problem. First, we discard irrelevant inputs using an extremely fast input ranking heuristic. Second, we take advantage of modern cutting plane methods for integer optimization to solve the resulting reduced hierarchical $(k, \ell)$-sparse problem exactly. The ability of our method to identify all $k$ relevant inputs and all $\ell$ monomial terms is shown empirically to experience a phase transition; crucially, the same transition also appears in its ability to reject all irrelevant features and monomials. In the regime where our method is statistically powerful, its computational complexity is, interestingly, on par with that of Lasso-based heuristics. The presented work fills a void: the lack of powerful, disciplined nonlinear sparse regression methods in high-dimensional settings. Our method is shown empirically to scale to regression problems with $n \approx 10{,}000$ observations for input dimension $p \approx 1{,}000$.
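
    The toy sketch below mirrors only the two-step structure described above: inputs are ranked by a cheap stand-in heuristic (absolute correlation with the response), and the monomial selection on the survivors is done greedily with orthogonal matching pursuit rather than the paper's exact cutting-plane integer optimization; scikit-learn is assumed to be available and all names and parameters are illustrative.

        import numpy as np
        from sklearn.preprocessing import PolynomialFeatures
        from sklearn.linear_model import OrthogonalMatchingPursuit

        def hierarchical_sparse_fit(X, y, k, r, num_terms):
            """Step 1: keep the k inputs most correlated with y (cheap ranking heuristic).
            Step 2: fit a degree-r polynomial on them with at most num_terms monomials,
            using greedy OMP as a stand-in for exact integer optimization."""
            corr = np.abs(np.corrcoef(X, y, rowvar=False)[-1, :-1])
            keep = np.argsort(corr)[-k:]
            poly = PolynomialFeatures(degree=r, include_bias=False)
            Z = poly.fit_transform(X[:, keep])
            model = OrthogonalMatchingPursuit(n_nonzero_coefs=num_terms).fit(Z, y)
            return keep, poly, model

        # Toy example: y depends on 2 of 10 inputs through a degree-2 polynomial.
        rng = np.random.default_rng(0)
        X = rng.normal(size=(500, 10))
        y = 2 * X[:, 2] + X[:, 7] + 0.5 * X[:, 2] * X[:, 7] + 0.01 * rng.normal(size=500)
        keep, poly, model = hierarchical_sparse_fit(X, y, k=2, r=2, num_terms=3)
        print(sorted(keep.tolist()), np.count_nonzero(model.coef_))   # selected inputs, monomial count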

    LOCO: Distributing Ridge Regression with Random Projections

    We propose LOCO, an algorithm for large-scale ridge regression which distributes the features across workers on a cluster. Important dependencies between variables are preserved using structured random projections, which are cheap to compute and need to be communicated only once. We show that LOCO obtains a solution which is close to the exact ridge regression solution in the fixed design setting. We verify this experimentally in a simulation study as well as in an application to climate prediction. Furthermore, we show that LOCO achieves significant speedups compared with a state-of-the-art distributed algorithm on a large-scale regression problem. Comment: 37 pages.
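
    A toy, single-machine sketch of the idea under simplifying assumptions: feature blocks play the role of workers, a plain Gaussian projection stands in for the structured random projections, and each "worker" solves a local ridge problem and keeps only the coefficients of its own raw features; all names and sizes are illustrative.

        import numpy as np

        def loco_ridge(X, y, lam, n_workers=2, proj_dim=20, seed=0):
            """LOCO-style sketch: distribute feature blocks, communicate one random
            projection per block, solve local ridge problems, keep raw-block coefficients."""
            rng = np.random.default_rng(seed)
            n, p = X.shape
            blocks = np.array_split(np.arange(p), n_workers)
            # One random projection per block (communicated once in a real deployment).
            proj = [X[:, b] @ rng.normal(size=(len(b), proj_dim)) / np.sqrt(proj_dim) for b in blocks]
            coef = np.zeros(p)
            for w, b in enumerate(blocks):
                others = np.hstack([proj[v] for v in range(n_workers) if v != w])
                Xw = np.hstack([X[:, b], others])          # own raw block + compressed rest
                beta = np.linalg.solve(Xw.T @ Xw + lam * np.eye(Xw.shape[1]), Xw.T @ y)
                coef[b] = beta[:len(b)]                    # keep raw-feature coefficients only
            return coef

        # Compare against exact ridge on a small problem.
        rng = np.random.default_rng(1)
        X = rng.normal(size=(500, 40))
        y = X @ rng.normal(size=40) + 0.1 * rng.normal(size=500)
        exact = np.linalg.solve(X.T @ X + np.eye(40), X.T @ y)
        print(np.linalg.norm(loco_ridge(X, y, lam=1.0) - exact) / np.linalg.norm(exact))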

    Effective Resistances, Statistical Leverage, and Applications to Linear Equation Solving

    Recent work in theoretical computer science and scientific computing has focused on nearly-linear-time algorithms for solving systems of linear equations. While introducing several novel theoretical perspectives, this work has yet to lead to practical algorithms. In an effort to bridge this gap, we describe in this paper two related results. Our first and main result is a simple algorithm to approximate the solution to a set of linear equations defined by a Laplacian constraint matrix (for a graph $G$ with $n$ nodes and $m \le n^2$ edges). The algorithm is non-recursive; although it runs in $O(n^2 \cdot \mathrm{polylog}(n))$ time rather than $O(m \cdot \mathrm{polylog}(n))$ time (given an oracle for the so-called statistical leverage scores), it is extremely simple; and it can be used to compute an approximate solution with a direct solver. In light of this result, our second result is a straightforward connection between the concept of graph resistance (which has proven useful in recent algorithms for linear equation solvers) and the concept of statistical leverage (which has proven useful in numerically implementable randomized algorithms for large matrix problems and which has a natural data-analytic interpretation). Comment: 16 pages.
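
    The connection in the second result can be checked numerically on a tiny graph: with signed incidence matrix $B$ and edge weights $w$, the statistical leverage score of row $i$ of $W^{1/2}B$ equals $w_i$ times the effective resistance of edge $i$. The sketch below uses dense pseudoinverses, which is feasible only for small graphs and is meant purely as an illustration of the identity, not of the paper's algorithm.

        import numpy as np

        # Small weighted graph given as an edge list with weights.
        edges = [(0, 1), (1, 2), (2, 0), (2, 3)]
        w = np.array([1.0, 2.0, 1.0, 0.5])
        n, m = 4, len(edges)

        # Signed edge-vertex incidence matrix B (m x n) and Laplacian L = B.T diag(w) B.
        B = np.zeros((m, n))
        for i, (u, v) in enumerate(edges):
            B[i, u], B[i, v] = 1.0, -1.0
        L = B.T @ np.diag(w) @ B
        L_pinv = np.linalg.pinv(L)

        # Effective resistance of edge (u, v): (e_u - e_v)^T L^+ (e_u - e_v).
        reff = np.array([B[i] @ L_pinv @ B[i] for i in range(m)])

        # Statistical leverage scores of the rows of W^{1/2} B.
        M = np.sqrt(w)[:, None] * B
        lev = np.diag(M @ np.linalg.pinv(M.T @ M) @ M.T)

        print(w * reff)   # weight times effective resistance ...
        print(lev)        # ... matches the leverage score of the corresponding row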

    A Field Guide to Forward-Backward Splitting with a FASTA Implementation

    Non-differentiable and constrained optimization play a key role in machine learning, signal and image processing, communications, and beyond. For high-dimensional minimization problems involving large datasets or many unknowns, the forward-backward splitting method provides a simple, practical solver. Despite its apparent simplicity, the performance of forward-backward splitting is highly sensitive to implementation details. This article is an introductory review of forward-backward splitting with a special emphasis on practical implementation concerns. Issues like stepsize selection, acceleration, stopping conditions, and initialization are considered. Numerical experiments are used to compare the effectiveness of different approaches. Many variations of forward-backward splitting are implemented in the solver FASTA (short for Fast Adaptive Shrinkage/Thresholding Algorithm). FASTA provides a simple interface for applying forward-backward splitting to a broad range of problems.
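
    A minimal forward-backward splitting loop for the lasso, with a fixed step size taken from the Lipschitz constant of the smooth term; the adaptive stepsizes, acceleration, and stopping rules that FASTA actually provides are omitted, and the function name and parameters are illustrative.

        import numpy as np

        def forward_backward_lasso(A, b, mu, n_iter=1000):
            """Minimize 0.5*||Ax - b||^2 + mu*||x||_1: gradient (forward) step on the
            smooth term, then the proximal (backward) step of the l1 term, i.e. soft
            thresholding."""
            x = np.zeros(A.shape[1])
            tau = 1.0 / np.linalg.norm(A, 2) ** 2          # fixed step; FASTA adapts this
            for _ in range(n_iter):
                z = x - tau * (A.T @ (A @ x - b))                        # forward step
                x = np.sign(z) * np.maximum(np.abs(z) - tau * mu, 0.0)   # backward (prox) step
            return x

        rng = np.random.default_rng(0)
        A = rng.normal(size=(200, 400))
        x_true = np.zeros(400)
        x_true[:10] = rng.normal(size=10)
        b = A @ x_true + 0.01 * rng.normal(size=200)
        x_hat = forward_backward_lasso(A, b, mu=1.0)
        print(np.count_nonzero(x_hat))   # number of nonzero entries in the estimate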

    A Block Decomposition Algorithm for Sparse Optimization

    Sparse optimization is a central problem in machine learning and computer vision. However, this problem is inherently NP-hard and thus difficult to solve in general. Combinatorial search methods find the global optimal solution but are confined to small-sized problems, while coordinate descent methods are efficient but often suffer from poor local minima. This paper considers a new block decomposition algorithm that combines the effectiveness of combinatorial search methods and the efficiency of coordinate descent methods. Specifically, we consider a random strategy and/or a greedy strategy to select a subset of coordinates as the working set, and then perform a global combinatorial search over the working set based on the original objective function. We show that our method finds stronger stationary points than the coordinate-wise optimization method of Beck et al. In addition, we establish the convergence rate of our algorithm. Our experiments on solving sparse regularized and sparsity constrained least squares optimization problems demonstrate that our method achieves state-of-the-art performance in terms of accuracy. For example, our method generally outperforms the well-known greedy pursuit method. Comment: to appear in SIGKDD 202
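
    A toy rendering of the working-set idea above for the $\ell_0$-regularized least squares objective $\tfrac{1}{2}\|Ax - b\|_2^2 + \lambda\|x\|_0$: pick a small random working set, enumerate every support pattern inside it while holding the other coordinates fixed, and keep the best candidate. The greedy working-set rule and the convergence analysis are not reproduced, and all names and parameters are illustrative.

        import numpy as np
        from itertools import combinations

        def block_decomposition_l0(A, b, lam, block_size=4, n_iter=50, seed=0):
            """Random working set + exhaustive combinatorial search inside the set,
            for 0.5*||Ax - b||^2 + lam*||x||_0; coordinates outside the set stay fixed."""
            rng = np.random.default_rng(seed)
            n = A.shape[1]
            x = np.zeros(n)
            obj = lambda v: 0.5 * np.sum((A @ v - b) ** 2) + lam * np.count_nonzero(v)
            for _ in range(n_iter):
                work = rng.choice(n, size=block_size, replace=False)
                outside = np.setdiff1d(np.arange(n), work)
                r = b - A[:, outside] @ x[outside]         # residual with the block removed
                best_obj, best_x = obj(x), x.copy()
                for k in range(block_size + 1):
                    for S in combinations(work, k):        # every support pattern in the set
                        cand = x.copy()
                        cand[work] = 0.0
                        if S:
                            S = list(S)
                            cand[S] = np.linalg.lstsq(A[:, S], r, rcond=None)[0]
                        cand_obj = obj(cand)
                        if cand_obj < best_obj:
                            best_obj, best_x = cand_obj, cand
                x = best_x
            return x

        rng = np.random.default_rng(1)
        A = rng.normal(size=(100, 30))
        x_true = np.zeros(30)
        x_true[[3, 11, 20]] = [2.0, -1.5, 1.0]
        b = A @ x_true + 0.05 * rng.normal(size=100)
        print(np.nonzero(block_decomposition_l0(A, b, lam=0.1))[0])   # recovered support (ideally {3, 11, 20})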

    On learning with shift-invariant structures

    We describe new results and algorithms for two different, but related, problems which deal with circulant matrices: learning shift-invariant components from training data and calculating the shift (or alignment) between two given signals. The first is the shift-invariant dictionary learning problem, while the latter bears the name of (compressive) shift retrieval. We formulate these problems using circulant and convolutional matrices (including unions of such matrices), define optimization problems that describe our goals, and propose efficient ways to solve them. Based on these findings, we also show how to learn a wavelet-like dictionary from training data. We connect our work with various previous results from the literature, and we show the effectiveness of our proposed algorithms using synthetic data, ECG signals, and images.
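
    A minimal example of the plain (non-compressive) shift retrieval task mentioned above: the circular shift aligning two signals is the argmax of their circular cross-correlation, which the FFT computes in $O(n \log n)$ time. This is a standard identity shown for illustration, not the paper's compressive or dictionary-learning algorithms.

        import numpy as np

        def retrieve_shift(x, y):
            """Return the circular shift s such that y ~ np.roll(x, s), as the argmax of
            the circular cross-correlation computed in the Fourier domain."""
            xcorr = np.fft.ifft(np.fft.fft(y) * np.conj(np.fft.fft(x)))
            return int(np.argmax(np.real(xcorr)))

        rng = np.random.default_rng(0)
        x = rng.normal(size=256)
        y = np.roll(x, 37) + 0.05 * rng.normal(size=256)
        print(retrieve_shift(x, y))   # 37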

    Weighted SGD for $\ell_p$ Regression with Randomized Preconditioning

    In recent years, stochastic gradient descent (SGD) methods and randomized linear algebra (RLA) algorithms have been applied to many large-scale problems in machine learning and data analysis. We aim to bridge the gap between these two methods in solving constrained overdetermined linear regression problems, e.g., $\ell_2$ and $\ell_1$ regression problems. We propose a hybrid algorithm named pwSGD that uses RLA techniques for preconditioning and constructing an importance sampling distribution, and then performs an SGD-like iterative process with weighted sampling on the preconditioned system. We prove that pwSGD inherits faster convergence rates that depend only on the lower dimension of the linear system, while maintaining low computational complexity. In particular, when solving $\ell_1$ regression of size $n$ by $d$, pwSGD returns an approximate solution with $\epsilon$ relative error in the objective value in $\mathcal{O}(\log n \cdot \mathrm{nnz}(A) + \mathrm{poly}(d)/\epsilon^2)$ time. This complexity is uniformly better than that of RLA methods in terms of both $\epsilon$ and $d$ when the problem is unconstrained. For $\ell_2$ regression, pwSGD returns an approximate solution with $\epsilon$ relative error in the objective value and in the solution vector measured in prediction norm in $\mathcal{O}(\log n \cdot \mathrm{nnz}(A) + \mathrm{poly}(d)\log(1/\epsilon)/\epsilon)$ time. We also provide lower bounds on the coreset complexity for more general regression problems, indicating that new ideas will still be needed to extend similar RLA preconditioning ideas to weighted SGD algorithms for more general regression problems. Finally, the effectiveness of such algorithms is illustrated numerically on both synthetic and real datasets. Comment: A conference version of this paper appears under the same title in Proceedings of the ACM-SIAM Symposium on Discrete Algorithms, Arlington, VA, 201
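
    A toy, unconstrained $\ell_2$ rendering of the recipe above, under simplifying assumptions: a dense Gaussian sketch (rather than a faster structured transform) yields a preconditioner $R$ via a QR factorization, the row norms of $AR^{-1}$ define the importance sampling distribution, and weighted SGD runs on the preconditioned system; all names and step-size choices are illustrative and this is not the paper's full pwSGD.

        import numpy as np

        def pwsgd_l2(A, b, n_sketch=200, n_steps=5000, step=0.1, seed=0):
            """Toy preconditioned weighted SGD for min_x ||Ax - b||_2 (unconstrained)."""
            rng = np.random.default_rng(seed)
            n, d = A.shape
            # RLA step: Gaussian sketch, then R from a thin QR acts as the preconditioner.
            S = rng.normal(size=(n_sketch, n)) / np.sqrt(n_sketch)
            _, R = np.linalg.qr(S @ A)
            R_inv = np.linalg.inv(R)
            U = A @ R_inv                              # preconditioned, roughly orthonormal columns
            p = np.sum(U**2, axis=1)                   # approximate row leverage scores ...
            p /= p.sum()                               # ... as an importance sampling distribution
            y = np.zeros(d)
            for t in range(1, n_steps + 1):
                i = rng.choice(n, p=p)
                g = (U[i] @ y - b[i]) * U[i] / p[i]    # unbiased estimate of the full gradient
                y -= (step / np.sqrt(t)) * g
            return R_inv @ y                           # map back to the original variables

        rng = np.random.default_rng(1)
        A = rng.normal(size=(2000, 10))
        b = A @ rng.normal(size=10) + 0.1 * rng.normal(size=2000)
        x_exact = np.linalg.lstsq(A, b, rcond=None)[0]
        print(np.linalg.norm(pwsgd_l2(A, b) - x_exact) / np.linalg.norm(x_exact))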