
    Algorithms, Initializations, and Convergence for the Nonnegative Matrix Factorization

    It is well known that good initializations can improve the speed and accuracy of the solutions of many nonnegative matrix factorization (NMF) algorithms. Many NMF algorithms are sensitive to the initialization of $W$, $H$, or both. This is especially true of algorithms of the alternating least squares (ALS) type, including the two new ALS algorithms that we present in this paper. We compare the results of six initialization procedures (two standard and four new) on our ALS algorithms. Lastly, we discuss the practical issue of choosing an appropriate convergence criterion.
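
    As a rough illustration of the ALS family discussed above (not the paper's two new ALS algorithms), the sketch below alternates nonnegative least squares updates for $W$ and $H$, with a plain random initialization standing in for the initialization procedures being compared; the function name and parameter choices are illustrative.

        import numpy as np

        def als_nmf(A, r, n_iter=200, seed=0):
            """Basic alternating least squares NMF: A (m x n) ~ W (m x r) @ H (r x n)."""
            rng = np.random.default_rng(seed)
            m, n = A.shape
            W = rng.random((m, r))                     # stand-in for a smarter initialization
            for _ in range(n_iter):
                # Solve W @ H ~ A for H in least squares, then clip negatives to zero.
                H = np.maximum(np.linalg.lstsq(W, A, rcond=None)[0], 0)
                # Solve H.T @ W.T ~ A.T for W, then clip negatives to zero.
                W = np.maximum(np.linalg.lstsq(H.T, A.T, rcond=None)[0].T, 0)
            return W, H

        # Example: factor a random nonnegative 100 x 80 matrix with inner rank 10.
        A = np.abs(np.random.default_rng(1).normal(size=(100, 80)))
        W, H = als_nmf(A, r=10)
        print(np.linalg.norm(A - W @ H) / np.linalg.norm(A))   # relative reconstruction error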

    Fast Prediction with SVM Models Containing RBF Kernels

    We present an approximation scheme for support vector machine models that use an RBF kernel. A second-order Maclaurin series approximation is used for exponentials of inner products between support vectors and test instances. The approximation is applicable to all kernel methods featuring sums of kernel evaluations and makes no assumptions regarding data normalization. The prediction speed of approximated models no longer depends on the number of support vectors but is quadratic in the number of input dimensions. If the number of input dimensions is small compared to the number of support vectors, the approximated model is significantly faster in prediction and has a smaller memory footprint. An optimized C++ implementation was made to assess the gain in prediction speed in a set of practical tests. We additionally provide a method to verify the approximation accuracy, prior to training models or during run-time, to ensure the loss in accuracy remains acceptable and within known bounds. Comment: 9 pages, 1 figure, 3 tables.
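
    A minimal sketch of the kind of second-order Maclaurin approximation described above, assuming the usual factorization $\exp(-\gamma\|x-s\|^2) = \exp(-\gamma\|x\|^2)\,\exp(-\gamma\|s\|^2)\,\exp(2\gamma\langle x, s\rangle)$ and expanding the last factor to second order; the helper names are hypothetical and this is not the authors' optimized C++ implementation.

        import numpy as np

        def precompute(SV, dual_coef, gamma):
            """Collapse the support vectors into the degree-0/1/2 terms of the Maclaurin
            expansion of exp(2*gamma*<x, s_i>); done once, before any prediction."""
            beta = dual_coef * np.exp(-gamma * np.sum(SV**2, axis=1))
            c0 = beta.sum()                            # constant term
            c1 = 2.0 * gamma * beta @ SV               # linear term, length d
            c2 = 2.0 * gamma**2 * (SV.T * beta) @ SV   # quadratic term, d x d
            return c0, c1, c2

        def approx_decision(x, c0, c1, c2, intercept, gamma):
            """Approximate sum_i beta_i * exp(-gamma*||x - s_i||^2) + b; cost is O(d^2)."""
            return np.exp(-gamma * x @ x) * (c0 + c1 @ x + x @ c2 @ x) + intercept

        # Sanity check against the exact RBF evaluation on random data.
        rng = np.random.default_rng(0)
        SV, dual, gamma, b = rng.normal(size=(50, 5)), rng.normal(size=50), 0.1, 0.3
        c0, c1, c2 = precompute(SV, dual, gamma)
        x = 0.3 * rng.normal(size=5)
        exact = dual @ np.exp(-gamma * np.sum((SV - x)**2, axis=1)) + b
        print(exact, approx_decision(x, c0, c1, c2, b, gamma))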

    Sparse Hierarchical Regression with Polynomials

    We present a novel method for exact hierarchical sparse polynomial regression. Our regressor is the degree-$r$ polynomial that depends on at most $k$ inputs and contains at most $\ell$ monomial terms, and that minimizes the sum of the squares of its prediction errors. This hierarchical sparse specification aligns well with modern big data settings, where many inputs are not relevant for prediction and the functional complexity of the regressor needs to be controlled so as to avoid overfitting. We present a two-step approach to this hierarchical sparse regression problem. First, we discard irrelevant inputs using an extremely fast input ranking heuristic. Second, we take advantage of modern cutting plane methods for integer optimization to solve the resulting reduced hierarchical $(k, \ell)$-sparse problem exactly. The ability of our method to identify all $k$ relevant inputs and all $\ell$ monomial terms is shown empirically to experience a phase transition; crucially, the same transition also appears in its ability to reject all irrelevant features and monomials. In the regime where our method is statistically powerful, its computational complexity is, interestingly, on par with that of Lasso-based heuristics. The presented work fills a void: the lack of powerful, disciplined nonlinear sparse regression methods in high-dimensional settings. Our method is shown empirically to scale to regression problems with $n \approx 10{,}000$ observations for input dimension $p \approx 1{,}000$.
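
    The toy sketch below mirrors only the two-step structure described above: inputs are ranked by a cheap stand-in heuristic (absolute correlation with the response), and the monomial selection on the survivors is done greedily with orthogonal matching pursuit rather than the paper's exact cutting-plane integer optimization; scikit-learn is assumed to be available and all names and parameters are illustrative.

        import numpy as np
        from sklearn.preprocessing import PolynomialFeatures
        from sklearn.linear_model import OrthogonalMatchingPursuit

        def hierarchical_sparse_fit(X, y, k, r, num_terms):
            """Step 1: keep the k inputs most correlated with y (cheap ranking heuristic).
            Step 2: fit a degree-r polynomial on them with at most num_terms monomials,
            using greedy OMP as a stand-in for exact integer optimization."""
            corr = np.abs(np.corrcoef(X, y, rowvar=False)[-1, :-1])
            keep = np.argsort(corr)[-k:]
            poly = PolynomialFeatures(degree=r, include_bias=False)
            Z = poly.fit_transform(X[:, keep])
            model = OrthogonalMatchingPursuit(n_nonzero_coefs=num_terms).fit(Z, y)
            return keep, poly, model

        # Toy example: y depends on 2 of 10 inputs through a degree-2 polynomial.
        rng = np.random.default_rng(0)
        X = rng.normal(size=(500, 10))
        y = 2 * X[:, 2] + X[:, 7] + 0.5 * X[:, 2] * X[:, 7] + 0.01 * rng.normal(size=500)
        keep, poly, model = hierarchical_sparse_fit(X, y, k=2, r=2, num_terms=3)
        print(sorted(keep.tolist()), np.count_nonzero(model.coef_))   # selected inputs, monomial count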

    LOCO: Distributing Ridge Regression with Random Projections

    We propose LOCO, an algorithm for large-scale ridge regression which distributes the features across workers on a cluster. Important dependencies between variables are preserved using structured random projections, which are cheap to compute and need to be communicated only once. We show that LOCO obtains a solution which is close to the exact ridge regression solution in the fixed design setting. We verify this experimentally in a simulation study as well as in an application to climate prediction. Furthermore, we show that LOCO achieves significant speedups compared with a state-of-the-art distributed algorithm on a large-scale regression problem. Comment: 37 pages.
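
    A toy, single-machine sketch of the idea under simplifying assumptions: feature blocks play the role of workers, a plain Gaussian projection stands in for the structured random projections, and each "worker" solves a local ridge problem and keeps only the coefficients of its own raw features; all names and sizes are illustrative.

        import numpy as np

        def loco_ridge(X, y, lam, n_workers=2, proj_dim=20, seed=0):
            """LOCO-style sketch: distribute feature blocks, communicate one random
            projection per block, solve local ridge problems, keep raw-block coefficients."""
            rng = np.random.default_rng(seed)
            n, p = X.shape
            blocks = np.array_split(np.arange(p), n_workers)
            # One random projection per block (communicated once in a real deployment).
            proj = [X[:, b] @ rng.normal(size=(len(b), proj_dim)) / np.sqrt(proj_dim) for b in blocks]
            coef = np.zeros(p)
            for w, b in enumerate(blocks):
                others = np.hstack([proj[v] for v in range(n_workers) if v != w])
                Xw = np.hstack([X[:, b], others])          # own raw block + compressed rest
                beta = np.linalg.solve(Xw.T @ Xw + lam * np.eye(Xw.shape[1]), Xw.T @ y)
                coef[b] = beta[:len(b)]                    # keep raw-feature coefficients only
            return coef

        # Compare against exact ridge on a small problem.
        rng = np.random.default_rng(1)
        X = rng.normal(size=(500, 40))
        y = X @ rng.normal(size=40) + 0.1 * rng.normal(size=500)
        exact = np.linalg.solve(X.T @ X + np.eye(40), X.T @ y)
        print(np.linalg.norm(loco_ridge(X, y, lam=1.0) - exact) / np.linalg.norm(exact))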

    Effective Resistances, Statistical Leverage, and Applications to Linear Equation Solving

    Recent work in theoretical computer science and scientific computing has focused on nearly-linear-time algorithms for solving systems of linear equations. While introducing several novel theoretical perspectives, this work has yet to lead to practical algorithms. In an effort to bridge this gap, we describe in this paper two related results. Our first and main result is a simple algorithm to approximate the solution to a set of linear equations defined by a Laplacian constraint matrix (for a graph $G$ with $n$ nodes and $m \le n^2$ edges). The algorithm is non-recursive; although it runs in $O(n^2 \cdot \mathrm{polylog}(n))$ time rather than $O(m \cdot \mathrm{polylog}(n))$ time (given an oracle for the so-called statistical leverage scores), it is extremely simple; and it can be used to compute an approximate solution with a direct solver. In light of this result, our second result is a straightforward connection between the concept of graph resistance (which has proven useful in recent algorithms for linear equation solvers) and the concept of statistical leverage (which has proven useful in numerically implementable randomized algorithms for large matrix problems and which has a natural data-analytic interpretation). Comment: 16 pages.
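
    The connection in the second result can be checked numerically on a tiny graph: with signed incidence matrix $B$ and edge weights $w$, the statistical leverage score of row $i$ of $W^{1/2}B$ equals $w_i$ times the effective resistance of edge $i$. The sketch below uses dense pseudoinverses, which is feasible only for small graphs and is meant purely as an illustration of the identity, not of the paper's algorithm.

        import numpy as np

        # Small weighted graph given as an edge list with weights.
        edges = [(0, 1), (1, 2), (2, 0), (2, 3)]
        w = np.array([1.0, 2.0, 1.0, 0.5])
        n, m = 4, len(edges)

        # Signed edge-vertex incidence matrix B (m x n) and Laplacian L = B.T diag(w) B.
        B = np.zeros((m, n))
        for i, (u, v) in enumerate(edges):
            B[i, u], B[i, v] = 1.0, -1.0
        L = B.T @ np.diag(w) @ B
        L_pinv = np.linalg.pinv(L)

        # Effective resistance of edge (u, v): (e_u - e_v)^T L^+ (e_u - e_v).
        reff = np.array([B[i] @ L_pinv @ B[i] for i in range(m)])

        # Statistical leverage scores of the rows of W^{1/2} B.
        M = np.sqrt(w)[:, None] * B
        lev = np.diag(M @ np.linalg.pinv(M.T @ M) @ M.T)

        print(w * reff)   # weight times effective resistance ...
        print(lev)        # ... matches the leverage score of the corresponding row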

    A Field Guide to Forward-Backward Splitting with a FASTA Implementation

    Non-differentiable and constrained optimization play a key role in machine learning, signal and image processing, communications, and beyond. For high-dimensional minimization problems involving large datasets or many unknowns, the forward-backward splitting method provides a simple, practical solver. Despite its apparent simplicity, the performance of forward-backward splitting is highly sensitive to implementation details. This article is an introductory review of forward-backward splitting with a special emphasis on practical implementation concerns. Issues like stepsize selection, acceleration, stopping conditions, and initialization are considered. Numerical experiments are used to compare the effectiveness of different approaches. Many variations of forward-backward splitting are implemented in the solver FASTA (short for Fast Adaptive Shrinkage/Thresholding Algorithm). FASTA provides a simple interface for applying forward-backward splitting to a broad range of problems.
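
    A minimal forward-backward splitting loop for the lasso, with a fixed step size taken from the Lipschitz constant of the smooth term; the adaptive stepsizes, acceleration, and stopping rules that FASTA actually provides are omitted, and the function name and parameters are illustrative.

        import numpy as np

        def forward_backward_lasso(A, b, mu, n_iter=1000):
            """Minimize 0.5*||Ax - b||^2 + mu*||x||_1: gradient (forward) step on the
            smooth term, then the proximal (backward) step of the l1 term, i.e. soft
            thresholding."""
            x = np.zeros(A.shape[1])
            tau = 1.0 / np.linalg.norm(A, 2) ** 2          # fixed step; FASTA adapts this
            for _ in range(n_iter):
                z = x - tau * (A.T @ (A @ x - b))                        # forward step
                x = np.sign(z) * np.maximum(np.abs(z) - tau * mu, 0.0)   # backward (prox) step
            return x

        rng = np.random.default_rng(0)
        A = rng.normal(size=(200, 400))
        x_true = np.zeros(400)
        x_true[:10] = rng.normal(size=10)
        b = A @ x_true + 0.01 * rng.normal(size=200)
        x_hat = forward_backward_lasso(A, b, mu=1.0)
        print(np.count_nonzero(x_hat))   # number of nonzero entries in the estimate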

    A Block Decomposition Algorithm for Sparse Optimization

    Sparse optimization is a central problem in machine learning and computer vision. However, this problem is inherently NP-hard and thus difficult to solve in general. Combinatorial search methods find the global optimal solution but are confined to small-sized problems, while coordinate descent methods are efficient but often suffer from poor local minima. This paper considers a new block decomposition algorithm that combines the effectiveness of combinatorial search methods and the efficiency of coordinate descent methods. Specifically, we consider a random strategy and/or a greedy strategy to select a subset of coordinates as the working set, and then perform a global combinatorial search over the working set based on the original objective function. We show that our method finds stronger stationary points than the coordinate-wise optimization method of Beck et al. In addition, we establish the convergence rate of our algorithm. Our experiments on solving sparse regularized and sparsity constrained least squares optimization problems demonstrate that our method achieves state-of-the-art performance in terms of accuracy. For example, our method generally outperforms the well-known greedy pursuit method. Comment: to appear in SIGKDD 202
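
    A toy rendering of the working-set idea above for the $\ell_0$-regularized least squares objective $\tfrac{1}{2}\|Ax - b\|_2^2 + \lambda\|x\|_0$: pick a small random working set, enumerate every support pattern inside it while holding the other coordinates fixed, and keep the best candidate. The greedy working-set rule and the convergence analysis are not reproduced, and all names and parameters are illustrative.

        import numpy as np
        from itertools import combinations

        def block_decomposition_l0(A, b, lam, block_size=4, n_iter=50, seed=0):
            """Random working set + exhaustive combinatorial search inside the set,
            for 0.5*||Ax - b||^2 + lam*||x||_0; coordinates outside the set stay fixed."""
            rng = np.random.default_rng(seed)
            n = A.shape[1]
            x = np.zeros(n)
            obj = lambda v: 0.5 * np.sum((A @ v - b) ** 2) + lam * np.count_nonzero(v)
            for _ in range(n_iter):
                work = rng.choice(n, size=block_size, replace=False)
                outside = np.setdiff1d(np.arange(n), work)
                r = b - A[:, outside] @ x[outside]         # residual with the block removed
                best_obj, best_x = obj(x), x.copy()
                for k in range(block_size + 1):
                    for S in combinations(work, k):        # every support pattern in the set
                        cand = x.copy()
                        cand[work] = 0.0
                        if S:
                            S = list(S)
                            cand[S] = np.linalg.lstsq(A[:, S], r, rcond=None)[0]
                        cand_obj = obj(cand)
                        if cand_obj < best_obj:
                            best_obj, best_x = cand_obj, cand
                x = best_x
            return x

        rng = np.random.default_rng(1)
        A = rng.normal(size=(100, 30))
        x_true = np.zeros(30)
        x_true[[3, 11, 20]] = [2.0, -1.5, 1.0]
        b = A @ x_true + 0.05 * rng.normal(size=100)
        print(np.nonzero(block_decomposition_l0(A, b, lam=0.1))[0])   # recovered support (ideally {3, 11, 20})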

    On learning with shift-invariant structures

    We describe new results and algorithms for two different, but related, problems which deal with circulant matrices: learning shift-invariant components from training data and calculating the shift (or alignment) between two given signals. The first is the shift-invariant dictionary learning problem, while the latter bears the name of (compressive) shift retrieval. We formulate these problems using circulant and convolutional matrices (including unions of such matrices), define optimization problems that describe our goals, and propose efficient ways to solve them. Based on these findings, we also show how to learn a wavelet-like dictionary from training data. We connect our work with various previous results from the literature, and we show the effectiveness of our proposed algorithms using synthetic data, ECG signals, and images.
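
    A minimal example of the plain (non-compressive) shift retrieval task mentioned above: the circular shift aligning two signals is the argmax of their circular cross-correlation, which the FFT computes in $O(n \log n)$ time. This is a standard identity shown for illustration, not the paper's compressive or dictionary-learning algorithms.

        import numpy as np

        def retrieve_shift(x, y):
            """Return the circular shift s such that y ~ np.roll(x, s), as the argmax of
            the circular cross-correlation computed in the Fourier domain."""
            xcorr = np.fft.ifft(np.fft.fft(y) * np.conj(np.fft.fft(x)))
            return int(np.argmax(np.real(xcorr)))

        rng = np.random.default_rng(0)
        x = rng.normal(size=256)
        y = np.roll(x, 37) + 0.05 * rng.normal(size=256)
        print(retrieve_shift(x, y))   # 37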

    Weighted SGD for $\ell_p$ Regression with Randomized Preconditioning

    In recent years, stochastic gradient descent (SGD) methods and randomized linear algebra (RLA) algorithms have been applied to many large-scale problems in machine learning and data analysis. We aim to bridge the gap between these two methods in solving constrained overdetermined linear regression problems, e.g., $\ell_2$ and $\ell_1$ regression problems. We propose a hybrid algorithm named pwSGD that uses RLA techniques for preconditioning and constructing an importance sampling distribution, and then performs an SGD-like iterative process with weighted sampling on the preconditioned system. We prove that pwSGD inherits faster convergence rates that depend only on the lower dimension of the linear system, while maintaining low computational complexity. In particular, when solving $\ell_1$ regression of size $n$ by $d$, pwSGD returns an approximate solution with $\epsilon$ relative error in the objective value in $\mathcal{O}(\log n \cdot \mathrm{nnz}(A) + \mathrm{poly}(d)/\epsilon^2)$ time. This complexity is uniformly better than that of RLA methods in terms of both $\epsilon$ and $d$ when the problem is unconstrained. For $\ell_2$ regression, pwSGD returns an approximate solution with $\epsilon$ relative error in the objective value and in the solution vector measured in prediction norm in $\mathcal{O}(\log n \cdot \mathrm{nnz}(A) + \mathrm{poly}(d)\log(1/\epsilon)/\epsilon)$ time. We also provide lower bounds on the coreset complexity for more general regression problems, indicating that new ideas will still be needed to extend similar RLA preconditioning ideas to weighted SGD algorithms for more general regression problems. Finally, the effectiveness of such algorithms is illustrated numerically on both synthetic and real datasets. Comment: A conference version of this paper appears under the same title in Proceedings of the ACM-SIAM Symposium on Discrete Algorithms, Arlington, VA, 201
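
    A toy, unconstrained $\ell_2$ rendering of the recipe above, under simplifying assumptions: a dense Gaussian sketch (rather than a faster structured transform) yields a preconditioner $R$ via a QR factorization, the row norms of $AR^{-1}$ define the importance sampling distribution, and weighted SGD runs on the preconditioned system; all names and step-size choices are illustrative and this is not the paper's full pwSGD.

        import numpy as np

        def pwsgd_l2(A, b, n_sketch=200, n_steps=5000, step=0.1, seed=0):
            """Toy preconditioned weighted SGD for min_x ||Ax - b||_2 (unconstrained)."""
            rng = np.random.default_rng(seed)
            n, d = A.shape
            # RLA step: Gaussian sketch, then R from a thin QR acts as the preconditioner.
            S = rng.normal(size=(n_sketch, n)) / np.sqrt(n_sketch)
            _, R = np.linalg.qr(S @ A)
            R_inv = np.linalg.inv(R)
            U = A @ R_inv                              # preconditioned, roughly orthonormal columns
            p = np.sum(U**2, axis=1)                   # approximate row leverage scores ...
            p /= p.sum()                               # ... as an importance sampling distribution
            y = np.zeros(d)
            for t in range(1, n_steps + 1):
                i = rng.choice(n, p=p)
                g = (U[i] @ y - b[i]) * U[i] / p[i]    # unbiased estimate of the full gradient
                y -= (step / np.sqrt(t)) * g
            return R_inv @ y                           # map back to the original variables

        rng = np.random.default_rng(1)
        A = rng.normal(size=(2000, 10))
        b = A @ rng.normal(size=10) + 0.1 * rng.normal(size=2000)
        x_exact = np.linalg.lstsq(A, b, rcond=None)[0]
        print(np.linalg.norm(pwsgd_l2(A, b) - x_exact) / np.linalg.norm(x_exact))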