Algorithms, Initializations, and Convergence for the Nonnegative Matrix Factorization
It is well known that good initializations can improve the speed and accuracy
of the solutions of many nonnegative matrix factorization (NMF) algorithms.
Many NMF algorithms are sensitive with respect to the initialization of W or H
or both. This is especially true of algorithms of the alternating least squares
(ALS) type, including the two new ALS algorithms that we present in this paper.
We compare the results of six initialization procedures (two standard and four
new) on our ALS algorithms. Lastly, we discuss the practical issue of choosing
an appropriate convergence criterion.
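For illustration, here is a minimal numpy sketch of the basic ALS iteration for NMF. The paper's two new ALS algorithms, six initialization procedures, and convergence-criterion discussion go beyond this; the random initialization and relative-change stopping rule below are placeholder choices of ours:

```python
import numpy as np

def nmf_als(A, k, max_iter=200, tol=1e-4, seed=0):
    """Basic ALS for NMF: factor A (m x n) as W (m x k) @ H (k x n), W, H >= 0."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    W = rng.random((m, k))  # random initialization; one of many possible choices
    prev_err = np.inf
    for _ in range(max_iter):
        # Alternate: solve for H with W fixed, then for W with H fixed,
        # projecting negative entries to zero after each solve.
        H = np.maximum(np.linalg.lstsq(W, A, rcond=None)[0], 0)
        W = np.maximum(np.linalg.lstsq(H.T, A.T, rcond=None)[0].T, 0)
        err = np.linalg.norm(A - W @ H, "fro")
        # One possible convergence criterion: relative change in the fit.
        if abs(prev_err - err) <= tol * max(prev_err, 1.0):
            break
        prev_err = err
    return W, H
```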
Fast Prediction with SVM Models Containing RBF Kernels
We present an approximation scheme for support vector machine models that use
an RBF kernel. A second-order Maclaurin series approximation is used for
exponentials of inner products between support vectors and test instances. The
approximation is applicable to all kernel methods featuring sums of kernel
evaluations and makes no assumptions regarding data normalization. The
prediction speed of approximated models no longer depends on the number of
support vectors but is quadratic in the number of input dimensions. If
the number of input dimensions is small compared to the number of support
vectors, the approximated model is significantly faster in prediction and has a
smaller memory footprint. An optimized C++ implementation was made to assess
the gain in prediction speed in a set of practical tests. We additionally
provide a method to verify the approximation accuracy, prior to training models
or during run-time, to ensure the loss in accuracy remains acceptable and
within known bounds.
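The collapse of the kernel sum can be sketched directly. Since exp(-g||x - s||^2) = exp(-g||x||^2) exp(-g||s||^2) exp(2g<s, x>), replacing exp(2g<s, x>) by its second-order Maclaurin polynomial 1 + u + u^2/2 turns the sum over support vectors into a constant, a d-vector, and a d x d matrix. The Python class below is a hypothetical rendering of this idea (the paper's implementation is optimized C++; the names and bias handling here are assumptions):

```python
import numpy as np

class ApproxRBFSVM:
    """Second-order Maclaurin approximation of an RBF-SVM decision function."""

    def __init__(self, sv, alpha, gamma, bias=0.0):
        # Fold exp(-gamma*||s_i||^2) and the dual coefficients into one weight.
        beta = alpha * np.exp(-gamma * np.sum(sv**2, axis=1))
        self.c0 = beta.sum()                                     # constant term
        self.w = 2.0 * gamma * (beta[:, None] * sv).sum(axis=0)  # linear term
        self.M = 2.0 * gamma**2 * (sv.T * beta) @ sv             # quadratic term
        self.gamma, self.bias = gamma, bias

    def decision(self, x):
        # O(d^2) per instance, independent of the number of support vectors.
        envelope = np.exp(-self.gamma * (x @ x))
        return envelope * (self.c0 + self.w @ x + x @ self.M @ x) + self.bias
```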
Sparse Hierarchical Regression with Polynomials
We present a novel method for exact hierarchical sparse polynomial
regression. Our regressor is that degree r polynomial which depends on at
most k inputs, counting at most ℓ monomial terms, which minimizes the
sum of the squares of its prediction errors. The previous hierarchical sparse
specification aligns well with modern big data settings where many inputs are
not relevant for prediction purposes and the functional complexity of the
regressor needs to be controlled so as to avoid overfitting. We present a two-step
approach to this hierarchical sparse regression problem. First, we discard
irrelevant inputs using an extremely fast input ranking heuristic. Secondly, we
take advantage of modern cutting plane methods for integer optimization to
solve our resulting reduced hierarchical (k, ℓ)-sparse problem exactly.
The ability of our method to identify all k relevant inputs and all ℓ
monomial terms is shown empirically to experience a phase transition.
Crucially, the same transition also presents itself in our ability to reject
all irrelevant features and monomials as well. In the regime where our method
is statistically powerful, its computational complexity is interestingly on par
with Lasso based heuristics. The presented work fills a void in terms of a lack
of powerful disciplined nonlinear sparse regression methods in high-dimensional
settings. Our method is shown empirically to scale to regression problems with
n ≈ 10,000 observations for input dimension p ≈ 1,000.
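The first stage can be pictured with a generic screening rule; the sketch below ranks inputs by absolute correlation with the response and keeps the top candidates. It is a stand-in under our own assumptions, as the abstract does not spell out the paper's ranking heuristic:

```python
import numpy as np

def rank_inputs(X, y, keep):
    """Hypothetical screening heuristic: score each input by its absolute
    correlation with the response and return the indices of the top `keep`."""
    Xc = (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-12)
    yc = (y - y.mean()) / (y.std() + 1e-12)
    score = np.abs(Xc.T @ yc) / len(y)     # |corr(x_j, y)| for each input j
    return np.argsort(score)[::-1][:keep]  # survivors enter the exact stage
```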
LOCO: Distributing Ridge Regression with Random Projections
We propose LOCO, an algorithm for large-scale ridge regression which
distributes the features across workers on a cluster. Important dependencies
between variables are preserved using structured random projections which are
cheap to compute and need to be communicated only once. We show that LOCO obtains
a solution which is close to the exact ridge regression solution in the fixed
design setting. We verify this experimentally in a simulation study as well as
an application to climate prediction. Furthermore, we show that LOCO achieves
significant speedups compared with a state-of-the-art distributed algorithm on
a large-scale regression problem.
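A rough sketch of the communication step, assuming each worker holds a column block of the design matrix: the block is compressed once by a random projection, and only the sketch is shipped to the other workers. A dense Gaussian projection is used below for brevity; LOCO's structured projections serve the same purpose but are cheaper to apply:

```python
import numpy as np

def compress_block(X_block, k, seed):
    """Compress a worker's n x p feature block to an n x k sketch.
    The sketch is communicated once; raw features never leave the worker."""
    rng = np.random.default_rng(seed)
    Pi = rng.standard_normal((X_block.shape[1], k)) / np.sqrt(k)
    return X_block @ Pi

# Each worker then solves a local ridge problem on [its raw block | sketches
# received from the other workers] and keeps the coefficients of its raw block.
```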
Effective Resistances, Statistical Leverage, and Applications to Linear Equation Solving
Recent work in theoretical computer science and scientific computing has
focused on nearly-linear-time algorithms for solving systems of linear
equations. While introducing several novel theoretical perspectives, this work
has yet to lead to practical algorithms. In an effort to bridge this gap, we
describe in this paper two related results. Our first and main result is a
simple algorithm to approximate the solution to a set of linear equations
defined by a Laplacian (for a graph with n nodes and m edges)
constraint matrix. The algorithm is non-recursive; even though it
runs in O(n^2 · polylog(n)) time rather than O(m · polylog(n)) time
(given an oracle for the so-called statistical leverage scores), it is
extremely simple; and it can be used to compute an approximate solution with a
direct solver. In light of this result, our second result is a straightforward
connection between the concept of graph resistance (which has proven useful in
recent algorithms for linear equation solvers) and the concept of statistical
leverage (which has proven useful in numerically-implementable randomized
algorithms for large matrix problems and which has a natural data-analytic
interpretation).
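For intuition on that connection: the effective resistance of an edge (u, v) is the quadratic form (e_u - e_v)^T L^+ (e_u - e_v), and for edge weight w_e the statistical leverage score of the corresponding row of the incidence matrix is w_e times this resistance. A dense, illustration-only computation:

```python
import numpy as np

def effective_resistances(L, edges):
    """R(u, v) = (e_u - e_v)^T L^+ (e_u - e_v) for each edge, via a dense
    pseudoinverse (O(n^3), so for small graphs / illustration only)."""
    Lp = np.linalg.pinv(L)
    return np.array([Lp[u, u] - 2.0 * Lp[u, v] + Lp[v, v] for u, v in edges])
```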
A Field Guide to Forward-Backward Splitting with a FASTA Implementation
Non-differentiable and constrained optimization play a key role in machine
learning, signal and image processing, communications, and beyond. For
high-dimensional minimization problems involving large datasets or many
unknowns, the forward-backward splitting method provides a simple, practical
solver. Despite its apparent simplicity, the performance of
forward-backward splitting is highly sensitive to implementation details.
This article is an introductory review of forward-backward splitting with a
special emphasis on practical implementation concerns. Issues like stepsize
selection, acceleration, stopping conditions, and initialization are
considered. Numerical experiments are used to compare the effectiveness of
different approaches.
Many variations of forward-backward splitting are implemented in the solver
FASTA (short for Fast Adaptive Shrinkage/Thresholding Algorithm). FASTA
provides a simple interface for applying forward-backward splitting to a broad
range of problems.
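As a minimal instance of the method under review, here is plain (non-adaptive, non-accelerated) forward-backward splitting applied to the Lasso. The fixed step size and iteration count are placeholder choices, precisely the kind of implementation detail the article examines and FASTA handles adaptively:

```python
import numpy as np

def fbs_lasso(A, b, lam, step, iters=500):
    """Forward-backward splitting for min_x 0.5*||Ax - b||^2 + lam*||x||_1.
    Requires step <= 1/||A^T A||_2 for convergence with a fixed stepsize."""
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        z = x - step * (A.T @ (A @ x - b))  # forward: gradient step, smooth part
        x = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0)  # backward: l1 prox
    return x
```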
A Block Decomposition Algorithm for Sparse Optimization
Sparse optimization is a central problem in machine learning and computer
vision. However, this problem is inherently NP-hard and thus difficult to solve
in general. Combinatorial search methods find the global optimal solution but
are confined to small-sized problems, while coordinate descent methods are
efficient but often suffer from poor local minima. This paper considers a new
block decomposition algorithm that combines the effectiveness of combinatorial
search methods and the efficiency of coordinate descent methods. Specifically,
we consider a random strategy, a greedy strategy, or a combination of the two to select a subset of
coordinates as the working set, and then perform a global combinatorial search
over the working set based on the original objective function. We show that our
method finds stronger stationary points than the coordinate-wise optimization
method of Beck et al. In addition, we establish the convergence rate of our
algorithm. Our experiments on solving sparse regularized and sparsity
constrained least squares optimization problems demonstrate that our method
achieves state-of-the-art performance in terms of accuracy. For example, our
method generally outperforms the well-known greedy pursuit method.
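One iteration of the random-selection variant might look like the sketch below for sparsity-constrained least squares (minimize ||Ax - b||^2 subject to ||x||_0 <= k). The working-set size and the exhaustive support search are our illustrative choices, not the paper's exact procedure:

```python
import numpy as np
from itertools import combinations

def block_step(A, b, x, k, block_size, rng):
    """Draw a random working set, then solve the sparse least squares problem
    restricted to it by exhaustive support search, freezing all other coords."""
    d = A.shape[1]
    ws = rng.choice(d, size=block_size, replace=False)
    rest = np.setdiff1d(np.arange(d), ws)
    r = b - A[:, rest] @ x[rest]            # residual from frozen coordinates
    budget = k - np.count_nonzero(x[rest])  # nonzeros still allowed in the block
    best, best_val = np.zeros(block_size), float(np.sum(r**2))
    for m in range(1, min(budget, block_size) + 1):
        for S in combinations(range(block_size), m):  # global search in block
            cols = A[:, ws[list(S)]]
            z = np.linalg.lstsq(cols, r, rcond=None)[0]
            val = float(np.sum((cols @ z - r) ** 2))
            if val < best_val:
                best_val, best = val, np.zeros(block_size)
                best[list(S)] = z
    x = x.copy()
    x[ws] = best
    return x
```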
On learning with shift-invariant structures
We describe new results and algorithms for two different, but related,
problems which deal with circulant matrices: learning shift-invariant
components from training data and calculating the shift (or alignment) between
two given signals. In the first instance, we deal with the shift-invariant
dictionary learning problem while the latter bears the name of (compressive)
shift retrieval. We formulate these problems using circulant and convolutional
matrices (including unions of such matrices), define optimization problems that
describe our goals and propose efficient ways to solve them. Based on these
findings, we also show how to learn a wavelet-like dictionary from training
data. We connect our work with various previous results from the literature and
we show the effectiveness of our proposed algorithms using synthetic data,
ECG signals, and images.
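Both problem families lean on the fact that circulant matrices are diagonalized by the discrete Fourier transform. The two helpers below are generic FFT identities rather than the paper's specific algorithms:

```python
import numpy as np

def circulant_matvec(c, x):
    """Multiply the circulant matrix with first column c by x in O(n log n):
    circulant matrices have eigenvalues fft(c) in the Fourier basis."""
    return np.real(np.fft.ifft(np.fft.fft(c) * np.fft.fft(x)))

def shift_retrieval(y, x):
    """Estimate the circular shift aligning x to y as the argmax of the
    circular cross-correlation, also computed with FFTs."""
    xcorr = np.real(np.fft.ifft(np.fft.fft(y) * np.conj(np.fft.fft(x))))
    return int(np.argmax(xcorr))
```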
Weighted SGD for Regression with Randomized Preconditioning
In recent years, stochastic gradient descent (SGD) methods and randomized
linear algebra (RLA) algorithms have been applied to many large-scale problems
in machine learning and data analysis. We aim to bridge the gap between these
two methods in solving constrained overdetermined linear regression
problems, e.g., ℓ1 and ℓ2 regression problems. We propose a hybrid
algorithm named pwSGD that uses RLA techniques for preconditioning and
constructing an importance sampling distribution, and then performs an SGD-like
iterative process with weighted sampling on the preconditioned system. We prove
that pwSGD inherits faster convergence rates that only depend on the lower
dimension of the linear system, while maintaining low computation complexity.
In particular, when solving ℓ1 regression on a problem of size n by d, pwSGD
returns an approximate solution with ε relative error in the objective
value in O(log n · nnz(A) + poly(d)/ε^2) time. This complexity is uniformly
better than that of RLA methods in terms of both ε and d when the problem is
unconstrained. For ℓ2 regression, pwSGD returns an approximate solution with
ε relative error in the objective value and the solution vector measured in
prediction norm in O(log n · nnz(A) + poly(d) log(1/ε)/ε) time. We also
provide lower bounds on the coreset
complexity for more general regression problems, indicating that new ideas
will still be needed to extend similar RLA preconditioning ideas to weighted
SGD algorithms for more general regression problems. Finally, the effectiveness
of such algorithms is illustrated numerically on both synthetic and real
datasets.
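To convey the shape of the algorithm, here is an illustrative sketch for unconstrained ℓ2 regression: an RLA preconditioner built from a Gaussian sketch of A, importance sampling proportional to the preconditioned row norms, then SGD in the preconditioned variable. The sketch size, step schedule, and dense inverse are simplifications of ours, not the paper's choices:

```python
import numpy as np

def pwsgd_l2(A, b, iters=2000, seed=0):
    """Illustrative preconditioned weighted SGD for min_x ||Ax - b||_2."""
    rng = np.random.default_rng(seed)
    n, d = A.shape
    SA = (rng.standard_normal((4 * d, n)) / np.sqrt(4 * d)) @ A  # sketch of A
    _, R = np.linalg.qr(SA)
    U = A @ np.linalg.inv(R)       # approximately orthonormal columns
    p = np.sum(U**2, axis=1)
    p /= p.sum()                   # importance sampling distribution
    y = np.zeros(d)
    for t in range(1, iters + 1):
        i = rng.choice(n, p=p)
        g = (U[i] @ y - b[i]) * U[i] / (n * p[i])  # unbiased gradient estimate
        y -= g / np.sqrt(t)                        # decaying step size
    return np.linalg.solve(R, y)   # undo the preconditioning: x = R^{-1} y
```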