Basis Learning as an Algorithmic Primitive
A number of important problems in theoretical computer science and machine learning can be interpreted as recovering a certain basis. These include symmetric matrix eigendecomposition, certain tensor decompositions, Independent Component Analysis (ICA), spectral clustering and Gaussian mixture learning. Each of these problems reduces to an instance of our general model, which we call a "Basis Encoding Function" (BEF). We show that learning a basis within this model can then be provably and efficiently achieved using a first-order iteration algorithm (gradient iteration). Our algorithm goes beyond tensor methods while generalizing a number of existing algorithms (e.g., the power method for symmetric matrices, the tensor power iteration for orthogonal decomposable tensors, and cumulant-based FastICA), all within a broader function-based dynamical systems framework. Our framework also gives a unified account of the unusual phenomenon observed in these domains that they can be solved using efficient non-convex optimization. Specifically, we describe a class of BEFs such that their local maxima on the unit sphere are in one-to-one correspondence with the basis elements. This description relies on a certain "hidden convexity" property of these functions. We provide a complete theoretical analysis of the gradient iteration even when the BEF is perturbed. We show convergence and complexity bounds polynomial in dimension and other relevant parameters, such as perturbation size. Our perturbation results can be considered as a nonlinear version of the classical Davis-Kahan theorem for perturbations of eigenvectors of symmetric matrices. In addition, we show that our algorithm exhibits fast (superlinear) convergence and relate the speed of convergence to the properties of the BEF. Moreover, the gradient iteration algorithm can be easily and efficiently implemented in practice.
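As an illustration of the gradient iteration described above, the following is a minimal sketch, not the authors' implementation. It assumes a toy function F(u) = sum_i w_i <u, z_i>^4 built from a hidden orthonormal basis z_i, and repeats the update u <- grad F(u) / ||grad F(u)||. The names `gradient_iteration`, `BASIS` and `WEIGHTS`, the quartic form of F, and the starting point are all choices made for this example.

```python
import math

def normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def gradient(u, basis, weights):
    # Gradient of the assumed toy function F(u) = sum_i w_i * <u, z_i>^4.
    g = [0.0] * len(u)
    for z, w in zip(basis, weights):
        c = 4.0 * w * dot(u, z) ** 3
        for j in range(len(g)):
            g[j] += c * z[j]
    return g

def gradient_iteration(u0, basis, weights, steps=30):
    # Fixed-point iteration on the unit sphere: u <- grad F(u) / ||grad F(u)||.
    u = normalize(u0)
    for _ in range(steps):
        u = normalize(gradient(u, basis, weights))
    return u

# Hypothetical hidden basis: the standard basis of R^2 rotated by 30 degrees.
t = math.radians(30)
BASIS = [[math.cos(t), math.sin(t)], [-math.sin(t), math.cos(t)]]
WEIGHTS = [1.0, 2.0]

recovered = gradient_iteration([1.0, 0.2], BASIS, WEIGHTS)
```

From this starting point the iterate aligns with the first basis vector. For a quadratic F(u) = u^T A u the same update reduces to the power method, which is the sense in which the iteration generalizes it.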
Tight Lower Bounds for Multiplicative Weights Algorithmic Families
We study the fundamental problem of prediction with expert advice and develop
regret lower bounds for a large family of algorithms for this problem. We
develop simple adversarial primitives that lend themselves to various
combinations leading to sharp lower bounds for many algorithmic families. We
use these primitives to show that the classic Multiplicative Weights Algorithm
(MWA) has a regret of , thereby completely closing
the gap between upper and lower bounds. We further show a regret lower bound of
for a much more general family of
algorithms than MWA, where the learning rate can be arbitrarily varied over
time, or even picked from arbitrary distributions over time. We also use our
primitives to construct adversaries in the geometric horizon setting for MWA to
precisely characterize the regret at for the case
of experts and a lower bound of
for the case of an arbitrary number of experts.
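For readers unfamiliar with the algorithm being analyzed, here is a minimal sketch of the exponential-weights form of the Multiplicative Weights Algorithm for prediction with expert advice. It is an illustration under assumptions, not the paper's construction: the alternating loss sequence, the function name `mwa_regret`, and the fixed learning rate sqrt(8 ln k / T) (a standard textbook tuning) are all chosen for the example.

```python
import math

def mwa_regret(loss_rounds, eta):
    """Expected regret of exponential weights against the best single expert.

    loss_rounds: list of per-round loss vectors with entries in [0, 1].
    eta: fixed learning rate (an assumption; the abstract also covers
         time-varying and randomized learning rates).
    """
    k = len(loss_rounds[0])
    weights = [1.0] * k
    alg_loss = 0.0
    totals = [0.0] * k
    for losses in loss_rounds:
        z = sum(weights)
        # Expected loss when an expert is drawn with probability w_i / z.
        alg_loss += sum(w / z * l for w, l in zip(weights, losses))
        # Multiplicative update: experts with larger loss are down-weighted.
        weights = [w * math.exp(-eta * l) for w, l in zip(weights, losses)]
        totals = [t + l for t, l in zip(totals, losses)]
    return alg_loss - min(totals)

T, K = 1000, 2
# Hypothetical adversary: the two experts alternate between loss 0 and loss 1.
rounds = [[0.0, 1.0] if t % 2 == 0 else [1.0, 0.0] for t in range(T)]
ETA = math.sqrt(8 * math.log(K) / T)
regret = mwa_regret(rounds, ETA)
```

On this particular sequence the regret stays positive and within the textbook upper bound sqrt(T ln k / 2) for this tuning; the adversaries constructed in the paper are more refined than this alternating pattern.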
List decoding of noisy Reed-Muller-like codes
First- and second-order Reed-Muller (RM(1) and RM(2), respectively) codes are
two fundamental error-correcting codes which arise in communication as well as
in probabilistically-checkable proofs and learning. In this paper, we take the
first steps toward extending the quick randomized decoding tools of RM(1) into
the realm of quadratic binary and, equivalently, Z_4 codes. Our main
algorithmic result is an extension of the RM(1) techniques from Goldreich-Levin
and Kushilevitz-Mansour algorithms to the Hankel code, a code between RM(1) and
RM(2). That is, given a signal s of length N, we find a list that is a superset
of all Hankel codewords phi whose dot product with s is at least (1/sqrt(k)) times
the norm of s, in time polynomial in k and log(N). We also give a new and
simple formulation of a known Kerdock code as a subcode of the Hankel code. As
a corollary, we can list-decode Kerdock, too. Also, we get a quick algorithm
for finding a sparse Kerdock approximation. That is, for k small compared with
1/sqrt(N) and for epsilon > 0, we find, in time polynomial in (k
log(N)/epsilon), a k-Kerdock-term approximation s~ to s with Euclidean error at
most the factor (1+epsilon+O(k^2/sqrt(N))) times that of the best such
approximation.
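The list-decoding criterion above can be made concrete for the simpler RM(1) case. The sketch below is a brute-force baseline, not the Goldreich-Levin or Kushilevitz-Mansour algorithm and not the paper's Hankel decoder: it computes every dot product with a fast Walsh-Hadamard transform in O(N log N) time rather than in time polynomial in k and log(N). It assumes codewords are scaled to unit norm; the function names and the test signal are made up for the example.

```python
import math

def fwht(values):
    # Fast Walsh-Hadamard transform (Hadamard ordering); len(values) must be
    # a power of two. Entry a of the output is sum_x s[x] * (-1)^popcount(a & x).
    a = list(values)
    h = 1
    while h < len(a):
        for start in range(0, len(a), 2 * h):
            for i in range(start, start + h):
                x, y = a[i], a[i + h]
                a[i], a[i + h] = x + y, x - y
        h *= 2
    return a

def heavy_rm1_codewords(signal, k):
    # Return indices a whose unit-norm Walsh codeword has dot product with
    # the signal of magnitude at least ||signal|| / sqrt(k).
    n = len(signal)
    norm_s = math.sqrt(sum(x * x for x in signal))
    threshold = norm_s / math.sqrt(k)
    dots = fwht(signal)
    return [a for a, d in enumerate(dots) if abs(d) / math.sqrt(n) >= threshold]

# Hypothetical input: the Walsh codeword with index 5, length 8, plus a
# perturbation of the first sample.
noisy = [float((-1) ** bin(5 & x).count("1")) for x in range(8)]
noisy[0] += 0.5
heavy = heavy_rm1_codewords(noisy, 2)
```

Taking the absolute value of each dot product treats a codeword and its negation as the same list entry, which is one possible convention.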
Simultaneous Codeword Optimization (SimCO) for Dictionary Update and Learning
We consider the data-driven dictionary learning problem. The goal is to seek
an over-complete dictionary from which every training signal can be best
approximated by a linear combination of only a few codewords. This task is
often achieved by iteratively executing two operations: sparse coding and
dictionary update. In the literature, there are two benchmark mechanisms to
update a dictionary. The first approach, such as the MOD algorithm, is
characterized by searching for the optimal codewords while fixing the sparse
coefficients. In the second approach, represented by the K-SVD method, one
codeword and the related sparse coefficients are simultaneously updated while
all other codewords and coefficients remain unchanged. We propose a novel
framework that generalizes the aforementioned two methods. The unique feature
of our approach is that one can update an arbitrary set of codewords and the
corresponding sparse coefficients simultaneously: when sparse coefficients are
fixed, the underlying optimization problem is similar to that in the MOD
algorithm; when only one codeword is selected for update, it can be proved that
the proposed algorithm is equivalent to the K-SVD method; and more importantly,
our method allows us to update all codewords and all sparse coefficients
simultaneously, hence the term simultaneous codeword optimization (SimCO).
Under the proposed framework, we design two algorithms, namely, primitive and
regularized SimCO. We implement these two algorithms based on a simple gradient
descent mechanism. Simulations are provided to demonstrate the performance of
the proposed algorithms, as compared with two baseline algorithms MOD and
K-SVD. Results show that regularized SimCO is particularly appealing in terms
of both learning performance and running speed.
Comment: 13 pages
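To make the dictionary-update stage concrete, here is a deliberately simplified sketch: plain gradient descent on the dictionary D for the residual energy ||Y - D X||_F^2 with the sparse coefficients X held fixed, which is the setting the abstract associates with MOD. It is not SimCO itself: it does not update the coefficients jointly and it omits the unit-norm constraint on codewords. The helper names, the step size, and the small matrices are assumptions made for the example.

```python
def matmul(a, b):
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(m):
    return [list(row) for row in zip(*m)]

def residual_energy(y, d, x):
    # Squared Frobenius norm of Y - D X.
    dx = matmul(d, x)
    return sum((y[i][j] - dx[i][j]) ** 2
               for i in range(len(y)) for j in range(len(y[0])))

def dictionary_gradient_step(y, d, x, step):
    # The gradient of the energy with respect to D is -2 (Y - D X) X^T,
    # so a descent step adds 2 * step * (Y - D X) X^T to D.
    dx = matmul(d, x)
    resid = [[y[i][j] - dx[i][j] for j in range(len(y[0]))]
             for i in range(len(y))]
    rxt = matmul(resid, transpose(x))
    return [[d[i][j] + 2.0 * step * rxt[i][j] for j in range(len(d[0]))]
            for i in range(len(d))]

# Hypothetical data: 2-dimensional signals, 3 codewords, 4 training signals,
# each signal using at most two codewords.
D_TRUE = [[1.0, 0.0, 0.6], [0.0, 1.0, 0.8]]
X = [[1.0, 0.0, 0.0, 1.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0]]
Y = matmul(D_TRUE, X)

d = [[0.8, 0.3, 0.5], [0.2, 0.7, 0.9]]
energy_before = residual_energy(Y, d, X)
for _ in range(200):
    d = dictionary_gradient_step(Y, d, X, step=0.1)
energy_after = residual_energy(Y, d, X)
```

With X of full row rank, as here, the fixed-coefficient problem is a convex quadratic in D, so the iteration recovers D_TRUE when the step is small enough.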
Differentiable Programming Tensor Networks
Differentiable programming is a fresh programming paradigm which composes
parameterized algorithmic components and trains them using automatic
differentiation (AD). The concept emerges from deep learning but is not
limited to training neural networks. We present theory and practice of
programming tensor network algorithms in a fully differentiable way. By
formulating the tensor network algorithm as a computation graph, one can
compute higher order derivatives of the program accurately and efficiently
using AD. We present essential techniques to differentiate through tensor
network contractions, including stable AD for tensor decomposition and
efficient backpropagation through fixed point iterations. As a demonstration,
we compute the specific heat of the Ising model directly by taking the second
order derivative of the free energy obtained in the tensor renormalization
group calculation. Next, we perform gradient-based variational optimization of
infinite projected entangled pair states for the quantum antiferromagnetic
Heisenberg model and obtain state-of-the-art variational energy and
magnetization with moderate effort. Differentiable programming removes
laborious human efforts in deriving and implementing analytical gradients for
tensor network programs, which opens the door to more innovations in tensor
network algorithms and applications.
Comment: Typos corrected, discussion and refs added; revised version accepted
for publication in PRX. Source code available at
https://github.com/wangleiphy/tensorgra
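The idea of obtaining the specific heat as a second derivative of the free energy by AD can be shown on a much smaller stand-in. The sketch below is not the paper's tensor renormalization group calculation: it assumes the zero-field 1D Ising chain with J = 1 and k_B = 1, whose log partition function per site is ln(2 cosh(beta)), and propagates second-order derivatives with a hand-written `Jet` class (a name invented here) instead of an AD library.

```python
import math

class Jet:
    """A value with its first and second derivative in one input variable."""

    def __init__(self, val, d1=0.0, d2=0.0):
        self.val, self.d1, self.d2 = val, d1, d2

    def __mul__(self, other):
        if not isinstance(other, Jet):
            other = Jet(other)  # constants have zero derivatives
        return Jet(
            self.val * other.val,
            self.d1 * other.val + self.val * other.d1,
            self.d2 * other.val + 2 * self.d1 * other.d1 + self.val * other.d2,
        )

    __rmul__ = __mul__

def jet_cosh(x):
    c, s = math.cosh(x.val), math.sinh(x.val)
    return Jet(c, s * x.d1, c * x.d1 ** 2 + s * x.d2)

def jet_log(x):
    r = x.d1 / x.val
    return Jet(math.log(x.val), r, x.d2 / x.val - r * r)

def log_partition_per_site(beta):
    # Assumed model: 1D Ising chain, zero field, J = 1.
    return jet_log(2.0 * jet_cosh(beta))

def specific_heat(beta_value):
    # c = beta^2 * d^2(ln z) / d beta^2, with k_B = 1.
    beta = Jet(beta_value, 1.0, 0.0)
    return beta_value ** 2 * log_partition_per_site(beta).d2

c_at_one = specific_heat(1.0)
```

For this toy model the result can be checked against the closed form beta^2 / cosh(beta)^2. In the setting of the abstract the same second derivative is taken through a whole tensor network contraction, where no closed form is written out by hand.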
Commentary on “Toward an Anthropology of Computer-Mediated, Algorithmic Forms of Sociality” (Eitan Wilf, author). With Nick Seaver.