Finding the spectral radius of a nonnegative irreducible symmetric tensor via DC programming
The Perron-Frobenius theorem says that the spectral radius of an irreducible
nonnegative tensor is the unique positive eigenvalue corresponding to a
positive eigenvector. With this in mind, the purpose of this paper is to find
the spectral radius and its corresponding positive eigenvector of an
irreducible nonnegative symmetric tensor. By transforming the eigenvalue
problem into an equivalent problem of minimizing a concave function on a closed
convex set, which is a typical DC (difference of convex functions)
programming problem, we derive a simpler and cheaper iterative method. The proposed
method is well-defined. Furthermore, we show that both sequences of the
eigenvalue estimates and the eigenvector evaluations generated by the method
Q-linearly converge to the spectral radius and its corresponding eigenvector,
respectively. To accelerate the method, we introduce a line search technique.
The improved method retains the same convergence property as the original
version. Preliminary numerical results show that the improved method performs
quite well.
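The abstract does not spell out the DC iteration, so the following is not the paper's method. It is a minimal sketch of a standard power-type iteration for nonnegative tensors, included only to illustrate the eigenproblem being solved. It assumes an order-3 tensor stored as nested lists, the H-eigenvalue convention A x^2 = lambda x^[2] (componentwise square), and strictly positive entries so that the iterates stay positive. The names `tensor_apply` and `spectral_radius_power` are hypothetical.

```python
def tensor_apply(a, x):
    """Return the vector (A x^2)_i = sum_{j,k} a[i][j][k] * x[j] * x[k]."""
    n = len(x)
    return [
        sum(a[i][j][k] * x[j] * x[k] for j in range(n) for k in range(n))
        for i in range(n)
    ]


def spectral_radius_power(a, tol=1e-12, max_iter=1000):
    """Power-type iteration for a positive order-3 tensor (assumed setting).

    At each step the ratios (A x^2)_i / x_i^2 bracket the spectral radius
    from below (min) and above (max); we stop when the bracket is tight.
    """
    n = len(a)
    x = [1.0 / n] * n  # positive starting vector
    lower = upper = 0.0
    for _ in range(max_iter):
        y = tensor_apply(a, x)
        ratios = [y[i] / x[i] ** 2 for i in range(n)]
        lower, upper = min(ratios), max(ratios)
        if upper - lower <= tol * upper:
            break
        # Componentwise (m-1)-th root with m = 3, then normalize to sum 1.
        x = [v ** 0.5 for v in y]
        s = sum(x)
        x = [v / s for v in x]
    return 0.5 * (lower + upper), x


# Illustrative symmetric positive tensor: entries depend only on i + j + k.
n = 3
a_example = [
    [[1.0 / (1 + i + j + k) for k in range(n)] for j in range(n)]
    for i in range(n)
]
rho, vec = spectral_radius_power(a_example)
print(rho, vec)
```

For the all-ones tensor of dimension 3 the positive eigenvector is constant and the spectral radius is 9, which gives a quick sanity check for the sketch.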
Computing the Least Fixed Point of Positive Polynomial Systems
We consider equation systems of the form X_1 = f_1(X_1, ..., X_n), ..., X_n =
f_n(X_1, ..., X_n) where f_1, ..., f_n are polynomials with positive real
coefficients. In vector form we denote such an equation system by X = f(X) and
call f a system of positive polynomials, or SPP for short. Equation systems of this
kind appear naturally in the analysis of stochastic models like stochastic
context-free grammars (with numerous applications to natural language
processing and computational biology), probabilistic programs with procedures,
web-surfing models with back buttons, and branching processes. The least
nonnegative solution mu f of an SPP equation X = f(X) is of central interest
for these models. Etessami and Yannakakis have suggested a particular version
of Newton's method to approximate mu f.
We extend a result of Etessami and Yannakakis and show that Newton's method
starting at 0 always converges to mu f. We obtain lower bounds on the
convergence speed of the method. For so-called strongly connected SPPs we prove
the existence of a threshold k_f such that for every i >= 0 the (k_f+i)-th
iteration of Newton's method has at least i valid bits of mu f. The proof
yields an explicit bound for k_f depending only on syntactic parameters of f.
We further show that for arbitrary SPP equations Newton's method still
converges linearly: there are k_f>=0 and alpha_f>0 such that for every i>=0 the
(k_f+alpha_f i)-th iteration of Newton's method has at least i valid bits of mu
f. The proof yields an explicit bound for alpha_f; the bound is exponential in
the number of equations, but we also show that it is essentially optimal.
Constructing a bound for k_f is still an open problem. Finally, we also provide
a geometric interpretation of Newton's method for SPPs.
Comment: This is a technical report that goes along with an article to appear
in SIAM Journal on Computing.
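The Newton iteration described in this abstract can be sketched as follows. This is a minimal illustration, assuming the usual multivariate Newton step x <- x + (I - f'(x))^{-1} (f(x) - x) started at 0. The two-equation system, the helper `solve_linear`, and the function names are hypothetical and chosen so that the least fixed point is known in closed form: X1 = 0.6 X1 X2 + 0.4 and X2 = X1 have fixed points (2/3, 2/3) and (1, 1).

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / m[i][i]
    return x


def newton_spp(f, jac, n, iterations=20):
    """Newton's method for X = f(X), started at the zero vector."""
    x = [0.0] * n
    for _ in range(iterations):
        fx = f(x)
        j = jac(x)
        # Build I - f'(x) and solve (I - f'(x)) delta = f(x) - x.
        a = [[(1.0 if r == c else 0.0) - j[r][c] for c in range(n)]
             for r in range(n)]
        delta = solve_linear(a, [fx[i] - x[i] for i in range(n)])
        x = [x[i] + delta[i] for i in range(n)]
    return x


def f_example(x):
    # Hypothetical SPP: X1 = 0.6*X1*X2 + 0.4, X2 = X1.
    return [0.6 * x[0] * x[1] + 0.4, x[0]]


def jac_example(x):
    # Jacobian of f_example.
    return [[0.6 * x[1], 0.6 * x[0]], [1.0, 0.0]]


# Converges toward the least fixed point (2/3, 2/3), not toward (1, 1).
print(newton_spp(f_example, jac_example, 2))
```

Starting at 0 matters here: the example system also has the larger fixed point (1, 1), and the iterates approach the least one from below.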
The Principles of Deep Learning Theory
This book develops an effective theory approach to understanding deep neural
networks of practical relevance. Beginning from a first-principles
component-level picture of networks, we explain how to determine an accurate
description of the output of trained networks by solving layer-to-layer
iteration equations and nonlinear learning dynamics. A main result is that the
predictions of networks are described by nearly-Gaussian distributions, with
the depth-to-width aspect ratio of the network controlling the deviations from
the infinite-width Gaussian description. We explain how these effectively-deep
networks learn nontrivial representations from training and more broadly
analyze the mechanism of representation learning for nonlinear models. From a
nearly-kernel-methods perspective, we find that the dependence of such models'
predictions on the underlying learning algorithm can be expressed in a simple
and universal way. To obtain these results, we develop the notion of
representation group flow (RG flow) to characterize the propagation of signals
through the network. By tuning networks to criticality, we give a practical
solution to the exploding and vanishing gradient problem. We further explain
how RG flow leads to near-universal behavior and lets us categorize networks
built from different activation functions into universality classes.
Altogether, we show that the depth-to-width ratio governs the effective model
complexity of the ensemble of trained networks. By using information-theoretic
techniques, we estimate the optimal aspect ratio at which we expect the network
to be practically most useful and show how residual connections can be used to
push this scale to arbitrary depths. With these tools, we can learn in detail
about the inductive bias of architectures, hyperparameters, and optimizers.
Comment: 451 pages, to be published by Cambridge University Press
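The idea of tuning a network to criticality can be illustrated with a small sketch. It assumes the infinite-width, single-input kernel recursion K_{l+1} = C_b + C_W * E[sigma(z)^2] with z ~ N(0, K_l), and uses the ReLU identity E[relu(z)^2] = K_l / 2. The function name and the parameter names `c_w` and `c_b` (weight and bias variance scales) are hypothetical; this is an illustration of the layer-to-layer iteration, not the book's full treatment.

```python
def relu_kernel_recursion(k0, c_w, c_b, depth):
    """Iterate K_{l+1} = c_b + c_w * K_l / 2 for a ReLU network (assumed setting).

    Returns the list [K_0, K_1, ..., K_depth] of preactivation variances.
    """
    kernels = [k0]
    for _ in range(depth):
        kernels.append(c_b + c_w * kernels[-1] / 2.0)
    return kernels


# With c_b = 0 the variance is multiplied by c_w / 2 at every layer:
# it vanishes for c_w < 2, explodes for c_w > 2, and is preserved at c_w = 2.
for c_w in (1.5, 2.0, 2.5):
    final = relu_kernel_recursion(1.0, c_w, 0.0, depth=50)[-1]
    print(f"c_w={c_w}: K_50 = {final:.3e}")
```

In this simplified picture, the choice c_w = 2 with c_b = 0 is the one that keeps signals from shrinking or growing exponentially with depth.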