
    Finding the spectral radius of a nonnegative irreducible symmetric tensor via DC programming

    The Perron-Frobenius theorem says that the spectral radius of an irreducible nonnegative tensor is the unique positive eigenvalue corresponding to a positive eigenvector. With this in mind, the purpose of this paper is to find the spectral radius and its corresponding positive eigenvector of an irreducible nonnegative symmetric tensor. By transforming the eigenvalue problem into an equivalent problem of minimizing a concave function on a closed convex set, which is a typical DC (difference of convex functions) program, we derive a simpler and cheaper iterative method. The proposed method is well-defined. Furthermore, we show that the sequences of eigenvalue estimates and eigenvector iterates generated by the method Q-linearly converge to the spectral radius and its corresponding eigenvector, respectively. To accelerate the method, we introduce a line search technique. The improved method retains the same convergence property as the original version. Preliminary numerical results show that the improved method performs quite well.
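
    For context only, the sketch below is not the paper's DC-programming method but a classical power-type iteration (in the spirit of Ng-Qi-Zhou) for the same eigenproblem A x^{m-1} = lambda x^{[m-1]}, x > 0, with Collatz-Wielandt bounds bracketing the spectral radius. The helper names, the normalisation, and the random symmetric test tensor are illustrative assumptions.

```python
import numpy as np

def tensor_apply(A, x):
    """Return the vector A x^{m-1} for an order-m tensor A stored as an
    m-dimensional numpy array (contract all but the first index with x)."""
    y = A
    for _ in range(A.ndim - 1):
        y = np.tensordot(y, x, axes=([y.ndim - 1], [0]))
    return y

def spectral_radius_power(A, tol=1e-10, max_iter=1000):
    """Power-type iteration for a nonnegative irreducible symmetric tensor;
    the eigenpair satisfies A x^{m-1} = lam * x^{[m-1]} with x > 0."""
    m, n = A.ndim, A.shape[0]
    x = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        y = tensor_apply(A, x)                       # positive since A is irreducible and x > 0
        ratios = y / x ** (m - 1)
        lam_lo, lam_hi = ratios.min(), ratios.max()  # Collatz-Wielandt bounds on the spectral radius
        x = y ** (1.0 / (m - 1))
        x = x / x.sum()                              # keep the iterate positive and normalised
        if lam_hi - lam_lo < tol:
            break
    return 0.5 * (lam_lo + lam_hi), x

# Small nonnegative symmetric 3rd-order test tensor (symmetrised random entries)
rng = np.random.default_rng(0)
B = rng.random((3, 3, 3))
A = sum(B.transpose(p) for p in [(0, 1, 2), (0, 2, 1), (1, 0, 2),
                                 (1, 2, 0), (2, 0, 1), (2, 1, 0)]) / 6
rho, v = spectral_radius_power(A)
print("spectral radius ~", rho)
```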

    Computing the Least Fixed Point of Positive Polynomial Systems

    We consider equation systems of the form X_1 = f_1(X_1, ..., X_n), ..., X_n = f_n(X_1, ..., X_n) where f_1, ..., f_n are polynomials with positive real coefficients. In vector form we denote such an equation system by X = f(X) and call f a system of positive polynomials, or SPP for short. Equation systems of this kind appear naturally in the analysis of stochastic models like stochastic context-free grammars (with numerous applications to natural language processing and computational biology), probabilistic programs with procedures, web-surfing models with back buttons, and branching processes. The least nonnegative solution mu f of an SPP equation X = f(X) is of central interest for these models. Etessami and Yannakakis have suggested a particular version of Newton's method to approximate mu f. We extend a result of Etessami and Yannakakis and show that Newton's method starting at 0 always converges to mu f. We obtain lower bounds on the convergence speed of the method. For so-called strongly connected SPPs we prove the existence of a threshold k_f such that for every i >= 0 the (k_f+i)-th iteration of Newton's method has at least i valid bits of mu f. The proof yields an explicit bound for k_f depending only on syntactic parameters of f. We further show that for arbitrary SPP equations Newton's method still converges linearly: there are k_f >= 0 and alpha_f > 0 such that for every i >= 0 the (k_f + alpha_f i)-th iteration of Newton's method has at least i valid bits of mu f. The proof yields an explicit bound for alpha_f; the bound is exponential in the number of equations, but we also show that it is essentially optimal. Constructing a bound for k_f is still an open problem. Finally, we also provide a geometric interpretation of Newton's method for SPPs.

    Comment: This is a technical report that accompanies an article to appear in SIAM Journal on Computing.
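
    To make the setting concrete, the sketch below applies plain Newton's method to g(X) = f(X) - X starting at 0, i.e. X_{k+1} = X_k + (I - f'(X_k))^{-1} (f(X_k) - X_k), on a small strongly connected SPP with two variables. The particular polynomials and coefficients are made up for illustration; only the iteration scheme follows the description above, and the SCC-decomposed variant of Etessami and Yannakakis is not reproduced here.

```python
import numpy as np

# Illustrative strongly connected SPP (all coefficients positive):
#   X1 = 0.4*X1*X2 + 0.1*X2 + 0.2
#   X2 = 0.3*X1^2  + 0.4*X1 + 0.1
def f(X):
    x1, x2 = X
    return np.array([0.4 * x1 * x2 + 0.1 * x2 + 0.2,
                     0.3 * x1 * x1 + 0.4 * x1 + 0.1])

def jacobian(X):
    x1, x2 = X
    return np.array([[0.4 * x2,       0.4 * x1 + 0.1],
                     [0.6 * x1 + 0.4, 0.0           ]])

def newton_spp(f, jac, n, iters=20):
    """Newton's method for X = f(X) started at 0: each step solves
    (I - f'(X)) d = f(X) - X and updates X <- X + d."""
    X = np.zeros(n)
    for _ in range(iters):
        d = np.linalg.solve(np.eye(n) - jac(X), f(X) - X)
        X = X + d
    return X

mu_f = newton_spp(f, jacobian, 2)
print("least fixed point ~", mu_f, "fixed point check:", np.allclose(mu_f, f(mu_f)))
```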

    The Principles of Deep Learning Theory

    This book develops an effective theory approach to understanding deep neural networks of practical relevance. Beginning from a first-principles component-level picture of networks, we explain how to determine an accurate description of the output of trained networks by solving layer-to-layer iteration equations and nonlinear learning dynamics. A main result is that the predictions of networks are described by nearly-Gaussian distributions, with the depth-to-width aspect ratio of the network controlling the deviations from the infinite-width Gaussian description. We explain how these effectively-deep networks learn nontrivial representations from training and more broadly analyze the mechanism of representation learning for nonlinear models. From a nearly-kernel-methods perspective, we find that the dependence of such models' predictions on the underlying learning algorithm can be expressed in a simple and universal way. To obtain these results, we develop the notion of representation group flow (RG flow) to characterize the propagation of signals through the network. By tuning networks to criticality, we give a practical solution to the exploding and vanishing gradient problem. We further explain how RG flow leads to near-universal behavior and lets us categorize networks built from different activation functions into universality classes. Altogether, we show that the depth-to-width ratio governs the effective model complexity of the ensemble of trained networks. By using information-theoretic techniques, we estimate the optimal aspect ratio at which we expect the network to be practically most useful and show how residual connections can be used to push this scale to arbitrary depths. With these tools, we can learn in detail about the inductive bias of architectures, hyperparameters, and optimizers.

    Comment: 451 pages, to be published by Cambridge University Press.
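
    As a minimal numpy illustration of the exploding/vanishing signal problem and of tuning to criticality mentioned above (not code from the book), the sketch below propagates a random input through a deep ReLU network at initialization for weight variances below, at, and above the critical choice c_w = 2 (variance 2/width, i.e. He initialization) and prints the scale of the last-layer activations. The depth, width, and variance sweep are arbitrary illustrative choices.

```python
import numpy as np

def final_layer_rms(depth, width, c_w, seed=0):
    """Propagate one random input through a deep ReLU network at
    initialization (weights i.i.d. with variance c_w / width) and return
    the root-mean-square activation of the last layer."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(width)
    for _ in range(depth):
        W = rng.standard_normal((width, width)) * np.sqrt(c_w / width)
        x = np.maximum(W @ x, 0.0)   # ReLU
    return np.sqrt(np.mean(x ** 2))

depth, width = 50, 1000
for c_w in (1.8, 2.0, 2.2):          # c_w = 2 is the critical (He) initialization for ReLU
    print(f"c_w = {c_w}: final-layer RMS ~ {final_layer_rms(depth, width, c_w):.3g}")
```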