349 research outputs found

    LIPIcs, Volume 251, ITCS 2023, Complete Volume


    LIPIcs, Volume 261, ICALP 2023, Complete Volume


    Point spread function approximation of high rank Hessians with locally supported non-negative integral kernels

    We present an efficient matrix-free point spread function (PSF) method for approximating operators that have locally supported non-negative integral kernels. The method computes impulse responses of the operator at scattered points and interpolates these impulse responses to approximate integral kernel entries. Impulse responses are computed by applying the operator to Dirac comb batches of point sources, which are chosen by solving an ellipsoid packing problem. Evaluation of kernel entries allows us to construct a hierarchical matrix (H-matrix) approximation of the operator; further matrix computations are performed with H-matrix methods. We use the method to build preconditioners for the Hessian operator in two inverse problems governed by partial differential equations (PDEs): inversion for the basal friction coefficient in an ice sheet flow problem and for the initial condition in an advective-diffusive transport problem. While for many ill-posed inverse problems the Hessian of the data misfit term exhibits low-rank structure, and hence a low-rank approximation is suitable, for many problems of practical interest the numerical rank of the Hessian is still large. Hessian impulse responses, however, typically become more local as the numerical rank increases, which benefits the PSF method. Numerical results reveal that the PSF preconditioner clusters the spectrum of the preconditioned Hessian near one, yielding roughly 5x-10x reductions in the required number of PDE solves compared to regularization preconditioning and no preconditioning. We also present a numerical study of the influence of various parameters (that control the shape of the impulse responses) on the effectiveness of the advection-diffusion Hessian approximation. The results show that the PSF-based preconditioners form good approximations of high-rank Hessians using a small number of operator applications.
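
    As a rough illustration of the impulse-response idea, the sketch below applies a toy matrix-free operator to a one-dimensional "comb" of well-separated point sources and reads off approximate kernel columns from the response. The operator here is a hypothetical stand-in (a narrow Gaussian convolution); the paper's ellipsoid packing and H-matrix assembly are not shown.

    # Minimal sketch of the PSF idea, under stated assumptions: the operator is
    # a hypothetical stand-in with a locally supported non-negative kernel.
    import numpy as np

    n = 200                                   # 1-D grid points
    x = np.linspace(0.0, 1.0, n)

    def apply_operator(v):
        # Hypothetical matrix-free operator: convolution with a narrow Gaussian,
        # playing the role of a locally supported non-negative integral kernel.
        K = np.exp(-((x[:, None] - x[None, :]) ** 2) / (2 * 0.01 ** 2))
        K /= K.sum(axis=1, keepdims=True)
        return K @ v

    # "Dirac comb" batch: point sources spaced farther apart than the kernel
    # support, so their impulse responses can be separated afterwards.
    source_idx = np.arange(10, n, 40)
    comb = np.zeros(n)
    comb[source_idx] = 1.0
    response = apply_operator(comb)           # one operator application for the whole batch

    # Window the response around each source to recover its impulse response,
    # i.e. an approximation of the corresponding kernel column.
    half_width = 15
    for j in source_idx:
        lo, hi = max(0, j - half_width), min(n, j + half_width)
        print(f"source {j}: approximate kernel column mass = {response[lo:hi].sum():.3f}")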

    Stochastic Distributed Optimization under Average Second-order Similarity: Algorithms and Analysis

    We study finite-sum distributed optimization problems involving a master node and $n-1$ local nodes under the popular $\delta$-similarity and $\mu$-strong convexity conditions. We propose two new algorithms, SVRS and AccSVRS, motivated by previous works. The non-accelerated SVRS method combines the techniques of gradient sliding and variance reduction and achieves a better communication complexity of $\tilde{\mathcal{O}}(n + \sqrt{n}\,\delta/\mu)$ compared to existing non-accelerated algorithms. Applying the framework proposed in Katyusha X, we also develop a directly accelerated version named AccSVRS with $\tilde{\mathcal{O}}(n + n^{3/4}\sqrt{\delta/\mu})$ communication complexity. In contrast to existing results, our complexity bounds are entirely smoothness-free and exhibit superiority in ill-conditioned cases. Furthermore, we establish a nearly matched lower bound to verify the tightness of our AccSVRS method.
    Comment: Camera-ready version for NeurIPS 202
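
    As a rough sketch of the variance-reduction ingredient only, the toy code below runs a generic SVRG-style estimator on a quadratic finite-sum problem with one gradient oracle per local node. It is not the SVRS/AccSVRS recursion itself (gradient sliding, acceleration, and the communication accounting are omitted), and all names in it are ours.

    # Generic variance-reduced step on a toy master/local-node quadratic problem.
    import numpy as np

    rng = np.random.default_rng(0)
    n_nodes, dim = 20, 5
    A = rng.standard_normal((n_nodes, dim, dim))
    A = np.einsum('nij,nkj->nik', A, A) / dim + np.eye(dim)   # SPD per-node Hessians
    b = rng.standard_normal((n_nodes, dim))

    def grad(i, w):
        return A[i] @ w - b[i]                # gradient held by local node i

    def full_grad(w):
        return np.mean([grad(i, w) for i in range(n_nodes)], axis=0)

    w_ref = np.zeros(dim)                     # anchor point with a stored full gradient
    g_ref = full_grad(w_ref)                  # one expensive all-node round
    w, lr = w_ref.copy(), 0.05
    for t in range(300):
        i = rng.integers(n_nodes)             # contact a single local node
        # variance-reduced estimator: cheap local difference plus stored full gradient
        g = grad(i, w) - grad(i, w_ref) + g_ref
        w -= lr * g
    print("final full gradient norm:", np.linalg.norm(full_grad(w)))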

    Stochastic Optimization for Non-convex Problem with Inexact Hessian Matrix, Gradient, and Function

    Trust-region (TR) and adaptive regularization using cubics (ARC) have proven to have very appealing theoretical properties for non-convex optimization, as they concurrently compute the function value, gradient, and Hessian matrix to obtain the next search direction and the adjusted parameters. Although stochastic approximations greatly reduce the computational cost, it is challenging to theoretically guarantee the convergence rate. In this paper, we explore a family of stochastic TR and ARC methods that can simultaneously provide inexact computations of the Hessian matrix, gradient, and function values. Our algorithms require much less propagation overhead per iteration than TR and ARC. We prove that the iteration complexity to achieve $\epsilon$-approximate second-order optimality is of the same order as for the exact computations demonstrated in previous studies. Additionally, the mild conditions on inexactness can be met by leveraging a random sampling technique in the finite-sum minimization problem. Numerical experiments with a non-convex problem support these findings and demonstrate that, with the same or a similar number of iterations, our algorithms require less computational overhead per iteration than current second-order methods.
    Comment: arXiv admin note: text overlap with arXiv:1809.0985
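
    A minimal sketch of the general setting, under stated assumptions: a trust-region step driven by subsampled (inexact) gradient and Hessian estimates for a least-squares finite sum, using a simple Cauchy-point model step. The paper's algorithms and inexactness conditions are more refined than this.

    # Subsampled trust-region iteration on a toy least-squares finite-sum problem.
    import numpy as np

    rng = np.random.default_rng(1)
    N, dim = 1000, 10
    X = rng.standard_normal((N, dim))
    y = X @ rng.standard_normal(dim) + 0.1 * rng.standard_normal(N)

    def loss(w, idx):
        r = X[idx] @ w - y[idx]
        return 0.5 * np.mean(r ** 2)

    def sample_grad_hess(w, batch):
        r = X[batch] @ w - y[batch]
        g = X[batch].T @ r / len(batch)          # inexact (subsampled) gradient
        H = X[batch].T @ X[batch] / len(batch)   # inexact (subsampled) Hessian
        return g, H

    w, radius = np.zeros(dim), 1.0
    for it in range(30):
        batch = rng.choice(N, size=128, replace=False)
        g, H = sample_grad_hess(w, batch)
        gHg = g @ H @ g
        # Cauchy point: minimize the quadratic model along -g within the radius.
        tau = min(1.0, np.linalg.norm(g) ** 3 / (radius * gHg)) if gHg > 0 else 1.0
        step = -tau * radius / np.linalg.norm(g) * g
        predicted = -(g @ step + 0.5 * step @ H @ step)
        actual = loss(w, batch) - loss(w + step, batch)
        rho = actual / predicted if predicted > 0 else 0.0
        if rho > 0.1:
            w = w + step                          # accept the step
        radius *= 2.0 if rho > 0.75 else 0.5 if rho < 0.25 else 1.0
    print("full-data loss after 30 iterations:", loss(w, np.arange(N)))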

    Deep networks training and generalization: insights from linearization

    Despite being able to represent very complex functions, deep artificial neural networks are trained using variants of the basic gradient descent algorithm, which relies on a linearization of the loss at each iteration during training. In this thesis, we argue that a promising way to tackle the challenge of elaborating a comprehensive theory explaining generalization in deep networks is to take advantage of an analogy with linear models, by studying the first-order Taylor expansion that maps parameter space updates to function space progress. This thesis by publication comprises three papers and a software library. The library NNGeometry (chapter 3) serves as a common thread for all projects and introduces a simple Application Programming Interface (API) to study the linearized training dynamics of deep networks using recent methods and contributed algorithmic accelerations. In the EKFAC paper (chapter 4), we propose an approximation to the Fisher Information Matrix (FIM), used in the natural gradient optimization algorithm. In the Lazy vs Hasty paper (chapter 5), we compare the function obtained by training with a linearized dynamics (e.g. in the infinite-width Neural Tangent Kernel (NTK) limit regime) to the actual training regime, by means of examples grouped according to different notions of difficulty. In the NTK alignment paper (chapter 6), we reveal an implicit regularization effect arising from the alignment of the NTK to the target kernel as training progresses.
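
    The first-order Taylor picture that the thesis starts from can be written down directly. The sketch below, in plain NumPy rather than the NNGeometry API, compares a tiny network's output after a parameter step with the linearized prediction f(theta0) + J(theta0) @ delta; all names are illustrative.

    # Compare a small network with its linearization around the initial parameters.
    import numpy as np

    rng = np.random.default_rng(2)
    x = rng.standard_normal(3)                    # a single input

    def f(theta, x):
        W1 = theta[:12].reshape(4, 3)             # first-layer weights
        w2 = theta[12:]                           # second-layer weights
        return w2 @ np.tanh(W1 @ x)               # scalar network output

    theta0 = 0.5 * rng.standard_normal(16)

    def jacobian(theta, x, eps=1e-6):
        # finite-difference Jacobian of the output with respect to the parameters
        J = np.zeros_like(theta)
        for i in range(theta.size):
            e = np.zeros_like(theta)
            e[i] = eps
            J[i] = (f(theta + e, x) - f(theta - e, x)) / (2 * eps)
        return J

    J0 = jacobian(theta0, x)
    delta = 0.1 * rng.standard_normal(16)         # a step in parameter space
    print("true output:      ", f(theta0 + delta, x))
    print("linearized output:", f(theta0, x) + J0 @ delta)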

    A Scalable Two-Level Domain Decomposition Eigensolver for Periodic Schrödinger Eigenstates in Anisotropically Expanding Domains

    Accelerating iterative eigenvalue algorithms is often achieved by employing a spectral shifting strategy. Unfortunately, improved shifting typically leads to a smaller eigenvalue for the resulting shifted operator, which in turn results in a high condition number of the underlying solution matrix, posing a major challenge for iterative linear solvers. This paper introduces a two-level domain decomposition preconditioner that addresses this issue for the linear Schrödinger eigenvalue problem, even in the presence of a vanishing eigenvalue gap in non-uniform, expanding domains. Since the quasi-optimal shift, which is already available as the solution to a spectral cell problem, is required for the eigenvalue solver, it is natural to also use its associated eigenfunction as a generator to construct a coarse space. We analyze the resulting two-level additive Schwarz preconditioner and obtain a condition number bound that is independent of the domain's anisotropy, even though the coarse solver uses only one basis function per subdomain. Several numerical examples are presented to illustrate its flexibility and efficiency.
    Comment: 30 pages, 7 figures, 2 tables
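
    For orientation, a two-level additive Schwarz preconditioner has the standard form below (the notation is ours, not necessarily the paper's); in the setting of the abstract, the coarse space encoded by R_0 is spanned by one cell-problem eigenfunction per subdomain.

    M^{-1}_{\mathrm{AS},2} \;=\; R_0^{T} A_0^{-1} R_0 \;+\; \sum_{i=1}^{N} R_i^{T} A_i^{-1} R_i,
    \qquad A_i = R_i A R_i^{T}, \quad i = 0, 1, \dots, N,

    where A is the (shifted) system matrix, R_i restricts to the i-th overlapping subdomain, and R_0 restricts to the coarse space.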

    Iterative Methods for Neutron and Thermal Radiation Transport Problems

    We develop, analyze, and test iterative methods for three kinds of multigroup transport problems: (1) k-eigenvalue neutronics, (2) thermal radiation transport, and (3) problems with “upscattering,” in which particles can gain energy from collisions.

    For k-eigenvalue problems, many widely used methods to accelerate power iteration use “low-order” equations that contain nonlinear functionals of the transport solution. The nonlinear functionals require that the transport discretization produce strictly positive solutions, and the low-order problems are often more difficult to solve than simple diffusion problems. Similar iterative methods have been proposed that avoid nonlinearities and employ simple diffusion operators in their low-order problems. However, due partly to theoretical concerns, such methods have been largely overlooked by the reactor analysis community. To address theoretical questions, we present analyses showing that a power-like iteration process applied to the linear low-order problem (which looks like a k-eigenvalue problem with a fixed source) provides rapid acceleration and produces the correct transport eigenvalue and eigenvector. We also provide numerical results that support the existing body of evidence that these methods give rapid iterative convergence, similar to methods that use nonlinear functionals.

    Thermal-radiation problems solve for radiation intensity and material temperature using coupled equations that are nonlinear in temperature. Some of the most powerful iterative methods in use today solve the coupled equations using a low-order equation in place of the transport equation, where the low-order equation contains nonlinear functionals of the transport solution. The nonlinear functionals need to be updated only a few times before the system converges. We develop, analyze, and test a new method that works in the same way but employs a simple diffusion low-order operator without nonlinear functionals. Our analysis and results show rapid iterative convergence, comparable to methods that use nonlinear functionals in more complicated low-order equations.

    For problems with upscattering, we have investigated the importance of linearly anisotropic scattering for problems dominated by scattering in graphite. Our results show that the linearly anisotropic scattering encountered in problems of practical interest does not degrade the effectiveness of the iterative acceleration method. Additionally, we have tested a method devised by Hanuš and Ragusa using the semi-consistent Continuous/Discontinuous Finite Element Method (CDFEM) diffusion discretization we have devised, in place of the Modified Interior Penalty (MIP) discretization they employed. Our results with CDFEM show an increased number of transport iterations compared to MIP when there are cells with high aspect ratio, but a reduction in overall runtime due to the reduced number of degrees of freedom of the CDFEM operator compared to the MIP operator.
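
    The unaccelerated baseline that the low-order methods speed up is plain power iteration for the k-eigenvalue problem L*phi = (1/k)*F*phi. The sketch below runs it with small random matrices standing in for the transport (loss) and fission operators; the diffusion-based acceleration analyzed in the dissertation is not shown.

    # Plain power iteration for a generalized k-eigenvalue problem L*phi = (1/k)*F*phi.
    import numpy as np

    rng = np.random.default_rng(3)
    n = 50
    L = 2.0 * np.eye(n) + 0.1 * np.abs(rng.standard_normal((n, n)))   # stand-in transport operator
    F = 0.5 * np.abs(rng.standard_normal((n, n)))                     # stand-in fission operator

    phi, k = np.ones(n), 1.0
    for it in range(500):
        src = F @ phi / k                      # fission source from the previous iterate
        phi_new = np.linalg.solve(L, src)      # one "transport solve" per outer iteration
        k_new = k * (F @ phi_new).sum() / (F @ phi).sum()
        if abs(k_new - k) < 1e-12:
            break
        phi, k = phi_new, k_new
    print("dominant eigenvalue k =", k_new)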

    LIPIcs, Volume 274, ESA 2023, Complete Volume


    Community Detection in the Hypergraph SBM: Exact Recovery Given the Similarity Matrix

    Community detection is a fundamental problem in network science. In this paper, we consider community detection in hypergraphs drawn from the hypergraph stochastic block model (HSBM), with a focus on exact community recovery. We study the performance of polynomial-time algorithms which operate on the similarity matrix $W$, where $W_{ij}$ reports the number of hyperedges containing both $i$ and $j$. Under this information model, while the precise information-theoretic limit is unknown, Kim, Bandeira, and Goemans derived a sharp threshold up to which the natural min-bisection estimator on $W$ succeeds. As min-bisection is NP-hard in the worst case, they additionally proposed a semidefinite programming (SDP) relaxation and conjectured that it achieves the same recovery threshold as the min-bisection estimator. In this paper, we confirm this conjecture. We also design a simple and highly efficient spectral algorithm with nearly linear runtime and show that it achieves the min-bisection threshold. Moreover, the spectral algorithm also succeeds in denser regimes and is considerably more efficient than previous approaches, establishing it as the method of choice. Our analysis of the spectral algorithm crucially relies on strong entrywise bounds on the eigenvectors of $W$. Our bounds are inspired by the work of Abbe, Fan, Wang, and Zhong, who developed entrywise bounds for eigenvectors of symmetric matrices with independent entries. Despite the complex dependency structure in similarity matrices, we prove similar entrywise guarantees.
    Comment: To appear at the Conference on Learning Theory (COLT) 2023. Error in footnote page
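
    A minimal sketch of the spectral idea described in the abstract, under stated assumptions: generate a 3-uniform hypergraph with two planted balanced communities, assemble the similarity matrix W, and split vertices by the sign of the eigenvector associated with the second-largest eigenvalue. The probabilities and the thresholding step here are illustrative, not the paper's.

    # Spectral bisection on the similarity matrix of a planted 3-uniform hypergraph.
    import numpy as np
    from itertools import combinations

    rng = np.random.default_rng(4)
    n = 60
    labels = np.array([0] * (n // 2) + [1] * (n // 2))   # planted balanced communities
    p_in, p_out = 0.08, 0.01                             # hyperedge probabilities

    W = np.zeros((n, n))
    for e in combinations(range(n), 3):                  # all candidate hyperedges
        same = labels[list(e)].min() == labels[list(e)].max()
        if rng.random() < (p_in if same else p_out):
            for i, j in combinations(e, 2):              # W_ij counts shared hyperedges
                W[i, j] += 1
                W[j, i] += 1

    vals, vecs = np.linalg.eigh(W)
    u2 = vecs[:, -2]                                     # eigenvector of the 2nd-largest eigenvalue
    pred = (u2 > 0).astype(int)
    accuracy = max(np.mean(pred == labels), np.mean(pred != labels))  # up to label swap
    print(f"correctly recovered vertices: {accuracy:.2%}")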