349 research outputs found
LIPIcs, Volume 251, ITCS 2023, Complete Volume
LIPIcs, Volume 261, ICALP 2023, Complete Volume
Point spread function approximation of high rank Hessians with locally supported non-negative integral kernels
We present an efficient matrix-free point spread function (PSF) method for
approximating operators that have locally supported non-negative integral
kernels. The method computes impulse responses of the operator at scattered
points, and interpolates these impulse responses to approximate integral kernel
entries. Impulse responses are computed by applying the operator to Dirac comb
batches of point sources, which are chosen by solving an ellipsoid packing
problem. Evaluation of kernel entries allows us to construct a hierarchical
matrix (H-matrix) approximation of the operator. Further matrix computations
are performed with H-matrix methods. We use the method to build preconditioners
for the Hessian operator in two inverse problems governed by partial
differential equations (PDEs): inversion for the basal friction coefficient in
an ice sheet flow problem and for the initial condition in an
advective-diffusive transport problem. While for many ill-posed inverse
problems the Hessian of the data misfit term exhibits a low rank structure, and
hence a low rank approximation is suitable, for many problems of practical
interest the numerical rank of the Hessian is still large. But Hessian impulse
responses typically become more local as the numerical rank increases, which
benefits the PSF method. Numerical results reveal that the PSF preconditioner
clusters the spectrum of the preconditioned Hessian near one, yielding roughly
5x-10x reductions in the required number of PDE solves, as compared to
regularization preconditioning and no preconditioning. We also present a
numerical study of the influence of various parameters (that control the shape
of the impulse responses) on the effectiveness of the advection-diffusion
Hessian approximation. The results show that the PSF-based preconditioners are
able to form good approximations of high-rank Hessians using a small number of
operator applications.
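The probing strategy described above can be sketched in a few lines: apply the operator to point sources, read off impulse responses, and interpolate them to estimate unprobed kernel columns. A minimal sketch, not the paper's code: the hat-shaped kernel, grid size, and probe points are invented for illustration, and the kernel's shift invariance makes the interpolation exact here.

```python
# Sketch: recover kernel columns of a matrix-free operator from impulse responses.
n = 64   # grid size (invented for the example)
w = 4    # kernel support half-width in grid cells

def apply_A(v):
    # matrix-free operator: convolution with a locally supported,
    # non-negative hat kernel (stands in for a Hessian apply)
    out = [0.0] * n
    for i in range(n):
        for j in range(max(0, i - w), min(n, i + w + 1)):
            out[i] += (1.0 - abs(i - j) / (w + 1)) * v[j]
    return out

def impulse(p):
    # impulse response = column p of the operator's kernel matrix
    e = [0.0] * n
    e[p] = 1.0
    return apply_A(e)

# probe at two scattered points, then approximate an unprobed column by
# shifting and averaging (exact here because the kernel is shift-invariant)
r16, r24 = impulse(16), impulse(24)
approx20 = [(r16[(i - 4) % n] + r24[(i + 4) % n]) / 2 for i in range(n)]
err = max(abs(a - t) for a, t in zip(approx20, impulse(20)))
```

In the paper the probes are batched into Dirac combs chosen by ellipsoid packing and the recovered entries feed an H-matrix construction; the sketch only shows the probe-and-interpolate step.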
Stochastic Distributed Optimization under Average Second-order Similarity: Algorithms and Analysis
We study finite-sum distributed optimization problems involving a master node
and local nodes under the popular δ-similarity and μ-strong
convexity conditions. We propose two new algorithms, SVRS and AccSVRS,
motivated by previous works. The non-accelerated SVRS method combines the
techniques of gradient sliding and variance reduction, and achieves better
communication complexity than existing non-accelerated algorithms. Applying the framework
proposed in Katyusha X, we also develop a directly accelerated version named
AccSVRS with further improved communication complexity. In contrast to
existing results, our complexity
bounds are entirely smoothness-free and exhibit superiority in ill-conditioned
cases. Furthermore, we establish a nearly matched lower bound to verify the
tightness of our AccSVRS method.

Comment: Camera-ready version for NeurIPS 2023
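The variance-reduction half of the combination can be illustrated with a standard SVRG-style estimator on a toy finite sum. This is a hedged sketch: SVRS itself adds gradient sliding and a master/local communication structure that is not reproduced here, and the quadratic losses, seed, and step size are invented.

```python
import random

random.seed(0)
a = [0.5, 1.5, 2.0, 4.0]       # toy losses f_i(x) = 0.5 * (x - a_i)^2
opt = sum(a) / len(a)          # minimizer of the average loss

def grad(i, x):
    return x - a[i]

x, lr = 0.0, 0.5
for epoch in range(30):
    snap = x                                   # snapshot ("anchor") point
    full = sum(grad(i, snap) for i in range(len(a))) / len(a)
    for _ in range(8):
        i = random.randrange(len(a))
        # variance-reduced gradient: unbiased, and its variance vanishes
        # as x and snap approach the optimum
        g = grad(i, x) - grad(i, snap) + full
        x -= lr * g
```

For these equal-curvature quadratics the correction is exact and the iteration converges deterministically; in general only the gradient variance, not the noise itself, is removed.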
Stochastic Optimization for Non-convex Problem with Inexact Hessian Matrix, Gradient, and Function
Trust-region (TR) and adaptive regularization using cubics (ARC) have proven
to have some very appealing theoretical properties for non-convex optimization
by concurrently computing function value, gradient, and Hessian matrix to
obtain the next search direction and the adjusted parameters. Although
stochastic approximations substantially reduce the computational cost, it is
challenging to theoretically guarantee the convergence rate. In this paper, we
explore a family of stochastic TR and ARC methods that can simultaneously
provide inexact computations of the Hessian matrix, gradient, and function
values. Our algorithms require far fewer propagations per iteration
than TR and ARC. We prove that the iteration complexity to achieve
ε-approximate second-order optimality is of the same order as the
exact computations demonstrated in previous studies. Additionally, the mild
conditions on inexactness can be met by leveraging a random sampling technique
in the finite-sum minimization problem. Numerical experiments with a non-convex
problem support these findings and demonstrate that, with the same or a similar
number of iterations, our algorithms require less computational overhead per
iteration than current second-order methods.

Comment: arXiv admin note: text overlap with arXiv:1809.0985
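A single trust-region step with an inexact Hessian can be sketched on a toy convex quadratic. This is a simplification invented for illustration, not the paper's algorithm: the objective, the Hessian perturbation, and the use of the Cauchy point in place of a full subproblem solve are all assumptions of the sketch.

```python
# Sketch: one trust-region step where the model uses an inexact Hessian.
f = lambda x, y: (x - 1.0) ** 2 + 2.0 * (y + 0.5) ** 2   # toy objective
x, y, radius = 3.0, 2.0, 10.0
g = (2.0 * (x - 1.0), 4.0 * (y + 0.5))   # exact gradient at (x, y)
H = ((2.1, 0.0), (0.0, 3.9))             # inexact Hessian (true diagonal is 2, 4)

gnorm = (g[0] ** 2 + g[1] ** 2) ** 0.5
gHg = H[0][0] * g[0] ** 2 + 2 * H[0][1] * g[0] * g[1] + H[1][1] * g[1] ** 2
# Cauchy point: minimize the quadratic model along -g, clipped to the region
t = min(gnorm ** 2 / gHg, radius / gnorm)
x1, y1 = x - t * g[0], y - t * g[1]
decrease = f(x, y) - f(x1, y1)   # positive despite the Hessian inexactness
```

The point of the sketch is that a sufficiently accurate (not exact) Hessian still yields descent, which is what the paper's mild inexactness conditions formalize.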
Deep networks training and generalization: insights from linearization
Despite being able to represent very complex functions, deep artificial neural networks are trained using variants of the basic gradient descent algorithm, which relies on a linearization of the loss at each iteration during training. In this thesis, we argue that a promising way to tackle the challenge of elaborating a comprehensive theory explaining generalization in deep networks is to take advantage of an analogy with linear models, by studying the first-order Taylor expansion that maps parameter-space updates to function-space progress.
This thesis by publication comprises three papers and a software library. The library NNGeometry (chapter 3) serves as a common thread for all projects, and introduces a simple Application Programming Interface (API) to study the linearized training dynamics of deep networks using recent methods and contributed algorithmic accelerations. In the EKFAC paper (chapter 4), we propose an approximation of the Fisher Information Matrix (FIM), used in the natural gradient optimization algorithm. In the Lazy vs Hasty paper (chapter 5), we compare the function obtained when training with linearized dynamics (e.g., in the infinite-width Neural Tangent Kernel (NTK) limit regime) to the actual training regime, using groups of examples ranked by different notions of difficulty. In the NTK alignment paper (chapter 6), we reveal an implicit regularization effect arising from the alignment of the NTK to the target kernel as training progresses.
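The first-order Taylor expansion at the heart of the thesis can be written down for a one-parameter model. A minimal sketch, with the tanh model, base point, and step sizes invented for illustration: the linearized model tracks the true one for small parameter updates and departs from it for large ones.

```python
import math

def f(theta, x):
    return math.tanh(theta * x)          # toy one-parameter "network"

def df_dtheta(theta, x):
    return x * (1.0 - math.tanh(theta * x) ** 2)

theta0, x = 0.5, 1.2

def f_lin(theta):
    # first-order Taylor expansion around theta0 (the "lazy" linearized model)
    return f(theta0, x) + df_dtheta(theta0, x) * (theta - theta0)

small_step = abs(f(theta0 + 0.01, x) - f_lin(theta0 + 0.01))
large_step = abs(f(theta0 + 1.0, x) - f_lin(theta0 + 1.0))
```

The infinite-width NTK regime is precisely the setting where training never leaves the small-step region of this expansion.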
A Scalable Two-Level Domain Decomposition Eigensolver for Periodic Schr\"odinger Eigenstates in Anisotropically Expanding Domains
Accelerating iterative eigenvalue algorithms is often achieved by employing a
spectral shifting strategy. Unfortunately, improved shifting typically leads to
a smaller eigenvalue for the resulting shifted operator, which in turn results
in a high condition number of the underlying solution matrix, posing a major
challenge for iterative linear solvers. This paper introduces a two-level
domain decomposition preconditioner that addresses this issue for the linear
Schr\"odinger eigenvalue problem, even in the presence of a vanishing
eigenvalue gap in non-uniform, expanding domains. Since the quasi-optimal
shift, which is already available as the solution to a spectral cell problem,
is required for the eigenvalue solver, it is logical to also use its associated
eigenfunction as a generator to construct a coarse space. We analyze the
resulting two-level additive Schwarz preconditioner and obtain a condition
number bound that is independent of the domain's anisotropy, despite the need
for only one basis function per subdomain for the coarse solver. Several
numerical examples are presented to illustrate its flexibility and efficiency.

Comment: 30 pages, 7 figures, 2 tables
Iterative Methods for Neutron and Thermal Radiation Transport Problems
We develop, analyze, and test iterative methods for three kinds of multigroup transport problems: (1) k-eigenvalue neutronics, (2) thermal radiation transport, and (3) problems with “upscattering,” in which particles can gain energy from collisions.
For k-eigenvalue problems, many widely used methods to accelerate power iteration use “low-order” equations that contain nonlinear functionals of the transport solution. The nonlinear functionals require that the transport discretization produce strictly positive solutions, and the low-order problems are often more difficult to solve than simple diffusion problems. Similar iterative methods have been proposed that avoid nonlinearities and employ simple diffusion operators in their low-order problems. However, due partly to theoretical concerns, such methods have been largely overlooked by the reactor analysis community. To address theoretical questions, we present analyses showing that a power-like iteration process applied to the linear low-order problem (which looks like a k-eigenvalue problem with a fixed source) provides rapid acceleration and produces the correct transport eigenvalue and eigenvector. We also provide numerical results that support the existing body of evidence that these methods give rapid iterative convergence, similar to methods that use nonlinear functionals.
Thermal-radiation problems solve for radiation intensity and material temperature using coupled equations that are nonlinear in temperature. Some of the most powerful iterative methods in use today solve the coupled equations using a low-order equation in place of the transport equation, where the low-order equation contains nonlinear functionals of the transport solution. The nonlinear functionals need to be updated only a few times before the system converges. We develop, analyze, and test a new method that works in the same way but employs a simple diffusion low-order operator without nonlinear functionals. Our analysis and results show rapid iterative convergence, comparable to methods that use nonlinear functionals in more complicated low-order equations.
For problems with upscattering, we have investigated the importance of linearly anisotropic scattering for problems dominated by scattering in graphite. Our results show that the linearly anisotropic scattering encountered in problems of practical interest does not degrade the effectiveness of the iterative acceleration method. Additionally, we have tested a method devised by Hanuš and Ragusa, using our semi-consistent Continuous/Discontinuous Finite Element Method (CDFEM) diffusion discretization in place of the Modified Interior Penalty (MIP) discretization they employed. Our results with CDFEM show an increased number of transport iterations compared to MIP when there are cells with high aspect ratio, but a reduction in overall runtime due to the reduced number of degrees of freedom of the CDFEM operator compared to the MIP operator.
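The unaccelerated power iteration that the low-order methods above are designed to speed up can be sketched on a small stand-in matrix. A hedged sketch: the matrix is invented and merely plays the role of the fission/transport operator, which real k-eigenvalue solvers apply matrix-free.

```python
# Sketch: power iteration for the dominant eigenpair (the "k" and flux shape).
M = [[4.0, 1.0, 0.0],
     [1.0, 3.0, 1.0],
     [0.0, 1.0, 2.0]]   # small symmetric stand-in operator

def matvec(A, v):
    return [sum(a * b for a, b in zip(row, v)) for row in A]

phi, k = [1.0, 1.0, 1.0], 1.0
for _ in range(200):
    psi = matvec(M, phi)
    k = max(abs(c) for c in psi)    # eigenvalue estimate (infinity norm)
    phi = [c / k for c in psi]      # renormalized eigenvector iterate
# k now approximates the dominant eigenvalue 3 + sqrt(3)
```

Convergence is geometric at the ratio of the two largest eigenvalues (here 3/(3+√3) ≈ 0.63), which is exactly the slowness that the low-order diffusion acceleration targets.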
LIPIcs, Volume 274, ESA 2023, Complete Volume
Community Detection in the Hypergraph SBM: Exact Recovery Given the Similarity Matrix
Community detection is a fundamental problem in network science. In this
paper, we consider community detection in hypergraphs drawn from the
hypergraph stochastic block model (HSBM), with a focus on exact
community recovery. We study the performance of polynomial-time algorithms
which operate on the similarity matrix W, where the entry W_ij reports the
number of hyperedges containing both vertices i and j. Under this information model,
while the precise information-theoretic limit is unknown, Kim, Bandeira, and
Goemans derived a sharp threshold up to which the natural min-bisection
estimator on W succeeds. As min-bisection is NP-hard in the worst case, they
additionally proposed a semidefinite programming (SDP) relaxation and
conjectured that it achieves the same recovery threshold as the min-bisection
estimator.
In this paper, we confirm this conjecture. We also design a simple and highly
efficient spectral algorithm with nearly linear runtime and show that it
achieves the min-bisection threshold. Moreover, the spectral algorithm also
succeeds in denser regimes and is considerably more efficient than previous
approaches, establishing it as the method of choice. Our analysis of the
spectral algorithm crucially relies on strong entrywise bounds on the
eigenvectors of W. Our bounds are inspired by the work of Abbe, Fan, Wang,
and Zhong, who developed entrywise bounds for eigenvectors of symmetric
matrices with independent entries. Despite the complex dependency structure in
similarity matrices, we prove similar entrywise guarantees.

Comment: To appear at the Conference on Learning Theory (COLT) 2023. Error in footnote page
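A toy version of the spectral idea shows how the sign pattern of the second eigenvector of a similarity matrix recovers a planted bisection. A hedged sketch, not the paper's algorithm or its HSBM setting: the block similarity matrix, the community sizes, and the projection trick standing in for a full eigensolver are all invented.

```python
# Sketch: sign of the second eigenvector of a block similarity matrix.
n, a, b = 6, 5.0, 1.0                 # within- vs. across-community similarity
truth = [0, 0, 0, 1, 1, 1]            # planted communities
W = [[a if truth[i] == truth[j] else b for j in range(n)] for i in range(n)]

def matvec(A, v):
    return [sum(row[j] * v[j] for j in range(n)) for row in A]

# power iteration with the all-ones top eigenvector projected out, so the
# iterate converges to the community-indicating second eigenvector
v = [0.9, -0.3, 0.5, 0.1, -0.7, 0.2]
for _ in range(100):
    m = sum(v) / n
    v = matvec(W, [c - m for c in v])
    s = max(abs(c) for c in v) or 1.0
    v = [c / s for c in v]

labels = [0 if c > 0 else 1 for c in v]
if labels[0] == 1:                    # resolve the arbitrary global sign
    labels = [1 - c for c in labels]
```

The hard part the paper addresses is exactly what this toy hides: proving that random HSBM noise and the dependencies between entries of W do not destroy the entrywise structure of this eigenvector.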