Scalable Parallel Factorizations of SDD Matrices and Efficient Sampling for Gaussian Graphical Models
Motivated by a sampling problem basic to computational statistical inference, we develop a nearly optimal algorithm for a fundamental problem in spectral graph theory and numerical analysis. Given an n × n SDDM matrix M and a constant −1 ≤ p ≤ 1, our algorithm gives efficient access to a sparse linear operator C̃ such that M^p ≈ C̃C̃^T. The solution is based on factoring M into a product of simple and sparse matrices using squaring and spectral sparsification. For M with m non-zero entries, our algorithm takes work nearly linear in m and polylogarithmic depth on a parallel machine with m processors. This gives the first sampling algorithm that requires only nearly linear work and n i.i.d. random univariate Gaussian samples to generate i.i.d. random samples for n-dimensional Gaussian random fields with SDDM precision matrices. For sampling this natural subclass of Gaussian random fields, it is optimal in the randomness and nearly optimal in the work and parallel complexity. In addition, our sampling algorithm can be directly extended to Gaussian random fields with SDD precision matrices.
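For intuition, here is a minimal Python sketch of the sampling identity the paper builds on: if L L^T = M for a symmetric positive definite precision matrix M, then solving L^T x = z for an i.i.d. standard Gaussian z yields x ~ N(0, M^{-1}). The dense Cholesky factor and the toy 3 × 3 matrix below are illustrative stand-ins only; the paper's contribution is a sparse factorization computable in nearly linear work.

    # Sketch of the sampling principle: applying the inverse transpose of
    # any factor L with L @ L.T == M to i.i.d. Gaussians gives a sample
    # with precision matrix M. Dense Cholesky is a stand-in here; it does
    # not reproduce the paper's nearly-linear-work sparse factorization.
    import numpy as np
    from scipy.linalg import cholesky, solve_triangular

    def sample_gaussian_with_precision(M, rng):
        """Draw one sample from N(0, M^{-1}) for an SPD precision matrix M."""
        L = cholesky(M, lower=True)          # M = L @ L.T
        z = rng.standard_normal(M.shape[0])  # n i.i.d. univariate Gaussians
        # Solve L.T @ x = z, so cov(x) = L^{-T} @ L^{-1} = M^{-1}.
        return solve_triangular(L.T, z, lower=False)

    rng = np.random.default_rng(0)
    # A tiny SDDM precision matrix (diagonally dominant, non-positive off-diagonals).
    M = np.array([[ 2.0, -1.0,  0.0],
                  [-1.0,  2.0, -1.0],
                  [ 0.0, -1.0,  2.0]])
    x = sample_gaussian_with_precision(M, rng)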
An Efficient Parallel Algorithm for Spectral Sparsification of Laplacian and SDDM Matrix Polynomials
For "large" class of continuous probability density functions
(p.d.f.), we demonstrate that for every there is mixture of
discrete Binomial distributions (MDBD) with
distinct Binomial distributions that -approximates a
discretized p.d.f. for all , where
. Also, we give two efficient parallel
algorithms to find such MDBD.
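As a rough numerical illustration of the object being constructed (not the paper's algorithm), the following Python sketch approximates a discretized density on {0, …, N} by a non-negative mixture of Binomial(N, p_j) pmfs; the target density, grid size, and least-squares fitting are all choices made for the example.

    # Illustrative sketch: fit a mixture of discrete Binomial distributions
    # (MDBD) to a discretized p.d.f. by non-negative least squares, then
    # report the worst-case pointwise error. Not the paper's construction.
    import numpy as np
    from scipy.stats import binom
    from scipy.optimize import nnls

    N = 64
    grid = np.arange(N + 1)

    # Target: a discretized truncated-Gaussian-like density on {0, ..., N}.
    w = np.exp(-0.5 * ((grid - N / 2) / (N / 8)) ** 2)
    w_hat = w / w.sum()

    # Columns are Binomial(N, p_j) pmfs for T candidate success probabilities.
    T = 16
    ps = np.linspace(0.05, 0.95, T)
    B = np.column_stack([binom.pmf(grid, N, p) for p in ps])

    weights, _ = nnls(B, w_hat)   # non-negative mixture weights (approx. sum to 1)
    mdbd = B @ weights
    print("max pointwise error:", np.abs(mdbd - w_hat).max())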
Moreover, we propose a sequential algorithm that, on input an MDBD with T distinct Binomial distributions inducing a discretized p.d.f., a matrix M that is either Laplacian or SDDM with m non-zero entries, and a precision parameter ε, outputs a spectral sparsifier of the associated matrix polynomial in time nearly linear in m (the Õ notation hides polylogarithmic factors). This improves on the algorithm of Cheng et al. [CCLPT15], whose running time grows quadratically with the polynomial degree N.
Furthermore, our algorithm is parallelizable, running in nearly linear work and polylogarithmic depth. Our main algorithmic contribution is the first efficient parallel algorithm that, on input a continuous p.d.f. w and a matrix M as above, outputs a spectral sparsifier of a matrix polynomial whose coefficients approximate, component-wise, the discretized p.d.f. ŵ.
Our results yield the first efficient parallel algorithm that runs in nearly linear work and polylogarithmic depth and analyzes the long-term behaviour of Markov chains in non-trivial settings. In addition, we strengthen Spielman and Peng's [PS14] parallel SDD solver.
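For concreteness, here is a dense Python sketch of a random-walk matrix polynomial of the kind sparsified in this line of work, assuming the D − Σ βᵢ D (D^{-1}A)^i form used by [CCLPT15]; the graph and coefficients are illustrative, and the point of the paper is to approximate this matrix spectrally without ever forming it explicitly.

    # Dense construction of a random-walk matrix polynomial
    # L_beta = D - sum_i beta[i] * D @ (D^{-1} A)^(i+1), the object that is
    # spectrally sparsified; this sketch forms it explicitly, which the
    # nearly-linear-work algorithms deliberately avoid.
    import numpy as np

    def random_walk_polynomial(A, beta):
        d = A.sum(axis=1)
        D = np.diag(d)
        W = A / d[:, None]                 # one step of the random walk, D^{-1} A
        L = D.copy()
        Wi = np.eye(A.shape[0])
        for b in beta:                     # accumulate powers of the walk matrix
            Wi = Wi @ W
            L -= b * (D @ Wi)
        return L

    # 4-cycle graph; coefficients of a degree-3 polynomial summing to 1.
    A = np.array([[0, 1, 0, 1],
                  [1, 0, 1, 0],
                  [0, 1, 0, 1],
                  [1, 0, 1, 0]], dtype=float)
    L = random_walk_polynomial(A, beta=[0.5, 0.3, 0.2])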
Book of Abstracts of the Sixth SIAM Workshop on Combinatorial Scientific Computing
Book of Abstracts of CSC14, edited by Bora Uçar. The Sixth SIAM Workshop on Combinatorial Scientific Computing, CSC14, was organized at the École Normale Supérieure de Lyon, France, on 21-23 July 2014. This two-and-a-half-day event marked the sixth in a series that started ten years ago in San Francisco, USA. The CSC14 workshop's focus was on combinatorial mathematics and algorithms in high-performance computing, broadly interpreted. The workshop featured three invited talks, 27 contributed talks, and eight poster presentations. All three invited talks focused on two fields of research: randomized algorithms for numerical linear algebra and network analysis. The contributed talks and the posters targeted modeling, analysis, bisection, clustering, and partitioning of graphs, applied in the context of networks, sparse matrix factorizations, iterative solvers, fast multipole methods, automatic differentiation, high-performance computing, and linear programming. The workshop was held at the premises of the LIP laboratory of ENS Lyon and was generously supported by the LABEX MILYON (ANR-10-LABX-0070, Université de Lyon, within the program "Investissements d'Avenir" ANR-11-IDEX-0007, operated by the French National Research Agency) and by SIAM.
Foundations of Node Representation Learning
Low-dimensional node representations, also called node embeddings, are a cornerstone in the modeling and analysis of complex networks. In recent years, advances in deep learning have spurred the development of novel neural-network-inspired methods for learning node representations, which have largely surpassed classical 'spectral' embeddings in performance. Yet little work asks the central questions of this thesis: why do these novel deep methods outperform their classical predecessors, and what are their limitations?
We pursue several paths to answering these questions. To further our understanding of deep embedding methods, we explore their relationship with spectral methods, which are better understood, and show that some popular deep methods are equivalent to spectral methods in a certain natural limit. We also introduce the problem of inverting node embeddings in order to probe what information they contain. Further, we propose a simple, non-deep method for node representation learning, and find that it is often competitive with modern deep graph networks in downstream performance.
To better understand the limitations of node embeddings, we prove upper and lower bounds on their capabilities. Most notably, we prove that node embeddings are capable of exact low-dimensional representation of networks with bounded maximum degree or arboricity, and we further show that a simple algorithm can find such exact embeddings for real-world networks. By contrast, we also prove inherent limits on the ability of random graph models, including those derived from node embeddings, to capture key structural properties of networks without simply memorizing a given graph.
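For reference, a minimal Python sketch of the classical spectral embedding that such deep methods are compared against, using the bottom nontrivial eigenvectors of the normalized Laplacian; the scaling conventions and the toy graph here are illustrative choices, as the exact recipe varies between methods.

    # Classical spectral node embedding: embed each node by its entries in
    # the eigenvectors of the normalized Laplacian with smallest nonzero
    # eigenvalues (one common convention among several).
    import numpy as np

    def spectral_embedding(A, dim):
        d = A.sum(axis=1)
        d_inv_sqrt = 1.0 / np.sqrt(np.maximum(d, 1e-12))  # guard isolated nodes
        L = np.eye(len(d)) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
        vals, vecs = np.linalg.eigh(L)     # eigenvalues in ascending order
        return vecs[:, 1:dim + 1]          # skip the trivial bottom eigenvector

    A = np.array([[0, 1, 1, 0],
                  [1, 0, 1, 0],
                  [1, 1, 0, 1],
                  [0, 0, 1, 0]], dtype=float)
    X = spectral_embedding(A, dim=2)       # one 2-d vector per node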
On non-linear network embedding methods
As a linear method, spectral clustering is the only network embedding algorithm that offers both provably fast computation and an advanced theoretical understanding. The accuracy of spectral clustering depends on the Cheeger ratio, defined as the ratio between the graph conductance and the second smallest eigenvalue of its normalized Laplacian. On several graph families whose Cheeger ratio reaches its upper bound of Θ(n), spectral clustering is proven to perform poorly. Moreover, recent non-linear network embedding methods have surpassed spectral clustering with state-of-the-art performance, yet with little to no theoretical understanding to back them.
The dissertation includes work that: (1) extends the theory of spectral clustering in order to address its weaknesses and provide grounds for a theoretical understanding of existing non-linear network embedding methods; (2) provides non-linear extensions of spectral clustering with theoretical guarantees, e.g., via different spectral modification algorithms; (3) demonstrates the potential of this approach on different types and sizes of graphs from industrial applications; and (4) makes theory-informed use of artificial networks.
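A small Python sketch of the quantity the dissertation centers on: the Cheeger ratio, conductance divided by the second smallest eigenvalue of the normalized Laplacian, evaluated here on a cycle graph, one of the families where the ratio grows like Θ(n).

    # Compute the Cheeger ratio phi(S) / lambda_2 on a cycle C_n, where
    # phi ~ 1/n and lambda_2 ~ 1/n^2, so the ratio grows linearly in n.
    import numpy as np

    def normalized_laplacian(A):
        d = A.sum(axis=1)
        dis = 1.0 / np.sqrt(d)
        return np.eye(len(d)) - dis[:, None] * A * dis[None, :]

    def conductance(A, S):
        # phi(S) = cut(S, complement) / min(vol(S), vol(complement))
        mask = np.zeros(A.shape[0], dtype=bool)
        mask[np.asarray(S)] = True
        cut = A[mask][:, ~mask].sum()
        vol = A[mask].sum()
        return cut / min(vol, A.sum() - vol)

    n = 32                                  # cycle graph C_n
    A = np.zeros((n, n))
    for i in range(n):
        A[i, (i + 1) % n] = A[(i + 1) % n, i] = 1.0

    lam2 = np.sort(np.linalg.eigvalsh(normalized_laplacian(A)))[1]
    phi = conductance(A, range(n // 2))     # a half-arc, the best cut of a cycle
    print("Cheeger ratio:", phi / lam2)     # roughly n / pi^2 on cycles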
AVATAR - Machine Learning Pipeline Evaluation Using Surrogate Model
The evaluation of machine learning (ML) pipelines is essential during automatic ML pipeline composition and optimisation. Previous methods, such as the Bayesian-based and genetic-based optimisation implemented in Auto-Weka, Auto-sklearn, and TPOT, evaluate pipelines by executing them. Pipeline composition and optimisation with these methods therefore requires a tremendous amount of time, which prevents them from exploring complex pipelines to find better predictive models. To further explore this research challenge, we have conducted experiments showing that many of the generated pipelines are invalid, and that it is unnecessary to execute them to find out whether they are good pipelines. To address this issue, we propose a novel method to evaluate the validity of ML pipelines using a surrogate model (AVATAR). AVATAR accelerates automatic ML pipeline composition and optimisation by quickly discarding invalid pipelines. Our experiments show that AVATAR is more efficient at evaluating complex pipelines than traditional evaluation approaches that require executing them.
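A toy Python sketch of the surrogate idea, with hypothetical component names and data properties: validity is checked by propagating abstract preconditions and postconditions through the pipeline instead of executing it. This illustrates the concept only, not AVATAR's actual surrogate model.

    # Surrogate-style pipeline validation: each component declares which
    # data properties it REQUIRES and which it PROVIDES after running; a
    # pipeline is rejected at the first unmet precondition, with no
    # training or execution. All names below are illustrative.
    REQUIRES = {
        "Imputer":     set(),
        "OneHotEnc":   {"no_missing"},
        "Scaler":      {"no_missing", "numeric_only"},
        "LogisticReg": {"no_missing", "numeric_only"},
    }
    PROVIDES = {
        "Imputer":     {"no_missing"},
        "OneHotEnc":   {"numeric_only"},
        "Scaler":      set(),
        "LogisticReg": set(),
    }

    def is_valid(pipeline, data_props):
        props = set(data_props)
        for step in pipeline:
            if not REQUIRES[step] <= props:   # precondition violated
                return False
            props |= PROVIDES[step]
        return True

    # Valid: impute, encode, then model. Invalid: the model sees missing values.
    print(is_valid(["Imputer", "OneHotEnc", "LogisticReg"], set()))   # True
    print(is_valid(["LogisticReg", "Imputer"], set()))                # False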
LIPIcs, Volume 261, ICALP 2023, Complete Volume