
    Scalable Parallel Factorizations of SDD Matrices and Efficient Sampling for Gaussian Graphical Models

    Motivated by a sampling problem basic to computational statistical inference, we develop a nearly optimal algorithm for a fundamental problem in spectral graph theory and numerical analysis. Given an $n \times n$ SDDM matrix $\mathbf{M}$ and a constant $-1 \leq p \leq 1$, our algorithm gives efficient access to a sparse $n \times n$ linear operator $\tilde{\mathbf{C}}$ such that $\mathbf{M}^{p} \approx \tilde{\mathbf{C}} \tilde{\mathbf{C}}^\top$. The solution is based on factoring $\mathbf{M}$ into a product of simple and sparse matrices using squaring and spectral sparsification. For $\mathbf{M}$ with $m$ non-zero entries, our algorithm takes work nearly linear in $m$ and polylogarithmic depth on a parallel machine with $m$ processors. This gives the first sampling algorithm that requires only nearly linear work and $n$ i.i.d. random univariate Gaussian samples to generate i.i.d. random samples for $n$-dimensional Gaussian random fields with SDDM precision matrices. For sampling this natural subclass of Gaussian random fields, it is optimal in the randomness and nearly optimal in the work and parallel complexity. In addition, our sampling algorithm can be directly extended to Gaussian random fields with SDD precision matrices.
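
    Given such a factor, drawing a sample reduces to one matrix-vector product: if $\mathbf{M}^{-1} \approx \tilde{\mathbf{C}} \tilde{\mathbf{C}}^\top$, then $x = \tilde{\mathbf{C}} z$ with $z \sim \mathcal{N}(0, I_n)$ has covariance approximately $\mathbf{M}^{-1}$, i.e., precision matrix approximately $\mathbf{M}$. The Python sketch below illustrates this reduction, using a dense Cholesky factor of $\mathbf{M}^{-1}$ as a stand-in for the paper's sparse operator $\tilde{\mathbf{C}}$ (the paper constructs its factor in nearly linear work; this dense illustration does not).

        import numpy as np

        def sample_gaussian_with_precision(M, num_samples, seed=None):
            """Draw x ~ N(0, M^{-1}) for a symmetric positive-definite precision M.

            Dense stand-in for the paper's sparse factorization: a Cholesky
            factor C of M^{-1} satisfies C C^T = M^{-1} exactly; the paper
            instead builds a sparse factor with the same role in nearly
            linear work.
            """
            rng = np.random.default_rng(seed)
            n = M.shape[0]
            C = np.linalg.cholesky(np.linalg.inv(M))   # lower-triangular factor
            z = rng.standard_normal((n, num_samples))  # n i.i.d. Gaussians per sample
            return C @ z                               # columns have covariance ~ M^{-1}

        # Tiny SDDM precision matrix: symmetric and diagonally dominant.
        M = np.array([[ 3.0, -1.0,  0.0],
                      [-1.0,  3.0, -1.0],
                      [ 0.0, -1.0,  3.0]])
        X = sample_gaussian_with_precision(M, num_samples=200_000, seed=0)
        print(np.allclose(np.cov(X), np.linalg.inv(M), atol=0.02))  # True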

    An Efficient Parallel Algorithm for Spectral Sparsification of Laplacian and SDDM Matrix Polynomials

    For "large" class C\mathcal{C} of continuous probability density functions (p.d.f.), we demonstrate that for every w∈Cw\in\mathcal{C} there is mixture of discrete Binomial distributions (MDBD) with T≄Nϕw/ÎŽT\geq N\sqrt{\phi_{w}/\delta} distinct Binomial distributions B(⋅,N)B(\cdot,N) that ÎŽ\delta-approximates a discretized p.d.f. w^(i/N)≜w(i/N)/[∑ℓ=0Nw(ℓ/N)]\widehat{w}(i/N)\triangleq w(i/N)/[\sum_{\ell=0}^{N}w(\ell/N)] for all i∈[3:N−3]i\in[3:N-3], where ϕw≄max⁥x∈[0,1]∣w(x)∣\phi_{w}\geq\max_{x\in[0,1]}|w(x)|. Also, we give two efficient parallel algorithms to find such MDBD. Moreover, we propose a sequential algorithm that on input MDBD with N=2kN=2^k for k∈N+k\in\mathbb{N}_{+} that induces a discretized p.d.f. ÎČ\beta, B=D−MB=D-M that is either Laplacian or SDDM matrix and parameter ϔ∈(0,1)\epsilon\in(0,1), outputs in O^(ϔ−2m+ϔ−4nT)\widehat{O}(\epsilon^{-2}m + \epsilon^{-4}nT) time a spectral sparsifier D−M^N≈ϔD−D∑i=0NÎČi(D−1M)iD-\widehat{M}_{N} \approx_{\epsilon} D-D\sum_{i=0}^{N}\beta_{i}(D^{-1} M)^i of a matrix-polynomial, where O^(⋅)\widehat{O}(\cdot) notation hides poly(log⁥n,log⁥N)\mathrm{poly}(\log n,\log N) factors. This improves the Cheng et al.'s [CCLPT15] algorithm whose run time is O^(ϔ−2mN2+NT)\widehat{O}(\epsilon^{-2} m N^2 + NT). Furthermore, our algorithm is parallelizable and runs in work O^(ϔ−2m+ϔ−4nT)\widehat{O}(\epsilon^{-2}m + \epsilon^{-4}nT) and depth O(log⁥N⋅poly(log⁥n)+log⁥T)O(\log N\cdot\mathrm{poly}(\log n)+\log T). Our main algorithmic contribution is to propose the first efficient parallel algorithm that on input continuous p.d.f. w∈Cw\in\mathcal{C}, matrix B=D−MB=D-M as above, outputs a spectral sparsifier of matrix-polynomial whose coefficients approximate component-wise the discretized p.d.f. w^\widehat{w}. Our results yield the first efficient and parallel algorithm that runs in nearly linear work and poly-logarithmic depth and analyzes the long term behaviour of Markov chains in non-trivial settings. In addition, we strengthen the Spielman and Peng's [PS14] parallel SDD solver

    Book of Abstracts of the Sixth SIAM Workshop on Combinatorial Scientific Computing

    Book of Abstracts of CSC14, edited by Bora Uçar. The Sixth SIAM Workshop on Combinatorial Scientific Computing, CSC14, was organized at the École Normale Supérieure de Lyon, France, on 21-23 July 2014. This two-and-a-half-day event marked the sixth in a series that started ten years earlier in San Francisco, USA. The focus of CSC14 was on combinatorial mathematics and algorithms in high-performance computing, broadly interpreted. The workshop featured three invited talks, 27 contributed talks and eight poster presentations. The three invited talks focused on two fields of research: randomized algorithms for numerical linear algebra, and network analysis. The contributed talks and the posters targeted modeling, analysis, bisection, clustering, and partitioning of graphs, applied in the context of networks, sparse matrix factorizations, iterative solvers, fast multipole methods, automatic differentiation, high-performance computing, and linear programming. The workshop was held at the premises of the LIP laboratory of ENS Lyon and was generously supported by the LABEX MILYON (ANR-10-LABX-0070, Université de Lyon, within the program "Investissements d'Avenir" ANR-11-IDEX-0007, operated by the French National Research Agency) and by SIAM.

    On non-linear network embedding methods

    As a linear method, spectral clustering is the only network embedding algorithm that offers both provably fast computation and an advanced theoretical understanding. The accuracy of spectral clustering depends on the Cheeger ratio, defined as the ratio between the graph conductance and the second-smallest eigenvalue of the normalized Laplacian. For several graph families whose Cheeger ratio reaches its upper bound of $\Theta(n)$, spectral clustering is proven to perform poorly. Moreover, recent non-linear network embedding methods have surpassed spectral clustering with state-of-the-art performance, yet with little to no theoretical understanding to back them. This dissertation includes work that: (1) extends the theory of spectral clustering in order to address its weaknesses and provide grounds for a theoretical understanding of existing non-linear network embedding methods; (2) provides non-linear extensions of spectral clustering with theoretical guarantees, e.g., via different spectral modification algorithms; (3) demonstrates the potential of this approach on graphs of different types and sizes from industrial applications; and (4) makes theory-informed use of artificial networks.
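
    For concreteness, both ingredients of the Cheeger ratio can be computed directly on a small graph: $\lambda_2$ of the normalized Laplacian and a sweep-cut estimate of the conductance $\phi$. The Python sketch below reports $\phi/\lambda_2$; cheeger_ratio is a hypothetical helper written for this illustration, and the dissertation's exact normalisation of the ratio may differ.

        import numpy as np

        def cheeger_ratio(A):
            """Conductance (sweep-cut estimate) over lambda_2 of the normalized
            Laplacian, for a connected graph with adjacency matrix A."""
            d = A.sum(axis=1)
            D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
            L = np.eye(len(d)) - D_inv_sqrt @ A @ D_inv_sqrt
            eigvals, eigvecs = np.linalg.eigh(L)
            lam2 = eigvals[1]                               # 2nd smallest eigenvalue
            order = np.argsort(D_inv_sqrt @ eigvecs[:, 1])  # Fiedler sweep order
            vol_total, best_phi = d.sum(), np.inf
            in_S = np.zeros(len(d), dtype=bool)
            for v in order[:-1]:                            # sweep proper prefixes
                in_S[v] = True
                cut = A[np.ix_(in_S, ~in_S)].sum()
                vol = min(d[in_S].sum(), vol_total - d[in_S].sum())
                best_phi = min(best_phi, cut / vol)
            return best_phi / lam2

        # Two 4-cliques joined by a single edge: a classic low-conductance graph.
        A = np.kron(np.eye(2), np.ones((4, 4)) - np.eye(4))
        A[3, 4] = A[4, 3] = 1.0
        print(cheeger_ratio(A))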

    AVATAR - Machine Learning Pipeline Evaluation Using Surrogate Model

    © 2020, The Author(s). The evaluation of machine learning (ML) pipelines is essential during automatic ML pipeline composition and optimisation. Previous methods, such as the Bayesian-based and genetic-based optimisation implemented in Auto-Weka, Auto-sklearn and TPOT, evaluate pipelines by executing them. The pipeline composition and optimisation of these methods therefore requires a tremendous amount of time, which prevents them from exploring complex pipelines to find better predictive models. To explore this research challenge further, we conducted experiments showing that many of the generated pipelines are invalid, and that executing them is unnecessary to find out whether they are good pipelines. To address this issue, we propose a novel method to evaluate the validity of ML pipelines using a surrogate model (AVATAR). AVATAR accelerates automatic ML pipeline composition and optimisation by quickly discarding invalid pipelines. Our experiments show that AVATAR is more efficient at evaluating complex pipelines than traditional evaluation approaches that require their execution.
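
    A minimal sketch of the surrogate idea, under a deliberately simplified capability model (the Component fields below are hypothetical and not AVATAR's actual representation): each step declares the data properties it requires, forbids, adds and removes, and a pipeline is rejected at the first unmet precondition without ever being executed.

        from dataclasses import dataclass

        @dataclass
        class Component:
            """A pipeline step described only by the data properties it
            requires, forbids, adds and removes (simplified surrogate model)."""
            name: str
            requires: frozenset = frozenset()
            forbids: frozenset = frozenset()
            adds: frozenset = frozenset()
            removes: frozenset = frozenset()

        def is_valid_pipeline(steps, data_props):
            """Propagate data properties through the steps and reject the
            pipeline at the first unmet precondition -- no execution needed."""
            props = set(data_props)
            for s in steps:
                if not s.requires <= props or props & s.forbids:
                    return False
                props = (props - s.removes) | s.adds
            return True

        imputer = Component("Imputer", requires=frozenset({"numeric"}),
                            removes=frozenset({"missing_values"}))
        svm = Component("SVM", requires=frozenset({"numeric"}),
                        forbids=frozenset({"missing_values"}), adds=frozenset({"model"}))

        data = {"numeric", "missing_values"}
        print(is_valid_pipeline([imputer, svm], data))  # True: imputation first
        print(is_valid_pipeline([svm], data))           # False: missing values remain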

    LIPIcs, Volume 261, ICALP 2023, Complete Volume

    LIPIcs, Volume 261, ICALP 2023, Complete Volume