Parallelized preconditioned model building algorithm for matrix factorization
Matrix factorization is a common task underlying several machine learning applications such as recommender systems, topic modeling, and compressed sensing. Given a large and possibly sparse matrix A, we seek two smaller matrices W and H such that their product is as close to A as possible. The objective is to minimize the sum of squared errors in the approximation. Typically such problems involve hundreds of thousands of unknowns, so an optimizer must be exceptionally efficient. In this study, a new algorithm, Preconditioned Model Building (PMB), is adapted to factorize matrices composed of movie ratings in the MovieLens data sets with 1, 10, and 20 million entries. We present experiments that compare the sequential MATLAB implementation of the PMB algorithm with other algorithms in the minFunc package. We also employ a lock-free sparse matrix factorization algorithm and provide a scalable shared-memory parallel implementation. We show that (a) the optimization performance of the PMB algorithm is comparable to the best algorithms in common use, and (b) the computational performance can be significantly increased with parallelization.
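To make the objective concrete: a minimal pure-Python sketch of rank-constrained factorization by stochastic gradient descent, minimizing the sum of squared errors between A and W·H. This is only an illustration of the problem the abstract states, not the PMB algorithm itself; the rank, learning rate, and iteration count are arbitrary choices.

```python
# Toy low-rank factorization A ~ W @ H by gradient descent on the
# squared error. Illustrative only; not the PMB method from the paper.
import random

def factorize(A, rank=2, lr=0.01, steps=2000, seed=0):
    rng = random.Random(seed)
    m, n = len(A), len(A[0])
    W = [[rng.uniform(-0.1, 0.1) for _ in range(rank)] for _ in range(m)]
    H = [[rng.uniform(-0.1, 0.1) for _ in range(n)] for _ in range(rank)]
    for _ in range(steps):
        for i in range(m):
            for j in range(n):
                pred = sum(W[i][k] * H[k][j] for k in range(rank))
                err = A[i][j] - pred
                for k in range(rank):
                    w, h = W[i][k], H[k][j]
                    W[i][k] += lr * err * h  # gradient step on W entry
                    H[k][j] += lr * err * w  # gradient step on H entry
    return W, H

def sse(A, W, H):
    """Sum of squared errors of the approximation W @ H against A."""
    rank = len(H)
    return sum((A[i][j] - sum(W[i][k] * H[k][j] for k in range(rank))) ** 2
               for i in range(len(A)) for j in range(len(A[0])))
```

For a ratings matrix that actually is low-rank, the residual drives toward zero; on real MovieLens-scale data one would instead sweep only the observed entries, which is what makes lock-free parallel updates attractive.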
Efficient parallelization strategy for real-time FE simulations
This paper introduces an efficient and generic framework for finite-element simulations under an implicit time integration scheme. Being compatible with generic constitutive models, a fast matrix assembly method exploits the fact that system matrices are created in a deterministic way as long as the mesh topology remains constant. Using the sparsity pattern of the assembled system brings significant optimizations to the assembly stage. As a result, GPU-based parallelization techniques can be applied directly to the assembled system. Moreover, an asynchronous Cholesky preconditioning scheme is used to improve the convergence of the system solver. On this basis, a GPU-based Cholesky preconditioner is developed, significantly reducing data transfer between the CPU and GPU during the solving stage. We evaluate the performance of our method with different mesh elements and hyperelastic models and compare it with typical approaches on the CPU and the GPU.
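The fixed-topology assembly idea can be sketched as follows: because the sparsity pattern never changes, the map from each element's local stiffness entry to its slot in the global value array is built once, and every subsequent reassembly is a plain scatter-add. This is a hypothetical minimal illustration, not the paper's GPU implementation; the element matrices below are placeholders for a 1D two-node element.

```python
# Sketch: precompute the local-to-global entry map once (constant mesh
# topology), then reassemble by scatter-adding into a fixed value array.

def build_pattern(elements):
    """One-time pass: enumerate global (row, col) pairs and index them."""
    pairs = sorted({(r, c) for elem in elements for r in elem for c in elem})
    slot = {p: k for k, p in enumerate(pairs)}
    # Per-element flat list of slots, in local (i, j) order.
    elem_slots = [[slot[(r, c)] for r in elem for c in elem]
                  for elem in elements]
    return pairs, elem_slots

def assemble(elem_slots, elem_matrices, n_entries):
    """Fast path: scatter-add element matrices into the fixed value array."""
    values = [0.0] * n_entries
    for slots, ke in zip(elem_slots, elem_matrices):
        for s, v in zip(slots, ke):
            values[s] += v
    return values
```

On a GPU, `assemble` is what parallelizes well: the index arithmetic is precomputed, so each thread only performs atomic adds into known slots of the value array.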
Domain decomposition methods for the parallel computation of reacting flows
Domain decomposition is a natural route to parallel computing for partial differential equation solvers. Subdomains of which the original domain of definition is comprised are assigned to independent processors, at the price of periodic coordination between processors to compute global parameters and maintain the requisite degree of continuity of the solution at the subdomain interfaces. In the domain-decomposed solution of steady multidimensional systems of PDEs by finite difference methods using a pseudo-transient version of Newton iteration, the only portion of the computation that generally stands in the way of efficient parallelization is the solution of the large, sparse linear systems arising at each Newton step. For some Jacobian matrices drawn from an actual two-dimensional reacting flow problem, comparisons are made between relaxation-based linear solvers and preconditioned iterative methods of conjugate gradient and Chebyshev type, focusing attention on both iteration count and global inner product count. The generalized minimum residual method (GMRES) with block-ILU preconditioning is judged the best serial method among those considered, and parallel numerical experiments on the Encore Multimax demonstrate approximately 10-fold speedup for it on 16 processors.
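The subdomain idea can be illustrated with a toy block-Jacobi preconditioner inside conjugate gradients: each non-overlapping subdomain block of the matrix is solved exactly when the preconditioner is applied. This is a simplified stand-in for the block-ILU/GMRES setup the abstract describes, with a 1D Laplacian and arbitrary subdomain sizes; it shows the structure, not the paper's solver.

```python
# Toy domain-decomposition-style preconditioning: block-Jacobi (exact
# per-subdomain solves) inside preconditioned conjugate gradients.

def gauss_solve(A, b):
    """Dense Gaussian elimination with partial pivoting, for small blocks."""
    n = len(b)
    A = [row[:] for row in A]
    b = b[:]
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(A[i][k]))
        A[k], A[p] = A[p], A[k]
        b[k], b[p] = b[p], b[k]
        for i in range(k + 1, n):
            f = A[i][k] / A[k][k]
            for j in range(k, n):
                A[i][j] -= f * A[k][j]
            b[i] -= f * b[k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (b[i] - sum(A[i][j] * x[j] for j in range(i + 1, n))) / A[i][i]
    return x

def pcg(A, b, blocks, tol=1e-10, maxit=200):
    """Preconditioned CG; M^-1 solves each diagonal subdomain block exactly."""
    n = len(b)

    def apply_Minv(r):
        z = [0.0] * n
        for lo, hi in blocks:  # each (lo, hi) is one subdomain's index range
            sub = [[A[i][j] for j in range(lo, hi)] for i in range(lo, hi)]
            z[lo:hi] = gauss_solve(sub, r[lo:hi])
        return z

    x = [0.0] * n
    r = b[:]
    z = apply_Minv(r)
    p = z[:]
    rz = sum(ri * zi for ri, zi in zip(r, z))
    for it in range(maxit):
        Ap = [sum(A[i][j] * p[j] for j in range(n)) for i in range(n)]
        alpha = rz / sum(pi * api for pi, api in zip(p, Ap))
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        if sum(ri * ri for ri in r) ** 0.5 < tol:
            break
        z = apply_Minv(r)
        rz_new = sum(ri * zi for ri, zi in zip(r, z))
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, it + 1
```

In a parallel setting, each subdomain solve is independent; the global couplings that the abstract flags as the coordination cost are the matrix-vector product and the inner products inside the iteration.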