Exact Sparse Matrix-Vector Multiplication on GPU's and Multicore Architectures

Author: Boyer Brice
Dumas Jean-Guillaume
Giorgi Pascal
Publication venue: Association for Computing Machinery (ACM)
Publication date: 01/01/2010

We propose different implementations of the sparse matrix-dense vector multiplication (SpMV) for finite fields and rings ℤ/mℤ. We take advantage of graphics processing units (GPUs) and multi-core architectures. Our aim is to improve the speed of SpMV in the LinBox library, and hence the speed of its black box algorithms. In addition, we use this and a new parallelization of the sigma-basis algorithm in a parallel block Wiedemann rank implementation over finite fields.
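To make the kernel concrete, here is a minimal sketch of SpMV over ℤ/mℤ with the matrix stored in compressed sparse row (CSR) format. The CSR layout, the function name `spmv_mod`, and the reduce-once-per-row strategy are illustrative choices for this sketch, not details taken from LinBox or from the paper.

```python
from typing import List


def spmv_mod(row_ptr: List[int], col_idx: List[int], vals: List[int],
             x: List[int], m: int) -> List[int]:
    """Compute y = A x over Z/mZ, with A given in CSR format.

    Row i's nonzeros are vals[row_ptr[i]:row_ptr[i+1]], located in the
    columns col_idx[row_ptr[i]:row_ptr[i+1]].
    """
    n_rows = len(row_ptr) - 1
    y = [0] * n_rows
    for i in range(n_rows):
        acc = 0
        # Python ints never overflow, so the whole row is accumulated
        # and reduced modulo m only once (delayed reduction).
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += vals[k] * x[col_idx[k]]
        y[i] = acc % m
    return y
```

For example, with A = [[1, 0, 2], [0, 3, 0], [4, 0, 5]] and x = (1, 2, 3) over ℤ/7ℤ, `spmv_mod([0, 2, 3, 5], [0, 2, 1, 0, 2], [1, 2, 3, 4, 5], [1, 2, 3], 7)` returns `[0, 6, 5]`. With fixed-width machine words, delaying the reduction requires bounding how many products are accumulated before reducing, to avoid overflow; Python's arbitrary-precision integers hide that concern.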

arXiv.org e-Print Archive

CiteSeerX

Crossref

Hal - Université Grenoble Alpes

HAL Descartes

Parallel Integer Polynomial Multiplication

Author: Chen Changbo
Covanov Svyatoslav
Mansouri Farnam
Maza Marc Moreno
Xie Ning
Xie Yuzhen
Publication date: 24/09/2016

We propose a new algorithm for multiplying dense polynomials with integer coefficients in parallel, targeting multi-core processor architectures. Complexity estimates and experimental comparisons demonstrate the advantages of this new approach.
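The abstract does not describe the algorithm itself. For orientation only, here is a sketch of classical Kronecker substitution, a standard baseline that reduces dense integer polynomial multiplication to a single large-integer product. It is not the authors' method, it is sequential, and it assumes nonnegative coefficients (lists, lowest degree first); the name `kronecker_mul` is hypothetical.

```python
from typing import List


def kronecker_mul(a: List[int], b: List[int]) -> List[int]:
    """Multiply polynomials with nonnegative integer coefficients
    (lowest degree first) by Kronecker substitution."""
    if not a or not b:
        return []
    # Each product coefficient is a sum of at most min(len(a), len(b))
    # terms, each at most max(a) * max(b); size the slots to hold it.
    bound = min(len(a), len(b)) * max(a) * max(b)
    width = max(bound.bit_length(), 1)
    # Pack: evaluate each polynomial at x = 2**width.
    pa = sum(c << (width * i) for i, c in enumerate(a))
    pb = sum(c << (width * i) for i, c in enumerate(b))
    prod = pa * pb  # one large-integer multiplication
    mask = (1 << width) - 1
    # Unpack: no slot overflowed, so read coefficients back slot by slot.
    return [(prod >> (width * i)) & mask for i in range(len(a) + len(b) - 1)]
```

For example, `kronecker_mul([1, 2], [3, 4])` returns `[3, 10, 8]`, i.e. (1 + 2x)(3 + 4x) = 3 + 10x + 8x². Signed coefficients need a slightly different packing (e.g. balanced digits with borrow handling), which this sketch omits.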

arXiv.org e-Print Archive

Crossref

INRIA a CCSD electronic archive server

Generic design of Chinese remaindering schemes

Author: Dumas Jean-Guillaume
Gautier Thierry
Roch Jean-Louis
Publication date: 01/01/2010

We propose a generic design for Chinese remainder algorithms. A Chinese remainder computation consists of reconstructing an integer value from its residues modulo non-coprime integers. We also propose an efficient linear data structure, a radix ladder, for the intermediate storage and computations. Our design is structured into three main modules: a black box residue computation, in charge of computing each residue; a Chinese remaindering controller, in charge of launching the computation and of the termination decision; and an integer builder, in charge of the reconstruction computation. We then show that this design enables many different forms of Chinese remaindering (e.g., deterministic, early terminated, distributed), easy comparisons between these forms, and user-transparent parallelism at different parallel grains.
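The three-module split can be illustrated with a small sketch: a residue callback plays the black box, `early_terminated_crt` plays the controller, and `crt_step` plays the integer builder. All names are hypothetical. Unlike the paper, the sketch assumes pairwise coprime moduli and a nonnegative result, and it uses plain incremental reconstruction instead of a radix ladder. Its early termination (stop once the value is unchanged for a few extra moduli) is heuristic and can stop on a wrong value.

```python
from typing import Callable, Iterable, Tuple


def crt_step(r: int, M: int, a: int, p: int) -> Tuple[int, int]:
    """Integer builder: combine x = r (mod M) with x = a (mod p), gcd(M, p) = 1.

    Returns (r', M * p) with 0 <= r' < M * p.
    """
    # Solve r + M*t = a (mod p) for t; pow(M, -1, p) needs Python 3.8+.
    t = ((a - r) * pow(M, -1, p)) % p
    return r + M * t, M * p


def early_terminated_crt(residue: Callable[[int], int],
                         moduli: Iterable[int],
                         stable_steps: int = 2) -> int:
    """Controller: request residues from the black box until the
    reconstructed value has stayed the same for `stable_steps` moduli."""
    r, M, unchanged = 0, 1, 0
    for p in moduli:
        new_r, M = crt_step(r, M, residue(p) % p, p)
        unchanged = unchanged + 1 if new_r == r else 0
        r = new_r
        if unchanged >= stable_steps:
            break
    return r
```

For example, `early_terminated_crt(lambda p: 123456789 % p, [101, 103, 107, 109, 113, 127, 131])` reconstructs 123456789 after the fifth modulus and stops after two more moduli confirm it. Because the controller only sees the residue callback, the same loop could drive a distributed or parallel black box without changing the builder.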

arXiv.org e-Print Archive

Crossref

Hal - Université Grenoble Alpes

INRIA a CCSD electronic archive server

Highly Scalable Multiplication for Distributed Sparse Multivariate Polynomials on Many-core Systems

Author: C. Augonnet
E. Horowitz
F. Biscani
J. Reinders
M. Frigo
M. Gastineau
M. Monagan
P.S. Wang
R. Fateman
R.D. Blumofe
S.C. Johnson
Publication date: 01/01/2013

We present a highly scalable algorithm for multiplying sparse multivariate polynomials represented in a distributed format. This algorithm targets not only shared-memory multicore computers, but also computer clusters and specialized hardware attached to a host computer, such as graphics processing units or many-core coprocessors. Scalability on a large number of cores is ensured by the absence of synchronizations, locks, and false sharing during the main parallel step. Comment: 15 pages, 5 figures
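The following is not the author's algorithm or distributed format. It is a minimal sketch of the general principle the abstract names: give each worker a disjoint slice of the input and a private output, so the main parallel step needs no locks or synchronization, and then merge the partial results. Polynomials are assumed to be dicts from exponent tuples to integer coefficients, and all names are hypothetical. CPython threads are limited by the GIL, so this shows the structure, not a speedup.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple

Monomial = Tuple[int, ...]   # exponent vector, e.g. (2, 0) for x^2
Poly = Dict[Monomial, int]   # monomial -> coefficient


def _partial_product(chunk: List[Tuple[Monomial, int]], b: Poly) -> Counter:
    """Multiply one slice of A's terms by all of B into a private accumulator."""
    acc: Counter = Counter()
    for ea, ca in chunk:
        for eb, cb in b.items():
            acc[tuple(x + y for x, y in zip(ea, eb))] += ca * cb
    return acc


def parallel_mul(a: Poly, b: Poly, workers: int = 4) -> Poly:
    terms = list(a.items())
    chunks = [terms[i::workers] for i in range(workers)]  # disjoint slices of A
    # Each task writes only to its own Counter: no shared writes, no locks.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(_partial_product, chunks, [b] * workers))
    result: Counter = Counter()
    for part in partials:    # sequential merge; update() adds coefficients
        result.update(part)
    return {mono: c for mono, c in result.items() if c != 0}
```

For example, `parallel_mul({(1, 0): 1, (0, 1): 1}, {(1, 0): 1, (0, 1): -1})` gives `{(2, 0): 1, (0, 2): -1}`, i.e. (x + y)(x - y) = x² - y². Here the merge is a serial reduction; a scalable design would also need to partition or parallelize that step.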

arXiv.org e-Print Archive

Crossref

HAL-INSU

HAL-OBSPM

Resolution of Linear Algebra for the Discrete Logarithm Problem Using GPU and Multi-core Architectures

Author: A. Joux
B. Schmidt
C. Lanczos
C. Pomerance
D. Coppersmith
D.H. Wiedemann
E. Thomé
K. Aoki
L.M. Adleman
R. Barbulescu
T. ElGamal
T. Kleinjung
V. Strassen
W. Diffie
Publication date: 01/01/2014

In cryptanalysis, solving the discrete logarithm problem (DLP) is key to assessing the security of many public-key cryptosystems. Index-calculus methods, which attack the DLP in multiplicative subgroups of finite fields, require solving large sparse systems of linear equations modulo large primes. This article deals with how this computation can be run on GPU- and multi-core-based clusters featuring InfiniBand networking. More specifically, we present the sparse linear algebra algorithms proposed in the literature, in particular the block Wiedemann algorithm. We discuss the parallelization of the central matrix-vector product operation from both algorithmic and practical points of view, and illustrate how our approach has contributed to the recent record-sized DLP computation in GF(2^809). Comment: Euro-Par 2014 Parallel Processing, Aug 2014, Porto, Portugal. <http://europar2014.dcc.fc.up.pt/>
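As a rough illustration of the sparse linear algebra involved, here is a scalar (non-block) Wiedemann solver over GF(p). It projects the Krylov sequence u·Aⁱb, recovers a linear generator with Berlekamp–Massey, and evaluates that polynomial at A. This is a sequential toy with hypothetical names. It assumes A is square and nonsingular modulo a prime p, and it is neither the block variant nor the paper's GPU/multi-core code. It does show why SpMV is the central operation: the 2n SpMV calls that build the sequence cost O(n·nnz), which dominates the O(n²) Berlekamp–Massey step.

```python
import random
from typing import List, Tuple

Sparse = List[List[Tuple[int, int]]]  # row i -> list of (column, value) pairs


def spmv(A: Sparse, x: List[int], p: int) -> List[int]:
    """Sparse matrix-vector product y = A x mod p."""
    return [sum(v * x[j] for j, v in row) % p for row in A]


def berlekamp_massey(s: List[int], p: int) -> List[int]:
    """Shortest C = [1, c1, ..., cL] with sum_i C[i]*s[n-i] = 0 mod p for n >= L."""
    C, B = [1], [1]
    L, m, b = 0, 1, 1
    for n in range(len(s)):
        # Discrepancy between s[n] and the current recurrence's prediction.
        d = s[n] % p
        for i in range(1, L + 1):
            d = (d + C[i] * s[n - i]) % p
        if d == 0:
            m += 1
            continue
        coef = d * pow(b, -1, p) % p
        T = C[:]
        C = C + [0] * (len(B) + m - len(C))
        for i, bi in enumerate(B):  # C(x) -= coef * x^m * B(x)
            C[i + m] = (C[i + m] - coef * bi) % p
        if 2 * L <= n:
            L, B, b, m = n + 1 - L, T, d, 1
        else:
            m += 1
    return (C + [0] * (L + 1))[:L + 1]


def wiedemann_solve(A: Sparse, b: List[int], p: int, seed: int = 0) -> List[int]:
    """Solve A x = b mod p for square nonsingular A (may raise: retry with a new seed)."""
    n = len(A)
    rng = random.Random(seed)
    u = [rng.randrange(p) for _ in range(n)]
    # Scalar sequence s_i = u . (A^i b), i = 0 .. 2n-1: one SpMV per term.
    s, w = [], [bj % p for bj in b]
    for _ in range(2 * n):
        s.append(sum(ui * wi for ui, wi in zip(u, w)) % p)
        w = spmv(A, w, p)
    C = berlekamp_massey(s, p)
    L = len(C) - 1
    if L == 0 or C[L] == 0:
        raise ValueError("unlucky projection or singular system; retry")
    # f(A) b = 0 with f(t) = t^L + c1 t^(L-1) + ... + cL gives
    # x = -(1/cL) * (A^(L-1) b + c1 A^(L-2) b + ... + c(L-1) b), built Horner-style.
    acc = [0] * n
    for i in range(L):
        acc = [(y + C[i] * bj) % p for y, bj in zip(spmv(A, acc, p), b)]
    scale = (-pow(C[L], -1, p)) % p
    x = [scale * a % p for a in acc]
    if spmv(A, x, p) != [bj % p for bj in b]:
        raise ValueError("recovered polynomial is only a factor; retry")
    return x
```

Because a random projection u can occasionally yield only a factor of the true minimal polynomial, the solver checks A x = b and raises on failure. A caller would simply retry with another `seed`. The block Wiedemann algorithm replaces u and b with blocks of vectors, which turns the sequence phase into several independent SpMV streams and makes it easier to parallelize.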

arXiv.org e-Print Archive

Crossref

INRIA a CCSD electronic archive server