3,625 research outputs found

Exploiting Multiple Levels of Parallelism in Sparse Matrix-Matrix Multiplication

Author: Azad Ariful
Ballard Grey
Buluc Aydin
Demmel James
Grigori Laura
Schwartz Oded
Toledo Sivan
Williams Samuel
Publication venue: 'Society for Industrial & Applied Mathematics (SIAM)'
Publication date: 01/01/2016

Sparse matrix-matrix multiplication (or SpGEMM) is a key primitive for many high-performance graph algorithms as well as for some linear solvers, such as algebraic multigrid. The scaling of existing parallel implementations of SpGEMM is heavily bound by communication. Even though 3D (or 2.5D) algorithms have been proposed and theoretically analyzed in the flat MPI model on Erdős-Rényi matrices, those algorithms had not been implemented in practice and their complexities had not been analyzed for the general case. In this work, we present the first ever implementation of the 3D SpGEMM formulation that also exploits multiple (intra-node and inter-node) levels of parallelism, achieving significant speedups over the state-of-the-art publicly available codes at all levels of concurrency. We extensively evaluate our implementation and identify bottlenecks that should be subject to further research.

arXiv.org e-Print Archive

INRIA a CCSD electronic archive server

eScholarship - University of California

Hal-Diderot
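
For readers unfamiliar with the SpGEMM primitive named in the abstract above, the following is a minimal serial sketch of a row-by-row sparse product. It is an illustration only and is not the 3D parallel formulation the paper implements. The dict-of-dicts storage and the function name `spgemm` are assumptions made for this example.

```python
def spgemm(a, b):
    """Multiply two sparse matrices stored as {row: {col: value}}.

    Hypothetical helper for illustration: a serial, row-wise product
    that only visits stored (nonzero) entries.
    """
    c = {}
    for i, a_row in a.items():
        acc = {}  # sparse accumulator for row i of the product
        for k, a_ik in a_row.items():
            # Row k of b contributes a_ik * b[k][j] to c[i][j].
            for j, b_kj in b.get(k, {}).items():
                acc[j] = acc.get(j, 0) + a_ik * b_kj
        if acc:
            c[i] = acc
    return c


if __name__ == "__main__":
    a = {0: {0: 1, 2: 2}, 1: {1: 3}}
    b = {0: {1: 4}, 1: {0: 5}, 2: {1: 6}}
    print(spgemm(a, b))
```

The work per output row depends on the number of stored entries rather than on the matrix dimensions, which is the reason sparse products are treated as a distinct primitive from dense multiplication.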

Efficient Big Integer Multiplication and Squaring Algorithms for Cryptographic Applications

Author: Jahani Shahram
Samsudin Azman
Subramanian Kumbakonam Govindarajan
Publication venue: 'Hindawi Limited'
Publication date: 01/01/2014

Public-key cryptosystems are broadly employed to provide security for digital information. Improving the efficiency of public-key cryptosystems by speeding up calculation and using fewer resources is among the main goals of cryptography research. In this paper, we introduce new symbols extracted from the binary representation of integers, called Big-ones. We present a modified version of the classical multiplication and squaring algorithms based on the Big-ones to improve the efficiency of big integer multiplication and squaring in number theory based cryptosystems. Compared to the adopted classical and Karatsuba multiplication algorithms for squaring, the proposed squaring algorithm is 2 to 3.7 and 7.9 to 2.5 times faster for squaring 32-bit and 8-Kbit numbers, respectively. The proposed multiplication algorithm is also 2.3 to 3.9 and 7 to 2.4 times faster for multiplying 32-bit and 8-Kbit numbers, respectively. The number theory based cryptosystems, which operate in the range of 1-Kbit to 4-Kbit integers, benefit directly from the proposed method since multiplication and squaring are the main operations in most of these systems.

Crossref

Directory of Open Access Journals

Repository@USM
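
The abstract above uses Karatsuba multiplication as one of its comparison baselines. As a point of reference, here is a minimal sketch of that baseline for non-negative Python integers. It does not implement the paper's Big-ones method, and the function name `karatsuba` and the `cutoff_bits` parameter are assumptions chosen for this example.

```python
def karatsuba(x, y, cutoff_bits=64):
    """Multiply non-negative integers with Karatsuba's three-product split.

    Illustrative baseline only; `cutoff_bits` is a hypothetical tuning
    knob below which the built-in multiplication is used directly.
    """
    if x < 0 or y < 0:
        raise ValueError("this sketch handles non-negative integers only")
    if x.bit_length() <= cutoff_bits or y.bit_length() <= cutoff_bits:
        return x * y

    # Split both operands at the same bit position.
    half = max(x.bit_length(), y.bit_length()) // 2
    mask = (1 << half) - 1
    x_hi, x_lo = x >> half, x & mask
    y_hi, y_lo = y >> half, y & mask

    # Three recursive products instead of the four a naive split needs.
    z2 = karatsuba(x_hi, y_hi, cutoff_bits)
    z0 = karatsuba(x_lo, y_lo, cutoff_bits)
    z1 = karatsuba(x_hi + x_lo, y_hi + y_lo, cutoff_bits) - z2 - z0

    return (z2 << (2 * half)) + (z1 << half) + z0


if __name__ == "__main__":
    a = 3 ** 400
    b = 7 ** 350
    print(karatsuba(a, b) == a * b)
```

Squaring is the special case `karatsuba(a, a)`. A dedicated squaring routine can skip work that a general multiplication repeats, which is why the abstract reports the two operations separately.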

New Structured Matrix Methods for Real and Complex Polynomial Root-finding

Author: Pan Victor Y.
Zheng Ai-Long
Publication date: 23/11/2013

We combine the known methods for univariate polynomial root-finding and for computations in the Frobenius matrix algebra with our novel techniques to advance numerical solution of a univariate polynomial equation, and in particular numerical approximation of the real roots of a polynomial. Our analysis and experiments show efficiency of the resulting algorithms. Comment: 18 pages

arXiv.org e-Print Archive

CiteSeerX
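
The abstract above refers to the Frobenius (companion) matrix, whose eigenvalues are the roots of the associated monic polynomial. The sketch below illustrates that link only; it uses plain power iteration, which is swapped in here for simplicity and is not one of the paper's algorithms. It assumes real coefficients and a single real root of strictly largest modulus, and the names `companion_matrix` and `dominant_root` are hypothetical.

```python
def companion_matrix(coeffs):
    """Companion matrix of x^n + c_{n-1} x^{n-1} + ... + c_0.

    `coeffs` lists [c_{n-1}, ..., c_0]; the leading coefficient 1 is implied.
    """
    n = len(coeffs)
    rows = [[-float(c) for c in coeffs]]  # first row holds negated coefficients
    for i in range(1, n):
        # Ones on the subdiagonal, zeros elsewhere.
        rows.append([1.0 if j == i - 1 else 0.0 for j in range(n)])
    return rows


def dominant_root(coeffs, iterations=500):
    """Approximate the largest-modulus root by power iteration (assumption:
    that root is real, simple, and strictly dominant)."""
    c = companion_matrix(coeffs)
    n = len(c)

    def matvec(v):
        return [sum(c[i][j] * v[j] for j in range(n)) for i in range(n)]

    v = [1.0] + [0.0] * (n - 1)  # start from the first unit vector
    for _ in range(iterations):
        w = matvec(v)
        scale = max(abs(t) for t in w)
        if scale == 0.0:
            raise ValueError("iteration collapsed to the zero vector")
        v = [t / scale for t in w]

    # Rayleigh quotient of the converged vector estimates the eigenvalue.
    w = matvec(v)
    return sum(a * b for a, b in zip(v, w)) / sum(a * a for a in v)


if __name__ == "__main__":
    # x^3 - 6x^2 + 11x - 6 = (x - 1)(x - 2)(x - 3)
    print(dominant_root([-6.0, 11.0, -6.0]))
```

Power iteration converges slowly when the two largest root moduli are close and fails when they are equal, so this sketch is a teaching aid rather than a practical root-finder.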

Analysis of Internally Bandlimited Multistage Cubic-Term Generators for RF Receivers

Author: Hajimiri Ali
Keehr Edward A.
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/08/2009

Adaptive feedforward error cancellation applied to correct distortion arising from third-order nonlinearities in RF receivers requires low-noise low-power reference cubic nonidealities. Multistage cubic-term generators utilizing cascaded nonlinear operations are ideal in this regard, but the frequency response of the interstage circuitry can introduce errors into the cubing operation. In this paper, an overview of the use of cubic-term generators in receivers relative to other applications is presented. An interstage frequency response plan is presented for a receiver cubic-term generator and is shown to function for arbitrary three-signal third-order intermodulation generation. The noise of such circuits is also considered and is shown to depend on the total incoming signal power across a particular frequency band. Finally, the effects of the interstage group delay are quantified in the context of a relevant communication standard requirement.

Caltech Authors