On Polynomial Multiplication in Chebyshev Basis

Author: Giorgi Pascal
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 09/09/2013

In a recent paper, Lima, Panario and Wang provided a new method for multiplying polynomials in the Chebyshev basis, which aims to reduce the total number of multiplications when the polynomials have small degree. Their idea is to use Karatsuba's multiplication scheme to improve upon the naive method, but without being able to get rid of its quadratic complexity. In this paper, we extend their result by providing a reduction scheme that allows polynomials in the Chebyshev basis to be multiplied using algorithms from the monomial basis case, and therefore achieves the same asymptotic complexity estimate. Our reduction allows any of these algorithms to be used without converting the input polynomials to the monomial basis, which therefore provides a more direct reduction scheme than the one using conversions. We also demonstrate that our reduction is efficient in practice, and even outperforms the best known algorithm for the Chebyshev basis when polynomials have large degree. Finally, we demonstrate a linear-time equivalence between the polynomial multiplication problem under the monomial basis and under the Chebyshev basis.

arXiv.org e-Print Archive

Crossref
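
The idea of reusing a monomial-basis multiplication routine for Chebyshev series can be illustrated with a minimal sketch. It relies only on the standard identity T_i * T_j = (T_{i+j} + T_{|i-j|}) / 2 and is not claimed to be the paper's actual reduction. The names `conv` and `cheb_mul` are hypothetical, and `conv` is a schoolbook placeholder for whatever monomial-basis algorithm (Karatsuba, FFT-based, etc.) one would plug in.

```python
def conv(a, b):
    """Schoolbook product of two coefficient lists (monomial basis).

    Placeholder for any faster monomial-basis multiplication routine.
    """
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def cheb_mul(a, b, mul=conv):
    """Multiply Chebyshev series a and b, where a[i] is the coefficient of T_i.

    Uses two calls to a monomial-basis multiplication `mul`:
    one on (a, b) and one on (a, reversed b).
    """
    n, m = len(a) - 1, len(b) - 1
    f = mul(a, b)        # f[k]     = sum of a[i]*b[j] over i + j == k
    r = mul(a, b[::-1])  # r[d + m] = sum of a[i]*b[j] over i - j == d
    c = [0.0] * (n + m + 1)
    for k in range(n + m + 1):
        total = f[k]
        if k <= n:
            total += r[m + k]   # pairs with i - j == k (includes i == j when k == 0)
        if 1 <= k <= m:
            total += r[m - k]   # pairs with j - i == k
        c[k] = total / 2
    return c


# T_1 * T_1 = (T_2 + T_0) / 2
print(cheb_mul([0.0, 1.0], [0.0, 1.0]))
```

Because only `mul` touches the coefficient products, swapping in a sub-quadratic routine changes the cost of the whole sketch accordingly. If NumPy is available, results can be cross-checked against `numpy.polynomial.chebyshev.chebmul`.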

Parallel Integer Polynomial Multiplication

Authors: Chen Changbo; Covanov Svyatoslav; Mansouri Farnam; Maza Marc Moreno; Xie Ning; Xie Yuzhen
Publication date: 24/09/2016

We propose a new algorithm for multiplying dense polynomials with integer coefficients in parallel, targeting multi-core processor architectures. Complexity estimates and experimental comparisons demonstrate the advantages of this new approach.

arXiv.org e-Print Archive

Crossref

INRIA a CCSD electronic archive server

A low multiplicative complexity fast recursive DCT-2 algorithm

Authors: Petrovsky Alexander; Vashkevich Maxim
Publication date: 26/07/2012

A fast Discrete Cosine Transform (DCT) algorithm is introduced that can be of particular interest in image processing. The main features of the algorithm are the regularity of its graph and very low arithmetic complexity. The 16-point version of the algorithm requires only 32 multiplications and 81 additions. The computational core of the algorithm consists of only 17 nontrivial multiplications; the remaining 15 are scaling factors that can be compensated for in post-processing. The derivation of the algorithm is based on the algebraic signal processing theory (ASP). Comment: 4 pages, 2 figures.

arXiv.org e-Print Archive
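
For context on the operation counts quoted in the abstract above, here is a direct evaluation of the unnormalized DCT-2 definition, X_k = sum over j of x_j * cos(pi * (2j + 1) * k / (2N)). This is only a reference baseline, not the recursive algorithm from the paper. The function name `dct2_direct` is hypothetical, and the unnormalized scaling is an assumption, since conventions differ between libraries.

```python
import math


def dct2_direct(x):
    """Unnormalized DCT-2 computed straight from the definition.

    Costs on the order of N*N multiply-adds (256 for N = 16).
    """
    n = len(x)
    return [
        sum(x[j] * math.cos(math.pi * (2 * j + 1) * k / (2 * n)) for j in range(n))
        for k in range(n)
    ]


# A constant signal has all of its energy in the k = 0 coefficient.
print(dct2_direct([1.0, 1.0, 1.0, 1.0]))
```

A baseline like this is mainly useful for checking the output of a fast implementation on small inputs.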

Exact Sparse Matrix-Vector Multiplication on GPU's and Multicore Architectures

Authors: Boyer Brice; Dumas Jean-Guillaume; Giorgi Pascal
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2010

We propose different implementations of the sparse matrix-dense vector multiplication (SpMV) for finite fields and rings Z/mZ. We take advantage of graphics card processors (GPUs) and multi-core architectures. Our aim is to improve the speed of SpMV in the LinBox library, and hence the speed of its black box algorithms. In addition, we use this and a new parallelization of the sigma-basis algorithm in a parallel block Wiedemann rank implementation over finite fields.

arXiv.org e-Print Archive

CiteSeerX

Crossref

Hal - Université Grenoble Alpes

HAL Descartes
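
As a rough illustration of the operation studied in the entry above, the following sketch multiplies a sparse matrix by a dense vector over Z/mZ. The CSR (compressed sparse row) layout and the name `spmv_mod` are assumptions made for the example. The abstract does not say which storage formats the authors use, and this is not LinBox code.

```python
def spmv_mod(row_ptr, col_idx, values, x, m):
    """Compute y = A * x over Z/mZ, with A stored in CSR form.

    row_ptr[r]:row_ptr[r + 1] delimits the nonzeros of row r;
    col_idx and values hold their column indices and entries.
    """
    y = []
    for r in range(len(row_ptr) - 1):
        acc = 0
        for p in range(row_ptr[r], row_ptr[r + 1]):
            acc += values[p] * x[col_idx[p]]
        y.append(acc % m)  # single reduction per row
    return y


# A = [[1, 0, 2],
#      [0, 3, 0]] over Z/5Z, x = [4, 3, 2]
print(spmv_mod([0, 2, 3], [0, 2, 1], [1, 2, 3], [4, 3, 2], 5))
```

Reducing once per row is exact here because Python integers have arbitrary precision. With fixed-width machine words, as on a GPU, the accumulator would have to be reduced before it can overflow.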

GiMMiK - Generating Bespoke Matrix Multiplication Kernels for Accelerators: Application to High-Order Computational Fluid Dynamics

Authors: Kelly PHJ; Russell FP; Vincent PE; Witherden FD; Wozniak BD
Publication venue: 'Elsevier BV'
Publication date: 21/12/2015

Spiral - Imperial College Digital Repository

Analysis of Parallel Montgomery Multiplication in CUDA

Author: Liu Yuheng
Publication venue: SJSU ScholarWorks
Publication date: 01/04/2013

For a given level of security, elliptic curve cryptography (ECC) offers improved efficiency over classic public key implementations. Point multiplication is the most common operation in ECC and, consequently, any significant improvement in performance will likely require accelerating point multiplication. In ECC, the Montgomery algorithm is widely used for point multiplication. The primary purpose of this project is to implement and analyze a parallel implementation of the Montgomery algorithm as it is used in ECC. Specifically, the performance of CPU-based Montgomery multiplication and a GPU-based implementation in CUDA are compared.

SJSU ScholarWorks
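
Taking the title of the entry above literally, the sketch below shows sequential Montgomery modular multiplication (the REDC step) on plain Python integers. It is not the project's CPU or CUDA code. The function names and the choice R = 2**k are assumptions for illustration, and `pow(n, -1, r)` requires Python 3.8 or later.

```python
def montgomery_setup(n, k):
    """For odd n < R = 2**k, return (R, n_prime) with n * n_prime == -1 (mod R)."""
    r = 1 << k
    n_prime = (-pow(n, -1, r)) % r
    return r, n_prime


def redc(t, n, k, n_prime):
    """Montgomery reduction: return t * R^(-1) mod n for 0 <= t < n * R."""
    mask = (1 << k) - 1
    u = ((t & mask) * n_prime) & mask  # chosen so that t + u*n is divisible by R
    s = (t + u * n) >> k               # exact division by R
    return s - n if s >= n else s


def mont_mul(a, b, n, k):
    """Compute a * b mod n by passing through Montgomery form."""
    r, n_prime = montgomery_setup(n, k)
    a_bar = (a * r) % n                          # a in Montgomery form
    b_bar = (b * r) % n                          # b in Montgomery form
    prod_bar = redc(a_bar * b_bar, n, k, n_prime)  # a*b in Montgomery form
    return redc(prod_bar, n, k, n_prime)           # convert back


print(mont_mul(7, 15, 17, 5))  # same value as (7 * 15) % 17
```

The conversions into and out of Montgomery form only pay off when many multiplications share the same modulus, as in repeated field operations during point multiplication.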