Parallel multiplication and powering of polynomials
This paper examines the most efficient known serial and parallel algorithms for multiplying and powering polynomials. For sparse polynomials, the Simp algorithm multiplies using a simple divide-and-conquer approach, and the NOMC algorithm computes powers using a multinomial expansion. For dense polynomials, the FFT multiplies and powers by evaluating polynomials at a set of points, performing pointwise multiplication or powering, and interpolating a polynomial through the results. Practical issues of applying these algorithms in algebraic manipulation systems are discussed.
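The dense case the abstract describes is the classic evaluate, multiply pointwise, interpolate pattern. A minimal sketch of that FFT scheme in Python/NumPy (the function name fft_poly_mul and the use of NumPy are our own illustration, not the paper's implementation):

```python
import numpy as np

def fft_poly_mul(a, b):
    """Multiply dense polynomials a and b (coefficient lists, lowest
    degree first) by evaluating at roots of unity, multiplying
    pointwise, and interpolating back, as described in the abstract."""
    n = len(a) + len(b) - 1             # degree bound of the product
    size = 1 << (n - 1).bit_length()    # next power of two for the FFT
    fa = np.fft.rfft(a, size)           # evaluate a at the sample points
    fb = np.fft.rfft(b, size)           # evaluate b at the sample points
    prod = np.fft.irfft(fa * fb, size)  # pointwise multiply, interpolate
    return np.round(prod[:n]).astype(int)  # recover integer coefficients

# (x + 1) * (x + 1) = x^2 + 2x + 1
print(fft_poly_mul([1, 1], [1, 1]))     # -> [1 2 1]
```

Powering fits the same pattern: evaluate once, raise the point values to the k-th power, and interpolate, with the FFT size enlarged to cover degree k times deg(a).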
Gate-Level Simulation of Quantum Circuits
While thousands of experimental physicists and chemists are currently trying
to build scalable quantum computers, it appears that simulation of quantum
computation will be at least as critical as circuit simulation in classical
VLSI design. However, since the work of Richard Feynman in the early 1980s, little progress has been made in practical quantum simulation. Most researchers
focused on polynomial-time simulation of restricted types of quantum circuits
that fall short of the full power of quantum computation. Simulating quantum
computing devices and useful quantum algorithms on classical hardware now
requires excessive computational resources, making many important simulation
tasks infeasible. In this work we propose a new technique for gate-level
simulation of quantum circuits which greatly reduces the difficulty and cost of
such simulations. The proposed technique, termed the Quantum Information Decision Diagram (QuIDD), is implemented in a simulation tool and evaluated by simulating Grover's quantum search algorithm. The back-end of our package,
QuIDD Pro, is based on Binary Decision Diagrams, well-known for their ability
to efficiently represent many seemingly intractable combinatorial structures.
This reliance on a well-established area of research allows us to take
advantage of existing software for BDD manipulation and achieve unparalleled
empirical results for quantum simulation.
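QuIDDs themselves compress the state and operators with BDD techniques; what follows is a deliberately naive dense state-vector simulator in Python/NumPy that shows the gate-level objects being manipulated. The 2^n-amplitude vectors below are exactly what makes naive simulation infeasible and what QuIDDs represent compactly (all names are illustrative; none of this is QuIDDPro code):

```python
import numpy as np

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)   # Hadamard gate
X = np.array([[0, 1], [1, 0]])                  # NOT gate

def apply_gate(state, gate, target, n):
    """Apply a one-qubit gate to qubit `target` of an n-qubit dense
    state vector of 2**n amplitudes."""
    # Reshape so the target qubit is its own axis, contract, restore.
    psi = state.reshape([2] * n)
    psi = np.tensordot(gate, psi, axes=([1], [target]))
    psi = np.moveaxis(psi, 0, target)
    return psi.reshape(-1)

n = 3
state = np.zeros(2 ** n)
state[0] = 1.0                          # start in |000>
for q in range(n):                      # uniform superposition, the
    state = apply_gate(state, H, q, n)  # first step of Grover's search
print(np.round(state, 3))               # eight amplitudes of 1/sqrt(8)
```

A highly repetitive vector like this one (all amplitudes equal) is the favorable case for decision-diagram compression.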
FORM version 4.0
We present version 4.0 of the symbolic manipulation system FORM. The most
important new features are manipulation of rational polynomials and the
factorization of expressions. Many other new functions and commands are also
added; some of them are very general, while others are designed for building
specific high-level packages, such as one for Groebner bases. Also new is the checkpoint facility, which allows for periodic backups during long calculations. Lastly, FORM 4.0 has become available as open source under the GNU General Public License version 3.

Comment: 26 pages. Uses axodraw
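FORM programs are written in FORM's own language, which is not reproduced here; as a loose Python/SymPy analogue of the two headline features, rational polynomial manipulation and factorization of expressions:

```python
from sympy import symbols, cancel, factor

x, y = symbols('x y')

# Rational polynomial arithmetic: normalize a ratio of polynomials.
ratio = (x**2 - y**2) / (x**2 + 2*x*y + y**2)
print(cancel(ratio))    # (x - y)/(x + y)

# Factorization of a polynomial expression.
expr = x**4 - y**4
print(factor(expr))     # (x - y)*(x + y)*(x**2 + y**2)
```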
High Performance Sparse Multivariate Polynomials: Fundamental Data Structures and Algorithms
Polynomials may be represented sparsely in an effort to conserve memory and to provide a succinct and natural representation. Moreover, polynomials which are themselves sparse – having very few non-zero terms – waste memory and computation time if represented, and operated on, densely. This waste is exacerbated as the number of variables increases. We provide practical implementations of sparse multivariate data structures focused on data locality and cache complexity. Using these sparse data structures, we develop high-performance algorithms and implementations of fundamental polynomial operations, such as arithmetic (addition, subtraction, multiplication, and division) and interpolation. We revisit a sparse arithmetic scheme introduced by Johnson in 1974, adapting and optimizing these algorithms for modern computer architectures, with our implementations over the integers and rational numbers vastly outperforming the current widespread implementations. We develop a new algorithm for sparse pseudo-division based on the sparse polynomial division algorithm, with very encouraging results. Polynomial interpolation is explored through univariate, dense multivariate, and sparse multivariate methods. Arithmetic and interpolation together form a solid high-performance foundation from which many higher-level and more interesting algorithms can be built.
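The Johnson (1974) scheme mentioned above merges the |f| partial-product streams f_i * g with a heap ordered on monomials, so the product emerges in sorted order and like terms combine on the fly, never materializing all |f|*|g| terms at once. A minimal Python sketch, with integer exponents standing in for packed multivariate monomials (the representation and names are ours, not the thesis's optimized C code):

```python
import heapq

def sparse_mul(f, g):
    """Multiply sparse polynomials f and g, each a list of
    (exponent, coefficient) pairs sorted by decreasing exponent,
    in the style of Johnson (1974)."""
    if not f or not g:
        return []
    # One heap entry per stream i, holding the product f[i] * g[j].
    # heapq is a min-heap, so exponents are negated for max-first order.
    heap = [(-(fe + g[0][0]), i, 0) for i, (fe, _) in enumerate(f)]
    heapq.heapify(heap)
    result = []
    while heap:
        neg_e, i, j = heapq.heappop(heap)
        e, c = -neg_e, f[i][1] * g[j][1]
        if result and result[-1][0] == e:
            result[-1] = (e, result[-1][1] + c)     # combine like terms
        else:
            result.append((e, c))
        if j + 1 < len(g):                          # advance stream i
            heapq.heappush(heap, (-(f[i][0] + g[j + 1][0]), i, j + 1))
    return [(e, c) for e, c in result if c != 0]    # drop cancellations

# (x^2 + 1) * (x^2 - 1) = x^4 - 1
print(sparse_mul([(2, 1), (0, 1)], [(2, 1), (0, -1)]))  # [(4, 1), (0, -1)]
```

The heap holds at most |f| entries, which is the source of the scheme's good cache behavior relative to materializing and sorting every partial product.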
Towards Comprehensive Parametric Code Generation Targeting Graphics Processing Units in Support of Scientific Computation
The most popular multithreaded languages based on the fork-join concurrency model (CilkPlus, OpenMP) are currently being extended to support other forms of parallelism (vectorization, pipelining, and single-instruction-multiple-data (SIMD)). In the SIMD case, the objective is to execute the corresponding code on a many-core device, like a GPGPU, for which the CUDA language is a natural choice. Since the programming concepts of CilkPlus and OpenMP are very different from those of CUDA, it is desirable to automatically generate optimized CUDA-like code from CilkPlus or OpenMP.
In this thesis, we propose an accelerator model for annotated C/C++ code, together with an implementation that allows the automatic generation of CUDA code. One of the key features of this CUDA code generator is that it supports the generation of CUDA kernel code in which program parameters (like the number of threads per block) and machine parameters (like the shared memory size) are treated as unknown symbols. Hence, these parameters need not be known at code-generation time: machine parameters and program parameters can be determined when the generated code is installed on the target machine. In addition, we show how these parametric CUDA programs can be optimized at compile time in the form of a case discussion, where the cases depend on the values of machine parameters (e.g. hardware resource limits) and program parameters (e.g. the dimension sizes of thread blocks). Generating parametric CUDA kernels requires dealing with non-linear polynomial expressions during the dependence analysis and tiling phases. To carry out these algebraic calculations, we take advantage of techniques from computer algebra, in particular the RegularChains library of Maple. Various illustrative examples are provided, together with a performance evaluation.
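As a toy illustration of what "parametric" means here, the sketch below leaves the threads-per-block count symbolic in generated CUDA source and binds it from a machine parameter at install time (the template, names, and heuristic are ours; the thesis's actual generator operates on annotated C/C++ with full dependence analysis, not on strings):

```python
# The program parameter B (threads per block) stays symbolic in the
# template and is only bound at install time, from machine parameters
# such as the device's thread-per-block limit.
KERNEL_TEMPLATE = """\
__global__ void vec_add(const float *x, const float *y,
                        float *z, int n)
{{
    int i = blockIdx.x * {B} + threadIdx.x;   /* B bound at install time */
    if (i < n)
        z[i] = x[i] + y[i];
}}
"""

def bind_parameters(max_threads_per_block):
    """Install-time step: resolve the symbolic B against a machine
    parameter (an illustrative one-case version of the compile-time
    case discussion described above)."""
    b = min(256, max_threads_per_block)
    return KERNEL_TEMPLATE.format(B=b)

print(bind_parameters(max_threads_per_block=1024))
```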
Enabling Massive Deep Neural Networks with the GraphBLAS
Deep Neural Networks (DNNs) have emerged as a core tool for machine learning.
The computations performed during DNN training and inference are dominated by
operations on the weight matrices describing the DNN. As DNNs incorporate more
stages and more nodes per stage, these weight matrices may be required to be
sparse because of memory limitations. The GraphBLAS.org math library standard
was developed to provide high-performance manipulation of sparse weight
matrices and input/output vectors. For sufficiently sparse matrices, a sparse
matrix library requires significantly less memory than the corresponding dense
matrix implementation. This paper provides a brief description of the
mathematics underlying the GraphBLAS. In addition, the equations of a typical
DNN are rewritten in a form designed to use the GraphBLAS. An implementation of
the DNN is given using a preliminary GraphBLAS C library. The performance of
the GraphBLAS implementation is measured relative to a standard dense linear
algebra library implementation. For various sizes of DNN weight matrices, it is
shown that the GraphBLAS sparse implementation outperforms a BLAS dense
implementation as the weight matrix becomes sparser.

Comment: 10 pages, 7 figures, to appear in the 2017 IEEE High Performance Extreme Computing (HPEC) conference
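The rewritten layer equation has the shape y_next = max(0, W y + b) with the weight matrix W stored sparsely. A Python/SciPy stand-in for the GraphBLAS formulation (names and sizes are illustrative; the paper uses a GraphBLAS C library, not SciPy):

```python
import numpy as np
import scipy.sparse as sp

def sparse_dnn_layer(W, y, b):
    """One DNN layer, y_next = max(0, W @ y + b), with W sparse:
    a sparse matrix-vector product followed by a ReLU, the pattern
    the GraphBLAS implementation accelerates for sparse W."""
    return np.maximum(W @ y + b, 0.0)

rng = np.random.default_rng(0)
n = 1024
W = sp.random(n, n, density=0.01, random_state=rng, format="csr")
y = rng.random(n)
b = -0.3 * np.ones(n)                # bias; negative sums get clipped to 0
y_next = sparse_dnn_layer(W, y, b)
print(f"{(y_next > 0).mean():.0%} of outputs active")
```

At 1% density the CSR matrix stores roughly 10^4 nonzeros instead of the 10^6 entries a dense matrix would hold, which is the memory saving the paper quantifies.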