Search CORE

1,812 research outputs found

Hyper-systolic matrix multiplication

Author: Lippert Th.
Palazzari P.
Petkov N.
Schilling K.
Publication venue
Publication date: 01/05/2001
Field of study

ARTS repository - University of Groningen

Hyper-systolic matrix multiplication

Author: Lippert Th.
Palazzari P.
Petkov N.
Schilling K.
Publication venue
Publication date: 01/05/2001
Field of study

University of Groningen

FFT for the APE Parallel Computer

Author: Davies C. T. H.
Federico Toschi
Katz G.
Klaus Schilling
Lippert Th.
Raffaele Tripiccione
Sven Trentmann
Thomas Lippert
Publication venue: 'World Scientific Pub Co Pte Lt'
Publication date: 01/01/1997
Field of study

We present a parallel FFT algorithm for SIMD systems following the `Transpose Algorithm' approach. The method is based on the assignment of the data field onto a 1-dimensional ring of systolic cells. The systolic array can be universally mapped onto any parallel system. In particular for systems with next-neighbour connectivity our method has the potential to improve the efficiency of matrix transposition by use of hyper-systolic communication. We have realized a scalable parallel FFT on the APE100/Quadrics massively parallel computer, where our implementation is part of a 2-dimensional hydrodynamics code for turbulence studies. A possible generalization to 4-dimensional FFT is presented, having in mind QCD applications.Comment: 17 pages, 13 figures, figures include

arXiv.org e-Print Archive

CiteSeerX

Crossref

Archivio istituzionale della ricerca - Università di Ferrara

Juelich Shared Electronic Resources

CERN Document Server

Bit Fusion: Bit-Level Dynamically Composable Architecture for Accelerating Deep Neural Networks

Author: Chandra Vikas
Chau Benson
Esmaeilzadeh Hadi
Kim Joon Kyung
Lai Liangzhen
Park Jongse
Sharma Hardik
Suda Naveen
Publication venue
Publication date: 30/05/2018
Field of study

Fully realizing the potential of acceleration for Deep Neural Networks (DNNs) requires understanding and leveraging algorithmic properties. This paper builds upon the algorithmic insight that bitwidth of operations in DNNs can be reduced without compromising their classification accuracy. However, to prevent accuracy loss, the bitwidth varies significantly across DNNs and it may even be adjusted for each layer. Thus, a fixed-bitwidth accelerator would either offer limited benefits to accommodate the worst-case bitwidth requirements, or lead to a degradation in final accuracy. To alleviate these deficiencies, this work introduces dynamic bit-level fusion/decomposition as a new dimension in the design of DNN accelerators. We explore this dimension by designing Bit Fusion, a bit-flexible accelerator, that constitutes an array of bit-level processing elements that dynamically fuse to match the bitwidth of individual DNN layers. This flexibility in the architecture enables minimizing the computation and the communication at the finest granularity possible with no loss in accuracy. We evaluate the benefits of BitFusion using eight real-world feed-forward and recurrent DNNs. The proposed microarchitecture is implemented in Verilog and synthesized in 45 nm technology. Using the synthesis results and cycle accurate simulation, we compare the benefits of Bit Fusion to two state-of-the-art DNN accelerators, Eyeriss and Stripes. In the same area, frequency, and process technology, BitFusion offers 3.9x speedup and 5.1x energy savings over Eyeriss. Compared to Stripes, BitFusion provides 2.6x speedup and 3.9x energy reduction at 45 nm node when BitFusion area and frequency are set to those of Stripes. Scaling to GPU technology node of 16 nm, BitFusion almost matches the performance of a 250-Watt Titan Xp, which uses 8-bit vector instructions, while BitFusion merely consumes 895 milliwatts of power

arXiv.org e-Print Archive

Crossref

A bibliography on parallel and vector numerical algorithms

Author: Ortega J. M.
Voigt R. G.
Publication venue
Publication date
Field of study

This is a bibliography of numerical methods. It also includes a number of other references on machine architecture, programming language, and other topics of interest to scientific computing. Certain conference proceedings and anthologies which have been published in book form are listed also

NASA Technical Reports Server