
957 research outputs found

Computing the fast Fourier transform on SIMD microprocessors

Author: Blake Anthony Martin
Publication venue: 'University of Waikato'
Publication date: 18/06/2012

This thesis describes how to compute the fast Fourier transform (FFT) of a power-of-two length signal on single-instruction, multiple-data (SIMD) microprocessors at speeds faster than, or very close to, those of state-of-the-art libraries such as FFTW (“Fastest Fourier Transform in the West”), SPIRAL and Intel Integrated Performance Primitives (IPP). The conjugate-pair algorithm has advantages in terms of memory bandwidth, and three implementations of this algorithm, which incorporate latency and spatial locality optimizations, are automatically vectorized at the algorithm level of abstraction. Performance results on 2-way, 4-way and 8-way SIMD machines show that the performance scales much better than FFTW or SPIRAL. The implementations presented in this thesis are compiled into a high-performance FFT library called SFFT (“Streaming Fast Fourier Transform”), and benchmarked against FFTW, SPIRAL, Intel IPP and Apple Accelerate on sixteen x86 machines and two ARM NEON machines. They are shown to be, in many cases, faster than these state-of-the-art libraries without requiring extensive machine-specific calibration, which demonstrates that there are good heuristics for predicting the performance of the FFT on SIMD microprocessors (i.e., the need for empirical optimization may be overstated).

Research Commons@Waikato
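
The conjugate-pair algorithm named in the abstract above can be illustrated with a short scalar sketch. This is not the thesis's vectorized SFFT code; it is a plain recursive Python version, written under the assumption of a forward transform with twiddle factor exp(-2πi/N), and the function name `conjugate_pair_fft` is made up for illustration. It shows why the variant is kind to memory bandwidth: each butterfly uses a twiddle factor and its complex conjugate, so only one value needs to be loaded per pair.

```python
import cmath


def conjugate_pair_fft(x):
    """Forward DFT of a power-of-two length sequence (illustrative sketch)."""
    x = [complex(v) for v in x]
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return x
    if n == 2:
        return [x[0] + x[1], x[0] - x[1]]

    # Split into x[2m], x[4m+1] and x[4m-1]; the last uses cyclic indexing,
    # so its first sample is x[-1].
    u = conjugate_pair_fft(x[0::2])
    z = conjugate_pair_fft(x[1::4])
    zp = conjugate_pair_fft([x[-1]] + x[3:n - 1:4])

    q = n // 4
    out = [0j] * n
    for k in range(q):
        w = cmath.exp(-2j * cmath.pi * k / n)
        a = w * z[k]
        b = w.conjugate() * zp[k]  # conjugate twiddle: no second table lookup
        s = a + b
        d = 1j * (a - b)
        out[k] = u[k] + s
        out[k + 2 * q] = u[k] - s
        out[k + q] = u[k + q] - d
        out[k + 3 * q] = u[k + q] + d
    return out
```

Calling `conjugate_pair_fft([1, 2, 3, 4, 5, 6, 7, 8])` should agree with a direct O(N²) DFT of the same input to within floating-point rounding.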

The cosmological simulation code GADGET-2

Author: Abel
Appel
Ascasibar
Bagla
Balsara
Barnes
Bate
Bode
Bonnell
Boss
Bryan
Burkert
Cen
Couchman
Cox
Cuadra
Davé
Dehnen
Di Matteo
Dolag
Dubinski
Duncan
Efstathiou
Evrard
Frenk
Fryxell
Fukushige
Gao
Gingold
Gnedin
Hairer
Heitmann
Hernquist
Hockney
Hut
Jenkins
Jernigan
Jubelgas
Kang
Katz
Kay
Klein
Klypin
Knebe
Kravtsov
Linder
Lucy
Makino
Marri
Monaghan
Motl
Navarro
Norman
O'Shea
Owen
Pen
Poludnenko
Power
Quilis
Quinn
Rasio
Refregier
Saha
Salmon
Scannapieco
Serna
Sommer-Larsen
Springel
Stadel
Steinmetz
Teyssier
Tissera
Tormen
Tornatore
Van Den Bosch
Volker Springel
Wadsley
Warren
White
Whitehurst
Xu
Yepes
Yoshida
Publication venue: 'Wiley'
Publication date: 01/01/2005

We discuss the cosmological simulation code GADGET-2, a new massively parallel TreeSPH code, capable of following a collisionless fluid with the N-body method, and an ideal gas by means of smoothed particle hydrodynamics (SPH). Our implementation of SPH manifestly conserves energy and entropy in regions free of dissipation, while allowing for fully adaptive smoothing lengths. Gravitational forces are computed with a hierarchical multipole expansion, which can optionally be applied in the form of a TreePM algorithm, where only short-range forces are computed with the `tree'-method while long-range forces are determined with Fourier techniques. Time integration is based on a quasi-symplectic scheme where long-range and short-range forces can be integrated with different timesteps. Individual and adaptive short-range timesteps may also be employed. The domain decomposition used in the parallelisation algorithm is based on a space-filling curve, resulting in high flexibility and tree force errors that do not depend on the way the domains are cut. The code is efficient in terms of memory consumption and required communication bandwidth. It has been used to compute the first cosmological N-body simulation with more than 10^10 dark matter particles, reaching a homogeneous spatial dynamic range of 10^5 per dimension in a 3D box. It has also been used to carry out very large cosmological SPH simulations that account for radiative cooling and star formation, reaching total particle numbers of more than 250 million. We present the algorithms used by the code and discuss their accuracy and performance using a number of test problems. GADGET-2 is publicly released to the research community. Comment: submitted to MNRAS, 31 pages, 20 figures (reduced resolution), code available at http://www.mpa-garching.mpg.de/gadge

arXiv.org e-Print Archive

Distributed watermarking for secure control of microgrids under replay attacks

Author: Cavraro
Chen
Cheng
Ferrari
Guerrero
Meng
Miao
Mo
Sandberg
Tucci
Zhao
Publication venue
Publication date: 01/01/2018

The problem of replay attacks in the communication network between Distributed Generation Units (DGUs) of a DC microgrid is examined. The DGUs are regulated through a hierarchical control architecture, and are networked to achieve secondary control objectives. Following analysis of the detectability of replay attacks by a previously proposed distributed monitoring scheme, the need for a watermarking signal is identified. Hence, conditions are given on the watermark in order to guarantee detection of replay attacks, and such a signal is designed. Simulations are then presented to demonstrate the effectiveness of the technique.

arXiv.org e-Print Archive

Infoscience - École polytechnique fédérale de Lausanne

Crossref

Archivio Istituzionale della Ricerca - Università degli Studi di Pavia

UCL Discovery

A mathematical approach to a low power FFT architecture

Author: Stevens Kenneth
Suter Bruce W.
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/1998

Journal Article. Architecture and circuit design are the two most effective means of reducing power in CMOS VLSI. Mathematical manipulations have been applied to create a power-efficient architecture of an FFT. This architecture has been implemented in asynchronous circuit technology that achieves significant power reduction over other FFT architectures. Multirate signal processing concepts are applied to the FFT to localize communication and remove the need for globally shared results in the FFT computation. A novel architecture is produced from the polyphase components that is mapped to an asynchronous implementation. The asynchronous design continues the localization of communication and can be designed using standard cell libraries such as radiation-tolerant libraries for space electronics. We present a methodology based on multirate signal processing techniques and asynchronous design style that supports significant reduction in power over conventional design practices. A test chip implementing part of this design has been fabricated and power comparisons have been made.

The University of Utah: J. Willard Marriott Digital Library

Determining an Out-of-Core FFT Decomposition Strategy for Parallel Disks by Dynamic Programming

Author: Cormen Thomas H
Publication venue: Dartmouth Digital Commons
Publication date: 01/09/1996

We present an out-of-core FFT algorithm based on the in-core FFT method developed by Swarztrauber. Our algorithm uses a recursive divide-and-conquer strategy, and each stage in the recursion presents several possibilities for how to split the problem into subproblems. We give a recurrence for the algorithm's I/O complexity on the Parallel Disk Model and show how to use dynamic programming to determine optimal splits at each recursive stage. The algorithm to determine the optimal splits takes only Theta(lg^2 N) time for an N-point FFT, and it is practical. The out-of-core FFT algorithm itself takes considerably longer.

Dartmouth Digital Commons (Dartmouth College)
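
The dynamic-programming idea in the abstract above can be sketched in a few lines. The abstract does not give the actual Parallel Disk Model recurrence, so the cost model here is a placeholder assumption: a subproblem of `bits = lg N` that fits in memory (`bits <= lg_m`) costs nothing, and every split costs `pass_cost(bits)`. The names `best_split`, `lg_m` and `pass_cost` are hypothetical. What the sketch does show is the shape of the computation: lg N subproblems, each trying at most lg N split points, hence Theta(lg^2 N) work.

```python
from functools import lru_cache


def best_split(lg_n, lg_m, pass_cost=lambda bits: 1):
    """Return (cost, first_split) for a 2**lg_n-point problem.

    Placeholder cost model (an assumption, not the paper's recurrence):
    problems with bits <= lg_m are in-core and free; splitting a problem
    of `bits` into a + (bits - a) costs pass_cost(bits) plus both halves.
    """

    @lru_cache(maxsize=None)
    def solve(bits):
        if bits <= lg_m:
            return (0, None)  # fits in memory: no out-of-core pass needed
        best = None
        for a in range(1, bits):  # try every way to split the exponent
            cost = solve(a)[0] + solve(bits - a)[0] + pass_cost(bits)
            if best is None or cost < best[0]:
                best = (cost, a)
        return best

    return solve(lg_n)
```

With the default unit cost, `best_split(6, 2)` returns `(2, 2)`: two splits are enough to cut a 2^6-point problem into pieces of at most 2^2 points, and the first split takes 2 bits off the exponent.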