FFT for the APE Parallel Computer
We present a parallel FFT algorithm for SIMD systems following the `Transpose
Algorithm' approach. The method is based on the assignment of the data field
onto a 1-dimensional ring of systolic cells. The systolic array can be
universally mapped onto any parallel system. In particular, for systems with
next-neighbour connectivity, our method has the potential to improve the
efficiency of matrix transposition by use of hyper-systolic communication. We
have realized a scalable parallel FFT on the APE100/Quadrics massively parallel
computer, where our implementation is part of a 2-dimensional hydrodynamics
code for turbulence studies. A possible generalization to 4-dimensional FFT is
presented, with QCD applications in mind.
Comment: 17 pages, 13 figures, figures include
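To make the `Transpose Algorithm' idea concrete, here is a minimal serial sketch of a 2-dimensional FFT built from row transforms and transpositions. It is not the APE100/Quadrics implementation: the function names (`fft1d`, `transpose`, `fft2d_transpose`, `dft2d_naive`) are hypothetical, the array sizes are assumed to be powers of two, and the transposition, which on a parallel machine would be the communication step over the ring of cells, is done here as a plain in-memory reshuffle.

```python
import cmath


def fft1d(x):
    """Recursive radix-2 FFT of a list whose length is a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft1d(x[0::2])
    odd = fft1d(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def transpose(a):
    """Swap rows and columns (the only step that would need communication)."""
    return [list(col) for col in zip(*a)]


def fft2d_transpose(a):
    """2D FFT as: row FFTs, transpose, row FFTs, transpose back."""
    rows = [fft1d(r) for r in a]          # local work on each row
    cols = transpose(rows)                # former columns become rows
    cols = [fft1d(c) for c in cols]       # local work again
    return transpose(cols)                # restore the original layout


def dft2d_naive(a):
    """Direct O(N^4) 2D DFT, used only as a reference for checking."""
    n, m = len(a), len(a[0])
    return [
        [
            sum(
                a[r][c] * cmath.exp(-2j * cmath.pi * (u * r / n + v * c / m))
                for r in range(n)
                for c in range(m)
            )
            for v in range(m)
        ]
        for u in range(n)
    ]
```

In this sketch every 1-dimensional transform touches only data held in one row, so all data exchange is concentrated in the two calls to `transpose`. That is the step the abstract proposes to speed up with hyper-systolic communication.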
Chipmunk: A Systolically Scalable 0.9 mm², 3.08 Gop/s/mW @ 1.2 mW Accelerator for Near-Sensor Recurrent Neural Network Inference
Recurrent neural networks (RNNs) are state-of-the-art in voice
awareness/understanding and speech recognition. On-device computation of RNNs
on low-power mobile and wearable devices would be key to applications such as
zero-latency voice-based human-machine interfaces. Here we present Chipmunk, a
small (<1 mm²) hardware accelerator for Long Short-Term Memory RNNs in UMC
65 nm technology, capable of operating at a measured peak efficiency of up to 3.08
Gop/s/mW at 1.24 mW peak power. To implement big RNN models without incurring
huge memory transfer overhead, multiple Chipmunk engines can cooperate to
form a single systolic array. In this way, the Chipmunk architecture in a
75-tile configuration can achieve real-time phoneme extraction on a demanding RNN
topology proposed by Graves et al., consuming less than 13 mW of average power.
From 4D medical images (CT, MRI, and Ultrasound) to 4D structured mesh models of the left ventricular endocardium for patient-specific simulations
With cardiovascular disease (CVD) remaining the primary cause of death worldwide, early detection of CVDs is essential. The intracardiac flow is an important component of ventricular function, motion kinetics, wash-out of ventricular chambers, and ventricular energetics. Coupling Computational Fluid Dynamics (CFD) simulations with medical images can play a fundamental role in patient-specific diagnostic tools. From a technical perspective, CFD simulations with moving boundaries can easily lead to negative-volume errors and sudden failure of the simulation. The generation of high-quality 4D meshes (3D in space + time) with 1-to-1 vertex correspondence is therefore essential for performing a CFD simulation with moving boundaries. In this context, we developed a semiautomatic morphing tool able to create high-quality 4D structured meshes starting from a segmented 4D dataset. To demonstrate its versatility and efficiency, the method was tested on three different 4D datasets (Ultrasound, MRI, and CT) by evaluating the quality and accuracy of the resulting 4D meshes. Furthermore, some physiological quantities are estimated for the 4D CT reconstruction. Future research will aim at extending the region of interest, further automating the meshing algorithm, and generating structured hexahedral mesh models for both the blood and myocardial volumes.
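The negative-volume failure mentioned in the abstract can be illustrated with a small check. This is a sketch under stated assumptions, not the authors' tool: it uses tetrahedral cells for simplicity (the paper targets structured meshes), it assumes every time frame lists the same vertices in the same order so that connectivity is shared across frames, and the names `signed_tet_volume` and `inverted_cells` are hypothetical.

```python
def signed_tet_volume(a, b, c, d):
    """Signed volume of tetrahedron (a, b, c, d): (ab . (ac x ad)) / 6."""
    ab = [b[i] - a[i] for i in range(3)]
    ac = [c[i] - a[i] for i in range(3)]
    ad = [d[i] - a[i] for i in range(3)]
    cross = (
        ac[1] * ad[2] - ac[2] * ad[1],
        ac[2] * ad[0] - ac[0] * ad[2],
        ac[0] * ad[1] - ac[1] * ad[0],
    )
    return (ab[0] * cross[0] + ab[1] * cross[1] + ab[2] * cross[2]) / 6.0


def inverted_cells(frames, tets):
    """Return (frame index, cell index) pairs whose signed volume is <= 0.

    frames: list of vertex lists, one per time step, same ordering in each.
    tets:   list of 4-tuples of vertex indices, shared by all frames.
    """
    bad = []
    for t, verts in enumerate(frames):
        for k, (i, j, l, m) in enumerate(tets):
            if signed_tet_volume(verts[i], verts[j], verts[l], verts[m]) <= 0.0:
                bad.append((t, k))
    return bad
```

Because the connectivity is fixed and only the vertex positions change from frame to frame, a cell that flips orientation shows up as a sign change in its volume. A moving-boundary solver typically aborts in that situation.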