1,267 research outputs found

    FFT for the APE Parallel Computer

    Get PDF
    We present a parallel FFT algorithm for SIMD systems following the `Transpose Algorithm' approach. The method is based on the assignment of the data field onto a 1-dimensional ring of systolic cells. The systolic array can be universally mapped onto any parallel system. In particular for systems with next-neighbour connectivity our method has the potential to improve the efficiency of matrix transposition by use of hyper-systolic communication. We have realized a scalable parallel FFT on the APE100/Quadrics massively parallel computer, where our implementation is part of a 2-dimensional hydrodynamics code for turbulence studies. A possible generalization to 4-dimensional FFT is presented, having in mind QCD applications.Comment: 17 pages, 13 figures, figures include

    Chipmunk: A Systolically Scalable 0.9 mm2{}^2, 3.08 Gop/s/mW @ 1.2 mW Accelerator for Near-Sensor Recurrent Neural Network Inference

    Get PDF
    Recurrent neural networks (RNNs) are state-of-the-art in voice awareness/understanding and speech recognition. On-device computation of RNNs on low-power mobile and wearable devices would be key to applications such as zero-latency voice-based human-machine interfaces. Here we present Chipmunk, a small (<1 mm2{}^2) hardware accelerator for Long-Short Term Memory RNNs in UMC 65 nm technology capable to operate at a measured peak efficiency up to 3.08 Gop/s/mW at 1.24 mW peak power. To implement big RNN models without incurring in huge memory transfer overhead, multiple Chipmunk engines can cooperate to form a single systolic array. In this way, the Chipmunk architecture in a 75 tiles configuration can achieve real-time phoneme extraction on a demanding RNN topology proposed by Graves et al., consuming less than 13 mW of average power

    From 4D medical images (CT, MRI, and Ultrasound) to 4D structured mesh models of the left ventricular endocardium for patient-specific simulations

    Get PDF
    With cardiovascular disease (CVD) remaining the primary cause of death worldwide, early detection of CVDs becomes essential. The intracardiac flow is an important component of ventricular function, motion kinetics, wash-out of ventricular chambers, and ventricular energetics. Coupling between Computational Fluid Dynamics (CFD) simulations and medical images can play a fundamental role in terms of patient-specific diagnostic tools. From a technical perspective, CFD simulations with moving boundaries could easily lead to negative volumes errors and the sudden failure of the simulation. The generation of high-quality 4D meshes (3D in space + time) with 1-to-l vertex becomes essential to perform a CFD simulation with moving boundaries. In this context, we developed a semiautomatic morphing tool able to create 4D high-quality structured meshes starting from a segmented 4D dataset. To prove the versatility and efficiency, the method was tested on three different 4D datasets (Ultrasound, MRI, and CT) by evaluating the quality and accuracy of the resulting 4D meshes. Furthermore, an estimation of some physiological quantities is accomplished for the 4D CT reconstruction. Future research will aim at extending the region of interest, further automation of the meshing algorithm, and generating structured hexahedral mesh models both for the blood and myocardial volume

    Nové knihy

    Get PDF
    corecore