
3,586 research outputs found

High-level synthesis optimization for blocked floating-point matrix multiplication

Author: D'Hollander Erik
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2016

In the last decade, floating-point matrix multiplication on FPGAs has been studied extensively, and efficient architectures as well as detailed performance models have been developed. By design, these IP cores take a fixed footprint, which does not necessarily optimize the use of all available resources. Moreover, the low-level architectures are not easily amenable to parameterized synthesis. In this paper, high-level synthesis is used to fine-tune the configuration parameters in order to achieve the highest performance with maximal resource utilization. An exploration strategy is presented to optimize the use of critical resources (DSPs, memory) for any given FPGA. To account for the limited memory size on the FPGA, a block-oriented matrix multiplication is organized such that the block summation is done on the CPU while the block multiplication occurs simultaneously on the logic fabric. The communication overhead between the CPU and the FPGA is minimized by streaming the blocks in a Gray code ordering scheme, which maximizes the data reuse for consecutive block matrix product calculations. Using high-level synthesis optimization, the programmable logic operates at 93% of the theoretical peak performance, and the combined CPU-FPGA design achieves 76% of the available hardware processing speed for the floating-point multiplication of 2K by 2K matrices.
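As an illustration of the block-oriented organization this abstract describes, here is a minimal pure-Python sketch. It is not the paper's implementation: the function names (`matmul`, `block`, `blocked_matmul`) and the block size parameter `bs` are hypothetical, square matrices are assumed, and the Gray code streaming order and the FPGA offload are not modeled. The inner block product stands in for the work the abstract assigns to the logic fabric, and the accumulation step stands in for the summation it assigns to the CPU.

```python
def matmul(a, b):
    """Plain triple-loop product of two small dense blocks (lists of rows)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def block(m, r, c, bs):
    """Copy the sub-block of m starting at row r, column c, at most bs x bs."""
    return [row[c:c + bs] for row in m[r:r + bs]]


def blocked_matmul(a, b, bs):
    """Multiply square matrices a and b block by block (assumed layout)."""
    n = len(a)
    c = [[0.0] * n for _ in range(n)]
    for i in range(0, n, bs):
        for j in range(0, n, bs):
            for k in range(0, n, bs):
                # Block multiplication: the role given to the FPGA fabric.
                p = matmul(block(a, i, k, bs), block(b, k, j, bs))
                # Block summation: the role given to the CPU.
                for di, row in enumerate(p):
                    for dj, v in enumerate(row):
                        c[i + di][j + dj] += v
    return c
```

Calling `blocked_matmul(a, b, bs)` on two square lists of rows should give the same values as `matmul(a, b)`. Only one pair of `bs` by `bs` blocks is needed at a time, which is the property that lets the working set fit in limited on-chip memory.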

Ghent University Academic Bibliography

Triggering BTeV

Author: Gottschalk Erik
Publication date: 01/01/1999

BTeV is a collider experiment at Fermilab designed for precision studies of CP violation and mixing. Unlike most collider experiments, the BTeV detector has a forward geometry that is optimized for the measurement of B and charm decays in a high-rate environment. While the rate of B production gives BTeV an advantage of almost four orders of magnitude over e+e- B factories, the BTeV Level 1 trigger must be able to accept data at a rate of 100 Gigabytes per second, reconstruct tracks and vertices, trigger on B events with high efficiency, and reject minimum bias events by a factor of 100:1. An overview of the Level 1 trigger will be presented. Comment: 6 pages, 3 figures. Contribution to the Proceedings, APS-Division of Particles and Fields Conference, DPF99, UCLA, Los Angeles, CA, Jan. 5-9, 199

arXiv.org e-Print Archive

UNT Digital Library

CERN Document Server

Dynamical Symmetry and Quantum Information Processing with Electromagnetically Induced Transparency

Author: André
Ansari
Arimondo
Bennett
Chanelière
Coffman
Dantan
Dick
Eisaman
Fleischhauer
Fleischhauer
Fleischhauer
Hammerer
Harris
Hau
Hioe
Hioe
Hirota
Hui Jing
Jing
Josse
Juzeliūnas
Kash
Kien
Kuang
Laflamme
Li
Lidar
Liu
Liu
Liu
Liu
Lukin
Lukin
Lukin
Lukin
Mandel
Matsko
Matsukevich
Mewes
Mo-Lin Ge
Nielsen
Paternostro
Phillips
Raczyński
Scully
Shifman
Slosser
Sun
van der Wal
Wang
Wu
Wybourne
Xin Liu
Xiong-Jun Liu
Zibrov
Publication venue: 'Elsevier BV'
Publication date: 01/01/2006

We study in detail the dynamical symmetry and its applications in various atomic systems with electromagnetically induced transparency (EIT). By discovering the symmetrical Lie group of various atomic systems, such as a single atomic ensemble composed of complex m-level (m>3) atoms, and two-atomic-ensemble and even multi-atomic-ensemble systems composed of three-level atoms, one can obtain the general definition of dark-state polaritons (DSPs), and then the dark states of these different systems. The symmetrical properties of the multi-level system and the multi-atomic-ensemble system are shown to depend on some characteristic parameters of the EIT system. Furthermore, a controllable scheme to generate quantum entanglement between lights or atoms via quantized DSP theory is discussed, and the robustness of this scheme is analyzed by confirming the validity of the adiabatic passage conditions. Comment: 14 pages, 2 figures, Phys. Lett. A, In prin

arXiv.org e-Print Archive

CiteSeerX

Crossref

Real-time adaptive filtering of dental drill noise using a digital signal processor

Author: Atherton MA
Kaymak E
Millar B
Rotter KRG
Publication venue: The Royal Institute of Technology
Publication date: 01/01/2006

The application of noise reduction methods requires the integration of acoustics engineering and digital signal processing, which is well served by a mechatronic approach as described in this paper. The Normalised Least Mean Square (NLMS) algorithm is implemented on the Texas Instruments TMS320C6713 DSK Digital Signal Processor (DSP) as an adaptive digital filter for dental drill noise. Blocks within the Matlab/Simulink Signal Processing Blockset and the Embedded Target for TI C6000 DSP family are used. A working model of the algorithm is then transferred to the Code Composer Studio (CCS), where the desired code can be linked and transferred to the target DSP. The experimental rig comprises a noise reference microphone, a microphone for the desired signal, the DSK and loudspeakers. Different load situations of the dental drill are considered, as the noise characteristics change when the drill load changes. The result is that annoying drill noise peaks, which occur in a frequency range from 1.5 kHz to 10 kHz, are filtered out adaptively by the DSP. A schematic design for its implementation in a dentist’s surgery will also be presented.
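The NLMS update named in this abstract can be sketched in a few lines of pure Python. This is a textbook form of the algorithm, not the authors' Simulink model or DSP code: the function names, the tap count, the step size `mu`, the regularizer `eps`, and the synthetic two-tap noise path in `make_demo_signals` are all assumptions chosen for illustration.

```python
import random


def nlms(reference, primary, taps=4, mu=0.5, eps=1e-8):
    """Adaptive noise canceller using the normalised LMS update.

    reference: samples from the noise reference microphone.
    primary:   samples from the microphone carrying signal plus noise.
    Returns the error signal (the cleaned output) and the final weights.
    """
    w = [0.0] * taps      # adaptive filter weights
    buf = [0.0] * taps    # most recent reference samples, newest first
    out = []
    for x, d in zip(reference, primary):
        buf = [x] + buf[:-1]
        y = sum(wi * xi for wi, xi in zip(w, buf))   # noise estimate
        e = d - y                                    # cleaned sample
        norm = eps + sum(xi * xi for xi in buf)      # input power
        w = [wi + mu * e * xi / norm for wi, xi in zip(w, buf)]
        out.append(e)
    return out, w


def make_demo_signals(n=2000, seed=0):
    """Hypothetical test data: white reference noise reaching the primary
    microphone through an assumed two-tap path, with no desired signal."""
    rng = random.Random(seed)
    ref = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    prim = [0.8 * ref[i] + (0.3 * ref[i - 1] if i else 0.0) for i in range(n)]
    return ref, prim
```

With the synthetic signals from `make_demo_signals`, the error returned by `nlms` should shrink toward zero as the weights approach the assumed path coefficients. Dividing the step by the input power is what distinguishes NLMS from plain LMS and keeps the adaptation rate stable when the noise level changes, for example under different drill loads.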

Brunel University Research Archive

Overview of Parallel Platforms for Common High Performance Computing

Author: Adamec Filip
Fryza Tomas
Marsalek Roman
Prokopec Jan
Svobodova Jitka
Publication venue: Společnost pro radioelektronické inženýrství
Publication date: 01/04/2012

The paper deals with various parallel platforms used for high performance computing in the signal processing domain. More precisely, methods exploiting multicore central processing units, such as the message passing interface and OpenMP, are taken into account. The properties of the programming methods are experimentally verified in the application of a fast Fourier transform and a discrete cosine transform, and they are compared with the possibilities of MATLAB's built-in functions and Texas Instruments digital signal processors with very long instruction word architectures. New FFT and DCT implementations were proposed and tested. The implementation phase was compared with CPU-based computing methods and with the possibilities of the Texas Instruments digital signal processing library on C6747 floating-point DSPs. The optimal combination of computing methods in the signal processing domain and the implementation of new, fast routines is proposed as well.

Directory of Open Access Journals

Digital library of Brno University of Technology

Efficient Neural Network Implementations on Parallel Embedded Platforms Applied to Real-Time Torque-Vectoring Optimization Using Predictions for Multi-Motor Electric Vehicles

Author: Cosco Francesco
Dendaluce Jahnke Martin
Gomez-Garay Vicente
Novickis Rihards
Pérez Rastelli Joshué
Publication venue: 'MDPI AG'
Publication date: 01/01/2019

The combination of machine learning and heterogeneous embedded platforms enables new potential for developing sophisticated control concepts which are applicable to the field of vehicle dynamics and ADAS. This interdisciplinary work provides enabler solutions, ultimately implementing fast predictions using neural networks (NNs) on field programmable gate arrays (FPGAs) and graphics processing units (GPUs), while applying them to a challenging application: Torque Vectoring on a multi-electric-motor vehicle for enhanced vehicle dynamics. The foundation motivating this work is provided by discussing multiple domains of the technological context as well as the constraints related to the automotive field, which contrast with the attractiveness of exploiting the capabilities of new embedded platforms to apply advanced control algorithms for complex control problems. In this particular case we target enhanced vehicle dynamics on a multi-motor electric vehicle benefiting from the greater degrees of freedom and controllability offered by such powertrains. Considering the constraints of the application and the implications of the selected multivariable optimization challenge, we propose an NN to provide batch predictions for real-time optimization. This leads to the major contribution of this work: efficient NN implementations on two intrinsically parallel embedded platforms, a GPU and an FPGA, following an analysis of theoretical and practical implications of their different operating paradigms, in order to efficiently harness their computing potential while gaining insight into their peculiarities. The achieved results exceed the expectations and additionally provide a representative illustration of the strengths and weaknesses of each kind of platform.
Consequently, having shown the applicability of the proposed solutions, this work also contributes valuable enablers for further developments following similar fundamental principles. Some of the results presented in this work are related to activities within the 3Ccar project, which has received funding from ECSEL Joint Undertaking under grant agreement No. 662192. This Joint Undertaking received support from the European Union’s Horizon 2020 research and innovation programme and Germany, Austria, Czech Republic, Romania, Belgium, United Kingdom, France, Netherlands, Latvia, Finland, Spain, Italy, Lithuania. This work was also partly supported by the project ENABLES3, which received funding from ECSEL Joint Undertaking under grant agreement No. 692455-2

Multidisciplinary Digital Publishing Institute

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Directory of Open Access Journals

TECNALIA Publications

Toolflows for Mapping Convolutional Neural Networks on FPGAs: A Survey and Future Directions

Author: Bouganis Christos-Savvas
Kouris Alexandros
Venieris Stylianos I.
Publication date: 19/02/2018

In the past decade, Convolutional Neural Networks (CNNs) have demonstrated state-of-the-art performance in various Artificial Intelligence tasks. To accelerate the experimentation and development of CNNs, several software frameworks have been released, primarily targeting power-hungry CPUs and GPUs. In this context, reconfigurable hardware in the form of FPGAs constitutes a potential alternative platform that can be integrated in the existing deep learning ecosystem to provide a tunable balance between performance, power consumption and programmability. In this paper, a survey of the existing CNN-to-FPGA toolflows is presented, comprising a comparative study of their key characteristics which include the supported applications, architectural choices, design space exploration methods and achieved performance. Moreover, major challenges and objectives introduced by the latest trends in CNN algorithmic research are identified and presented. Finally, a uniform evaluation methodology is proposed, aiming at the comprehensive, complete and in-depth evaluation of CNN-to-FPGA toolflows. Comment: Accepted for publication at the ACM Computing Surveys (CSUR) journal, 201

arXiv.org e-Print Archive

Spiral - Imperial College Digital Repository