High-level synthesis optimization for blocked floating-point matrix multiplication
In the last decade, floating-point matrix multiplication on FPGAs has been studied extensively, and efficient architectures as well as detailed performance models have been developed. By design, these IP cores have a fixed footprint that does not necessarily make use of all available resources. Moreover, the low-level architectures are not easily amenable to parameterized synthesis. In this paper, high-level synthesis is used to fine-tune the configuration parameters in order to achieve the highest performance with maximal resource utilization. An exploration strategy is presented to optimize the use of critical resources (DSPs, memory) for any given FPGA. To account for the limited memory on the FPGA, a block-oriented matrix multiplication is organized such that the block summation is done on the CPU while the block multiplication occurs simultaneously on the logic fabric. The communication overhead between the CPU and the FPGA is minimized by streaming the blocks in a Gray code ordering scheme, which maximizes data reuse between consecutive block matrix products. Using high-level synthesis optimization, the programmable logic operates at 93% of the theoretical peak performance, and the combined CPU-FPGA design achieves 76% of the available hardware processing speed for the floating-point multiplication of 2K by 2K matrices.
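A minimal Python sketch of why a Gray code ordering reduces block traffic, under assumptions the abstract does not spell out: the blocked product is C_ij = sum_k A_ik * B_kj over an n-by-n grid of blocks, the block triples (k, i, j) are visited in a reflected mixed-radix Gray code order so that consecutive products differ in exactly one index, and a block counts as "transferred" whenever it differs from the block used by the previous product. The function names are hypothetical, and the paper's exact ordering may differ.

```python
from itertools import product


def gray_order(radices):
    """Yield digit tuples in reflected mixed-radix Gray code order.

    Consecutive tuples differ in exactly one digit, by +1 or -1.
    The first digit changes slowest.
    """
    if not radices:
        yield ()
        return
    rest = list(gray_order(radices[1:]))
    for d in range(radices[0]):
        # Reverse the inner sequence on every other step so the seam
        # between d and d + 1 changes only the leading digit.
        inner = rest if d % 2 == 0 else reversed(rest)
        for tail in inner:
            yield (d,) + tail


def count_block_loads(order):
    """Count A/B block transfers for a sequence of (k, i, j) block products.

    Product (k, i, j) needs A block (i, k) and B block (k, j); a block is
    re-sent only if it differs from the one used by the previous product.
    """
    loads = 0
    prev_a = prev_b = None
    for k, i, j in order:
        a, b = (i, k), (k, j)
        loads += (a != prev_a) + (b != prev_b)
        prev_a, prev_b = a, b
    return loads


n = 4  # blocks per matrix dimension (hypothetical)
gray = list(gray_order((n, n, n)))
# Naive order: for each C block (i, j), loop over k innermost.
naive = [(k, i, j) for i, j, k in product(range(n), repeat=3)]

print(count_block_loads(naive))  # 2 * n**3 = 128
print(count_block_loads(gray))   # n**3 + n = 68
```

Placing k outermost means a step that changes i or j reuses one operand, and only the n - 1 changes of k resend both blocks. The price is that all n^2 partial C blocks stay live during accumulation, which in this sketch is assumed to be handled by the CPU-side block summation.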
Triggering BTeV
BTeV is a collider experiment at Fermilab designed for precision studies of
CP violation and mixing. Unlike most collider experiments, the BTeV detector
has a forward geometry that is optimized for the measurement of B and charm
decays in a high-rate environment. While the rate of B production gives BTeV an
advantage of almost four orders of magnitude over e+e- B factories, the BTeV
Level 1 trigger must be able to accept data at a rate of 100 Gigabytes per
second, reconstruct tracks and vertices, trigger on B events with high
efficiency, and reject minimum bias events by a factor of 100:1. An overview of
the Level 1 trigger will be presented.
Comment: 6 pages, 3 figures. Contribution to the Proceedings, APS-Division of
Particles and Fields Conference, DPF99, UCLA, Los Angeles, CA, Jan. 5-9, 1999
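A rough consequence of the stated numbers, assuming that minimum-bias events make up almost all of the input so the 100:1 rejection carries over directly to data volume (an assumption, not a figure from the paper):

```latex
R_{\text{out}} \approx \frac{R_{\text{in}}}{100}
  = \frac{100~\text{GB/s}}{100} = 1~\text{GB/s}
```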
Dynamical Symmetry and Quantum Information Processing with Electromagnetically Induced Transparency
In this paper, we study in detail the dynamical symmetry of various atomic
systems with electromagnetically induced transparency (EIT) and its
applications. By identifying the Lie symmetry group of various atomic systems,
such as single-atomic-ensemble systems composed of complex -level atoms,
-atomic-ensemble systems, and even multi-atomic-ensemble systems composed of
-level atoms, one can obtain the general definition of dark-state
polaritons (DSPs) and hence the dark states of these different systems. The
symmetry properties of the multi-level and multi-atomic-ensemble systems are
shown to depend on certain characteristic parameters of the EIT system.
Furthermore, a controllable scheme to generate quantum entanglement between
light fields or atoms via the quantized DSP theory is discussed, and the
robustness of this scheme is analyzed by verifying the validity of the
adiabatic passage conditions.
Comment: 14 pages, 2 figures, Phys. Lett. A, in print
Real-time adaptive filtering of dental drill noise using a digital signal processor
The application of noise reduction methods requires the integration of acoustics engineering and digital signal processing, which is well served by the mechatronic approach described in this paper. The Normalised Least Mean Square (NLMS) algorithm is implemented on the Texas Instruments TMS320C6713 DSK Digital Signal Processor (DSP) as an adaptive digital filter for dental drill noise. Blocks from the Matlab/Simulink Signal Processing Blockset and the Embedded Target for the TI C6000 DSP family are used. A working model of the algorithm is then transferred to Code Composer Studio (CCS), where the code can be linked and downloaded to the target DSP. The experimental rig comprises a noise reference microphone, a microphone for the desired signal, the DSK and loudspeakers. Different load conditions of the dental drill are considered, since the noise characteristics change with the drill load. As a result, the annoying drill noise peaks, which occur in the frequency range from 1.5 kHz to 10 kHz, are adaptively filtered out by the DSP. Additionally, a schematic design for implementing the system in a dentist's surgery is presented.
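The NLMS update at the heart of such an adaptive noise canceller can be sketched in a few lines of Python. This is a floating-point illustration, not the authors' Simulink/CCS code for the TMS320C6713. The tap count, step size `mu`, regularizer `eps`, the acoustic-path coefficients and the synthetic white-noise signals are all hypothetical. In this arrangement the reference microphone supplies x[n], the primary microphone supplies d[n], and the error e[n] = d[n] - w·x[n] is the cleaned output.

```python
import random


def nlms_cancel(reference, primary, num_taps=16, mu=0.5, eps=1e-6):
    """Adaptive noise cancellation with the NLMS algorithm.

    reference: noise picked up by the reference microphone, x[n]
    primary:   desired signal plus acoustically filtered noise, d[n]
    Returns the error signal e[n] (the cleaned output) and the final weights.
    """
    w = [0.0] * num_taps
    taps = [0.0] * num_taps  # taps[0] holds the newest reference sample
    cleaned = []
    for x_n, d_n in zip(reference, primary):
        taps = [x_n] + taps[:-1]                     # shift in the new sample
        y = sum(wi * xi for wi, xi in zip(w, taps))  # estimate of the noise in d[n]
        e = d_n - y                                  # cleaned sample
        power = eps + sum(xi * xi for xi in taps)    # input energy in the window
        step = mu * e / power                        # normalised step size
        w = [wi + step * xi for wi, xi in zip(w, taps)]
        cleaned.append(e)
    return cleaned, w


# Synthetic test: the primary microphone hears the noise through a short
# hypothetical acoustic path with coefficients 0.8 (lag 2) and -0.3 (lag 3).
random.seed(1)
noise = [random.gauss(0.0, 1.0) for _ in range(4000)]


def past(n, lag):
    """Noise sample from `lag` steps ago, zero before the start."""
    return noise[n - lag] if n >= lag else 0.0


primary = [0.8 * past(n, 2) - 0.3 * past(n, 3) for n in range(len(noise))]
cleaned, w = nlms_cancel(noise, primary)
```

Dividing the step by the input energy makes the convergence speed largely independent of the noise level, which helps when the drill load, and with it the noise characteristics, changes.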
Overview of Parallel Platforms for Common High Performance Computing
The paper deals with various parallel platforms used for high-performance computing in the signal processing domain. More precisely, methods exploiting multicore central processing units, such as the Message Passing Interface (MPI) and OpenMP, are considered. The properties of these programming methods are verified experimentally on a fast Fourier transform (FFT) and a discrete cosine transform (DCT), and they are compared with MATLAB's built-in functions and with Texas Instruments digital signal processors based on very long instruction word (VLIW) architectures. New FFT and DCT implementations were proposed and tested. The implementations were compared with CPU-based computing methods and with the Texas Instruments digital signal processing library on C6747 floating-point DSPs. Finally, an optimal combination of computing methods for the signal processing domain, together with new fast routine implementations, is proposed.
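For readers comparing FFT and DCT implementations, here is a minimal pure-Python sketch, not the paper's routines: a recursive radix-2 Cooley-Tukey FFT, and a DCT-II computed through that FFT by mirroring the input to length 2N. It is checked against the direct O(N^2) definition X_k = sum_n x_n cos(pi k (n + 1/2) / N). The unnormalised DCT-II convention is an assumption. MATLAB's `dct` uses orthonormal scaling, so its coefficients differ by per-coefficient factors.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 decimation-in-time FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    if n % 2:
        raise ValueError("length must be a power of two")
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]  # twiddle factor
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def dct2_direct(x):
    """Unnormalised DCT-II straight from the definition, O(N^2)."""
    n = len(x)
    return [sum(x[i] * math.cos(math.pi * k * (i + 0.5) / n) for i in range(n))
            for k in range(n)]


def dct2_via_fft(x):
    """Unnormalised DCT-II via a 2N-point FFT of the mirrored sequence."""
    n = len(x)
    y = fft(list(x) + list(reversed(x)))  # even extension, length 2N
    # Multiplying Y[k] by exp(-i*pi*k/(2N)) makes it real and equal to 2*X[k].
    return [0.5 * (cmath.exp(-1j * math.pi * k / (2 * n)) * y[k]).real
            for k in range(n)]


x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
fast = dct2_via_fft(x)
slow = dct2_direct(x)
print(max(abs(a - b) for a, b in zip(fast, slow)) < 1e-9)  # True
```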
Efficient Neural Network Implementations on Parallel Embedded Platforms Applied to Real-Time Torque-Vectoring Optimization Using Predictions for Multi-Motor Electric Vehicles
The combination of machine learning and heterogeneous embedded platforms opens new possibilities for developing sophisticated control concepts applicable to vehicle dynamics and advanced driver-assistance systems (ADAS). This interdisciplinary work provides enabling solutions, ultimately implementing fast predictions using neural networks (NNs) on field programmable gate arrays (FPGAs) and graphics processing units (GPUs), and applies them to a challenging application: torque vectoring on a multi-electric-motor vehicle for enhanced vehicle dynamics. The motivation for this work is established by discussing multiple domains of the technological context as well as the constraints of the automotive field, which contrast with the attractiveness of exploiting the capabilities of new embedded platforms to apply advanced control algorithms to complex control problems. In this particular case, we target enhanced vehicle dynamics on a multi-motor electric vehicle, benefiting from the greater degrees of freedom and controllability offered by such powertrains. Considering the constraints of the application and the implications of the selected multivariable optimization problem, we propose an NN that provides batch predictions for real-time optimization. This leads to the major contribution of this work: efficient NN implementations on two intrinsically parallel embedded platforms, a GPU and an FPGA, following an analysis of the theoretical and practical implications of their different operating paradigms, in order to harness their computing potential efficiently while gaining insight into their peculiarities. The results exceed expectations and provide a representative illustration of the strengths and weaknesses of each kind of platform.
Consequently, having shown the applicability of the proposed solutions, this work also contributes valuable enablers for further developments that follow similar fundamental principles.
Some of the results presented in this work are related to activities within the 3Ccar project, which has
received funding from ECSEL Joint Undertaking under grant agreement No. 662192. This Joint Undertaking
received support from the European Union’s Horizon 2020 research and innovation programme and Germany,
Austria, Czech Republic, Romania, Belgium, United Kingdom, France, Netherlands, Latvia, Finland, Spain, Italy,
Lithuania. This work was also partly supported by the project ENABLES3, which received funding from ECSEL
Joint Undertaking under grant agreement No. 692455-2
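A tiny multilayer perceptron that scores many candidate control inputs in one pass shows why batch predictions suit parallel hardware: the per-candidate matrix-vector products collapse into a single matrix-matrix product. Everything here is hypothetical. That includes the network size, the random weights standing in for a trained model, the three-value vehicle state, and the grid of candidate torque-split values. The abstract describes neither the paper's network nor its optimizer.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical "trained" weights: 4 inputs -> 16 hidden (ReLU) -> 1 output (predicted cost).
W1 = rng.standard_normal((4, 16))
b1 = rng.standard_normal(16)
W2 = rng.standard_normal((16, 1))
b2 = rng.standard_normal(1)


def predict_batch(inputs):
    """Score a whole batch at once; inputs has shape (batch, 4)."""
    hidden = np.maximum(inputs @ W1 + b1, 0.0)
    return (hidden @ W2 + b2)[:, 0]


def predict_one(x):
    """Same network, one input vector at a time (for comparison)."""
    hidden = np.maximum(x @ W1 + b1, 0.0)
    return float((hidden @ W2 + b2)[0])


state = np.array([0.3, -0.1, 0.7])       # hypothetical vehicle state
candidates = np.linspace(-1.0, 1.0, 64)  # hypothetical torque-split values
batch = np.column_stack([np.tile(state, (candidates.size, 1)), candidates])

costs = predict_batch(batch)             # one pass for all 64 candidates
best = candidates[np.argmin(costs)]      # candidate with the lowest predicted cost
```

On a GPU the batched product spreads across many parallel multiply-accumulate units. On an FPGA the same layer structure can instead be laid out as a pipeline. These differing operating paradigms are the kind the abstract says the work analyses.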
Toolflows for Mapping Convolutional Neural Networks on FPGAs: A Survey and Future Directions
In the past decade, Convolutional Neural Networks (CNNs) have demonstrated
state-of-the-art performance in various Artificial Intelligence tasks. To
accelerate the experimentation and development of CNNs, several software
frameworks have been released, primarily targeting power-hungry CPUs and GPUs.
In this context, reconfigurable hardware in the form of FPGAs constitutes a
potential alternative platform that can be integrated into the existing deep
learning ecosystem to provide a tunable balance between performance, power
consumption and programmability. In this paper, a survey of the existing
CNN-to-FPGA toolflows is presented, comprising a comparative study of their key
characteristics which include the supported applications, architectural
choices, design space exploration methods and achieved performance. Moreover,
major challenges and objectives introduced by the latest trends in CNN
algorithmic research are identified and presented. Finally, a uniform
evaluation methodology is proposed, aiming at a comprehensive and in-depth
evaluation of CNN-to-FPGA toolflows.
Comment: Accepted for publication at the ACM Computing Surveys (CSUR) journal,
201
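The design space exploration that such toolflows automate can be illustrated with a deliberately simplified model. Assume a convolutional layer with M output channels, N input channels, an R-by-C output map and K-by-K kernels, computed by a tm-by-tn array of multiply-accumulate units that each cost one DSP. The search picks the unroll factors with the fewest cycles under a DSP budget. The cost model, function names and example layer are hypothetical. Real toolflows also model on-chip memory, off-chip bandwidth and clock frequency.

```python
from math import ceil


def conv_cycles(M, N, R, C, K, tm, tn):
    """Cycles for one layer when tm output and tn input channels run in parallel."""
    return ceil(M / tm) * ceil(N / tn) * R * C * K * K


def explore(M, N, R, C, K, dsp_budget, dsp_per_mac=1):
    """Exhaustively search (tm, tn), minimising cycles first, then DSP usage."""
    best = None
    for tm in range(1, M + 1):
        for tn in range(1, N + 1):
            dsps = tm * tn * dsp_per_mac
            if dsps > dsp_budget:
                continue  # configuration does not fit on the device
            key = (conv_cycles(M, N, R, C, K, tm, tn), dsps)
            if best is None or key < best[0]:
                best = (key, tm, tn)
    (cycles, dsps), tm, tn = best
    return {"tm": tm, "tn": tn, "cycles": cycles, "dsps": dsps}


# Hypothetical layer: 64 output / 32 input channels, 56x56 output, 3x3 kernels.
print(explore(64, 32, 56, 56, 3, dsp_budget=256))
# {'tm': 8, 'tn': 32, 'cycles': 225792, 'dsps': 256}
```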