
    High throughput spatial convolution filters on FPGAs

    Digital signal processing (DSP) on field-programmable gate arrays (FPGAs) has long been appealing because the inherent parallelism in these computations can easily be exploited to accelerate such algorithms. FPGAs have evolved significantly to improve the mapping of these algorithms, including additional hard blocks such as the DSP blocks found in modern FPGAs. Although these DSP blocks can map DSP computations more efficiently, they are primarily designed for 1-D filter structures. We present a study of spatial convolutional filter implementations on FPGAs, optimized around the structure of the DSP blocks to offer high throughput while maintaining the coefficient flexibility that other published architectures usually sacrifice. We show that it is possible to implement large filters for 4K-resolution image frames at 30–60 FPS while maintaining functional flexibility.
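    The abstract does not describe the architecture in detail, but the basic dataflow behind a streaming spatial filter can be sketched as a behavioural model. This is a sketch under assumptions, not the authors' design: pixels arrive in raster order, one per step; K-1 line buffers (plain lists here) hold the previous rows; a K x K window register shifts by one column per step; and the coefficients are ordinary runtime values rather than constants baked into the structure. The function names `conv2d_direct` and `conv2d_streaming` are hypothetical.

```python
def conv2d_direct(image, kernel):
    """Reference 'valid' 2-D filter (correlation): output is (H-K+1) x (W-K+1)."""
    k = len(kernel)
    h, w = len(image), len(image[0])
    out = []
    for r in range(h - k + 1):
        row = []
        for c in range(w - k + 1):
            acc = 0
            for i in range(k):
                for j in range(k):
                    acc += kernel[i][j] * image[r + i][c + j]
            row.append(acc)
        out.append(row)
    return out


def conv2d_streaming(image, kernel):
    """Raster-order model: one pixel per step, K-1 line buffers, K x K window."""
    k = len(kernel)
    h, w = len(image), len(image[0])
    # line_buffers[0] holds the oldest buffered row, line_buffers[k-2] the newest.
    line_buffers = [[0] * w for _ in range(k - 1)]
    window = [[0] * k for _ in range(k)]
    out = [[0] * (w - k + 1) for _ in range(h - k + 1)]
    for r in range(h):
        for c in range(w):
            pixel = image[r][c]
            # K vertically aligned pixels: K-1 from the line buffers plus the new one.
            column = [line_buffers[i][c] for i in range(k - 1)] + [pixel]
            # Age the buffered rows at this column and store the new pixel.
            for i in range(k - 2):
                line_buffers[i][c] = line_buffers[i + 1][c]
            if k > 1:
                line_buffers[k - 2][c] = pixel
            # Shift the window left and insert the new column on the right.
            for i in range(k):
                window[i] = window[i][1:] + [column[i]]
            # Emit one output once a full K x K neighbourhood is in the window.
            if r >= k - 1 and c >= k - 1:
                out[r - k + 1][c - k + 1] = sum(
                    kernel[i][j] * window[i][j] for i in range(k) for j in range(k)
                )
    return out
```

    For scale, assuming "4K" means a 3840 x 2160 frame: that is 8,294,400 pixels, so 60 FPS corresponds to about 498 million pixels per second and 30 FPS to about 249 million. A pipeline that accepts one pixel per clock would need a clock rate in that range, or would have to process several pixels per clock in parallel.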

    Implementation of a Real-Time Beamforming System on Field Programmable Gate Array

    Beamforming is an important technique in array signal processing and wireless communication systems. In this project, we investigate the Minimum Variance Distortionless Response (MVDR) beamforming technique and its implementation. The QR-RLS algorithm is chosen for its numerical stability and its suitability for a systolic array architecture. The team implemented real-time beamforming for a linear array of 3 receiving antennas on a Xilinx Virtex-5 FPGA platform. Both simulation and hardware implementation results are presented in this report.
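    As a reference point for the MVDR criterion named above, the weights that minimize output power subject to unit gain in the look direction are w = R^-1 a / (a^H R^-1 a), where R is the array covariance and a the steering vector. The sketch below computes them directly with a small Gaussian-elimination solve. It is a batch computation for illustration only, not the recursive QR-RLS systolic update the project uses; the half-wavelength uniform linear array, the covariance model, and all function names are assumptions.

```python
import cmath
import math


def steering_vector(num_elements, theta_deg):
    """ULA steering vector, assuming half-wavelength element spacing."""
    phase = math.pi * math.sin(math.radians(theta_deg))
    return [cmath.exp(-1j * n * phase) for n in range(num_elements)]


def covariance(num_elements, interferers, noise_power=1.0):
    """Hypothetical model R = noise_power * I + sum_k p_k v_k v_k^H."""
    r = [[noise_power if i == j else 0j for j in range(num_elements)]
         for i in range(num_elements)]
    for theta, power in interferers:
        v = steering_vector(num_elements, theta)
        for i in range(num_elements):
            for j in range(num_elements):
                r[i][j] += power * v[i] * v[j].conjugate()
    return r


def solve(a_matrix, b):
    """Solve A x = b (complex) by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a_matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0j] * n
    for r in range(n - 1, -1, -1):
        acc = m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = acc / m[r][r]
    return x


def mvdr_weights(cov, steer):
    """w = R^-1 a / (a^H R^-1 a)."""
    r_inv_a = solve(cov, steer)
    denom = sum(s.conjugate() * v for s, v in zip(steer, r_inv_a))
    return [v / denom for v in r_inv_a]


def array_response(weights, steer):
    """Beamformer gain w^H a towards a plane wave with steering vector a."""
    return sum(w.conjugate() * s for w, s in zip(weights, steer))
```

    With 3 elements, a look direction of 0 degrees and a strong interferer at 40 degrees, the response is exactly 1 towards the look direction (the distortionless constraint) and close to zero towards the interferer. With no interference (R = I), the weights reduce to a / N, the conventional delay-and-sum beamformer.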

    Toolflows for Mapping Convolutional Neural Networks on FPGAs: A Survey and Future Directions

    In the past decade, Convolutional Neural Networks (CNNs) have demonstrated state-of-the-art performance in various Artificial Intelligence tasks. To accelerate the experimentation and development of CNNs, several software frameworks have been released, primarily targeting power-hungry CPUs and GPUs. In this context, reconfigurable hardware in the form of FPGAs is a potential alternative platform that can be integrated into the existing deep learning ecosystem to provide a tunable balance between performance, power consumption and programmability. In this paper, a survey of the existing CNN-to-FPGA toolflows is presented, comprising a comparative study of their key characteristics, including the supported applications, architectural choices, design space exploration methods and achieved performance. Moreover, major challenges and objectives introduced by the latest trends in CNN algorithmic research are identified and presented. Finally, a uniform evaluation methodology is proposed, aimed at the comprehensive, complete and in-depth evaluation of CNN-to-FPGA toolflows. Comment: Accepted for publication in the ACM Computing Surveys (CSUR) journal, 201

    Implementation of Directional Median Filtering using Field Programmable Gate Arrays

    Median filtering is a non-linear filtering technique that is effective at removing impulsive noise from data. In this thesis, directional median filtering has been implemented using cumulative histograms of samples in several directions, and several implementation methods are proposed. The filtered images are smoothed along the direction of the filtering window. All implementations aim to generate outputs in the least amount of time while reducing hardware resource utilization. The methods were designed for Xilinx Virtex-5 FPGA devices and were also attempted on a Spartan-3E. The proposed methods used less than 30% of the resources on the Virtex-5 FPGA, but their resource requirements exceeded the resources available on the Spartan-3E. After an initial delay, methods 1 and 2 generate a new output every 5 clock cycles, while method 3 generates one output every 1.5 clock cycles.
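    The cumulative-histogram idea can be illustrated in software: once the samples in a window are counted into a histogram, the median is the first bin at which the running (cumulative) count reaches half the number of samples. The sketch below applies this along a 1-D line through each pixel for a few directions, assuming 8-bit samples. The direction table, the edge-replication border handling, the window length, and the function names are assumptions; the thesis's three hardware methods are not reproduced here.

```python
# Hypothetical direction table: angle in degrees -> (row step, column step).
DIRECTIONS = {0: (0, 1), 45: (-1, 1), 90: (1, 0), 135: (-1, -1)}


def histogram_median(samples, levels=256):
    """Median via a cumulative histogram (lower median for an even count)."""
    if not samples:
        raise ValueError("empty sample set")
    hist = [0] * levels
    for s in samples:
        hist[s] += 1
    target = (len(samples) + 1) // 2
    running = 0
    for value, count in enumerate(hist):
        running += count  # cumulative count of samples <= value
        if running >= target:
            return value
    raise ValueError("sample outside histogram range")


def directional_median(image, angle, length=5):
    """Median of `length` samples on a line through each pixel at `angle`."""
    dr, dc = DIRECTIONS[angle]
    h, w = len(image), len(image[0])
    half = length // 2
    out = []
    for r in range(h):
        row = []
        for c in range(w):
            samples = []
            for t in range(-half, half + 1):
                # Clamp coordinates to the image (edge replication).
                rr = min(max(r + t * dr, 0), h - 1)
                cc = min(max(c + t * dc, 0), w - 1)
                samples.append(image[rr][cc])
            row.append(histogram_median(samples))
        out.append(row)
    return out
```

    The directional behaviour is easy to see: a one-pixel-wide vertical line survives a 90-degree filter unchanged but is removed by a 0-degree filter, and an isolated impulse is removed by either. A hardware version might update the histogram incrementally as the window slides instead of rebuilding it per pixel; the sketch rebuilds it for clarity.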

    Acceleration Techniques for Sparse Recovery Based Plane-wave Decomposition of a Sound Field

    Plane-wave decomposition by sparse recovery is a reliable and accurate technique that can be used for source localization, beamforming and similar tasks. In this work, we introduce techniques to accelerate plane-wave decomposition by sparse recovery. The method consists of two main algorithms, the spherical Fourier transform (SFT) and sparse recovery, of which sparse recovery is the more computationally intensive. We implement the SFT on an FPGA and the sparse recovery on a multithreaded computing platform, so that the multithreaded platform can be fully utilized for sparse recovery. Implementing the SFT on an FPGA also makes it easier to integrate the microphones flexibly and improves the portability of the microphone array. For the FPGA implementation of the SFT, we develop a scalable FPGA design model that enables quick design of the SFT architecture. The model takes into account the number of microphones, the number of SFT channels and the cost of the FPGA, and outputs a resource-optimized, cost-effective FPGA architecture. We then investigate the performance of the sparse recovery algorithm on various multithreaded computing platforms (chip multiprocessor, multiprocessor, GPU and manycore). Finally, we investigate how the dictionary size influences the computational performance and accuracy of the sparse recovery algorithms, and we introduce novel sparse recovery techniques that use non-uniform dictionaries to improve sparse recovery performance on a parallel architecture.
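    The abstract does not name its sparse recovery solver, so as a generic illustration of recovering a sparse coefficient vector over a dictionary, the sketch below uses orthogonal matching pursuit (OMP), a well-known greedy method swapped in here purely as an example. It assumes a small real-valued dictionary given as a list of atoms (columns); real plane-wave dictionaries are complex and much larger, and the function names are hypothetical. Each iteration correlates every atom with the residual, which is the step whose cost grows with dictionary size.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def least_squares(columns, y):
    """Minimize ||sum_k x_k * columns[k] - y|| via the normal equations."""
    n = len(columns)
    g = [[dot(columns[i], columns[j]) for j in range(n)] + [dot(columns[i], y)]
         for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(g[r][col]))
        g[col], g[pivot] = g[pivot], g[col]
        for r in range(col + 1, n):
            f = g[r][col] / g[col][col]
            for c in range(col, n + 1):
                g[r][c] -= f * g[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (g[r][n] - sum(g[r][c] * x[c] for c in range(r + 1, n))) / g[r][r]
    return x


def omp(dictionary, y, sparsity, tol=1e-10):
    """Orthogonal matching pursuit; assumes sparsity <= number of atoms."""
    support, coeffs = [], []
    residual = y[:]
    for _ in range(sparsity):
        # Select the unused atom with the largest normalized correlation.
        best = max(
            (k for k in range(len(dictionary)) if k not in support),
            key=lambda k: abs(dot(dictionary[k], residual))
            / math.sqrt(dot(dictionary[k], dictionary[k])),
        )
        support.append(best)
        # Re-fit all selected atoms jointly, then update the residual.
        coeffs = least_squares([dictionary[k] for k in support], y)
        approx = [sum(c * dictionary[k][i] for c, k in zip(coeffs, support))
                  for i in range(len(y))]
        residual = [yi - ai for yi, ai in zip(y, approx)]
        if math.sqrt(dot(residual, residual)) < tol:
            break
    x = [0.0] * len(dictionary)
    for c, k in zip(coeffs, support):
        x[k] = c
    return x
```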

    Digital Communication System with High Security and High Immunity

    Security concerns have grown with the huge volume of data transmitted over communication media such as mobile phones, TV cables, online games, Wi-Fi and satellite links, for medical, military and entertainment purposes. This makes keeping these transmissions secure a challenge for governments and commercial companies. Traditional ciphers, whether block ciphers such as the Advanced Encryption Standard (AES) or stream ciphers, are neither fast enough nor completely secure. However, the unique properties of chaotic systems, such as structural complexity, deterministic dynamics, random-looking output and extreme sensitivity to initial conditions, make them attractive to researchers in communication-system security. These properties create a close relationship between chaos and cryptography, enabling ciphers that are stronger and faster than conventional algorithms. Additionally, chaotic synchronisation has motivated many studies on the application of chaos to communication security, for example synchronisation between two different systems in which the transmitter (master system) drives the receiver (slave system) with its output signal. It is therefore essential to design a secure communication system for data transmission in noisy environments that is robust to different types of attack (such as brute-force attacks). In this thesis, a digital communication system with high immunity and security, based on a Lorenz chaotic stream cipher, has been implemented. A new cryptosystem based on Lorenz chaotic systems was designed for secure data transmission. The system uses a stream cipher in which the encryption key varies continuously in a chaotic manner. Furthermore, one or more parameters of the Lorenz generator are controlled by an auxiliary chaotic generator for increased security.
    In this thesis, the two Lorenz chaotic systems are called the Main Lorenz Generator and the Auxiliary Lorenz Generator. The system was designed using SIMULINK, its performance in the presence of noise was tested, and simulation results are provided. A clock-recovery technique is then presented, with real-time results showing that the receiver recovers and locks the clock successfully. The technique for synchronising two separate FPGA boards (transmitter and receiver) is also detailed: the master system transmits specific data that triggers the slave system so that both run synchronously. Real-time results show the achieved synchronisation, and the receiver recovered user data without error. NIST randomness test results for the Lorenz chaotic signals are also given. Finally, the security analysis indicates that the system offers a high degree of security compared with other communication systems.
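    The core idea of a chaotic stream cipher, a keystream that both ends regenerate from the same Lorenz initial state and parameters and XOR with the data, can be shown in a few lines. This is a toy under loud assumptions and is not cryptographically secure: it uses the classic chaotic parameters (sigma = 10, rho = 28, beta = 8/3), fourth-order Runge-Kutta integration with a hypothetical step size and warm-up length, and a hypothetical rule for turning the x state into key bytes. It omits the Auxiliary Lorenz Generator, the noisy channel, and the clock recovery and synchronisation that the thesis handles in hardware; the receiver here simply starts from the identical state.

```python
def lorenz_step(state, dt, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """One RK4 step of dx=sigma(y-x), dy=x(rho-z)-y, dz=xy-beta*z."""
    def f(s):
        x, y, z = s
        return (sigma * (y - x), x * (rho - z) - y, x * y - beta * z)

    k1 = f(state)
    k2 = f(tuple(s + 0.5 * dt * k for s, k in zip(state, k1)))
    k3 = f(tuple(s + 0.5 * dt * k for s, k in zip(state, k2)))
    k4 = f(tuple(s + dt * k for s, k in zip(state, k3)))
    return tuple(s + dt / 6.0 * (a + 2 * b + 2 * c + d)
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))


def keystream(initial_state, n_bytes, dt=0.01, warmup=1000):
    """Hypothetical keystream: discard a transient, then take one byte per step."""
    state = initial_state
    for _ in range(warmup):
        state = lorenz_step(state, dt)
    out = []
    for _ in range(n_bytes):
        state = lorenz_step(state, dt)
        out.append(int(abs(state[0]) * 1e6) % 256)
    return out


def xor_cipher(data, initial_state):
    """Encrypts and decrypts: XOR is its own inverse given the same keystream."""
    ks = keystream(initial_state, len(data))
    return bytes(b ^ k for b, k in zip(data, ks))
```

    Decrypting with the same initial state recovers the message, while perturbing the initial state by one part in a million yields a different keystream, reflecting the sensitivity to initial conditions mentioned in the abstract.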