3,576 research outputs found

    The performance and limitations of FPGA-based digital servos for atomic, molecular, and optical physics experiments

    In this work we address the advantages, limitations, and technical subtleties of employing FPGA-based digital servos for high-bandwidth feedback control of lasers in atomic, molecular, and optical (AMO) physics experiments. Specifically, we provide the results of benchmark performance tests in experimental setups, covering noise, bandwidth, and dynamic range, for two digital servos built with low- and mid-range-priced FPGA development platforms. The digital servo results are compared to results obtained from a commercially available state-of-the-art analog servo using the same plant for control (intensity stabilization). The digital servos have feedback bandwidths of 2.5 MHz, limited by the total signal latency, and we demonstrate improvements beyond the transfer function offered by the analog servo, including a three-pole filter and a two-pole filter with phase compensation to suppress resonances. We also discuss limitations of our FPGA-servo implementation and general considerations when designing and using digital servos.

    On-Chip Implementation of Pipeline Digit-Slicing Multiplier-Less Butterfly for Fast Fourier Transform Architecture

    The need for wireless communication has driven communication systems toward high performance. However, the main bottleneck that affects communication capability is the Fast Fourier Transform (FFT), which is the core of most modulators. This study presents an on-chip implementation of a pipeline digit-slicing multiplier-less butterfly for the FFT structure. To reduce computational complexity in the butterfly, a digit-slicing multiplier-less single-constant technique was used in the critical path of the Radix-2 Decimation In Time (DIT) FFT structure. The proposed design focused on the trade-off between speed and active silicon area for the chip implementation. The new architecture was investigated and simulated with MATLAB software. Verilog HDL code was written in the Xilinx ISE environment to describe the FFT butterfly functionality and was downloaded to a Virtex II FPGA board; the Virtex-II FG456 Proto board was used to implement and test the design on real hardware. The synthesis report indicates a maximum clock frequency of 549.75 MHz with a total equivalent gate count of 31,159, a marked improvement over the Radix-2 FFT butterfly. By comparison, the conventional butterfly architecture can only run at a maximum clock frequency of 198.987 MHz, and the conventional multiplier can only run at a maximum clock frequency of 220.160 MHz. The resulting maximum clock frequency increases by about 276.28% for the FFT butterfly and about 277.06% for the multiplier. It can be concluded that on-chip implementation of a pipeline digit-slicing multiplier-less butterfly for the FFT structure is an enabler in solving problems that affect communication capability in the FFT and has considerable potential for future related work and research areas. Comment: arXiv admin note: substantial text overlap with arXiv:1806.0457
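
A radix-2 DIT butterfly with multiplier-free constant multiplication can be sketched as follows. This is a generic shift-and-add illustration, not the paper's digit-slicing architecture; the names `const_mul`, `butterfly`, and `FRAC_BITS`, and the 8-bit fixed-point twiddle format, are assumptions made for the example.

```python
FRAC_BITS = 8  # assumed fixed-point scale: a twiddle value of 1.0 is stored as 256


def const_mul(x: int, c: int) -> int:
    """Multiply x by the integer constant c using only shifts and adds."""
    mag = abs(c)
    acc = 0
    shift = 0
    while mag:
        if mag & 1:            # this bit of the constant is set
            acc += x << shift  # add the correspondingly shifted input
        mag >>= 1
        shift += 1
    return -acc if c < 0 else acc


def butterfly(a, b, w):
    """Radix-2 DIT butterfly: returns (a + w*b, a - w*b).

    a and b are (real, imag) integer pairs; w is a twiddle factor given as
    a (real, imag) pair of integers scaled by 2**FRAC_BITS.
    """
    ar, ai = a
    br, bi = b
    wr, wi = w
    # Complex product w*b built from four constant multiplications.
    tr = (const_mul(br, wr) - const_mul(bi, wi)) >> FRAC_BITS
    ti = (const_mul(br, wi) + const_mul(bi, wr)) >> FRAC_BITS
    return (ar + tr, ai + ti), (ar - tr, ai - ti)


if __name__ == "__main__":
    # Twiddle factor -j, i.e. (0, -256) in the assumed fixed-point format.
    print(butterfly((1, 2), (3, 4), (0, -256)))
```

Because each twiddle factor is a known constant, hardware can hard-wire the shifts, which is the general motivation for multiplier-less designs.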

    On the Quantization of Cellular Neural Networks for Cyber-Physical Systems

    Cyber-Physical Systems (CPSs) have become pervasive, including smart grids, autonomous automobile systems, medical monitoring, process control systems, robotics systems, and automatic pilot avionics. Because they are usually implemented on embedded devices, CPSs are typically constrained by computation capacity and energy consumption. In some CPS applications, such as telemedicine and advanced driving assistance systems (ADAS), data processing on the embedded devices is preferred due to security/safety and real-time requirements. Therefore, high efficiency is highly desirable for such CPS applications. In this paper we present CeNN quantization for highly efficient processing in CPS applications, particularly telemedicine and ADAS applications. We systematically put forward powers-of-two based incremental quantization of CeNNs for efficient hardware implementation. The incremental quantization contains iterative procedures including parameter partition, parameter quantization, and re-training. We propose five different strategies: a random strategy, a pruning-inspired strategy, a weighted pruning-inspired strategy, a nearest-neighbor strategy, and a weighted nearest-neighbor strategy. Experimental results show that our approach can achieve a speedup of up to 7.8x with no performance loss compared with the state-of-the-art FPGA solutions for CeNNs. Comment: 14 pages, 10 figures
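
The partition and quantization steps described above can be illustrated with a minimal sketch. The exponent range, the largest-magnitude-first partition rule, and the function names are assumptions for illustration only; the abstract does not specify how its five strategies select parameters, and the re-training step is omitted here.

```python
def quantize_pow2(w: float, min_exp: int = -3, max_exp: int = 1) -> float:
    """Map w to the nearest value in {0, +/-2**e} for min_exp <= e <= max_exp."""
    candidates = [0.0]
    for e in range(min_exp, max_exp + 1):
        candidates += [2.0 ** e, -(2.0 ** e)]
    return min(candidates, key=lambda c: abs(c - w))


def incremental_step(weights, frozen, fraction=0.5):
    """Quantize and freeze a fraction of the not-yet-frozen weights.

    Assumed partition rule: largest magnitude first. In a full pipeline the
    remaining (unfrozen) weights would be re-trained before the next step.
    """
    free = [i for i, is_frozen in enumerate(frozen) if not is_frozen]
    free.sort(key=lambda i: abs(weights[i]), reverse=True)
    count = max(1, round(len(free) * fraction)) if free else 0
    for i in free[:count]:
        weights[i] = quantize_pow2(weights[i])
        frozen[i] = True
    return weights, frozen


if __name__ == "__main__":
    w, f = incremental_step([0.3, -0.7, 0.01, 1.4], [False] * 4)
    print(w, f)
```

Power-of-two weights let hardware replace each multiplication with a bit shift, which is why this family of quantizers suits FPGA implementation.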

    High Throughput 2D Spatial Image Filters on FPGAs

    FPGAs are well established in the signal processing domain, where their fine-grained programmable nature allows the inherent parallelism in these applications to be exploited for enhanced performance. As architectures have evolved, FPGA vendors have added more heterogeneous resources to allow often-used functions to be implemented with higher performance, at lower power and using less area. DSP blocks, for example, have evolved from basic multipliers to support the multiply-accumulate operations that are the core of many signal processing tasks. While more features were added to DSP blocks, their structure and connectivity have been optimised primarily for one-dimensional signal processing. Basic operations in image processing are similar, but performed in a two-dimensional structure, and hence, many of the optimisations in newer DSP blocks are not exploited when mapping image processing algorithms to them. We present a detailed study of two-dimensional spatial filter implementation on FPGAs, showing how to maximise performance through exploitation of DSP block capabilities, while also presenting a lean border pixel management policy.
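
As a software reference for what such a filter computes, here is a minimal sketch of a 2D spatial filter built from multiply-accumulate operations. The border handling shown (clamping coordinates to the nearest edge pixel) is an assumption chosen for simplicity; the abstract does not say which border policy the paper uses.

```python
def filter2d(image, kernel):
    """Apply a 2D spatial filter (correlation, no kernel flip) to a 2D list.

    Border pixels are handled by clamping coordinates to the image edge,
    so the output has the same size as the input.
    """
    h, w = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    oy, ox = kh // 2, kw // 2
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0
            for ky in range(kh):
                for kx in range(kw):
                    sy = min(max(y + ky - oy, 0), h - 1)  # clamp row index
                    sx = min(max(x + kx - ox, 0), w - 1)  # clamp column index
                    acc += image[sy][sx] * kernel[ky][kx]  # multiply-accumulate
            out[y][x] = acc
    return out


if __name__ == "__main__":
    box = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
    print(filter2d([[1, 2], [3, 4]], box))
```

The inner two loops are the part a hardware design would unroll onto parallel multiply-accumulate units.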

    The Effect of the Digit Slicing Architecture on the FFT Butterfly

    Most communication systems aim for bandwidth, power, and cost efficiency, which shapes the choice of modulation scheme. Hence, for signal modulation, the orthogonal frequency division multiplexing (OFDM) transceiver was introduced to meet communication demands in the fourth generation. However, the high-performance Fast Fourier Transform (FFT), the heart of OFDM, operates out of view. Designing and realizing an efficient internal structure for a capable FFT is the key issue of this research work. In this paper, the implementation of a high-performance butterfly for the FFT using the digit-slicing technique is presented. The proposed design focused on the trade-off between speed and active silicon area for the chip implementation. The new architecture was investigated and simulated with MATLAB software. Verilog HDL code was written in the Xilinx ISE environment to describe the FFT butterfly functionality and was downloaded to a Virtex II FPGA board. Comment: arXiv admin note: substantial text overlap with arXiv:1808.02521, arXiv:1806.0457

    VLSI Computational Architectures for the Arithmetic Cosine Transform

    The discrete cosine transform (DCT) is a widely used and important signal processing tool employed in a plethora of applications. Typical fast algorithms for nearly exact computation of the DCT require floating-point arithmetic, are multiplier intensive, and accumulate round-off errors. The recently proposed fast algorithm, the arithmetic cosine transform (ACT), calculates the DCT exactly using only additions and integer constant multiplications, with very low area complexity, for null mean input sequences. The ACT can also be computed non-exactly for any input sequence, with low area complexity and low power consumption, utilizing the novel architecture described. However, as a trade-off, the ACT algorithm requires 10 non-uniformly sampled data points to calculate the 8-point DCT. This requirement can easily be satisfied for applications dealing with spatial signals, such as image sensors and biomedical sensor arrays, by placing sensor elements in a non-uniform grid. In this work, a hardware architecture for the computation of the null mean ACT is proposed, followed by novel architectures that extend the ACT to non-null mean signals. All circuits are physically implemented and tested using the Xilinx XC6VLX240T FPGA device and synthesized for the 45 nm TSMC standard-cell library for performance assessment. Comment: 8 pages, 2 figures, 6 tables

    Towards Programmable Network Dynamics: A Chemistry-Inspired Abstraction for Hardware Design

    Chemical algorithms are statistical algorithms described and represented as chemical reaction networks. They are particularly attractive for traffic shaping and general control of network dynamics: they are analytically tractable, they reinforce a strict state-to-dynamics relationship, they have configurable stability properties, and they are directly implemented in state-space using a high-level (graphical) representation. In this paper, we present a direct implementation of chemical algorithms on FPGA hardware. Besides substantially improving performance, we have achieved hardware-level programmability and re-configurability of these algorithms at runtime (not interrupting servicing) and in real time (with sub-second latency). This opens an interesting perspective for expanding the currently limited scope of software defined networking and network virtualisation solutions to include programmable control of network dynamics. Comment: 14 pages, non-accepted version submitted to IEEE/ACM Transactions on Networking in May 2015 (after first submission in May 2014)

    BOTDA Fiber Sensor System Based on FPGA Accelerated Support Vector Regression

    Brillouin optical time domain analyzer (BOTDA) fiber sensors have shown strong capability in static long-haul distributed temperature/strain sensing. However, in applications such as structural health monitoring and leakage detection, real-time measurement is necessary. The measurement time of temperature/strain in a BOTDA system includes data acquisition time and post-processing time. In this work, we propose to use hardware-accelerated support vector regression (SVR) for the post-processing of the collected BOTDA data. Ideal Lorentzian curves under different temperatures with different linewidths are used to train the SVR model to determine the linear SVR decision function. The performance of SVR is evaluated experimentally under different signal-to-noise ratios (SNRs). After the model coefficients are determined, algorithm-specific hardware accelerators based on field programmable gate arrays (FPGAs) are used to realize the SVR decision function. During the implementation, hardware optimization techniques based on loop dependence analysis and batch processing are proposed to reduce the execution latency. Our FPGA implementations can achieve up to 42x speedup compared with a software implementation on an i7-5960x computer. The post-processing time for 96,100 BGSs along a 38.44-km FUT is only 0.46 seconds with the ZCU104 FPGA board, so post-processing time is no longer a limiting factor for dynamic sensing. Moreover, the energy efficiency of our FPGA implementation can reach up to 226.1x that of the CPU-based software implementation. Comment: 8 pages
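
The two ingredients named above, ideal Lorentzian training curves and a linear decision function, can be sketched as follows. The function names, the frequency units, and the example weights are hypothetical; the standard Lorentzian line shape is used, and no trained coefficients from the paper are reproduced.

```python
def lorentzian_bgs(freqs, center, linewidth, peak=1.0):
    """Ideal Lorentzian gain spectrum sampled at the given frequencies.

    linewidth is the full width at half maximum, in the same units as freqs.
    """
    half = linewidth / 2.0
    return [peak / (1.0 + ((f - center) / half) ** 2) for f in freqs]


def svr_decision(weights, bias, spectrum):
    """Linear SVR decision function: dot(weights, spectrum) + bias."""
    return sum(w * s for w, s in zip(weights, spectrum)) + bias


def svr_decision_batch(weights, bias, spectra):
    """Evaluate the decision function for a batch of spectra."""
    return [svr_decision(weights, bias, s) for s in spectra]


if __name__ == "__main__":
    freqs = [8.0, 9.0, 10.0]  # hypothetical frequency grid
    spectrum = lorentzian_bgs(freqs, center=9.0, linewidth=2.0)
    print(spectrum)
    print(svr_decision([1.0, 2.0, 3.0], 0.5, spectrum))  # hypothetical weights
```

Since the decision function reduces to one dot product per spectrum, independent spectra can be processed in batches, which is the kind of structure hardware accelerators exploit.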

    SYQ: Learning Symmetric Quantization For Efficient Deep Neural Networks

    Inference for state-of-the-art deep neural networks is computationally expensive, making them difficult to deploy on constrained hardware environments. An efficient way to reduce this complexity is to quantize the weight parameters and/or activations during training by approximating their distributions with a limited entry codebook. For very low precisions, such as binary or ternary networks with 1-8-bit activations, the information loss from quantization leads to significant accuracy degradation due to large gradient mismatches between the forward and backward functions. In this paper, we introduce a quantization method to reduce this loss by learning a symmetric codebook for particular weight subgroups. These subgroups are determined based on their locality in the weight matrix, such that the hardware simplicity of the low-precision representations is preserved. Empirically, we show that symmetric quantization can substantially improve accuracy for networks with extremely low-precision weights and activations. We also demonstrate that this representation imposes minimal or no hardware implications to more coarse-grained approaches. Source code is available at https://www.github.com/julianfaraone/SYQ. Comment: Published as a conference paper at the 2018 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
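
A symmetric ternary codebook with one scaling factor per weight subgroup can be sketched as below. Treating each matrix row as a subgroup, using a fixed threshold, and setting the scale to the mean magnitude of the retained weights are all assumptions for illustration; in the paper the codebook is learned during training, which this sketch does not attempt.

```python
def ternarize_row(row, threshold):
    """Quantize one row to codes in {-1, 0, +1} plus a single scale alpha.

    The codebook {-alpha, 0, +alpha} is symmetric around zero. Here alpha is
    the mean magnitude of the weights that survive the threshold (assumed).
    """
    codes = [0 if abs(w) <= threshold else (1 if w > 0 else -1) for w in row]
    kept = [abs(w) for w, c in zip(row, codes) if c != 0]
    alpha = sum(kept) / len(kept) if kept else 0.0
    return codes, alpha


def quantize_matrix(matrix, threshold=0.05):
    """Apply row-wise ternarization, giving one (codes, alpha) pair per row."""
    return [ternarize_row(row, threshold) for row in matrix]


def dequantize(codes, alpha):
    """Reconstruct approximate weights from the codes and the row scale."""
    return [alpha * c for c in codes]


if __name__ == "__main__":
    for codes, alpha in quantize_matrix([[0.5, -0.5, 0.01], [0.2, 0.0, -0.4]]):
        print(codes, alpha, dequantize(codes, alpha))
```

Keeping one scale per row means the inner products still use only sign flips and additions, with a single multiplication by alpha at the end of each row.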

    FPGA-based Acceleration of FT Convolution for Pulsar Search Using OpenCL

    The Square Kilometre Array (SKA) project will be the world's largest radio telescope array. With its large number of antennas, the number of signals that need to be processed is dramatic. One important element of the SKA's Central Signal Processor package is pulsar search. This paper focuses on the FPGA-based acceleration of the Frequency-Domain Acceleration Search module, which is a part of the SKA pulsar search engine. In this module, the frequency-domain input signals have to be processed by 85 Finite Impulse Response (FIR) filters within a short time limit and for thousands of input arrays. Because of the large scale of the input length and FIR filter size, even high-end FPGA devices cannot parallelise the task completely. We start by investigating both the time-domain FIR filter (TDFIR) and the frequency-domain FIR filter (FDFIR) to tackle this task. We applied the overlap-add algorithm to split the coefficient array of TDFIR and the overlap-save algorithm to split the input signals of FDFIR. To achieve fast prototyping, we employed OpenCL, which is a high-level FPGA development technique. The performance and power consumption are evaluated using multiple FPGA devices simultaneously and compared with GPU results, which are obtained by porting the FPGA-based OpenCL kernels. The experimental evaluation shows that the FDFIR solution is very competitive in terms of performance, with a clear energy consumption advantage over the GPU solution. Comment: 25 pages, 13 figures
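
The idea of splitting a time-domain FIR filter's coefficient array into blocks and adding the partial results can be sketched as follows. This is a plain-software illustration under assumed names (`fir`, `fir_split_taps`, `block`); it is not the paper's OpenCL kernel and says nothing about how the authors map the blocks onto FPGA resources.

```python
def fir(x, h):
    """Direct-form FIR filtering: full linear convolution of x with taps h."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y


def fir_split_taps(x, h, block):
    """Same result as fir(x, h), computed from blocks of at most `block` taps.

    Each block of coefficients filters the whole input on its own; the
    partial output is delayed by the block's starting tap index and added in.
    """
    y = [0.0] * (len(x) + len(h) - 1)
    for start in range(0, len(h), block):
        part = fir(x, h[start:start + block])
        for n, value in enumerate(part):
            y[start + n] += value
    return y


if __name__ == "__main__":
    x = [1, 2, 3, 4]
    h = [1, 0, -1, 2, 1]
    print(fir(x, h))
    print(fir_split_taps(x, h, block=2))
```

Because convolution is linear in the taps, each block can be evaluated independently, so a filter too large for one device can be processed piece by piece.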