The performance and limitations of FPGA-based digital servos for atomic, molecular, and optical physics experiments
In this work we address the advantages, limitations, and technical subtleties
of employing FPGA-based digital servos for high-bandwidth feedback control of
lasers in atomic, molecular, and optical (AMO) physics experiments.
Specifically, we provide the results of benchmark performance tests in
experimental setups including noise, bandwidth, and dynamic range for two
digital servos built with low and mid-range priced FPGA development platforms.
The digital servo results are compared to results obtained from a commercially
available state-of-the-art analog servo using the same plant for control
(intensity stabilization). The digital servos have feedback bandwidths of 2.5
MHz, limited by the total signal latency, and we demonstrate improvements
beyond the transfer function offered by the analog servo including a three pole
filter and a two pole filter with phase compensation to suppress resonances. We
also discuss limitations of our FPGA-servo implementation and general
considerations when designing and using digital servos.
On-Chip Implementation of Pipeline Digit-Slicing Multiplier-Less Butterfly for Fast Fourier Transform Architecture
The need for wireless communication has driven the communication systems to
high performance. However, the main bottleneck that affects the communication
capability is the Fast Fourier Transform (FFT), which is the core of most
modulators. This study presents an on-chip implementation of a pipeline
digit-slicing multiplier-less butterfly for the FFT structure. To reduce
computational complexity in the butterfly, a digit-slicing multiplier-less
single-constant technique was used in the critical path of the Radix-2
Decimation In Time (DIT) FFT structure. The proposed design focuses on the
trade-off between speed and active silicon area for the chip implementation.
The new architecture was investigated and simulated in MATLAB. Verilog HDL code
describing the FFT butterfly functionality was written in the Xilinx ISE
environment and downloaded to a Virtex-II FPGA board; the Virtex-II FG456 Proto
board was then used to implement and test the design on real hardware. The
synthesis report indicates a maximum clock frequency of 549.75 MHz with a total
equivalent gate count of 31,159, a marked improvement over the conventional
Radix-2 FFT butterfly. By comparison, the conventional butterfly architecture
can only run at a maximum clock frequency of 198.987 MHz and the conventional
multiplier at 220.160 MHz. The resulting maximum clock frequency increases by
about 276.28% for the FFT butterfly and about 277.06% for the multiplier. It
can be concluded that the on-chip implementation of a pipeline digit-slicing
multiplier-less butterfly for the FFT structure helps solve problems that limit
communication capability in the FFT and has considerable potential for future
related work and research.
Comment: arXiv admin note: substantial text overlap with arXiv:1806.0457
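The general idea behind a multiplier-less constant multiplication can be
illustrated with a minimal sketch: a product with a fixed constant is built
from shifts and additions only. This is not the digit-slicing architecture of
the abstract above; the function names and the example constant are
hypothetical and chosen for illustration.

```python
def shift_add_terms(constant: int) -> list[int]:
    """Return the positions of the set bits in a non-negative constant."""
    if constant < 0:
        raise ValueError("constant must be non-negative")
    return [bit for bit in range(constant.bit_length()) if (constant >> bit) & 1]


def multiply_by_constant(x: int, constant: int) -> int:
    """Multiply x by a fixed non-negative constant using only shifts and adds.

    Each set bit of the constant contributes one shifted copy of x, which is
    what a hardware shift-and-add network would sum.
    """
    total = 0
    for bit in shift_add_terms(constant):
        total += x << bit
    return total


if __name__ == "__main__":
    # 181 is an arbitrary example constant, not a value from the paper.
    print(shift_add_terms(181))
    print(multiply_by_constant(7, 181))
```

In hardware, the number of set bits determines how many adders the constant
needs, which is why such schemes trade multiplier area for a small adder tree.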
On the Quantization of Cellular Neural Networks for Cyber-Physical Systems
Cyber-Physical Systems (CPSs) have become pervasive, including smart grids,
autonomous automobile systems, medical monitoring, process control systems,
robotics systems, and automatic pilot avionics. As they are usually implemented
on embedded devices, CPSs are typically constrained by computation capacity and
energy consumption. In some CPS applications such as telemedicine and advanced
driving assistance systems (ADAS), data processing on the embedded devices is
preferred due to security/safety and real-time requirements. Therefore, high
efficiency is highly desirable for such CPS applications. In this paper we
present CeNN quantization for highly efficient processing in CPS applications,
particularly telemedicine and ADAS applications. We systematically put forward
powers-of-two based incremental quantization of CeNNs for efficient hardware
implementation. The incremental quantization consists of iterative procedures
including parameter partition, parameter quantization, and re-training. We
propose five different strategies: a random strategy, a pruning inspired
strategy, a weighted pruning inspired strategy, a nearest neighbor strategy,
and a weighted nearest neighbor strategy. Experimental results show that our
approach can achieve a speedup of up to 7.8x with no performance loss compared
with the state-of-the-art FPGA solutions for CeNNs.
Comment: 14 pages, 10 figures
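As a rough illustration of the parameter quantization step only, the sketch
below rounds a weight to the nearest signed power of two (or zero), so that a
multiplication by the weight reduces to a bit shift. The exponent range and the
function name are assumptions made for this example; the partition and
re-training steps named in the abstract are not shown.

```python
import math


def quantize_pow2(w: float, min_exp: int = -8, max_exp: int = 0) -> float:
    """Round w to the nearest value in {0, +/-2^min_exp, ..., +/-2^max_exp}.

    min_exp and max_exp are hypothetical defaults, not values from the paper.
    """
    if w == 0.0:
        return 0.0
    magnitude = abs(w)
    # Candidate magnitudes: zero plus every power of two in the allowed range.
    candidates = [0.0] + [2.0 ** e for e in range(min_exp, max_exp + 1)]
    best = min(candidates, key=lambda c: abs(c - magnitude))
    return math.copysign(best, w)


if __name__ == "__main__":
    for weight in (0.3, -0.7, 0.001, 5.0):
        print(weight, "->", quantize_pow2(weight))
```

Values above the largest allowed power of two saturate to it, and values much
smaller than the smallest one collapse to zero.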
High Throughput 2D Spatial Image Filters on FPGAs
FPGAs are well established in the signal processing domain, where their
fine-grained programmable nature allows the inherent parallelism in these
applications to be exploited for enhanced performance. As architectures have
evolved, FPGA vendors have added more heterogeneous resources to allow
often-used functions to be implemented with higher performance, at lower power
and using less area. DSP blocks, for example, have evolved from basic
multipliers to support the multiply-accumulate operations that are the core of
many signal processing tasks. While more features were added to DSP blocks,
their structure and connectivity have been optimised primarily for
one-dimensional signal processing. Basic operations in image processing are
similar, but performed in a two-dimensional structure, and hence, many of the
optimisations in newer DSP blocks are not exploited when mapping image
processing algorithms to them. We present a detailed study of two-dimensional
spatial filter implementation on FPGAs, showing how to maximise performance
through exploitation of DSP block capabilities, while also presenting a lean
border pixel management policy.
The Effect of the Digit Slicing Architecture on the FFT Butterfly
Most communication systems strive for bandwidth, power, and cost efficiency in
their modulation scheme. For signal modulation, the orthogonal frequency
division multiplexing (OFDM) transceiver was therefore introduced to meet the
communication demands of the fourth generation. However, the high-performance
Fast Fourier Transform (FFT), the heart of OFDM, works behind the scenes. The
design and realization of an efficient internal FFT structure are the key
issues of this research work. In this paper, the implementation of a
high-performance butterfly for the FFT using the digit-slicing technique is
presented. The proposed design focuses on the trade-off between speed and
active silicon area for the chip implementation. The new architecture was
investigated and simulated in MATLAB. Verilog HDL code describing the FFT
butterfly functionality was written in the Xilinx ISE environment and
downloaded to a Virtex-II FPGA board.
Comment: arXiv admin note: substantial text overlap with arXiv:1808.02521,
arXiv:1806.0457
VLSI Computational Architectures for the Arithmetic Cosine Transform
The discrete cosine transform (DCT) is a widely-used and important signal
processing tool employed in a plethora of applications. Typical fast algorithms
for nearly-exact computation of DCT require floating point arithmetic, are
multiplier intensive, and accumulate round-off errors. The recently proposed
fast algorithm, the arithmetic cosine transform (ACT), calculates the DCT
exactly using
only additions and integer constant multiplications, with very low area
complexity, for null mean input sequences. The ACT can also be computed
non-exactly for any input sequence, with low area complexity and low power
consumption, utilizing the novel architecture described. However, as a
trade-off, the ACT algorithm requires 10 non-uniformly sampled data points to
calculate the 8-point DCT. This requirement can easily be satisfied for
applications dealing with spatial signals such as image sensors and biomedical
sensor arrays, by placing sensor elements in a non-uniform grid. In this work,
a hardware architecture for the computation of the null mean ACT is proposed,
followed by novel architectures that extend the ACT to non-null mean
signals. All circuits are physically implemented and tested using the Xilinx
XC6VLX240T FPGA device and synthesized for the 45 nm TSMC standard-cell library
for performance assessment.
Comment: 8 pages, 2 figures, 6 tables
Towards Programmable Network Dynamics: A Chemistry-Inspired Abstraction for Hardware Design
Chemical algorithms are statistical algorithms described and represented as
chemical reaction networks. They are particularly attractive for traffic
shaping and general control of network dynamics; they are analytically
tractable, they reinforce a strict state-to-dynamics relationship, they have
configurable stability properties, and they are directly implemented in
state-space using a high-level (graphical) representation.
In this paper, we present a direct implementation of chemical algorithms on
FPGA hardware. Besides substantially improving performance, we have achieved
hardware-level programmability and re-configurability of these algorithms at
runtime (not interrupting servicing) and in realtime (with sub-second latency).
This opens an interesting perspective for expanding the currently limited scope
of software defined networking and network virtualisation solutions, to include
programmable control of network dynamics.
Comment: 14 pages, non-accepted version submitted to IEEE/ACM Transactions on
Networking in May 2015 (after first submission in May 2014)
BOTDA Fiber Sensor System Based on FPGA Accelerated Support Vector Regression
Brillouin optical time domain analyzer (BOTDA) fiber sensors have shown
strong capability in static long haul distributed temperature/strain sensing.
However, in applications such as structural health monitoring and leakage
detection, real-time measurement is quite necessary. The measurement time of
temperature/strain in a BOTDA system includes data acquisition time and
post-processing time. In this work, we propose to use hardware accelerated
support vector regression (SVR) for the post-processing of the collected BOTDA
data. Ideal Lorentzian curves under different temperatures with different
linewidths are used to train the SVR model to determine the linear SVR decision
function. The performance of SVR is evaluated under different signal-to-noise
ratios (SNRs) experimentally. After the model coefficients are determined,
algorithm-specific hardware accelerators based on field programmable gate
arrays (FPGAs) are used to realize the SVR decision function. During the
implementation, hardware optimization techniques based on loop dependence
analysis and batch processing are proposed to reduce the execution latency. Our
FPGA implementations can achieve up to 42x speedup compared with a software
implementation on an i7-5960x computer. The post-processing time for 96,100
BGSs along a 38.44-km FUT is only 0.46 seconds with the ZCU104 FPGA board,
making the post-processing time no longer a limiting factor for dynamic
sensing. Moreover, the energy efficiency of our FPGA implementation can be up
to 226.1x higher than that of the CPU-based software implementation.
Comment: 8 pages
SYQ: Learning Symmetric Quantization For Efficient Deep Neural Networks
Inference for state-of-the-art deep neural networks is computationally
expensive, making them difficult to deploy on constrained hardware
environments. An efficient way to reduce this complexity is to quantize the
weight parameters and/or activations during training by approximating their
distributions with a limited entry codebook. For very low precisions, such as
binary or ternary networks with 1-8-bit activations, the information loss from
quantization leads to significant accuracy degradation due to large gradient
mismatches between the forward and backward functions. In this paper, we
introduce a quantization method to reduce this loss by learning a symmetric
codebook for particular weight subgroups. These subgroups are determined based
on their locality in the weight matrix, such that the hardware simplicity of
the low-precision representations is preserved. Empirically, we show that
symmetric quantization can substantially improve accuracy for networks with
extremely low-precision weights and activations. We also demonstrate that this
representation imposes minimal or no hardware implications relative to more
coarse-grained approaches. Source code is available at
https://www.github.com/julianfaraone/SYQ.
Comment: Published as a conference paper at the 2018 IEEE Conference on
Computer Vision and Pattern Recognition (CVPR)
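To make the notion of a symmetric codebook per weight subgroup concrete, here
is a minimal sketch that maps each row of a weight matrix to ternary codes
{-1, 0, +1} multiplied by one scale per row. It makes two loud assumptions: the
subgroup is a whole row, and the scale is set by a simple heuristic (mean
magnitude of the surviving weights) instead of being learned during training as
the abstract describes. The threshold value and function names are
hypothetical.

```python
def ternarize_rows(weights, threshold=0.05):
    """Quantize each row to codes in {-1, 0, +1} plus one scale per row.

    The threshold is a hypothetical constant; the per-row scale is a heuristic
    stand-in for a learned scaling factor.
    """
    codes, scales = [], []
    for row in weights:
        row_codes = [0 if abs(w) < threshold else (1 if w > 0 else -1) for w in row]
        kept = [abs(w) for w, c in zip(row, row_codes) if c != 0]
        scales.append(sum(kept) / len(kept) if kept else 0.0)
        codes.append(row_codes)
    return codes, scales


def dequantize(codes, scales):
    """Rebuild approximate weights: the row codebook is {-s, 0, +s}."""
    return [[c * s for c in row] for row, s in zip(codes, scales)]


if __name__ == "__main__":
    w = [[0.4, -0.2, 0.01], [-1.0, 0.02, 1.0]]
    codes, scales = ternarize_rows(w)
    print(codes, scales)
    print(dequantize(codes, scales))
```

Because every row shares a single magnitude, the stored weights stay ternary
and only one multiplication by the scale is needed per row.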
FPGA-based Acceleration of FT Convolution for Pulsar Search Using OpenCL
The Square Kilometre Array (SKA) project will be the world's largest radio
telescope array. With its large number of antennas, the number of signals that
need to be processed is enormous. One important element of the SKA's Central
Signal Processor package is pulsar search. This paper focuses on the FPGA-based
acceleration of the Frequency-Domain Acceleration Search module, which is
part of the SKA pulsar search engine. In this module, the frequency-domain
input signals have to be processed by 85 Finite Impulse Response (FIR) filters
within a short time limit and for thousands of input arrays. Because of the
large scale of the input length and FIR filter size, even high-end FPGA devices
cannot parallelise the task completely. We start by investigating both
time-domain FIR filter (TDFIR) and frequency-domain FIR filter (FDFIR) to
tackle this task. We applied the overlap-add algorithm to split the coefficient
array of TDFIR and the overlap-save algorithm to split the input signals of
FDFIR. To enable fast prototyping, we employed OpenCL, which is a
high-level FPGA development technique. The performance and power consumption
are evaluated using multiple FPGA devices simultaneously and compared with GPU
results, which are obtained by porting the FPGA-based OpenCL kernels. The
experimental evaluation shows that the FDFIR solution is very competitive in
terms of performance, with a clear energy consumption advantage over the GPU
solution.
Comment: 25 pages, 13 figures
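The overlap-save splitting of the input signal mentioned above can be sketched
as follows. To keep the example self-contained, the per-block circular
convolution is computed directly, whereas a frequency-domain FIR would compute
it with FFTs; the block length and function names are assumptions for
illustration and are not taken from the paper.

```python
def circular_convolve(a, b):
    """Circular convolution of two equal-length sequences (direct form)."""
    n = len(a)
    return [sum(a[j] * b[(i - j) % n] for j in range(n)) for i in range(n)]


def overlap_save(x, h, block_len=8):
    """Linear convolution of x with FIR taps h using overlap-save blocks.

    block_len is the circular-convolution length (hypothetical default); each
    block yields block_len - (len(h) - 1) valid output samples.
    """
    m = len(h)
    step = block_len - (m - 1)
    if step <= 0:
        raise ValueError("block_len must exceed len(h) - 1")
    h_padded = list(h) + [0] * (block_len - m)
    out_len = len(x) + m - 1
    n_blocks = -(-out_len // step)  # ceiling division
    # Prepend m-1 zeros as the initial overlap, then pad so each block is full.
    padded = [0] * (m - 1) + list(x)
    padded += [0] * (n_blocks * step + (m - 1) - len(padded))
    y = []
    for k in range(n_blocks):
        block = padded[k * step : k * step + block_len]
        circ = circular_convolve(block, h_padded)
        y.extend(circ[m - 1 :])  # the first m-1 samples are aliased; drop them
    return y[:out_len]


if __name__ == "__main__":
    print(overlap_save([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], [1, -1, 2]))
```

Consecutive blocks overlap by len(h) - 1 input samples, which is what lets the
aliased head of each circular result be discarded without losing output.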