A Scalable Correlator Architecture Based on Modular FPGA Hardware, Reuseable Gateware, and Data Packetization
A new generation of radio telescopes is achieving unprecedented levels of
sensitivity and resolution, as well as increased agility and field-of-view, by
employing high-performance digital signal processing hardware to phase and
correlate large numbers of antennas. The computational demands of these imaging
systems scale in proportion to BMN^2, where B is the signal bandwidth, M is the
number of independent beams, and N is the number of antennas. The
specifications of many new arrays lead to demands in excess of tens of PetaOps
per second.
To meet this challenge, we have developed a general purpose correlator
architecture using standard 10-Gbit Ethernet switches to pass data between
flexible hardware modules containing Field Programmable Gate Array (FPGA)
chips. These chips are programmed using open-source signal processing libraries
we have developed to be flexible, scalable, and chip-independent. This work
reduces the time and cost of implementing a wide range of signal processing
systems, with correlators foremost among them, and facilitates upgrading to new
generations of processing technology. We present several correlator
deployments, including a 16-antenna, 200-MHz bandwidth, 4-bit, full Stokes
parameter application deployed on the Precision Array for Probing the Epoch of
Reionization.
Comment: Accepted to Publications of the Astronomical Society of the Pacific. 31
pages. v2: corrected typo, v3: corrected Fig. 1
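The BMN^2 scaling in the abstract above can be made concrete with a small sketch. It assumes a hypothetical constant `k` (operations per antenna pair, per hertz, per beam) and shows why the N^2 term dominates: every antenna must be cross-multiplied with every other. The `x_engine` function is a simplified illustration of a correlator's cross-multiply stage for one frequency channel and one time sample. It is not the paper's gateware, and a real correlator also accumulates these products over time.

```python
# Sketch of correlator cost scaling and the cross-multiply (X) stage.
# The constant k is a hypothetical placeholder; only the B * M * N^2 law
# comes from the abstract.

def correlator_ops(bandwidth_hz: float, beams: int, antennas: int, k: float = 1.0) -> float:
    """Relative compute demand, proportional to B * M * N^2."""
    return k * bandwidth_hz * beams * antennas ** 2


def num_baselines(antennas: int, include_autos: bool = True) -> int:
    """Number of antenna pairs a full correlator must cross-multiply."""
    cross = antennas * (antennas - 1) // 2
    return cross + antennas if include_autos else cross


def x_engine(spectra):
    """Cross-multiply one frequency channel for one time sample.

    spectra[i] is antenna i's complex channel value; returns {(i, j): V_ij}
    for i <= j, with V_ij = s_i * conj(s_j). A real correlator would sum
    these products over many time samples.
    """
    n = len(spectra)
    return {(i, j): spectra[i] * spectra[j].conjugate()
            for i in range(n) for j in range(i, n)}
```

For example, `num_baselines(16)` gives 136 (120 cross-correlations plus 16 autocorrelations), and doubling N from 16 to 32 multiplies `correlator_ops` by four.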
Square-rich fixed point polynomial evaluation on FPGAs
Polynomial evaluation is important across a wide range of application domains, so significant work has been done on accelerating its computation. The conventional algorithm, Horner's rule, requires the fewest steps but can incur high latency because those steps are computed serially. Parallel evaluation algorithms such as Estrin's method have shorter latency than Horner's rule, but at the cost of large hardware overhead. This paper presents an efficient polynomial evaluation algorithm that restructures the evaluation process to include more squaring steps. Because a squarer can be implemented more efficiently than a general multiplier, this can yield a 57.9% latency reduction over Horner's rule and 14.6% over Estrin's method, while consuming less area than Horner's rule, when implemented on a Xilinx Virtex 6 FPGA. When applied to fixed-point function evaluation, where precision requirements limit the rounding of operands, it still achieves a 52.4% performance gain over Horner's rule with only a 4% area overhead when evaluating fifth-degree polynomials.
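For reference, the two baseline schemes named in this abstract can be sketched as follows. This is a textbook rendering of Horner's rule and Estrin's method, not the paper's square-rich algorithm; it only shows where Estrin's method introduces the repeated squarings (`power * power`) that a dedicated squarer can speed up. Listing coefficients in ascending order of degree is an assumption of this sketch.

```python
# Textbook Horner and Estrin evaluation; coeffs[i] is the coefficient of x**i.

def horner(coeffs, x):
    """Serial scheme: one dependent multiply-add per coefficient."""
    acc = 0
    for c in reversed(coeffs):
        acc = acc * x + c
    return acc


def estrin(coeffs, x):
    """Pair up coefficients, then combine pairs using x**2, x**4, ...

    Within each level the products are independent, so hardware can
    compute them in parallel; the level-to-level powers come from squaring.
    """
    terms = list(coeffs)
    power = x  # holds x**(2**level)
    while len(terms) > 1:
        if len(terms) % 2:
            terms.append(0)  # pad so every term has a partner
        terms = [terms[i] + terms[i + 1] * power for i in range(0, len(terms), 2)]
        power = power * power  # squaring step
    return terms[0] if terms else 0
```

Both functions return 321 for `[1, 2, 3, 4, 5, 6]` at `x = 2`. Horner's loop is a chain of dependent multiply-adds, whereas the products within each Estrin level are independent, which is where the shorter hardware latency comes from.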
Timing verification of dynamically reconfigurable logic for Xilinx Virtex FPGA series
This paper reports on a method for extending existing VHDL design and verification software available for the Xilinx Virtex series of FPGAs. It allows the designer to apply standard hardware design and verification tools to the design of dynamically reconfigurable logic (DRL). The technique converts a dynamic design into multiple static designs, suitable for input to standard synthesis and automatic place-and-route (APR) tools. For timing and functional verification after APR, the sections of the design can then be recombined into a single dynamic system. The technique has been automated by extending an existing DRL design tool named DCSTech, which is part of the Dynamic Circuit Switching (DCS) CAD framework. The principles behind the tools are generic and should be readily extensible to other architectures and CAD toolsets. Implementing the dynamic system requires producing partial configuration bitstreams to load sections of circuitry. The process of creating such bitstreams, the final stage of our design flow, is summarized.
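The idea of turning one dynamic design into several static ones can be illustrated with a toy data model. Everything here is hypothetical: the `DynamicDesign` class, the assumption that each swappable task in each reconfigurable region becomes its own static design, and the enumeration of co-resident task combinations are illustrative choices, not the actual behaviour of DCSTech or the DCS framework.

```python
from dataclasses import dataclass, field
from itertools import product


@dataclass
class DynamicDesign:
    """Hypothetical model: always-resident logic plus reconfigurable regions,
    each of which holds one of several mutually exclusive tasks."""
    static_logic: str
    regions: dict = field(default_factory=dict)  # region name -> list of task names


def to_static_designs(design):
    """Split into independent static designs for ordinary synthesis and APR:
    one for the resident logic and one per (region, task) pair (an assumption)."""
    designs = [("static", design.static_logic)]
    for region, tasks in design.regions.items():
        designs.extend((region, task) for task in tasks)
    return designs


def configurations(design):
    """Every combination of co-resident tasks: one possible set of cases to
    check when the sections are recombined for timing verification."""
    names = list(design.regions)
    return [dict(zip(names, choice))
            for choice in product(*(design.regions[n] for n in names))]
```

With two regions of two tasks each, `to_static_designs` yields five static designs (the resident logic plus four tasks) and `configurations` yields four combinations.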
Digital Demodulator for BFSK waveform based upon Correlator and Differentiator Systems
The present article relates in general to digital demodulation of Binary Frequency Shift Keying (BFSK) waveforms. New processing methods for demodulating BFSK signals are proposed. Because they are based on a sampler correlator, the proposed techniques require less hardware than other reported techniques. Closed-form expressions for their limits of applicability are also given. Simulation experiments are presented to validate the overall performance.
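As a point of comparison, a textbook non-coherent correlator demodulator for BFSK correlates each bit interval against in-phase and quadrature references at both tones and picks the tone with more energy. This sketch is not the paper's sampler-correlator or differentiator design; the tone frequencies, sample rate, and samples per bit used below are arbitrary assumptions, chosen so that each bit holds a whole number of cycles of both tones.

```python
import math


def bfsk_modulate(bits, f0, f1, fs, samples_per_bit):
    """Binary FSK: bit 0 -> tone f0, bit 1 -> tone f1 (phase restarts each bit)."""
    out = []
    for b in bits:
        f = f1 if b else f0
        out.extend(math.cos(2 * math.pi * f * n / fs) for n in range(samples_per_bit))
    return out


def tone_energy(chunk, f, fs):
    """Correlate with cos and sin references at f; I^2 + Q^2 ignores carrier phase."""
    i = sum(s * math.cos(2 * math.pi * f * n / fs) for n, s in enumerate(chunk))
    q = sum(s * math.sin(2 * math.pi * f * n / fs) for n, s in enumerate(chunk))
    return i * i + q * q


def bfsk_demodulate(samples, f0, f1, fs, samples_per_bit):
    """Decide each bit by comparing correlator energies at the two tones."""
    bits = []
    for k in range(0, len(samples) - samples_per_bit + 1, samples_per_bit):
        chunk = samples[k:k + samples_per_bit]
        bits.append(1 if tone_energy(chunk, f1, fs) > tone_energy(chunk, f0, fs) else 0)
    return bits
```

With `fs = 8000`, `f0 = 1000`, `f1 = 2000`, and 80 samples per bit, the two tones are orthogonal over each bit interval, so a noiseless signal demodulates without errors.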
Automated Circuit Approximation Method Driven by Data Distribution
We propose an application-tailored, data-driven, fully automated method for
functional approximation of combinational circuits. We demonstrate how an
application-level error metric such as the classification accuracy can be
translated to a component-level error metric needed for an efficient and fast
search in the space of approximate low-level components that are used in the
application. This is possible by employing a weighted mean error distance
(WMED) metric for steering the circuit approximation process which is conducted
by means of genetic programming. WMED introduces a set of weights (calculated
from the data distribution measured on a selected signal in a given
application) determining the importance of each input vector for the
approximation process. The method is evaluated using synthetic benchmarks and
application-specific approximate MAC (multiply-and-accumulate) units that are
designed to provide the best trade-offs between the classification accuracy and
power consumption of two image classifiers based on neural networks.
Comment: Accepted for publication at Design, Automation and Test in Europe
(DATE 2019), Florence, Italy
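The WMED metric can be sketched for an n-bit multiplier. The abstract does not give the exact weighting or normalization, so this sketch assumes that the weights are the empirical frequencies of one operand's values measured in `samples` (which must be non-empty), that the other operand is uniformly distributed, and that the result is divided by the maximum product 2^(2n). What it illustrates is the abstract's point: errors on input values that never occur in the application cost nothing.

```python
from collections import Counter


def wmed(exact, approx, bits, samples):
    """Weighted mean error distance of `approx` against `exact`.

    Assumptions of this sketch: operand a is weighted by its observed
    frequency in `samples`, operand b is uniform over all 2**bits values,
    and the result is normalized by 2**(2*bits).
    """
    counts = Counter(samples)
    total = sum(counts.values())
    n = 1 << bits
    err = 0.0
    for a in range(n):
        w = counts.get(a, 0) / total
        if w == 0:
            continue  # values never seen in the data do not contribute
        dist = sum(abs(exact(a, b) - approx(a, b)) for b in range(n)) / n
        err += w * dist
    return err / (1 << (2 * bits))
```

For instance, a hypothetical multiplier that is wrong only when `a == 15` scores zero if the measured data never contains 15, while an unweighted mean error distance would still penalize it.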
Reconstruction from Periodic Nonlinearities, With Applications to HDR Imaging
We consider the problem of reconstructing signals and images from periodic
nonlinearities. For such problems, we design a measurement scheme that supports
efficient reconstruction; moreover, our method can be extended to
compressive sensing-based signal and image acquisition systems. Our techniques
could potentially reduce the measurement complexity of high
dynamic range (HDR) imaging systems, with little loss in reconstruction
quality. Several numerical experiments on real data demonstrate the
effectiveness of our approach.
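The abstract does not name the nonlinearity; a common periodic example is the modulo operation, and this sketch assumes it. Recovery below uses classic one-dimensional phase unwrapping, which works only when consecutive true samples differ by less than half a period and the first true value is known. It is a simple baseline, not the authors' measurement scheme or their compressive reconstruction; `example_signal` is a hypothetical test input.

```python
import math


def modulo_measure(signal, period):
    """Assumed periodic nonlinearity: keep each value modulo `period`, in [0, period)."""
    return [s % period for s in signal]


def unwrap(wrapped, period, start):
    """Recover a slowly varying signal from its modulo samples.

    Assumes |true step| < period / 2 and that the first true value `start` is known.
    """
    out = [start]
    for prev, cur in zip(wrapped, wrapped[1:]):
        d = cur - prev
        d = (d + period / 2) % period - period / 2  # map into [-period/2, period/2)
        out.append(out[-1] + d)
    return out


def example_signal(length):
    """Hypothetical smooth input whose per-sample change never exceeds 0.25."""
    return [2 * math.sin(0.1 * n) + 0.05 * n for n in range(length)]
```

With `period = 1.0`, `unwrap(modulo_measure(x, 1.0), 1.0, x[0])` reproduces `x = example_signal(100)` to within floating-point rounding, even though every measured sample lies in [0, 1).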
Finding the "truncated" polynomial that is closest to a function
When implementing regular enough functions (e.g., elementary or special
functions) on a computing system, we frequently use polynomial approximations.
In most cases, the polynomial that best approximates (for a given distance and
in a given interval) a function has coefficients that are not exactly
representable with a finite number of bits. And yet, the polynomial
approximations that are actually implemented do have coefficients that are
represented with a finite (and sometimes small) number of bits: this is due
to the finiteness of the floating-point representations (for software
implementations), and to the need to have small, hence fast and/or inexpensive,
multipliers (for hardware implementations). We then have to consider polynomial
approximations for which the degree-$i$ coefficient has at most $m_i$
fractional bits (in other words, it is a rational number with denominator
$2^{m_i}$). We provide a general method for finding the best polynomial
approximation under this constraint. Then, we suggest refinements that can be
used to accelerate our method.
Comment: 14 pages, 1 figure
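A naive way to obtain a polynomial whose degree-i coefficient has at most m_i fractional bits is to round each real coefficient to the nearest multiple of 2^(-m_i); the abstract's point is that this is generally not the best such polynomial. The sketch below contrasts that rounding with a brute-force search over nearby representable coefficients, measuring error on a uniform grid rather than exactly. It is not the authors' method, its cost grows exponentially with the degree, and the starting coefficients (the degree-2 Taylor polynomial of exp on [0, 1]) and bit widths in the test are illustrative assumptions.

```python
from itertools import product


def sup_error(coeffs, f, a, b, points=201):
    """Approximate sup-norm error of sum(c_i * x**i) against f on a uniform grid of [a, b]."""
    worst = 0.0
    for k in range(points):
        x = a + (b - a) * k / (points - 1)
        p = sum(c * x ** i for i, c in enumerate(coeffs))
        worst = max(worst, abs(f(x) - p))
    return worst


def round_to_bits(c, m):
    """Nearest rational with denominator 2**m, i.e. at most m fractional bits."""
    return round(c * 2 ** m) / 2 ** m


def best_truncated(real_coeffs, frac_bits, f, a, b, radius=2):
    """Round naively, then try every coefficient vector within +/- radius units
    in the last place; keep the one with the smallest grid error."""
    base = [round_to_bits(c, m) for c, m in zip(real_coeffs, frac_bits)]
    best, best_err = base, sup_error(base, f, a, b)
    for deltas in product(range(-radius, radius + 1), repeat=len(base)):
        cand = [c + d / 2 ** m for c, d, m in zip(base, deltas, frac_bits)]
        err = sup_error(cand, f, a, b)
        if err < best_err:
            best, best_err = cand, err
    return best, best_err
```

Because the naively rounded polynomial is one of the candidates, the search can only match or improve on it; a dedicated method such as the one in the abstract is needed to guarantee the true optimum.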