On-Chip Implementation of Pipeline Digit-Slicing Multiplier-Less Butterfly for Fast Fourier Transform Architecture
The need for wireless communication has driven the communication systems to
high performance. However, the main bottleneck that affects the communication
capability is the Fast Fourier Transform (FFT), which is the core of most
modulators. This study presents an on-chip implementation of a pipelined
digit-slicing multiplier-less butterfly for the FFT structure. To reduce
computational complexity in the butterfly, a digit-slicing multiplier-less
single-constant technique was used in the critical path of the Radix-2
Decimation In Time (DIT) FFT structure. The proposed
design focused on the trade-off between the speed and active silicon area for
the chip implementation. The new architecture was investigated and simulated
in MATLAB. Verilog HDL code describing the FFT butterfly functionality was
written in the Xilinx ISE environment and downloaded to a Virtex-II FPGA
board; the Virtex-II FG456 Proto board was used to implement and test the
design on real hardware. The synthesis report indicates a maximum clock
frequency of 549.75 MHz with a total equivalent gate count of 31,159, a
marked improvement over the conventional Radix-2 FFT butterfly. By
comparison, the conventional butterfly architecture runs at a maximum clock
frequency of 198.987 MHz and the conventional multiplier at a maximum clock
frequency of 220.160 MHz; the proposed system exhibits better results. The
resulting maximum clock frequency increases by about 276.28% for the FFT
butterfly and about 277.06% for the multiplier. It can be concluded that the
on-chip implementation of a pipelined digit-slicing multiplier-less butterfly
for the FFT structure is an enabler for solving problems that affect
communication capability in the FFT and holds considerable potential for
future related work and research. Comment: arXiv admin note: substantial text overlap with arXiv:1806.0457
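As a rough illustration of the multiplier-less idea in the abstract above, the sketch below replaces the twiddle-factor multiplication in a radix-2 DIT butterfly with shifts and adds. It is not the paper's architecture: the fixed twiddle W = exp(-j*pi/4), the 8-bit scaling (0.7071 ≈ 181/256), and the names `times_181` and `butterfly_w8` are all assumptions chosen for the example.

```python
def times_181(x):
    """Multiply an integer by the constant 181 using shifts and adds only.

    181 = 128 + 32 + 16 + 4 + 1, so no general multiplier is needed.
    """
    return (x << 7) + (x << 5) + (x << 4) + (x << 2) + x


def butterfly_w8(x0, x1):
    """Radix-2 DIT butterfly with the fixed twiddle W = exp(-j*pi/4).

    Inputs and outputs are (real, imag) integer pairs. W is approximated
    as (181 - 181j) / 256 (an assumption of this sketch), so
    W * (a + jb) = c*(a + b) + j*c*(b - a) with c = 181/256.
    """
    a, b = x1
    t_re = times_181(a + b) >> 8   # divide by 256 with an arithmetic shift
    t_im = times_181(b - a) >> 8
    top = (x0[0] + t_re, x0[1] + t_im)       # x0 + W*x1
    bottom = (x0[0] - t_re, x0[1] - t_im)    # x0 - W*x1
    return top, bottom


print(butterfly_w8((1000, 0), (256, 0)))  # ((1181, -181), (819, 181))
```

Because the constant is fixed, the multiplication collapses to a handful of adders, which is the kind of trade between speed and silicon area the abstract describes.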
A Multi-GPU Programming Library for Real-Time Applications
We present MGPU, a C++ programming library targeted at single-node multi-GPU
systems. Such systems combine disproportionate floating point performance with
high data locality and are thus well suited to implement real-time algorithms.
We describe the library design, programming interface and implementation
details in light of this specific problem domain. The core concepts of this
work are a novel kind of container abstraction and MPI-like communication
methods for intra-system communication. We further demonstrate how MGPU is used
as a framework for porting existing GPU libraries to multi-device
architectures. Putting our library to the test, we accelerate an iterative
non-linear image reconstruction algorithm for real-time magnetic resonance
imaging using multiple GPUs. We achieve a speed-up of about 1.7 using 2 GPUs
and reach a final speed-up of 2.1 with 4 GPUs. These promising results lead us
to conclude that multi-GPU systems are a viable solution for real-time MRI
reconstruction as well as signal-processing applications in general. Comment: 15 pages, 10 figures
Efficient multiplier-less VLSI architectures for folded pipelined complex FFT core
Fast Fourier transform (FFT) has become ubiquitous in many engineering applications and is one of the most widely used blocks in communication and signal processing systems. Efficient algorithms are being designed to improve the architecture of the FFT. Higher-radix FFT algorithms have the traditional advantage of using fewer computational elements and are more suitable for calculating the FFT of long data sequences. Among the different proposed algorithms, the split-radix FFT has shown considerable improvement in reducing the hardware complexity of the architecture compared to radix-2 and radix-4 FFT algorithms. Here radix-4, radix-8, and split-radix algorithms have been used in the design of the proposed complex FFT cores. The growing popularity of virtual instrumentation (modular, customizable, software-defined instrumentation) has only become possible through the use of LabVIEW with a highly interactive process known as graphical system design. The CompactRIO programmable automation controller is an advanced embedded control and data acquisition system designed for applications that require high performance and reliability. The work explains the real-time implementation of a 256-point FFT and the computation of the power spectrum using LabVIEW and CompactRIO. New distributed arithmetic (NEDA) is one of the most used techniques for implementing multiplier-less architectures of many digital systems. In this thesis, four architectures for different FFT cores have been proposed:
• Real-time implementation of FFT using CompactRIO
• 32-Point Complex FFT Core Using Split-Radix Algorithm
• 64-Point Complex FFT Core Using Radix-4 Algorithm
• 64-Point Complex FFT Core Using Radix-8 Algorithm
The proposed designs have been implemented in both FPGA and ASIC design flows, with 180 nm process technology used for the ASIC implementation. The results show the improvements of the proposed designs compared to other existing architectures.
FPGA Implementation of pipeline Digit-Slicing Multiplier-Less Radix 2 power of 2 DIF SDF Butterfly for Fourier Transform Structure
The need for wireless communication has driven the communication systems to
high performance. However, the main bottleneck that affects the communication
capability is the Fast Fourier Transform (FFT), which is the core of most
modulators. This paper presents an FPGA implementation of a pipelined
digit-slicing multiplier-less radix-2^2 DIF (Decimation In Frequency) SDF
(single-path delay feedback) butterfly for the FFT structure. To reduce
computational complexity in the butterfly multiplier, the digit-slicing
multiplier-less technique was used in the critical path of the pipelined
Radix-2^2 DIF SDF FFT structure. The proposed design focused on the trade-off
between speed and active silicon area for the chip implementation. The
multiplier input data was sliced into four blocks of four bits each, which
are processed in parallel. The new architecture was investigated and
simulated in MATLAB. Verilog HDL code describing the FFT butterfly
functionality was written in the Xilinx ISE environment and downloaded to a
Virtex-II FPGA board; the Virtex-II FG456 Proto board was used to implement
and test the design on real hardware. The synthesis report indicates a
maximum clock frequency of 555.75 MHz with a total equivalent gate count of
32,146, a marked improvement over the conventional Radix-2^2 DIF SDF FFT
butterfly. By comparison, the conventional butterfly architecture runs at a
maximum clock frequency of 200.102 MHz and the conventional multiplier at a
maximum clock frequency of 221.140 MHz; the proposed system exhibits better
results. Comment: European Conference on Antennas and Propagation (EUCAP2011). Pp 4168-
417
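The digit-slicing step described in the abstract above (a multiplier input split into four 4-bit blocks handled in parallel) can be sketched in software as follows. This is only a behavioural model under stated assumptions: the input is treated as an unsigned 16-bit value, the per-digit products come from a small table built by repeated addition, and the function names are hypothetical rather than taken from the paper.

```python
def slice_digits(x, digit_bits=4, num_digits=4):
    """Split an unsigned integer into blocks of digit_bits bits, least significant first."""
    mask = (1 << digit_bits) - 1
    return [(x >> (i * digit_bits)) & mask for i in range(num_digits)]


def constant_table(constant, digit_bits=4):
    """Precompute constant * d for every possible digit d using additions only."""
    table = [0]
    for _ in range((1 << digit_bits) - 1):
        table.append(table[-1] + constant)
    return table


def digit_sliced_multiply(x, constant, digit_bits=4, num_digits=4):
    """Multiply x by a fixed constant without a general multiplier.

    Each digit is looked up independently (in hardware these lookups could
    run in parallel); the partial products are then aligned with shifts
    and summed.
    """
    table = constant_table(constant, digit_bits)
    partials = [table[d] for d in slice_digits(x, digit_bits, num_digits)]
    return sum(p << (i * digit_bits) for i, p in enumerate(partials))


print(slice_digits(0xBEEF))                 # [15, 14, 14, 11]
print(digit_sliced_multiply(0xBEEF, 181) == 0xBEEF * 181)  # True
```

The four table lookups do not depend on each other, which is what allows the sliced blocks to be processed at the same time.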
A Scalable Correlator Architecture Based on Modular FPGA Hardware, Reuseable Gateware, and Data Packetization
A new generation of radio telescopes is achieving unprecedented levels of
sensitivity and resolution, as well as increased agility and field-of-view, by
employing high-performance digital signal processing hardware to phase and
correlate large numbers of antennas. The computational demands of these imaging
systems scale in proportion to BMN^2, where B is the signal bandwidth, M is the
number of independent beams, and N is the number of antennas. The
specifications of many new arrays lead to demands in excess of tens of PetaOps
per second.
To meet this challenge, we have developed a general purpose correlator
architecture using standard 10-Gbit Ethernet switches to pass data between
flexible hardware modules containing Field Programmable Gate Array (FPGA)
chips. These chips are programmed using open-source signal processing libraries
we have developed to be flexible, scalable, and chip-independent. This work
reduces the time and cost of implementing a wide range of signal processing
systems, with correlators foremost among them, and facilitates upgrading to new
generations of processing technology. We present several correlator
deployments, including a 16-antenna, 200-MHz bandwidth, 4-bit, full Stokes
parameter application deployed on the Precision Array for Probing the Epoch of
Reionization. Comment: Accepted to Publications of the Astronomical Society of the Pacific. 31
pages. v2: corrected typo, v3: corrected Fig. 1
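To make the BMN^2 scaling quoted in the abstract above concrete, here is a small arithmetic sketch. It computes only a relative load, since the abstract gives a proportionality rather than an absolute operation count; the function name and the single-beam setting are assumptions, while the 16-antenna, 200-MHz figures are borrowed from the deployment the abstract mentions.

```python
def relative_correlator_load(bandwidth_hz, num_beams, num_antennas):
    """Relative computational load, proportional to B * M * N^2.

    The constant of proportionality is unknown here, so only ratios
    between two configurations are meaningful.
    """
    return bandwidth_hz * num_beams * num_antennas ** 2


base = relative_correlator_load(200e6, 1, 16)      # 16 antennas, 200 MHz, one beam (assumed)
doubled = relative_correlator_load(200e6, 1, 32)   # same bandwidth, twice the antennas
print(doubled / base)  # 4.0
```

Doubling the antenna count quadruples the load, whereas doubling the bandwidth or the number of beams only doubles it.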
Testing Implementation of FAMTAR: Adaptive Multipath Routing
Flow-Aware Multi-Topology Adaptive Routing (FAMTAR) is a new approach to
multipath and adaptive routing in IP networks which enables automatic use of
alternative paths when the primary one becomes congested. It provides more
efficient network resource utilization and higher quality of transmission
compared to standard IP routing. However, thus far it has only been evaluated
through simulations. In this paper we share our experiences from building a
real-time FAMTAR router and present results of its tests in a physical network.
The results are in line with those obtained previously through simulations and
they open the way to implementation of a production-grade FAMTAR router.
Fast versions of Shor's quantum factoring algorithm
We present fast and highly parallelized versions of Shor's algorithm. With a
sizable quantum computer it would then be possible to factor numbers with
millions of digits. The main algorithm presented here uses FFT-based fast
integer multiplication. The quick reader can just read the introduction and the
"Results" section. Comment: 37 pages, LaTeX, 1 figure
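For readers unfamiliar with FFT-based fast integer multiplication, the sketch below shows the classical (non-quantum) version of the idea: treat the decimal digits as polynomial coefficients, multiply the polynomials via an FFT, and resolve the carries. It is not the quantum circuit from the paper; the base-10 digit split, the recursive radix-2 FFT, and the function names are assumptions made for illustration, and floating-point rounding limits this toy version to moderately sized, non-negative inputs.

```python
import cmath


def fft(values, invert=False):
    """Recursive radix-2 FFT; len(values) must be a power of two."""
    n = len(values)
    if n == 1:
        return list(values)
    even = fft(values[0::2], invert)
    odd = fft(values[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def fft_multiply(a, b):
    """Multiply two non-negative integers via FFT convolution of their digits."""
    digits_a = [int(c) for c in reversed(str(a))]
    digits_b = [int(c) for c in reversed(str(b))]
    size = 1
    while size < len(digits_a) + len(digits_b):
        size *= 2
    fa = fft(digits_a + [0] * (size - len(digits_a)))
    fb = fft(digits_b + [0] * (size - len(digits_b)))
    product = fft([x * y for x, y in zip(fa, fb)], invert=True)
    # The unnormalised inverse transform must be divided by the size.
    coeffs = [round(c.real / size) for c in product]
    # Summing coeff * 10**power resolves the carries between digit positions.
    return sum(coeff * 10 ** power for power, coeff in enumerate(coeffs))


print(fft_multiply(123456789, 987654321) == 123456789 * 987654321)  # True
```

The pointwise product in the transform domain replaces the quadratic digit-by-digit convolution, which is where the speed-up over schoolbook multiplication comes from.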
CONFIGURABLE 2k/4k/8k FFT-IFFT CORE FOR DVB-T AND DVB-H
The modulation stage uses an IFFT module to convert signal data from the frequency domain to the time domain. At the demodulation stage, an FFT module converts the signal produced by the IFFT back from the time domain into the frequency domain. The FFT-IFFT modules are made to support 2k/4k/8k FFT and IFFT algorithms. The FFT-IFFT 2k/4k/8k core is built using radix-2, radix-4 and radix-8. The core is designed to accept data continuously, without a buffer (temporary data container). The design of the FFT-IFFT 2k/4k/8k module started with a functional description in a model. The hardware architecture was then designed from this functional model, and the architecture design was used to build a bit-precision model. The bit-precision model in turn served as the foundation for the RTL design. The resulting FFT-IFFT modules meet the standard specified by the DVB consortium: the maximum test frequency of the FFT-IFFT 2k/4k/8k core is 69.36 MHz on a Stratix II EP2S60-F1020C3 FPGA, which surpasses the requirement of the DVB-T/DVB-H standard (40 MHz). In addition, the module has a high throughput, with an average of 39.82 M sym /
Software-Defined Radio GNSS Instrumentation for Spoofing Mitigation: A Review and a Case Study
Recently, several global navigation satellite systems (GNSS) emerged
following the transformative technology impact of the first GNSS: US Global
Positioning System (GPS). The power level of GNSS signals as measured at the
Earth's surface is below the noise floor, so the signals are vulnerable to
interference. Spoofers are smart GNSS-like interferers, which mislead the
receivers into generating false position and time information. While many
spoofing mitigation techniques exist, spoofers are continually evolving,
producing a cycle of new spoofing attacks and counter-measures against them.
Thus, upgradability of receivers becomes an important advantage for maintaining
their immunity against spoofing. Software-defined radio (SDR) implementations
of a GPS receiver address such flexibility but are challenged by demanding
computational requirements of both GNSS signal processing and spoofing
mitigation. Therefore, this paper reviews reported SDRs in the context of
instrumentation capabilities for both conventional and spoofing mitigation
modes. This separation is necessitated by the significantly increased
computational loads when operating in the spoofing domain. This is
demonstrated by a case-study budget analysis.
Reliable Linear, Sesquilinear and Bijective Operations On Integer Data Streams Via Numerical Entanglement
A new technique is proposed for fault-tolerant linear, sesquilinear and
bijective (LSB) operations on integer data streams, such as:
scaling, additions/subtractions, inner or outer vector products, permutations
and convolutions. In the proposed method, the input integer data streams
are linearly superimposed to form numerically-entangled integer data
streams that are stored in-place of the original inputs. A series of LSB
operations can then be performed directly using these entangled data streams.
The results are extracted from the entangled output streams by additions
and arithmetic shifts. Any soft errors affecting any single disentangled output
stream are guaranteed to be detectable via a specific post-computation
reliability check. In addition, when utilizing a separate processor core for
each of the streams, the proposed approach can recover all outputs after
any single fail-stop failure. Importantly, unlike algorithm-based fault
tolerance (ABFT) methods, the number of operations required for the
entanglement, extraction and validation of the results is linearly related to
the number of the inputs and does not depend on the complexity of the performed
LSB operations. We have validated our proposal on an Intel processor (Haswell
architecture with AVX2 support) via fast Fourier transforms, circular
convolutions, and matrix multiplication operations. Our analysis and
experiments reveal that the proposed approach incurs between to
reduction in processing throughput for a wide variety of LSB operations. This
overhead is 5 to 1000 times smaller than that of the equivalent ABFT method
that uses a checksum stream. Thus, our proposal can be used in fault-generating
processor hardware or safety-critical applications, where high reliability is
required without the cost of ABFT or modular redundancy. Comment: to appear in IEEE Trans. on Signal Processing, 201