3,493 research outputs found

    On-Chip Implementation of Pipeline Digit-Slicing Multiplier-Less Butterfly for Fast Fourier Transform Architecture

    Full text link
    The need for wireless communication has driven the communication systems to high performance. However, the main bottleneck that affects the communication capability is the Fast Fourier Transform (FFT), which is the core of most modulators. This study presents an on-chip implementation of pipeline digit-slicing multiplier-less butterfly for FFT structure. The approach is taken, in order to reduce computation complexity in the butterfly, digit-slicing multiplier-less single constant technique was utilized in the critical path of Radix-2 Decimation In Time (DIT) FFT structure. The proposed design focused on the trade-off between the speed and active silicon area for the chip implementation. The new architecture was investigated and simulated with MATLAB software. The Verilog HDL code in Xilinx ISE environment was derived to describe the FFT Butterfly functionality and was downloaded to Virtex II FPGA board. Consequently, the Virtex-II FG456 Proto board was used to implement and test the design on the real hardware. As a result, from the findings, the synthesis report indicates the maximum clock frequency of 549.75 MHz with the total equivalent gate count of 31,159 is a marked and significant improvement over Radix 2 FFT butterfly. In comparison with the conventional butterfly architecture, the design that can only run at a maximum clock frequency of 198.987 MHz and the conventional multiplier can only run at a maximum clock frequency of 220.160 MHz, the proposed system exhibits better results. The resulting maximum clock frequency increases by about 276.28% for the FFT butterfly and about 277.06% for the multiplier. It can be concluded that on-chip implementation of pipeline digit-slicing multiplier-less butterfly for FFT structure is an enabler in solving problems that affect communications capability in FFT and possesses huge potentials for future related works and research areas.Comment: arXiv admin note: substantial text overlap with arXiv:1806.0457

    A Multi-GPU Programming Library for Real-Time Applications

    Full text link
    We present MGPU, a C++ programming library targeted at single-node multi-GPU systems. Such systems combine disproportionate floating point performance with high data locality and are thus well suited to implement real-time algorithms. We describe the library design, programming interface and implementation details in light of this specific problem domain. The core concepts of this work are a novel kind of container abstraction and MPI-like communication methods for intra-system communication. We further demonstrate how MGPU is used as a framework for porting existing GPU libraries to multi-device architectures. Putting our library to the test, we accelerate an iterative non-linear image reconstruction algorithm for real-time magnetic resonance imaging using multiple GPUs. We achieve a speed-up of about 1.7 using 2 GPUs and reach a final speed-up of 2.1 with 4 GPUs. These promising results lead us to conclude that multi-GPU systems are a viable solution for real-time MRI reconstruction as well as signal-processing applications in general.Comment: 15 pages, 10 figure

    Efficient multiplier-less VLSI architectures for folded pipelined complex FFT core

    Get PDF
    Fast Fourier transform (FFT) has become ubiquitous in many engineering applications. FFT is one of the most employed blocks in many communication and signal processing systems. Efficient algorithms are being designed to improve the architecture of FFT. Higher radix FFT algorithms have the traditional advantage of using less number of computational elements and are more suitable for calculating FFT of long data sequence. Among the different proposed algorithms, the split-radix FFT has shown considerable improvement in terms of reducing hardware complexity of the architecture compared to radix-2 and radix-4 FFT algorithms. Here radix-4, radix-8, and split-radix algorithms have been used in the design of different proposed complex FFT cores. The growing popularity of adopting virtual instrumentation (modular, customizable, software-defined instrumentation) has only became possible due to the use of LabVIEW with a highly interactive process known as graphical system design. The CompactRIO programmable automation controller is an advanced embedded control and data acquisition system designed for applications that require high performance and reliability. The work explains the real-time implementation of 256-point FFT and finding the power spectrum using LabVIEW and CompactRIO. New distributed arithmetic (NEDA) is one of the most used techniques in implementing multiplier-less architectures of many digital systems. In this thesis, four architectures for different FFT cores have been proposed: • Real-time implementation of FFT using CompactRIO • 32-Point Complex FFT Core Using Split-Radix Algorithm • 64-Point Complex FFT Core Using Radix-4 Algorithm • 64-Point Complex FFT Core Using Radix-8 Algorithm The proposed designs have implemented in both FPGA as well as ASIC design flows. 180nm process technology is being used for ASIC implementation. The results show the improvements of proposed designs compared to the other existing architectures

    FPGA Implementation of pipeline Digit-Slicing Multiplier-Less Radix 2 power of 2 DIF SDF Butterfly for Fourier Transform Structure

    Full text link
    The need for wireless communication has driven the communication systems to high performance. However, the main bottleneck that affects the communication capability is the Fast Fourier Transform (FFT), which is the core of most modulators. This paper presents FPGA implementation of pipeline digit-slicing multiplier-less radix 22 DIF (Decimation In Frequency) SDF (single path delay feedback) butterfly for FFT structure. The approach is taken, in order to reduce computation complexity in butterfly multiplier, the digit-slicing multiplier-less technique was utilized in the critical path of pipeline Radix-22 DIF SDF FFT structure. The proposed design focused on the trade-off between the speed and active silicon area for the chip implementation. The multiplier input data was sliced into four blocks each one with four bits to process at the same time in parallel. The new architecture was investigated and simulated with MATLAB software. The Verilog HDL code in Xilinx ISE environment was derived to describe the FFT Butterfly functionality and was downloaded to Virtex II FPGA board. Consequently, the Virtex-II FG456 Proto board was used to implement and test the design on the real hardware. As a result, from the findings, the synthesis report indicates the maximum clock frequency of 555.75 MHz with the total equivalent gate count of 32,146 is a marked and significant improvement over Radix 22 DIF SDF FFT butterfly. In comparison with the conventional butterfly architecture design which can only run at a maximum clock frequency of 200.102 MHz and the conventional multiplier can only run at a maximum clock frequency of 221.140 MHz, the proposed system exhibits better results.Comment: European Conference on Antennas and Propagation (EUCAP2011). Pp 4168- 417

    A Scalable Correlator Architecture Based on Modular FPGA Hardware, Reuseable Gateware, and Data Packetization

    Full text link
    A new generation of radio telescopes is achieving unprecedented levels of sensitivity and resolution, as well as increased agility and field-of-view, by employing high-performance digital signal processing hardware to phase and correlate large numbers of antennas. The computational demands of these imaging systems scale in proportion to BMN^2, where B is the signal bandwidth, M is the number of independent beams, and N is the number of antennas. The specifications of many new arrays lead to demands in excess of tens of PetaOps per second. To meet this challenge, we have developed a general purpose correlator architecture using standard 10-Gbit Ethernet switches to pass data between flexible hardware modules containing Field Programmable Gate Array (FPGA) chips. These chips are programmed using open-source signal processing libraries we have developed to be flexible, scalable, and chip-independent. This work reduces the time and cost of implementing a wide range of signal processing systems, with correlators foremost among them,and facilitates upgrading to new generations of processing technology. We present several correlator deployments, including a 16-antenna, 200-MHz bandwidth, 4-bit, full Stokes parameter application deployed on the Precision Array for Probing the Epoch of Reionization.Comment: Accepted to Publications of the Astronomy Society of the Pacific. 31 pages. v2: corrected typo, v3: corrected Fig. 1

    Testing Implementation of FAMTAR: Adaptive Multipath Routing

    Full text link
    Flow-Aware Multi-Topology Adaptive Routing (FAMTAR) is a new approach to multipath and adaptive routing in IP networks which enables automatic use of alternative paths when the primary one becomes congested. It provides more efficient network resource utilization and higher quality of transmission compared to standard IP routing. However, thus far it has only been evaluated through simulations. In this paper we share our experiences from building a real-time FAMTAR router and present results of its tests in a physical network. The results are in line with those obtained previously through simulations and they open the way to implementation of a production grade FAMTAR router

    Fast versions of Shor's quantum factoring algorithm

    Full text link
    We present fast and highly parallelized versions of Shor's algorithm. With a sizable quantum computer it would then be possible to factor numbers with millions of digits. The main algorithm presented here uses FFT-based fast integer multiplication. The quick reader can just read the introduction and the ``Results'' section.Comment: 37 pages, LaTeX, 1 figur

    CONFIGURABLE 2k/4k/8k FFT-IFFT CORE FOR DVB-T AND DVB-H

    Get PDF
    Modulation technique uses a modifier module IFFT signal data from frequency domain to time domain. While at the demodulation part, FFT module is used to change the return signal from the output of the IFFT and converted them from the time domain into the frequency domain. FFT�IFFT modules are made to support 2k/4k/8k FFT and IFFT algorithms. FFT�IFFT 2k/4k/8k Core are built using the radix 2, radix 4 and radix 8. Core is designed to be able to receive data continuously, without buffer (temporary data container). The FFT�IFFT 2k/4k/8k module designs started with the functional description in model. Then the design of hardware architecture is made based on functional design in model. Then the architecture design will be used in making model bit precision. Furthermore the model bit precision design is used as a foundation in designing RTL. The result of FFT�IFFT modules meet the standard specified by the DVB consortium, with a maximum test frequency of FFT�IFFT 2k/4k/8k Core is 69.36 MHz using FPGA STRATIX II EP2S60-F1020C3 that surpass the requirements in the standard DVB�T/DVB�H (40 MHz). In addition, the module has a high throughput with the average of 39.82 M sym /

    Software-Defined Radio GNSS Instrumentation for Spoofing Mitigation: A Review and a Case Study

    Full text link
    Recently, several global navigation satellite systems (GNSS) emerged following the transformative technology impact of the first GNSS: US Global Positioning System (GPS). The power level of GNSS signals as measured at the earths surface is below the noise floor and is consequently vulnerable against interference. Spoofers are smart GNSS-like interferers, which mislead the receivers into generating false position and time information. While many spoofing mitigation techniques exist, spoofers are continually evolving, producing a cycle of new spoofing attacks and counter-measures against them. Thus, upgradability of receivers becomes an important advantage for maintaining their immunity against spoofing. Software-defined radio (SDR) implementations of a GPS receiver address such flexibility but are challenged by demanding computational requirements of both GNSS signal processing and spoofing mitigation. Therefore, this paper reviews reported SDRs in the context of instrumentation capabilities for both conventional and spoofing mitigation modes. This separation is necessitated by significantly increased computational loads when in spoofing domain. This is demonstrated by a case study budget analysis

    Reliable Linear, Sesquilinear and Bijective Operations On Integer Data Streams Via Numerical Entanglement

    Get PDF
    A new technique is proposed for fault-tolerant linear, sesquilinear and bijective (LSB) operations on MM integer data streams (M3M\geq3), such as: scaling, additions/subtractions, inner or outer vector products, permutations and convolutions. In the proposed method, the MM input integer data streams are linearly superimposed to form MM numerically-entangled integer data streams that are stored in-place of the original inputs. A series of LSB operations can then be performed directly using these entangled data streams. The results are extracted from the MM entangled output streams by additions and arithmetic shifts. Any soft errors affecting any single disentangled output stream are guaranteed to be detectable via a specific post-computation reliability check. In addition, when utilizing a separate processor core for each of the MM streams, the proposed approach can recover all outputs after any single fail-stop failure. Importantly, unlike algorithm-based fault tolerance (ABFT) methods, the number of operations required for the entanglement, extraction and validation of the results is linearly related to the number of the inputs and does not depend on the complexity of the performed LSB operations. We have validated our proposal in an Intel processor (Haswell architecture with AVX2 support) via fast Fourier transforms, circular convolutions, and matrix multiplication operations. Our analysis and experiments reveal that the proposed approach incurs between 0.03%0.03\% to 7%7\% reduction in processing throughput for a wide variety of LSB operations. This overhead is 5 to 1000 times smaller than that of the equivalent ABFT method that uses a checksum stream. Thus, our proposal can be used in fault-generating processor hardware or safety-critical applications, where high reliability is required without the cost of ABFT or modular redundancy.Comment: to appear in IEEE Trans. on Signal Processing, 201
    corecore