559 research outputs found

    A Pipelined FFT Architecture for Real-Valued Signals

    Full text link

    VLSI Implementation of Reconfigurable FFT Processor Using Vedic Mathematics

    Get PDF
    Fast Fourier transform has been used in wide range of applications such as digital signal processing and wireless communications. In this we present a implementation of reconfigurable FFT processor using single path delay feedback architecture. To eliminate the use of read only memory’s (ROM’S). These are used to store the twiddle factors. To achieve the ROM-less FFT processor the proposed architecture applies the bit parallel multipliers and reconfigurable complex multipliers, thus consuming less power. The proposed architecture, Reconfigurable FFT processor based on Vedic mathematics is designed, simulated and implemented using VIRTEX-5 FPGA. Urdhva Triyakbhyam algorithm is an ancient Vedic mathematic sutra, which is used to achieve the high performance. This reconfigurable DIF-FFT is having the high speed and small area as compared with other conventional DIF-FF

    Cache Equalizer: A Cache Pressure Aware Block Placement Scheme for Large-Scale Chip Multiprocessors

    Get PDF
    This paper describes Cache Equalizer (CE), a novel distributed cache management scheme for large scale chip multiprocessors (CMPs). Our work is motivated by large asymmetry in cache sets usages. CE decouples the physical locations of cache blocks from their addresses for the sake of reducing misses caused by destructive interferences. Temporal pressure at the on-chip last-level cache, is continuously collected at a group (comprised of cache sets) granularity, and periodically recorded at the memory controller to guide the placement process. An incoming block is consequently placed at a cache group that exhibits the minimum pressure. CE provides Quality of Service (QoS) by robustly offering better performance than the baseline shared NUCA cache. Simulation results using a full-system simulator demonstrate that CE outperforms shared NUCA caches by an average of 15.5% and by as much as 28.5% for the benchmark programs we examined. Furthermore, evaluations manifested the outperformance of CE versus related CMP cache designs

    Building Conflict-Free FFT Schedules

    Get PDF
    A conflict-free schedule lets an FFT run to completion without ever having to pause for memory-conflict resolution. We show how to build such schedules for FFTs having any number of butterfly units B operating at any radix R, transforming any number of datapoints D. Our algorithm works for FFT datapaths with or without pipeline overlap, and for memory banks having any number of access ports. Specifically, it enables construction of conflict-free schedules using single-ported memory banks, which require less area than more traditional multi-ported designs

    FPGA Implementation of Fast Fourier Transform Core Using NEDA

    Get PDF
    Transforms like DFT are a major block in communication systems such as OFDM, etc. This thesis reports architecture of a DFT core using NEDA. The advantage of the proposed architecture is that the entire transform can be implemented using adder/subtractors and shifters only, thus minimising the hardware requirement compared to other architectures. The proposed design is implemented for 16-bit data path (12–bit for comparison) considering both integer representation as well as fixed point representation, thus increasing the scope of usage. The proposed design is mapped on to Xilinx XC2VP30 FPGA, which is fabricated using 130 nm process technology. The maximum on board frequency of operation of the proposed design is 122 MHz. NEDA is one of the techniques to implement many signal processing systems that require multiply and accumulate units. FFT is one of the most employed blocks in many communication and signal processing systems. The FPGA implementation of a 16 point radix-4 complex FFT is proposed. The proposed design has improvement in terms of hardware utilization compared to traditional methods. The design has been implemented on a range of FPGAs to compare the performance. The maximum frequency achieved is 114.27 MHz on XC5VLX330 FPGA and the maximum throughput, 1828.32 Mbit/s and minimum slice delay product, 9.18. The design is also implemented using synopsys DC synthesis in both 65 nm and 180 nm technology libraries. The advantages of multiplier-less architectures are reduced hardware and improved latency. The multiplier-less architectures for the implementation of radix-2^2 folded pipelined complex FFT core are based on NEDA. The number of points considered in the work is sixteen and the folding is done by a factor of four. The proposed designs are implemented on Xilinx XC5VSX240T FPGA. Proposed designs based on NEDA have reduced area over 83%. The observed slice-delay product for NEDA based designs are 2.196 and 5.735

    New FFT/IFFT Factorizations with Regular Interconnection Pattern Stage-to-Stage Subblocks

    Get PDF
    Les factoritzacions de la FFT (Fast Fourier Transform) que presenten un patró d’interconnexió regular entre factors o etapes son conegudes com algorismes paral·lels, o algorismes de Pease, ja que foren originalment proposats per Pease. En aquesta contribució s’han desenvolupat noves factoritzacions amb blocs que presenten el patró d’interconnexió regular de Pease. S’ha mostrat com aquests blocs poden ser obtinguts a una escala prèviament seleccionada. Les noves factoritzacions per ambdues FFT i IFFT (Inverse FFT) tenen dues classes de factors: uns pocs factors del tipus Cooley-Tukey i els nous factors que proporcionen la mateix patró d’interconnexió de Pease en blocs. Per a una factorització donada, els blocs comparteixen dimensions, el patró d’interconnexió etapa a etapa i a més cada un d’ells pot ser calculat independentment dels altres.FFT (Fast Fourier Transform) factorizations presenting a regular interconnection pattern between factors or stages are known as parallel algorithms, or Pease algorithms since were first proposed by Pease. In this paper, new FFT/IFFT (Inverse FFT) factorizations with blocks that exhibit regular Pease interconnection pattern are derived. It is shown these blocks can be obtained at a previously selected scale. The new factorizations for both the FFT and IFFT have two kinds of factors: a few Cooley-Tukey type factors and new factors providing the same Pease interconnection pattern property in blocks. For a given factorization, these blocks share dimensions, the interconnection pattern stage-to-stage, and all of them can be calculated independently from one another.Las factoritzaciones de la FFT (Fast Fourier Transform) que presentan un patrón de interconexiones regular entre factores o etapas son conocidas como algoritmos paralelos, o algoritmos de Pease, puesto que fueron originalmente propuestos por Pease. En esta contribución se han desarrollado nuevas factoritzaciones en subbloques que presentan el patrón de interconexión regular de Pease. Se ha mostrado como estos bloques pueden ser obtenidos a una escalera previamente seleccionada. Las nuevas factoritzaciones para ambas FFT y IFFT (Inverse FFT) tienen dos clases de factores: unos pocos factores del tipo Cooley-Tukey y los nuevos factores que proporcionan el mismo patrón de interconexión de Pease en bloques. Para una factoritzación dada, los bloques comparten dimensiones, patrón d’interconexión etapa a etapa y además cada uno de ellos puede ser calculado independientemente de los otros

    Hardware/Software Co-design for Multicore Architectures

    Get PDF
    Siirretty Doriast

    VLSI Architecture for Polar Codes Using Fast Fourier Transform-Like Design

    Get PDF
    Polar code is a novel and high-performance communication algorithm with the ability to theoretically achieving the Shannon limit, which has attracted increasing attention recently due to its low encoding and decoding complexity. Hardware optimization further reduces the cost and achieves better timing performance enabling real-time applications on resource-constrained devices. This thesis presents an area-efficient architecture for a successive cancellation (SC) polar decoder. Our design applies high-level transformations to reduce the number of Processing Elements (PEs), i.e., only log2 N pre-computed PEs are required in our architecture for an N-bit code. We also propose a customized loop-based shifting register to reduce the consumption of the delay elements further. Our experimental results demonstrate that our architecture reduces 98.90% and 93.38% in the area and area-time product, respectively, compared to prior works
    corecore