37,894 research outputs found

    SIMD pipelined processor implemented on a FPGA

    Get PDF
    The goal of this thesis was to create a processor using VHDL that could be used for educational purposes as well as a stepping stone in creating a reconfigurable system for digital signal processing or image processing applications. To do this a subset of MIPS instructions were chosen to demonstrate functionality within a five stage pipeline (instruction fetch, instruction decode, execution, memory, and write back) processor in simulation and in synthesis. A hazard controller was implemented to handle data forwarding and stalling. The basic MIPS architecture was extended by adding singlecycle multiplication functionality and single-cycle SIMD instructions. The architecture contains parameters for easy modification of SIMD units depending on the needs of the processor. The SIMD architecture was designed with distributed memory so that every memory unit received the same address. This simplifies the address logic so that the processor does not have to use a complex addressing mode. The memory can be pictured as row and columns method of access. The SIMD instructions were chosen to be able to perform binary operations to implement future morphological operations and to use the multiply and add operations for implementing MACs to perform convolution and filtering operations in future image processing applications. The board being used to verify the processor was a Xilinx University Program (XUP) board that contains Xilinx Virtex II Pro XC2VP30 FPGA, package FF896. The maximum number of units that can be instantiated in the FPGA on the XUP board is eight units which would use the entire FPGA slice area. This allows the processor to complete eight sets of 32-bit data operations per cycle when the SIMD pipeline is full. The design was shown to operate at the maximum speed of 100 MHz and utilize all the area of the FPGA. The processor was verified in both simulation and synthesis. The new soft-core 32-bit SIMD processor extends existing soft-core processors in that it provides a reconfigurable SIMD-pipeline allowing it to operate on multiple inputs concurrently, with 32-bit operands and a single-cycle throughput

    Low Power Processor Architectures and Contemporary Techniques for Power Optimization – A Review

    Get PDF
    The technological evolution has increased the number of transistors for a given die area significantly and increased the switching speed from few MHz to GHz range. Such inversely proportional decline in size and boost in performance consequently demands shrinking of supply voltage and effective power dissipation in chips with millions of transistors. This has triggered substantial amount of research in power reduction techniques into almost every aspect of the chip and particularly the processor cores contained in the chip. This paper presents an overview of techniques for achieving the power efficiency mainly at the processor core level but also visits related domains such as buses and memories. There are various processor parameters and features such as supply voltage, clock frequency, cache and pipelining which can be optimized to reduce the power consumption of the processor. This paper discusses various ways in which these parameters can be optimized. Also, emerging power efficient processor architectures are overviewed and research activities are discussed which should help reader identify how these factors in a processor contribute to power consumption. Some of these concepts have been already established whereas others are still active research areas. © 2009 ACADEMY PUBLISHER

    Data Provenance and Management in Radio Astronomy: A Stream Computing Approach

    Get PDF
    New approaches for data provenance and data management (DPDM) are required for mega science projects like the Square Kilometer Array, characterized by extremely large data volume and intense data rates, therefore demanding innovative and highly efficient computational paradigms. In this context, we explore a stream-computing approach with the emphasis on the use of accelerators. In particular, we make use of a new generation of high performance stream-based parallelization middleware known as InfoSphere Streams. Its viability for managing and ensuring interoperability and integrity of signal processing data pipelines is demonstrated in radio astronomy. IBM InfoSphere Streams embraces the stream-computing paradigm. It is a shift from conventional data mining techniques (involving analysis of existing data from databases) towards real-time analytic processing. We discuss using InfoSphere Streams for effective DPDM in radio astronomy and propose a way in which InfoSphere Streams can be utilized for large antennae arrays. We present a case-study: the InfoSphere Streams implementation of an autocorrelating spectrometer, and using this example we discuss the advantages of the stream-computing approach and the utilization of hardware accelerators

    CMOS VLSI correlator design for radio-astronomical signal processing : a thesis presented in partial fulfilment of the requirements for the degree of Doctor of Philosophy in Engineering at Massey University, Auckland, New Zealand

    Get PDF
    Multi-element radio telescopes employ methods of indirect imaging to capture the image of the sky. These methods are in contrast to direct imaging methods whereby the image is constructed from sensor measurements directly and involve extensive signal processing on antenna signals. The Square Kilometre Array, or the SKA, is a future radio telescope of this type that, once built, will become the largest telescope in the world. The unprecedented scale of the SKA requires novel solutions to be developed for its signal processing pipeline one of the most resource-consuming parts of which is the correlator. The SKA uses the FX correlator construction that consists of two parts: the F part that translates antenna signals into frequency domain and the X part that cross-correlates these signals between each other. This research focuses on the integrated circuit design and VLSI implementation issues of the X part of a very large FX correlator in 28 nm and 130 nm CMOS. The correlator’s main processing operation is the complex multiply-accumulation (CMAC) for which custom 28 nm CMAC designs are presented and evaluated. Performance of various memories inside the correlator also affects overall efficiency, and input-buffered and output-buffered approaches are considered with the goal of improving upon it. For output-buffered designs, custom memory control circuits have been designed and prototyped in 130 nm that improve upon eDRAM by taking advantage of sequential access patterns. For the input-buffered architecture, a new scheme is proposed that decreases the usage of the input-buffer memory by a third by making use of multiple accumulators in every CMAC. Because cross-correlation is a very data-intensive process, high-performance SerDes I/O is essential to any practical ASIC implementation. On the I/O design, the 28 nm full-rate transmitter delivering 15 Gbps per lane is presented. This design consists of the scrambler, the serialiser, the digital VCO with analog fine-tuning and the SST driver including features of a 4-tap FFE, impedance tuning and amplitude tuning
    • …
    corecore