259 research outputs found

    Extensible microprocessor without interlocked pipeline stages (emips), the reconfigurable microprocessor

    Get PDF
    In this thesis we propose to realize the performance benefits of applicationspecific hardware optimizations in a general-purpose, multi-user system environment using a dynamically extensible microprocessor architecture. We have called our dynamically extensible microprocessor design the Extensible Microprocessor without Interlocked Pipeline Stages, or eMIPS. The eMIPS architecture uses the interaction of fixed and configurable logic available in modern Field Programmable Gate Array (FPGA). This interaction is used to address the limitations of current microprocessor architectures based solely on Application Specific Integrated Circuits (ASIC). These limitations include inflexibility, size, and application specific performance optimization. The eMIPS system allows multiple secure extensions to load dynamically and to plug into the stages of a pipelined central processing unit (CPU) data path, thereby extending the core instruction set of the microprocessor. Extensions can also be used to realize on-chip peripherals, and if area permits, even multiple cores. Extension instructions reduce dramatically the execution time of frequently executed instruction patterns. These new functionalities we have developed can be exploited by patching the binaries of existing applications, without any changes to the compilers. A FPGA based workstation prototype and a flexible simulation system implementating this design demonstrates speedups of 2x-3x on a set of applications that include video games, real-time programs and the SPEC2000 integer benchmarks. eMIPS is the first realized workstation based entirely on a dynamically extensible microprocessor that is safe for general purpose, multi-user applications. By exposing the individual stages of the data path, eMIPS allows optimizations not previously possible. This includes permitting safe and coherent accesses to memory from within an extension, optimizing multi-branched blocks, and throwing precise and restart able exceptions from within an extension. This work describes a simplified implementation of an extensible microprocessor architecture based on the Microprocessor without Interlocked Pipeline Stages (MIPS) Reduced Instruction Set Computer (RISC) architecture. The concepts and methods contained within this thesis may be applied to other similar architectures. Given this simplified prototype we look forward to propose how this architecture will be expanded as it matures

    REAL-TIME ADAPTIVE PULSE COMPRESSION ON RECONFIGURABLE, SYSTEM-ON-CHIP (SOC) PLATFORMS

    Get PDF
    New radar applications need to perform complex algorithms and process a large quantity of data to generate useful information for the users. This situation has motivated the search for better processing solutions that include low-power high-performance processors, efficient algorithms, and high-speed interfaces. In this work, hardware implementation of adaptive pulse compression algorithms for real-time transceiver optimization is presented, and is based on a System-on-Chip architecture for reconfigurable hardware devices. This study also evaluates the performance of dedicated coprocessors as hardware accelerator units to speed up and improve the computation of computing-intensive tasks such matrix multiplication and matrix inversion, which are essential units to solve the covariance matrix. The tradeoffs between latency and hardware utilization are also presented. Moreover, the system architecture takes advantage of the embedded processor, which is interconnected with the logic resources through high-performance buses, to perform floating-point operations, control the processing blocks, and communicate with an external PC through a customized software interface. The overall system functionality is demonstrated and tested for real-time operations using a Ku-band testbed together with a low-cost channel emulator for different types of waveforms

    A Study of Multiprocessor Systems using the Picoblaze 8-bit Microcontroller Implemented on Field Programmable Gate Arrays

    Get PDF
    As Field Programmable Gate Arrays (FPGAs) are becoming more capable of implementing complex logic circuits, designers are increasingly choosing them over traditional microprocessor-based systems for implementing digital controllers and digital signal processing applications. Indeed, as FPGAs are being built using state-of-the-art deep submicron CMOS processes, the increased amount of logic and memory resources allows such FPGA-based implementations to compete in terms of speed, complexity, and power dissipation with most custom-built chips, but at a fraction of the development costs. The modern FPGA is now capable of implementing multiple instances of configurable processors that are completely specified by a high-level descriptor language. Such arrays of soft processor cores have opened up new design possibilities that include complex embedded systems applications that were previously implemented by custom multiprocessor chips. As the FPGA-based multiprocessor system is completely configurable by the user, it can be optimized for speed and power dissipation to fit a given application. The goal of this thesis is to investigate design methods for implementing an array of soft processor cores using the Xilinx FPGA-based 8-bit microcontroller known as PicoBlaze. While development tools exist for the larger 32-bit processor from Xilinx known as MicroBlaze, no such resources are currently available for the PicoBlaze microcontroller. PicoBlaze benefits in applications that requires only less data bits (less than 8 bits). For example, consider the gene sequencing or DNA sequencing in which the processing requires only 2 to 5 bits. In such an application, PicoBlaze can be a simple processor to produce the results. Also, the PicoBlaze unit offers a finer level of granularity and hence consumes fewer resources than the larger 32-bit MicroBlaze processor. Hence, the former will find applications in embedded systems requiring a complex design to be partitioned over several processors but where only an 8-bit datapath is required

    Design of a Processor Optimized for Syntax Parsing in Video Decoders

    No full text
    8International audienceHeterogeneous platforms aim to offer both performance and flexibility by providing designers processors and programmable logical units on a single platform. Processors implemented on these platforms are usually soft-cores (e.g. Altera NIOS) or ASIC (e.g. ARM Cortex-A8). However, these processors still face limitations in terms of performance compared to full hardware designs in particular for real-time video decoding applications. We present in this paper an innovative approach to improve performance using both a processor optimized for the syntax parsing (an Application-Specific Instruction-set Processor) and a FPGA. The case study has been synthesized on a Xilinx FPGA at a frequency of 100MHz and we estimate the performance that could be obtained with an ASIC

    From plasma to beefarm: Design experience of an FPGA-based multicore prototype

    Get PDF
    In this paper, we take a MIPS-based open-source uniprocessor soft core, Plasma, and extend it to obtain the Beefarm infrastructure for FPGA-based multiprocessor emulation, a popular research topic of the last few years both in the FPGA and the computer architecture communities. We discuss various design tradeoffs and we demonstrate superior scalability through experimental results compared to traditional software instruction set simulators. Based on our experience of designing and building a complete FPGA-based multiprocessor emulation system that supports run-time and compiler infrastructure and on the actual executions of our experiments running Software Transactional Memory (STM) benchmarks, we comment on the pros, cons and future trends of using hardware-based emulation for research.Peer ReviewedPostprint (author's final draft

    Accelerating FPGA-based evolution of wavelet transform filters by optimized task scheduling

    Get PDF
    Adaptive embedded systems are required in various applications. This work addresses these needs in the area of adaptive image compression in FPGA devices. A simplified version of an evolution strategy is utilized to optimize wavelet filters of a Discrete Wavelet Transform algorithm. We propose an adaptive image compression system in FPGA where optimized memory architecture, parallel processing and optimized task scheduling allow reducing the time of evolution. The proposed solution has been extensively evaluated in terms of the quality of compression as well as the processing time. The proposed architecture reduces the time of evolution by 44% compared to our previous reports while maintaining the quality of compression unchanged with respect to existing implementations. The system is able to find an optimized set of wavelet filters in less than 2 min whenever the input type of data changes

    Comparative Study of Keccak SHA-3 Implementations

    Get PDF
    This paper conducts an extensive comparative study of state-of-the-art solutions for im- plementing the SHA-3 hash function. SHA-3, a pivotal component in modern cryptography, has spawned numerous implementations across diverse platforms and technologies. This research aims to provide valuable insights into selecting and optimizing Keccak SHA-3 implementations. Our study encompasses an in-depth analysis of hardware, software, and software–hardware (hybrid) solutions. We assess the strengths, weaknesses, and performance metrics of each approach. Critical factors, including computational efficiency, scalability, and flexibility, are evaluated across differ- ent use cases. We investigate how each implementation performs in terms of speed and resource utilization. This research aims to improve the knowledge of cryptographic systems, aiding in the informed design and deployment of efficient cryptographic solutions. By providing a comprehensive overview of SHA-3 implementations, this study offers a clear understanding of the available options and equips professionals and researchers with the necessary insights to make informed decisions in their cryptographic endeavors
    • …
    corecore