    Microarchitectural Enhancements for Configurable Multi-Threaded Soft Processors

    This paper describes a number of microarchitectural techniques for supporting multithreading in soft processor cores. These include a new thread scheduler that combines interleaved and block multithreading; a table of operation latencies (TOOL) for determining instruction latencies; support of arbitrary-latency custom computational units; and a multi-banked register file for supporting simultaneous write-back operations from different threads. Our results show that four-way multithreaded processors achieve speedups of up to 26% over a single-threaded processor executing benchmarks that only use regular instructions, and up to 47% when executing benchmarks that include long-latency instructions.

    Supporting multithreading in configurable soft processor cores

    In this paper, we describe the organization and microarchitecture of MT-MB, a configurable implementation of the Xilinx MicroBlaze soft processor that supports multithreading. Using a suite of synthetic benchmarks, we evaluate five variations of MT-MB and show that multithreading is very effective in hiding the variable latencies associated with custom instructions and custom computational units. Our experimental results show that interleaved and hybrid multithreading achieve speedup factors of 1.10× to 5.13× compared to our single-threaded baseline soft processor.

    Customizing the Datapath and ISA of Soft VLIW Processors

    In this paper, we examine the trade-offs in performance and area due to customizing the datapath and instruction set architecture of a soft VLIW processor implemented in a high-density FPGA. In addition to describing our processor, we describe a number of microarchitectural optimizations we used to reduce the area of the datapath. We also describe the tools we developed to customize, generate, and program our processor. Our experimental results show that datapath and instruction set customization achieve high levels of performance, and that using on-chip resources and implementing microarchitectural optimizations like selective data forwarding help keep FPGA resource utilization in check.

    A Comparison of VLIW and Traditional DSP Architectures for Compiled Code

    Although programmable digital signal processors comprise a significant fraction of the processors sold in the world, their basic architectures have changed little since they were originally developed. The evolution and implementation of these processors have been based more on commonly held beliefs than quantitative data. In this paper, we show that by changing to a VLIW model with more registers, orthogonal instructions, and better flexibility for instruction-level parallelism, it is possible to achieve at least a factor of 1.3–2 in performance gain over the traditional DSP architectures on a suite of DSP benchmarks. When accounting for the effect of restrictive register use in traditional DSP architectures, we argue that the actual performance gain is at least a factor of 1.8–2.8. To counter an argument about extra chip area, we show that the cost of adding more registers is minimal when the overall area of the processor and the performance benefits are considered. Although a VLIW architecture has a much lower instruction density, we also show that the average number of instructions is actually reduced because there are fewer memory operations. A significant contribution to the better performance of the VLIW architecture is the ability to express more instances of parallelism than the restricted parallelism of the more traditional architectures. However, efficient techniques for encoding long instructions are required to make the higher flexibility and better performance of VLIW architectures feasible.