19 research outputs found

Application-Specific Instruction-Set Architectures for Embedded DSP Applications

Author: Mazen A. R. Saghir
Publication date: 01/01/1998

University of Toronto Research Repository

CiteSeerX

Microarchitectural Enhancements for Configurable Multi-Threaded Soft Processors

Authors: Mazen A. R. Saghir, Nabil Ghanem, Roger Moussali
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2007

This paper describes a number of microarchitectural techniques for supporting multithreading in soft processor cores. These include a new thread scheduler that combines interleaved and block multithreading; a table of operation latencies (TOOL) for determining instruction latencies; support for arbitrary-latency custom computational units; and a multi-banked register file for supporting simultaneous write-back operations from different threads. Our results show that four-way multithreaded processors achieve speedups of up to 26% over a single-threaded processor executing benchmarks that only use regular instructions, and up to 47% when executing benchmarks that include long-latency instructions.

CiteSeerX

Crossref

Supporting multithreading in configurable soft processor cores

Authors: Mazen A. R. Saghir, Nabil Ghanem, Roger Moussali
Publication date: 01/01/2007

In this paper, we describe the organization and microarchitecture of MT-MB, a configurable implementation of the Xilinx MicroBlaze soft processor that supports multithreading. Using a suite of synthetic benchmarks, we evaluate five variations of MT-MB and show that multithreading is very effective in hiding the variable latencies associated with custom instructions and custom computational units. Our experimental results show that interleaved and hybrid multithreading achieve speedup factors of 1.10× to 5.13× compared to our single-threaded baseline soft processor.

CiteSeerX

Crossref

Customizing the Datapath and ISA of Soft VLIW Processors

Authors: Mazen A. R. Saghir, Mohamad El-majzoub, Patrick Akl
Publication date: 01/01/2007

In this paper, we examine the trade-offs in performance and area due to customizing the datapath and instruction set architecture of a soft VLIW processor implemented in a high-density FPGA. In addition to describing our processor, we describe a number of microarchitectural optimizations we used to reduce the area of the datapath. We also describe the tools we developed to customize, generate, and program our processor. Our experimental results show that datapath and instruction set customization achieve high levels of performance, and that using on-chip resources and implementing microarchitectural optimizations like selective data forwarding help keep FPGA resource utilization in check.

CiteSeerX

Crossref

A Comparison of VLIW and Traditional DSP Architectures for Compiled Code

Authors: Corinna G. Lee, Mazen A. R. Saghir, Paul Chow

Although programmable digital signal processors comprise a significant fraction of the processors sold in the world, their basic architectures have changed little since they were originally developed. The evolution and implementation of these processors have been based more on commonly held beliefs than quantitative data. In this paper, we show that by changing to a VLIW model with more registers, orthogonal instructions, and better flexibility for instruction-level parallelism, it is possible to achieve at least a factor of 1.3–2 in performance gain over the traditional DSP architectures on a suite of DSP benchmarks. When accounting for the effect of restrictive register use in traditional DSP architectures, we argue that the actual performance gain is at least a factor of 1.8–2.8. To counter an argument about extra chip area, we show that the cost of adding more registers is minimal when the overall area of the processor and the performance benefits are considered. Although a VLIW architecture has a much lower instruction density, we also show that the average number of instructions is actually reduced because there are fewer memory operations. A significant contribution to the better performance of the VLIW architecture is the ability to express more instances of parallelism than the restricted parallelism of the more traditional architectures. However, efficient techniques for encoding long instructions are required to make the higher flexibility and better performance of VLIW architectures feasible.

CiteSeerX

Exploiting dual data-memory banks in digital signal processors

Authors: Corinna G. Lee, Mazen A. R. Saghir, Paul Chow
Publication venue: Association for Computing Machinery (ACM)

Crossref