2 research outputs found

    Performance Advantages of Merging Instruction- and Data-Level Parallelism

    No full text
    This paper presents a new architecture based on addding a vector pipeline to a superscalar microprocessor. The goal of this paper is to show that instruction-level parallelism (ILP) and data-level parallelism (DLP) can be merged in a single architecture to execute regular vectorizable code at a performance level that can not be achieved using only ILP techniques. We present an analysis of the two paradigms at the instruction set architecture (ISA) level that shows that the DLP model has several advantages: executes fewer instructions, fewer overall operations (by factors as large as 1.7), and generally executes fewer memory accesses. We then analyze the ILP model in terms of IPC. Our simulations show that a 4-way machine achieves IPCs in the range 1.03-1.52 and that by scaling to 16-way, only a 26% of the peak IPC is achieved. The combined ILP+DLP model, on the contrary, is shown to perform from 1.24 to 2.84 times better than the 4-way ILP machine. Moreover, when we scale up the ILP+DL..

    A Case for Merging the ILP and DLP Paradigms

    No full text
    The goal of this paper is to show that instruction level parallelism (ILP) and data-level parallelism (DLP) can be merged in a stngle architecture to ezecute vectorizable code at a performance level that can not be achieved using either paradigm on its own. We will show that the combination of the two techniques yields very high performance at a low cost and a low complexity. We will show that this architecture can reach a performance equivalent to a supers calar processor that sustained 10 instructions per cycle. We will see that the machine exploiting both types of paralleiism improves upon the IL P-only machine by factors of 1.5–1.8. We also present a study on the scalability of both paradigms and show that, when we increase resources to reach a 16-issue machine, the advantage of the IL P+DLP machine over the IL P-only machine increases up to 2.0-3.45. Whiie the peak achieved IPC for the ILP machine is 4, the ILP+DLP machine exceeds 10 instructions per cycle.
    corecore