    Performance Evaluation of Cascade ALU Architecture for Asynchronous Super-Scalar Processors

    Current out-of-order architectures have their critical path in the memory structures. Because memory access delay consists mainly of wire delay, reducing the feature size will do little to shorten this critical path. Consequently, the performance of out-of-order architectures will not improve despite the expected advances in future process technologies. To solve this problem, we present a novel architecture, called the Cascade ALU architecture, in which the critical path lies in the ALU. Since ALU latency consists mainly of gate delay, the cycle time can be reduced as the feature size shrinks. In the Cascade ALU architecture, the instruction execution latency varies depending on the instructions executed, so an asynchronous implementation is well suited to the Cascade ALU. However, the asynchronous handshake overhead may be too large for the Cascade ALU to improve processor performance. We therefore present a method for hiding the handshake overhead based on fine-grain pipelining. Finally, we present evaluation results demonstrating that the performance of the Cascade ALU architecture scales well as the ALU latency is reduced.
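The argument that variable per-instruction latency favours an asynchronous design, and that handshake overhead can erase that advantage, can be illustrated with a deliberately simple toy model. This is a sketch under stated assumptions, not the paper's evaluation method: it assumes a strictly sequential chain of operations, a clocked design whose cycle is sized for the slowest operation, and a self-timed design that pays a fixed handshake cost per operation. It does not model the fine-grain pipelining used to hide that overhead. The function names and the latency values are hypothetical.

```python
from statistics import mean
from typing import Sequence


def synchronous_time(latencies: Sequence[int]) -> int:
    """Toy clocked model: every operation occupies one cycle sized for the slowest one."""
    return max(latencies) * len(latencies)


def asynchronous_time(latencies: Sequence[int], handshake: int) -> int:
    """Toy self-timed model: each operation costs its own latency plus one handshake.

    Assumption: handshakes are fully exposed (no pipelining hides them).
    """
    return sum(lat + handshake for lat in latencies)


def break_even_handshake(latencies: Sequence[int]) -> float:
    """Handshake overhead at which both toy models take the same total time.

    Solving max * n == sum + n * h for h gives h = max - mean.
    """
    return max(latencies) - mean(latencies)


if __name__ == "__main__":
    # Hypothetical per-instruction ALU latencies, in arbitrary gate-delay units.
    ops = [10, 4, 6, 2]
    print(synchronous_time(ops))       # 40
    print(asynchronous_time(ops, 1))   # 26: small handshake, asynchronous wins
    print(asynchronous_time(ops, 5))   # 42: large handshake, asynchronous loses
    print(break_even_handshake(ops))   # 4.5
```

In this toy model the asynchronous design wins only while the per-operation handshake stays below the gap between the worst-case and the average latency, which is why a technique for hiding the handshake overhead matters.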