
    Speculative Trace Scheduling in VLIW Processors

    No full text
    VLIW processors are statically scheduled, and their performance depends on the quality of the schedules generated by the compiler's scheduler. We propose a new scheduling scheme in which the application is first divided into decision trees and then further split into traces. Traces are speculatively scheduled on the processor based on their probability of execution. We have developed a tool, "SpliTree", to generate traces automatically. Using dynamic branch prediction for scheduling traces, our scheme achieves approximately a 1.4x performance improvement over scheduling with decision trees for SPEC92 benchmarks simulated on the TriMedia™ processor.
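
    The abstract describes traces that are extracted from decision trees and scheduled speculatively according to their probability of execution. The sketch below is a minimal illustration of that selection step only, not the authors' SpliTree tool; the Trace structure, the probabilities, and pick_speculative_trace are hypothetical names made up for this example.

```python
# Minimal sketch of selecting the most probable trace for speculative scheduling.
# All names and values here are illustrative assumptions, not the SpliTree tool.
from dataclasses import dataclass

@dataclass
class Trace:
    name: str
    operations: list       # straight-line sequence of VLIW operations
    probability: float     # estimated probability that this path executes

def pick_speculative_trace(traces):
    """Return the trace most likely to execute; it is scheduled first,
    while less probable traces remain available as fall-back schedules."""
    return max(traces, key=lambda t: t.probability)

# A decision tree split into three traces, one per control-flow path.
traces = [
    Trace("t0", ["ld", "add", "st"], probability=0.70),
    Trace("t1", ["ld", "sub", "br"], probability=0.20),
    Trace("t2", ["ld", "mul", "st"], probability=0.10),
]
print(pick_speculative_trace(traces).name)  # -> t0
```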

    Multithreaded Architectural Support for Speculative Trace Scheduling in VLIW Processors

    No full text
    VLIW processors are statically scheduled, and their performance depends on the quality of the schedules generated by the compiler's scheduler. We propose multithreaded architectural support for speculative trace scheduling in VLIW processors. In this multithreaded architecture, the next most probable trace is speculatively executed, overlapping the stall cycles incurred by the processor during cache misses and page faults. Switching between traces is achieved with the help of special hardware units, viz. operation state buffers and trace buffers. We observe an 8.39% reduction in the overall misprediction penalty compared to that incurred when the stall cycles due to cache misses alone are not overlapped.
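
    The following sketch is a simplified software model of the trace-switching idea described above, assuming one stall point per miss and ignoring cycle timing; it is not the paper's hardware design, and all names (run_with_trace_switching, trace_buffer, stall_points) are illustrative assumptions.

```python
# Software model of switching to the next most probable trace when the current
# trace stalls on a cache miss, so the stall cycles are overlapped with useful work.
def run_with_trace_switching(traces, stall_points):
    """traces: list of (name, ops) pairs ordered by decreasing execution probability.
    stall_points: set of (trace_name, op_index) pairs modelling cache-miss stalls;
    a miss is assumed to be serviced by the time control returns to that trace."""
    trace_buffer = [(name, ops, 0) for name, ops in traces]  # (name, ops, resume pc)
    stalls = set(stall_points)
    issue_order = []                      # operations in the order they are issued
    current = trace_buffer.pop(0)         # start with the most probable trace
    while True:
        name, ops, pc = current
        if (name, pc) in stalls and trace_buffer:
            stalls.discard((name, pc))            # miss is serviced while another trace runs
            trace_buffer.append((name, ops, pc))  # park this trace's state for later resumption
            current = trace_buffer.pop(0)         # switch to the next most probable trace
            continue
        if pc < len(ops):
            issue_order.append((name, ops[pc]))
            current = (name, ops, pc + 1)
        elif trace_buffer:
            current = trace_buffer.pop(0)
        else:
            return issue_order

# Example: trace t0 misses at its second operation, so t1 fills the stall cycles.
order = run_with_trace_switching(
    [("t0", ["ld", "add", "st"]), ("t1", ["ld", "sub", "br"])],
    stall_points={("t0", 1)},
)
print(order)
```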