The number of pipeline stages separating dynamic instruction scheduling from instruction execution has increased considerably in recent out-of-order microprocessor implementations, forcing the scheduler to allocate functional units and other execution resources several cycles before they are actually used. Unfortunately, several proposed microarchitectural optimizations become less desirable or even impossible in such an environment, since they require instantaneous or near-instantaneous changes in execution behavior and resource usage in response to dynamic events that occur during instruction execution. Because these events are detected several cycles after scheduling decisions have already been made, such dynamic responses are infeasible. To overcome this limitation, we propose implementing these optimizations through what we call speculative decode. Speculative decode alters the mapping between user-visible instructions and the implemented core instructions based on observed runtime characteristics, and generates speculative instruction sequences in which optimizations are prescheduled in a manner compatible with realistic pipelines with multicycle scheduling latency. We present case studies on memory reference combining and silent store squashing, and demonstrate that speculative decode performs comparably to, or even better than, impractical in-core implementations that require zero-cycle scheduling latency.