As feature sizes shrink, more transistors can be integrated on a chip. The increased transistor budget can be used to improve the instruction-level parallelism (ILP) exploited by the processor. However, the transistors cannot be used to arbitrarily increase the processor width and size in the hope of exploiting more ILP. In this paper, we propose an architecture in which the superscalar datapath is tightly coupled with a reconfigurable unit (RFU). The RFU is configured to execute frequently executed traces of dynamic instructions. To address data dependencies between instructions in the superscalar datapath and those in the RFU, we propose to execute the trace on the RFU with predicted values. When the trace instructions reach the issue queue in the superscalar datapath, the predictions are validated. With this technique, a correct prediction yields a performance improvement, whereas a misprediction incurs no performance degradation. With this architecture, we observe an average instructions per cycle (IPC) improvement of about 11% across the simulated SPEC 2000 benchmarks, using a very small last-value data value predictor.
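The predict-then-validate flow described above can be illustrated with a minimal software sketch. It is not the hardware design evaluated in the paper: the table size, the direct-mapped indexing by program counter, and the names `LastValuePredictor` and `run_trace` are all assumptions made for illustration. The trace is modeled as a plain function of its live-in values.

```python
class LastValuePredictor:
    """Direct-mapped last-value predictor (size and indexing are assumptions)."""

    def __init__(self, entries=8):
        self.entries = entries
        self.table = [None] * entries  # last value seen per slot, None if unseen

    def predict(self, pc):
        # Predict that the value will repeat what was last observed.
        return self.table[pc % self.entries]

    def update(self, pc, actual):
        # Record the value actually produced.
        self.table[pc % self.entries] = actual


def run_trace(predictor, live_in_pcs, actual_values, trace_fn):
    """Model one trace execution; returns (result, used_rfu_result)."""
    predicted = [predictor.predict(pc) for pc in live_in_pcs]

    # Speculative execution on the RFU, only if every live-in has a prediction.
    speculated = None not in predicted
    rfu_result = trace_fn(*predicted) if speculated else None

    # Validation, modeled on the check made when the trace instructions
    # reach the issue queue: compare predictions with the actual values.
    correct = speculated and predicted == list(actual_values)

    # Train the predictor with the actual values either way.
    for pc, value in zip(live_in_pcs, actual_values):
        predictor.update(pc, value)

    if correct:
        return rfu_result, True  # precomputed result is reused
    # Misprediction: fall back to normal execution with the actual values.
    return trace_fn(*actual_values), False


predictor = LastValuePredictor()
add = lambda a, b: a + b
first = run_trace(predictor, [1, 2], [3, 5], add)   # cold table, no speculation
second = run_trace(predictor, [1, 2], [3, 5], add)  # values repeat, prediction holds
third = run_trace(predictor, [1, 2], [3, 6], add)   # one live-in changed
```

In this sketch a misprediction simply takes the normal execution path, which mirrors the claim that mispredictions cost no performance; the timing behavior itself is not modeled.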