We investigate the utility of augmenting a single-pipeline microprocessor
with a second copy of its execution pipeline, operating in parallel with the
existing one. The resulting dual-hardware-threaded
microprocessor has two identical, independent, single-issue in-order execution
pipelines (hardware threads) which share a common memory sub-system (consisting
of instruction and data caches together with a memory management unit). From a
design perspective, the assembly and verification of the dual-threaded
processor are simplified by the use of existing verified implementations of the
execution pipeline and the memory unit. Because the memory unit is shared by the
two hardware threads, the area overhead of adding the second hardware
thread is 25\% of the area of the existing single-threaded processor. Using an
FPGA implementation, we evaluate the performance of the dual-threaded processor
relative to the single-threaded one. On applications which can be parallelized,
we observe speedups of 1.6X to 1.88X. For applications that are not
parallelizable, the speedup is more modest. We also observe that the
performance of the dual-threaded processor degrades on applications which
generate large numbers of cache misses.
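
As a rough worked figure derived from the numbers above (not a measured
result), and assuming that the area overhead is exactly 25\% and that area is a
fair proxy for cost, the performance per unit area of the dual-threaded
processor relative to the single-threaded one on parallelizable applications
would be
\[
\frac{1.6}{1.25} = 1.28 \quad \mbox{to} \quad \frac{1.88}{1.25} \approx 1.50 .
\]
Under these assumptions, the second hardware thread more than pays for its
area on such applications.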