A Study on Improving the Area and Energy Efficiency of Register Files in Superscalar Processors, by 山田 淳二 (Yamada Junji)
Abstract of the Dissertation
 
 
 
Dissertation title:  Register Files of Superscalar Processors 
for Area and Energy Efficiency 
(A Study on Improving the Area and Energy Efficiency of Register Files 
in Superscalar Processors) 
 
           Name:  山田 淳二 (Yamada Junji) 
 
 
 
 
In the era of multicore processors, the area and energy efficiency of out-of-order 
superscalar processor cores is all the more important. This is because a multicore 
processor with more efficient cores can have a larger number of cores, and 
consequently more computational power. However, the region that includes the 
register file is one of the hot spots, and limits the computational power of the 
cores. 
The area and energy consumption of the register file are proportional to the square 
of the number of ports. Reducing the number of ports is therefore an effective way 
to shrink the register file, and a number of techniques have been proposed to do so. 
This thesis focuses mainly on two of these techniques: introducing a register cache 
and multibanking the register file. 
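The quadratic relationship can be illustrated with a minimal sketch. It assumes only the proportionality stated above; the function name and the port counts are hypothetical, chosen for illustration, and are not taken from the thesis.

```python
def relative_cost(ports: int, baseline_ports: int) -> float:
    """Area (or energy) of one array relative to a baseline array of the
    same capacity, assuming cost is proportional to ports squared."""
    if ports <= 0 or baseline_ports <= 0:
        raise ValueError("port counts must be positive")
    return (ports / baseline_ports) ** 2


if __name__ == "__main__":
    # Hypothetical port counts, for illustration only.
    for ports in (12, 6, 4, 1):
        print(f"{ports:2d} ports -> {relative_cost(ports, 12):.4f} of baseline")
```

Under this model, halving the number of ports cuts the cost of the array to a quarter. The sketch covers a single array only; it does not model the extra structures that a register cache or a multibanked organization adds.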
First, the author designed a register cache system in detail. The register cache 
is a cache for the main register file. Compared with the original register file, 
the register cache is smaller because it holds fewer entries, and the main register 
file is smaller because it has fewer ports. However, conventional register cache 
systems suffer from low IPC (instructions per cycle) due to register cache misses. 
Shioya et al. solved this problem with the Non-latency Oriented Register Cache System 
(NORCS). Researchers at NVIDIA adopted this idea for their GPUs. 
However, the original article did not show a detailed design of NORCS. It evaluated 
NORCS from the viewpoint of microarchitecture and used CACTI, a design space 
exploration tool for conventional instruction/data caches (not for register caches). 
In contrast, the author designed NORCS with FreePDK45, an open source process design 
kit for 45 nm technology, for detailed evaluation from the viewpoint of LSI design. 
The results with FreePDK45 are consistent with those of the original article. The 
author also performed SPICE simulations with RC parasitics to precisely estimate 
the latency of the register cache system. 
Second, the author proposes two architectural techniques for multibanked 
register files. Multibanking is the ultimate way to reduce the number of register 
file ports: it divides one n-port register file into n (or more) single-port banks 
while maintaining the throughput. Although multibanking achieves the minimum 
number of ports (i.e., 1), pipeline disturbance caused by bank conflicts can 
considerably degrade the IPC. To reduce the bank conflict probability of 
multibanked register files, this thesis presents two microarchitectural 
techniques: the Bank-Aware Instruction Scheduler (BAIS) and the 
Skewed Multistaged Multibanked Register File (MStage). 
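To give a feel for why bank conflicts matter, the following sketch computes the probability that at least two accesses in the same cycle target the same single-port bank. It assumes that every access picks a bank uniformly and independently; this is a simplifying assumption made here and not the analytic model of the thesis. The function name is hypothetical.

```python
from math import perm


def conflict_probability(num_banks: int, num_accesses: int) -> float:
    """Probability that at least two of `num_accesses` accesses hit the same
    bank, assuming each access picks one of `num_banks` banks uniformly and
    independently (an assumption of this sketch)."""
    if num_banks <= 0 or num_accesses < 0:
        raise ValueError("need num_banks > 0 and num_accesses >= 0")
    if num_accesses > num_banks:
        return 1.0  # pigeonhole: some bank must be hit twice
    # perm(n, k) counts the conflict-free assignments of k accesses to n banks.
    return 1.0 - perm(num_banks, num_accesses) / num_banks ** num_accesses


if __name__ == "__main__":
    # Hypothetical configurations, for illustration only.
    for banks in (8, 16, 24):
        print(banks, round(conflict_probability(banks, 6), 3))
```

Even with several times more banks than accesses, the conflict probability under this model stays far from zero, which is why a naive multibanked register file disturbs the pipeline frequently.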
BAIS schedules instructions so that no bank conflict occurs in the stages that 
read/write the register file. The idea of bank-aware scheduling itself is not new. 
Prior studies either briefly mentioned the possibility of bank-aware scheduling or 
rejected it because it could increase the latency. In contrast, the author 
presents an implementation of BAIS and shows that the latency of the logic is 
not practically increased. Although the bank-aware scheduler uses as many arbiters 
as there are banks, the arbiters do not practically prolong the latency because 
they work in parallel. 
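The idea of bank-aware selection can be sketched in software as follows. This is a sequential, oldest-first model written for illustration; it is not the implementation of BAIS, which uses one arbiter per bank working in parallel. The function name, the data layout, and the assumption that each instruction is described by the set of banks it accesses are all hypothetical.

```python
from typing import List, Sequence, Set


def select_conflict_free(ready: Sequence[Set[int]], issue_width: int) -> List[int]:
    """Pick ready instructions, oldest first, so that no bank is used twice.

    `ready[i]` is the set of banks instruction i must access; a lower index
    means an older instruction. Returns the indices of granted instructions.
    """
    busy: Set[int] = set()   # banks already claimed in this cycle
    granted: List[int] = []
    for idx, banks in enumerate(ready):
        if len(granted) == issue_width:
            break
        if busy.isdisjoint(banks):
            granted.append(idx)
            busy |= banks
        # Otherwise the instruction stays in the scheduler for a later cycle.
    return granted


if __name__ == "__main__":
    # Hypothetical ready queue: four instructions and the banks they touch.
    queue = [{0, 1}, {1, 2}, {2, 3}, {0}]
    print(select_conflict_free(queue, issue_width=4))
```

In this model, an instruction that would collide with an older one is simply held back, so the conflict is resolved at selection time and never reaches the register read/write stages.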
In contrast, MStage is a totally new microarchitecture. MStage has two stages for 
reading the banks of the multibanked register file, so an instruction that failed to 
read a bank because of a bank conflict still has a second chance to read the same 
bank in the second stage. As a result, MStage drastically reduces the pipeline 
disturbance caused by bank conflicts. This thesis also presents analytic 
solutions for the pipeline disturbance probabilities of several multibanked 
register files. 
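The benefit of a second chance can be illustrated with a toy model. The sketch below enumerates all ways of assigning reads to banks and counts the pipeline as disturbed when some bank receives more reads than it can serve. It assumes uniformly and independently chosen banks, and it treats the second read stage simply as one extra read opportunity per bank; it ignores the skewing and any interaction with instructions issued in neighboring cycles, so it is not the analytic solution of the thesis. All names and parameters are hypothetical.

```python
from collections import Counter
from itertools import product


def disturbance_probability(num_banks: int, num_reads: int, reads_per_bank: int) -> float:
    """Fraction of bank assignments in which some bank receives more than
    `reads_per_bank` reads, by exhaustive enumeration (small inputs only)."""
    total = 0
    disturbed = 0
    for assignment in product(range(num_banks), repeat=num_reads):
        total += 1
        heaviest = max(Counter(assignment).values(), default=0)
        if heaviest > reads_per_bank:
            disturbed += 1
    return disturbed / total


if __name__ == "__main__":
    # Hypothetical configuration: 4 banks, 3 reads in one cycle.
    print("one read stage :", disturbance_probability(4, 3, reads_per_bank=1))
    print("two read stages:", disturbance_probability(4, 3, reads_per_bank=2))
```

In this toy configuration, a disturbance requires three reads to land on one bank instead of two, which makes it much rarer.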
The evaluation results show that, relative to NORCS, BAIS with 24 banks achieves 
reductions of 23.6% in area and 61.8% in energy consumption, and MStage with 18 
banks achieves reductions of 40.6% and 68.9%, while maintaining relative IPCs of 
97.2% and 97.3%, respectively. In summary, NORCS, BAIS, and MStage, in that order, 
show increasingly higher efficiency in area and energy consumption. 
