Recent research suggests that DSM clusters can benefit from parallel coherence controllers. Parallel controllers require address partitioning and synchronization to avoid handling multiple coherence events for the same memory address simultaneously. This paper evaluates a spectrum of address partitioning schemes that vary in performance, hardware complexity, and cost. Dynamic partitioning minimizes load imbalance in controllers by using hardware address synchronizers to distribute the load among multiple protocol engines at runtime. Static partitioning obviates the need for hardware synchronization and assigns memory addresses to protocol engines at design time, but may lead to load imbalance among engines. We present simulation results indicating that: (i) dynamic partitioning performs best speeding up application execution on an 8 8-way cluster on average by 62% using four-engine as compared to single-engine controllers, (ii) block-interleaved static partitioning using loworder addr..
To submit an update or takedown request for this paper, please submit an Update/Correction/Removal Request.