Scalable architectures with non-uniform memory access time (NUMAs) have gained increased popularity in recent years. The increased scalability have increased the demand for scalable lock implementations, such as the queue-based locks of Mellor-Crummey and Scott (MCS lock), and of Craig, Landin and Hagersten (CLH lock). This paper demonstrates that the first-come first-served nature of queue-based locks make them less suitable for nonuniform communication architectures (NUCAs), for example NUMAs built from a few large nodes. In contrast, the simpler test-and-set locks gives a unfair advantage to neighboring processors when a lock is released, which will create a fast lock handover time as well as create more locality for the data accessed in the critical region. We also propose the new RH lock that explores the NUCA architectures by creating a controlled unfairness in combination with a much reduced traffic compared with the testand-set locks. A critical section guarded by the RH lock is shown to take less than half the time to execute compared with the same critical section guarded by any other lock. We also investigate the effectiveness of our new lock on a set of real SPLASH-2 applications. For example, execution time for Raytrace with 30 processors can be improved between 1.83 and 5.70 times by using the RH locks instead of any other tested locks
To submit an update or takedown request for this paper, please submit an Update/Correction/Removal Request.