We present multiple-GPU computing with the compute unified device
architecture (CUDA) for the Swendsen-Wang multi-cluster algorithm of the
two-dimensional (2D) q-state Potts model. Extending our algorithm for single
GPU computing [Comp. Phys. Comm. 183 (2012) 1155], we realize the GPU
computation of the Swendsen-Wang multi-cluster algorithm for multiple GPUs. We
implement our code on the large-scale open science supercomputer TSUBAME 2.0,
and test the performance and the scalability of the simulation of the 2D Potts
model. Using 256 Tesla M2050 GPUs, we obtain a performance of 37.3 spin flips
per nanosecond for the q=2 Potts model (Ising model) at the critical
temperature with the linear system size L=65536.

Comment: accepted for publication in Comp. Phys. Commun. arXiv admin note:
substantial text overlap with arXiv:1202.063