Abstract. Wait states in parallel applications can be identified by scanning event traces for characteristic patterns. In our earlier work, we have defined such patterns for MPI-2 one-sided communication, although still based on a trace-analysis scheme with limited scalability. Taking advantage of a new scalable trace-analysis approach based on a parallel replay, which was originally developed for MPI-1 point-to-point and collective communication, we show how wait states in onesided communications can be detected in a more scalable fashion. We demonstrate the scalability of our method and its usefulness for the optimization cycle with applications running on up to 8,192 cores. Keywords: MPI-2, remote memory access, performance analysis, scalability, pattern search.