MPI uses the concept of communicators to connect groups of processes. It
provides nonblocking collective operations on communicators to overlap
communication and computation. Flexible algorithms demand flexible
communicators. E.g., a process can work on different subproblems within
different process groups simultaneously, new process groups can be created, or
the members of a process group can change. Depending on the number of
communicators, the time for communicator creation can drastically increase the
running time of the algorithm. Furthermore, a new communicator synchronizes all
processes as communicator creation routines are blocking collective operations.
We present RBC, a communication library based on MPI, that creates
range-based communicators in constant time without communication. These RBC
communicators support (non)blocking point-to-point communication as well as
(non)blocking collective operations. Our experiments show that the library
reduces the time to create a new communicator by a factor of more than 400
whereas the running time of collective operations remains about the same. We
propose Janus Quicksort, a distributed sorting algorithm that avoids any load
imbalances. We improved the performance of this algorithm by a factor of 15 for
moderate inputs by using RBC communicators. Finally, we discuss different
approaches to bring nonblocking (local) communicator creation of lightweight
(range-based) communicators into MPI