We discuss the performance characteristics of using the modification of the
tree code suggested by Barnes \citep{1990JCoPh..87..161B} in the context of the
TreePM code. The optimisation involves identifying groups of particles and
using a single tree walk to compute the force on all the particles in a group.
This modification has been in use in our implementation of the TreePM code for
some time, and has also been used by others in codes that make use of tree
structures. In this paper, we present the first detailed study of the
performance characteristics of this optimisation. We show that the
modification, if tuned properly, can speed up the TreePM code by a significant
amount. We also combine this modification with the use of individual time steps
and indicate how to combine these two schemes in an optimal fashion. We find
that the combination is at least a factor of two faster than the modified
TreePM without individual time steps. The overall speed-up is often larger
still, as the grouping scheme makes better use of the cache for large
simulations.

Comment: 16 pages, 5 figures; accepted for publication in Research in
Astronomy and Astrophysics (RAA)
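
The idea of sharing one tree walk among all particles of a group can be illustrated with a minimal sketch. It is not the TreePM implementation: it assumes a plain binary tree split at the median, monopole-only cells, Plummer softening, groups taken to be the leaf cells, and an opening criterion in which the group radius is subtracted from the distance to the cell. All names (\texttt{build}, \texttt{interaction\_list}, \texttt{grouped\_accelerations}, \texttt{theta}, \texttt{eps}) are hypothetical and chosen for illustration only.

```python
import math


class Node:
    """Tree cell: total mass, centre of mass, and radius about the centre of mass."""

    def __init__(self, idx, pos, mass):
        self.idx = idx
        self.mass = sum(mass[i] for i in idx)
        self.com = [sum(mass[i] * pos[i][k] for i in idx) / self.mass
                    for k in range(3)]
        self.size = max(math.dist(pos[i], self.com) for i in idx)
        self.children = []


def build(idx, pos, mass, leaf_size=8):
    """Recursive median split; leaves hold at most leaf_size particles (assumed scheme)."""
    node = Node(idx, pos, mass)
    if len(idx) > leaf_size:
        def extent(k):
            return max(pos[i][k] for i in idx) - min(pos[i][k] for i in idx)
        axis = max(range(3), key=extent)
        order = sorted(idx, key=lambda i: pos[i][axis])
        half = len(order) // 2
        node.children = [build(order[:half], pos, mass, leaf_size),
                         build(order[half:], pos, mass, leaf_size)]
    return node


def leaves(node):
    """Yield the leaf cells, which play the role of groups in this sketch."""
    if not node.children:
        yield node
    else:
        for child in node.children:
            yield from leaves(child)


def interaction_list(node, group, theta, cells, particles):
    """One walk for the whole group: accept a cell only if it is far from
    every point of the group (distance reduced by the group radius)."""
    d = math.dist(node.com, group.com)
    if node.size < theta * (d - group.size):
        cells.append(node)
    elif node.children:
        for child in node.children:
            interaction_list(child, group, theta, cells, particles)
    else:
        particles.extend(node.idx)


def pair_accel(target, source, m, eps):
    """Softened acceleration at target due to a point mass m at source (G = 1)."""
    dx = [s - t for s, t in zip(source, target)]
    r2 = sum(c * c for c in dx) + eps * eps
    f = m / (r2 * math.sqrt(r2))
    return [f * c for c in dx]


def grouped_accelerations(root, pos, mass, theta, eps=0.01):
    """Build one interaction list per group and reuse it for each member."""
    acc = [[0.0, 0.0, 0.0] for _ in pos]
    for group in leaves(root):
        cells, particles = [], []
        interaction_list(root, group, theta, cells, particles)
        for i in group.idx:
            for c in cells:
                a = pair_accel(pos[i], c.com, c.mass, eps)
                for k in range(3):
                    acc[i][k] += a[k]
            for j in particles:
                if j != i:
                    a = pair_accel(pos[i], pos[j], mass[j], eps)
                    for k in range(3):
                        acc[i][k] += a[k]
    return acc
```

In this sketch the number of tree walks equals the number of groups rather than the number of particles, at the price of a longer interaction list per particle, since the opening criterion must hold for the whole group. The leaf size therefore acts as the tuning parameter: larger groups mean fewer walks but more pairwise terms.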