(Abridged) We have developed a numerical software library for collisionless
N-body simulations, named "Phantom-GRAPE", which substantially accelerates force
calculations among particles by using AVX, a new SIMD instruction set extension
to the x86 architecture and an enhanced version of SSE. Our library quickly
computes not only Newtonian forces but also central forces of arbitrary shape
f(r) with a finite cutoff radius r_cut (i.e. f(r)=0 at r>r_cut). Using an Intel
Core i7-2600 processor, we measure the performance of our library for both
types of force. In the case of Newtonian forces, we achieve 2 x
10^9 interactions per second with 1 processor core, which is 20 times higher
than the performance of an implementation without any explicit use of SIMD
instructions and 2 times higher than that of an implementation with the SSE
instructions. With 4 processor cores, we obtain a performance of 8 x 10^9
interactions per second. In the
case of the arbitrarily shaped forces, we can calculate 1 x 10^9 and 4 x 10^9
interactions per second with 1 and 4 processor cores, respectively. The
performance with 1 processor core is 6 times and 2 times higher than those of
the implementations without any use of SIMD instructions and with the SSE
instructions, respectively. These performances depend only weakly on the number
of particles, in contrast to force calculations accelerated by GPUs, whose
performance depends strongly on the number of particles. Such weak dependence
of the performance on the number of particles is well suited to
collisionless N-body simulations, since these simulations are usually performed
with sophisticated N-body solvers such as the Tree and TreePM methods combined
with an individual timestep scheme. Collisionless N-body simulations
accelerated with our library have a significant advantage over those accelerated
by GPUs, especially in massively parallel environments.

Comment: 19 pages, 11 figures, 4 tables, accepted for publication in New
Astronom
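
To make the two kinds of force calculation described in the abstract concrete, the following is a minimal scalar sketch of direct summation, not the Phantom-GRAPE AVX kernel or its API. The function names accel_newton and accel_cutoff, the softening parameter eps, the choice G = 1, and the convention that f(r) is the magnitude of an attractive force per unit mass product are all assumptions made for illustration only.

```python
import math


def accel_newton(pos, mass, eps=0.0):
    """Direct-summation Newtonian accelerations with G = 1 (assumed units).

    eps is a Plummer softening length (an illustrative assumption).
    """
    n = len(pos)
    acc = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # skip self-interaction
            d = [pos[j][k] - pos[i][k] for k in range(3)]
            r2 = d[0] * d[0] + d[1] * d[1] + d[2] * d[2] + eps * eps
            inv_r3 = 1.0 / (r2 * math.sqrt(r2))
            for k in range(3):
                acc[i][k] += mass[j] * inv_r3 * d[k]
    return acc


def accel_cutoff(pos, mass, f, r_cut):
    """Accelerations from a central force of arbitrary shape f(r).

    The force is taken to vanish beyond r_cut, i.e. f(r) = 0 at r > r_cut.
    """
    n = len(pos)
    acc = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # skip self-interaction
            d = [pos[j][k] - pos[i][k] for k in range(3)]
            r = math.sqrt(d[0] * d[0] + d[1] * d[1] + d[2] * d[2])
            if r == 0.0 or r > r_cut:
                continue  # outside the cutoff radius (or coincident points)
            s = mass[j] * f(r) / r  # force magnitude times unit-vector factor
            for k in range(3):
                acc[i][k] += s * d[k]
    return acc


if __name__ == "__main__":
    pos = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]]
    mass = [1.0, 1.0]
    print(accel_newton(pos, mass))
    print(accel_cutoff(pos, mass, lambda r: 1.0 / (r * r), r_cut=0.5))
```

Each (i, j) pair visited in the inner loop counts as one interaction, which is the unit in which the performance figures above are quoted. A SIMD implementation would evaluate several such pairs per instruction instead of one at a time.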