The performance capability of computing platforms has grown enormously over the last decade, and parallelism has become an inevitable trend in future computing hardware and software design. Motivated by the practical computational performance demands of power systems, especially distribution systems, and by the advances in modern computing platforms, we developed a high-performance parallel distribution power flow solver for Monte Carlo style applications. From a computer architecture and programming point of view, we show that by applying various performance tuning techniques and parallelization, our distribution power flow solver is able to achieve 50% of a CPU's theoretical peak performance. That is a 50x speedup compared to an already fully compiler-optimized C++ implementation.