We propose the design and an implementation of a bulk-parallel external
memory priority queue to take advantage of both shared-memory parallelism and
high external memory transfer speeds to parallel disks. To achieve higher
performance by decoupling item insertions and extractions, we offer two
parallelization interfaces: one using "bulk" sequences, the other by defining
"limit" items. In the design, we discuss how to parallelize insertions using
multiple heaps, and how to calculate a dynamic prediction sequence to prefetch
blocks and apply parallel multiway merge for extraction. Our experimental
results show that in the selected benchmarks the priority queue reaches 75% of
the full parallel I/O bandwidth of rotational disks and and 65% of SSDs, or the
speed of sorting in external memory when bounded by computation.Comment: extended version of SEA'15 conference pape