In this paper we introduce RADULS2, the fastest parallel sorter based on
radix algorithm. It is optimized to process huge amounts of data making use of
modern multicore CPUs. The main novelties include: extremely optimized
algorithm for handling tiny arrays (up to about a hundred of records) that
could appear even billions times as subproblems to handle and improved
processing of larger subarrays with better use of non-temporal memory stores