Manycore processors are consolidating their position in the HPC community as a
way of improving performance while maintaining power efficiency. Knights Landing
is the recently released second generation of the Intel Xeon Phi architecture.
While the optimization of applications on CPUs, GPUs and first-generation Xeon
Phi coprocessors has been widely studied in recent years, the new features of
Knights Landing processors call for revisiting programming and optimization
techniques for these devices. In this work, we select the Floyd-Warshall
algorithm as a representative case study of graph and memory-bound
applications. Starting from the default serial version, we show how data-,
thread- and compiler-level optimizations help the parallel implementation reach
338 GFLOPS.

Comment: Computer Science - CACIC 2017. Springer Communications in Computer
and Information Science, vol 79