    Tuning block-parallel all-pairs shortest path algorithm for efficient multi-core implementation

    Finding shortest paths in a weighted graph is one of the key problems in computer science and has numerous practical applications across multiple domains. This paper analyzes the parallel blocked all-pairs shortest path algorithm with the aim of evaluating how the multi-core system and its hierarchical cache memory influence the parameters of the algorithm's implementation, depending on the size of the graph and the size of the distance matrix's block. It proposes a technique for tuning the block size to a given multi-core system. The technique uses profiling tools in the tuning process and increases the throughput of the parallel algorithm. Computational experiments on a rack server equipped with two Intel Xeon E5-2620 v4 processors, each with 8 cores and 16 hardware threads, showed for various graph sizes that the behavior and parameters of the hierarchical cache memory do not depend on the graph size and are determined only by the block size of the distance matrix. To tune the algorithm to a target multi-core system, the preferred block size can therefore be found once, for a graph whose in-memory distance matrix is larger than the cache shared among all of the processor's cores. This block size can then be reused on larger graphs to solve the all-pairs shortest path problem efficiently.
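
The abstract does not show the algorithm being tuned. Assuming it is the standard three-phase blocked Floyd-Warshall scheme, a minimal serial sketch in C looks like the following. The row-major n x n matrix of doubles, the requirement that n be a multiple of the block size B, the `INF` value of 1e30 and the function names are all assumptions of this sketch, not the authors' implementation; B is the parameter whose preferred value the paper's tuning technique searches for.

```c
#include <assert.h>

#define INF 1e30 /* hypothetical "no edge" distance; INF + INF stays finite */

/* Relax every d[i][j] in the B x B block starting at (i0, j0) through the
   intermediate vertices k0 .. k0+B-1, in plain Floyd-Warshall order. */
static void fw_block(double *d, int n, int B, int i0, int j0, int k0) {
    for (int k = k0; k < k0 + B; k++)
        for (int i = i0; i < i0 + B; i++) {
            double dik = d[i * n + k];
            for (int j = j0; j < j0 + B; j++) {
                double via = dik + d[k * n + j];
                if (via < d[i * n + j])
                    d[i * n + j] = via;
            }
        }
}

/* Three-phase blocked Floyd-Warshall on a row-major n x n matrix, in place. */
void fw_blocked(double *d, int n, int B) {
    assert(B > 0 && n % B == 0); /* simplifying assumption of this sketch */
    int nb = n / B;
    for (int kb = 0; kb < nb; kb++) {
        int k0 = kb * B;
        /* Phase 1: the diagonal block depends only on itself. */
        fw_block(d, n, B, k0, k0, k0);
        /* Phase 2: blocks in block-row kb and block-column kb need only
           the finished diagonal block and themselves. */
        for (int b = 0; b < nb; b++) {
            if (b == kb) continue;
            fw_block(d, n, B, k0, b * B, k0);
            fw_block(d, n, B, b * B, k0, k0);
        }
        /* Phase 3: every other block reads only finished row/column blocks,
           so these calls are independent within a round; this is the loop
           a multi-core version would split across threads. */
        for (int ib = 0; ib < nb; ib++) {
            if (ib == kb) continue;
            for (int jb = 0; jb < nb; jb++) {
                if (jb == kb) continue;
                fw_block(d, n, B, ib * B, jb * B, k0);
            }
        }
    }
}
```

Each call to `fw_block` touches three B x B tiles, so B sets the working set per call; that is why the preferred B can depend on the cache hierarchy rather than on n, as the abstract reports.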

    Blocked All-Pairs Shortest Paths Algorithm on Intel Xeon Phi KNL Processor: A Case Study

    Manycore processors are consolidating their place in the HPC community as a way of improving performance while maintaining power efficiency. Knights Landing is the recently released second generation of the Intel Xeon Phi architecture. While optimizing applications on CPUs, GPUs and first-generation Xeon Phi processors has been studied extensively in recent years, the new features of Knights Landing processors require revisiting programming and optimization techniques for these devices. In this work, we selected the Floyd-Warshall algorithm as a representative case study of graph and memory-bound applications. Starting from the default serial version, we show how data-, thread- and compiler-level optimizations help the parallel implementation reach 338 GFLOPS.
    Comment: Computer Science - CACIC 2017. Springer Communications in Computer and Information Science, vol 79
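
For reference, a default serial Floyd-Warshall of the kind such optimizations start from can be sketched in C as below. The `float` element type, the use of `INFINITY` for missing edges, the function names, and the count of 2 operations per relaxation used to convert a runtime into GFLOPS are assumptions of this sketch, not details taken from the paper.

```c
#include <assert.h>
#include <math.h>

/* Serial Floyd-Warshall on a row-major n x n matrix; INFINITY marks "no edge". */
void fw_serial(float *d, int n) {
    for (int k = 0; k < n; k++)
        for (int i = 0; i < n; i++) {
            float dik = d[i * n + k];
            if (dik == INFINITY)
                continue; /* no known path i -> k, so row i cannot improve */
            /* k-i-j order keeps this innermost loop unit-stride, the access
               pattern compilers can most readily vectorize. */
            for (int j = 0; j < n; j++) {
                float via = dik + d[k * n + j];
                if (via < d[i * n + j])
                    d[i * n + j] = via;
            }
        }
}

/* GFLOPS under an assumed count of 2 operations (add + compare) per update. */
double fw_gflops(int n, double seconds) {
    assert(seconds > 0.0);
    return 2.0 * (double)n * n * n / seconds / 1e9;
}
```

Because the algorithm performs n^3 relaxations regardless of graph structure, a throughput figure like the one in the abstract depends on the operation count assumed per relaxation.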

    Routing on the Visibility Graph

    We consider the problem of routing on a network in the presence of line segment constraints (i.e., obstacles that edges in our network are not allowed to cross). Let $P$ be a set of $n$ points in the plane and let $S$ be a set of non-crossing line segments whose endpoints are in $P$. We present two deterministic 1-local $O(1)$-memory routing algorithms that are guaranteed to find a path of at most linear size between any pair of vertices of the visibility graph of $P$ with respect to a set of constraints $S$ (i.e., the algorithms never look beyond the direct neighbours of the current location and store only a constant amount of additional information). Unlike all existing deterministic local routing algorithms, our routing algorithms do not route on a plane subgraph of the visibility graph. Additionally, we provide lower bounds on the routing ratio of any deterministic local routing algorithm on the visibility graph.
    Comment: An extended abstract of this paper appeared in the proceedings of the 28th International Symposium on Algorithms and Computation (ISAAC 2017). Final version appeared in the Journal of Computational Geometr
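
The abstract uses the visibility graph of $P$ with respect to $S$ without stating its edge rule. Under the usual definition, assumed here, two points are adjacent when the segment between them does not properly cross any constraint, so touching a constraint at an endpoint, or being a constraint, is allowed. The C sketch below implements only that adjacency test, assuming points in general position (no three collinear); the `Point` and `Segment` types and the function names are hypothetical, and this is not either of the paper's routing algorithms.

```c
#include <assert.h>
#include <stdbool.h>

typedef struct { double x, y; } Point;
typedef struct { Point a, b; } Segment;

/* Cross product (b - a) x (c - a): > 0 left turn, < 0 right turn, 0 collinear. */
static double orient(Point a, Point b, Point c) {
    return (b.x - a.x) * (c.y - a.y) - (b.y - a.y) * (c.x - a.x);
}

/* True iff segments pq and s cross at a point interior to both
   (shared endpoints and collinear touching do not count). */
static bool properly_cross(Point p, Point q, Segment s) {
    return orient(p, q, s.a) * orient(p, q, s.b) < 0 &&
           orient(s.a, s.b, p) * orient(s.a, s.b, q) < 0;
}

/* p and q are adjacent in the visibility graph iff pq properly crosses
   none of the m constraint segments. */
bool visible(Point p, Point q, const Segment *cons, int m) {
    assert(m >= 0);
    for (int i = 0; i < m; i++)
        if (properly_cross(p, q, cons[i]))
            return false;
    return true;
}
```

A 1-local router only ever needs this test for pairs that include its current vertex, which matches the abstract's restriction to direct neighbours; a production version would also need exact or robust arithmetic for `orient`.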