56,101 research outputs found

    Netloc: a Tool for Topology-Aware Process Mapping

    Get PDF
    International audienceInterconnection networks in parallel platforms can be made of thousands of nodes and hundreds of switches. The communication cost between tasks of a parallel application varies significantly with their actual location in such platforms. Topology-aware process mapping consists in matching the application communication pattern with the network topology to improve the communication cost by placing related tasks close on the hardware. We show that our Netloc tool for gathering network topology in a generic way can be combined with the state-of-the-art Scotch partitioner for computing topology-aware MPI process placement. Our experiments with a stencil application on a fat-tree machine show that we are able to significantly improve the runtime in the vast majority of cases

    Mapping applications onto FPGA-centric clusters

    Full text link
    High Performance Computing (HPC) is becoming increasingly important throughout science and engineering as ever more complex problems must be solved through computational simulations. In these large computational applications, the latency of communication between processing nodes is often the key factor that limits performance. An emerging alternative computer architecture that addresses the latency problem is the FPGA-centric cluster (FCC); in these systems, the devices (FPGAs) are directly interconnected and thus many layers of hardware and software are avoided. The result can be scalability not currently achievable with other technologies. In FCCs, FPGAs serve multiple functions: accelerator, network interface card (NIC), and router. Moreover, because FPGAs are configurable, there is substantial opportunity to tailor the router hardware to the application; previous work has demonstrated that such application-aware configuration can effect a substantial improvement in hardware efficiency. One constraint of FCCs is that it is convenient for their interconnect to be static, direct, and have a two or three dimensional mesh topology. Thus, applications that are naturally of a different dimensionality (have a different logical topology) from that of the FCC must be remapped to obtain optimal performance. In this thesis we study various aspects of the mapping problem for FCCs. There are two major research thrusts. The first is finding the optimal mapping of logical to physical topology. This problem has received substantial attention by both the theory community, where topology mapping is referred to as graph embedding, and by the High Performance Computing (HPC) community, where it is a question of process placement. We explore the implications of the different mapping strategies on communication behavior in FCCs, especially on resulting load imbalance. The second major research thrust is built around the hypothesis that applications that need to be remapped (due to differing logical and physical topologies) will have different optimal router configurations from those applications that do not. For example, due to remapping, some virtual or physical communication links may have little occupancy; therefore fewer resources should be allocated to them. Critical here is the creation of a new set of parameterized hardware features that can be configured to best handle load imbalances caused by remapping. These two thrusts form a codesign loop: certain mapping algorithms may be differentially optimal due to application-aware router reconfiguration that accounts for this mapping. This thesis has four parts. The first part introduces the background and previous work related to communication in general and, in particular, how it is implemented in FCCs. We build on previous work on application-aware router configuration. The second part introduces topology mapping mechanisms including those derived from graph embeddings and a greedy algorithm commonly used in HPC. In the third part, topology mappings are evaluated for performance and imbalance; we note that different mapping strategies lead to different imbalances both in the overall network and in each node. The final part introduces reconfigure router design that allocates resources based on different imbalance situations caused by different mapping behaviors

    Topology-aware GPU scheduling for learning workloads in cloud environments

    Get PDF
    Recent advances in hardware, such as systems with multiple GPUs and their availability in the cloud, are enabling deep learning in various domains including health care, autonomous vehicles, and Internet of Things. Multi-GPU systems exhibit complex connectivity among GPUs and between GPUs and CPUs. Workload schedulers must consider hardware topology and workload communication requirements in order to allocate CPU and GPU resources for optimal execution time and improved utilization in shared cloud environments. This paper presents a new topology-aware workload placement strategy to schedule deep learning jobs on multi-GPU systems. The placement strategy is evaluated with a prototype on a Power8 machine with Tesla P100 cards, showing speedups of up to ≈1.30x compared to state-of-the-art strategies; the proposed algorithm achieves this result by allocating GPUs that satisfy workload requirements while preventing interference. Additionally, a large-scale simulation shows that the proposed strategy provides higher resource utilization and performance in cloud systems.This project is supported by the IBM/BSC Technology Center for Supercomputing collaboration agreement. It has also received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement No 639595). It is also partially supported by the Ministry of Economy of Spain under contract TIN2015-65316-P and Generalitat de Catalunya under contract 2014SGR1051, by the ICREA Academia program, and by the BSC-CNS Severo Ochoa program (SEV-2015-0493). We thank our IBM Research colleagues Alaa Youssef and Asser Tantawi for the valuable discussions. We also thank SC17 committee member Blair Bethwaite of Monash University for his constructive feedback on the earlier drafts of this paper.Peer ReviewedPostprint (published version

    Memetic Multi-Objective Particle Swarm Optimization-Based Energy-Aware Virtual Network Embedding

    Full text link
    In cloud infrastructure, accommodating multiple virtual networks on a single physical network reduces power consumed by physical resources and minimizes cost of operating cloud data centers. However, mapping multiple virtual network resources to physical network components, called virtual network embedding (VNE), is known to be NP-hard. With considering energy efficiency, the problem becomes more complicated. In this paper, we model energy-aware virtual network embedding, devise metrics for evaluating performance of energy aware virtual network-embedding algorithms, and propose an energy aware virtual network-embedding algorithm based on multi-objective particle swarm optimization augmented with local search to speed up convergence of the proposed algorithm and improve solutions quality. Performance of the proposed algorithm is evaluated and compared with existing algorithms using extensive simulations, which show that the proposed algorithm improves virtual network embedding by increasing revenue and decreasing energy consumption.Comment: arXiv admin note: text overlap with arXiv:1504.0684
    • …
    corecore