137,278 research outputs found

    A Review on Software Architectures for Heterogeneous Platforms

    Full text link
    The increasing demands for computing performance have been a reality regardless of the requirements for smaller and more energy efficient devices. Throughout the years, the strategy adopted by industry was to increase the robustness of a single processor by increasing its clock frequency and mounting more transistors so more calculations could be executed. However, it is known that the physical limits of such processors are being reached, and one way to fulfill such increasing computing demands has been to adopt a strategy based on heterogeneous computing, i.e., using a heterogeneous platform containing more than one type of processor. This way, different types of tasks can be executed by processors that are specialized in them. Heterogeneous computing, however, poses a number of challenges to software engineering, especially in the architecture and deployment phases. In this paper, we conduct an empirical study that aims at discovering the state-of-the-art in software architecture for heterogeneous computing, with focus on deployment. We conduct a systematic mapping study that retrieved 28 studies, which were critically assessed to obtain an overview of the research field. We identified gaps and trends that can be used by both researchers and practitioners as guides to further investigate the topic

    Design and optimization of a portable LQCD Monte Carlo code using OpenACC

    Full text link
    The present panorama of HPC architectures is extremely heterogeneous, ranging from traditional multi-core CPU processors, supporting a wide class of applications but delivering moderate computing performance, to many-core GPUs, exploiting aggressive data-parallelism and delivering higher performances for streaming computing applications. In this scenario, code portability (and performance portability) become necessary for easy maintainability of applications; this is very relevant in scientific computing where code changes are very frequent, making it tedious and prone to error to keep different code versions aligned. In this work we present the design and optimization of a state-of-the-art production-level LQCD Monte Carlo application, using the directive-based OpenACC programming model. OpenACC abstracts parallel programming to a descriptive level, relieving programmers from specifying how codes should be mapped onto the target architecture. We describe the implementation of a code fully written in OpenACC, and show that we are able to target several different architectures, including state-of-the-art traditional CPUs and GPUs, with the same code. We also measure performance, evaluating the computing efficiency of our OpenACC code on several architectures, comparing with GPU-specific implementations and showing that a good level of performance-portability can be reached.Comment: 26 pages, 2 png figures, preprint of an article submitted for consideration in International Journal of Modern Physics

    EPOBF: Energy Efficient Allocation of Virtual Machines in High Performance Computing Cloud

    Full text link
    Cloud computing has become more popular in provision of computing resources under virtual machine (VM) abstraction for high performance computing (HPC) users to run their applications. A HPC cloud is such cloud computing environment. One of challenges of energy efficient resource allocation for VMs in HPC cloud is tradeoff between minimizing total energy consumption of physical machines (PMs) and satisfying Quality of Service (e.g. performance). On one hand, cloud providers want to maximize their profit by reducing the power cost (e.g. using the smallest number of running PMs). On the other hand, cloud customers (users) want highest performance for their applications. In this paper, we focus on the scenario that scheduler does not know global information about user jobs and user applications in the future. Users will request shortterm resources at fixed start times and non interrupted durations. We then propose a new allocation heuristic (named Energy-aware and Performance per watt oriented Bestfit (EPOBF)) that uses metric of performance per watt to choose which most energy-efficient PM for mapping each VM (e.g. maximum of MIPS per Watt). Using information from Feitelson's Parallel Workload Archive to model HPC jobs, we compare the proposed EPOBF to state of the art heuristics on heterogeneous PMs (each PM has multicore CPU). Simulations show that the EPOBF can reduce significant total energy consumption in comparison with state of the art allocation heuristics.Comment: 10 pages, in Procedings of International Conference on Advanced Computing and Applications, Journal of Science and Technology, Vietnamese Academy of Science and Technology, ISSN 0866-708X, Vol. 51, No. 4B, 201

    Direct NN-body code on low-power embedded ARM GPUs

    Full text link
    This work arises on the environment of the ExaNeSt project aiming at design and development of an exascale ready supercomputer with low energy consumption profile but able to support the most demanding scientific and technical applications. The ExaNeSt compute unit consists of densely-packed low-power 64-bit ARM processors, embedded within Xilinx FPGA SoCs. SoC boards are heterogeneous architecture where computing power is supplied both by CPUs and GPUs, and are emerging as a possible low-power and low-cost alternative to clusters based on traditional CPUs. A state-of-the-art direct NN-body code suitable for astrophysical simulations has been re-engineered in order to exploit SoC heterogeneous platforms based on ARM CPUs and embedded GPUs. Performance tests show that embedded GPUs can be effectively used to accelerate real-life scientific calculations, and that are promising also because of their energy efficiency, which is a crucial design in future exascale platforms.Comment: 16 pages, 7 figures, 1 table, accepted for publication in the Computing Conference 2019 proceeding

    Distributed Gaussian Processes

    Get PDF
    To scale Gaussian processes (GPs) to large data sets we introduce the robust Bayesian Committee Machine (rBCM), a practical and scalable product-of-experts model for large-scale distributed GP regression. Unlike state-of-the-art sparse GP approximations, the rBCM is conceptually simple and does not rely on inducing or variational parameters. The key idea is to recursively distribute computations to independent computational units and, subsequently, recombine them to form an overall result. Efficient closed-form inference allows for straightforward parallelisation and distributed computations with a small memory footprint. The rBCM is independent of the computational graph and can be used on heterogeneous computing infrastructures, ranging from laptops to clusters. With sufficient computing resources our distributed GP model can handle arbitrarily large data sets.Comment: 10 pages, 5 figures. Appears in Proceedings of ICML 201

    Device-Aware Routing and Scheduling in Multi-Hop Device-to-Device Networks

    Full text link
    The dramatic increase in data and connectivity demand, in addition to heterogeneous device capabilities, poses a challenge for future wireless networks. One of the promising solutions is Device-to-Device (D2D) networking. D2D networking, advocating the idea of connecting two or more devices directly without traversing the core network, is promising to address the increasing data and connectivity demand. In this paper, we consider D2D networks, where devices with heterogeneous capabilities including computing power, energy limitations, and incentives participate in D2D activities heterogeneously. We develop (i) a device-aware routing and scheduling algorithm (DARS) by taking into account device capabilities, and (ii) a multi-hop D2D testbed using Android-based smartphones and tablets by exploiting Wi-Fi Direct and legacy Wi-Fi connections. We show that DARS significantly improves throughput in our testbed as compared to state-of-the-art
    • …
    corecore