4,011 research outputs found

    A Study of Dynamic Optimization Techniques: Lessons and Directions in Kernel Design

    Get PDF
    The Synthesis kernel [21,22,23,27,28] showed that dynamic code generation, software feedback, and fine-grain modular kernel organization are useful implementation techniques for improving the performance of operating system kernels. In addition, and perhaps more importantly, we discovered that there are strong interactions between the techniques. Hence, a careful and systematic combination of the techniques can be very powerful even though each one by itself may have serious limitations. By identifying these interactions we illustrate the problems of applying each technique in isolation to existing kernels. We also highlight the important common under-pinnings of the Synthesis experience and present our ideas on future operating system design and implementation. Finally, we outline a more uniform approach to dynamic optimizations called incremental partial evaluation

    ADACORE: Achieving Energy Efficiency via Adaptive Core Morphing at Runtime

    Get PDF
    Heterogeneous multicore processors offer an energy-efficient alternative to homogeneous multicores. Typically, heterogeneous multi-core refers to a system with more than one core where all the cores use a single ISA but differ in one or more micro-architectural configurations. A carefully designed multicore system consists of cores of diverse power and performance profiles. During execution, an application is run on a core that offers the best trade-off between performance and energy-efficiency. Since the resource needs of an application may vary with time, so does the optimal core choice. Moving a thread from one core to another involves transferring the entire processor state and cache warm-up. Frequent migration leads to large performance overhead, negating any benefits of migration. Infrequent migration on the other hand leads to missed opportunities. Thus, reducing overhead of migration is integral to harnessing benefits of heterogeneous multicores. \par This work proposes \textit{AdaCore}, a novel core architecture which pushes the heterogeneity exploited in the heterogeneous multicore into a single core. \textit{AdaCore} primarily addresses the resource bottlenecks in workloads. The design attempts to adaptively match the resource demands by reconfiguring on-chip resources at a fine-grain granularity. The adaptive core morphing allows core configurations with diverse power and performance profiles within a single core by adaptive voltage, frequency and resource reconfiguration. Towards this end, the proposed novel architecture while providing energy savings, improves performance with a low overhead in-core reconfiguration. This thesis further compares \textit{AdaCore} with a standard Out-of-Order core with capability to perform Dynamic Voltage and Frequency Scaling (DVFS) designed to achieve energy efficiency. The results presented in this thesis indicate that the proposed scheme can improve the performance/Watt of application, on average, by 32\% over a static out-of-order core and by 14\% over DVFS. The proposed scheme improves IPS2/WattIPS^{2}/Watt by 38\% over static out-of-order core

    Towards a Mini-App for Smoothed Particle Hydrodynamics at Exascale

    Full text link
    The smoothed particle hydrodynamics (SPH) technique is a purely Lagrangian method, used in numerical simulations of fluids in astrophysics and computational fluid dynamics, among many other fields. SPH simulations with detailed physics represent computationally-demanding calculations. The parallelization of SPH codes is not trivial due to the absence of a structured grid. Additionally, the performance of the SPH codes can be, in general, adversely impacted by several factors, such as multiple time-stepping, long-range interactions, and/or boundary conditions. This work presents insights into the current performance and functionalities of three SPH codes: SPHYNX, ChaNGa, and SPH-flow. These codes are the starting point of an interdisciplinary co-design project, SPH-EXA, for the development of an Exascale-ready SPH mini-app. To gain such insights, a rotating square patch test was implemented as a common test simulation for the three SPH codes and analyzed on two modern HPC systems. Furthermore, to stress the differences with the codes stemming from the astrophysics community (SPHYNX and ChaNGa), an additional test case, the Evrard collapse, has also been carried out. This work extrapolates the common basic SPH features in the three codes for the purpose of consolidating them into a pure-SPH, Exascale-ready, optimized, mini-app. Moreover, the outcome of this serves as direct feedback to the parent codes, to improve their performance and overall scalability.Comment: 18 pages, 4 figures, 5 tables, 2018 IEEE International Conference on Cluster Computing proceedings for WRAp1
    • …
    corecore