
    MERIC and RADAR generator: tools for energy evaluation and runtime tuning of HPC applications

    This paper introduces two tools for manual energy evaluation and runtime tuning developed at IT4Innovations in the READEX project. The MERIC library can be used for manual instrumentation and analysis of any application from the energy and time consumption point of view. Besides tracing, MERIC can also change environment and hardware parameters during the application runtime, which leads to energy savings. MERIC stores large amounts of data, which are difficult for a human to read. The RADAR generator analyses the MERIC output files to find the best settings of evaluated parameters for each instrumented region. It generates a report and a MERIC configuration file for application production runs.

    Domain knowledge specification for energy tuning

    To overcome the challenges of energy consumption of HPC systems, the European Union Horizon 2020 READEX (Runtime Exploitation of Application Dynamism for Energy-efficient Exascale computing) project uses an online auto-tuning approach to improve the energy efficiency of HPC applications. The READEX methodology pre-computes optimal system configurations, such as the CPU frequency, at design time for instances of program regions, and switches at runtime to the configuration given in the tuning model when the region is executed. READEX goes beyond previous approaches by exploiting dynamic changes in a region's characteristics through region- and characteristic-specific system configurations. While the tool suite supports an automatic approach, specifying domain knowledge such as the structure and characteristics of the application and application tuning parameters can significantly help to create a more refined tuning model. This paper presents the means available for an application expert to provide domain knowledge and presents tuning results for some benchmarks. Web of Science, 316, art. no. E465.

    An integrated cryogenic optical modulator

    Integrated electrical and photonic circuits (PIC) operating at cryogenic temperatures are fundamental building blocks required to achieve scalable quantum computing and cryogenic computing technologies. Optical interconnects offer better performance and thermal insulation than electrical wires and are imperative for true quantum communication. Silicon PICs have matured for room temperature applications but their cryogenic performance is limited by the absence of efficient low temperature electro-optic (EO) modulation. While detectors and lasers perform better at low temperature, cryogenic optical switching remains an unsolved challenge. Here we demonstrate EO switching and modulation from room temperature down to 4 K by using the Pockels effect in integrated barium titanate (BaTiO3)-based devices. We report the nonlinear optical (NLO) properties of BaTiO3 in a temperature range which has previously not been explored, showing an effective Pockels coefficient of 200 pm/V at 4 K. We demonstrate the largest EO bandwidth (30 GHz) of any cryogenic switch to date, ultra-low-power tuning which is 10^9 times more efficient than thermal tuning, and high-speed data modulation at 20 Gbps. Our results demonstrate a missing component for cryogenic PICs. It removes major roadblocks for the realisation of novel cryogenic-compatible systems in the field of quantum computing and supercomputing, and for interfacing those systems with the real world at room temperature.

    Performance and Power Analysis of HPC Workloads on Heterogeneous Multi-Node Clusters

    Performance analysis tools allow application developers to identify and characterize the inefficiencies that cause performance degradation in their codes, allowing for application optimizations. Due to the increasing interest in the High Performance Computing (HPC) community towards energy-efficiency issues, it is of paramount importance to be able to correlate performance and power figures within the same profiling and analysis tools. For this reason, we present a performance and energy-efficiency study aimed at demonstrating how a single tool can be used to collect most of the relevant metrics. In particular, we show how the same analysis techniques can be applied to different architectures, analyzing the same HPC application on a high-end and a low-power cluster. The former cluster embeds Intel Haswell CPUs and NVIDIA K80 GPUs, while the latter is made up of NVIDIA Jetson TX1 boards, each hosting an Arm Cortex-A57 CPU and an NVIDIA Tegra X1 Maxwell GPU. The research leading to these results has received funding from the European Community’s Seventh Framework Programme [FP7/2007-2013] and Horizon 2020 under the Mont-Blanc projects [17], grant agreements n. 288777, 610402 and 671697. E.C. was partially funded by “Contributo 5 per mille assegnato all’Università degli Studi di Ferrara-dichiarazione dei redditi dell’anno 2014”. We thank the University of Ferrara and INFN Ferrara for the access to the COKA Cluster. We warmly thank the BSC tools group for supporting the smooth integration and testing of our setup within Extrae and Paraver. Peer Reviewed. Postprint (published version).