
    A Survey on Compiler Autotuning using Machine Learning

    Since the mid-1990s, researchers have been trying to use machine learning-based approaches to solve a number of different compiler optimization problems. These techniques primarily enhance the quality of the obtained results and, more importantly, make it feasible to tackle two main compiler optimization problems: optimization selection (choosing which optimizations to apply) and phase-ordering (choosing the order in which to apply them). The compiler optimization space continues to grow due to the advancement of applications, the increasing number of compiler optimizations, and new target architectures. Generic optimization passes in compilers cannot fully leverage newly introduced optimizations and therefore cannot keep up with the pace of increasing options. This survey summarizes and classifies the recent advances in using machine learning for compiler optimization, particularly on the two major problems of (1) selecting the best optimizations and (2) the phase-ordering of optimizations. The survey highlights the approaches taken so far, the results obtained, a fine-grained classification of the different approaches and, finally, the influential papers of the field. (Version 5.0, September 2018; preprint of the journal paper accepted at ACM CSUR 2018, 42 pages. Received November 2016; revised August 2017 and February 2018; accepted March 2018.)
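    As a minimal illustration of the first problem the survey covers (optimization selection), the sketch below frames flag selection as supervised classification: static program features are mapped to the flag set that was fastest in prior benchmarking runs. The features, training data, and flag sets are illustrative placeholders, not taken from the survey.

        # Optimization selection as classification: a hedged sketch, not the
        # survey's method. Features and training data are made up for illustration.
        import numpy as np
        from sklearn.ensemble import RandomForestClassifier

        # Each row: [num_loops, avg_loop_depth, num_branches, num_mem_ops]
        X_train = np.array([
            [12, 2.5, 40, 300],
            [ 1, 1.0,  5,  20],
            [30, 3.2, 90, 800],
        ])
        FLAG_SETS = [["-O2"], ["-O3", "-funroll-loops"], ["-O3", "-flto"]]
        y_train = np.array([1, 0, 2])  # index of the fastest flag set per program

        model = RandomForestClassifier(n_estimators=100, random_state=0)
        model.fit(X_train, y_train)

        new_program = np.array([[8, 2.0, 25, 150]])
        print("Predicted flags:", FLAG_SETS[model.predict(new_program)[0]])

    The phase-ordering problem is harder under the same framing because the label space is a sequence of passes rather than a single flag set.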

    Energy Demand Response for High-Performance Computing Systems

    The growing computational demand of scientific applications has greatly motivated the development of large-scale high-performance computing (HPC) systems in the past decade. To accommodate the increasing demand of applications, HPC systems have been going through dramatic architectural changes (e.g., the introduction of many-core and multi-core systems, and the rapid growth of complex interconnection networks for efficient communication between thousands of nodes), as well as a significant increase in size (e.g., modern supercomputers consist of hundreds of thousands of nodes). With such changes in architecture and size, the energy consumption of these systems has increased significantly. With the advent of exascale supercomputers in the next few years, power consumption of HPC systems will increase further; some systems may even consume hundreds of megawatts of electricity. Demand response programs are designed to help energy service providers stabilize the power grid by reducing the energy consumption of participating systems during periods of high power demand or temporary shortage in power supply. This dissertation focuses on developing energy-efficient demand-response models and algorithms to enable HPC systems' demand response participation. In the first part, we present interconnection network models for performance prediction of large-scale HPC applications. They are based on interconnect topologies widely used in HPC systems: dragonfly, torus, and fat-tree. Our interconnect models are fully integrated with an implementation of the message-passing interface (MPI) that can mimic most of its functions with packet-level accuracy. Extensive experiments show that our integrated models provide good accuracy for predicting network behavior, while at the same time allowing for good parallel scaling performance. In the second part, we present an energy-efficient demand-response model to reduce HPC systems' energy consumption during demand response periods. We propose HPC job scheduling and resource provisioning schemes to enable HPC systems' participation in emergency demand response. In the final part, we propose an economic demand-response model that allows both the HPC operator and HPC users to jointly reduce the HPC system's energy cost. Our proposed model enables the participation of HPC systems in economic demand-response programs through a contract-based rewarding scheme that incentivizes HPC users to participate in demand response.
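    The scheduling idea in the second part can be sketched as a power-capped admission policy: during a demand-response window, queued jobs are admitted only while the system's estimated power draw stays under a reduced cap. This is a hypothetical sketch; the Job fields, cap values, and greedy policy are our assumptions, not the dissertation's scheme.

        # Hypothetical demand-response-aware admission: greedily admit queued
        # jobs without exceeding the active power cap (reduced during DR events).
        from dataclasses import dataclass

        @dataclass
        class Job:
            name: str
            nodes: int
            est_power_kw: float  # estimated draw while running

        NORMAL_CAP_KW = 2000.0
        DR_CAP_KW = 1200.0  # reduced cap during a demand-response event (assumed)

        def schedule(queue: list[Job], running_kw: float, dr_active: bool) -> list[Job]:
            cap = DR_CAP_KW if dr_active else NORMAL_CAP_KW
            admitted = []
            for job in sorted(queue, key=lambda j: j.est_power_kw):  # low-power first
                if running_kw + job.est_power_kw <= cap:
                    admitted.append(job)
                    running_kw += job.est_power_kw
            return admitted

        queue = [Job("cfd", 512, 600.0), Job("md", 128, 150.0), Job("qcd", 1024, 900.0)]
        print([j.name for j in schedule(queue, running_kw=800.0, dr_active=True)])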

    WHYPE: A Scale-Out Architecture with Wireless Over-the-Air Majority for Scalable In-memory Hyperdimensional Computing

    Hyperdimensional computing (HDC) is an emerging computing paradigm that represents, manipulates, and communicates data using long random vectors known as hypervectors. Among the hardware platforms capable of executing HDC algorithms, in-memory computing (IMC) has shown promise because it is very efficient at performing the matrix-vector multiplications that are common in HDC algebra. Although HDC architectures based on IMC already exist, scaling them remains a key challenge due to the collective communication patterns that these architectures require, which traditional chip-scale networks were not designed for. To cope with this difficulty, we propose a scale-out HDC architecture called WHYPE, which uses wireless in-package communication technology to interconnect a large number of physically distributed IMC cores that either encode hypervectors or perform multiple similarity searches in parallel. In this context, the key enabler of WHYPE is the opportunistic use of the wireless network as a medium for over-the-air computation. WHYPE implements an optimized source coding that allows receivers to calculate the bit-wise majority of multiple hypervectors (a useful operation in HDC) transmitted concurrently over the wireless channel. By doing so, we achieve joint broadcast distribution and computation with a performance and efficiency unattainable with wired interconnects, which in turn enables massive parallelization of the architecture. Through evaluations at the on-chip network and complete architecture levels, we demonstrate that WHYPE can bundle and distribute hypervectors faster and more efficiently than a hypothetical wired implementation, and that it scales well to tens of receivers. We show that the average error rate of the majority computation is low enough to have negligible impact on the accuracy of HDC classification tasks. (Accepted at the IEEE Journal on Emerging and Selected Topics in Circuits and Systems, JETCAS.)
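    The two HDC operations the paper relies on, bit-wise majority bundling and similarity search, are easy to state in software. The NumPy sketch below shows them on a host CPU; in WHYPE the majority is instead computed over the air during transmission. Dimensions and data here are illustrative.

        # Bit-wise majority ("bundling") and Hamming similarity search over
        # binary hypervectors: a software reference for what WHYPE computes.
        import numpy as np

        rng = np.random.default_rng(0)
        D = 10_000  # hypervector dimensionality (illustrative)

        def majority(hvs: np.ndarray) -> np.ndarray:
            """Bit-wise majority over a stack of {0,1} hypervectors (ties -> 1)."""
            return (2 * hvs.sum(axis=0) >= hvs.shape[0]).astype(np.uint8)

        def hamming_similarity(a: np.ndarray, b: np.ndarray) -> float:
            return 1.0 - np.count_nonzero(a != b) / a.size

        items = rng.integers(0, 2, size=(3, D), dtype=np.uint8)    # hypervectors to bundle
        query = majority(items)
        classes = rng.integers(0, 2, size=(5, D), dtype=np.uint8)  # stored class hypervectors
        best = max(range(len(classes)), key=lambda i: hamming_similarity(query, classes[i]))
        print("closest class:", best)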

    On Security and Privacy for Networked Information Society : Observations and Solutions for Security Engineering and Trust Building in Advanced Societal Processes

    Our society has developed into a networked information society in which all aspects of human life are interconnected via the Internet, the backbone through which a significant part of communications traffic is routed. This arguably makes the Internet the most important piece of critical infrastructure in the world. Securing Internet communications for everyone who uses them is extremely important, as the continuing growth of the networked information society relies upon fast, reliable and secure communications. A prominent threat to the security and privacy of Internet users is mass surveillance of Internet communications. The methods and tools used to implement mass surveillance capabilities on the Internet pose a danger to the security of all communications, not just the intended targets. When we continue to build the networked information society upon the unreliable foundation of the Internet, we encounter increasingly complex problems, which are the main focus of this dissertation. As a society's reliance on communication technology grows, so does the importance of information security. At this stage, information security issues become separated from the purely technological domain and begin to affect everyone in society. The approach taken in this thesis is therefore both technical and socio-technical. The research presented in this PhD thesis builds security into the networked information society and provides parameters for the further development of a safe and secure networked information society. This is achieved by proposing improvements on a multitude of layers. In the technical domain, we present an efficient design flow for secure embedded devices that use cryptographic primitives in a resource-constrained environment; examine and analyze threats to biometric passport and electronic voting systems; observe techniques used to conduct mass Internet surveillance; and analyze the security of Finnish web users' passwords. In the socio-technical domain, we examine surveillance and how it affects the citizens of a networked information society; study methods for delivering efficient security education; examine what constitutes essential security knowledge for citizens; advocate mastery over surveillance data by the targeted citizens of the networked information society; and examine the concept of forced trust that permeates all the topics examined in this work.

    Accurate and Scalable Many-Node Simulation

    Accurate performance estimation of future many-node machines is challenging because it requires detailed simulation models of both the node and the network. However, simulating the full system in detail is infeasible in terms of compute and memory resources. State-of-the-art techniques use a two-phase approach that combines detailed simulation of a single node with network-only simulation of the full system. We show that these techniques, in which the detailed node simulation is done in isolation, are inaccurate because they ignore two important node-level effects: compute-time variability and inter-node communication. We propose a novel three-stage simulation method for scalable and accurate many-node simulation, combining native profiling, detailed node simulation, and high-level network simulation. By including timing variability and the impact of external nodes, our method leads to more accurate estimates. We validate our technique against measurements on a multi-node cluster, and report an average error of 6.7% on 64 nodes (maximum error of 12%), compared to an average error of 27% (and up to 54%) when timing variability and scaling overhead are ignored. At higher node counts, the prediction error of ignoring variable timings and scaling overhead continues to increase compared to our technique, and may lead to selecting the wrong optimal cluster configuration. Using our technique, we are able to accurately project performance to thousands of nodes within a day of simulation time, using only one or a few simulation hosts. Our method can be used to quickly explore large many-node design spaces, covering node micro-architecture, node count and network configuration.
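    A back-of-the-envelope calculation shows why ignoring compute-time variability hurts at scale: a bulk-synchronous step finishes only when the slowest node does, and the expected maximum over N noisy nodes grows with N. The distribution and noise level below are our assumptions, not the paper's measurements.

        # Monte Carlo estimate of the barrier penalty caused by per-node timing noise.
        import numpy as np

        rng = np.random.default_rng(1)
        mean_step, jitter = 1.0, 0.05  # seconds; 5% std-dev noise per node (assumed)

        for nodes in (1, 64, 1024):
            samples = rng.normal(mean_step, jitter, size=(2_000, nodes))
            step_time = samples.max(axis=1).mean()  # barrier waits for the slowest node
            print(f"{nodes:5d} nodes: mean step {step_time:.3f}s "
                  f"(+{100 * (step_time / mean_step - 1):.1f}% vs. a no-variability model)")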

    3rd Many-core Applications Research Community (MARC) Symposium (KIT Scientific Reports; 7598)

    This manuscript collects recent scientific work on the Intel Single-chip Cloud Computer (SCC) and describes novel approaches to programming and run-time organization.