    Automatic Energy Saving Schemes for Parallel Applications

    Although high-performance computing traditionally focuses on the efficient execution of large-scale applications, both energy and power have become critical concerns when approaching exascale. Drastic increases in the power consumption of supercomputers significantly affect their operating costs and failure rates. In modern microprocessor architectures equipped with dynamic voltage and frequency scaling (DVFS) and CPU clock modulation (throttling), power consumption can be controlled in software. Additionally, the network interconnect, such as InfiniBand, can be exploited to maximize energy savings, although application performance loss and frequency-switching overheads must be carefully balanced. This work first studies two important collective communication operations, all-to-all and allgather, and proposes energy saving strategies on a per-call basis. Next, it groups point-to-point communications into phases and applies frequency scaling to them, saving energy by exploiting architectural and communication stalls. Finally, it proposes an automatic runtime system that combines both collective and point-to-point communications into phases and applies throttling in addition to DVFS to maximize energy savings. Experimental results are presented for NAS parallel benchmark problems as well as for realistic parallel electronic structure calculations performed by the widely used quantum chemistry package GAMESS. Energy savings close to the maximum were obtained with a substantially low performance loss on the given platform.
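
    As a rough illustration of the per-call idea (this is not the dissertation's runtime system), the sketch below lowers the core frequency for the duration of a network-bound all-to-all and restores it afterwards. It assumes Linux cpufreq with the userspace governor, mpi4py, and placeholder frequency values.

```python
# Minimal sketch, not the dissertation's runtime: drop the CPU frequency around a
# communication-bound MPI collective, then restore it. Assumes Linux cpufreq with
# the 'userspace' governor, root privileges, and mpi4py; the frequency values and
# sysfs path are platform-specific placeholders.
import numpy as np
from mpi4py import MPI

CPUFREQ = "/sys/devices/system/cpu/cpu0/cpufreq/scaling_setspeed"
LOW_KHZ, HIGH_KHZ = "1200000", "2400000"      # placeholder P-states

def set_cpu_khz(khz: str) -> None:
    """Request a CPU frequency through the cpufreq userspace governor."""
    with open(CPUFREQ, "w") as f:
        f.write(khz)

def energy_aware_alltoall(comm, sendbuf, recvbuf):
    """Per-call strategy: run the interconnect-bound all-to-all at reduced frequency."""
    set_cpu_khz(LOW_KHZ)                      # the CPU mostly waits on the network here
    try:
        comm.Alltoall(sendbuf, recvbuf)
    finally:
        set_cpu_khz(HIGH_KHZ)                 # restore full speed for compute phases

if __name__ == "__main__":
    comm = MPI.COMM_WORLD
    n = comm.Get_size()
    send = np.arange(n, dtype="i")            # one element destined for each rank
    recv = np.empty(n, dtype="i")
    energy_aware_alltoall(comm, send, recv)
```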

    Power Management Strategies for Wired Communication Networks.

    With exponential traffic growth and the rapid expansion of communication infrastructures worldwide, the energy expenditure of the Internet has become a major concern in an IT-reliant society. This energy problem has motivated urgent demand for new strategies to reduce the consumption of telecommunication networks, with a particular focus on IP networks. In addition to the development of a new generation of energy-efficient network equipment, a significant body of research has concentrated on incorporating power/energy-awareness into network control and management, which aims to reduce network power/energy consumption either by dynamically scaling the speed of each active network component so that it adapts to its current load, or by putting lightly loaded network elements to sleep and reconfiguring the network. However, the fundamental challenge of greening the Internet is to achieve a balance between power/energy savings and quality-of-service (QoS) requirements, an issue that has received less attention but is becoming a major problem in future green network designs. In this dissertation, we study how energy consumption can be reduced through different power/energy- and QoS-aware strategies for wired communication networks. To substantially reduce energy consumption while meeting the desired QoS requirements, we introduce several schemes combining power management techniques with different scheduling strategies, classified into experimental power management (EPM) and algorithmic power management (APM). In these proposed schemes, the power management techniques that we focus on are speed scaling and sleep mode. When the network processor is active, its speed and supply voltage can be decreased to reduce energy consumption (speed scaling); when the processor is idle, it can be put into a low-power mode (sleep mode). The resulting problem is to determine how and when to adjust processor speeds and/or put a device into sleep mode. In this dissertation, we first discuss three families of dynamic voltage/frequency scaling (DVFS) based, QoS-aware EPM schemes, which aim to reduce energy consumption in network equipment through different packet scheduling strategies while adhering to the QoS requirements of supported applications. Then, we explore the problem of energy minimization under QoS constraints through a mathematical programming model, a DVFS-based, delay-aware APM scheme combining the speed scaling technique with the existing rate monotonic scheduling policy. Among these speed-scaling-based schemes, dynamic power savings of up to 26.76% of the total power consumption can be achieved. In addition to speed scaling approaches, we further propose a sleep-based, traffic-aware EPM scheme, which reduces power consumption by rerouting light traffic loads and putting the related network equipment into sleep mode according to twelve flow traffic density changes over the 24 hours of an arbitrarily selected day. Meanwhile, a speed scaling technique that does not violate network QoS performance is also applied in this scheme when traffic is rerouted. Applying this sleep-based strategy can lead to power savings of up to 62.58% of the total power consumption.
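
    To make the speed-scaling trade-off concrete, a generic formulation is sketched below; it is not the dissertation's exact APM model. Processor speed is chosen to minimize power while an M/M/1-style average-delay bound stands in for the QoS constraint; the cubic power law, the delay model, and the symbols are textbook-style assumptions.

```latex
% Generic speed-scaling formulation; the cubic power law, the M/M/1 delay model and
% the symbols (alpha, mu, lambda) are illustrative assumptions, not the dissertation's model.
\begin{align*}
  \min_{f}\quad & \alpha f^{3} + P_{\mathrm{static}}
      && \text{dynamic power grows roughly cubically with processor speed}\\
  \text{s.t.}\quad & \frac{1}{\mu f - \lambda} \le D_{\max}
      && \text{average packet delay under arrival rate } \lambda \text{ stays within the QoS bound}\\
  & f_{\min} \le f \le f_{\max}
      && \text{feasible DVFS operating range}
\end{align*}
```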

    Separation Framework: An Enabler for Cooperative and D2D Communication for Future 5G Networks

    Soaring capacity and coverage demands dictate that future cellular networks will soon need to migrate towards ultra-dense networks. However, network densification comes with a host of challenges that include compromised energy efficiency, complex interference management, cumbersome mobility management, burdensome signaling overheads and higher backhaul costs. Interestingly, most of the problems that beleaguer network densification stem from one common feature of legacy networks, i.e., tight coupling between the control and data planes regardless of their degree of heterogeneity and cell density. Consequently, in the wake of 5G, the control and data plane separation architecture (SARC) has recently been conceived as a promising paradigm with the potential to address most of the aforementioned challenges. In this article, we review various proposals that have been presented in the literature so far to enable SARC. More specifically, we analyze how and to what degree various SARC proposals address the four main challenges in network densification, namely: energy efficiency, system-level capacity maximization, interference management and mobility management. We then focus on two salient features of future cellular networks that have not yet been adopted in legacy networks at wide scale and thus remain a hallmark of 5G, i.e., coordinated multipoint (CoMP) and device-to-device (D2D) communications. After providing the necessary background on CoMP and D2D, we analyze how SARC can act as a major enabler for CoMP and D2D in the context of 5G. This article thus serves both as a tutorial and as an up-to-date survey on SARC, CoMP and D2D. Most importantly, the article provides an extensive outlook of the challenges and opportunities that lie at the crossroads of these three mutually entangled emerging technologies. Comment: 28 pages, 11 figures, IEEE Communications Surveys & Tutorials 201

    A software controlled voltage tuning system using multi-purpose ring oscillators

    This paper presents a novel software-driven voltage tuning method that utilises multi-purpose Ring Oscillators (ROs) to provide process-variation- and environment-sensitive energy reductions. The proposed technique enables voltage tuning based on the observed frequency of the ROs, taken as a representation of the device speed and used to estimate a safe minimum operating voltage at a given core frequency. A conservative linear relationship between RO frequency and silicon speed is used to approximate the critical path of the processor. Using a multi-purpose RO not specifically implemented for critical path characterisation is a unique approach to voltage tuning. The parameters governing the relationship between RO and silicon speed are obtained by testing a sample of processors from different wafer regions. These parameters can then be used on all devices of that model. The tuning method and software control framework are demonstrated on a sample of XMOS XS1-U8A-64 embedded microprocessors, yielding a dynamic power saving of up to 25% with no performance reduction and no negative impact on the real-time constraints of the embedded software running on the processor.
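
    A minimal sketch of the underlying idea (not the paper's calibration or control framework): map an observed RO frequency to a conservative minimum supply voltage for a requested core frequency via a linear model whose coefficients would be obtained offline from a sample of devices. All constants and the voltage/frequency relation below are illustrative placeholders.

```python
# Hypothetical sketch: estimate a safe minimum supply voltage from a measured
# ring-oscillator frequency using a conservative linear silicon-speed model.
# The coefficients, guard band and voltage/frequency relation are placeholders.

A, B = 0.85, 20.0                  # placeholder per-model fit: speed ~= A * ro_freq + B
GUARD_BAND_V = 0.05                # extra safety margin on top of the estimate
V_FLOOR = 0.7                      # never request a voltage below this floor

def safe_min_voltage(ro_freq_mhz: float, target_core_mhz: float) -> float:
    """Estimate a safe minimum operating voltage for the requested core frequency."""
    est_speed_mhz = A * ro_freq_mhz + B          # conservative silicon-speed proxy
    # Placeholder relation: required voltage scales with how hard the target
    # frequency pushes against the estimated device speed.
    v_nominal = 1.0
    v_required = v_nominal * (target_core_mhz / est_speed_mhz)
    return max(V_FLOOR, v_required + GUARD_BAND_V)

if __name__ == "__main__":
    print(f"Vmin ~ {safe_min_voltage(ro_freq_mhz=480.0, target_core_mhz=400.0):.3f} V")
```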

    Automated Hardware Prototyping for 3D Network on Chips

    More than 50 years ago, Intel co-founder Gordon Moore made a prediction about the progress of transistor technology: the number of transistors in integrated circuits would double every two years. His statement still holds, but the end of Moore's law is in sight. With the end of Moore's law, new approaches must be explored to keep increasing the performance of integrated circuits. Two possible approaches for "More than Moore" are 3D integration techniques and heterogeneous systems. At the same time, there is a trend towards multi-core processors based on networks on chips (NoCs). Beyond the end of Moore's law, ever-smaller technology nodes, especially below 60 nm, bring new challenges. One difficulty is heat dissipation in large-scale integrated circuits and the resulting overheating of the chip. To address this problem in modern multi-core architectures, the power dissipation of the network resources must also be sharply reduced. This work comprises a hardware-controlled combination of frequency scaling and power gating for 3D on-chip networks, including an FPGA prototype. To this end, a clock-synchronous 2D network was extended into a three-dimensional asynchronous network with multiple frequency domains. In addition, a scalable online power management system with low resource overhead was developed. The verification of new hardware components is one of the most time-consuming steps in the development of highly integrated digital circuits. To accelerate this task and to enable parallel software development, an automated and user-friendly tool for setting up new hardware projects was developed as part of this work. A graphical user interface covering the entire design flow, from architecture creation, parameter declaration, simulation and synthesis to test, is part of this tool. Furthermore, the size of the architecture poses a particular challenge for prototyping. Previous works have failed to realize fast and straightforward prototyping, especially for architectures with more than 50 processor cores. This work includes a design space exploration and FPGA-based prototypes of various 3D NoC implementations with more than 80 processors.
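
    As a purely illustrative software model (the thesis implements this decision in hardware), the sketch below shows the kind of per-router choice an online NoC power manager might make from recent link utilization: gate an idle router, clock down a lightly loaded one, and run busy routers at full speed. The thresholds and power states are assumed values.

```python
# Illustrative software model of an online NoC power-management decision; the
# thresholds, clock dividers and gating policy are assumptions, not the thesis design.
from dataclasses import dataclass

@dataclass
class RouterPMState:
    clock_divider: int = 1      # 1 = full frequency, 2/4 = frequency-scaled
    power_gated: bool = False

def update_router(state: RouterPMState, utilization: float) -> RouterPMState:
    """Pick a power state from the observed utilization (0.0 .. 1.0) of a router's links."""
    if utilization < 0.02:                       # essentially idle: power-gate the router
        return RouterPMState(clock_divider=1, power_gated=True)
    if utilization < 0.30:                       # light traffic: run at a quarter clock
        return RouterPMState(clock_divider=4, power_gated=False)
    if utilization < 0.60:                       # moderate traffic: half clock
        return RouterPMState(clock_divider=2, power_gated=False)
    return RouterPMState(clock_divider=1, power_gated=False)

if __name__ == "__main__":
    for u in (0.01, 0.15, 0.45, 0.90):
        print(u, update_router(RouterPMState(), u))
```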

    Enabling Fine-Grain Restricted Coset Coding Through Word-Level Compression for PCM

    Phase change memory (PCM) has recently emerged as a promising technology to meet the fast-growing demand for large-capacity memory in computer systems, replacing DRAM, which is impeded by physical limitations. Multi-level cell (MLC) PCM offers high density with low per-byte fabrication cost. However, despite many advantages, such as scalability and low leakage, the energy for programming intermediate states is considerably larger than for programming single-level cell PCM. In this paper, we study encoding techniques to reduce write energy for MLC PCM when the encoding granularity is lowered below the typical cache line size. We observe that encoding data blocks at small granularity to reduce write energy can actually increase the write energy because of the auxiliary encoding bits. We mitigate this adverse effect by 1) designing suitable codeword mappings that use fewer auxiliary bits and 2) proposing a new Word-Level Compression (WLC) scheme which compresses more than 91% of the memory lines and provides enough room to store the auxiliary data, using a novel restricted coset encoding applied at small data block granularities. Experimental results show that the proposed encoding at 16-bit data granularity reduces the write energy by 39%, on average, versus the leading encoding approach for write energy reduction. Furthermore, it improves endurance by 20% and is more reliable than the leading approach. Hardware synthesis evaluation shows that the proposed encoding can be implemented on-chip with only a nominal area overhead. Comment: 12 page
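
    The coset idea can be illustrated with a simplified sketch; this is not the paper's restricted coset code or its energy model. For each 16-bit word, the encoder picks the XOR mask whose encoded word is cheapest to write over the currently stored word and records the mask index as auxiliary bits. The mask set and the toy MLC write-cost model are assumptions.

```python
# Simplified coset-style encoding sketch; the mask set and the toy write-cost model
# (count of changed 2-bit MLC cells) are illustrative assumptions.
COSET_MASKS = [0x0000, 0xFFFF, 0xAAAA, 0x5555]   # 2 auxiliary bits select a mask

def write_cost(old: int, new: int) -> int:
    """Toy cost: number of 2-bit MLC cells (8 per 16-bit word) whose state changes."""
    changed = old ^ new
    return sum(1 for i in range(0, 16, 2) if (changed >> i) & 0b11)

def encode_word(data: int, stored: int) -> tuple[int, int]:
    """Return (encoded_word, mask_index) minimizing the cost of overwriting 'stored'."""
    best = min(range(len(COSET_MASKS)),
               key=lambda i: write_cost(stored, data ^ COSET_MASKS[i]))
    return data ^ COSET_MASKS[best], best

def decode_word(encoded: int, mask_index: int) -> int:
    """Recover the original data word from the codeword and its auxiliary bits."""
    return encoded ^ COSET_MASKS[mask_index]

if __name__ == "__main__":
    stored, data = 0x1234, 0xFFFF
    enc, idx = encode_word(data, stored)
    assert decode_word(enc, idx) == data
    print(f"mask {idx}: cost {write_cost(stored, enc)} vs direct {write_cost(stored, data)}")
```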