
    Architectural Exploration of Data Recomputation for Improving Energy Efficiency

    University of Minnesota Ph.D. dissertation. July 2017. Major: Electrical/Computer Engineering. Advisor: Ulya Karpuzcu. 1 computer file (PDF); viii, 99 pages. There are two fundamental challenges for modern computer system design. The first is accommodating the increasing demand for performance within a tight power budget. The second is ensuring correct progress despite the increasing likelihood of faults in the system. To address the first challenge, it is essential to track where the power goes. The energy consumed by data orchestration (i.e., storage, movement, and communication) dominates the energy consumed by actual data production, i.e., computation. Recomputing data can therefore often be more energy efficient than storing and retrieving pre-computed data, because it avoids much of the power and performance overhead of data storage, retrieval, and communication. At the same time, recomputation can reduce the demand for communication bandwidth and shrink the memory footprint. The first half of the dissertation quantifies the potential of data recomputation to improve energy efficiency and introduces a practical recomputation framework that trades computation for communication. To address the second challenge, scalable checkpointing and recovery mechanisms are needed. The traditional way to recover from a fault is to checkpoint the state of the machine periodically, which makes it possible to roll back and restart execution from a safe state once a fault is detected. However, the energy overhead of checkpointing, incurred by storing and communicating the machine state, grows with the checkpointing frequency. Amortizing this overhead becomes especially challenging given the growth in expected error rates that accompanies contemporary technology scaling.
Recomputing data that would otherwise be read from a checkpoint can reduce both the checkpointing frequency and the checkpoint size, thereby mitigating checkpointing overhead. The second half of the dissertation provides a quantitative characterization of recomputation-enabled checkpointing built on the recomputation framework.
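The trade of computation for communication can be sketched as a simple break-even test: store a value once and load it back for every use, or regenerate it at every use from operands that are still nearby. This is a minimal illustration under an assumed energy model, not the dissertation's framework; `EnergyModel`, its fields, and the picojoule values in the usage lines are hypothetical placeholders rather than measurements.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EnergyModel:
    """Hypothetical per-event energy costs in picojoules (placeholders, not measured)."""
    alu_op_pj: float     # one arithmetic/logic operation
    near_read_pj: float  # reading an operand already in a register or on-chip cache
    far_read_pj: float   # retrieving a stored value from off-chip memory
    far_write_pj: float  # writing the value off-chip once so it can be reused later


def store_and_load_energy(uses: int, model: EnergyModel) -> float:
    """Store the value once, then load it back from memory for each use."""
    return model.far_write_pj + uses * model.far_read_pj


def recompute_energy(ops: int, operand_reads: int, uses: int, model: EnergyModel) -> float:
    """Regenerate the value at each use; assumes its operands are cheap to read."""
    per_use = ops * model.alu_op_pj + operand_reads * model.near_read_pj
    return uses * per_use


def should_recompute(ops: int, operand_reads: int, uses: int, model: EnergyModel) -> bool:
    """Recompute only when it costs less energy than storing and retrieving."""
    return recompute_energy(ops, operand_reads, uses, model) < store_and_load_energy(uses, model)


if __name__ == "__main__":
    # Placeholder costs chosen only to make off-chip access expensive relative to compute.
    m = EnergyModel(alu_op_pj=1.0, near_read_pj=5.0, far_read_pj=640.0, far_write_pj=640.0)
    print(should_recompute(ops=10, operand_reads=2, uses=3, model=m))      # True
    print(should_recompute(ops=1000, operand_reads=100, uses=5, model=m))  # False
```

Under these assumed costs, a short producer chain is worth recomputing, while a long one is not; a realistic model would also account for the latency impact and the footprint savings the abstract mentions.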

    A Logically Centralized Approach for Control and Management of Large Computer Networks

    Management of large enterprise and Internet Service Provider networks is a complex, error-prone, and costly challenge. It is widely accepted that the key contributors to this complexity are the bundling of control and data forwarding in traditional routers and the use of fully distributed protocols for network control. To address these limitations, the networking research community has been pursuing the vision of reducing the role of a router to its primary task of packet forwarding. This enables network control to be centralized at a decision plane, where network-wide state can be maintained and control can be enforced centrally and consistently. However, scalability and fault-tolerance concerns with physical centralization motivate the need for a more flexible and customizable approach. This dissertation attempts to bridge the gap between the extremes of distributed and centralized network control. We present a logically centralized approach to designing the network decision plane that can be realized with a set of physically distributed controllers. This approach aims to give network designers the ability to customize the degree of control and management centralization according to the scalability, fault-tolerance, and responsiveness requirements of their networks. Our thesis is that logical centralization provides a robust, reliable, and efficient paradigm for managing large networks, and we present several contributions in support of this thesis. For network planning, we describe techniques for optimizing the placement of network controllers and provide guidance on the physical design of logically centralized networks. For network operation, we present algorithms for maintaining dynamic associations between the decision plane and network devices, along with a protocol that allows a set of network controllers to coordinate their decisions and present a unified interface to the managed network devices.
Furthermore, we study the trade-offs in decision plane application design and provide guidance on distributing application state and logic. Finally, we present the results of extensive numerical and simulation-based analysis of the feasibility and performance of our approach. The results show that logical centralization can provide better scalability and fault tolerance while maintaining performance comparable to the traditional distributed approach.
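One common way to frame controller placement is as a k-center problem: choose k node locations so that the worst-case latency from any network device to its nearest controller is small. The sketch uses the classic farthest-point greedy heuristic as a stand-in; it is not necessarily the placement technique developed in the dissertation, and `greedy_k_center`, `worst_case_latency`, and the latency-matrix input are hypothetical. It assumes a symmetric latency matrix with positive latency between distinct nodes.

```python
from typing import List, Sequence


def greedy_k_center(latency: Sequence[Sequence[float]], k: int, first: int = 0) -> List[int]:
    """Farthest-point heuristic for placing k controllers.

    latency[i][j] is the (assumed symmetric) latency between nodes i and j.
    Returns the indices of the nodes chosen to host controllers, in pick order.
    """
    n = len(latency)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of nodes")
    chosen = [first]
    # dist[i] is the latency from node i to its nearest chosen controller so far.
    dist = list(latency[first])
    while len(chosen) < k:
        # Place the next controller at the node that is currently worst served.
        far = max(range(n), key=lambda i: dist[i])
        chosen.append(far)
        dist = [min(dist[i], latency[far][i]) for i in range(n)]
    return chosen


def worst_case_latency(latency: Sequence[Sequence[float]], controllers: Sequence[int]) -> float:
    """Maximum, over all nodes, of the latency to the nearest controller."""
    return max(min(latency[c][i] for c in controllers) for i in range(len(latency)))


if __name__ == "__main__":
    # Hypothetical 5-node line topology with unit latency per hop.
    lat = [[abs(i - j) for j in range(5)] for i in range(5)]
    ctrls = greedy_k_center(lat, 2)
    print(ctrls, worst_case_latency(lat, ctrls))  # [0, 4] 2
```

For metric latencies, this greedy rule is known to stay within a factor of two of the optimal worst-case latency, which makes it a reasonable baseline; a real planner would also weigh fault tolerance and inter-controller coordination cost.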

    System-level power optimization: techniques and tools

    This tutorial surveys methods for energy-efficient system-level design. We consider electronic systems consisting of a hardware platform and software layers. We consider the three major energy-consuming constituents of hardware, namely computation, communication, and storage units, and we review methods of reducing their energy consumption. We also study models for analyzing the energy cost of software, and methods for energy-efficient software design and compilation. This survey is organized around three main phases of system design: conceptualization and modeling, design and implementation, and runtime management. For each phase, we review recent techniques for energy-efficient design of both hardware and software.

    Efficient Learning Machines

    Computer science

    Analytical cost metrics: days of future past

    2019 Summer. Includes bibliographical references. Future exascale high-performance computing (HPC) systems are expected to be increasingly heterogeneous, consisting of several multi-core CPUs and a large number of accelerators: special-purpose hardware that will increase the computing power of the system in a very energy-efficient way. Specialized, energy-efficient accelerators are also an important component in many diverse systems beyond HPC: gaming machines, general-purpose workstations, tablets, phones, and other media devices. With Moore's law driving the evolution of hardware platforms towards exascale, the dominant performance metric (time efficiency) has expanded to also incorporate power/energy efficiency. This work builds analytical cost models for cost metrics such as time, energy, memory access, and silicon area. These models are used to predict application performance, to tune performance, and to guide chip design. The idea is to work with domain-specific accelerators, where analytical cost models can be used accurately for performance optimization. The performance optimization problems are formulated as mathematical optimization problems. This work applies the analytical cost modeling and mathematical optimization approach in several ways. For stencil applications and GPU architectures, analytical cost models are developed for both execution time and energy. The models are used for performance tuning on existing architectures, and are coupled with silicon area models of GPU architectures to generate highly efficient architecture configurations. For matrix chain products, analytical closed-form solutions for off-chip data movement are built and used to minimize the total data movement cost of a minimum op count tree.
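A minimum op count tree for a matrix chain is an optimal parenthesization under the scalar-multiplication count. As a minimal sketch, the standard dynamic program for that count is shown here; it does not reproduce the closed-form off-chip data-movement models described above, and `matrix_chain_order` and its string output format are hypothetical.

```python
from functools import lru_cache
from typing import Sequence, Tuple


def matrix_chain_order(dims: Sequence[int]) -> Tuple[int, str]:
    """Minimum scalar-multiplication count and parenthesization for A1 x ... x An.

    Matrix A(i+1) has shape dims[i] x dims[i+1], so len(dims) == n + 1.
    """
    n = len(dims) - 1
    if n < 1:
        raise ValueError("need at least one matrix (two dimensions)")

    @lru_cache(maxsize=None)
    def best(i: int, j: int) -> Tuple[int, str]:
        # Optimal (cost, tree) for the sub-chain of 0-based matrices i..j.
        if i == j:
            return 0, f"A{i + 1}"
        options = []
        for k in range(i, j):  # split into (i..k) x (k+1..j)
            left_cost, left_tree = best(i, k)
            right_cost, right_tree = best(k + 1, j)
            # Multiplying a dims[i] x dims[k+1] result by a dims[k+1] x dims[j+1] result.
            cost = left_cost + right_cost + dims[i] * dims[k + 1] * dims[j + 1]
            options.append((cost, f"({left_tree} {right_tree})"))
        return min(options, key=lambda option: option[0])

    return best(0, n - 1)


if __name__ == "__main__":
    print(matrix_chain_order([10, 30, 5, 60]))  # (4500, '((A1 A2) A3)')
```

The resulting tree fixes the operation count; a data-movement cost model could then be evaluated on that tree, or used as a different objective in the same recurrence.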

    From constraint programming to heterogeneous parallelism

    The scaling limitations of multi-core processor development have led to a diversification of the processor cores used within individual computers. Heterogeneous computing, involving the cooperation of several structurally different processor cores, has become widespread. Central processor (CPU) cores are most frequently complemented with graphics processors (GPUs), which, despite their name, are suitable for many highly parallel computations besides computer graphics. Furthermore, deep learning accelerators are rapidly gaining relevance. Many applications could profit from heterogeneous computing but are held back by the surrounding software ecosystems. Heterogeneous systems are a particular challenge for compilers, which usually target only the increasingly marginalised homogeneous CPU cores. Therefore, heterogeneous acceleration is primarily accessible via libraries and domain-specific languages (DSLs), requiring application rewrites and resulting in vendor lock-in. This thesis presents a compiler method for automatically targeting heterogeneous hardware from existing sequential C/C++ source code. A new constraint programming method enables the declarative specification and automatic detection of computational idioms within compiler intermediate representation code. Examples of computational idioms are stencils, reductions, and linear algebra. Computational idioms denote algorithmic structures that commonly occur in performance-critical loops. Consequently, well-designed accelerator DSLs and libraries support computational idioms with their programming models and function interfaces. Detecting computational idioms in the compiler middle end enables compilers to incorporate DSL and library backends for code generation. These backends leverage domain knowledge for the efficient utilisation of heterogeneous hardware. The constraint programming methodology is first derived on an abstract model and then implemented as an extension to LLVM.
Two constraint programming languages are designed to target this implementation: the Compiler Analysis Description Language (CAnDL) and the extended Idiom Detection Language (IDL). These languages are evaluated on a range of different compiler problems, culminating in a complete heterogeneous acceleration pipeline integrated with the Clang C/C++ compiler. This pipeline was evaluated on the established benchmark collections NPB and Parboil. The approach was applicable to 10 of the benchmark programs, yielding significant speedups ranging from 1.26× on “histo” to 275× on “sgemm” relative to the sequential baseline versions. In summary, this thesis shows that the automatic recognition of computational idioms during compilation enables the heterogeneous acceleration of sequential C/C++ programs. Moreover, it shows that computational idioms can be specified in novel declarative programming languages, and that constraint programming on Static Single Assignment (SSA) intermediate code is a suitable method for detecting them automatically.
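To illustrate what it means to specify an idiom declaratively as constraints over SSA code, here is a toy sketch that detects a sum reduction in a simplified loop body. It is written in Python rather than CAnDL or IDL, it brute-forces every assignment of instructions to variables instead of using a real constraint solver, and the `Instr` type, the opcode strings, `SUM_REDUCTION`, and `detect` are hypothetical simplifications of LLVM IR, not the thesis's implementation.

```python
from dataclasses import dataclass
from itertools import product
from typing import Callable, Dict, List, Sequence, Tuple


@dataclass(frozen=True)
class Instr:
    """Toy SSA instruction: defines `name` by applying `opcode` to `operands`."""
    name: str
    opcode: str
    operands: Tuple[str, ...] = ()


Assignment = Dict[str, Instr]
Constraint = Callable[[Assignment, List[Instr]], bool]


def uses_of(ir: List[Instr], value: str) -> List[Instr]:
    """All instructions that read the SSA value `value`."""
    return [ins for ins in ir if value in ins.operands]


# Declarative idiom specification: each entry is one constraint over the variables.
SUM_REDUCTION: List[Constraint] = [
    lambda a, ir: a["acc"].opcode == "phi",                       # loop-carried accumulator
    lambda a, ir: a["update"].opcode == "add",                    # combined with an addition
    lambda a, ir: a["acc"].name in a["update"].operands,          # update reads the accumulator
    lambda a, ir: a["update"].name in a["acc"].operands,          # and feeds back into the phi
    lambda a, ir: uses_of(ir, a["acc"].name) == [a["update"]],    # no other use of the partial sum
]


def detect(ir: List[Instr], variables: Sequence[str], constraints: List[Constraint]) -> List[Dict[str, str]]:
    """Exhaustively try every assignment of instructions to variables."""
    found = []
    for combo in product(ir, repeat=len(variables)):
        assignment = dict(zip(variables, combo))
        if all(c(assignment, ir) for c in constraints):
            found.append({v: ins.name for v, ins in assignment.items()})
    return found


# Hypothetical loop body for `acc += a[i]`, in a simplified SSA notation.
EXAMPLE_LOOP = [
    Instr("%i", "phi", ("0", "%i.next")),
    Instr("%acc", "phi", ("0.0", "%acc.next")),
    Instr("%p", "gep", ("%a", "%i")),
    Instr("%x", "load", ("%p",)),
    Instr("%acc.next", "add", ("%acc", "%x")),
    Instr("%i.next", "add", ("%i", "1")),
    Instr("%cond", "icmp_lt", ("%i.next", "%n")),
]

if __name__ == "__main__":
    print(detect(EXAMPLE_LOOP, ["acc", "update"], SUM_REDUCTION))
```

The induction variable `%i` also forms a phi/add cycle, but it is rejected by the last constraint because the address computation reads it too; only `%acc` and `%acc.next` match. A real solver would prune partial assignments as soon as a constraint fails instead of enumerating every combination.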