3,984 research outputs found

    A Study on Performance and Power Efficiency of Dense Non-Volatile Caches in Multi-Core Systems

    Full text link
    In this paper, we present a novel cache design based on Multi-Level Cell Spin-Transfer Torque RAM (MLC STTRAM) that can dynamically adapt the set capacity and associativity to efficiently use the full potential of MLC STTRAM. We exploit the asymmetric nature of the MLC storage scheme to build cache lines with heterogeneous performance: half of the cache lines are read-friendly, while the other half is write-friendly. Furthermore, we propose to opportunistically deactivate ways in underutilized sets to convert MLC to Single-Level Cell (SLC) mode, which offers better overall performance and lifetime. Our ultimate goal is to build a cache architecture that combines the capacity advantages of MLC with the performance and energy advantages of SLC. Our experiments show an improvement of 43% in total number of conflict misses, 27% in memory access latency, 12% in system performance, and 26% in LLC access energy, with a slight degradation in cache lifetime (about 7%) compared to an SLC cache.
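    As a loose illustration of the way-deactivation idea above, the following C sketch models a single set that drops to half associativity and SLC mode when it sees little reuse; the structure, field names, and the UNDERUSE_LIM threshold are hypothetical and not taken from the paper.

        /* Hypothetical sketch of opportunistic way deactivation: when a set is
         * underutilized, half of its ways are disabled so the remaining cells
         * can operate in faster, longer-lived SLC mode. */
        #include <stdbool.h>
        #include <stdio.h>

        #define WAYS         8   /* associativity in MLC mode           */
        #define UNDERUSE_LIM 2   /* distinct lines touched per interval */

        struct cache_set {
            int  live_lines;     /* distinct lines referenced this interval */
            int  active_ways;    /* ways currently enabled                  */
            bool slc_mode;       /* true once half the ways are disabled    */
        };

        /* Called at the end of each monitoring interval. */
        static void adapt_set(struct cache_set *s)
        {
            if (!s->slc_mode && s->live_lines <= UNDERUSE_LIM) {
                s->active_ways = WAYS / 2;   /* disable ways ...         */
                s->slc_mode    = true;       /* ... and fall back to SLC */
            } else if (s->slc_mode && s->live_lines > WAYS / 2) {
                s->active_ways = WAYS;       /* demand grew: restore MLC */
                s->slc_mode    = false;
            }
            s->live_lines = 0;               /* start a new interval     */
        }

        int main(void)
        {
            struct cache_set s = { .live_lines = 1, .active_ways = WAYS };
            adapt_set(&s);
            printf("active ways = %d, SLC mode = %d\n", s.active_ways, s.slc_mode);
            return 0;
        }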

    A Low-Power CoAP for Contiki

    Get PDF
    Internet of Things devices will by and large be battery-operated, but existing application protocols have typically not been designed with power efficiency in mind. In low-power wireless systems, power efficiency is determined by the ability to maintain a low radio duty cycle: keeping the radio off as much as possible. We present an implementation of the IETF Constrained Application Protocol (CoAP) for the Contiki operating system that leverages the ContikiMAC low-power duty cycling mechanism to provide power efficiency. We experimentally evaluate our low-power CoAP, demonstrating that an existing application-layer protocol can be made power-efficient through a generic radio duty cycling mechanism. To the best of our knowledge, our CoAP implementation is the first to provide power-efficient operation through radio duty cycling. Our results question the need for specialized low-power mechanisms at the application layer, suggesting instead that low-power operation can be provided at the radio duty cycling layer alone.
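    To see why the duty cycle is the quantity that matters, a back-of-the-envelope model helps: the radio wakes briefly at a fixed interval for a channel check and is otherwise off, plus a small amount of on-time per CoAP exchange. The constants in this C sketch are illustrative placeholders, not measurements from the paper or ContikiMAC's actual parameters.

        /* Rough duty-cycle estimate for a low-power-listening MAC. */
        #include <stdio.h>

        int main(void)
        {
            const double wake_interval_s = 0.125;  /* channel check every 125 ms */
            const double check_time_s    = 0.002;  /* 2 ms of listening per wake */
            const double tx_time_s       = 0.010;  /* radio-on time per message  */
            const double msgs_per_s      = 0.1;    /* one CoAP exchange per 10 s */

            double duty = check_time_s / wake_interval_s + tx_time_s * msgs_per_s;
            printf("radio duty cycle: %.2f %%\n", duty * 100.0);
            return 0;
        }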

    Elevating commodity storage with the SALSA host translation layer

    Full text link
    To satisfy increasing storage demands in both capacity and performance, industry has turned to multiple storage technologies, including Flash SSDs and SMR disks. These devices employ a translation layer that conceals the idiosyncrasies of their media and enables random access. Device translation layers are, however, inherently constrained: resources on the drive are scarce, they cannot be adapted to application requirements, and they lack visibility across multiple devices. As a result, the performance and durability of many storage devices are severely degraded. In this paper, we present SALSA: a translation layer that executes on the host and allows unmodified applications to better utilize commodity storage. SALSA supports a wide range of single- and multi-device optimizations and, because it is implemented in software, can adapt to specific workloads. We describe SALSA's design and demonstrate its significant benefits using microbenchmarks and case studies based on three applications: MySQL, the Swift object store, and a video server. Comment: Presented at the 2018 IEEE 26th International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems (MASCOTS).
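    The core mechanism a host translation layer builds on, remapping logical block addresses to fresh physical locations on every write, can be sketched in a few lines of C. This toy log-structured map is illustrative only; it does not reflect SALSA's actual design and omits garbage collection entirely.

        #include <stdio.h>

        #define NBLOCKS 1024

        static int l2p[NBLOCKS];    /* logical -> physical mapping           */
        static int write_head = 0;  /* next free physical block (append log) */

        static void init_map(void)
        {
            for (int i = 0; i < NBLOCKS; i++)
                l2p[i] = -1;        /* unmapped */
        }

        /* Redirect a logical write to the current head of the log. */
        static int translate_write(int lba)
        {
            int pba = write_head++ % NBLOCKS;  /* real code would reclaim space */
            l2p[lba] = pba;
            return pba;
        }

        int main(void)
        {
            init_map();
            printf("lba 7 -> pba %d\n", translate_write(7));
            printf("lba 7 -> pba %d (rewrite lands on a new block)\n", translate_write(7));
            return 0;
        }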

    Distributed N-body Simulation on the Grid Using Dedicated Hardware

    Full text link
    We present performance measurements of direct gravitational N-body simulation on the grid, with and without specialized (GRAPE-6) hardware. Our intercontinental virtual organization consists of three sites, one in Tokyo, one in Philadelphia and one in Amsterdam. We run simulations with up to 196,608 particles for a variety of topologies. In many cases, high-performance simulations spanning the entire planet are dominated by network bandwidth rather than latency. With this global grid of GRAPEs our calculation time remains dominated by communication over the entire range of N, which was limited by the use of three sites. Increasing the number of particles will result in a more efficient execution. Based on these timings we construct and calibrate a model to predict the performance of our simulation on any grid infrastructure with or without GRAPE. We apply this model to predict the simulation performance on the Netherlands DAS-3 wide-area computer. Equipping the DAS-3 with GRAPE-6Af hardware would achieve break-even between calculation and communication at a few million particles, resulting in a compute time of just over ten hours for 1 N-body time unit. Key words: high-performance computing, grid, N-body simulation, performance modelling. Comment: (in press) New Astronomy, 24 pages, 5 figures.
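    A heavily simplified version of the kind of cost model the abstract calibrates: per block step, compute grows with the number of active particles times N, while wide-area communication is dominated by a fixed round-trip latency. Every constant in this C sketch (force cost, GRAPE throughput, active fraction, latency) is a placeholder, so the printed break-even point is only an order-of-magnitude illustration, not the paper's result.

        #include <math.h>
        #include <stdio.h>

        int main(void)
        {
            const double flops_per_pair = 60.0;  /* cost of one force evaluation    */
            const double grape_rate     = 2e12;  /* flop/s of a GRAPE-equipped site */
            const double active_frac    = 0.01;  /* particles active per block step */
            const double rtt_per_step   = 0.3;   /* s of wide-area latency per step */

            /* t_calc ~ fpp * (frac * N) * N / rate,  t_comm ~ rtt
             * => break-even at N = sqrt(rtt * rate / (fpp * frac))        */
            double n_break = sqrt(rtt_per_step * grape_rate /
                                  (flops_per_pair * active_frac));
            printf("break-even at roughly N = %.0f particles\n", n_break);
            return 0;
        }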

    Computer Architectures to Close the Loop in Real-time Optimization

    Get PDF
    © 2015 IEEE. Many modern control, automation, signal processing and machine learning applications rely on solving a sequence of optimization problems, which are updated with measurements of a real system that evolves in time. The solution of each of these optimization problems is then used to make decisions, which may be followed by changing some parameters of the physical system, thereby resulting in a feedback loop between the computing and the physical system. Real-time optimization is not the same as fast optimization, because the computation is affected by an uncertain system that evolves in time. The suitability of a design should therefore not be judged from the optimality of a single optimization problem, but based on the evolution of the entire cyber-physical system. The algorithms and hardware used for solving a single optimization problem in the office might therefore be far from ideal when solving a sequence of real-time optimization problems. Instead of there being a single, optimal design, one has to trade off a number of objectives, including performance, robustness, energy usage, size and cost. We therefore provide here a tutorial introduction to some of the questions and implementation issues that arise in real-time optimization applications. We concentrate on some of the decisions that have to be made when designing the computing architecture and algorithm, and argue that the choice of one informs the other.
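    The feedback structure described above (measure, re-solve, apply) can be sketched as a loop in which the solver gets a fixed iteration budget each sampling period and is warm-started from its previous answer. In this C sketch the scalar quadratic cost and the stub measure/apply_decision functions are stand-ins for a real optimization problem and physical system; none of it is taken from the paper.

        #include <stdio.h>

        static double measure(int k)           { return 1.0 + 0.1 * k; }  /* plant output (stub) */
        static void   apply_decision(double u) { (void)u; }               /* actuate (stub)      */

        int main(void)
        {
            double u = 0.0;                    /* warm start carried across periods      */
            for (int k = 0; k < 20; k++) {     /* one pass per sampling period           */
                double y = measure(k);         /* fresh measurement updates the cost     */
                for (int it = 0; it < 5; it++) /* fixed iteration budget, not optimality */
                    u -= 0.2 * 2.0 * (u - y);  /* gradient step on (u - y)^2             */
                apply_decision(u);             /* close the loop                         */
                printf("k = %2d  u = %.3f\n", k, u);
            }
            return 0;
        }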

    Low Power System Design Techniques for Mobile Computers

    Get PDF
    Portable products are being used increasingly. Because these systems are battery powered, reducing power consumption is vital. In this report we describe the properties of low-power design and techniques for exploiting them in the system architecture. We focus on minimizing capacitance, avoiding unnecessary and wasteful activity, and reducing voltage and frequency. We review energy-reduction techniques in the architecture and design of a hand-held computer and the wireless communication system, including error control, system decomposition, communication and MAC protocols, and low-power short-range networks.
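    For reference, the standard first-order CMOS relation behind the voltage and frequency reduction mentioned above (symbols: activity factor alpha, switched capacitance C, supply voltage V_dd, clock frequency f):

        P_{\mathrm{dyn}} \approx \alpha\, C\, V_{dd}^{2}\, f,
        \qquad
        E_{\mathrm{op}} \approx \alpha\, C\, V_{dd}^{2}

    Because the achievable clock frequency falls roughly linearly with V_dd, scaling voltage and frequency down together reduces dynamic power roughly cubically, while energy per operation falls quadratically.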
