158 research outputs found

    A comprehensive approach to DRAM power management

    This paper describes a comprehensive approach for using the memory controller to improve DRAM energy efficiency and manage DRAM power. We make three contributions: (1) we describe a simple power-down policy for exploiting low power modes of modern DRAMs; (2) we show how the idea of adaptive history-based memory schedulers can be naturally extended to manage power and energy; and (3) for situations in which additional DRAM power reduction is needed, we present a throttling approach that arbitrarily reduces DRAM activity by delaying the issuance of memory commands. Using detailed microarchitectural simulators of the IBM Power5+ and a DDR2-533 SDRAM, we show that our first two techniques combine to increase DRAM energy efficiency by an average of 18.2%, 21.7%, 46.1%, and 37.1% for the Stream, NAS, SPEC2006fp, and commercial benchmarks, respectively. We also show that our throttling approach provides performance that is within 4.4% of an idealized oracular approach.

    FaulTM: Fault-tolerance using hardware transactional memory

    Fault-tolerance has become an essential concern for processor designers due to increasing soft-error rates. In this study, we are motivated by the fact that Transactional Memory (TM) hardware provides an ideal base upon which to build a fault-tolerant system. We show how it is possible to provide low-cost fault-tolerance for serial programs by using a minimally modified Hardware Transactional Memory (HTM) that features lazy conflict detection and lazy data versioning. This scheme, called FaulTM, employs a hybrid hardware-software fault-tolerance technique. On the software side, the FaulTM programming model gives programmers the flexibility to trade off performance against reliability. Our experimental results indicate that FaulTM incurs relatively low performance overhead by reducing the number of comparisons and by leveraging already proposed TM hardware. We also conduct experiments which indicate that the baseline FaulTM design has good error coverage. To the best of our knowledge, this is the first architectural fault-tolerance proposal using Hardware Transactional Memory. Peer reviewed. Postprint (published version).

    Automatic SMT threading for OpenMP applications on the Intel Xeon Phi co-processor

    Simultaneous multithreading is a technique that can improve performance when running parallel applications on the Intel Xeon Phi co-processor. Selecting the most efficient thread count is, however, non-trivial, as the potential increase in efficiency has to be balanced against other, potentially negative factors such as inter-thread competition for cache capacity and increased synchronization overheads. In this paper, we extend CRUST (ClusteR-aware Under-subscribed Scheduling of Threads), a technique for finding the optimum thread count of OpenMP applications running on clustered cache architectures, to take the behavior of simultaneous multithreading on the Xeon Phi into account. CRUST can automatically find the optimum thread count at sub-application granularity by exploiting application phase behavior at OpenMP parallel section boundaries, and uses hardware performance counter information to gain insight into the application's behavior. We implement a CRUST prototype inside the Intel OpenMP runtime library and show its efficiency running on real Xeon Phi hardware.

    Circuit design of a dual-versioning L1 data cache for optimistic concurrency

    This paper proposes a novel L1 data cache design with dual-versioning SRAM cells (dvSRAM) for chip multi-processors (CMP) that implement optimistic concurrency proposals. In this new cache architecture, each dvSRAM cell has two cells, a main cell and a secondary cell, which keep two versions of the same data. These values can be accessed, modified, and moved back and forth between the main and secondary cells within the access time of the cache. We design and simulate a 32-KB dual-versioning L1 data cache with 45nm CMOS technology at 2GHz processor frequency and 1V supply voltage, which we describe in detail. We also introduce three well-known use cases that make use of optimistic concurrency execution and that can benefit from our proposed design. Moreover, we evaluate one of the use cases to show the impact of the dual-versioning cell on both performance and energy consumption. Our experiments show that large speedups can be achieved with acceptable overall energy dissipation. Postprint (published version).
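The two-version idea in the abstract above can be illustrated with a toy software model. This is only a sketch of the concept, not the circuit or the authors' design: the class name `DualVersionCell` and the methods `save` and `restore` are hypothetical names chosen for illustration, and the use of the secondary version as a fallback copy during speculation is an assumption about how such a cell might be used.

```python
class DualVersionCell:
    """Toy software model of one dual-versioning storage location (hypothetical)."""

    def __init__(self, value=0):
        self.main = value        # version seen by normal loads and stores
        self.secondary = value   # second version held alongside the main one

    def save(self):
        """Copy main -> secondary (e.g. before speculative updates start)."""
        self.secondary = self.main

    def restore(self):
        """Copy secondary -> main (e.g. to undo speculative updates)."""
        self.main = self.secondary


cell = DualVersionCell(5)
cell.save()              # keep 5 as the fallback version
cell.main = 7            # speculative update touches only the main version
speculative = cell.main
cell.restore()           # speculation failed: bring back the saved version
recovered = cell.main
```

In this model both copies live in the same object, which loosely mirrors the abstract's point that the two versions sit in the same cell and can be exchanged without going to another level of the memory hierarchy.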

    Tuberculous Abdominal Cocoon

    From plasma to beefarm: Design experience of an FPGA-based multicore prototype

    In this paper, we take a MIPS-based open-source uniprocessor soft core, Plasma, and extend it to obtain the Beefarm infrastructure for FPGA-based multiprocessor emulation, a popular research topic of the last few years both in the FPGA and the computer architecture communities. We discuss various design tradeoffs and we demonstrate superior scalability through experimental results compared to traditional software instruction set simulators. Based on our experience of designing and building a complete FPGA-based multiprocessor emulation system that supports run-time and compiler infrastructure and on the actual executions of our experiments running Software Transactional Memory (STM) benchmarks, we comment on the pros, cons and future trends of using hardware-based emulation for research. Peer reviewed. Postprint (author's final draft).

    Hardware transactional memory with software-defined conflicts

    In this paper we propose conflict-defined blocks, a programming language construct that allows programmers to change the concept of conflict from one transaction to another, or even throughout the course of the same transaction. Defining conflicts in software makes possible the removal of dependencies which, though not necessary for the correct execution of the transactions, arise as a result of the coarse synchronization style encouraged by TM. Programmers take advantage of their knowledge about the problem and specify through conflict-defined blocks what types of dependencies are superfluous in a certain part of the transaction, in order to extract more performance out of coarse-grained transactions without having to write minimally synchronized code. Our experiments with several transactional benchmarks reveal that, using software-defined conflicts, the programmer achieves significant reductions in the number of aborted transactions and improved scalability. Peer reviewed. Postprint (author's final draft).

    Neutral Higgs bosons in the MNMSSM with explicit CP violation

    Within the framework of the minimal non-minimal supersymmetric standard model (MNMSSM) with tadpole terms, CP violation effects in the Higgs sector are investigated at the one-loop level, where the radiative corrections from the loops of the quarks and squarks of the third generation are taken into account. Assuming that the squark masses are not degenerate, the radiative corrections due to the stop and sbottom quarks give rise to CP phases, which trigger the CP violation explicitly in the Higgs sector of the MNMSSM. The masses, the branching ratios for dominant decay channels, and the total decay widths of the five neutral Higgs bosons in the MNMSSM are calculated in the presence of the explicit CP violation. The dependence of these quantities on the CP phases is quite recognizable for given parameter values. Comment: 25 pages, 8 figures.

    Passivation behaviour of Alloy 31 (UNS N08031) in polluted phosphoric acid at different temperatures

    The influence of temperature (20–80 °C) and chloride concentration (0.06–0.42 wt.% KCl) on the electrochemical behaviour of UNS N08031 was studied in 40 wt.% polluted phosphoric acid solution. Passivation behaviour was investigated by using potentiostatic tests at different potentials. From the linear regions of the log i vs. log t transients, the parameter n was obtained. The results showed that the applied potential hardly affects the passivation rate n. However, n values decreased when temperature increased. The values of n demonstrated that the passive film formed on Alloy 31 was compact and highly protective.
    The authors express their gratitude to the MAEC of Spain (PCI Mediterraneo C/8196/07, C/018046/08, D/023608/09 and D/030177/10), to Programa de Apoyo a la Investigacion y Desarrollo de la UPV (PAID-06-09) and to the Generalitat Valenciana (GV/2011/093) for the financial support and to Dr. Asuncion Jaime for her translation assistance.
    Escrivá Cerdán, C.; Blasco Tamarit, ME.; García García, DM.; García Antón, J.; Guenbour, A. (2012). Passivation behaviour of Alloy 31 (UNS N08031) in polluted phosphoric acid at different temperatures. Corrosion Science. 56:114-122. https://doi.org/10.1016/j.corsci.2011.11.014
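The Alloy 31 abstract obtains n from the linear region of a log i vs. log t transient. A minimal sketch of that slope extraction follows. It assumes the commonly used power-law decay i = A·t^(-n), so that n is the negative slope of log i against log t; the abstract does not state this relation. The function name `passivation_rate` is hypothetical, and the data are synthetic values generated for illustration, not measurements from the paper.

```python
import math


def passivation_rate(times_s, currents):
    """Return n = -slope of a least-squares line through (log10 t, log10 i)."""
    xs = [math.log10(t) for t in times_s]
    ys = [math.log10(i) for i in currents]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return -sxy / sxx


# Synthetic transient (assumed for illustration): i = 1e-3 * t^-0.8
times = [1.0, 2.0, 5.0, 10.0, 20.0, 50.0, 100.0]
currents = [1e-3 * t ** -0.8 for t in times]
n = passivation_rate(times, currents)
```

With real data, only the points in the visually linear part of the transient would be passed in, since the fit is meaningful only where the power-law assumption holds.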