
    Software directed issue queue power reduction

    The issue logic of a superscalar processor dissipates a large amount of static and dynamic power. Furthermore, its power density makes it a hot spot requiring expensive cooling systems and additional packaging. In this paper we present a novel software-assisted approach to power reduction in which the processor dynamically resizes the issue queue based on compiler analysis. The compiler passes information to the processor about the number of queue entries needed, which limits the number of instructions dispatched to and resident in the queue. This saves power without adversely affecting performance. Compared with recently proposed hardware techniques, our approach is faster, simpler and saves more power. Using a simple scheme, we achieve 47% dynamic and 31% static power savings in the issue queue with only a 2.2% performance loss. We then show that the performance loss can be reduced to less than 1.3% while still achieving 45% dynamic and 30% static power savings, outperforming all current approaches. Postprint (published version).
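The abstract does not say how the needed entry count is derived. The following is a minimal sketch under a toy machine model and is not the paper's algorithm. It assumes in-order dispatch of `width` instructions per cycle, unlimited issue bandwidth, and fixed latencies of at least one cycle. Under those assumptions it finds the smallest queue size that leaves a straight-line region's completion time unchanged. The names `simulate` and `min_entries` are hypothetical.

```python
from typing import List, Sequence


def simulate(deps: Sequence[Sequence[int]], latency: Sequence[int],
             width: int, q_size: int) -> int:
    """Return the cycle at which the region's last result is available.

    deps[i] lists the producers (indices < i) that instruction i reads.
    Toy model: in-order dispatch of up to `width` instructions per cycle into
    a queue of `q_size` entries; any number of ready instructions may issue.
    """
    assert q_size >= 1 and all(lat >= 1 for lat in latency)
    n = len(latency)
    done_at: List[int] = [-1] * n      # -1 means "not issued yet"
    queue: List[int] = []
    next_disp, cycle = 0, 0
    while next_disp < n or queue:
        # Issue every queued instruction whose operands are available.
        for i in list(queue):
            if all(0 <= done_at[p] <= cycle for p in deps[i]):
                done_at[i] = cycle + latency[i]
                queue.remove(i)
        # Dispatch in program order into whatever entries are free.
        dispatched = 0
        while next_disp < n and dispatched < width and len(queue) < q_size:
            queue.append(next_disp)
            next_disp += 1
            dispatched += 1
        cycle += 1
    return max(done_at, default=0)


def min_entries(deps: Sequence[Sequence[int]], latency: Sequence[int],
                width: int) -> int:
    """Smallest queue size that does not delay the region's completion."""
    n = len(latency)
    if n == 0:
        return 0
    target = simulate(deps, latency, width, n)   # n entries never limit dispatch
    for q in range(1, n + 1):
        if simulate(deps, latency, width, q) == target:
            return q
    return n


# A serial dependence chain needs far fewer entries than independent work.
chain = [[]] + [[i - 1] for i in range(1, 8)]
independent = [[] for _ in range(8)]
print(min_entries(chain, [1] * 8, 4))        # 1
print(min_entries(independent, [1] * 8, 4))  # 4
```

The example shows why sizing per region can save power. Along a dependence chain, instructions dispatched early only wait in the queue, so extra entries buy no performance.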

    Optimizing Embedded Software of Self-Powered IoT Edge Devices for Transient Computing

    IoT edge computing is becoming increasingly popular because it can significantly reduce the burden on cloud servers by offloading tasks from the cloud to the edge, where the majority of IoT devices reside. Currently, there are trillions of edge devices all over the world, and this number keeps increasing. A vast number of edge devices operate under power-constrained conditions, for example in outdoor environmental monitoring. Considering cost and long-term sustainability, self-powering through energy harvesting is the preferred option for these IoT edge devices. Nevertheless, a common and critical drawback of self-powered IoT edge devices is that their runtime state in volatile memory such as SRAM is lost during a power outage. With state-of-the-art non-volatile processors (NVPs), the volatile runtime state can be saved into on-chip non-volatile memory before the power outage and restored when harvested power becomes available again. Yet the potential of self-powered IoT edge devices is still limited by their intrinsically low energy efficiency and reliability. To fully exploit the potential of existing self-powered IoT edge devices, this dissertation aims to optimize their energy efficiency and reliability through several software approaches. First, to prevent loss of execution progress during power outages, NVP-aware task schedulers are proposed that maximize overall task execution progress, especially for atomic tasks, whose unfinished progress is lost even if it has been checkpointed. Second, to minimize both the time and energy overheads of checkpointing to non-volatile memory, an intelligent checkpointing scheme is proposed. It not only ensures that each checkpoint completes successfully but also predicts whether a checkpoint is necessary, avoiding excessive checkpointing overhead. Third, to avoid running at a CPU clock frequency with low energy utility, a CPU frequency modulator is proposed that adjusts the runtime CPU clock frequency adaptively. Finally, to operate in scenarios with ultra-low harvesting power, a lightweight software paradigm is proposed that helps maximize the energy extraction rate of the combined energy harvester and power regulator. In addition, checkpointing is further optimized for more energy-efficient, lightweight operation.
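The abstract does not give the checkpointing policy itself. The following is a minimal sketch of the two ideas it names, ensuring the checkpoint completes and predicting whether one is needed. It is not the dissertation's scheme. It assumes the usable energy in a storage capacitor of size C, between the current voltage V and a brown-out voltage Vmin, is E = 0.5 * C * (V^2 - Vmin^2). A checkpoint is skipped when predicted harvesting power covers the load, since no outage is then expected. Otherwise it is triggered once the stored energy falls to a safety margin above the cost of one checkpoint. All field names, the margin value and the prediction input are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PowerState:
    # All fields are hypothetical inputs a runtime might sample or estimate.
    cap_farads: float        # storage capacitor size C (F)
    v_now: float             # current capacitor voltage (V)
    v_min: float             # brown-out voltage below which the NVP stops (V)
    p_load: float            # current power draw of the device (W)
    p_harvest_pred: float    # predicted harvesting power (W)
    e_checkpoint: float      # energy needed to save the volatile state (J)


def usable_energy(s: PowerState) -> float:
    """Energy released as the capacitor discharges from v_now to v_min."""
    return 0.5 * s.cap_farads * (s.v_now ** 2 - s.v_min ** 2)


def should_checkpoint(s: PowerState, margin: float = 1.2) -> bool:
    # Prediction step: if harvesting is expected to sustain the load,
    # no outage is expected, so skip the checkpoint and its overhead.
    if s.p_harvest_pred >= s.p_load:
        return False
    # Safety step: trigger once the remaining energy falls to `margin` times
    # the checkpoint cost, so that the save can still complete.
    return usable_energy(s) <= margin * s.e_checkpoint


low = PowerState(100e-6, 2.0, 1.8, 1e-3, 2e-4, 4e-5)
print(should_checkpoint(low))                                              # True
print(should_checkpoint(PowerState(100e-6, 3.0, 1.8, 1e-3, 2e-4, 4e-5)))  # False
```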

    Verification of Concurrent Systems: Optimality, Scalability and Applicability

    Unpublished thesis of the Universidad Complutense de Madrid, Facultad de Informática, defended on 14-10-2020. Both verification and testing of concurrent systems require exploring all possible non-deterministic interleavings that a concurrent execution may have, as any of these interleavings may reveal erroneous behavior of the system. This introduces a combinatorial explosion in the number of program states that must be considered, which often leads to a computationally intractable problem. The overall goal of this thesis is to investigate novel techniques for testing and verification of concurrent programs that reduce this combinatorial explosion...
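The following small sketch illustrates why every interleaving matters and why their number explodes. It is not a technique from the thesis. It enumerates all interleavings of two threads that each perform a non-atomic increment on a shared counter, then runs each schedule. Only some schedules expose the lost update, with a final value of 1 instead of 2. For two threads of m and n steps the number of schedules is C(m+n, n), already 184,756 for two 10-step threads. Naive exhaustive enumeration like this is the baseline that reduction techniques try to improve on. The `Step` encoding and function names are illustrative.

```python
from math import comb
from typing import List, Tuple

Step = Tuple[int, str]   # (thread id, "read" or "write") on a shared counter x


def interleavings(a: List[Step], b: List[Step]) -> List[List[Step]]:
    """All merges of a and b that preserve each thread's program order."""
    if not a:
        return [list(b)]
    if not b:
        return [list(a)]
    return ([[a[0]] + rest for rest in interleavings(a[1:], b)] +
            [[b[0]] + rest for rest in interleavings(a, b[1:])])


def run(schedule: List[Step]) -> int:
    """Execute a non-atomic increment (x = x + 1) split into read and write."""
    x = 0
    local = {}
    for tid, action in schedule:
        if action == "read":
            local[tid] = x          # load shared x into a thread-local copy
        else:
            x = local[tid] + 1      # store the incremented private copy
    return x


t1 = [(1, "read"), (1, "write")]
t2 = [(2, "read"), (2, "write")]
schedules = interleavings(t1, t2)
print(len(schedules), comb(4, 2))            # 6 6
print(sorted({run(s) for s in schedules}))   # [1, 2]
print(comb(20, 10))                          # 184756
```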

    Characterization and Avoidance of Critical Pipeline Structures in Aggressive Superscalar Processors

    In recent years, with only small fractions of modern processors now reachable in a single cycle, computer architects constantly fight against signal propagation delay across the die. Unfortunately, this trend continues to shift inward, and now even the most internal features of the pipeline are designed around communication, not computation. To address the inward creep of this constraint, this work focuses on three areas: the characterization of communication within the pipeline itself, architectural techniques to avoid that communication when possible, and layout co-design for early detection of problems. I present a novel detection tool for common-case operand movement that can rapidly characterize an application's dataflow patterns. The results are well suited for exploitation, as a small number of patterns can describe a significant portion of modern applications. Work on dynamic dependence collapsing builds on these pattern results and shows how certain groups of operations can be dynamically grouped, avoiding unnecessary communication between individual instructions. This technique also amplifies the efficiency of pipeline data structures such as the reorder buffer, increasing both IPC and frequency. I also identify the same sets of collapsible instructions at compile time, producing the same benefits with minimal hardware complexity. This technique is backward compatible, as the groups are exposed by simply reordering the binary's instructions. I present aggressive pipelining approaches for these resources that avoid the critical timing often presumed necessary in aggressive superscalar processors. Because these structures are designed for the worst case, pipelining them can yield a frequency benefit greater than the IPC loss. I also exploit the observation that the dynamic issue order of instructions in aggressive superscalar processors is predictable. Thus, a hardware mechanism is introduced for efficiently caching the wakeup order of groups of instructions. These wakeup vectors are then used to speculatively schedule instructions, avoiding dynamic scheduling when it is not necessary. Finally, I present a novel approach to fast, high-quality chip layout. Because it allows architects to quickly evaluate what-if scenarios during early high-level design, this approach makes chip designs less likely to encounter implementation problems later in the process. Ph.D. Committee Chair: Scott Wills; Committee Member: David Schimmel; Committee Member: Gabriel Loh; Committee Member: Hsien-Hsin Lee; Committee Member: Yorai Ward
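The following is a rough illustration of compile-time grouping. It scans a basic block for producer/consumer pairs of simple ALU operations in which the consumer is the only reader of the producer's result. The collapsibility criterion, the `SIMPLE_OPS` set and the `Inst` type are assumptions made for illustration only. The dissertation's actual grouping rules are not given in the abstract.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical set of single-cycle ALU opcodes considered collapsible.
SIMPLE_OPS = {"add", "sub", "and", "or", "xor", "shl", "shr"}


@dataclass
class Inst:
    op: str
    dst: Optional[str]
    srcs: Tuple[str, ...]


def collapsible_pairs(block: List[Inst]) -> List[Tuple[int, int]]:
    """Return (producer, consumer) index pairs that could be collapsed.

    A pair qualifies when both ops are simple and the consumer is the only
    reader of the producer's value inside the block. Liveness past the end
    of the block is ignored in this sketch.
    """
    pairs: List[Tuple[int, int]] = []
    used = set()
    for i, prod in enumerate(block):
        if i in used or prod.op not in SIMPLE_OPS or prod.dst is None:
            continue
        readers = []
        for j in range(i + 1, len(block)):
            if prod.dst in block[j].srcs:
                readers.append(j)
            if block[j].dst == prod.dst:
                break           # value overwritten; later reads see a new one
        if len(readers) == 1:
            j = readers[0]
            if j not in used and block[j].op in SIMPLE_OPS:
                pairs.append((i, j))
                used.update((i, j))
    return pairs


block = [
    Inst("add", "r1", ("r2", "r3")),
    Inst("sub", "r4", ("r1", "r5")),
    Inst("xor", "r6", ("r4", "r4")),
    Inst("mul", "r7", ("r6", "r2")),
]
print(collapsible_pairs(block))   # [(0, 1)]
```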

    A low-power cache system for high-performance processors

    Degree system: new; Report number: Kō No. 3439; Degree type: Doctor of Engineering; Date conferred: 12-Sep-11; Waseda University degree number: Shin 576

    Banked microarchitectures for complexity-effective superscalar microprocessors

    Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2006. Includes bibliographical references (p. 95-99). High-performance superscalar microarchitectures exploit instruction-level parallelism (ILP) to improve processor performance by executing instructions out of program order and by speculating on branch instructions. Monolithic centralized structures with global communication, including issue windows and register files, are used to buffer in-flight instructions and to maintain machine state. These structures scale poorly to greater issue widths and deeper pipelines, as they must support simultaneous global accesses from all active instructions. The lack of scalability is exacerbated in future technologies, which have increasing global interconnect delay and a much greater emphasis on reducing both switching and leakage power. However, these fully orthogonal structures are over-engineered for typical use. Banked microarchitectures consisting of multiple interleaved banks of cells with fewer ports can significantly reduce the power, area, and latency of these structures. Although banked structures incur a minor performance penalty, the significant reductions in delay and power can potentially be used to increase the clock rate and lead to more complexity-effective designs. This thesis makes two main contributions. First, a speculative control scheme is proposed to simplify the complicated control logic involved in managing a banked register file with fewer ports for high-frequency superscalar processors. Second, the RingScalar architecture is introduced and evaluated. It is a complexity-effective out-of-order superscalar microarchitecture based on a ring topology of banked structures. By Jessica Hui-Chun Tseng. Ph.D.
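The constraint that banking introduces can be sketched as follows. The sketch assumes low-order interleaving, which the thesis may not use, so that physical register r lives in bank r mod n_banks. A cycle's reads then proceed at full speed only if no bank receives more requests than it has read ports. The function names are hypothetical. The sketch shows only the conflict condition that the control logic must manage, not the speculative control scheme itself.

```python
from collections import Counter
from typing import Iterable


def bank_of(reg: int, n_banks: int) -> int:
    """Low-order interleaving: consecutive registers go to different banks."""
    return reg % n_banks


def stall_cycles(reads: Iterable[int], n_banks: int, ports_per_bank: int) -> int:
    """Extra cycles needed to serve one cycle's register reads.

    Repeated reads of the same register are assumed to share a port, so the
    register numbers are de-duplicated first.
    """
    per_bank = Counter(bank_of(r, n_banks) for r in set(reads))
    worst = max(per_bank.values(), default=0)
    needed = -(-worst // ports_per_bank)     # ceiling division
    return max(0, needed - 1)


print(stall_cycles([0, 1, 2, 3], n_banks=4, ports_per_bank=1))   # 0
print(stall_cycles([1, 5, 2, 3], n_banks=4, ports_per_bank=1))   # 1
print(stall_cycles([1, 5, 9, 1], n_banks=4, ports_per_bank=2))   # 1
```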

    WCET Analysis of a Parallel 3D Multigrid Solver Executed on the MERASA Multi-Core

    To meet performance requirements as well as constraints on cost and power consumption, future embedded systems will be designed with multi-core processors. However, these architectures raise the question of timing analysability. In the MERASA project, a WCET-aware multi-core processor has been designed together with the appropriate system software. Together, they guarantee that the WCET of tasks running on different cores can be safely analyzed, since their possible interactions can be bounded. Nevertheless, computing the WCET of a parallel application is still not straightforward, and a high-level preliminary analysis of the communication and synchronization patterns must be performed. In this paper, we report on our experience in evaluating the WCET of a parallel 3D multigrid solver code and propose directions for further research on this topic.
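The following minimal sketch shows one way per-thread bounds might be composed for a fork-join code. It assumes every thread reaches a barrier at the end of each phase, as in a solver whose parallel sweeps are separated by barriers. It further assumes that each per-thread bound already includes inter-core interference, which the MERASA hardware and system software are designed to make boundable, and that the barrier cost has a constant upper bound. The function name and data layout are hypothetical, and the paper's own analysis may differ.

```python
from typing import Sequence


def parallel_wcet(phases: Sequence[Sequence[int]], barrier_bound: int) -> int:
    """Upper bound on the execution time of a barrier-synchronized program.

    phases[k][t] is the WCET bound (cycles) of thread t in phase k. No thread
    leaves a barrier before the slowest thread arrives, so each phase costs
    the maximum over threads plus the barrier's own worst-case latency.
    """
    total = 0
    for k, per_thread in enumerate(phases):
        if not per_thread:
            raise ValueError(f"phase {k} has no threads")
        total += max(per_thread) + barrier_bound
    return total


# Two phases on three cores, with a 10-cycle barrier bound.
print(parallel_wcet([[100, 120, 90], [50, 60, 70]], barrier_bound=10))  # 210
```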

    Compiler-Directed Energy Savings in Superscalar Processors

    Institute for Computing Systems Architecture. Superscalar processors contain large, complex structures to hold data and instructions as they wait to be executed. However, many of these structures consume large amounts of energy, making them hotspots that require sophisticated cooling systems. With the trend towards larger, more complex processors, this will become more of a problem, with important implications for future technology. This thesis uses compiler-based optimisation schemes to target the issue queue and the register file, two of the most energy-consuming structures in the processor. The algorithms and hardware techniques developed in this work dynamically adapt the processor's resources to changing program phases. They turn off parts of each structure when they are unused to save dynamic and static energy. To optimise the issue queue, the compiler analysis tracks data dependences through each program procedure. It identifies the critical path through each program region and informs the hardware of the minimum number of queue entries required to avoid slowing that path down. This reduces the occupancy of the queue and increases the opportunities to save energy. With just a 1.3% performance loss, 26% dynamic and 32% static energy savings are achieved. Registers can be idle for many cycles after they are last read, before they are released and put back on the free list to be reused by another instruction. Instead of sitting idle, they could be turned off during this time to save energy. Early register releasing can perform this operation sooner than usual, but hardware schemes must wait for the instruction redefining the relevant logical register to enter the pipeline. This thesis presents an exploration of compiler-directed early register releasing. Using simple data-flow and liveness analysis, the compiler can identify exactly the last use of each register and pass this information to the hardware. The best scheme achieves 15% dynamic and 19% static energy savings. Finally, the issue queue limiting and early register releasing schemes are combined for energy savings in both processor structures. Four different configurations are evaluated. They yield 25% to 31% dynamic and 19% to 34% static issue queue energy savings, and reductions of 18% to 25% dynamic and 20% to 21% static energy in the register file.
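The last-use analysis mentioned above can be sketched for a single basic block with a backward liveness scan. Walking from the end of the block, a register read by an instruction is at its last use if it is not live afterwards. A redefinition by the same instruction counts as killing the old value. This straight-line version and the `(defs, uses)` encoding are simplifications. Across a whole procedure, the live-out sets would come from iterative data-flow analysis. The thesis's exact encoding of the information passed to hardware is not described in the abstract.

```python
from typing import List, Set, Tuple

Inst = Tuple[Set[str], Set[str]]   # (registers defined, registers used)


def last_uses(block: List[Inst], live_out: Set[str]) -> List[Set[str]]:
    """For each instruction, the registers whose value it reads for the last time."""
    live = set(live_out)          # registers live after the current instruction
    result: List[Set[str]] = [set() for _ in block]
    for i in range(len(block) - 1, -1, -1):
        defs, uses = block[i]
        live -= defs              # the old value of a redefined register dies here
        result[i] = {r for r in uses if r not in live}
        live |= uses              # operands are live before the instruction
    return result


block = [
    ({"r1"}, {"r2", "r3"}),   # r1 = r2 + r3
    ({"r4"}, {"r1", "r2"}),   # r4 = r1 + r2
    ({"r1"}, {"r1", "r4"}),   # r1 = r1 + r4
]
print(last_uses(block, live_out={"r1"}))
```

In this example, r3 is released after the first instruction and r2 after the second. The old r1 and r4 are released at the third instruction, even though r1 is immediately redefined there.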