
    Power and memory optimization techniques in embedded systems design

    Embedded systems are subject to tight constraints on power consumption and memory (which impacts size), in addition to other constraints such as weight and cost. This dissertation addresses two key factors in embedded system design: minimization of power consumption and minimization of memory requirements. The first part of the dissertation considers the problem of optimizing power consumption (peak power as well as average power) in high-level synthesis (HLS). The second part deals with memory-usage optimization, mainly targeting a restricted class of computations, expressed as loops accessing large data arrays, that arises in scientific computing, such as the coupled cluster and configuration interaction methods in quantum chemistry. First, a mixed-integer linear programming (MILP) formulation is presented for the scheduling problem in HLS with multiple supply voltages, optimizing peak power as well as average power and energy consumption. Because the MILP formulation may not scale to large designs, a two-phase iterative linear programming formulation and a power-resource-saving heuristic are presented for the same problem, along with a new heuristic that adapts the well-known force-directed scheduling approach. Next, this work considers module selection simultaneously with scheduling to minimize peak and average power consumption. Then, power consumption (peak and average) in synchronous sequential designs is addressed: a solution integrating basic retiming and multiple-voltage scheduling (MVS) is proposed and evaluated as a two-stage algorithm, namely power-oriented retiming followed by an MVS technique for peak and/or average power optimization. Memory optimization is addressed next, considering dynamic memory-usage optimization during the evaluation of a special class of interdependent large data arrays. Finally, the dissertation develops a novel integer linear programming (ILP) formulation for static memory optimization based on the well-known loop-fusion technique, encoding the legality rules for fusing a special class of loops as logical constraints over binary decision variables, together with a highly effective approximation of memory usage.
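
    To make the multiple-voltage scheduling idea concrete, here is a minimal sketch of an MILP that minimizes peak power, written with the open-source PuLP solver library. It is not the dissertation's actual formulation: the operations, precedences, voltage levels, per-step power values, unit latencies, and schedule length are all invented for illustration.

        import pulp

        ops = ["a", "b", "c"]                # operations to schedule
        prec = [("a", "c"), ("b", "c")]      # a and b must precede c
        volts = {"1.2V": 10.0, "0.9V": 4.0}  # voltage -> power per step (made up)
        T = 4                                # schedule length in control steps

        prob = pulp.LpProblem("mvs_peak_power", pulp.LpMinimize)
        # x[o, t, v] = 1 iff op o executes in step t at voltage v (unit latency)
        x = pulp.LpVariable.dicts(
            "x", [(o, t, v) for o in ops for t in range(T) for v in volts],
            cat="Binary")
        peak = pulp.LpVariable("peak", lowBound=0)

        prob += peak  # objective: minimize peak power over all control steps
        for o in ops:  # every operation is scheduled exactly once
            prob += pulp.lpSum(x[o, t, v] for t in range(T) for v in volts) == 1
        for t in range(T):  # the peak variable bounds every step's power
            prob += pulp.lpSum(volts[v] * x[o, t, v]
                               for o in ops for v in volts) <= peak
        for (u, w) in prec:  # precedence for unit-latency operations
            start_u = pulp.lpSum(t * x[u, t, v] for t in range(T) for v in volts)
            start_w = pulp.lpSum(t * x[w, t, v] for t in range(T) for v in volts)
            prob += start_w >= start_u + 1

        prob.solve(pulp.PULP_CBC_CMD(msg=False))
        print("peak power:", peak.value())

    Average power or energy could enter as a weighted term in the objective, which is how formulations of this kind typically trade off the two goals.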

    Optimizing the flash-RAM energy trade-off in deeply embedded systems

    Deeply embedded systems often have the tightest constraints on energy consumption, requiring that they consume tiny amounts of current and run on batteries for years. However, they typically execute code directly from flash instead of the more energy-efficient RAM. We implement a novel compiler optimization that exploits the relative efficiency of RAM by statically moving carefully selected basic blocks from flash to RAM. Our technique uses integer linear programming with an energy cost model to select a good set of basic blocks to place into RAM without impacting stack or data storage. We evaluate our optimization on a common ARM microcontroller and succeed in reducing average power consumption by up to 41% and energy consumption by up to 22%, at the cost of increased execution time. A case study is presented in which an application executes code and then sleeps for a period of time. For this example, we show that our optimization could allow the application to run on battery for up to 32% longer. We also show that for this scenario the total application energy can be reduced even though the optimization increases the execution time of the code.
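
    As a rough illustration of the selection step, the sketch below casts the basic-block placement as a 0/1 knapsack solved with PuLP: maximize modeled energy savings subject to a RAM size budget. The block names, savings, sizes, and budget are hypothetical, and the paper's actual ILP and energy cost model are richer than this.

        import pulp

        blocks = {  # block -> (modeled energy saved, code size in bytes)
            "loop_body": (120.0, 96),
            "isr_uart":  (40.0, 64),
            "crc16":     (75.0, 128),
        }
        ram_budget = 160  # RAM bytes to spare after stack and data are placed

        prob = pulp.LpProblem("flash_to_ram", pulp.LpMaximize)
        y = pulp.LpVariable.dicts("y", blocks, cat="Binary")  # 1 = move to RAM

        prob += pulp.lpSum(blocks[b][0] * y[b] for b in blocks)  # max savings
        prob += pulp.lpSum(blocks[b][1] * y[b] for b in blocks) <= ram_budget

        prob.solve(pulp.PULP_CBC_CMD(msg=False))
        print([b for b in blocks if y[b].value() == 1])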

    Block-level voltage/frequency scheduling

    Over the past years, state-of-the-art power optimization methods have moved towards higher abstraction levels, which yield more efficient power savings. Among existing power optimization approaches, dynamic power management (DPM) is considered one of the most effective strategies. Depending on the abstraction level, DPM can be implemented in different forms; here we focus on scheduling, which is better suited to real-time system design. Unlike concurrent scheduling approaches that start from either the HLS (High-Level Synthesis) or the RTS (Real-Time System) point of view, we propose a solution that combines both approaches, namely block-level voltage/frequency scheduling (BLVFS). The presented approach offers a generic solution for low-power SoC (System on Chip) design, whereas approaches in the HLS and RTS categories depend strongly on the system's functionality. Considering an SoC as a combination of heterogeneous functional blocks, our approach provides efficient power savings by dynamically scheduling the scaling of voltage and frequency at the same time. Simulation results indicate that significant power savings can be achieved using heuristic-based strategies.
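
    One simple heuristic in this spirit, sketched below under invented numbers, slows each functional block to the cheapest voltage/frequency level that still meets its deadline. The levels, cycle counts, and deadlines are placeholders, not the paper's benchmark data.

        # (frequency in MHz, relative power) pairs for each V/f level
        levels = [(200, 1.00), (100, 0.45), (50, 0.20)]

        blocks = {  # block -> (cycles of work, deadline in microseconds)
            "dsp":   (4000, 30.0),
            "codec": (1500, 40.0),
        }

        def schedule(blocks, levels):
            plan = {}
            for name, (cycles, deadline) in blocks.items():
                # try levels from cheapest (slowest) to most expensive
                for freq, power in sorted(levels, key=lambda lv: lv[1]):
                    runtime_us = cycles / freq  # cycles / MHz = microseconds
                    if runtime_us <= deadline:
                        plan[name] = (freq, power, runtime_us)
                        break
                else:
                    raise ValueError(f"{name} misses its deadline at every level")
            return plan

        for name, (freq, power, t) in schedule(blocks, levels).items():
            print(f"{name}: {freq} MHz (rel. power {power}), done in {t:.1f} us")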

    Complexity of Scheduling in Synthesizing Hardware from Concurrent Action Oriented Specifications

    Concurrent Action Oriented Specifications (CAOS) formalisms such as Bluespec Inc.'s Bluespec System Verilog (BSV) have recently been shown to be effective for hardware modeling and synthesis. This formalism offers the benefit of automatic handling of concurrency issues in highly concurrent system descriptions, and the associated synthesis algorithms have been shown to produce hardware comparable in efficiency to that generated from hand-written Verilog/VHDL. These benefits, inherent in such a synthesis process, also aid faster architectural exploration, because CAOS allows a high-level description (above RTL) of a design in terms of atomic transactions, where each transaction corresponds to a collection of operations. Optimal scheduling of such actions in the CAOS-based synthesis process is crucial in order to generate hardware that is efficient in terms of area, latency, and power. In this paper, we analyze the complexity of the scheduling problems associated with CAOS-based synthesis and discuss several heuristics for meeting the peak-power goals of designs generated from CAOS. We also discuss the approximability of these problems where appropriate.
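
    To show the flavor of such a heuristic (the paper analyzes complexity and discusses heuristics; this exact policy is an assumption), the sketch below greedily fires non-conflicting enabled actions each cycle until a peak-power budget is hit. Actions, conflicts, and power values are invented.

        actions = {"a1": 5.0, "a2": 3.0, "a3": 4.0}  # action -> power when fired
        conflicts = {("a1", "a2")}                   # pairs touching shared state
        peak_budget = 8.0

        def fire_one_cycle(enabled):
            fired, power = [], 0.0
            # heuristic order: most power-hungry actions first
            for act in sorted(enabled, key=lambda a: -actions[a]):
                clash = any((act, f) in conflicts or (f, act) in conflicts
                            for f in fired)
                if not clash and power + actions[act] <= peak_budget:
                    fired.append(act)
                    power += actions[act]
            return fired, power

        # a3 (4.0) would bust the 8.0 budget after a1 (5.0); a2 conflicts with a1
        print(fire_one_cycle(["a1", "a2", "a3"]))  # -> (['a1'], 5.0)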

    Thermal-Safe Test Scheduling for Core-Based System-on-a-Chip Integrated Circuits

    Overheating has been acknowledged as a major problem during the testing of complex system-on-chip (SOC) integrated circuits. Several power-constrained test scheduling solutions have recently been proposed to tackle this problem during system integration. However, we show that these approaches cannot guarantee hot-spot-free test schedules because they do not take into account the non-uniform distribution of heat dissipation across the die and the physical adjacency of simultaneously active cores. This paper proposes a new test scheduling approach that produces short test schedules while guaranteeing thermal safety. Two thermal-safe test scheduling algorithms are proposed. The first computes an exact (shortest) test schedule that is guaranteed to satisfy a given maximum-temperature constraint. The second is a heuristic intended for complex systems with a large number of embedded cores, for which the exact thermal-safe scheduling algorithm may not be feasible. Based on a low-complexity thermal cost model for test sessions, this heuristic produces near-optimal-length test schedules with significantly less computational effort than the optimal algorithm.
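
    A toy version of a low-complexity session thermal cost model is sketched below: a session's cost is its summed core test power plus a penalty for every pair of simultaneously tested adjacent cores, and cores are greedily packed into sessions that stay under a temperature-proxy cap. All numbers and the packing policy are illustrative assumptions, not the paper's model.

        power = {"c1": 3.0, "c2": 2.5, "c3": 1.0, "c4": 2.0}  # core test power
        adjacent = {("c1", "c2"), ("c3", "c4")}               # die neighbours
        ADJ_PENALTY = 1.5   # extra cost when neighbours are hot together
        THERMAL_CAP = 6.0   # proxy for the maximum-temperature constraint

        def session_cost(cores):
            cost = sum(power[c] for c in cores)
            for a, b in adjacent:
                if a in cores and b in cores:
                    cost += ADJ_PENALTY
            return cost

        def greedy_schedule(cores):
            sessions = []
            for c in sorted(cores, key=lambda c: -power[c]):  # hottest first
                for s in sessions:  # first session that stays thermally safe
                    if session_cost(s + [c]) <= THERMAL_CAP:
                        s.append(c)
                        break
                else:
                    sessions.append([c])
            return sessions

        print(greedy_schedule(power))  # -> [['c1', 'c4'], ['c2', 'c3']]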

    Analog layout design automation: ILP-based analog routers

    The shrinking design window and high parasitic sensitivity of advanced technologies have imposed special challenges on analog and radio-frequency (RF) integrated circuit design. In this thesis, we propose a new methodology to address this deficiency, based on integer linear programming (ILP) but without compromising the ability to handle special constraints in analog routing problems. Distinct from conventional methods, our algorithm uses adaptive resolutions for different routing regions: a more congested region is given a routing grid with higher resolution, whereas a lower-resolution grid is adopted for a less crowded region. Moreover, we strengthen its handling of interconnect width control so as to route the electrical nets according to analog constraints while choosing interconnect widths that address acute interconnect parasitics, mismatch minimization, and electromigration effects simultaneously. In addition, to tackle performance degradation due to layout-dependent effects (LDEs) and to take advantage of optical proximity correction (OPC) for resolution enhancement in subwavelength lithography, this thesis also proposes an innovative LDE-aware analog layout migration scheme equipped with our routing methodology. The LDE constraints are first identified with the aid of a special sensitivity analysis and then satisfied during the layout migration process. Afterwards, the electrical nets are routed by an extended OPC-inclusive ILP-based analog router to improve final layout image fidelity while respecting routability and analog constraints. The experimental results demonstrate the effectiveness and efficiency of our proposed methods in terms of both circuit performance and image quality compared to previous works.
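
    The adaptive-resolution idea can be pictured with the small sketch below, where congested regions get a finer routing grid and sparse regions a coarser one. The congestion metric, thresholds, and pitches are assumptions for illustration, not the thesis' actual parameters.

        def grid_pitch(pin_density, base_pitch=1.0):
            """Pick a routing-grid pitch for a region from its pin density."""
            if pin_density > 8:      # congested: finest grid
                return base_pitch / 4
            if pin_density > 3:      # moderately crowded
                return base_pitch / 2
            return base_pitch        # sparse: a coarse grid keeps the ILP small

        regions = {"opamp_core": 12, "bias": 5, "decap_area": 1}  # pins/area
        for name, density in regions.items():
            print(f"{name}: pitch {grid_pitch(density)}")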

    Test-Delivery Optimization in Manycore SOCs

    We present two test-data delivery optimization algorithms for system-on-chip (SOC) designs with hundreds of cores, where a network-on-chip (NOC) is used as the interconnection fabric. We first present an effective algorithm based on a subset-sum formulation to solve the test-delivery problem in NOCs with arbitrary topology that use dedicated routing. We further propose an algorithm for the important class of NOCs with grid topology and XY routing. The proposed algorithm is the first to co-optimize the number of access points, access-point locations, pin distribution to access points, and assignment of cores to access points for optimal test-resource utilization in such NOCs. Test-time minimization is modeled as an NOC partitioning problem and solved with dynamic programming in polynomial time. Both proposed methods yield high-quality results and scale to large SOCs with many cores. We present results on synthetic grid-topology NOC-based SOCs constructed using cores from the ITC’02 benchmarks, and demonstrate the scalability of our approach on two SOCs of the future, one with nearly 1,000 cores and the other with 1,600 cores. Test scheduling under power constraints is also incorporated in the optimization framework.
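
    The subset-sum ingredient can be sketched with the classic dynamic program below, which picks the subset of core test loads that best fills one access point's delivery capacity. The loads and capacity are invented, and the paper's full algorithm co-optimizes far more than this single step.

        def best_fill(loads, capacity):
            """Classic subset-sum DP: largest achievable sum <= capacity."""
            reachable = {0}
            for load in loads:
                reachable |= {s + load for s in reachable if s + load <= capacity}
            return max(reachable)

        core_loads = [7, 5, 4, 3]   # test-data volume per core, arbitrary units
        print(best_fill(core_loads, 13))  # -> 12 (7 + 5)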

    Resource Management Algorithms for Computing Hardware Design and Operations: From Circuits to Systems

    The complexity of computing hardware has increased at an unprecedented rate over the last few decades. At the chip level, we have entered the era of multi-/many-core processors made of billions of transistors. With a transistor budget of this scale, many functions are integrated into a single chip, so chips today consist of many heterogeneous cores with intensive interaction among them. At the circuit level, with the end of Dennard scaling, continuously shrinking process technology has imposed a grand challenge on power density, and circuit variation further exacerbates the problem by consuming a substantial timing margin. At the system level, the rise of warehouse-scale computers and data centers has put resource management into a new perspective: the ability to dynamically provision computational resources in these gigantic systems is crucial to their performance. In this thesis, three resource management algorithms are discussed. The first assigns adaptivity resources to circuit blocks under a constraint on overhead; the adaptivity improves the circuit's resilience to variation in a cost-effective way. The second manages link bandwidth in application-specific Networks-on-Chip, guaranteeing quality of service for time-critical traffic with an emphasis on power. The third manages the computational resources of a data center while guarding against ill states of the system; Q-learning is employed to handle the dynamic nature of the system, and Linear Temporal Logic is leveraged as a tool to describe temporal constraints. All three algorithms are evaluated in a range of experiments, and the results are compared to several previous works, showing the advantage of our methods.
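
    For the third algorithm, a bare-bones tabular Q-learning update of the kind the thesis builds on is sketched below. The states, actions, and reward are stand-ins, and the Linear Temporal Logic constraints the thesis layers on top are omitted here.

        import random
        from collections import defaultdict

        ALPHA, GAMMA, EPS = 0.1, 0.9, 0.2
        Q = defaultdict(float)  # (state, action) -> estimated value

        actions = ["add_server", "remove_server", "hold"]

        def choose(state):
            if random.random() < EPS:                       # explore
                return random.choice(actions)
            return max(actions, key=lambda a: Q[state, a])  # exploit

        def update(state, action, reward, next_state):
            best_next = max(Q[next_state, a] for a in actions)
            Q[state, action] += ALPHA * (reward + GAMMA * best_next
                                         - Q[state, action])

        # one illustrative step: under low load, removing a server saves power
        update("low_load", "remove_server", reward=1.0, next_state="low_load")
        print(Q["low_load", "remove_server"])  # 0.1 after one update
        print(choose("low_load"))              # now usually "remove_server"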