Operation Scheduling Algorithms for Power, Energy and Resource Minimization in High-Level Synthesis

Abstract

Power, energy and resource minimization subject to a latency constraint are important optimization objectives in operation scheduling in high-level synthesis (HLS). The research presented herein addresses each of these objectives as follows. First, we proposed that the degree of optimization achievable in HLS designs with functional unit (FU) or module selection depends significantly on how the FUs in the resource library are parameterized. For power minimization, our proposal is that appreciably more power optimization is possible when (i) the FUs for each function type (FT) have a wide range of both power and delay metrics, and (ii) their pair-wise power-delay product ratios are close to 1, say, in the range [0.8, 1.25], than when these criteria are not satisfied. We showed that these parameter ranges are achievable for arithmetic FTs, owing to the variety of available designs and the flexibility to hierarchically combine different design approaches. We also provided a probabilistic rationale for our hypotheses and bolstered it empirically by constructing different FU libraries that either meet or do not meet the above FU parameter criteria. Using PSA, a new power-driven simulated annealing (SA) based algorithm, we consistently found that the power consumption of designs using libraries that meet our criteria is significantly lower than that of designs using libraries that do not. Then, we proposed a leakage energy (LE) minimization scheduling algorithm, LPR-GPS. It co-explores the unit-time leakage power (LP) and latency spaces in order to minimize their product.
LPR-GPS extends classical force-directed scheduling (FDS) with: (i) an initial probabilistic distribution graph (DG) based on a non-uniform probability-driven randomized scheduling, which yields starting scheduling probabilities that are conducive to LE minimization; (ii) a root-mean-square (RMS) based estimation of the maximum FU usage distributed across clock cycles (cc's), which contributes to LE minimization; and (iii) a fast, greedy, noncommittal scheduling algorithm that estimates the latency by scheduling output operations first. Experimental results show that LPR-GPS reduces total LE by an average of 44% compared to power-driven FDS and by 12% compared to a version of LPR-GPS that minimizes only unit-time LP. Finally, we proposed FALLS, an iterative list scheduling (LS) type algorithm that minimizes the total number of FUs allocated, and thus the total area, in HLS designs. FALLS incorporates a novel lookahead technique that selectively schedules available non-0-slack operations, either by allocating the needed FUs earlier or by reserving available FUs for more timing-urgent operations to be scheduled later, such that no additional FU is needed and a higher FU utilization is obtained. Further, a fractional search framework iteratively estimates the number of FUs of each FT required in the final design based on the current scheduling and FU utilization, and reruns the lookahead-based list scheduling with the new FU allocation estimate to further increase FU utilization. Experimental results comparing FALLS with several state-of-the-art algorithms using a non-trivial FU library show average FU reductions of 18.9% to 71.4%, with an optimality gap of only 5.5% relative to an optimal integer linear programming (ILP) formulation. FALLS also performs much better in architectural area (FU + mux/demux + register area), interconnect congestion and number of interconnects than state-of-the-art approximate algorithms, and is at most 4.0% worse in these metrics than the optimal ILP method.
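
As an illustration of the second FU-library criterion stated above (pair-wise power-delay product ratios within [0.8, 1.25]), the following is a minimal sketch of how such a check might look. The function name, the (power, delay) tuple representation and the two example adder libraries are hypothetical and are not taken from this work; the first criterion (a wide range of power and delay values) is not checked here.

```python
from itertools import combinations


def pdp_ratios_within(fus, lo=0.8, hi=1.25):
    """Return True if every pair of FUs has a power-delay product (PDP)
    ratio inside [lo, hi].

    fus: list of (power, delay) tuples for one function type.
    The default bounds follow the range quoted in the abstract; they are
    reciprocals of each other, so checking one ordering of each pair is
    enough.
    """
    for (p1, d1), (p2, d2) in combinations(fus, 2):
        ratio = (p1 * d1) / (p2 * d2)
        if not (lo <= ratio <= hi):
            return False
    return True


# Hypothetical adder libraries, units arbitrary (assumed values).
# PDPs of 10.0, 10.5 and 9.6: all pair-wise ratios are close to 1.
adders_meeting_criterion = [(10.0, 1.0), (5.0, 2.1), (2.4, 4.0)]
# PDPs of 10.0, 18.0 and 32.0: ratios fall well outside the range.
adders_violating_criterion = [(10.0, 1.0), (9.0, 2.0), (8.0, 4.0)]

print(pdp_ratios_within(adders_meeting_criterion))
print(pdp_ratios_within(adders_violating_criterion))
```

In the first hypothetical library, slower adders consume proportionally less power, so trading delay for power leaves the power-delay product nearly unchanged; in the second, slower adders save little power and the ratios drift far from 1.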
