
    Operation Scheduling Algorithms for Power, Energy and Resource Minimization in High-Level Synthesis

    Power, energy and resource minimization subject to a latency constraint are important optimization objectives in operation scheduling in high-level synthesis (HLS). The research work presented herein addresses each of these objectives as follows. First, we proposed that the degree of optimization achievable in HLS designs with functional unit (FU) or module selection depends significantly on how the FUs in the resource library are parameterized. For power minimization, our proposal is that appreciably more power optimization is possible when (i) the FUs for each function type (FT) have a wide range of both power and delay metrics, and (ii) their pairwise power-delay product ratios are close to 1, say, in the range [0.8, 1.25], than when these criteria are not satisfied. We showed that these parameter ranges are achievable for arithmetic FTs owing to design variety and the flexibility to hierarchically combine different design approaches. We also provided a probabilistic rationale for our hypotheses and further bolstered them empirically by constructing different FU libraries that either meet or do not meet the above FU parameter criteria. Using a new power-driven simulated annealing (SA) based algorithm, PSA, we consistently found that the power consumption of designs using libraries that meet our criteria is significantly lower than that of designs using libraries that do not. Then, we proposed a leakage energy (LE) minimization scheduling algorithm, LPR-GPS. It co-explores the unit-time leakage power (LP) and latency spaces in order to minimize their product.
LPR-GPS extends classical force-directed scheduling (FDS) with (i) an initial probabilistic distribution graph (DG) based on a non-uniform probability-driven randomized scheduling that yields final starting scheduling probabilities conducive to LE minimization; (ii) a root-mean-square (RMS) based estimation of the maximum FU usage distributed across clock cycles (cc's) that contributes to LE minimization; and (iii) a fast and greedy noncommittal scheduling algorithm that estimates the latency by scheduling output operations first. Experimental results show that LPR-GPS reduces total LE by an average of 44% compared to power-driven FDS and by 12% compared to a version of LPR-GPS that only minimizes unit-time LP. Finally, we proposed an iterative list scheduling (LS) type algorithm, FALLS, to minimize the total number of FUs allocated, and thus the total area, in HLS designs. FALLS incorporates a novel lookahead technique that selectively schedules available non-0-slack operations, either by allocating the needed FUs earlier or by reserving available FUs for more timing-urgent operations later, such that no additional FU is needed and a higher FU utilization is obtained. Further, a fractional search framework iteratively estimates the number of FUs of each FT required in the final design from the current scheduling and FU utilization, and reruns the lookahead-based list scheduling with the new FU allocation estimate to further increase FU utilization. Experimental results comparing FALLS with several state-of-the-art algorithms using a non-trivial FU library show an average FU reduction of 18.9% to 71.4%, with an optimality gap of only 5.5% relative to an optimal integer linear programming (ILP) formulation. FALLS also performs much better in architectural area (FU + mux/demux + register area), interconnect congestion and number of interconnects than state-of-the-art approximate algorithms, and is at most 4.0% worse in these metrics than the optimal ILP method.
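The two FU-library criteria stated in the abstract above (a wide range of power and delay values per function type, and pairwise power-delay product ratios within roughly [0.8, 1.25]) can be checked mechanically. The following is a minimal sketch, not the authors' tooling: the function names, the `(power, delay)` tuple representation and the sample libraries are all assumptions made for illustration, and the abstract does not say how "wide range" is quantified, so the sketch only reports the max/min spread.

```python
from itertools import combinations

def pdp_ratios_ok(fus, lo=0.8, hi=1.25):
    """Return True if every pairwise power-delay-product ratio lies in [lo, hi].

    fus: list of (power, delay) tuples for the FUs of ONE function type
         (hypothetical representation, chosen for this sketch).
    """
    pdps = [power * delay for power, delay in fus]
    for a, b in combinations(pdps, 2):
        # Check both orientations of the ratio so the test also works
        # for bounds that are not reciprocals of each other.
        if not (lo <= a / b <= hi and lo <= b / a <= hi):
            return False
    return True

def spread(fus):
    """Return (power max/min, delay max/min) as a crude measure of range."""
    powers = [p for p, _ in fus]
    delays = [d for _, d in fus]
    return max(powers) / min(powers), max(delays) / min(delays)

# Hypothetical multiplier libraries: (power, delay) per FU variant.
balanced = [(10.0, 1.0), (5.0, 2.1), (2.4, 4.0)]   # PDPs 10.0, 10.5, 9.6
unbalanced = [(10.0, 1.0), (8.0, 2.0)]             # PDPs 10.0, 16.0

print(pdp_ratios_ok(balanced), spread(balanced))
print(pdp_ratios_ok(unbalanced), spread(unbalanced))
```

In this made-up data, the first library trades power for delay almost evenly across its variants, while the second offers a slower unit that saves little power.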

    On the Correlation between Resource Minimization and Interconnect Complexities in High-Level Synthesis

    As the technology node of VLSI designs advances to sub-10 nm, two interconnect-centric metrics of a circuit, the interconnect complexity (either the number of interconnects or the wirelength, WL) and congestion, become critically important across all design stages alongside conventional resource or functional-unit (FU)-centric metrics like area/number-of-FUs and leakage power. High-level synthesis (HLS), one of the earliest and most impactful design stages, rarely monitors interconnect metrics, which makes their recovery at later stages very difficult. HLS algorithms and tools typically perform FU-centric minimization via operation scheduling and module selection (SMS) and binding. As a consequence, they mostly overlook interconnect-based metrics. In this paper, we explore whether this can adversely affect interconnect metrics, and more generally explore the correlation between FU-centric optimization in SMS and the resulting interconnect metrics co-optimized (along with FU metrics) in the later binding stage(s). For this purpose, we develop a probabilistic analysis for post-scheduling binding to estimate interconnect metrics, and verify its accuracy by comparison to empirical results across different scheduling techniques that generate different degrees of FU optimization. Based on both empirical and analytical results, we predict how interconnect metrics will pan out with different degrees of FU optimization. Finally, based on our analysis, we also provide suggestions to improve interconnect metrics for whatever degree of FU optimization an available SMS technique can achieve.

    A Power-Driven Stochastic-Deterministic Hierarchical High-Level Synthesis Framework for Module Selection, Scheduling and Binding

    We present a power-driven hierarchical framework for module/functional-unit (FU) selection, scheduling, and binding in high-level synthesis. A significant aspect of algorithm design for large and complex problems is arriving at tradeoffs between solution quality and runtime. Towards this end, we integrate an improved version of the very runtime-efficient list scheduling algorithm, called modified list scheduling (MLS), with a power-driven simulated annealing (SA) algorithm for module selection. Our hierarchical framework efficiently explores the problem solution space by extensively exploring the power-driven module-selection solution space via SA and, for each module-selection solution, using MLS to obtain a scheduling and (integrated) binding (S&B) solution in which the binding is either a regular one (minimizing the number of FUs and thus FU leakage power) or power-driven with mux/demux power considerations. This framework avoids the very runtime-intensive exploration of both module selection and S&B within a conventional SA algorithm, but retains the basic prowess of SA by exploring only the important aspect of power-driven module selection in a stochastic manner. The proposed hierarchical framework provides an average of 9.5% FU leakage power improvement over state-of-the-art (approximate) algorithms that optimize only FU leakage power, and is faster by factors of 2.5–3x. Further, for total (dynamic and leakage) FU and architecture power optimization under latency constraints, our framework, PSA-MLS, improves on a sophisticated flat simulated annealing framework by 5.3–5.8% with a runtime advantage of 2x, and has an average optimality gap of only 4.7–4.8% relative to an optimal 0/1-ILP formulation while running more than 1900 times faster.
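The hierarchical idea in the abstract above, a stochastic outer search over module selection with a fast deterministic scheduler evaluating each candidate, can be sketched as follows. This is an illustration under stated assumptions and not the PSA-MLS algorithm: the inner evaluator is a plain ASAP pass with no resource binding (it stands in for MLS), the cost is simply summed FU power plus a latency-violation penalty, and the library, data-flow graph, penalty weight and cooling schedule are all hypothetical.

```python
import math
import random

# Hypothetical FU library: function type -> list of (power, delay) variants.
LIBRARY = {
    "add": [(4.0, 1), (2.0, 2)],
    "mul": [(12.0, 2), (7.0, 3), (4.5, 5)],
}
# Hypothetical data-flow graph, listed in topological order:
# operation -> (function type, predecessor operations).
DFG = {
    "m1": ("mul", []),
    "m2": ("mul", []),
    "a1": ("add", ["m1", "m2"]),
    "m3": ("mul", ["a1"]),
    "a2": ("add", ["m3"]),
}

def asap_latency(selection):
    """Deterministic inner step: ASAP schedule length for a module selection."""
    finish = {}
    for op, (ftype, preds) in DFG.items():
        start = max((finish[p] for p in preds), default=0)
        finish[op] = start + LIBRARY[ftype][selection[op]][1]
    return max(finish.values())

def total_power(selection):
    """Sum of the power of the variant chosen for each operation."""
    return sum(LIBRARY[DFG[op][0]][v][0] for op, v in selection.items())

def cost(selection, latency_bound, penalty=100.0):
    """Power plus a penalty per cycle over the latency bound (assumed form)."""
    over = max(0, asap_latency(selection) - latency_bound)
    return total_power(selection) + penalty * over

def anneal(latency_bound, steps=5000, t0=10.0, alpha=0.999, seed=0):
    """Stochastic outer step: simulated annealing over module selections only."""
    rng = random.Random(seed)
    ops = list(DFG)
    current = {op: 0 for op in ops}          # start from the fastest variants
    cur_cost = cost(current, latency_bound)
    best, best_cost = dict(current), cur_cost
    t = t0
    for _ in range(steps):
        op = rng.choice(ops)
        cand = dict(current)
        cand[op] = rng.randrange(len(LIBRARY[DFG[op][0]]))
        c = cost(cand, latency_bound)
        # Metropolis rule: always accept improvements, sometimes accept worse.
        if c <= cur_cost or rng.random() < math.exp((cur_cost - c) / t):
            current, cur_cost = cand, c
            if c < best_cost:
                best, best_cost = dict(cand), c
        t *= alpha                            # geometric cooling
    return best, best_cost

best, best_cost = anneal(latency_bound=10)
print(best, best_cost, asap_latency(best))
```

Because only the module selection is perturbed, each move costs one linear pass over the graph; a flat annealer would also have to perturb schedule and binding decisions, which is the runtime cost the abstract says the hierarchy avoids.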