Abstract
Introduction
For many years, minimum area and maximum performance were the only two design criteria of any practical importance for commercial chips. Analysis of power consumption was performed only as an afterthought, with the results used more to determine packaging requirements than to drive any optimizations. Recently, however, power consumption has begun to play an increasingly important role in determining the overall quality of a design. The principal driver for this has been the explosive increase in the demand for portable electronics such as PDA's, laptops, and personal communicators. There has also been an increased push for low power in the high-performance computing market, motivated by the reliability and cost issues associated with packaging and cooling high-power devices such as DEC's 30W Alpha microprocessor [1].
The surge of interest in low power has spawned numerous research efforts into design techniques for reducing power consumption [2] [3] [4] [5] [6] [7]. These efforts have made it apparent that the most dramatic power reductions stem from optimizations at the higher levels of abstraction. In particular, designers have reported low-power strategies at the algorithm and architecture levels that promise orders of magnitude savings in power [4] [5]. Tools at the gate and circuit levels are widely available [11] [12] [13] [14] [15] [16] [17] [18] [19] [20] [21], and logic synthesis tools targeted at low power are beginning to appear as well. Unfortunately, these tools fail to focus on the higher levels of abstraction where the most significant optimizations are possible.
There have been a few attempts at higher level analysis and optimization tools [22] [23] [24] [25]. These tools tend to suffer from fairly severe inaccuracies, however, with error rates ranging from 50-100% or more. Moreover, all of these attempts fail to recognize that designs cannot be fully optimized by focusing on a single level of abstraction. Rather, it is necessary for optimization efforts, and therefore tools, to span several levels of abstraction in order to produce the highest quality solutions.
The contribution of this paper is a fully integrated CAD environment that supports a top-down methodology for low-power design. The framework is targeted at digital signal processing (DSP) application-specific integrated circuits (ASIC's), but could serve as a model for more general environments. Section 2 describes the specific design flow that the framework supports. Section 3 briefly details some of the tools that enable the proposed design flow and methodology. Finally, Section 4 demonstrates the efficacy of these tools and of the CAD framework by applying them to a real-world design: the implementation of a low-power eighth-order bandpass filter. This example not only demonstrates the large power savings that are possible using high-level optimization techniques, but also illustrates how CAD tools can help in searching the vast area-time-power (ATP) implementation space.
Overview of Design Flow
This paper advocates a design flow that spans several levels of the design hierarchy, as shown in Figure 1. The suggested design flow begins at the algorithm level and proceeds down to layout. At each level, the designer is free to apply optimizations appropriate to that level. This section takes the reader through the design flow, indicating the low-power techniques appropriate to each step and describing how a hierarchy of analysis and optimization tools can be used to converge on the desired low-power solution.
Algorithm Level
In accordance with the top-down approach and the focus on DSP applications, the designer first explores his/her algorithmic alternatives. Often different algorithms will be available that accomplish the same task, but have quite different complexities. For example, [26] describes three different speech coding algorithms of approximately equal quality that have complexities differing by as much as 50%. Since it does not contribute to quality, this additional complexity is wasted. Avoiding waste is a recurring theme of low-power design. The modularity or locality of the algorithm also has an important effect on power. Data transfers on global buses and accesses to global memories consume a large amount of power. Algorithms with a high degree of locality tend to map well onto more power-efficient distributed architectures with fewer global buses and memories. A useful CAD environment should contain a tool for evaluating the impact of these algorithm-level issues on power consumption. The CAD environment described in this paper contains an algorithm-level power estimator to satisfy this need. Accurately predicting power and performance based solely on algorithmic criteria is a difficult problem, but by taking a library-based approach, the tools are able to give estimates a firm basis in reality.
Complexity and locality are not the sole predictors of power consumption. Many researchers have suggested that available concurrency is an equally important criterion, since it determines how well the algorithm will map to the low-voltage, parallel architectures that have become popular in the low-power community [2] [3] [4] [5]. Running processors at low voltage and exploiting parallelism to circumvent the loss in performance is sometimes referred to as trading area/performance for power [3]. The CAD environment we propose contains a novel ATP exploration tool that allows the designer to rapidly make these trade-offs between the often conflicting requirements of complexity and concurrency.
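The trade-off between voltage, performance, and power can be made concrete with the standard first-order model of dynamic CMOS power. The paper does not state this model explicitly, so the symbols below (activity factor $\alpha$, physical capacitance $C$, supply voltage $V_{dd}$, threshold voltage $V_t$, sample frequency $f$) and the simple delay expression are assumptions introduced for illustration:

```latex
P = \alpha \, C \, V_{dd}^{2} \, f, \qquad
E_{\text{sample}} = \frac{P}{f} = \alpha \, C \, V_{dd}^{2}, \qquad
t_{d} \propto \frac{V_{dd}}{(V_{dd} - V_{t})^{2}}
```

Under this model, lowering the supply from 5V to 2V scales the energy per sample by $(2/5)^2 = 0.16$ if $\alpha C$ stays fixed, while the circuit delay $t_d$ grows. Concurrency recovers the lost throughput, at the cost of extra hardware that increases $C$ and area.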
Using the algorithm-level estimation and exploration tools, the designer can begin to narrow down the ATP search space. The results from both tools, however, contain a good deal of uncertainty. The reason is that at this stage the chip structure and behavior are not fully specified, and this limits the accuracy that can be obtained at the algorithm level. This point is exemplified by Figure 2, which shows the algorithmic and architectural power estimates for nine versions of a digital filter. In general, there is adequate correlation between the algorithmic and architectural estimates in spite of the lower absolute accuracy of the algorithm-level tool. Therefore, estimates at the algorithm level can be used for relative evaluation of different designs. One must bear in mind, however, the limited accuracy of the algorithm-level estimates. Therefore, an appropriate strategy is to use the algorithm-level estimation and ATP exploration tools to get a rough ranking of algorithms and architectures. The designer can then narrow the search space down to a few candidates that seem to offer good performance (within the confidence interval of the estimation tools). These candidates can be analyzed more carefully at the architecture level. Such an approach is only possible because our design environment integrates tools at several levels of abstraction.
Architecture Level
After exploration at the algorithm level, the designer can consider optimizations at the architecture level. Since architectural analysis tools have more information available to them, they can provide the designer with the accuracy required to make a final selection of the best algorithmic and architectural solution. Moreover, tools at this level are able to analyze phenomena that are transparent to the higher level estimation tools. For example, the assignment of operations to hardware units affects the signal statistics/activity of the architecture and, therefore, its power consumption. By assigning operations from a temporally and spatially local portion of the computational graph onto the same processing elements, maximum correlation and, therefore, minimum activity and power can be attained. We refer to this low-power technique as exploiting locality. The algorithmic model is oblivious to such effects; architectural power analysis, however, can be used to make these kinds of refinements.
Gate/Circuit/Layout Levels
After an algorithmic and architectural selection has been made, the designer can proceed to gate-, circuit-, and layout-level implementation of the system. Further power optimizations are possible at these levels. For instance, the designer can supply the synthesis tools with a low-power cell library [27]. Also, chip placement and routing can be targeted towards minimizing the activity-capacitance product of wires [28] [29]. Moreover, as a final verification step, switch- or device-level analysis tools (such as PowerMill [13] and SPICE [14]) can be used to validate the results of the higher level tools. In order to support the design flow down to this low level of abstraction, our CAD framework has been seamlessly linked to the Lager IV silicon compilation system [30].
Summary
To review, the methodology proposed in this section advocates a top-down approach to design optimization. Beginning at the algorithm level, the designer can invoke behavioral power estimators to begin to classify alternative algorithms in terms of intrinsic power requirements. ATP exploration tools can then be used to evaluate the suitability of algorithms for implementation on low-power (e.g. low-voltage, concurrent) architectures. Next, architectural power analysis can be used to verify design decisions made at the algorithm level and to explore additional opportunities for power optimization that are transparent to algorithmic estimation tools. At all phases, power predictions are based on data from pre-characterized cell libraries, allowing the user to have some level of confidence in even the highest level estimates. Finally, the optimized algorithm and architecture can be synthesized down to layout and verified with low-level analysis tools.
Low-Power CAD Environment and Tools
The previous section took us through a proposed design flow and mentioned several analysis and exploration tools that would be required to support this methodology. This section provides some details relating to the CAD framework and the tools it contains.
Environment
The low-power CAD framework has been built around HYPER, a high-level synthesis tool targeted at generating ASIC's for datapath-intensive DSP applications. Previous publications have described various aspects of HYPER [34]. Therefore, this paper will focus on recent extensions to HYPER that allow the user to perform the kind of multi-level optimization and exploration that was described in the previous section.
As shown in Figure 3, HYPER relies on three main tools to facilitate low-power design: an algorithmic power estimator, an ATP exploration tool, and an architectural power analyzer. All these tools employ a library of power models for datapath, memory, control, and interconnect. Each tool makes use of whatever information is available to it at that level of abstraction. By applying these tools in an integrated, top-down fashion, the user is able to begin with a high-level description of the desired functionality and systematically converge to the optimum low-power algorithm and architecture. The following sections will describe each of the three tools, discussing the advantages and limitations imposed by the level of the tool in the design hierarchy.
Algorithmic Power Estimation
The task of algorithmic power estimation is to predict the power consumption of the datapath, memory, control, and interconnect components of a chip given only the algorithm to be executed. This is not an easy task since the same operations can consume different amounts of power when performed on different pieces of hardware. Still, it is possible to produce some sort of power estimates based solely on the information available from the algorithm description.
For example, the datapath and memory power consumption can be estimated by looking at the type, quantity, and characteristic energy of the various operations required by the algorithm. To some extent, these characteristic energies will depend upon the final hardware implementation; however, some rough estimates can be made based perhaps on results from previous designs or from existing hardware libraries. For instance, HYPER selects memory and execution units from a custom, low-power hardware library. Since the library is pre-defined, the power consumption of each cell can be characterized a priori, reducing the datapath power estimation task to a series of table lookups.
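The table-lookup idea can be illustrated with a minimal sketch. This is not HYPER's estimator: the function name, the library dictionary, and every energy value and operation count below are hypothetical placeholders chosen only to show the count-times-characteristic-energy structure.

```python
# Hypothetical characteristic energies (pJ per operation) for a
# pre-characterized library. Placeholder values, not HYPER library data.
LIBRARY_ENERGY_PJ = {
    "add": 10.0,
    "mult": 120.0,
    "mem_read": 40.0,
    "mem_write": 45.0,
}

def estimate_energy_per_sample(op_counts, library=LIBRARY_ENERGY_PJ):
    """Sum (operation count) * (characteristic energy) over all operation types."""
    missing = set(op_counts) - set(library)
    if missing:
        raise KeyError(f"no library entry for: {sorted(missing)}")
    return sum(count * library[op] for op, count in op_counts.items())

# Hypothetical operation profile of one algorithm, per input sample.
example_counts = {"add": 16, "mult": 13, "mem_read": 8, "mem_write": 8}
example_energy_pj = estimate_energy_per_sample(example_counts)
```

Because each library entry is characterized once, comparing two candidate algorithms only requires recounting their operations, which is what makes this level of estimation fast.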
While operation and memory access counts can help us to estimate datapath and memory power consumption based solely on the algorithm description, it is difficult to estimate controller and interconnect power without more detailed implementation-specific information such as hardware allocations and chip area. By analyzing the topology of the behavioral flowgraph including data dependencies and timing constraints, it is possible to produce reasonable estimates of these factors [32] . Combining these estimates with statistical models based on past designs, algorithmic estimators can provide meaningful early predictions of control and interconnect power consumption. These models might take the form of a database relating the controller power or the average interconnect length of previous chips to algorithm-level parameters. HYPER's algorithmic power estimator employs this approach. A statistical controller power model was extracted from a set of 46 benchmarks representing a wide cross-section of DSP applications [35] . 
The accuracy of the resulting model is depicted in Figure 4a . The average and maximum modeling errors for the benchmark set are 11.5% and 41.1%, respectively. A similar strategy was used to develop a high-level area model (see Figure 4b ), which can be used directly to estimate average interconnect length [36] . The average and maximum errors of this model are 17% and 44%, respectively.
The principal advantage of algorithm-level estimation is that it provides valuable feedback to the user very early in the design process. Not surprisingly, this comes at the cost of accuracy, since the tools have access to only a very small amount of implementation information. For example, without knowledge of the final architecture and how hardware resources are shared, we cannot determine the activity statistics of signals and circuits. Consequently, the modules are characterized for completely random uniform white noise inputs. In many cases, the white noise models are sufficient, with errors as low as 10-20% relative to switch-level simulations; however, when data streams are correlated, estimation errors grow and results can be off by 50-100% or more (see Figure 2) [37] [38]. These inaccuracies can be eliminated at the architecture level of analysis.
ATP Design Space Exploration
While estimation allows the designer to plot a single point in the ATP space, it is also important to have a tool which allows the designer to explore a whole range of trade-offs between area, time, and power. As mentioned in Section 2, one way of trading area and performance for power is to scale down voltage and make up for the loss in performance by employing techniques such as parallel processing. The HYPER environment provides an exploration tool that allows the designer to see this trade-off between area and power graphically.
The output of the tool is a set of exploration curves that plot the estimated area and power of an algorithm as a function of supply voltage (see Figure 5 ). Typically the area increases for lower voltages since the longer circuit delays must be compensated for by additional parallel processors (with their associated interconnect, memory, and control). This parallel processing allows the design to meet the algorithmic performance requirements while operating at reduced voltage.
The ATP exploration curves are produced point-by-point by iteratively invoking the algorithmic power estimator over a range of operating voltages [35]. The curves provide the designer useful feedback that can guide him/her in selecting the algorithm and architecture that offer the best compromise between area, performance, and power. As the exploration curves are based directly on the algorithmic power estimates, they are subject to the same inaccuracies. For a finer grain classification of implementations we must rely on an architectural power analysis tool, as described in the next section.
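The point-by-point construction of an exploration curve can be sketched as a simple voltage sweep. The sketch below is not HYPER's exploration tool. It assumes a first-order delay model t ~ V / (V - Vt)^2, an assumed threshold voltage, energy that scales with V^2, and a fixed hypothetical overhead per extra parallel unit; all names and numbers are illustrative.

```python
import math

V_T = 0.8    # assumed threshold voltage (V)
V_REF = 5.0  # supply at which the reference critical path was measured (V)

def delay_scale(vdd):
    """First-order circuit delay relative to V_REF, using t ~ V / (V - Vt)^2."""
    ref = V_REF / (V_REF - V_T) ** 2
    return (vdd / (vdd - V_T) ** 2) / ref

def explore(critical_path_ns, sample_period_ns, energy_ref_nj, area_ref_mm2,
            voltages, overhead=0.1):
    """Return one (vdd, units, energy_nj, area_mm2) point per supply voltage.

    units is the number of parallel processors assumed necessary to keep the
    throughput; overhead is a hypothetical fractional energy cost per extra unit.
    """
    points = []
    for vdd in voltages:
        path_ns = critical_path_ns * delay_scale(vdd)
        units = max(1, math.ceil(path_ns / sample_period_ns))
        energy = energy_ref_nj * (vdd / V_REF) ** 2 * (1 + overhead * (units - 1))
        area = area_ref_mm2 * units
        points.append((vdd, units, energy, area))
    return points

# Hypothetical design: 300 ns critical path at 5V, 2.75 MHz sample rate.
curve = explore(300.0, 1000 / 2.75, 10.0, 1.0, [5.0, 4.0, 3.0, 2.0, 1.5])
```

Plotting the energy and area columns of `curve` against voltage gives the general shape described in the text: energy falls as the supply is lowered while area rises as more parallel units are needed.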
Architectural Power Analysis
Prior to specifying an architecture, very little implementation specific information is available to the estimation tools. This limits the accuracy that these tools can achieve. Much better results can be obtained using architectural power analysis since the allocation, partitioning, and interconnection of datapath, memory, and control modules are more completely specified.
For example, it was difficult to model power consumption at the algorithm level due to lack of knowledge about the activity statistics of the inputs feeding the various modules. In contrast, at the architecture level the activity can be ascertained by performing fast and efficient functional simulation of the design, perhaps using register-transfer level VHDL as a simulation platform. Then, accurate power estimates can be generated by applying black-box models that calculate module power consumption as a function of these activity measurements. So, instead of characterizing modules only for uniform white noise inputs, modules could be characterized for many different types of input activity.
The HYPER architectural power analyzer, for example, utilizes this strategy to accurately estimate datapath and memory power consumption. The model is based on the realization that two's-complement data consists of two types of bits, which exhibit quite different activity patterns (hence a dual bit type, or DBT, model). The least significant bits (LSB's) contain data and the most significant bits (MSB's) contain sign. The distinct activity behavior of the two bit types is depicted in Figure 6.
The difficulty with architectural power analysis stems primarily from lingering uncertainties as to the final placement and routing of the register-transfer level (RTL) components. This lack of information makes it difficult to accurately estimate interconnect power consumption. Possible solutions to the dilemma include using interconnect models based on derivatives of Rent's Rule [36] or back-annotation after early floorplanning.
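The two bit types described above can be observed in a small simulation. The sketch below is illustrative only and is not the HYPER analyzer: it generates a synthetic correlated signal (a first-order autoregressive process with assumed correlation 0.99 and assumed standard deviation 1000), encodes it in 16-bit two's complement, and measures how often each bit toggles between consecutive samples.

```python
import random

def bit_activity(samples, width=16):
    """Fraction of sample-to-sample transitions in which each bit toggles."""
    mask = (1 << width) - 1
    toggles = [0] * width
    prev = samples[0] & mask  # masking yields the two's-complement bit pattern
    for s in samples[1:]:
        cur = s & mask
        diff = prev ^ cur
        for b in range(width):
            toggles[b] += (diff >> b) & 1
        prev = cur
    n = len(samples) - 1
    return [t / n for t in toggles]

def correlated_stream(n, rho=0.99, sigma=1000.0, seed=0):
    """Synthetic correlated data: x[t] = rho * x[t-1] + noise (assumed model)."""
    rng = random.Random(seed)
    innovation = sigma * (1.0 - rho * rho) ** 0.5
    x, out = 0.0, []
    for _ in range(n):
        x = rho * x + innovation * rng.gauss(0.0, 1.0)
        out.append(int(round(x)))
    return out

activity = bit_activity(correlated_stream(20000))
lsb_activity, msb_activity = activity[0], activity[15]
```

For this kind of input the lowest bit toggles on roughly half of the transitions, as it would for white noise, while the sign bit toggles far less often. A model characterized only for white noise inputs would assign both bits the same activity.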
Overall, HYPER's architecture-level power estimates are typically within 20% of switch-level simulations based on extracted layouts. Since the architectural power analysis tool provides such accurate results, it is used in this paper as a basis of comparison for higher level estimates. 
Case Study: The Avenhaus Filter
Sections 2 and 3 described several recurring themes or strategies for low-power design as well as a set of tools forming a low-power CAD framework. In this section we present a case study in low-power design that will demonstrate how these high-level analysis and exploration tools can be used to support a comprehensive low-power design methodology. The general approach taken here is applicable to a wide variety of applications; however, for the purposes of illustration we will consider the task of producing a low-power implementation of the Avenhaus filter [39].
Using this example, we proceed through a sample design flow, at each stage highlighting issues of particular importance. The process begins with a preliminary evaluation of the Avenhaus filter, comparing the various structures that can be used to implement its transfer function. Next, we explore the design space at the algorithm level, applying several of the low-power strategies of section 2. After narrowing down the design space, we proceed to architecture-level analysis in order to verify and refine our design decisions. We conclude with a review of the power savings achieved at each stage of optimization.
Preliminary Evaluation
The Avenhaus filter, an eighth-order bandpass filter, can have several different structural implementations. The different structures considered here are the cascade, continued fraction, direct form II, ladder, and parallel forms proposed by Crochiere [40]. Each of these forms has a very different computational structure and, thus, might be expected to map to distinct points in the ATP design space. Other studies have considered the algorithm-level area and performance trade-offs for these structures [33] [41], but to the best of our knowledge this is the first attempt to study the power aspects.
We shall assume that the designer is free to select the algorithm (i.e. filter structure), apply any number of transformations to it, and choose an appropriate supply voltage, but is limited by a 2.75 MHz throughput requirement imposed by the surrounding system. As a preliminary step in the algorithm selection process, we can profile each filter structure in terms of several key parameters that will have some influence on the power consumption (see Table 1). The table shows the maximum throughput of each algorithm (in terms of critical path and sample frequency) and the complexity (in terms of required word length and operation count). It also gives an estimate of the energy required per sample assuming a straightforward implementation of each structure. To allow a direct comparison, maximum frequency and energy results are quoted for standard 5V implementations, with clocking frequencies set so that all operations finish within a single clock cycle. Note that all estimates were produced using the behavioral estimator described in section 3.2.
As discussed in section 2, the complexity of an algorithm has a major impact on the power consumed. Extra bits of word length contribute to larger physical capacitance and increased activity. Likewise, higher operation counts contribute directly to increased activity and can also necessitate increased hardware and routing, resulting in larger physical capacitance. This is reflected in the table, which shows that the cascade and parallel implementations have among the lowest operation counts and the smallest word lengths and, consequently, the lowest energies. This relates directly to the previously mentioned low-power theme of avoiding waste.
Complexity is not, however, the only factor determining power consumption. The minimum operating voltage also has an important effect. Notice that, in their current forms, only the cascade and parallel structures can meet the 2.75 MHz throughput constraint at 5V. The critical paths for the other algorithms are much longer and, thus, they cannot realize the required sample rate even at 5V.
As yet, however, we have only considered the direct implementation of each filter type. It is quite possible that transformations such as constant multiplication expansion or pipelining could reduce the critical path of one of the other algorithms, allowing it to operate at a much lower voltage and, perhaps, a lower power. Likewise, other transformations might drastically reduce the complexity of one of the structures, making it optimal for power. In summary, at this stage the cascade and parallel forms look promising, but more exploration is needed before making a final selection.
Programmable vs. Dedicated Hardware
One transformation that can be very useful in power reduction is expansion of constant multiplications into adds and shifts. The multiplication operation can then be implemented on one or more adders and programmable shifters. In this way, only additions and shifts corresponding to 1's in the coefficient are performed. In contrast, the array multiplier performs an addition for every bit of the coefficient, whether that bit is 1 or 0.
On the other hand, the dedicated array multiplier has certain advantages. First, it performs shifts by hard-wired routing, while the add-shift version uses programmable shifters, latching intermediate results between stages. These shifters and latches contribute additional overhead and can offset the gains from the reduced number of additions. Whether this overhead is dominant or not depends on the particular value of the coefficient. In fact, we can even consider optimizing or scaling coefficients for minimum power as suggested in [4].
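The expansion of a constant multiplication into shifts and adds can be sketched in a few lines. The function below is a hypothetical illustration rather than HYPER's transformation: it handles non-negative integer coefficients only and reports how many shifted terms (one per 1 bit in the coefficient) are summed.

```python
def shift_add_multiply(x, coeff):
    """Multiply x by a non-negative integer constant using only shifts and adds.

    Returns (product, terms), where terms is the number of shifted copies of x
    that were accumulated, i.e. the number of 1 bits in coeff.
    """
    if coeff < 0:
        raise ValueError("this sketch handles non-negative coefficients only")
    product, terms, shift = 0, 0, 0
    while coeff >> shift:
        if (coeff >> shift) & 1:
            product += x << shift  # one shifted term per 1 bit in coeff
            terms += 1
        shift += 1
    return product, terms

# 10 = 0b1010 has two 1 bits, so x * 10 becomes (x << 3) + (x << 1).
result, terms_used = shift_add_multiply(7, 0b1010)
```

A coefficient with few 1 bits needs few terms, which is why the benefit of this transformation depends on the coefficient values of the particular filter.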
For the case of the Avenhaus filters, we see an increase in energy after applying this transformation (see Table 2). Notice that while there are no multiplications, the number of additions has increased substantially and a number of subtract and programmable shift operations have been added. This increased complexity has a significant impact on the power consumed. Moreover, the programmable hardware has additional control overhead. Table 2 also reveals considerable increases in some of the critical paths. This is a considerable penalty since, as we have noted in section 2, a high-speed design is always desirable due to its potential for allowing voltage reduction. For this example, therefore, we choose to use dedicated multipliers for each multiplication instead of programmable add/shifts. Again, this may not always be the optimum decision, and the trade-offs must be evaluated on a case-by-case basis.
Critical Path Reduction and Voltage Scaling
As previously mentioned, supply voltage reduction is an excellent way to reduce power consumption. As is apparent from Table 1 , however, not all the algorithms can meet the throughput constraint even at 5V. Therefore, in their current forms we cannot consider reducing the voltage for any of these structures below 5V. If we can reduce the critical paths, however, we open the possibility of voltage reductions.
Graph transformations provide a powerful tool for design optimization at the algorithm level and can be used for critical path reduction. Transformations alter the graph structures, while preserving the input/output relationships. Our high-level design environment automates the task of applying transformations. This allows us to explore the effect of many different transformations on each of the candidate algorithms, a task that would not be practical for a designer taking a manual approach.
One important transformation for critical path reduction is pipelining. By allowing more operations to occur in parallel, pipelining reduces the critical path of the design, enabling voltage reduction while still maintaining the required throughput. As a result of pipelining, some of the filter structures that could not meet the throughput constraint initially become feasible, while those already feasible can be made to operate at lower voltages than before [22] . Table 3 shows the reduction in critical paths after pipelining.
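The effect of pipelining on the critical path can be illustrated on a toy flowgraph. The sketch below is an assumption-laden illustration, not HYPER's scheduler: node delays are hypothetical, nodes are assumed to be listed in topological order, and any edge that crosses between pipeline stages is assumed to carry a register that restarts the combinational path.

```python
def critical_path(delays, edges, stage=None):
    """Longest register-to-register path in a directed acyclic flowgraph.

    delays: {node: delay}, with nodes inserted in topological order.
    edges:  list of (src, dst) data dependencies.
    stage:  optional {node: pipeline stage}. An edge between two different
            stages is assumed to be registered, so the path restarts there.
    """
    stage = stage or {node: 0 for node in delays}
    preds = {node: [] for node in delays}
    for src, dst in edges:
        preds[dst].append(src)
    arrival = {}
    for node in delays:  # relies on topological insertion order
        start = max((arrival[p] for p in preds[node] if stage[p] == stage[node]),
                    default=0)
        arrival[node] = start + delays[node]
    return max(arrival.values())

# Hypothetical chain: multiply -> add -> multiply -> add (delays in ns).
delays = {"m1": 40, "a1": 10, "m2": 40, "a2": 10}
edges = [("m1", "a1"), ("a1", "m2"), ("m2", "a2")]

unpipelined_ns = critical_path(delays, edges)
pipelined_ns = critical_path(delays, edges, {"m1": 0, "a1": 0, "m2": 1, "a2": 1})
```

In this toy case one pipeline register halves the critical path. The slack gained can be spent on a lower supply voltage rather than a higher clock rate, which is the use made of pipelining in the text.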
But critical path reduction is only an indirect measure of improvement. We are actually interested in the minimum voltage and energy achieved after pipelining (as shown in Figure 8 ). The curves in this figure were produced using the exploration tool discussed in section 3.3. These curves graphically illustrate that voltages (and energies) can be appreciably reduced for all examples (except the continued fraction) by applying pipelining.
The results from the exploration curves are summarized in Table 4 . Notice that the optimum level of pipelining is not always equal to the maximum pipelining. This is due to the overhead associated with pipelining. For example, plotting the exploration curves for the maximally pipelined cascade would reveal that it consumes a minimum energy of 5.1 nJ which is higher than the two-stage version which consumes 4.4 nJ.
It is apparent that the cascade and parallel versions still yield the best quality solutions. Therefore, based on the results obtained from our high-level exploration tools, we can at this point eliminate the continued fraction, direct form, and ladder implementations from consideration. We do not eliminate the parallel form at this stage since it gives results close to the cascade, and the error inherent in the high-level estimation tools does not allow us to resolve differences that are so small.
Power is not, however, the only consideration. As we reduce voltage we require more parallel hardware to meet the throughput constraints. So, while energy can be reduced by this technique, a price has to be paid in terms of area. The area exploration curves of Figure 9 illustrate this point. Since the cost of a component is directly related to the amount of silicon real estate it requires, real-world designers have a limited amount of silicon area available to them. For our example, in order to achieve the minimum energies, the designer would have to accept significant area penalties as shown in the figure. But if he/she chooses a slightly higher operating voltage of about 2V, the area penalties are much less severe and, moreover, the resulting energies are not significantly higher (see Figure 8). The exploration curves allow him/her to evaluate the area penalties associated with parallelism and make appropriate decisions on the area-energy trade-offs. For the purposes of this example, let us assume that we choose to avoid the large area penalties by operating at 2V.
There are, of course, other power reduction techniques which the designer could explore within the HYPER environment. A number of transformations for low power are described in [22]. But the purpose of this case study is to present a general methodology and show how high-level tools can be used to facilitate low-power design, not to enumerate all possible implementations of the Avenhaus filter.
Figure 9. Area-energy trade-off for the cascade and parallel versions.
Architectural Exploration
Using power estimation and exploration tools, we have been able to narrow down the design space to two filter structures: a cascade and a parallel form both with two stages of pipelining operating at 2.75 MHz and 2V. A detailed breakdown of the energy estimates at the algorithm and architecture levels is given in Table 5 . As described in section 3.4, at the algorithm level we are unable to account for the effect of signal statistics. If we assume that we are filtering speech data (which is highly correlated) we might expect some inaccuracy in the algorithmic power estimates. In order to verify and refine these results, we now synthesize the selected algorithms using HYPER and resort to architectural power analysis for more accurate energy estimates. The results of this process are also contained in Table 5 (the meaning of the local assignment column will be explained shortly).
For a speech input, we see that the algorithmic power estimator did, indeed, overestimate the power consumed by the execution units by 47% for the cascade and 60% for the parallel; however, the total chip powers were more accurate with 17% and 23% errors, respectively. The important point, however, is that the errors in the algorithmic power estimates were systematic overestimates rather than random errors. This suggests that relative classifications made during algorithmic design space exploration are meaningful.
Also notice that the cascade structure continues to be the lowest power solution. Therefore, based on these accurate architecture-level estimates, we are able to select the cascade filter as our final low-power Avenhaus implementation. Some final optimizations are still possible, however, at the architecture level.
For instance, assignment of operations to hardware operators can have a significant impact on power. One possible assignment strategy involves assigning operations to operators to maximize locality, as mentioned in section 2. This is beneficial because signals that are local to a small subsection of the filter tend to be more highly correlated than signals drawn from widely separated parts of the graph. With more highly correlated data on each operator, there is likely to be less activity in the sign bits, as discussed in section 3.4. This in turn should reduce the power consumption of the implementation. Note that the analysis of this effect can only be performed using the DBT-based architectural power analyzer; an estimator based on the uniform white noise model would not be able to make this distinction. The results of assignment for locality on the cascade are also shown in Table 5. In this case, the overall energy is reduced an additional 22% with no penalty in area or performance.
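The benefit of assigning correlated signals to the same operator can be shown with a toy activity count. The sketch below is illustrative and uses synthetic data: four hypothetical signal streams, two positive and two negative, are time-multiplexed onto two shared units under a local and a non-local assignment, and the input bit toggles of each unit are counted in 16-bit two's complement.

```python
def toggles(sequence, width=16):
    """Total number of bit flips between consecutive words of a sequence."""
    mask = (1 << width) - 1
    return sum(bin((a & mask) ^ (b & mask)).count("1")
               for a, b in zip(sequence, sequence[1:]))

def interleave(s, t):
    """Time-multiplex two streams onto one shared hardware unit."""
    return [value for pair in zip(s, t) for value in pair]

# Synthetic streams: a1/a2 come from one region of the graph, b1/b2 from another.
n = 16
a1 = [1000 + i for i in range(n)]
a2 = [1002 + i for i in range(n)]
b1 = [-1000 - i for i in range(n)]
b2 = [-1003 - i for i in range(n)]

# Local assignment: each unit only sees signals from one region.
local_toggles = toggles(interleave(a1, a2)) + toggles(interleave(b1, b2))
# Non-local assignment: each unit alternates between the two regions.
nonlocal_toggles = toggles(interleave(a1, b1)) + toggles(interleave(a2, b2))
```

In the non-local case the sign-extension bits flip on every transition, so `nonlocal_toggles` is larger than `local_toggles` even though both assignments use the same two units and perform the same operations.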
Gains from Design Space Exploration
The preceding case study demonstrates that design decisions at the algorithm and architecture levels may have a tremendous impact on the power. Selecting the correct algorithm (cascade) can save a factor of three in power, compared to the worst case (direct form). Moreover, the direct form (at 89.5 nJ) could not achieve the required 2.75 MHz sampling rate. If the algorithms were compared for the same throughputs, the cascade would actually be even more than 3x better. We also found that, counter to what we may expect, expanding multiplications into shifts and adds is not universally beneficial and actually increases the power consumed for some cases. Transformations on the graph structure, e.g. pipelining, were found to further reduce the power by a factor of more than 6x. Finally, local assignment helped to reduce the power by another 22%. As Table 6 illustrates, coupling the algorithmic improvements with architectural optimizations can allow us to achieve more than an order of magnitude power reduction (27x for this example).
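As a rough consistency check on these figures, the individual gains can be multiplied together. The exact per-stage values come from Table 6, which is not reproduced here, so the product below uses only the rounded factors quoted in the text and treats the pipelining gain as a lower bound:

```latex
G_{\text{total}} \;>\; \underbrace{3}_{\text{algorithm}} \times
\underbrace{6}_{\text{pipelining}} \times
\underbrace{\frac{1}{1 - 0.22}}_{\text{local assignment}}
\;\approx\; 3 \times 6 \times 1.28 \;\approx\; 23
```

A 22% energy reduction corresponds to a factor of $1/0.78 \approx 1.28$. Because the pipelining gain is quoted as more than 6x, a lower bound of about 23x is consistent with the reported overall reduction of 27x.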
Conclusions
This paper has presented a number of tools that facilitate a hierarchical, top-down approach to low-power design.
These tools are implemented as part of the HYPER high-level synthesis system and form an interactive framework for low-power design. In addition, this paper described a power optimization methodology that emphasized highlevel techniques based on the recurring low-power themes of trading area-performance for power, avoiding waste, and exploiting locality. The efficacy of the methodology and supporting tools was demonstrated through a case study on the Avenhaus filter structures. Using algorithmic and architectural optimization strategies, power savings of 20-30x were demonstrated. This work is ongoing and other issues still need to be addressed. Aside from improvements in the current power modeling strategies, more research is needed into strategies for ordering transformations to enable maximum power reductions. In addition, the current environment is interactive and primarily user-driven. In the future, one can envision a fully-automated environment, which intelligently searches the design space attempting to optimize some cost function based on area, performance, and power.
In conclusion, the high-level design space is vast and presents the designer with numerous trade-offs and implementation options. The multiple degrees of freedom preclude a manual exploration of all avenues for power optimization.
The HYPER design space exploration environment provides the user with a means for rapidly and efficiently searching the design space for solutions that best meet his/her area, performance, and power constraints.
