The paper presents a low-power synthesis flow for the development of dedicated silicon circuits for data-dominated applications such as DSP systems. The work was carried out as part of a European ESPRIT low power action and a collaborative "low-power" project involving the universities of Liverpool, Manchester and Sheffield. The design flow is briefly described and some results are presented for multiplier implementations and their use in the development of a Discrete Cosine Transform (DCT) circuit.
1.

Introduction
In data-dominated applications, the performance capabilities of customized solutions are attractive, particularly in applications such as video compression where functionality is fixed or has been standardized. For high volumes, customized VLSI solutions offer superior system performance in terms of both area and speed. For example, a processing performance of 100 GUPs/cm2Ws is achievable for computationally complex DSP algorithms in 0.35 µm standard cell CMOS technology. The performance gain comes from the process of developing an architecture that allows the algorithm to be efficiently mapped to hardware. This results in a very efficient solution, as it is often possible to exploit characteristics of the algorithm in such a way as to allow a highly optimised implementation to be realized. There are a number of different design flows, one of which is described in this paper. The resulting designs are characterized by high area utilization, high levels of locality (preserving power) and efficient memory utilization.
Low-power Regular Processors
Broadly speaking, DSP processor development can be classified into two design styles.
The first approach involves the development of an instruction set processor where suitable processing units are synthesized to achieve the required performance. Once a sensible architecture has been determined, the majority of the design activity is involved in scheduling, memory optimisation and code transformation. As the underlying architecture is well understood and highly predictable, accurate power performance is achievable.
The second approach attempts to synthesize an application specific architecture which closely matches the application functionality that it implements. This allows the designer to apply a number of performance efficient optimisations that would not be available in the previous design flow. In addition, the designer has access to hardware transformations and different number systems which can offer advantages in some applications.
The material described in this paper concentrates on the synthesis of solutions which fall into the second classification and closely follows the work described in the book produced from this work [1].

Transformations are used to speed up the system's throughput beyond that which is necessary. The voltage is then reduced, slowing performance until the required throughput rate is met but at a lower power consumption. However, in the semi-custom approach, the voltage of the technology is pre-determined by the silicon foundry and cannot be altered by the user to reduce power. Typically, the voltage will already have been set for low power operation and the user will access the design through phantoms which the user cannot modify. Therefore, the "reduced voltage" techniques cannot be employed.
Low power design therefore must be targeted at reducing the switched capacitance. This is the summation, over every node in the circuit, of the product of the toggling occurring on that node and the capacitance of that node. It is important to consider switched capacitance as opposed to toggling or circuit capacitance independently. For example, a circuit may have a large capacitive net which has a low switching activity and which will not contribute greatly to power consumption [3]. Conversely, low capacitance nets may have a lot of switching activity. For these reasons, minimisation of switched capacitance in order to reduce power consumption is adopted for the regular processor meta flow.
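The definition of switched capacitance can be illustrated with a minimal sketch. The net names, toggle rates and capacitance values below are hypothetical and chosen only to show that a small, busy net can contribute more than a large, quiet one; they are not taken from the paper's results.

```python
def switched_capacitance(nets):
    """Sum, over all nets, of toggle rate multiplied by node capacitance.

    `nets` maps a net name to (toggles_per_cycle, capacitance_fF).
    """
    return sum(toggles * cap for toggles, cap in nets.values())


def per_net_contribution(nets):
    """Return each net's individual share of the switched capacitance."""
    return {name: toggles * cap for name, (toggles, cap) in nets.items()}


# Hypothetical example nets (illustrative values only).
example_nets = {
    "long_global_bus": (0.0625, 400.0),  # large capacitance, rarely toggles
    "local_carry": (1.0, 32.0),          # small capacitance, toggles every cycle
}

contributions = per_net_contribution(example_nets)
total = switched_capacitance(example_nets)
```

Here the low-capacitance net dominates the total, which is why neither toggling nor capacitance alone is a reliable guide to power.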
4.
Regular Processor design flow for signal processing modules
The starting point of the regular processor "meta" design flow is an algorithmic description of the functionality that has to be mapped to the processor. This algorithmic description is then refined using the design flow illustrated in figure 1, resulting in a solution with efficient wordlength representations and an optimized dataflow that minimises transitions on the internal circuit buses.
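The effect of dataflow ordering on bus transitions can be sketched by counting the bit toggles between consecutive words on a bus. This is an illustrative model only: the function name, the 8-bit width and the data values are assumptions, and it presumes the words can be transferred in any order without changing the result of the computation.

```python
def bus_transitions(words, width):
    """Count bit toggles between consecutive words on a `width`-bit bus."""
    mask = (1 << width) - 1
    total = 0
    for prev, curr in zip(words, words[1:]):
        # XOR marks the bits that change; count the ones.
        total += bin((prev ^ curr) & mask).count("1")
    return total


# Hypothetical 8-bit data, sent in two different orders.
original_order = [0x00, 0xFF, 0x01, 0xFE]
reordered = [0x00, 0x01, 0xFF, 0xFE]

toggles_original = bus_transitions(original_order, width=8)
toggles_reordered = bus_transitions(reordered, width=8)
```

Grouping words with similar bit patterns reduces the toggle count in this example, which lowers the switching activity seen by the bus capacitance.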
6.
Operation/instruction level issues
At this stage in the design flow, a candidate VLSI architecture will have been chosen. A simple approach is to instantiate a processor for each of the processing elements. For the detailed architecture of the instruction level processor, a hierarchical approach based on simple processors can be used. Alternatively, it is possible to develop the circuit architecture using, for example, the Co-ordinate Rotation Digital Computer (CORDIC) transformation. The choice of bit level circuit architecture can also impact power performance.
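As background on the CORDIC transformation mentioned above, the following is a minimal floating-point sketch of a rotation-mode CORDIC. It is a generic textbook formulation, not the circuit architecture used in this work; the function name and iteration count are assumptions, and a hardware version would use fixed-point shifts and adds with a precomputed angle table and gain constant.

```python
import math


def cordic_rotate(x, y, angle, iterations=16):
    """Rotate the vector (x, y) by `angle` radians using CORDIC iterations.

    Valid for |angle| up to roughly 1.74 rad, the convergence range of the
    basic algorithm.
    """
    # Elementary rotation angles atan(2^-i) and the accumulated gain.
    angles = [math.atan(2.0 ** -i) for i in range(iterations)]
    gain = 1.0
    for i in range(iterations):
        gain *= math.sqrt(1.0 + 2.0 ** (-2 * i))

    z = angle  # residual angle still to be rotated
    for i in range(iterations):
        d = 1.0 if z >= 0 else -1.0
        # Each step only needs a shift (multiply by 2^-i) and an add/subtract.
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * angles[i]

    # Remove the constant gain introduced by the micro-rotations.
    return x / gain, y / gain


cos_30, sin_30 = cordic_rotate(1.0, 0.0, math.pi / 6)
```

The appeal for hardware is that the inner loop contains no general multiplier, only shifts and additions, so the circuit is built from simple, regular cells.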
7.
Power Estimation of Processor Block
A number of studies have been carried out into adder structures and multiplier structures [3]. Some of these results are shown in table 1. The multiplier structures presented all have different capabilities, e.g. the Wallace-Tree structure can operate at much higher frequencies than the array structures. This factor was taken into consideration. The table clearly shows that preserving hierarchy at the circuit level allows a power saving. This is particularly observable at higher wordlengths, where the increased wordlengths result in many more long nets which have a greater impact on power consumption.
8.
Application to 1D DCT example
The DCT is a key DSP function that has been used extensively for
Table 2. Power consumption of multipliers processing random data at 20 MHz
Conclusions
In this paper, a brief description of a low-power design flow has been presented. It is shown how a low-power design approach is achievable at an algorithm/architectural level without resorting to circuit or technology level techniques. Future work aims to target a design at each stage of the design flow to demonstrate more fully the design gains possible, as the DCT example only demonstrates the impact at the operation/instruction and circuit levels.
