This paper investigates technical issues concerning the automated generation of highly regular VLSI circuit layouts (e.g. RAMs, PLAs, systolic arrays) that are crucial to the designability and realizability of large VLSI systems. The key is to determine the most profitable level of abstraction for the designer, which is accomplished by the introduction of macro abstraction, interface inheritance, delayed binding, and the complete decoupling of procedural and graphical design information. These abstraction mechanisms are implemented in the Regular Structure Generator, an operational layout generator with significant advantages over first generation layout tools. Its advantages are demonstrated by a pipelined array multiplier layout example.
INTRODUCTION
Circuit designs with highly regular and repetitive layouts are an effective solution to the VLSI design bottleneck, and therefore occur quite often in large VLSI systems. Familiar examples of regular circuit structures are RAMs, ROMs, PLAs, and array multipliers. In addition, recognition of the importance of regularity in VLSI systems has given rise to a large and continually growing collection of new regular structures for applications in signal processing, image processing, data structures, and CAD, to name a few. Since these designs are computationally powerful and widely applicable, there is a great demand for circuit design tools that make these structures generally accessible. This paper describes a CAD tool, the Regular Structure Generator (RSG), that helps meet this demand by performing automatic generation of regular structure layouts and providing the designer with the means to efficiently capture, in all their richness and variety, most practical regular circuit designs.
Despite the uniform and repetitive appearance of their layouts, effective regular structure circuits are not simply bland arrays of identical, abutting cells. In practice, there is always some degree of complexity along the edges of a regular array, and each design instance must be parametrically personalized with respect to problem size and functionality. This requires the placement of a variety of cell maskings that implement such options as transistor and bus sizing, cell interfacing, clock assignment, and functional encoding, a task which cannot be accomplished by the simple array generating commands found in graphics editors. Although regularity does permit most regular structures to be personalized in an algorithmic manner, a high degree of flexibility is still required in the placement and orientation of the cells and cell maskings. Insofar as first generation VLSI layout tools lack this high degree of flexibility, there is an opportunity for developing more advanced module generators that fulfill this need.
Research sponsored by the Air Force Office of Scientific Research, Air Force Systems Command, USAF, under Contract Number AFOSR F4862084-CcQcJ4.
The RSG was developed with this approach to regular circuit layout in mind. The input language used for the procedural specification of circuit architecture is a subset of Lisp. Consequently, abstraction mechanisms are available to support a highly functional set of primitives for defining regular structures and evaluating the complex conditionals required by personalization and edge effects. Personalization is further supported by the ability to arbitrarily place and orient cells according to interfaces defined by example in the graphical domain. All design information is efficiently partitioned into procedural and graphical form.
A circuit layout is generated from the following inputs (Figure 1): a design file, which is a parameterized, procedural description of the architecture; a layout file, which is a graphical specification of cell layouts and interfaces; and a parameter file, which provides the size and functional specifications for the particular case. By completely decoupling the graphical and procedural domains, a level of modularity is obtained which achieves local efficiency in layout generation, and global efficiency in the management of new architectures, layouts, and interfaces to other CAD tools.
The RSG also supports macro abstraction, i.e. the specification of macrocells as interconnections of smaller cells whose binding to actual layouts can be delayed to any desired time. In addition, interface inheritance relations provide a procedural means for defining interfaces between any two macrocells: a new interface between two macrocells can be computed from any legal interface between a subcell in the first macrocell and a subcell in the second. As a result, macrocells can be used to specify even more complex cells in an entirely procedural manner with no need for additional layout.
22nd Design Automation Conference, Paper 2.3
At this stage of the discussion, all of the RSG's functionality appears to exist in other layout generators. For instance, procedural specification of circuit layouts is as old as silicon compilation itself, and essentially defines it. The novelty of the RSG is not its use of procedural specification, but rather the level of abstraction at which it is used. Failure to choose an optimal level of abstraction complicates the user interface, and forces the designer to concentrate as much on the internal constraints of the generator as on the functionality of the circuit being designed. Examples of this are layout generators that require placement of cells by strict abutment, or that do not support true hierarchical macro abstraction.
The significant contribution of the RSG is ease of use through appropriate abstraction mechanisms. That is, the RSG does not produce any circuit layouts which, given unlimited effort, could not be produced by other layout generators. The result of these ease-of-design principles, however, is a tool that performs well in practice, not just in principle, in a realistic VLSI design setting. The technical details of these features are discussed in Section 2 and illustrated in Section 3 with a pipelined array multiplier design.
TECHNICAL SPECIFICATION OF THE RSG
Interfaces and the Interface Table
The RSG uses previously defined cells to hierarchically build larger cells. A cell A consists of objects whose locations in the cell are defined in terms of an implicit coordinate system $C_a$ with origin $S_a$. The objects in A can be boxes of various layers, points, and instances of other cells. An instance of a cell B is the triplet ($L_b$, $O_b$, (cell definition)), where $L_b$ is the point of call of the cell B, $O_b$ is the orientation in the call of B, and (cell definition) is a pointer to the cell definition of B. The effect of having an instance of B in A with point of call $L_b$ and orientation $O_b$ is that of performing the isometry $O_b$ on B (an isometry is either a rotation or a reflection, and $O_b$ leaves $S_b$, the origin of the coordinate system within B, unchanged), placing the origin $S_b$ of B at location $L_b$ within the coordinate system of A, and finally adding to A the collection of objects in B.
A key notion in the RSG is the interface. If instances of cells A and B (which need not be distinct cells) are to be called within the same coordinate system, then cells A and B have an interface between them. The interface between two cells A and B is the ordered pair $(V_{ab}, O_{ab})$, where $V_{ab}$ is the interface vector and $O_{ab}$ is the interface orientation. $V_{ab}$ is the vector whose starting point is the point of call of A and whose endpoint is the point of call of B, if the instance of A is held at orientation north (identity transform). $O_{ab}$ is the orientation that B would have if the instance of A were held at orientation north. Treating the orientations as operators, with "$\circ$" being the operator composition rule, we have:
$$V_{ab} = O_a^{-1}(L_b - L_a), \qquad O_{ab} = O_a^{-1} \circ O_b,$$
where $(L_a, O_a)$ and $(L_b, O_b)$ are the calling information of the instances of A and B in their common coordinate system.
Interfaces are a natural way of defining the relative placement and orientation between instances of cells. Hence, knowing the calling information of a cell A in a cell C and knowing the interface between A and B, it is possible to determine the calling information of B in C. The RSG allows the user to specify the primitive cells and interfaces between them graphically, by providing a layout file which will henceforth be referred to as the sample layout. The sample layout contains the definitions of all primitive cells as well as interfaces between them. An interface between cells A and B can be defined by calling A and B together in a higher order cell C with the appropriate relative placement and orientation between them.
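The relationship between calling information and interfaces can be illustrated with a minimal Python sketch. It is not the RSG's CLU implementation: the representation of orientations as 2x2 integer matrices and the names `compose`, `inverse`, `apply`, `interface_between`, and `place_neighbor` are assumptions made for illustration only.

```python
# Orientations are modelled as 2x2 integer matrices (an assumption of this sketch).
NORTH = ((1, 0), (0, 1))        # identity transform
ROT90 = ((0, -1), (1, 0))       # quarter turn counter-clockwise
MIRROR_X = ((-1, 0), (0, 1))    # reflection that negates x


def compose(p, q):
    """Return the operator p o q (apply q first, then p)."""
    return tuple(
        tuple(sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )


def inverse(p):
    """Rotations and reflections are orthogonal, so the inverse is the transpose."""
    return ((p[0][0], p[1][0]), (p[0][1], p[1][1]))


def apply(p, v):
    """Apply orientation p to the vector v."""
    return (p[0][0] * v[0] + p[0][1] * v[1], p[1][0] * v[0] + p[1][1] * v[1])


def interface_between(call_a, call_b):
    """Interface (V_ab, O_ab) implied by two instances in one coordinate system."""
    (l_a, o_a), (l_b, o_b) = call_a, call_b
    inv_a = inverse(o_a)  # "hold A at north" by undoing A's orientation
    v_ab = apply(inv_a, (l_b[0] - l_a[0], l_b[1] - l_a[1]))
    return v_ab, compose(inv_a, o_b)


def place_neighbor(call_a, interface):
    """Calling information of B, given that of A and the interface from A to B."""
    (l_a, o_a), (v_ab, o_ab) = call_a, interface
    dx, dy = apply(o_a, v_ab)
    return (l_a[0] + dx, l_a[1] + dy), compose(o_a, o_ab)


# A hypothetical sample layout: A called at (10, 5) rotated, B at (10, 9) mirrored.
sample_a = ((10, 5), ROT90)
sample_b = ((10, 9), MIRROR_X)
iface = interface_between(sample_a, sample_b)

# Reusing the interface with A at north places B at V_ab with orientation O_ab.
print(place_neighbor(((0, 0), NORTH), iface))
```

Extracting an interface from one pair of instances and replaying it from a different call of A is the essence of definition by example: the sample layout supplies the geometry once, and every later use only needs the calling information of A.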
By virtue of the design-by-example feature of the RSG, the relative placement of neighboring cells in the final layout is such that each interface in the final layout is an instance of an interface in the sample layout. Since the relative placement of cells in the final layout is performed using interfaces between cells and not by using the sizes and shapes of the bounding boxes of those cells, the cells can be designed according to their functional boundary constraints and without regard to abutment constraints. Not only does this make cells easier to design and design rule check, the fact that cells are not cut at artificial boundaries helps reduce the proliferation of cells of essentially the same functionality but different abutment constraints. Using interfaces also allows cells to be easily encoded by superimposing several cells in order to modify the functionality of a basic cell. This too helps in reducing the proliferation of different cell types, since the number of different encoding configurations is roughly exponential in the number of independent encoding decisions.
Cell encoding can also simplify the personalization process: instead of combining all the encoding decisions together to select a single cell of the appropriate type, we can use each independent encoding decision to perform a simple encoding masking of one basic cell. An encoding cell may lie well within the bounding box of the cell it encodes, and hence placement by abutment would be cumbersome since it would cause a proliferation of (spacing) cells that have nothing to do with functionality. By simply specifying an interface, the relative orientation of the cells, as well as whether the cells are side by side, one on top of the other, or one inside the other, is handled automatically.
The RSG program maintains an interface table of all legal (user specified) interfaces between cells. This table is first initialized with interfaces from the sample layout and can be augmented as new cells are created by the system.
Since there can be several different legal interfaces between two cells, each pair of cells A and B has a family of legal interfaces. Figure 2 shows two different possible interfaces for a pair of cells A, B. If the set of legal interfaces between any two cells is indexed over the integers, then the interface table can be described as a mapping from triplets:
(cell A, cell B, interface index $i$)
to interfaces:
$(V_{ab}^{i}, O_{ab}^{i})$
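A minimal sketch of such a table is a dictionary keyed on (cell, cell, index) triplets. The class name `InterfaceTable`, its methods, and the example cells and vectors are hypothetical and chosen only to illustrate the mapping; orientations are again assumed to be 2x2 integer matrices.

```python
from typing import Dict, Tuple

Vector = Tuple[int, int]
Orientation = Tuple[Tuple[int, int], Tuple[int, int]]
Interface = Tuple[Vector, Orientation]

NORTH: Orientation = ((1, 0), (0, 1))       # identity transform
ROT180: Orientation = ((-1, 0), (0, -1))    # half turn


class InterfaceTable:
    """Maps (cell A, cell B, index) to the interface (V_ab, O_ab)."""

    def __init__(self) -> None:
        self._table: Dict[Tuple[str, str, int], Interface] = {}

    def add(self, cell_a: str, cell_b: str, index: int, interface: Interface) -> None:
        key = (cell_a, cell_b, index)
        if key in self._table:
            # Each index names exactly one legal interface for a given cell pair.
            raise ValueError(f"interface {key} is already defined")
        self._table[key] = interface

    def lookup(self, cell_a: str, cell_b: str, index: int) -> Interface:
        return self._table[(cell_a, cell_b, index)]


# Two hypothetical interfaces for the same pair of cells, distinguished by index.
table = InterfaceTable()
table.add("A", "B", 1, ((20, 0), NORTH))
table.add("A", "B", 2, ((0, -15), ROT180))
print(table.lookup("A", "B", 2))
```

The sketch stores only the direction from A to B; whether the RSG also records the reverse interface is not stated in the text, so that is left out here.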
Interface Inheritance Relations
In order for any cell to be used in the RSG it must have an interface with some other cell, otherwise there is no way to place it. When new cells are built up hierarchically by the system, in order to take full advantage of cell hierarchy, interfaces for new cells can be specified in terms of existing ones. In this way cells built up by the system can be used to build even larger cells in exactly the same fashion as were the primitive cells of the sample layout.
If A (respectively B) is a subcell of a new cell C (respectively D), it is then possible to define a new interface $I_{cd}$ between C and D in terms of an existing interface $I_{ab}$ between A and B. $I_{cd}$ is the interface that C and D would inherit if the subcells A and B within C and D were placed and oriented with interface $I_{ab}$ (see Figure 3). If $(L_a, O_a)$ (respectively $(L_b, O_b)$) is the calling information of A (respectively B) in C (respectively D), and $(V_{ab}, O_{ab})$ (respectively $(V_{cd}, O_{cd})$) is the interface vector and interface orientation of $I_{ab}$ (respectively $I_{cd}$), then:
$$O_{cd} = O_a \circ O_{ab} \circ O_b^{-1},$$
$$V_{cd} = L_a - O_{cd} L_b + O_a V_{ab}.$$
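The inheritance relation can be checked numerically with a short Python sketch. It assumes orientations are 2x2 integer matrices and that the inherited orientation is $O_{cd} = O_a \circ O_{ab} \circ O_b^{-1}$, which follows from requiring that B end up with the same orientation whether it is reached through A or through D. The function name `inherit_interface` and the sample values are hypothetical.

```python
# Orientations are modelled as 2x2 integer matrices (an assumption of this sketch).
NORTH = ((1, 0), (0, 1))
ROT90 = ((0, -1), (1, 0))
MIRROR_X = ((-1, 0), (0, 1))


def compose(p, q):
    """Return the operator p o q (apply q first, then p)."""
    return tuple(
        tuple(sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )


def inverse(p):
    """The inverse of a rotation or reflection matrix is its transpose."""
    return ((p[0][0], p[1][0]), (p[0][1], p[1][1]))


def apply(p, v):
    """Apply orientation p to the vector v."""
    return (p[0][0] * v[0] + p[0][1] * v[1], p[1][0] * v[0] + p[1][1] * v[1])


def inherit_interface(call_a_in_c, call_b_in_d, interface_ab):
    """Interface (V_cd, O_cd) inherited from I_ab between subcells A and B."""
    (l_a, o_a), (l_b, o_b) = call_a_in_c, call_b_in_d
    v_ab, o_ab = interface_ab
    o_cd = compose(compose(o_a, o_ab), inverse(o_b))
    rotated_l_b = apply(o_cd, l_b)      # O_cd L_b
    rotated_v_ab = apply(o_a, v_ab)     # O_a V_ab
    v_cd = (
        l_a[0] - rotated_l_b[0] + rotated_v_ab[0],
        l_a[1] - rotated_l_b[1] + rotated_v_ab[1],
    )
    return v_cd, o_cd


# Hypothetical data: A sits in C at (2, 3) rotated, B sits in D at (1, 4) mirrored.
a_in_c = ((2, 3), ROT90)
b_in_d = ((1, 4), MIRROR_X)
i_ab = ((5, 0), ROT90)
v_cd, o_cd = inherit_interface(a_in_c, b_in_d, i_ab)

# Consistency check with C at north at the origin: B reached through A ...
step = apply(a_in_c[1], i_ab[0])
b_via_a = ((a_in_c[0][0] + step[0], a_in_c[0][1] + step[1]), compose(a_in_c[1], i_ab[1]))
# ... must coincide with B reached through D placed by the inherited interface.
offset = apply(o_cd, b_in_d[0])
b_via_d = ((v_cd[0] + offset[0], v_cd[1] + offset[1]), compose(o_cd, b_in_d[1]))
print(b_via_a == b_via_d)
```

The check at the end is the geometric content of the relation: the two routes to subcell B must give the same location and orientation.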
The Algorithm
The RSG algorithm (see Figure 4) consists of first reading in the sample layout in order to define the primitive cells and build up the initial interface table.
New cells are then created in a two step sub-algorithm. The first step in the sub-algorithm consists of building a connectivity graph for the new cell. The connectivity graph for the new cell is a graph whose vertices represent partial instances whose cell type is known but whose location and orientation are as yet unspecified.
The edges between vertices represent interfaces between instances and the weights assigned to them are the interface index numbers. The connectivity graph need only be a spanning tree since cycles in the graph contain redundant information. For a given sample layout, each connectivity graph gives rise to a unique layout (see Figure 5) .
The second step consists of converting the connectivity graph into a layout. This is done by first selecting a root node in the graph and arbitrarily placing and orienting the corresponding instance. The graph is then traversed, and each of the nodes in the graph (which initially are all partial instances) gets expanded into a complete instance with a location and an orientation. The location and orientation $L_b$ and $O_b$ of a partial instance B can be computed from the location and orientation $L_a$ and $O_a$ of one of its already traversed neighboring nodes A using the formula
$$L_b = L_a + O_a V_{ab}, \qquad O_b = O_a \circ O_{ab},$$
where $(V_{ab}, O_{ab})$ is the interface between A and B. Finally, once a new cell is created, if it is to be used in a larger cell, it is necessary to define new interfaces between it and the already existing cells.
This (augmented) two step process of first determining connectivity and then using the connectivity information along with cell definition and cell interface information to build a layout provides a clean separation between the graphical and procedural information. The procedural information in the design file is used to build the connectivity graph and remains constant over different implementations of the design as given by the sample layout. The graphical information from the sample layout is used to transform the connectivity graph into a physical layout of a particular implementation of the design. Cell spacing parameters which relate to the graphical information are never accessed or manipulated in the design file.
This delayed binding on the location and orientation of instances allows for clean macro abstraction in the design file. Since partial instances are connected together in the design file without assigning actual locations and orientations to them, it is possible to build subgraphs without prior knowledge of where and with which orientation the instances in the subgraph will be used. It is easier and cleaner to write and compose macros for subgraphs, because the state of a calling macro does not encumber the called macro by imposing a starting location and a starting orientation at which to start assembling the subcells. Macro abstraction suppresses details of how and where a macro for generating a subgraph gets called and allows the designer to concentrate only on the connectivity of the subgraph.
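The second step of the sub-algorithm, expanding a connectivity graph into placed instances, can be sketched in Python as a breadth-first traversal. This is an illustrative sketch rather than the RSG's code: the function name `expand`, the dictionary-based graph and table, and the simplifying assumption that every edge is directed away from the root are all choices made here.

```python
from collections import deque

# Orientations are modelled as 2x2 integer matrices (an assumption of this sketch).
NORTH = ((1, 0), (0, 1))
ROT90 = ((0, -1), (1, 0))


def compose(p, q):
    """Return the operator p o q (apply q first, then p)."""
    return tuple(
        tuple(sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )


def apply(p, v):
    """Apply orientation p to the vector v."""
    return (p[0][0] * v[0] + p[0][1] * v[1], p[1][0] * v[0] + p[1][1] * v[1])


def expand(cell_types, edges, table, root, root_call=((0, 0), NORTH)):
    """Turn partial instances into complete ones.

    cell_types: node name -> cell type of that partial instance
    edges:      (node_a, node_b, interface_index), directed away from the root
    table:      (type_a, type_b, interface_index) -> (V_ab, O_ab)
    """
    neighbors = {node: [] for node in cell_types}
    for node_a, node_b, index in edges:
        neighbors[node_a].append((node_b, index))

    placed = {root: root_call}          # the root is placed arbitrarily
    queue = deque([root])
    while queue:
        node_a = queue.popleft()
        l_a, o_a = placed[node_a]
        for node_b, index in neighbors[node_a]:
            if node_b in placed:
                continue                # a cycle edge carries no new information
            v_ab, o_ab = table[(cell_types[node_a], cell_types[node_b], index)]
            dx, dy = apply(o_a, v_ab)
            # L_b = L_a + O_a V_ab and O_b = O_a o O_ab
            placed[node_b] = ((l_a[0] + dx, l_a[1] + dy), compose(o_a, o_ab))
            queue.append(node_b)
    return placed


# A hypothetical row of three identical cells joined by interface number 1.
cell_types = {"n0": "cell", "n1": "cell", "n2": "cell"}
edges = [("n0", "n1", 1), ("n1", "n2", 1)]
table = {("cell", "cell", 1): ((10, 0), NORTH)}

upright = expand(cell_types, edges, table, "n0")
rotated = expand(cell_types, edges, table, "n0", root_call=((0, 0), ROT90))
print(upright["n2"], rotated["n2"])
```

Calling `expand` twice with different root placements shows delayed binding in miniature: the graph and the table are unchanged, and only the arbitrary placement of the root decides where the whole row lands.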
The Language
In order to make use of this framework we must be able to build large and complex connectivity graphs easily and efficiently. It is therefore imperative that the language for specifying design files supports good abstraction and powerful decision making. The design file interpreter has been embedded inside a Lisp interpreter so that the full power of a structured programming language is available to the designer. The interpreter provides a variant of the Lisp programming language (a subset of it) with a few special primitives for building and manipulating connectivity graphs as well as for converting connectivity graphs into layouts. Primitives for manipulating encoding tables (such as PLA truth tables) have also been added.
Implementation
The RSG program is written in CLU [1] and consists of approximately 6000 lines of source code. The execution time is divided into roughly three equal parts: reading in the source file and building up the initial interface table, parsing and executing the design and parameter file, and writing the output file. Execution is fast because the operations to be performed are simple (building graphs and traversing them) and the system does not have to keep track of much state information. For example, a 32 x 32 Baugh-Wooley multiplier as discussed in Section 3 is generated in 5 seconds on a DEC-2060. Two layout file formats (CIF [2] and DEF [3]) are supported.
A special compactor which will compact the cells in the sample layout while taking into account how they can potentially interface together is being investigated. The compactor will make the RSG technology transportable since cells designed in one technology can be transformed into cells for another technology. The compactor can also be used for device and bus sizing.
Limitations
The two step process as described in the algorithm section provides a high level of separation between the graphical and procedural part of the layout process. Since geometrical parameters are not accessed in the design file, however, decisions based on the size and shape of the final layout such as placement and routing are difficult to make.
Example: Pipelined Array Multipliers
A pipelined array multiplier provides a good illustration of the RSG's ability to generate layouts for the kind of nontrivial regular structures that typically arise in practice. Figure 6 shows a purely combinational 6x6 signed two's complement multiplier based on the Baugh-Wooley algorithm [4]. The multiplier consists of an array of two types of carry-save adders that reduces the product to the sum of two words, which are then added in a final row of cells connected as a carry-propagate adder. (The two diagonal connections have been condensed to one for clarity.) Each cell type contains an AND gate and a full adder: cell type I adds the bit-product $a_i b_j$ to its sum and carry inputs, and cell type II adds the corresponding complemented bit-product term to its sum and carry inputs. The carry-propagate adder consists of type I cells which are drawn as polygons to distinguish them from the carry-save cells.
Using retiming transformations [5], the multiplier can be pipelined to any degree in a manner that preserves the regularity of the inner array, but adds irregularity to the periphery of the array in the form of input and output register stacks. Figure 7 illustrates two pipelined versions of the multiplier. (An integer on a connection arc represents the number of registers on that connection.) The first version (7a) is a bit-systolic multiplier that has at most one full adder combinational delay between any two registers, and represents the highest possible degree of pipelining given the choice of the full adder as the largest indivisible cell. The second version (7b) implements a lower degree of pipelining, allowing at most two combinational delays between any pair of registers. From a circuit perspective, the optimal degree of pipelining is application and technology dependent, so it is necessary to be able to automatically generate any degree of pipelining.
A pipelined multiplier of given size and level of pipelining can be constructed by personalizing an array of basic cells which has been sized according to the number of bits in the multiplier and multiplicand. Each cell in the array must be personalized with respect to each of the following options depicted in Figures 6 and 7:
Cell type: Each cell must be programmed as either type I or type II to correctly implement the signed two's complement algorithm. Type II cells occur on the left and bottom edges of the carry-save array, except for the cell at the lower left corner. All remaining locations require cell type I.
Cell interface: To obtain nearly identical circuit topologies, cell types I and II use different active input levels. Furthermore, active output levels are affected by the amount of pipelining. Therefore, each cell interface is determined by the type of cells being connected and the number of registers on the connection.
Register assignment: The placement of registers on connections between cells depends on the degree of pipelining and the locations of the cells being connected.
Clock assignment: Pipelined systems generally require several clocks which must be assigned to registers according to their location in the array. Clock assignment is further complicated by the need to employ such circuit techniques as precharging to reduce area and power requirements.
In addition to the internal array configuration, there are "edge effects" to consider as well:
Peripheral registers: In order to properly skew the inputs and deskew the outputs, registers must be placed along the periphery as determined by the retiming transformations.
Input assignment: Ones and zeros must be assigned to the unused inputs along the top and left edges as prescribed by the Baugh-Wooley algorithm.
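As an illustration of how such personalization rules become simple conditionals, the cell type rule stated above (type II on the left and bottom edges, except at the lower left corner) can be sketched in Python. The function names and the indexing convention (row 0 at the top, column 0 at the left) are assumptions of this sketch; in the RSG the equivalent decision would be written in the Lisp subset of the design file.

```python
def cell_type(row, col, n_rows, n_cols):
    """Return "I" or "II" for the carry-save cell at (row, col).

    Assumed convention: row 0 is the top row and column 0 is the left column.
    """
    if not (0 <= row < n_rows and 0 <= col < n_cols):
        raise IndexError("cell index outside the array")
    on_left = col == 0
    on_bottom = row == n_rows - 1
    # Type II on exactly one of the two edges; the shared corner stays type I.
    return "II" if on_left != on_bottom else "I"


def type_map(n_rows, n_cols):
    """Cell types for the whole array, listed row by row from the top."""
    return [
        [cell_type(row, col, n_rows, n_cols) for col in range(n_cols)]
        for row in range(n_rows)
    ]


for line in type_map(4, 4):
    print(" ".join(f"{name:>2}" for name in line))
```

The other options (cell interface, register assignment, clock assignment) would add further conditionals of the same kind, each selecting a mask cell rather than a whole new cell.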
Cell masking is used extensively to convert an array personalization to actual layout. A basic cell is created which contains the layout features common to all cell personalities and which can accommodate the variations in layout necessary to implement all design options. Mask cells are instantiated on the basic cell to activate particular options by adding objects to the various layers. Figure 8 illustrates this with a basic cell designed to specifically optimize the electrical performance of the bit-systolic multiplier of Figure 7a. This cell contains input inverters, full adder circuitry, and six output registers. In this example, the basic-cell is programmed to type I by the mask-cell typeI, its carry input inverter is programmed by mask-cell car2 to interface with a type II cell, and it is assigned the clock $\phi_1$ by mask-cells phi1-1, phi1-2, phi1-3, and phi1-4. The inner array of the multiplier is built up one cell at a time by first personalizing a copy of basic-cell, and then adding it to the array. Then the multiplier is completed by adding registers to the periphery of the array.
Figure 9 shows two sections of the design file written to generate a bit-systolic multiplier for any m-by-n case, and demonstrates the use of macro abstraction, delayed binding, and interface inheritance.
The mcell macro of Figure 9a executes the personalization of basic-cell as a function of array size and cell index, and is used to hierarchically build the macrocell innerarray (the inner array of the multiplier).
Delayed binding on the absolute location of each personalized cell greatly simplifies the definition and use of mcell in the creation of larger macrocells like innerarray. The code in Figure 9b constructs the complete multiplier from innerarray and three boundary macrocells, tregs, rregs, and bregs, which are constructed from a single register cell. The three boundary cells are connected to innerarray using interfaces that are inherited from an interface between the basic cell and register cell. This example is cited to emphasize that macrocells can be manipulated with absolutely no need to enter the graphics domain and manually define interfaces or add spacing cells, as required by layout generators with restricted powers of abstraction.
The input layout file in Figure 10 demonstrates the ease and generality with which cell interfaces are specified in the RSG. One merely provides an example of the interface, and places a numerical label in the overlapping region, as for example, interface number 1 (the only interface) between basic-cell and typeI. The RSG then creates an interface vector and orientation from this graphical specification, and uses it to implement all instances of this interface that occur in the final circuit layout. The layout file provides a natural means for the user specification of cell layouts and interfaces and greatly reduces the amount of redundant information needed to characterize regular circuit layouts. This can be appreciated by comparing Figure 10 with the 6x6 systolic multiplier layout shown in Figure 11. This layout also illustrates the amount of complexity that exists in practical regular structures, even though this design has been simplified by omitting the register masking option.
CONCLUSION
...appropriate abstraction mechanisms (macrocells, interfaces, and interface inheritance) in generating layouts for realistic regular structures.
Tangible proof of the principles argued in this paper is provided by the RSG, a layout generator which, as a result of employing these abstraction mechanisms, can generate the intricate regular circuit layouts that arise in meaningful applications. The RSG presents a convenient interface to the user by modularly separating the graphical and procedural description of a circuit along a natural boundary, making it an extremely easy tool to utilize, extend, and upgrade.
