We present a ROM compiler programmable from via 1 to via n − 2, where n is the number of metal layers. The layer on which the code via lands can be selected by the user. Because the coding can take place as close to the topmost metal as possible, the turnaround time for a revision is shortened. In this paper, we discuss the array assembly scheme and how the choice of strapping period affects the design considerations.
Introduction
Recent developments in structured ASICs mostly center on the mask programmability of logic circuits [2], [4], [10], [13], [15]. By enabling fast design derivatives on a common platform, they allow frequent product feature changes and reduce both non-recurring engineering cost and time to market. The wafers used to fabricate the products are pre-processed, starting from bulk silicon up to a certain mask layer, and banked at the foundry. Unfortunately, embedded read-only memory (ROM), which has long been an integral part of a system on chip and is capable of providing the same flexibility, is often neglected in the literature.
Because these applications aim to shorten the turnaround time, it is quite natural to implement the ROM code on a top via or metal layer. The structured ASIC platform, following a similar approach, programs, for example, a single via layer [4] or multiple metal layers [10], [13], based on a standard CMOS process. Consequently, the code via or metal must be promoted to the level(s) that the programmable logic circuits use. However, previous ROM designs often assume coding on fixed layer(s), such as diffusion [14], poly, contact [12], metal-1, via-1, or their combinations [11], [14], which does not help in this setting.
In this paper, we present a flexible via ROM compiler. A single via layer is used for coding. It can be assigned up to via n − 2, with metal n being the topmost metal layer.
This allows fabrication to proceed all the way to metal n − 2 before the code is committed. The average cycle time for wafer manufacturing typically ranges from one to two days per mask layer. Thus, each via level by which the coding is moved up reduces the turnaround time by up to four days. Such flexibility is unattainable with other types of ROMs.
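The turnaround estimate above can be reproduced with a short calculation. This is only a sketch: it assumes that promoting the code by one via level defers exactly two masks (one via and one metal), which is consistent with the four-day figure but is not stated explicitly in the text. The function name days_saved is illustrative.

```python
# Assumption (not stated in the text): each via level by which the code is
# promoted defers one via mask and one metal mask, i.e. two masks.
MASKS_PER_VIA_LEVEL = 2

def days_saved(levels_promoted, days_per_mask):
    """Turnaround days saved by moving the code via up by `levels_promoted` levels."""
    return levels_promoted * MASKS_PER_VIA_LEVEL * days_per_mask

# One level, at one and at two days per mask layer.
per_level = (days_saved(1, 1), days_saved(1, 2))

# Full range for n = 6 metal layers: from via 1 up to via n - 2 = via 4.
n = 6
levels = (n - 2) - 1
full_range = (days_saved(levels, 1), days_saved(levels, 2))

print("per level:", per_level, "days; via 1 to via 4:", full_range, "days")
```

Under this assumption, a single promotion saves two to four days, and moving the code from via 1 to via 4 in a six-metal process saves six to twelve days.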
The compiler was first realized in a 0.18 μm standard CMOS process with a maximum of n = 6 metal layers. Table 1 lists all code-via options provided by this compiler, in terms of the permissible assignment of the topmost metal layer. We have presented its sensing scheme in [6] and [7]. The focus of this paper is on array assembly and its impact on the design considerations.
Table 1. Code-via options.
Code via   Bit line   Word line   Topmost metal
V1         M2         M3          M3, M4, M5, M6
V2         M3         M4          M4, M5, M6
V3         M4         M5          M5, M6
V4         M5         M6          M6
The remainder of our presentation is organized as follows. Section 2 discusses the cell layout and its size derivation. Section 3 introduces the strapping period for the array assembly and deals with the ground bounce, which has to be taken into account when reserving a proper sensing margin. Section 4 deals with the array efficiency resulting from the strapping period and also shows the features of the compiler. Section 5 contains our conclusion.
Cell Layout and Size Derivation
The resulting architecture is a NOR-type memory array. With word lines on metal n, bit lines on metal n − 1, and codes on via n − 2, all other metal and via layers below serve as landing pads to the bottom NMOS transistors. To maximize portability between different processes, we have excluded the topmost via n − 1 on purpose, since wide (thick) top metal and a large top via are prerequisites for inductor implementations and for better power distribution [3].
Cell layout
The ROM cell has been laid out without violating any design rule. Its area is 0.7 × 1.07 = 0.749 μm², which does not change whether the code is on via 1 or via n − 2, because the design rules related to these layers are the same. Fig. 1 depicts the cell layout, where the dashed square indicates the stack of vias underneath the code via and the dotted rectangles indicate the corresponding metal islands. The contact and via failure rates have been shown to depend on the pitch: sparse areas tend to be more vulnerable than dense areas [8], [9]. Since the failure rate more than doubles as the pitch doubles, it seems advantageous to give the common source (connected to VSS) a via stack similar to that of the drain (connected to BL), together with multiple horizontal metal wires.
The regular and repetitive patterns of the memory array would in fact allow the nominal design rules to be challenged. The cell could be drawn smaller had a set of aggressive design rules been followed. Two facts dissuade such an action: the optimal combination of such rules must be learned by trial and error in order to overcome the notorious optical proximity effects, and the memory array can vary widely in size across compiler-generated instances. This is somewhat offset by the foundry's direct shrink from 0.18 μm to 0.16 μm.
Cell size derivation
The cell size can be derived from the relevant design rules in a straightforward manner. Referring to Fig. 1, the width of 0.7 μm is the sum of the contact width (0.22 μm), the diffusion spacing (0.28 μm), and two times the diffusion enclosure over contact (0.2 μm). The height of 1.07 μm is the sum of one and a half times the contact width (0.33 μm), the diffusion enclosure over contact (0.1 μm), the gate poly width (0.18 μm), two times the contact-on-diffusion to gate-poly separation (0.32 μm), and half of the diffusion spacing (0.14 μm). Table 2 summarizes the cell size derivation. It is apparent that the cell size is virtually determined by the design rules from the bulk up to the contact layer. This has been verified at other technology nodes, indicating that flexible via coding applies equally well there. However, care must be taken for processes below 0.13 μm: via n − 2 coding becomes unavailable because the word line on the topmost metal n can no longer fit within the tight pitch of the cell height. This is because the design rules for the topmost metal and via layers do not scale with the ever-shrinking technology. Fig. 2 shows the cell sizes derived for technology nodes scaling from 0.35 μm to 90 nm. We note that the 0.16 μm FlexiVia ROM compiler is a direct 90% shrink of the 0.18 μm version, which has been proven successful on silicon.
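The derivation above can be checked numerically. The sketch below uses only the design-rule values quoted in the text for the 0.18 μm process; the per-side enclosure (0.10 μm) and per-side contact-to-gate separation (0.16 μm) are obtained by halving the quoted doubled values, and the constant and function names are illustrative.

```python
# Design-rule values in micrometres, as quoted for the 0.18 um process.
CONTACT_WIDTH = 0.22
DIFFUSION_SPACING = 0.28
DIFFUSION_ENCLOSURE = 0.10   # diffusion enclosure over contact (0.2 um for two sides)
GATE_POLY_WIDTH = 0.18
CONTACT_TO_GATE = 0.16       # contact-on-diffusion to gate-poly separation (0.32 um for two)

def cell_width():
    """Contact width + diffusion spacing + enclosure on both sides."""
    return CONTACT_WIDTH + DIFFUSION_SPACING + 2 * DIFFUSION_ENCLOSURE

def cell_height():
    """1.5 contacts + one enclosure + gate + two separations + half a diffusion spacing."""
    return (1.5 * CONTACT_WIDTH + DIFFUSION_ENCLOSURE + GATE_POLY_WIDTH
            + 2 * CONTACT_TO_GATE + 0.5 * DIFFUSION_SPACING)

width = cell_width()
height = cell_height()
area = width * height
print(f"width = {width:.2f} um, height = {height:.2f} um, area = {area:.3f} um^2")
```

The terms sum to 0.70 μm and 1.07 μm, giving the 0.749 μm² cell area stated in Section 2.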
Ground Bounce
The NMOS transistor thus obtained has a channel width and length of 0.42 μm and 0.18 μm, respectively. Its saturation current is the maximum current that a ROM cell (0-cell) can sink to discharge the relevant bit line, which is preset to VDD during read access [6], [7]. The value is about 275 μA at VDD = 1.8 V. This accrues to a large current sink on VSS when many such cells are turned on simultaneously by a shared word line.
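To give a feel for the magnitude, the following sketch computes an upper bound on the VSS current for one word line. It assumes 512 cells per word line (the width of the array simulated later in this section) and that every selected cell is a 0-cell sinking its full saturation current at the same instant; both are simplifying assumptions, and the function name is illustrative.

```python
CELL_SAT_CURRENT_A = 275e-6   # per-cell saturation current at VDD = 1.8 V (from the text)

def worst_case_vss_current(cells_on_word_line, i_cell=CELL_SAT_CURRENT_A):
    """Upper bound: all selected cells are 0-cells conducting their saturation current."""
    return cells_on_word_line * i_cell

# Assumption: 512 cells share one word line, as in a 512 x 512 array.
total = worst_case_vss_current(512)
print(f"worst-case VSS current: {total * 1e3:.1f} mA")
```

Under these assumptions the bound is roughly 141 mA. Actual peak currents would be lower, because the cell current falls as the bit line discharges.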
The excessive current sink results in ground bounce, which creates timing and reliability problems that can lead to chip failure. This power integrity issue is addressed in two ways: through good VSS network planning and by accurately analyzing the physical layout to detect possible problems. For the latter, random parametric variations in individual cells do not seem to be crucial, since the current sink is a summation over many cells.
VSS rail and strap
As mentioned earlier, each ROM cell has multiple VSS connections, but they all run horizontally. Vertical wires need to be added to form a VSS mesh network so as to distribute the current sink to the neighboring horizontal wires. We shall refer to the horizontal wires as rails and the vertical wires as straps. This is done by inserting an extra row of dummy cells every m rows of ROM cells. The number m is called the strapping period.
The row-strapping cell is a dummy because it does not play any role in data storage and is treated as overhead in silicon area for array assembly. In practice, however, the cell also serves for well pickup and word-line bypass. For this reason, its area cannot be smaller than that of a ROM cell. We have drawn it as 1.26 × 1.07 = 1.3482 μm², where the VSS strap has the minimum width of 0.28 μm. Note that the dummy cell is 1.8 times as large as the ROM cell.
Where the VSS rails and straps run out of the array border, there is sufficient space to lay out wider metal layers that are replete with vias. Hence, we shall focus only on the VSS mesh network made of minimum-width metal inside the array.
Clearly, the array efficiency increases with the strapping period. In the following, m = 8, 16, 32, 64, and 128 will be considered.
Ground bounce simulation
Figs. 3(a) and 3(b) show the sub-circuit models for via-4 and via-1 cells, respectively, where the contact/via resistances and diffusion/metal sheet resistances are labeled. Intuitively, the interconnect resistances that are connected in series can be combined to reduce the simulation time.
Circuit simulations on a 512 × 512 memory array have been performed for the worst case, in which code vias are present in all cells; that is, the array is fully coded with 0s. We note that, in reality, the array can never be more than half coded with 0s as long as the inversion technique [1] is applied to reduce the active and standby power. This, however, does not affect our derivation of the worst case. There are 256 × 512 source nodes (denoted as S in Fig. 3) to be monitored. Ground bounces V_ij, where 0 ≤ i ≤ 255 and 0 ≤ j ≤ 511, are obtained by sequentially activating all 512 word lines with a pulse width of 2 ns [7] and a cycle time of 10 ns. For each source node, the maximum value is recorded.
To display the simulation results, Fig. 4 illustrates two cases: via-1 coding with n = 3 and m = 64, and via-2 coding with n = 4 and m = 128. We see how the strapping abates the ground bounce, more effectively along the row direction than along the column direction. Also, it is not surprising to find that the maximum ground bounce occurs near the middle of the array. Table 3 shows the maximum ground bounce V_max = max{V_ij} over the entire array. The effectiveness of promoting the coding layer to an upper level depends on the ratio of the metal sheet resistance to the via resistance. Consider, for example, a metal sheet resistance of 80 mΩ/□ and a via resistance of 6.4 Ω. Then a wire length of 80 squares, which is about 32 cell widths for a VSS rail or 21 cell heights for a VSS strap, presents the same resistance as one via to the next level up. This may explain why the maximum ground bounce improves only slightly when the coding is changed from via 3 to via 4.
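The equivalence between one via and a length of wire can be worked out as follows. The sketch uses the example resistances from the text and assumes that both rails and straps are drawn at the 0.28 μm minimum width quoted for the VSS strap; the variable names are illustrative.

```python
SHEET_RES_OHM_PER_SQ = 0.080   # example metal sheet resistance, 80 mOhm per square
VIA_RES_OHM = 6.4              # example via resistance
WIRE_WIDTH_UM = 0.28           # assumption: rails and straps both use the minimum width
CELL_WIDTH_UM = 0.70
CELL_HEIGHT_UM = 1.07

# Number of squares of metal whose resistance equals that of one via.
squares = VIA_RES_OHM / SHEET_RES_OHM_PER_SQ

# Physical length of that many squares at the assumed wire width.
length_um = squares * WIRE_WIDTH_UM

# Express the length in cell pitches along a rail (horizontal) and a strap (vertical).
cells_along_rail = length_um / CELL_WIDTH_UM
cells_along_strap = length_um / CELL_HEIGHT_UM

print(f"{squares:.0f} squares = {length_um:.1f} um "
      f"= {cells_along_rail:.0f} cell widths = {cells_along_strap:.0f} cell heights")
```

With these inputs, 80 squares correspond to 22.4 μm, which matches the 32 cell widths and roughly 21 cell heights given above.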
Sensing margin
The maximum ground bounce V_max deserves our attention, since the sensing margin must reserve the same amount for it. The sensing margin is ideally equal to half of the precharged level VDD, but is often made much smaller in order to gain speed [6], [7]. For a typical sensing margin of 150 mV, of which 50 mV is reserved to meet the minimum speed requirement [7], a strapping period of m = 128 can easily fail with via-2 coding, and one of m = 64 with via-1 coding. Because we intend to maintain the same instance footprint regardless of which via layer the user chooses to code, choosing either of these two strapping periods would necessitate an increase in the sensing margin. This means trading speed for reliability.
Compiler Feature and Array Efficiency

Compiler feature
The FlexiVia ROM compiler implements two memory array partitions (single- and dual-bank) and three column multiplexing options (16, 32, and 64). It selects a dual-bank partition if the number of columns exceeds 512. Hence, the 512 × 512 memory array studied in Section 3 is actually the largest to be encountered in the compiler.
The word width ranges from 1 to 64 and the word depth from 512 to 64K. The word width increments by one, and the word depth by eight times the column multiplexing option. The total number of compilable configurations is 6704, with the capacity ranging from 512 to 512K bits. Table 4 lists the compiler features. Note that the choice of strapping period is independent of these features.
[...] circuits such as dummy cells and tracking cells [5] in the memory, and address decoders, sense amplifiers, and data output buffers in the periphery. Our implementation already embeds power/ground connections and needs no extra rings. Clearly, surrounding power/ground rings, whose space is not shared by any of the circuits, would be detrimental to the array efficiency.
Intuitively, the array efficiency approaches m/(m + 1.8) as the capacity increases and the overhead becomes negligible. Recall that the area of the dummy cell is exactly 1.8 times that of the ROM cell. This gives an optimistic result that encourages the use of a large strapping period: with m = 128, m/(m + 1.8) ≈ 99%. Unfortunately, this does not hold for the capacity range implemented by the compiler. Fig. 6 shows the array efficiencies of all configurations for the five strapping periods, with the data sorted in increasing order. We derive their values from the area equations used in the layout generator. It is worth pointing out that the overhead circuits indeed occupy a significant portion of the area, so the array efficiency cannot be high. For the five strapping periods considered in this paper (m = 8, 16, 32, 64, and 128), the highest array efficiencies are 60%, 65%, 68%, 69%, and 70%, respectively. Nevertheless, it suffices to say that a strapping period of 8 can be too costly for a diminishing gain in the sensing margin. Our final decision is to choose a strapping period of 32. It is justified by the fact that this choice increases the instance area by at most 2.3% compared with a strapping period of 64, and reduces it by at most 4.2% compared with a strapping period of 16.
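The asymptotic bound m/(m + 1.8) can be tabulated for the five strapping periods. This sketch evaluates only that idealized limit, using the cell dimensions given in Section 3; it does not model the overhead circuits, so it does not reproduce the 60% to 70% figures obtained from the layout generator's area equations. The function name is illustrative.

```python
ROM_CELL_AREA_UM2 = 0.70 * 1.07     # ROM cell
DUMMY_CELL_AREA_UM2 = 1.26 * 1.07   # row-strapping dummy cell
DUMMY_TO_CELL_RATIO = DUMMY_CELL_AREA_UM2 / ROM_CELL_AREA_UM2   # 1.8

def asymptotic_efficiency(m, ratio=DUMMY_TO_CELL_RATIO):
    """Limit of the array efficiency when all overhead except strapping is negligible."""
    return m / (m + ratio)

limits = {m: asymptotic_efficiency(m) for m in (8, 16, 32, 64, 128)}
for m, eff in limits.items():
    print(f"m = {m:3d}: {eff:.1%}")
```

The limit rises from about 82% at m = 8 to about 99% at m = 128. The much smaller spread of the actual figures (60% to 70%) reflects how strongly the overhead circuits dominate within the compiler's capacity range.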
Conclusion
Previous implementations of a via-ROM compiler often assume a fixed coding layer. In terms of silicon area, coding on the via layer can rarely compete with coding on the diffusion layer. In a manner similar to deriving the via-ROM cell, we can easily obtain a diffusion-ROM cell with an area of 0.7 × 0.8 = 0.56 μm², which is 25% smaller, using the same 0.18 μm process technology. It would be difficult, if not impossible, to regain this loss by pushing the array efficiency to a correspondingly high percentage.
In this paper, we showed that a via-ROM compiler can be made very flexible by selecting the code layer based on the top metal assignment. The approach leads to the reduction of turnaround time, which is unparalleled by other types of compilers. Here, our effort has been to optimize the array assembly by the strapping period. We related the strapping period to ground bounce, sensing margin, and array efficiency. Therefore, the choice of strapping period is a result of tradeoff among reliability, speed, and area.
