# A Configurable Radiation Tolerant Dual-Ported Static RAM macro, designed in a 0.25 µm CMOS technology for applications in the LHC environment.

K. Kloukinas, G. Magazzu, A. Marchioro

CERN EP division, 1211 Geneva 23, Switzerland (Kostas.Kloukinas@cern.ch)

# Abstract

A configurable dual-port SRAM macro-cell has been developed in a commercial 0.25 µm, 3 metal layer CMOS technology. Well-established radiation tolerant layout techniques have been employed to achieve the total dose hardness levels required for the LHC experiments. The SRAM macro-cell can be used as a building block for on-chip readout pipelines, data buffers and FIFOs. The design features synchronous operation with separate address and data busses for the read and write ports, thus allowing simultaneous read and write operations. The macro-cell is configurable in terms of word count and bit organization: a memory of arbitrary size can be constructed by tiling memory blocks into an array and surrounding it with the relevant peripheral blocks. Circuit techniques used for achieving macro-cell scalability and low power consumption are presented. To prove the concept of macro-cell scalability, two demonstrator memory chips of different sizes were fabricated and tested. The experimental test results are reported.

# I. INTRODUCTION

Several front-end ASICs for the LHC detectors are now implemented in a commercial 0.25 µm CMOS technology using well-established special layout techniques [1] [2] to guarantee robustness against total dose irradiation effects over the lifetime of the LHC experiments. In many cases these ASICs require rather large memories in readout pipelines, readout buffers and FIFOs. The lack of SRAM blocks and the absence of design automation tools for generating customized SRAM blocks that employ the radiation tolerant layout rules are the primary motivations for the work presented in this article.

This paper presents a size-configurable architecture suitable for embedded SRAMs in radiation tolerant, quarter-micron ASIC designs. The physical layout data consist of a memory-cell array and abutted peripheral blocks: column address decoder, row address decoder, timing control logic, data I/O circuitry and power line elements that form power line rings. Each block is size configurable to meet the required word count and number of data bits.

To minimize the macro-cell area, a single-port memory cell based on a conventional cross-coupled inverter scheme is used. Dual-port functionality is realized with internal data and address latches, placed close to the memory I/O ports, and a time-sharing access mechanism. The design allows both a read and a write operation to be performed within one clock cycle. The scalability of the presented SRAM macro-cell is accomplished with the use of replica memory cells and self-timed control circuits.

The presented memory macro-cell has already been embedded in a number of detector front-end ASIC designs for the LHC experiments, with configurations ranging from 128 words x 153 bits to 64 Kwords x 9 bits.

# II. CIRCUIT DESIGN

## A. Memory Architecture

The internal architecture of the SRAM design is shown in Figure 1. It consists of an array of Static Memory Cells coupled with the necessary Write Drivers and Read Logic circuitry, a Row Decoder, a Column Decoder, a set of registers for the address and the data input ports, a latch for the data output port and a Timing Logic circuitry controlling the operation of the SRAM macro-cell.

The cells in the memory array are accessed by the Row Decoder that selects the wordline and the Column Decoder that selects the appropriate bitlines. The Row Decoder has a fixed size of 128 wordlines and its decoding function is hardwire-configured with via-hole programming on a 7-bit complementary bus. The 7 most significant bits of the desired memory cell address are routed to the Row Decoder and the rest (n-7) address bits are routed to the Column Decoder.
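The address partitioning described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `split_address` and the constant `ROW_ADDRESS_BITS` are hypothetical, and the sketch models the bit fields only, not the via-hole programming of the decoder.

```python
from typing import Tuple

ROW_ADDRESS_BITS = 7  # fixed by the 128-wordline Row Decoder


def split_address(address: int, address_bits: int) -> Tuple[int, int]:
    """Split an n-bit word address into (row, column) fields.

    The 7 most significant bits go to the Row Decoder and the
    remaining (n - 7) bits go to the Column Decoder.
    """
    if address_bits < ROW_ADDRESS_BITS:
        raise ValueError("address must be at least 7 bits wide")
    if not 0 <= address < (1 << address_bits):
        raise ValueError("address out of range")
    column_bits = address_bits - ROW_ADDRESS_BITS
    row = address >> column_bits                  # 7 most significant bits
    column = address & ((1 << column_bits) - 1)   # remaining n - 7 bits
    return row, column


# Example: a 4K-word memory uses a 12-bit address (7 row bits, 5 column bits).
row, column = split_address(0b101010111001, 12)
```

For the smallest configuration (128 words) the address is exactly 7 bits wide, so the column field is empty and `split_address` always returns a column of 0.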



Figure 1 : Block diagram of the Dual Port SRAM.

The SRAM macro-cell has a synchronous interface with registered address and data input ports and a latched data output port. Figure 2 shows the timing diagram of the SRAM macro-cell interface. All inputs are clocked into the memory on the positive edge of the clock (Clk) signal. Both read and write operations are handled in one clock cycle, and the same memory cell can be accessed for reading and writing in the same cycle. In Figure 2, clock cycle #1 represents a write operation, cycle #2 a read and cycle #4 a read/write operation. In a read/write operation to the same address, the contents of the addressed memory location are first placed on the data output port and are subsequently replaced with the new data present on the data input port.



Figure 2 : Timing diagram of the Dual Port SRAM Interface.

## B. Memory Cell Design

The memory cell is based on a conventional cross-coupled inverter design. The cell uses PMOS pass transistors as access devices, since they are smaller than enclosed NMOS transistors. In addition, special care was taken in the design of the NMOS transistors to make them even smaller than a conventional enclosed NMOS transistor would require. The resulting size of the memory cell is only $5.60\mu m \times 8.42\mu m$. The layout of the memory cell is shown in Figure 3. The memory cell design was first introduced in [3], where the SRAM cell was implemented as a true dual-port cell using 8 transistors. By eliminating the 2 access transistors of the second port, we reduce the cell area by 18% without compromising functionality or performance.



Figure 3 : Layout of the SRAM memory cell using radiation tolerant layout techniques.

## C. Dual Port Functionality

The dual-port functionality is realized by time-sharing the access to the memory array. The read and write address busses are multiplexed into a single internal address bus and the read and write operations are executed serially, with the read followed by the write. Both operations are executed within the same clock cycle. As a result, the internal operating frequency of the memory array is twice the frequency of the SRAM macro-cell interface. Similar techniques in dual-port SRAMs using single-port memory cells have been reported in the literature [4].
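The read-then-write ordering can be illustrated with a small behavioural model. This is a sketch, not the circuit itself: the class name `TimeSharedDualPortSRAM` and its `clock` method are hypothetical, and the model abstracts away all timing inside the cycle, keeping only the order of the two phases.

```python
from typing import List, Optional


class TimeSharedDualPortSRAM:
    """Behavioural model: each clock cycle runs a read phase, then a write phase."""

    def __init__(self, words: int, bits: int) -> None:
        self.words = words
        self.mask = (1 << bits) - 1
        self.cells: List[int] = [0] * words  # single-port memory cell array
        self.data_out = 0                    # models the data output latch

    def _check(self, address: int) -> int:
        if not 0 <= address < self.words:
            raise IndexError("address out of range")
        return address

    def clock(self, read_addr: Optional[int] = None,
              write_addr: Optional[int] = None,
              write_data: int = 0) -> int:
        """Execute one interface clock cycle and return the output latch value."""
        # Phase 1: the read address is placed on the internal address bus first.
        if read_addr is not None:
            self.data_out = self.cells[self._check(read_addr)]
        # Phase 2: the write address then takes over the same internal bus.
        if write_addr is not None:
            self.cells[self._check(write_addr)] = write_data & self.mask
        return self.data_out


mem = TimeSharedDualPortSRAM(words=1024, bits=9)
mem.clock(write_addr=5, write_data=0x1AB)                      # write only
old = mem.clock(read_addr=5, write_addr=5, write_data=0x055)   # read/write
new = mem.clock(read_addr=5)                                   # read only
```

Because the read phase always precedes the write phase, a simultaneous read and write to the same address returns the old contents (`old` is `0x1AB`) before the location is overwritten (`new` is `0x055`).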

## D. Replica Techniques

The scalability of the presented SRAM macro-cell is accomplished with the use of replica rows of memory cells and bit-lines that create reference signals whose delays track those of the word-lines and bit-lines [5]. An extra row containing replica memory cells (Dummy Wordline) is placed in the memory array to track the wordline charge-discharge delay, which varies with the size of the macro-cell as well as with temperature and process variations. An extra column containing replica memory cells (Dummy Bitlines) is placed to mimic the bitline delay. These "reference signals" track the delay of the actual wordline and bitlines accurately, making the memory robust while preserving performance. The timing control of the memory operations is handled by asynchronous self-timed logic (Timing Logic) that adjusts the timing of the operations to the delays of the reference signals.



Figure 4 : Design of the Replica Word-Line and Bit-Lines.

Figure 4 shows the design of the replica wordline and bitlines as well as the design of the Row Decoder and the bitline peripheral circuitry of the SRAM macro-cell. The Row Decoder is a dynamic NAND-type structure. The dynamic structure was chosen due to its speed, area and power advantages over the conventional static NAND gate. To mitigate any possible problems caused by a low frequency system clock and the dynamic nature of the Row Decoder we have introduced a latch circuit on its output.

The bitline peripheral circuitry includes the bitline precharge transistors, the write buffers and the read logic. The bitlines are precharged to GND, since in the memory cells the PMOS transistors are stronger than the NMOS transistors. The memory cells are written in the conventional way using the true and the inverted bitlines, but are read using a single bitline through an asymmetric inverter. By substituting the sense amplifier with a simple inverter, we reduce power consumption and obtain stable operation even at low power supply voltage.

## E. Divided Word Line Architecture

To further reduce the power consumption, we have adopted a wordline selection scheme called Divided Wordline Decoding (DWL) [6]. The wordline has a hierarchical structure of two levels, namely the Global and the Local wordlines, as shown in Figure 5. Part of the address is decoded to activate the Global Wordline and the remaining address bits are decoded by the Block Pre-Decoder to activate the vertical block select lines. The Local Wordline Buffers receive the select signals from the Block Pre-Decoder logic and drive the selected Local Wordline in the selected block for read and write operations. The non-accessed portions of the memory remain in the precharge state, thus reducing the power consumption. Wordline selection time is also improved, since the RC delay of each divided wordline is small due to its short length.
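The two-level selection can be sketched as a logical AND between the Global Wordlines and the block select lines. The following Python fragment is only an illustration: the helper names are hypothetical, and the figures of 128 rows and 8 blocks are example values chosen for the sketch, not a statement about how any particular configuration is partitioned.

```python
from typing import List


def one_hot(index: int, width: int) -> List[bool]:
    """Return a one-hot select vector, as produced by a decoder."""
    return [i == index for i in range(width)]


def local_wordline_enables(global_wl: List[bool],
                           block_select: List[bool]) -> List[List[bool]]:
    """Combine Global Wordlines with block select lines.

    enables[row][block] is True only where both the Global Wordline of
    that row and the select line of that block are active, i.e. where a
    Local Wordline Buffer actually drives its Local Wordline.
    """
    return [[g and b for b in block_select] for g in global_wl]


# Hypothetical example: 128 rows, 8 blocks, row 5 in block 2 selected.
enables = local_wordline_enables(one_hot(5, 128), one_hot(2, 8))
driven = sum(cell for row in enables for cell in row)
```

In this example a single Local Wordline is driven (`driven` equals 1), so only one of the eight segments of the selected row is activated while the other blocks stay in the precharge state.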



Figure 5 : The Divided Word Line (DWL) architecture in the SRAM.

## F. Operation and Timing

Although the SRAM macro-cell is externally synchronous for the user application, the internal timing of the various control signals is completely asynchronous and self-timed. The Timing Logic uses a combination of hand-shaking and transition detection to realize internal timing loops that are initiated by the edges of the system clock and terminate upon completion of the operation. Figure 6 shows the timing loops for a read followed by a write operation. The read operation is performed during the high period of the clock (clk) and the write during the low. The operation is completely static. All control signals are forced back to their initial state so that the wordline and bitlines are immediately precharged to prepare for the subsequent tasks.

A read is performed by starting with both *BL* and $\overline{BL}$ low and the Row Decoder in the precharge state. Asserting the $\overline{WL_{pc}}$ signal sets the Row Decoder to the evaluate state and the desired Global Wordline is selected. In combination with the appropriate Block Select signal, the desired Local Wordline is selected. At this time the data in the cell will pull either the *BL* or the $\overline{BL}$ line high. This condition is detected by the NOR gate and the BL0 signal goes low, signifying that the data can be safely latched and that the Row Decoder can return to the precharge state. The reference signals *WLdummy* and *LWLdummy* are subsequently deasserted, signalling that the bitlines can safely return to the precharge state.



Figure 6 : Timing diagram for a Read and a Write operation.

To write into the cell, data are placed on *BL* and $\overline{BL}$ by the Write Logic. Then the wordline is activated. This forces the memory cell to flip into the state represented on the bitlines. The cycle ends with the wordlines and bitlines returning to the precharge state.

A large part of the operating power consumption of a static memory is due to the charging and discharging of the column and bit-line loads. To reduce the wasted power during standby periods the Timing Logic does not initiate bitline and wordline precharge cycles if there is no access to the memory.

# III. MACRO-CELL IMPLEMENTATIONS

## A. Cell Library

The design approach allows as many common circuits and circuit blocks as possible to be used in all supported configurations. The physical layout library consists of a Memory-Cell Array, Write Drivers and Read Logic blocks, a 129 line Row Decoder, a 129 line Wordline Buffer, a Data In register, a Data Out latch, an Address Register/Multiplexer and a Timing Logic. All the blocks in the library apart from the Timing Logic, the Row Decoder and the Wordline Buffer are size configurable to meet the required word count and number of data bits. Each block is configured by abutting a leaf library cell the required number of times.

The SRAM macro-cell uses 3 metal layers. The $1^{st}$ metal layer is used for local interconnections in the leaf cells. The bit-lines and power lines run vertically on the $2^{nd}$ metal layer, and the global and local word-lines run horizontally on the $3^{rd}$ metal layer.

## B. Macro-Cell Assembly

Figure 7 shows the floorplan of the common building blocks and typical configurations [(a) 128words X 36bit, (b) 512words X 18bits and (c) 4Kwords X 9bits] of the SRAM macro-cell. The Memory-Cell Array consists of Memory-Cell Blocks. Each Memory-Cell Block is divided into Memory-Cell Columns each of which comprises 128 words. A Memory-Cell Block can have from one up to four Memory-Cell Columns that share a common Wordline Buffer Block. The smallest memory size that can be composed is 128 words X 9 bits, making use of a single Memory-Cell Column. For bigger memories, the Memory-Cell Array can be split in two halves and placed on both sides of the Row Decoder Block. All the peripheral circuits (Timing Logic, Address and Data Input Registers, Data Output Latch) are placed on the right side of the Memory-Cell Array.
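The composition rules above can be turned into a small planning sketch. The function name `plan_macro` is hypothetical, and the sketch assumes that every Memory-Cell Column is 128 words by 9 bits (the smallest macro), that the requested size is an exact multiple of that unit, and that columns are packed four per block wherever possible. The actual assembly flow may apply further constraints.

```python
import math
from typing import Tuple

WORDS_PER_COLUMN = 128      # words in one Memory-Cell Column
BITS_PER_COLUMN = 9         # assumed column width (smallest macro is 128 x 9)
MAX_COLUMNS_PER_BLOCK = 4   # columns sharing one Wordline Buffer Block


def plan_macro(words: int, bits: int) -> Tuple[int, int]:
    """Return (memory_cell_columns, memory_cell_blocks) for a configuration."""
    if words <= 0 or bits <= 0:
        raise ValueError("words and bits must be positive")
    if words % WORDS_PER_COLUMN or bits % BITS_PER_COLUMN:
        raise ValueError("size must be a multiple of 128 words x 9 bits")
    columns = (words // WORDS_PER_COLUMN) * (bits // BITS_PER_COLUMN)
    # Assumption: blocks are filled with up to four columns each.
    blocks = math.ceil(columns / MAX_COLUMNS_PER_BLOCK)
    return columns, blocks


for words, bits in [(128, 36), (512, 18), (4096, 9)]:
    columns, blocks = plan_macro(words, bits)
    print(f"{words} x {bits}: {columns} columns in {blocks} block(s)")
```

Under these assumptions the three configurations of Figure 7 need 4, 8 and 32 Memory-Cell Columns respectively.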



Figure 7 : Floorplan of common building blocks and typical configurations of the SRAM macro-cell.

# IV. FABRICATION AND EXPERIMENTAL RESULTS

## A. SRAM Chip Fabrication

To prove the concept of the SRAM macro-cell scalability and to evaluate the performance of the proposed circuitry two test chips were designed and fabricated using a commercial 0.25µm, 3 metal layer, CMOS technology.

Figure 8 shows a microphotograph of the fabricated 1K X 9bit SRAM macro-cell. The size of the macro-cell is 560µm X 1,300µm and it contains two Memory-Cell Blocks and peripheral circuitry. Each block is configured as 512 X 9bit and is composed of four 128 X 9bit Memory-Cell Columns. Figure 9 shows a microphotograph of the fabricated 4K X 9bit SRAM macro-cell. The size of the macro-cell is 1,850µm X 1,300µm and it contains eight 512 X 9bit Memory-Cell Blocks, each composed of four 128 X 9bit Memory-Cell Columns.
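As a rough cross-check of these dimensions, the fraction of each macro-cell occupied by storage cells can be estimated from the cell size quoted in Section II.B. This is a back-of-the-envelope sketch: the function name `array_fraction` is hypothetical, the quoted dimensions are assumed to be the full macro-cell outline, and the replica row and column are ignored.

```python
CELL_WIDTH_UM = 5.60   # memory cell width from Section II.B
CELL_HEIGHT_UM = 8.42  # memory cell height from Section II.B


def array_fraction(words: int, bits: int,
                   macro_w_um: float, macro_h_um: float) -> float:
    """Fraction of the macro-cell area taken by the storage cells alone."""
    cell_area = CELL_WIDTH_UM * CELL_HEIGHT_UM   # um^2 per cell
    array_area = words * bits * cell_area        # um^2 for all storage cells
    return array_area / (macro_w_um * macro_h_um)


frac_1k = array_fraction(1024, 9, 560.0, 1300.0)
frac_4k = array_fraction(4096, 9, 1850.0, 1300.0)
print(f"1K x 9: {frac_1k:.1%} of the macro-cell area is storage cells")
print(f"4K x 9: {frac_4k:.1%} of the macro-cell area is storage cells")
```

Under these assumptions the storage cells take roughly 60% of the 1K X 9bit macro-cell and roughly 72% of the 4K X 9bit macro-cell, which is consistent with the fixed-size peripheral blocks being amortized over a larger array.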



Figure 8 : Microphotograph of a 1K x 9bit SRAM test chip.



Figure 9 : Microphotograph of a 4K x 9bit SRAM test chip.

## B. Experimental Results

Both test chips were tested and found functional up to a frequency of 70 MHz for simultaneous Read/Write operations. The typical read access time ($t_{acc}$) was found to be 4.5 ns for the 1Kword X 9bits chip and 7.5 ns for the 4Kword X 9bits chip. A Shmoo plot of the 4Kwords X 9bits SRAM access time versus the power supply voltage for a checkerboard test pattern is shown in Figure 10. Stable operation over a wide range of power supply voltages was achieved: the chip operated at 50 MHz without faults from 2.0 to 2.7 V, demonstrating design robustness.

The power dissipation measurements for different operating modes of the 4Kwords X 9bits test chip are shown in Figure 11. For the Read, Write and Read/Write operations a checkerboard test pattern was used. In Idle mode there were no accesses to the memory, while the system clock was running and the address and data input ports were changing in every clock cycle. In Standby mode the system clock was running but the input ports were kept at a fixed state. Table 1 summarizes the measurement results. The power dissipation in Idle mode is higher than in Standby mode because of the activity in the registers of the address and data input ports. The power dissipation in Idle mode is lower than in the Read, Write and Read/Write modes, demonstrating the power reduction benefit of the static design of the Timing Logic.



Figure 10 : Shmoo plot of the access time versus the power supply voltage for the 4Kwords X 9bits SRAM macro-cell using a checkerboard test pattern at 50MHz.



Figure 11 : Power Dissipation measurements for the 4Kwords X 9bits SRAM test chip at 2.5Volts.

Table 1 : Power dissipation figures for the 4Kwords X 9bits SRAM test chip.

| Operation  | Power (µW/MHz) |
|------------|----------------|
| Standby    | 0.10           |
| Idle       | 1.90           |
| Read       | 7.40           |
| Write      | 10.60          |
| Read/Write | 14.05          |
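The per-MHz figures of Table 1 can be converted to absolute power at a given clock frequency. The sketch below assumes that power scales linearly with frequency, as the µW/MHz unit implies; the function name `power_uw` and the 50 MHz example frequency are illustrative choices.

```python
# Power figures from Table 1, in microwatts per MHz of clock frequency.
POWER_UW_PER_MHZ = {
    "Standby": 0.10,
    "Idle": 1.90,
    "Read": 7.40,
    "Write": 10.60,
    "Read/Write": 14.05,
}


def power_uw(mode: str, clock_mhz: float) -> float:
    """Estimated power in microwatts, assuming linear scaling with frequency."""
    if clock_mhz < 0:
        raise ValueError("clock frequency must be non-negative")
    return POWER_UW_PER_MHZ[mode] * clock_mhz


for mode in POWER_UW_PER_MHZ:
    print(f"{mode:<10} at 50 MHz: {power_uw(mode, 50.0):8.1f} uW")
```

At 50 MHz, for instance, the Read/Write figure corresponds to about 702.5 µW. Note that a combined Read/Write cycle costs less than the sum of the separate Read and Write figures.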

Both test chips were tested for total ionising dose effects using an X-ray source. The chips were irradiated in three steps to 1 Mrad, 5 Mrad and 10 Mrad (SiO<sub>2</sub>), at a constant dose rate of 21.2 krad/min. The chips were annealed for 24 hours at room temperature. They were kept under bias and in standby mode during the irradiation and the annealing period. A set of measurements was carried out after each step. The results showed no measurable performance degradation in terms of maximum operating frequency and read access time, and no increase in the circuit power dissipation.
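The beam time implied by these figures follows from dividing the dose by the dose rate. The sketch below assumes that the three quoted doses are cumulative totals reached at the constant rate of 21.2 krad/min; the function name `exposure_minutes` is hypothetical.

```python
DOSE_RATE_KRAD_PER_MIN = 21.2        # constant dose rate quoted in the text
DOSE_STEPS_MRAD = [1.0, 5.0, 10.0]   # assumed to be cumulative total doses


def exposure_minutes(total_dose_mrad: float) -> float:
    """Irradiation time in minutes needed to reach a given total dose."""
    return total_dose_mrad * 1000.0 / DOSE_RATE_KRAD_PER_MIN


cumulative_minutes = [exposure_minutes(d) for d in DOSE_STEPS_MRAD]
for dose, minutes in zip(DOSE_STEPS_MRAD, cumulative_minutes):
    print(f"{dose:4.0f} Mrad reached after {minutes:6.1f} min "
          f"({minutes / 60.0:.2f} h) of irradiation")
```

Under this assumption the full 10 Mrad corresponds to a little under 8 hours of total irradiation time.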

# V. CONCLUSIONS

A radiation tolerant SRAM macro-cell has been developed and implemented in a commercially available 0.25 µm, 3 metal layer, CMOS technology. The scalability of the macro-cell has been demonstrated by means of two test chips, carrying two macro-cell configurations of different memory sizes, which were fabricated and tested successfully. The experimental results show that our proposed design can be used in embedded applications for detector front-end ASICs in the LHC experiment environment.

This work was initiated primarily for the needs of the "Kchip" ASIC for the CMS ECAL Preshower detector. The modularity and scalability of the macro-cell allowed the generation of SRAM configurations suited to a number of other ASICs used in the LHC experiment detectors. A list of ASICs that currently make use of the presented macro-cell is shown in Table 2.

| Chip Name | LHC Detector        | Laboratory     | Config.           |
|-----------|---------------------|----------------|-------------------|
| SCAC      | ATLAS tracker       | Columbia Univ. | 128 X 18          |
| DTMROC    | ATLAS TRT           | CERN           | 128 X 153         |
| MCC       | ATLAS pixel         | INFN Genova    | 128 X 27          |
| AMBRA     | ALICE Silicon Drift | INFN Torino    | 16K X 9           |
| CARLOS    | ALICE Inner Tracker | INFN Bologna   | 256 X 9           |
| Kchip     | CMS Preshower       | CERN           | 1K X 18, 128 X 18 |
| SYNC      | LHCb Muon system    | INFN Cagliari  | 256 X 9           |
Table 2 : Application of the SRAM macro-cell in various ASICs for the front-end electronics in the LHC experiments.

# VI. REFERENCES

[1] "Development of a Radiation Tolerant 2.0V standard cell library using a commercial deep submicron CMOS technology for the LHC experiments", K. Kloukinas, F. Faccio, A. Marchioro, P. Moreira, Proceedings of the 4th Workshop on Electronics for the LHC Experiments, Rome 21-25 Sept., 1998.

[2] "Radiation Tolerant VLSI circuits in standard deep submicron CMOS technologies for the LHC experiments: Practical Design Aspects", G. Anelli et al., IEEE Transactions on Nuclear Science, Vol. 46, No. 6, pt1, pp 1690-1696, Dec. 1999.

[3] "SEU effects in registers and in a Dual-Ported Static RAM designed in a 0.25 µm CMOS technology for applications in the LHC", F. Faccio et al., Proceedings of the 4th Workshop on Electronics for the LHC Experiments, Rome, Sept. 1998.

[4] "Pipelined, Time-Sharing Access Technique for an Integrated Multiport Memory", Ken-ichi Endo, Tsuneo Matsumura, IEEE Journal of Solid State Circuits, pp 549-554, Vol. 26, No.4, April 1991.

[5] "A Replica Technique for Wordline and Sense Control in Low Power SRAM's", B. S. Amrutur and M. A. Horowitz, IEEE Journal of Solid State Circuits, pp 1208-1219, Vol. 33, No. 8, August 1998.

[6] "A divided word-line structure in the static RAM and its applications to a 64K full CMOS RAM", M. Yoshimoto et al., IEEE Journal of Solid State Circuits, pp 479-485, Vol. SC-18, October 1983.