Abstract-A fast, 128-b implementation of both SRAM and LPROM with integrated periphery in a thin-film amorphous indium-gallium-zinc oxide technology is reported. The SRAM block can be read in 265 µs/byte and written in 110 µs/byte, consumes 12.6 mW, and has an area of 11.9 mm². Furthermore, after power down, an SRAM memory state retention time of 83 s is shown. The LPROM can be read in 40 µs/b, consumes 4.50 mW, and has an area of 3.75 mm². The SRAM enables fast volatile working memory for thin-film microprocessors, while the LPROM can be used to store the identification code for state-of-the-art thin-film RFID tags.
I. INTRODUCTION
AFTER personal computing, the laptop, and the smartphone, the Internet of Things is the next big wave in electronics. Billions of everyday objects will become smart and need to be connected to the Internet, paving the way for additional functionality [1]. However, to enable this, these devices need to have a sufficiently low price. Their circuits might include sensors, actuators, and communication to enable their revolutionary role. Thin-film technology is a very promising route to fabricate high-volume, low-cost, low-functionality circuits for integration in these smart objects. Thin-film electronics can be manufactured on large areas and is used today to create flat panel displays and sensitive plates for digital X-ray machines [2], [3]. Within the field of the Internet of Things, the first major achievements with thin-film technologies have been shown, demonstrating direct smartphone readout of a flexible near field communication (NFC) tag [4]. This circuit shows that thin-film electronics can provide sufficient speed for a range of low-end applications, at a price point that is significantly lower than what is possible using bulk silicon. In the future, these NFC tags will be equipped with sensors to monitor the environment, e.g., as a temperature patch for health monitoring. The data from the sensors need to be processed and stored. To reduce the required transmission power, the data first need to be compressed and stored locally before they are transmitted in batch using the NFC protocol. This requires a local memory. In 2012, Myny et al. [5] showed a thin-film microprocessor, which was used for implementing a digital filter. Still missing are the memory elements to store both the processor instructions and the data with high performance and low area footprint. Such a system requires a non-volatile memory for storing the program instructions or identification code, and a random access memory as a working memory for the processor.
II. STATE OF THE ART
This paper shows the design and challenges of thin-film memories on foil. Although little literature is available, a few works have already shown first steps in this direction. With respect to non-volatile memories, Yang et al. [6] showed a one-time programmable ROM array using anti-fuse capacitors in 2013. This work showed only 16 b, with a footprint of 4.41 mm²/b. Neither the operating speed nor the peripherals were reported. In 2014, Myny et al. [7] showed a print programmable read only memory (P²ROM), but it is slow compared with the processor (500 Hz versus 2 kHz) and has a high area footprint. With respect to random access memories, only single memory cells have been shown, usually as a technology demonstrator for the thin-film technology. Previous digital systems [5], [7], [8] used registers to store single bytes, but no random access memory block was used. In 2011, Fukuda et al. [9] showed an organic SRAM cell, but with a cell area of 21 mm² and a write speed of 1500 µs. In 2015, Geier et al. [10] showed a single SRAM cell with very low power consumption, implemented in a complementary CNT technology, which is not yet a proven technology in production. Finally, in 2016, Avila-Niño et al. [11] showed a matrix of 4 × 4 cells in an organic technology, but did not integrate periphery into the design.
0018-9200 © 2017 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
This paper shows a non-volatile memory and a fast random access memory with high performance, low area consumption, and low power. Area remains one of the most critical parameters for thin-film circuits to outperform silicon in terms of cost. Thin-film technology is inexpensive per unit area compared with traditional silicon, but this advantage can be offset by the increase in area. The smallest SRAM cell in thin-film technology shown to date is 0.6 mm², as presented by Geier et al. [10], compared with 0.027 µm² for the most recent 7-nm CMOS technology [12]. For most envisioned applications, at least 1-kb memory is needed. Assuming a cost of around 265 $/m² [13] and a cost target of below 0.5 c$, the area should be reduced by at least a factor 26.5, down to 0.023 mm²/cell. In terms of ROM, the smallest cell shown is the P²ROM, using 0.09 mm². This means that the memory cells alone for a 1-kb memory would use about 90 mm², more than doubling the size of a recent thin-film flexible NFC tag [14].
The second most critical parameter of these memories is speed. If these memories are combined with integrated microprocessors, the memory should not become the bottleneck. Previous works have reported speeds of 650 Hz and below or have given no data at all. The thin-film microprocessor shown by Myny et al. [5] in 2012 had a clock frequency of 2.1 kHz, which corresponds to a clock period of around 190 inverter stage delays. For the technology used in this paper, the estimated operating frequency would be 1095 Hz, and this should be the aim for the memory design [5]. In NFC tags conforming to ISO 15693, data rates of up to 6.62 kb/s (fc/2048) or 26.48 kb/s (fc/512) are required, imposing another specification on the speed.
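The NFC data rates quoted above follow from dividing the carrier frequency fc by a fixed number of carrier cycles per bit. The sketch below reproduces them and the resulting time budget per bit. It assumes the usual 13.56-MHz HF carrier, a value that is not stated in this paper, and the helper names `bit_rate_bps` and `bit_period_us` are purely illustrative.

```python
# Assumption: fc is the 13.56 MHz HF carrier; the text only quotes the ratios.
FC_HZ = 13.56e6


def bit_rate_bps(divider: int, fc_hz: float = FC_HZ) -> float:
    """Bit rate obtained when one bit lasts `divider` carrier cycles."""
    return fc_hz / divider


def bit_period_us(divider: int, fc_hz: float = FC_HZ) -> float:
    """Time available per bit, in microseconds."""
    return 1e6 * divider / fc_hz


low_rate = bit_rate_bps(2048)   # fc/2048
high_rate = bit_rate_bps(512)   # fc/512

print(f"fc/2048: {low_rate / 1e3:.2f} kb/s, {bit_period_us(2048):.1f} us per bit")
print(f"fc/512:  {high_rate / 1e3:.2f} kb/s, {bit_period_us(512):.1f} us per bit")
```

Under this assumption the script returns 6.62 kb/s and 26.48 kb/s, matching the figures in the text. The per-bit period it prints is the time a serial memory readout has available if it is to feed the link directly.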
III. TECHNOLOGY
This paper uses a thin-film amorphous indium-gallium-zinc-oxide (a-IGZO) technology on flexible polyimide foil. The device stack is an etch stop layer (ESL) structure with a critical dimension (CD) of 5 µm for the SRAM or 3 µm for the LPROM, and a minimal channel length of 3 × CD, as shown in Fig. 1. The device is comparable to the devices published by Chiang et al. [15] and Tripathi et al. [16]. The thin-film technology has a few key parameters that significantly impact design. Due to its unipolar nature, standard CMOS design techniques cannot be used. According to Myny et al. [19], the best topology for thin-film design is a diode load topology with backgate [18], but the noise margin remains limited, as shown by the transfer curves in Fig. 2. Furthermore, mobility is limited to around 10 cm²/Vs, a factor 50-100 lower than bulk Si. Due to long channel lengths and large overlap capacitances, the inverter stage delay for a 5-µm technology is on average 4.78 µs, as displayed in Fig. 3. Variability is also significant, with a standard deviation of 1.60 µs for the inverter stage delay. These technological constraints guide the further design. At this point, the gate oxide thickness is 200 nm, in accordance with typical display processes [15], [16]. This requires a supply voltage of 15 V, which, in an IoT context, can realistically be generated using a short-distance communication link with the right antenna design. For circuits, however, our technology provider has a roadmap for scaling, reducing the oxide thickness as well as the CD. This evolution can be seen in the work of Myny et al. [15], [19] between 2015 and 2016.
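To connect the measured stage delay to the speed target set earlier, the sketch below estimates the clock frequency of a processor whose clock period spans about 190 inverter stage delays. The function name and the treatment of variability as a simple one-sigma shift of the mean delay are illustrative assumptions, not the authors' method.

```python
def clock_frequency_hz(stage_delay_s: float, stages_per_cycle: int = 190) -> float:
    """Clock frequency when one cycle lasts `stages_per_cycle` stage delays."""
    return 1.0 / (stage_delay_s * stages_per_cycle)


MEAN_DELAY_S = 4.78e-6   # average inverter stage delay quoted in the text
SIGMA_DELAY_S = 1.60e-6  # standard deviation quoted in the text

nominal_hz = clock_frequency_hz(MEAN_DELAY_S)
# Assumption: a "slow" sample is modeled as mean + 1 sigma on every stage.
slow_hz = clock_frequency_hz(MEAN_DELAY_S + SIGMA_DELAY_S)

print(f"nominal: {nominal_hz:.0f} Hz, one-sigma slow: {slow_hz:.0f} Hz")
```

The nominal result of roughly 1.1 kHz is close to the 1095 Hz estimate given earlier; the small difference presumably comes from rounding in the "around 190" stage delays.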
IV. SRAM DESIGN

A. SRAM Cell Design
The single SRAM cell is shown in Fig. 4. There is no good pull-up available in a-IGZO technology, as only n-type devices can be used. This introduces new tradeoffs compared with CMOS. The power consumption is dominated by the permanent ON-current through one of the pull-ups. Additionally, a ratioed diode load inverter topology is used, requiring at least a 1:10 ratio for static stability, as shown in Fig. 2. Making the load weaker (larger L) would lower the power consumption but increase the area. Writability is not a design challenge, because the ratioed logic automatically requires a weak pull-up and therefore gives easy writability. Fig. 4 details the sizing selected as a tradeoff between speed, power consumption, and area for the 5-µm design rule, and Fig. 5 shows the layout. The measured static transfer curves for this SRAM cell are shown in Fig. 6, and the dynamic read-write behavior is shown in Fig. 7. The cell achieves a noise margin of 2.37 V in read and 2.57 V in hold at 15-V supply. Furthermore, the measured speed of discharge is 0.361 V/ms on a bitline with 1.08-nF parasitic load due to the measurement setup, which translates to a 0.39-µA discharge current. Finally, the measured static power consumption per cell is 54 µW in hold. The 128 cells together therefore consume 6.91 mW, a significant portion of the total power consumption, as listed in Table I.
Fig. 4. SRAM cell schematic. The pull-ups are long to reduce power consumption and achieve the required pull-up pull-down ratio.
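The discharge current and the total hold power quoted for the cell follow from two one-line relations, I = C · dV/dt and P_total = N · P_cell. The sketch below restates that arithmetic with the measured values from the text; the function and variable names are illustrative.

```python
def discharge_current_a(c_bitline_f: float, slew_v_per_s: float) -> float:
    """Current needed to discharge a capacitance at a given slew rate (I = C*dV/dt)."""
    return c_bitline_f * slew_v_per_s


C_BITLINE_F = 1.08e-9      # parasitic bitline load of the measurement setup
SLEW_V_PER_S = 0.361e3     # 0.361 V/ms expressed in V/s
P_CELL_HOLD_W = 54e-6      # static power per cell in hold
N_CELLS = 128

i_cell_a = discharge_current_a(C_BITLINE_F, SLEW_V_PER_S)
p_array_w = P_CELL_HOLD_W * N_CELLS

print(f"discharge current: {i_cell_a * 1e6:.2f} uA")
print(f"hold power of {N_CELLS} cells: {p_array_w * 1e3:.2f} mW")
```

Both results round to the values given in the text, 0.39 µA and 6.91 mW.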
Thanks to the very low leakage of the thin-film a-IGZO technology [19], the cells keep their value on the internal node for a certain time after the supply is turned OFF, and can be regenerated when the supply is applied again. This effect depends on the Ids current at Vgs = 0, and is therefore dependent on the exact properties of the technology. Retention times between 10 and 83 s have been measured in this technology, allowing for a significant reduction in power consumption when the memory is not used. The retention is not fundamentally limited to this value, however, but depends on the threshold voltage (V_T). V_T can be influenced by many technological parameters, such as gate oxide thickness and semiconductor thickness. Increasing V_T has a positive effect on retention, but at the cost of speed.
Fig. 8 shows the architecture of the 128-b SRAM block. All components (squares) are integrated on the foil, and all timing and input signals (hexagons) are applied externally. Components include the 16 × 8 matrix core with SRAM cells, a 4-to-16 decoder, a precharge module, a bitline driver for write operations, and a sense amplifier (SA) for cell readout.
Fig. 8. Architecture of the SRAM matrix. The hexagons represent externally applied signals, and the parallelogram represents the data measured by the system. The square blocks all represent structures integrated on the foil.
B. SRAM Periphery
The 4-to-16 decoder is shown in Fig. 9 and is implemented with a standard single logic layer. NOR gates with high fan-in come with less penalty for unipolar diode load logic than for complementary logic, as no devices are put in series, comparable to dynamic circuits in CMOS. In this technology, a decoder with up to six inputs is faster than the equivalent with two logic layers and consumes less power, as shown in Fig. 9. Bigger arrays therefore benefit from the two-layer implementation.
The data on the bitlines can be detected in two ways: either a regular inverter (see Fig. 2) is used, or a more complex SA can be selected. Both have been implemented. Fig. 10 shows the classical SA, with the p-type pull-ups replaced by n-type pull-ups, as required in an n-type-only technology. Due to the substantial V_T variability in this technology, a large bitline swing is required to overcome the offset in the input pair of the SA, thereby limiting the usefulness of the SA. In technologies with lower V_T variability (e.g., manufacturing lines versus laboratory environment), this disadvantage is reduced.
Fig. 11 shows the measurement of one column of a 128-b SRAM matrix with SA. First, data are written in every cell. Then all the data are read out correctly. Fig. 12 shows the optimal timing signals. The SRAM can be read at a rate of 265 µs/byte and written at 110 µs/byte without introducing bit errors. If an SA is used, the read rate slows down to 280 µs/byte, while the write rate stays at 110 µs/byte. The performance decrease in this case is due to the extra stage delay introduced by the SA, which is large compared with the time gained from the incomplete discharge of the bitline. Simulations show that for bitlines of 32 cells and more, using the described SA will significantly improve the performance of the system. The access speed of the memory is of the order of that of the present state-of-the-art thin-film microprocessor, showing clear promise for the memory. Fig. 13 shows the shmoo plot for this device, and Table I shows the power consumption.
Fig. 9. Decoder is integrated in the thin-film technology. A NOR gate with fan-in of 6 is used. The simulation shows that having a decoder of up to 6 b implemented in a single layer is advantageous in this technology, being both faster and more power-efficient.
Fig. 11. Timing diagrams of the SRAM for full matrix readout: write diagram and read diagram. Phase 1 (P1) of the write cycle decodes the address, phase 2 enables the wordline, and phase 3 is the address hold time. In phase 1 of the read cycle, the bitlines (BL and nBL) are precharged, followed by settling time in phase 2. Phase 3 activates the cell by enabling the wordline, and after sufficient discharge on the bitlines, the SA can be enabled in phase 4. If an inverter is used as SA, phase 4 becomes obsolete.
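The single-layer NOR decoder of this section can be modeled behaviorally: each wordline is the NOR of the address literals that must all be low for that row to be selected. The sketch below is a logic-level illustration only. The function names are hypothetical, and it says nothing about transistor sizing, delay, or power.

```python
from typing import List


def nor(*inputs: int) -> int:
    """Wide NOR gate: output is 1 only when every input is 0."""
    return 0 if any(inputs) else 1


def decode(address: int, n_bits: int = 4) -> List[int]:
    """One-hot wordlines from a single layer of n_bits-input NOR gates."""
    bits = [(address >> k) & 1 for k in range(n_bits)]
    wordlines = []
    for row in range(2 ** n_bits):
        # Row `row` gets address bit k directly where its own bit k is 0,
        # and the inverted address bit where its own bit k is 1.
        literals = [bits[k] ^ ((row >> k) & 1) for k in range(n_bits)]
        wordlines.append(nor(*literals))
    return wordlines


if __name__ == "__main__":
    for addr in (0, 5, 15):
        print(addr, decode(addr))
```

Because every row needs only one wide gate, the fan-in grows with the address width while the logic depth stays at one, which is the property the text exploits for decoders of up to six inputs.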
V. LPROM DESIGN
The architecture of the LPROM, shown in Fig. 14, is similar to the architecture of the SRAM with the write circuitry omitted. Traditionally, a programmable fixed pull-up is used in every line [7]. Since the main application for this memory today is coding flexible RFID tags, the output needs to be serialized so that it can be transmitted directly. Therefore, an 8-to-1 multiplexer is added. For full readout, a 7-b digital counter can be connected as an address generator. The memory cells can be programmed using selective laser ablation, as exemplified in Fig. 15. Similar to programming using ink-jet printing [7], selective laser ablation enables memory programming as a post-process, rather than at design level as in mask ROM. This allows us to make foils with different versions of software for a microprocessor or unique IDs for NFC tags. Resolution and throughput are critical to manufacturability and depend on laser conditions and material stack, but are beyond the scope of this paper. In this paper, laser spots down to 10 µm × 10 µm are used, which is a significant reduction in area compared with the printing implementation of Myny et al. [7].
In this work, it is proposed to use a precharge on every line instead of a fixed pull-up, as shown in Fig. 16. Contrary to CMOS, there is no pMOS available to do this. An nMOS device is used instead, resulting in a V_T drop across the pull-up. The advantages are speed, area, and power consumption. The cell area can now be reduced, because it is no longer required to keep the 1:10 ratio. This implies a smaller load for the wordline driver, which in turn can be made both faster and smaller. The size improvement can be seen at the individual cell level, as the cell size decreases from 4674 to 1440 µm², but for our chips, the size is the same, since the same periphery layout was used. The speed is increased because the precharge transistor is larger than the original pull-up transistor, as shown by the measurement of the individual lines in Fig. 17. Power is saved by eliminating the direct current path that exists when both the pull-up and the pull-down are active together.
The periphery blocks of the LPROM are comparable to the SRAM design. To sense the bitline, an inverter is used, but in a faster design, an SA (as demonstrated in the SRAM) can be used in the precharge LPROM implementation. The measurement results of two rows of the chip are shown in Fig. 18. The timing signals are applied externally, as specified by Fig. 19. For the 3-µm design rule, the maximum speed for the LPROM with precharge is 25 kHz, compared with 22.7 kHz for the LPROM without precharge transistors. Fig. 20 shows how the speed varies with supply voltage. The power consumption also improves significantly, as shown in Table I.
Fig. 16. Design of laser programmable ROM. This paper compares the traditional LPROM as in [7] with the LPROM with precharge. Adding precharge increases cell density and speed and decreases power consumption.
TABLE II COMPARISON TABLE FOR THIN-FILM SRAM DESIGNS
TABLE III COMPARISON TABLE FOR
The SRAM cell occupies 0.028 mm², a factor 21.7 improvement on the state of the art. The SRAM can be written in 110 µs/byte and read in 265 µs/byte, which is at least a factor 4 faster than the state of the art. The power consumption is around 98.4 µW/b with periphery included, or 12.6 mW in total. This is rather high compared with the total energy budget available in an NFC tag, and further research will need to focus on improving in this respect. The use of an SA is investigated, and it can be concluded that, though its effectiveness is likely smaller than for CMOS, it is still useful for larger arrays. With respect to LPROM, this paper shows the clear advantage of using a precharge on the lines rather than a fixed pull-up. The chip with precharge shows a 10% increase in speed, a 47% decrease in power consumption, and a factor 3.25 decrease in area per memory unit. Compared with the state of the art, the proposed solution is 38 times faster, requiring only 40 µs to read a bit.
In terms of power, the LPROM uses 35.1 µW/b, leaving significant room for improvement. Improvements in power consumption could be made by reducing the gate oxide thickness. The 128-b laser programmable ROM chips occupy 3.75 mm², an area that can be reduced significantly by a dedicated layout for a system with precharge. The individual cells are only 1440 µm², a factor 63 improvement on the current state of the art.
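The headline figures in this discussion can be cross-checked from the per-bit and per-cell values quoted in the text. The short script below is such a sanity check. Its variable names are illustrative, and its only inputs are numbers that appear in the text.

```python
BITS = 128

SRAM_UW_PER_BIT = 98.4        # SRAM power per bit, periphery included
LPROM_UW_PER_BIT = 35.1       # LPROM power per bit
CELL_AREA_PULLUP_UM2 = 4674   # LPROM cell with fixed pull-up
CELL_AREA_PRECHARGE_UM2 = 1440
F_PRECHARGE_HZ = 25e3         # maximum LPROM speed with precharge
F_PULLUP_HZ = 22.7e3          # maximum LPROM speed without precharge

sram_total_mw = SRAM_UW_PER_BIT * BITS / 1e3
lprom_total_mw = LPROM_UW_PER_BIT * BITS / 1e3
area_ratio = CELL_AREA_PULLUP_UM2 / CELL_AREA_PRECHARGE_UM2
speed_gain = F_PRECHARGE_HZ / F_PULLUP_HZ - 1.0
read_time_us = 1e6 / F_PRECHARGE_HZ

print(f"SRAM total power:  {sram_total_mw:.1f} mW")
print(f"LPROM total power: {lprom_total_mw:.2f} mW")
print(f"cell area ratio:   {area_ratio:.2f}")
print(f"speed gain:        {speed_gain * 100:.0f} %")
print(f"read time per bit: {read_time_us:.0f} us")
```

The script reproduces the 12.6 mW SRAM total, the roughly 4.5 mW LPROM total, the factor 3.25 in cell area, the 10% speed gain, and the 40 µs read time per bit.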
VII. CONCLUSION
In conclusion, a flexible thin-film SRAM and LPROM with integrated periphery, exhibiting characteristics far beyond the state of the art, are shown. A 128-b SRAM matrix with cells of only 0.028 mm² is read out at 265 µs/byte. Furthermore, a 128-b LPROM is demonstrated, which is read out in just 40 µs/b and is 32.6 times smaller than previous implementations thanks to the use of a precharge mechanism.
Further improvements in technology will significantly improve the performance of these circuits, mainly thanks to thinner gate dielectrics and smaller CDs. From the circuit perspective, this design could still be improved by adding timing circuits and expanding to larger array sizes. This paper shows, for the first time, that both SRAM and LPROM with integrated periphery can be fabricated in thin-film technology with low area and high speed, demonstrating the feasibility of integrating these components in an NFC tag.
