Abstract-Integrating numerous cores of different types into an SOC makes the test process a complex activity. The need for a standard test infrastructure has led to the development of IEEE Std 1500, a modular and scalable test interface that enables test and diagnosis of embedded cores and interconnect. Although this standard can drastically simplify the test challenges of SOCs, its hardware architecture, usually called a wrapper, may occupy a noticeable silicon area when used for memory cores. In this paper, we present a specialized wrapper for memory cores, compatible with IEEE Std 1500, which supports parallel and at-speed testing of all memory cores with reasonable area overhead. We focus in particular on the design of the Wrapper Boundary Register, which is mostly responsible for this overhead. Simulation and synthesis results on a group of embedded memory cores confirm that the proposed wrapper effectively reduces the area overhead.
I. INTRODUCTION
Today, fast innovation in VLSI technology enables the design of complex systems on a single chip. Although system chips offer advantages such as higher performance, lower power consumption and smaller volume and weight, recent design methods, such as the simultaneous production of different cores and the use of reusable and heterogeneous IP cores, pose new test constraints to the test community [1]. Moreover, since direct access to core ports is virtually impossible, basic testability features such as controllability and observability are hard to achieve [1], [2].
All these issues have led to the development of IEEE Std 1500 for embedded core testing, which defines a scalable architecture for independent and modular test development and test application for embedded design blocks [3]. This standard addresses two main requirements: easy integration and interoperability on the one hand, and flexibility and scalability on the other. The second makes the standard adaptable to any test method, which depends on the core type.
Memory cores are one of the core types that occupy a significant part of the chip. The high density of embedded memory cores makes them more prone to manufacturing defects than other types of on-chip circuitry. Since direct access to memory ports is virtually impossible and at-speed testing is difficult to achieve, BIST is a more practical method to test and diagnose embedded memories [4]. However, a serial BIST, such as a BIST based on IEEE Std 1500, has an unacceptably long test and diagnosis time, because this standard cannot implement a parallel test and manage multiple cores concurrently [5]. In addition, the area overhead of IEEE Std 1500 wrappers is high, mostly due to the Wrapper Boundary Register. Since most embedded memory cores have wide data words, the number of wrapper boundary cells increases significantly, which directly influences the area overhead of the wrapper.
R. Niaraki Asli is with the Electrical Engineering Department, University of Guilan, Guilan, Iran (e-mail: niaraki_asli@yahoo.com).
Researchers have proposed several BIST schemes using IEEE Std 1500 serial links [6]-[10]. In [6], a modular test wrapper for small wide memories is introduced. The interfaces between the memories and the BIST circuit are based on IEEE Std 1500. This scheme can implement at-speed test at low area overhead. In [7]-[9], a serial test interface based on IEEE Std 1500 for small memories is proposed, but the area overhead of the IEEE Std 1500 wrappers is high. Furthermore, testing multiple memories using a serial BIST based on IEEE Std 1500 requires an intolerably long test time. In [10], another wrapper based on IEEE Std 1500 is introduced; it fully conforms to the standard but is not modified or optimized for memory testing, so it cannot support at-speed and parallel test.
In our previous work, we presented a structure for an IEEE Std 1500 wrapper optimized for Built-In Self-Test of embedded memory cores. This structure is capable of at-speed and parallel testing of multiple memory cores concurrently. Although the area overhead improved compared to the existing schemes, its main drawback still lay in the significant amount of occupied silicon area. In this paper, we therefore focus on the design of the Wrapper Boundary Register, which is responsible for the high area overhead of IEEE Std 1500 wrappers. We design a new structure for the wrapper boundary cells that has lower area overhead while preserving the previous functionality.
The organization of this paper is as follows. Section II briefly explains the general specification of the proposed IEEE Std 1500 wrapper. In Section III, we describe the structure of the proposed Wrapper Boundary Register in detail and compare it with the previous one. Section IV presents the results of simulation and synthesis on a group of embedded memory cores, which confirm that the proposed structure effectively reduces the area overhead. The paper is concluded in Section V.
II. EMBEDDED MEMORY WRAPPER (EMW) STRUCTURE
The EMW consists of Wrapper Instruction Register (WIR) circuitry, a Wrapper Boundary Register (WBR), a Wrapper Bypass Register (WBY), a Wrapper Data Register (WDR) and two multiplexers. All components of the EMW are specialized for memory core testing while complying with IEEE Std 1500. In the remainder of this section, we briefly describe the EMW components.
One of the basic parts of the wrapper is the WIR circuit, which receives the Wrapper Serial Port (WSP) signals from a BIST controller as input. The main function of the WIR circuit is to generate control and input signals for the other components of the wrapper and to gather test responses. Our proposed WIR circuit is composed of three main parts: the WIR Controller, the WIR Register and the WIR Selector. The WIR Controller generates the control signals of the wrapper; it receives all WSP signals plus a parallel-shift signal from the BIST controller. By initializing the modified WSP, the BIST controller can produce different instructions, which are received by the WIR Register and set the whole wrapper into one of the operational modes. When there is no instruction to shift, this register shifts the current operational mode data. The last part of the WIR circuit is the selector, which places exactly one of the wrapper registers between the serial input and serial output signals of the wrapper, in accordance with IEEE Std 1500.
Another part of the wrapper is the WBY, a one-bit register that bypasses the wrapper in functional mode.
To support diagnosis, we add an optional register called the WDR to shift test results out. When the memory is under test, this register is placed between the Wrapper Serial Input and Wrapper Serial Output signals and shifts the diagnosis information out.
Finally, the last part of the wrapper is the WBR, which is responsible for applying test and functional stimuli to the core and receiving the core responses. In our previous work, we designed a WBR specialized for memory testing that was capable of supporting at-speed and parallel test. Although that register had a lower area overhead than the existing scheme, in this paper we reduce its area overhead further. In the next section, we describe the previously proposed structure of this WBR and compare it with the new design.
III. WRAPPER BOUNDARY REGISTER (WBR)

A. Structure of Previous WBR
Fig. 2 shows the structure of the previously proposed WBR, which is specialized for Built-In Self-Test of embedded memory cores. The main function of this register is to apply test and functional stimuli to the core and receive the core responses through input and output cells, respectively. The operation of the register is determined by the instruction shifted into the WIR Register. This instruction initializes the WBR control signals and sets the register into one of the wrapper operational modes. As seen in Fig. 2, the WBR is composed of several input and output cells, multiplexers and some glue logic. Figs. 3 and 4 show the design of the input and output cells, respectively. These cells are designed to operate in each of the three mandatory modes of IEEE Std 1500: functional, internal test and external test.
Each cell has two input and two output ports for shifting functional and test data. The CFI (Cell Functional Input) and CFO (Cell Functional Output) ports are used to receive and send functional data.
One of the most important points in memory testing is at-speed test for detecting delay faults. Since IEEE Std 1500 needs many clock cycles to shift in test commands, update the operation to the memory, and capture and shift out the memory responses, a BIST based on this standard has a long test time and cannot test the memory at speed.
International Journal of Computer
In this structure, because test stimuli are applied and memory responses received in parallel, test data can be sent to the memory in every clock cycle and the memory responses compared in real time. On every rising edge of the clock, test data is set in the input cells, and it is applied to the memory on the falling edge. The same holds for the output cells: memory responses are set in the output cells on the rising edge of the clock and shifted out on the falling edge. This timing scheme not only reduces test time but also applies valid data to the memory in each clock cycle.
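The two-phase schedule described above can be illustrated with a small behavioral model in Python. This is only a sketch of one possible reading of the timing, not the authors' RTL: the function name `run_at_speed`, the operation tuples and the one-cycle response latency are assumptions made for illustration. On each simulated rising edge the next operation is latched into the input cells and the previous response into the output cells; on the falling edge the latched operation is applied to the memory and the captured response is compared.

```python
from typing import List, Optional, Tuple

# ("w", address, data) writes; ("r", address, expected) reads and checks.
# This encoding of operations is hypothetical.
Op = Tuple[str, int, int]


def run_at_speed(memory: List[int], ops: List[Op]) -> List[int]:
    """Return the indices of read operations whose response mismatched."""
    pending: Optional[Tuple[int, int, int]] = None  # response awaiting capture
    failures: List[int] = []
    # One extra cycle drains the last response through the output cells.
    for cycle in range(len(ops) + 1):
        # Rising edge: latch the next operation in the input cells and
        # capture the previous memory response in the output cells.
        input_cell = ops[cycle] if cycle < len(ops) else None
        output_cell = pending
        pending = None
        # Falling edge: apply the latched operation to the memory ...
        if input_cell is not None:
            kind, addr, data = input_cell
            if kind == "w":
                memory[addr] = data
            else:
                pending = (cycle, memory[addr], data)
        # ... and compare the captured response with its expected value.
        if output_cell is not None:
            index, response, expected = output_cell
            if response != expected:
                failures.append(index)
    return failures
```

In this model a sequence of n operations completes in n + 1 clock cycles, so one memory operation is issued per cycle and each read is checked one cycle after it is issued.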
To realize this approach, a design like that of Fig. 2 is used. As can be seen, a 2-to-1 multiplexer is placed between each pair of adjacent input cells. One input of this multiplexer is used only for parallel test, and the other is driven by the CTO of the previous cell to form a scan chain. Similarly, glue logic placed between the output cells implements the same operation.
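The effect of the multiplexers on the input-cell chain can be sketched with the following Python model. It is a behavioral approximation under stated assumptions, not the circuit of Fig. 2: the class `InputCellChain`, the `parallel_sel` select signal and the helper functions are hypothetical names, and each cell is reduced to a single stored bit.

```python
from typing import List, Optional


class InputCellChain:
    """Chain of one-bit input cells with a 2-to-1 multiplexer before each cell."""

    def __init__(self, width: int) -> None:
        self.cells: List[int] = [0] * width

    def clock(self, parallel_sel: bool, serial_in: int = 0,
              parallel_in: Optional[List[int]] = None) -> int:
        """Advance one clock cycle and return the bit leaving the chain."""
        serial_out = self.cells[-1]
        if parallel_sel:
            # Multiplexers select the parallel test inputs: all cells load at once.
            if parallel_in is None or len(parallel_in) != len(self.cells):
                raise ValueError("parallel_in must supply one bit per cell")
            self.cells = list(parallel_in)
        else:
            # Multiplexers select the previous cell's test output: shift by one.
            self.cells = [serial_in] + self.cells[:-1]
        return serial_out


def load_serial(chain: InputCellChain, pattern: List[int]) -> int:
    """Shift a pattern in through the scan chain; return the cycles used."""
    cycles = 0
    for bit in reversed(pattern):  # the bit for the last cell enters first
        chain.clock(parallel_sel=False, serial_in=bit)
        cycles += 1
    return cycles


def load_parallel(chain: InputCellChain, pattern: List[int]) -> int:
    """Load a pattern through the parallel inputs; return the cycles used."""
    chain.clock(parallel_sel=True, parallel_in=pattern)
    return 1
```

In this model, loading a W-bit pattern through the scan path costs W clock cycles, whereas the parallel path costs a single cycle, which is the property that makes one memory operation per clock cycle possible.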
B. Structure of New WBR
Although the structure described above can effectively test the memory cores in a parallel and at-speed manner, its high area overhead is an important drawback. In a typical IEEE Std 1500 wrapper structure, the WBR has the largest area overhead of all the elements. This is due to the large number of input and output ports of a memory core, which requires a large number of WBR cells. Therefore, if the WBR cells can be made more compact, the area overhead of the WBR is reduced effectively.
In our approach, one wrapper cell is added to the wrapper per core terminal, and its type depends only on the direction (input or output) of the core terminal. The functionality of the cells is determined by the current operation mode, which can set the cells to either of two types (functional only or test data), so the cells can be used for any memory core terminal (assuming test control signals are always inputs to the cores). Figs. 5 and 6 show the new design of the WBR cells, which are composed of fewer logic gates while preserving the functionality in the three operational modes. Table II shows the values of the control signals of the new WBR cells in the three operational modes.
These cells have three control signals like the previous version and so can be imported into the previous wrapper structure. This library of wrapper cells has the following properties:
1) Access for functional mode, as well as core-internal and core-external testing
2) Capability of implementing core-internal testing in a parallel and at-speed form
3) Specially designed for memory core testing
4) Low area overhead compared to the previous design
The structure of the new WBR composed of new cells is shown in Fig. 7 . 
IV. SIMULATION AND SYNTHESIS RESULTS
To evaluate the proposed design, it has been simulated and synthesized with Xilinx 11.1, using a 0.18 µm CMOS standard cell library, for memories of different configurations. First, we synthesized the new wrapper for a 16K×16 SRAM to evaluate test time. Table III shows the test times for several march algorithms; the test time depends on the length of the algorithm. According to the synthesis results, the proposed structure guarantees an at-speed test at a clock frequency of 145.8 MHz and has an access time of 6.8 ns. These values are obtained assuming that the at-speed test is implemented by the BIST structure.
To study the area overhead of the new design, we synthesized both structures with Xilinx 11.1 and compared the results. As explained earlier, the main drawback of the previous structure was the area overhead arising from the WBR. By designing new wrapper boundary cells with lower area overhead, we overcome this problem. The new cells retain the ability to operate in the three IEEE Std 1500 mandatory modes and have three control signals like the previous ones, so the new WBR can be imported into the previous structure. Table IV summarizes the area overhead results for the two structures. Both wrappers are synthesized for a 16K×16 SRAM using the 0.18 µm CMOS standard cell library. The first column lists the components that constitute the whole wrapper. The second and third columns give the number of gates for the two wrapper structures. The results show that the area overhead of the new wrapper is improved by 8.4% compared to the previous one.
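As a rough illustration of how test time depends on algorithm length, the following Python sketch estimates the test time of several march algorithms for a 16K×16 SRAM at the reported 145.8 MHz clock. It assumes exactly one memory operation per clock cycle with no setup or shift overhead, and it uses the commonly cited operation counts per word for each algorithm. The function name and the choice of algorithms are illustrative, and the printed values are not a reproduction of Table III.

```python
# Operations per memory word for some well-known march algorithms
# (commonly cited lengths; the selection here is illustrative only).
MARCH_LENGTH = {
    "MATS+": 5,
    "March X": 6,
    "March Y": 8,
    "March C-": 10,
    "March A": 15,
    "March B": 17,
}


def march_test_time_s(ops_per_word: int, words: int, clock_hz: float) -> float:
    """Test time in seconds, assuming one memory operation per clock cycle."""
    return ops_per_word * words / clock_hz


WORDS = 16 * 1024    # 16K words, as in the 16K x 16 SRAM of the text
CLOCK_HZ = 145.8e6   # at-speed clock frequency reported from synthesis

for name, length in MARCH_LENGTH.items():
    t = march_test_time_s(length, WORDS, CLOCK_HZ)
    print(f"{name:9s} {length:2d}N  {t * 1e3:.3f} ms")
```

Under these assumptions the test time grows linearly with both the algorithm length and the number of words, and is independent of the word width because all bits of a word are applied in parallel.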
To evaluate the area overhead of the proposed wrapper for different memories, it was also simulated and synthesized using Xilinx 11.1 for memories of various sizes and configurations. Table V summarizes the synthesis results on the area overhead of the proposed wrapper. The first column of the table gives the memory configuration. The second and third columns give the area overhead of the wrapper in terms of the number of gates and the silicon area, respectively. The last column shows the overhead relative to the memory alone. The memories are high-density embedded SRAMs, and the wrapper structure is synthesized using a 0.18 µm CMOS standard cell library. In Table V, an N×W memory configuration denotes a memory with N words of W bits each. As can be seen, the relative area overhead decreases significantly as the size of the memory increases. This is because changing the configuration of the memory changes only the number of input and output cells of the WBR; the rest of the structure is the same for memories of different configurations. Moreover, according to the results of Table IV, the WBR has the largest area overhead. Thus, as the number of cells decreases relative to the memory size, the area overhead is reduced significantly.
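The scaling argument above can be made concrete with a simple count of boundary cells. The Python sketch below assumes a single-port SRAM with separate data-in and data-out buses, one wrapper cell per core terminal, and three control inputs; the port list, the default of three control inputs and the function names are assumptions for illustration and are not taken from Table V.

```python
def wbr_cell_count(words: int, width: int, control_inputs: int = 3) -> int:
    """Number of WBR cells for an N x W memory, one cell per core terminal."""
    address_bits = (words - 1).bit_length()   # ceil(log2(words)) for words >= 1
    input_cells = address_bits + width + control_inputs  # address, data in, control
    output_cells = width                                  # data out
    return input_cells + output_cells


def cells_per_kbit(words: int, width: int, control_inputs: int = 3) -> float:
    """WBR cells per kilobit of memory capacity."""
    memory_kbits = words * width / 1024
    return wbr_cell_count(words, width, control_inputs) / memory_kbits


for words, width in [(1024, 16), (4096, 16), (16384, 16), (16384, 32)]:
    print(f"{words}x{width}: {wbr_cell_count(words, width)} cells, "
          f"{cells_per_kbit(words, width):.3f} cells per Kbit")
```

Under these assumptions the cell count grows only with the logarithm of the number of words and linearly with the word width, while the memory capacity grows linearly with both, so the number of cells per bit of memory falls as the memory gets larger.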
V. CONCLUSION
In our previous work, we proposed the structure of an at-speed IEEE Std 1500 wrapper optimized for memory core testing. That wrapper is capable of implementing parallel test and handling multiple cores concurrently. Although the structure had a low area overhead compared to some existing schemes, its area overhead remained its main drawback. In this paper, we compacted the WBR cells to reduce the overhead of the WBR, which is responsible for most of this area overhead. The new WBR design has three control signals like the previous scheme and retains the ability to operate in the three mandatory modes of IEEE Std 1500, so it can be imported into our previous wrapper design.
