This paper presents a BIST architecture, based on a single micro-programmable BIST Processor and a set of memory Wrappers, designed to simplify the test of a system containing many distributed multi-port SRAMs of different sizes (number of bits, number of words), access protocol (asynchronous, synchronous), and timing.
Introduction
Multi-port SRAMs are now widely used as embedded memories in many digital systems, such as telecommunication ASICs and multiprocessor systems. They speed up the system, particularly when the memory has to serve many concurrent requests. Today's technologies allow the design and manufacturing of memory chips with up to 11 ports, and multi-port RAM generators are commonly available in the libraries of many ASIC vendors, such as LSI Logic, Texas Instruments, and ST Microelectronics. Due to the high complexity of these new integrated circuits, resorting to BIST techniques is now a must. In this scenario, the test engineer has to define the BIST strategy of a complex SoC including several multi-port SRAMs of different sizes (number of bits, number of words), access protocols (asynchronous, synchronous), and timing. Apart from the required design time, this task usually poses several issues, including minimizing BIST area and routing overhead, selecting the proper number of BIST controllers (that is, choosing the proper memory clustering for BIST controller sharing), fulfilling power budget constraints, and supporting diagnostic capabilities.
Commercial tools are now available for the automatic insertion of RAM BIST [1], [2]. This paper presents the effort made, and the results obtained, in designing a proprietary BIST architecture to tackle the above-mentioned set of problems.
A BIST architecture based on a µ-programmable BIST controller, used to test large-capacity dynamic memories, was proposed in [3].
The BIST architecture proposed in this paper is shown in Figure 1. The proposed scheme presents several advantages. Among them, we would like to highlight the following:
It allows the BIST of a set of SRAMs with different numbers of ports, sizes, access protocols, and timing to be run concurrently.
The set of memories to be tested can be freely selected by the designer, using either ad-hoc test primitives stored in the test program, or a dedicated scan chain that sets an ad-hoc status bit in each memory.
Using a single BIST controller and a minimal set of communication signals minimizes the BIST area overhead and the connectivity around each SRAM.
Implementing the BIST Processor as a micro-programmable machine provides the test engineer with a flexible and reusable block, which can be used to manage the BIST of any number of memories of any size, and which is independent of the test algorithm.
Experimental results gathered on a realistic case study are discussed in Section 6, and Section 7 finally draws some conclusions.
The BIST Processor
As introduced in the previous section, the proposed scheme is based on a single BIST Processor used to test all the memories of the system. To increase flexibility, the BIST execution is based on a micro-programmable approach. The test algorithm (a March Algorithm [4]) is stored in a dedicated µprogram-Memory, coded through a set of test primitives. The µprogram-Memory can be either a ROM or an In-System Programmable module. In the former case, the test program is fixed at design time, whereas in the latter a custom and appropriate test algorithm can be loaded into the memory at test time.
sequence, whereby cell B goes from 0 to A - 1; and for each value of B, cell C goes from B + 1 to n.
The paper is organized as follows: Sections 2 and 3 describe the two main blocks that compose the proposed architecture. Section 4 details the diagnostic capabilities of the architecture, whereas Section 5 presents two possible optimizations: one minimizes the area overhead when dealing with a set of identical memories, and the other reduces the test length using a topological approach.
Each test-program step is coded in the µprogram-Memory as a sequence of test primitives, one for each memory port.
As an example, let us consider the following March Algorithm, used to test an 8-bit dual-port SRAM (the convention for the operations is (portA:portB)):
March Algorithm Test Primitives
The March elements M0-M3 realize the MATS algorithm, properly expanded as proposed in [6] to cover intra-word CFst faults, whereby BP0 through BP7 are taken from the set of Background Patterns in Table II [6].
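The idea of repeating a simple March test once per background pattern can be illustrated with a minimal sketch. It assumes the textbook three-element form of MATS, {(w0); (r0,w1); (r1)}, and four commonly used 8-bit data backgrounds with their complements; neither the element numbering nor the patterns are taken from the paper's tables, and `FaultyMemory` is a hypothetical single-port memory model introduced only for illustration.

```python
from typing import List, Optional, Tuple

WORD_BITS = 8
MASK = (1 << WORD_BITS) - 1

# Assumed data backgrounds for an 8-bit word (NOT the paper's Table II).
# Each one is also applied inverted by the march elements below.
BACKGROUNDS = [0x00, 0x55, 0x33, 0x0F]


class FaultyMemory:
    """Hypothetical word-oriented memory with an optional stuck-at-0 bit."""

    def __init__(self, words: int, stuck_at0: Optional[Tuple[int, int]] = None):
        self.cells = [0] * words
        self.stuck_at0 = stuck_at0  # (address, bit index) or None

    def write(self, addr: int, value: int) -> None:
        if self.stuck_at0 and self.stuck_at0[0] == addr:
            value &= ~(1 << self.stuck_at0[1])  # the faulty bit never becomes 1
        self.cells[addr] = value & MASK

    def read(self, addr: int) -> int:
        return self.cells[addr]


def mats(mem: FaultyMemory, words: int) -> List[Tuple[int, int]]:
    """Run MATS once per background; return (address, background) mismatches."""
    failures = []
    for bp in BACKGROUNDS:
        inv = ~bp & MASK
        for a in range(words):      # element 1: write the background
            mem.write(a, bp)
        for a in range(words):      # element 2: read background, write inverse
            if mem.read(a) != bp:
                failures.append((a, bp))
            mem.write(a, inv)
        for a in range(words):      # element 3: read the inverse
            if mem.read(a) != inv:
                failures.append((a, bp))
    return failures
```

For example, `mats(FaultyMemory(8, stuck_at0=(3, 0)), 8)` reports mismatches only at address 3, while a fault-free memory yields an empty list.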
Background Pattern
The March elements M4-M9 represent the March 2PF2,2 proposed in [4] to test wCFi&wCFi.
The proposed March Algorithm can be coded using the set of primitives shown in Table III.
The BIST Processor sends one test command per clock cycle to all the dispatchers (the first command is driven to all the FPWs, the second one to all the first OPWs, etc.).
Since each wrapper has no information about the other wrappers' size, a run signal is sent after all the commands. The dispatcher saves all the commands in a temporary register and, when receiving the run signal, it delivers them to each port wrapper.
As an example, the execution of the (W0:R0) instruction for a dual-port memory is shown in Figure 3.
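The buffering behaviour described above can be sketched as follows. This is only an illustration under stated assumptions: the class and method names are hypothetical, commands are modelled as plain strings, and the sketch assumes that a dispatcher simply ignores commands addressed to ports its memory does not have.

```python
from typing import List


class PortWrapperStub:
    """Hypothetical stand-in for a port wrapper; it only records commands."""

    def __init__(self) -> None:
        self.executed: List[str] = []

    def execute(self, command: str) -> None:
        self.executed.append(command)


class Dispatcher:
    """Latches one command per clock cycle and delivers them on 'run'."""

    def __init__(self, port_wrappers: List[PortWrapperStub]) -> None:
        self.port_wrappers = port_wrappers
        self.pending: List[str] = []  # models the temporary register

    def command(self, cmd: str) -> None:
        # Assumption: commands beyond this memory's port count are dropped.
        if len(self.pending) < len(self.port_wrappers):
            self.pending.append(cmd)

    def run(self) -> None:
        # The run signal marks the end of the command sequence, so the
        # dispatcher can now hand one command to each of its port wrappers.
        for wrapper, cmd in zip(self.port_wrappers, self.pending):
            wrapper.execute(cmd)
        self.pending = []
```

In a system whose largest memory has four ports, a dual-port dispatcher would see four `command` calls followed by one `run`, and would deliver only the first two commands to its own port wrappers.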
Port Wrapper
The internal structure of a FPW is drawn in Figure 4. The Address Generator (AG) is in charge of generating the correct address where the test pattern, provided by the Background Pattern Generator (BPG), has to be written or verified. Several BPGs are available, to target different fault types [6]. The correctness of the content of a memory cell is evaluated through a simple Comparator.
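A behavioural sketch of the interplay between the AG, the pattern source, and the Comparator is given below. It is a simplified software model, not the hardware structure of Figure 4: the memory is a plain Python list, the class and attribute names are hypothetical, and stopping at the first mismatch (so that the address and pattern remain available) is an assumption made for illustration.

```python
from typing import List


class AddressGenerator:
    """Up/down counter with an end-of-address flag."""

    def __init__(self, words: int) -> None:
        self.words = words
        self.reset(inc=True)

    def reset(self, inc: bool) -> None:
        self.inc = inc
        self.addr = 0 if inc else self.words - 1
        self.end_of_address = False

    def step(self) -> None:
        last = self.words - 1 if self.inc else 0
        if self.addr == last:
            self.end_of_address = True  # whole address space visited
        else:
            self.addr += 1 if self.inc else -1


class PortWrapper:
    """Applies a write or read-and-compare primitive over all addresses."""

    def __init__(self, memory: List[int], word_bits: int = 8) -> None:
        self.memory = memory
        self.ag = AddressGenerator(len(memory))
        self.mask = (1 << word_bits) - 1
        self.pattern = 0
        self.fail = False  # models a result status bit

    def run(self, op: str, pattern: int, inc: bool = True) -> bool:
        self.pattern = pattern & self.mask
        self.ag.reset(inc)
        while not self.ag.end_of_address:
            if op == "w":
                self.memory[self.ag.addr] = self.pattern
            elif self.memory[self.ag.addr] != self.pattern:  # comparator
                self.fail = True
                break  # assumption: freeze AG and pattern on a mismatch
            self.ag.step()
        return not self.fail
```

After a failing read pass, `ag.addr` and `pattern` hold the address and the pattern at which the mismatch was observed.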
Test instruction execution diagram

Dispatcher
The dispatcher receives the test primitives for all the port wrappers from the BIST Processor. The BIST Processor (Table IV) . 
the memory Wrappers have completed the execution of the instruction
IncDec
End of Address: set to '1' when the whole addressing space has been visited by the AG
The structure of the OPW is similar to that of the FPW. In order to execute the March algorithm seen in Section 2, this wrapper includes some additional blocks, since it must generate a subset of the entire addressing space, depending on the address generated by the previous port wrapper.
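The dependency between the address of one port and the sub-range visited by the next one can be sketched with a small generator. The rule used here (each port only visits addresses above the one selected by the previous port) is an assumption chosen for illustration; the actual ranges depend on the March algorithm being executed, and the function name is hypothetical.

```python
from typing import Iterator, Tuple


def port_addresses(n_words: int, n_ports: int, start: int = 0) -> Iterator[Tuple[int, ...]]:
    """Yield one address per port, each port restricted by the previous one."""
    if n_ports == 0:
        yield ()
        return
    for addr in range(start, n_words):
        # Assumed rule: the next port sweeps only the addresses above 'addr'.
        for rest in port_addresses(n_words, n_ports - 1, addr + 1):
            yield (addr,) + rest
```

For a 4-word dual-port memory this yields (0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3); the number of tuples grows polynomially with the number of words, with degree equal to the number of ports.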
Diagnosis
When a faulty memory is detected, the proposed approach allows collecting diagnostic information concerning the location of the faulty SRAM, the ports where the fault was detected, the address of the faulty cell, and the detecting pattern. These information items are stored in the Result Status Bit, the Address Generator, and the Background Pattern Generator of each Port-Wrapper, and can be scanned out via the Results-Scan-Chain.
This optimization is made at the Port-Wrapper level. For each Port-Wrapper, only one Address Generator and one Background Pattern Generator are needed. The only difference from the previously described Port-Wrapper structure is that a shared Port-Wrapper contains a pair of Status Bits and a Comparator for each RAM. In this way, when a fault is detected, the Result Status Bit of the faulty memory is set, the RAM is disconnected, and the Wrapper continues testing the remaining memories of the cluster. In this case, the status of the Address Generator and of the BPG for the faulty RAM is not preserved. To collect diagnostic information, the test must be re-executed targeting only the faulty RAM, by properly setting its Mode Status Bit.
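The behaviour of a shared Port-Wrapper on a cluster of identical RAMs can be sketched as follows. This is a minimal software model under stated assumptions: RAMs are plain Python lists of equal length, the names are hypothetical, and the `connected` list stands in for the per-RAM mode information while `result` stands in for the Result Status Bits.

```python
from typing import List


class SharedPortWrapper:
    """One address/pattern source shared by a cluster of identical RAMs."""

    def __init__(self, rams: List[List[int]]) -> None:
        self.rams = rams
        self.connected = [True] * len(rams)  # assumed per-RAM enable
        self.result = [False] * len(rams)    # per-RAM fault flag

    def write_all(self, pattern: int) -> None:
        for addr in range(len(self.rams[0])):       # shared address sweep
            for i, ram in enumerate(self.rams):
                if self.connected[i]:
                    ram[addr] = pattern

    def verify_all(self, pattern: int) -> None:
        for addr in range(len(self.rams[0])):
            for i, ram in enumerate(self.rams):
                # One comparator per RAM; a mismatch flags and disconnects it,
                # and the sweep continues for the remaining memories.
                if self.connected[i] and ram[addr] != pattern:
                    self.result[i] = True
                    self.connected[i] = False
```

Because the address sweep continues after a mismatch, the wrapper does not retain the failing address; this mirrors the need to re-run the test on the faulty RAM alone to obtain diagnostic data.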
Using a Topological approach for complex coupling fault testing
The approach proposed in this paper is useful to describe March Algorithms for multi-port RAMs with complexity O(n^m), where n is the number of cells and m the number of ports. For practical applications, these algorithms result in very long test sequences. It is possible, as proposed in [5], to optimize the address generator of each OPW so that it generates the addresses required by a Topological Approach. This approach consists in detecting coupling faults between adjacent cells only. Using this optimization, the test complexity can be reduced to O(n) without a significant reduction in fault coverage.
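The gain can be made concrete for the dual-port case (m = 2) with a short calculation. The sketch assumes that cells are laid out on a rectangular grid and that "adjacent" means horizontal or vertical neighbours; the precise adjacency definition is the one given in [5] and may differ.

```python
def all_pairs(n_cells: int) -> int:
    """Number of unordered cell pairs: grows as n^2."""
    return n_cells * (n_cells - 1) // 2


def adjacent_pairs(rows: int, cols: int) -> int:
    """Number of horizontally or vertically adjacent pairs: grows as n."""
    return rows * (cols - 1) + cols * (rows - 1)
```

For a 32 x 32 array (1024 cells), `all_pairs(1024)` gives 523776 pairs, whereas `adjacent_pairs(32, 32)` gives 1984.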
Case study
A case study has been used to evaluate the proposed approach and to gather experimental results. The circuit, named VC12AD, is part of a telecommunication ASIC designed by Italtel SpA. The same circuit has also been used by both Italtel SpA and Siemens ICN as a benchmark for the evaluation of commercial BIST insertion tools.
The target circuit has been described in VHDL and synthesized using the G10 LSI Logic™ library, which provides a set of SRAMs of different sizes.
The VC12AD counts up to 860K Synopsys™ equivalent gates (excluding RAMs), plus 36 small-sized SRAMs, for a total of 14,704 bits (Figure 6).
The case study aims at evaluating:
the BIST architecture complexity when applied to a set of SRAMs with very different characteristics;
the area overhead after the BIST insertion.
Figure 6 and (asynchronous quadruple-port RAM with two ports dedicated to write and two dedicated to read).
Case Study Architecture
Case Study BIST Architecture
In the BIST Architecture definition, we tried to minimize the number of wrappers by resorting, whenever possible, to clusters of SRAMs (see Section 5.1). As a consequence:
Within C12A, the 2 modules tpa21x8 and the 2 modules spa*21x26 are treated as two clusters.
Within C12D, the 2 modules spa21x34 and the 2 modules spa* are treated as two clusters.
Within SYNDES, the memories are organized as four clusters of 7, 7, 6, and 1 element, respectively.
The design of the BIST architecture has been strongly influenced by the actual floor plan, where, for example, the 3 spa21x34 SRAMs (2 located inside C12D and 1 in PDH-INT) are too far apart to be included in a single cluster.
The overall VC12AD structure after the BIST insertion is shown in Figure 8.
Experimental results
The area occupation of each memory and its Wrapper is reported in Table V, whereas Figure 9 shows the contributions of the functional blocks of each Wrapper. As shown in Table VI, the area overhead of the BIST Processor and the program-memory is a fixed contribution and is not influenced by the number of SRAMs in the system.
Comparison with a commercial tool
To evaluate its effectiveness, we compared the area overhead introduced by the proposed approach with the one obtained using a commercial tool on the same test case. The area overhead introduced by the tool is around 8%, and therefore slightly higher than the one obtained by inserting the proposed BIST scheme. Nevertheless, it must be taken into account that the test case was specifically chosen to stress the tool and that, on a real system, the overhead would probably be smaller.
Moreover, our approach is designed to target memories only, whereas the commercial tool is able to introduce test logic for all the different parts of the circuit.
Conclusions
In the present paper we presented a proprietary solution for a particular industrial scenario, in which it is necessary to define the BIST strategy of a complex system including several multi-port SRAMs of different sizes, access protocols, and timing.
