Multiport memories are widely used as embedded cores in all communication system-on-chip devices. Due to their high complexity and very low accessibility, built-in self-test (BIST) is the most common solution implemented to test the different memories embedded in the system. This article presents a programmable BIST architecture based on a single microprogrammable BIST processor and a set of memory wrappers designed to simplify the test of a system containing a large number of distributed multiport memories of different sizes (number of bits, number of words), access protocols (asynchronous, synchronous), and timing.
INTRODUCTION
Silicon area is now so cheap and integration technologies so advanced that industries can embed in a single chip, usually referred to as a system-on-chip (SoC), all the components and functions that historically were placed on a hardware board. Each component or function is now available as a predesigned complex functional block, or embedded core.
Embedded memories are the densest components within an SoC, accounting for up to 90 percent of its real estate. Today's technologies allow the design and manufacturing of memory cores with many I/O ports, and multiport RAM core generators are commonly available in many application-specific integrated circuit (ASIC) vendors' libraries (e.g., LSI-Logic, Texas Instruments, and ST Microelectronics). To get an idea of today's SoC complexity, it is enough to consider that typically more than 30 embedded memories are placed on a single chip; they are scattered around the device rather than concentrated in one location; they all have different types, sizes, and access protocols and timing; and they can even be doubly embedded inside embedded cores. From a testability point of view, memories are also the most sensitive to process defects, making it essential to thoroughly test them in the SoCs.
This new design philosophy, based on the use of embedded cores, leads to a radical change in the test engineering process. First of all, direct accessibility to interconnections and cores' boundaries is not possible; however, test patterns and test responses still need to be delivered to the core or the SoC boundaries.
In the case of memory cores, the test methodology of choice is built-in self-test (BIST). BIST offers a simple low-cost means to test for failures of embedded memories without significantly impacting device performance. In this scenario, the implementation of an efficient BIST strategy for SoCs including several multiport RAMs requires taking into account the different sizes (number of bits, number of words), access protocols (asynchronous, synchronous), and timing of the memories embedded in the system, to minimize the BIST area and routing overhead and fulfill power budget constraints. Moreover, while it has been used primarily for production pass/fail testing, BIST should be extended to provide the diagnostic data required for process monitoring and repair. A successful BIST for embedded memories has to guarantee core accessibility, scalability, in-system programmability (ISP), low overhead, and flexibility in the test scheduling.
This article presents the efforts and the results obtained in designing a proprietary BIST architecture to tackle the above-mentioned set of problems.
The article is organized as follows. We summarize some of the most significant memory BIST architectures presented in the literature; we give a general overview of the proposed BIST architecture and then detail the structure of its main blocks. The scheduling and diagnosis facilities of the proposed BIST are detailed, and a possible optimization is discussed. We present a real application of the proposed approach on an industrial case study, and finally we summarize the main contributions of the work and conclude the article.
STATE OF THE ART
Several memory BIST solutions have been proposed to test both single- and multiport memories [1, 2], and static and dynamic memories [3, 4].
Programmable memory BIST has been proposed in [5-7] to increase flexibility in applying different combinations of test patterns targeting different types of faults. Despite their effectiveness, all these solutions are designed to address the problem of testing a single type of memory, and none focuses on the problem of concurrently testing several heterogeneous embedded memory arrays. This problem has been addressed in [8, 9], where the authors propose a built-in self-diagnostic method to simultaneously diagnose spatially distributed memory modules with different sizes. The approach is based on the serial interfacing technique proposed in [10]. The basic idea is to synthesize the I/O port of each buffer as a scan chain from which the test patterns can be provided and memory contents can be read. The solution is very easy to implement, but it is not so efficient in terms of test speed and area overhead, and does not take into account power consumption constraints. Moreover, all the memories tested in parallel must be of the same type. A deterministic BIST state machine was designed in [11] to test multiple RAMs with different characteristics. Although all the memory modules are tested (truly) concurrently, each memory module receives its own control signals from the BIST controller. This solution has the disadvantages of large routing area overhead and a complex design of the BIST controller.
MEMORY BIST MANAGEMENT ARCHITECTURE
The goal of this article is the design of a proprietary BIST scheme to tackle the problem of testing the memory subsystem of a complex SoC. Figure 1 gives an overview of the proposed BIST architecture.
A single BIST processor is in charge of performing the test of all (or a subset of) the memories of the system. Using a minimal set of communication signals, the BIST processor coordinates, executes, and synchronizes the test algorithm of the memories under test. The BIST processor is μ-programmable: the test algorithm is stored as a sequence of elementary test primitives in a dedicated memory (μ-program memory); these instructions include (but are not limited to) update of the address generators, application of a test pattern, and comparison of a memory cell with an expected value. This solution allows, if necessary, programming the system at runtime to execute any required test algorithm. The BIST processor functionalities and communication protocol are independent of the number and characteristics of the memories embedded in the system.
The different test primitives that constitute the test algorithm are received by a wrapper placed around each memory. In particular, each wrapper is composed of a set of port wrappers (one for each memory port) and a dispatcher.
The following sections will further detail the blocks composing the architecture.
THE BIST PROCESSOR
The proposed memory BIST is based on a single BIST processor used to test all the memories of the target SoC. To increase flexibility, BIST execution is based on a μ-programmable approach.
Due to their regular structure, the most popular and widely accepted deterministic test algorithms for memory BIST are March tests. A March test is a finite sequence of March elements; each element is a sequence of operations applied to every memory cell, in either ascending or descending address order, before proceeding to the next cell [1]. March tests are popular because of their low temporal complexity, regular structures, and their ability to detect different types of faults. The proposed BIST processor has therefore been optimized to implement March tests. The chosen algorithm is stored in a dedicated μ-program memory, coded through a set of test primitives. The μ-program memory can be either a ROM or an ISP device. In the former case, the test program is fixed at design time, whereas in the latter any custom test algorithm can be downloaded into the μ-program memory at test time.
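To make the structure of a March test concrete, the following is a minimal software sketch, not the μ-program encoding used by the proposed processor. It applies March C- (a widely published March algorithm, chosen here only as an illustration) to a simulated single-port, bit-oriented memory. The `Memory` class, the tuple encoding of March elements, and the stuck-at fault injection are all assumptions introduced for this example.

```python
# Each March element is (address order, operations).
# Order is "up" or "down"; an operation is ("w", bit) or ("r", expected_bit).
# March C-: {any(w0); up(r0,w1); up(r1,w0); down(r0,w1); down(r1,w0); any(r0)}
MARCH_C_MINUS = [
    ("up",   [("w", 0)]),
    ("up",   [("r", 0), ("w", 1)]),
    ("up",   [("r", 1), ("w", 0)]),
    ("down", [("r", 0), ("w", 1)]),
    ("down", [("r", 1), ("w", 0)]),
    ("up",   [("r", 0)]),
]


class Memory:
    """Hypothetical bit-oriented memory model with an optional stuck-at cell."""

    def __init__(self, size, stuck_at=None):
        self.size = size
        self.cells = [0] * size
        self.stuck_at = stuck_at  # (address, value) or None

    def write(self, addr, bit):
        self.cells[addr] = bit

    def read(self, addr):
        # A stuck-at cell always returns the stuck value, whatever was written.
        if self.stuck_at is not None and self.stuck_at[0] == addr:
            return self.stuck_at[1]
        return self.cells[addr]


def run_march(mem, test):
    """Apply every March element to every cell.

    Returns a list of (address, element index) pairs where a read mismatched.
    """
    failures = []
    for index, (order, ops) in enumerate(test):
        if order == "up":
            addresses = range(mem.size)
        else:
            addresses = range(mem.size - 1, -1, -1)
        for addr in addresses:
            # All operations of the element are applied to one cell
            # before moving on to the next cell.
            for op, bit in ops:
                if op == "w":
                    mem.write(addr, bit)
                elif mem.read(addr) != bit:
                    failures.append((addr, index))
    return failures
```

Running `run_march(Memory(8, stuck_at=(3, 1)), MARCH_C_MINUS)` reports a mismatch at address 3 in every element that expects to read 0 from that cell, while a fault-free memory yields an empty list.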
After selecting the set of memories under test, the BIST processor reads from the μ-program memory one test primitive at a time, forwards it to all the wrappers of the memories under test, and waits until its completion by all the target memories.
When the test program is completed (i.e., all the test primitives have been applied), the BIST processor reads the test results from each memory. If a fault is detected, the faulty memories can be located by resorting to a set of diagnosis capabilities.
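The fetch, broadcast, and wait behavior described above can be sketched in software as follows. This is an illustrative model under stated assumptions, not the hardware implementation: `WrapperStub`, its fixed per-primitive `latency`, the `faulty` flag, and the primitive names are hypothetical.

```python
class WrapperStub:
    """Hypothetical stand-in for a memory wrapper with a fixed latency."""

    def __init__(self, name, latency, faulty=False):
        self.name = name
        self.latency = latency  # clock cycles needed to finish one primitive
        self.faulty = faulty
        self.log = []           # primitives this wrapper has executed

    def execute(self, primitive):
        self.log.append(primitive)
        return self.latency

    def result_ok(self):
        return not self.faulty


def run_bist(program, wrappers, selected):
    """Broadcast each primitive to the selected wrappers.

    The processor waits for the slowest wrapper before fetching the next
    primitive. Returns (total cycles, names of failing memories).
    """
    under_test = [w for w in wrappers if w.name in selected]
    if not under_test:
        return 0, []
    cycles = 0
    for primitive in program:
        cycles += max(w.execute(primitive) for w in under_test)
    failing = [w.name for w in under_test if not w.result_ok()]
    return cycles, failing
```

With wrappers of latency 1 and 3 selected, a four-primitive program takes 12 cycles in this model, because every primitive is paced by the slower memory; unselected wrappers never see the primitives.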
The architecture of the BIST processor and the μ-program memory are strongly influenced by the peculiar characteristics of multiport memories. In fact, due to the possibility of concurrently accessing several cells, new fault models must be targeted [12], and ad hoc March algorithms must be adopted to cover these new types of fault. In particular, the proposed implementation is optimized to implement the March algorithms for multiport memories presented in [13].
The main characteristic of these algorithms is the use of nested cycles to access the different memory ports. The external interface of the BIST processor can be designed to match the target system requirements. Possible solutions are a P1500-compliant interface, an addressable device on the system bus, or a JTAG interface, as in the case study presented later.
THE MEMORY WRAPPER
The wrapper placed around each memory has to execute the test primitives broadcast by the BIST processor regardless of the particular memory access protocol. The wrapper is therefore the only element in the architecture taking care of the number of ports, the size, and the access protocol of the memory it wraps.
The wrapper generates the correct test patterns and memory addresses required to execute the received test primitives, and compares the values read during the test with the expected ones.
The wrapper architecture consists of a dispatcher and a set of port wrappers.
DISPATCHER
Each RAM under test has a dedicated dispatcher, which receives the test primitives for all the port wrappers from the BIST processor. Since the primitives are sent sequentially but must be applied at the same time in order to execute the required operations concurrently on all the ports of the memory, each dispatcher saves all the primitives in a temporary register and delivers them to each port wrapper only after receiving a synchronization test primitive (RUN). This solution allows a dramatic reduction of the routing overhead that would be required to send all the primitives in parallel using a dedicated bus for each port.
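The buffering role of the dispatcher can be illustrated with a small behavioral sketch. Only the RUN primitive comes from the text; the other primitive names, the one-primitive-per-port ordering, and the error raised on overflow are assumptions made for the example.

```python
class Dispatcher:
    """Buffers primitives sent one at a time, releases them together on RUN."""

    def __init__(self, num_ports):
        self.num_ports = num_ports
        self.pending = []  # models the temporary register

    def receive(self, primitive):
        """Accept one primitive from the BIST processor.

        Returns the (port index, primitive) pairs issued by this call:
        empty while buffering, one pair per buffered primitive on RUN.
        """
        if primitive == "RUN":
            issued = list(enumerate(self.pending))
            self.pending = []
            return issued
        if len(self.pending) == self.num_ports:
            raise ValueError("more primitives than ports before RUN")
        self.pending.append(primitive)
        return []
```

For a two-port memory, `receive("W0")` and `receive("R0")` return nothing, and the following `receive("RUN")` issues both operations at once, one per port, which is the concurrency the multiport March algorithms rely on.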
PORT WRAPPER
Each memory port has a dedicated port wrapper that generates the test patterns (address and data) and verifies the correct behavior of the memory according to the primitive received from the dispatcher. The result of each primitive is signaled on an output line.
The internal structure of a port wrapper is drawn in Fig. 2. The address generator (AG) is in charge of generating the correct address where the test pattern, provided by the pattern generator (PG), has to be written or verified. PGs can easily be customized in order to target different fault types [13]. Their implementation is nevertheless always very simple, and never more complex than an up counter. The correctness of the content of a memory cell is evaluated using a simple comparator.
Two status bits are used to set the memory in transparent or test mode (the mode status bit) and to store the test results at the end of the BIST algorithm (the result status bit). All the memories set in test mode are tested in parallel, whereas those set in transparent mode are bypassed and not tested; this feature is required to allow flexible scheduling of the memories under test. To set and read them, the status bits of all the port wrappers are dynamically connected in a global scan chain.
Finally, each port wrapper includes an interfacing block able to receive the test primitives (command) from the dispatcher and execute them on the memory using the required protocol. Moreover, the interfacing block receives a synchronization signal (Sync-IN) from the previous port wrapper, and produces an output synchronization signal (Sync-OUT) needed by the other wrappers and the BIST processor to synchronize the scheduling of the next test primitive.
The Sync-IN signal of each port wrapper is directly connected to the Sync-OUT signal of the previous one, except for the last port wrapper, whose Sync-OUT signal is connected to the BIST processor. The Sync-OUT signal is enabled only when the Sync-OUT signal of the previous port wrapper is asserted. Therefore, the BIST processor receives the logical AND of the output signals generated by all the port wrappers.
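The daisy-chained synchronization can be checked with a few lines of code. This is a behavioral sketch only; it assumes that the Sync-IN of the first port wrapper in the chain is tied to the asserted level, which the text does not state, and `done_flags` is a hypothetical list of per-wrapper completion flags.

```python
def sync_chain(done_flags):
    """Propagate the synchronization signal along the chain of port wrappers.

    done_flags[i] is True when port wrapper i has finished the current
    primitive. Returns the Sync-OUT value of every port wrapper, in order.
    """
    sync_in = True  # assumption: first Sync-IN is tied to the asserted level
    outputs = []
    for done in done_flags:
        # Sync-OUT is enabled only if the previous stage is already asserted.
        sync_out = sync_in and done
        outputs.append(sync_out)
        sync_in = sync_out
    return outputs
```

The last element of the returned list is what the BIST processor observes, and it equals `all(done_flags)`: a single unfinished wrapper anywhere in the chain holds the processor back.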
From a functional point of view, Sync-OUT assumes different meanings depending on the received test primitive. As an example, for a read or write operation, it has the meaning of end of instruction (EON). It is asserted when the memory actually ends the execution of the command. This mechanism guarantees the synchronization among memories with different timing and access protocols. For a primitive to increment or decrement the value of the address generator, Sync-OUT has the meaning of end of address (EOAD). It is asserted when the addressing space has been visited by the address generator, allowing the synchronization among memories of different sizes. Two types of port wrappers are available: one for the first port of each memory and one for the other ports. The difference between the two lies in the fact that the port wrapper connected to the first port of the memory implements the main addressing loop of the March test family discussed earlier, whereas the addresses applied to the memory by port wrappers connected to the remaining ports are relative to the value of the address generated by the previous port wrapper.
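The way an end-of-address flag lets memories of different sizes share one address sweep can be sketched as follows. The `AddressGenerator` class and the hold-at-last-address behavior are assumptions made for illustration; the text only states that EOAD is asserted once the addressing space has been visited.

```python
class AddressGenerator:
    """Hypothetical up/down address counter with an end-of-address flag."""

    def __init__(self, size):
        self.size = size
        self.addr = 0
        self.eoad = False

    def reset(self, ascending=True):
        self.addr = 0 if ascending else self.size - 1
        self.eoad = False

    def step(self, ascending=True):
        """Advance by one address; hold and assert EOAD past the last cell."""
        if self.eoad:
            return
        nxt = self.addr + (1 if ascending else -1)
        if 0 <= nxt < self.size:
            self.addr = nxt
        else:
            self.eoad = True


def sweep(sizes):
    """Step all generators together until every one of them reports EOAD.

    Returns the number of increment primitives the processor had to issue.
    """
    generators = [AddressGenerator(size) for size in sizes]
    steps = 0
    while not all(g.eoad for g in generators):
        for g in generators:
            g.step()
        steps += 1
    return steps
```

In this model `sweep([4, 16, 8])` needs 16 increments: the smaller memories assert EOAD early and simply wait, so the sweep length is set by the largest memory under test.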
In order to minimize the routing overhead, the signals exchanged between the BIST processor and the memory wrappers (command signals, synchronization signal, scan chain signals) are multiplexed. In particular, these signals are multiplexed at the port wrapper level. All the information is routed using only six signals (four command signals and two synchronization signals).
TEST SCHEDULING
An important issue to be faced when running concurrently the BIST of several modules is fulfilling power budget constraints. In fact, BIST typically results in a circuit activation rate higher than the normal one, and overdissipation of power may seriously damage the devices. Moreover, the variety of memories that can be found in a complex architecture may require different test algorithms. To address these two issues, the proposed approach implements a very flexible scheduling mechanism. In particular, it is possible to select the set of memories to be tested either by using a dedicated test primitive as part of the test algorithm or by setting the mode status bit flag in the memory wrapper through a scan chain. Only the wrappers of the selected memories will execute the test primitives received from the BIST processor; all the others will be set in transparent mode and therefore bypassed. In this way, several test algorithms may be stored in the μ-program memory and may be applied sequentially to different sets of memories. The definition of algorithms or guidelines for selection of the best scheduling is a task that depends on the particular target system and is therefore outside the scope of this article. Our main focus is on the design of an architecture that allows flexible definition of test scheduling. The two mechanisms implemented to allow the scheduling of the memories under test are briefly explained in the following.
SCHEDULING USING THE CONF PRIMITIVE
Using the CONF primitive, it is possible to embed scheduling information into the test program. The representation of this primitive in the μ-program memory is defined as follows:
* The CONF opcode.
* The number of 4-bit words used to code the ActivationMask.
* The ActivationMask, a mask of bits where each bit corresponds to one memory in the system. To include a memory in the set of the SRAMs under test, the corresponding bit in the ActivationMask has to be set.
As an example, let's consider the system in Fig. 3.
When the BIST processor reaches a CONF primitive during the test program execution, it reads the ActivationMask and configures all the memory wrappers using the scan chain defined earlier in order to activate the required scheduling plan. The first ActivationMask shown in ...
In order to define different test sessions and collect test results, at the end of each algorithm the BIST processor stops the test program execution and waits for a new start primitive to continue with the next one.
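The three-field layout of the CONF primitive can be made concrete with a small encoder and decoder. This is a sketch under explicit assumptions: the opcode value `0xC`, the least-significant-word-first ordering, the mapping of memory index `i` to bit `i` of the mask, and the limit of one 4-bit word for the word count are all hypothetical choices, not taken from the article.

```python
CONF_OPCODE = 0xC  # hypothetical opcode value, chosen only for this example


def encode_conf(num_memories, active):
    """Encode a CONF primitive as a list of 4-bit words.

    Layout: [opcode, number of mask words, mask words...], with the mask
    stored least significant word first (an assumption).
    """
    mask = 0
    for index in active:
        if not 0 <= index < num_memories:
            raise ValueError("memory index out of range")
        mask |= 1 << index
    num_words = (num_memories + 3) // 4  # ceil(num_memories / 4)
    if num_words > 0xF:
        raise ValueError("word count does not fit in one 4-bit word")
    words = [(mask >> (4 * i)) & 0xF for i in range(num_words)]
    return [CONF_OPCODE, num_words] + words


def decode_conf(words):
    """Return the sorted indices of the memories selected by a CONF primitive."""
    if words[0] != CONF_OPCODE:
        raise ValueError("not a CONF primitive")
    num_words = words[1]
    mask = 0
    for i, word in enumerate(words[2:2 + num_words]):
        mask |= word << (4 * i)
    return [i for i in range(4 * num_words) if (mask >> i) & 1]
```

For a system with six memories, selecting memories 0 and 5 gives a two-word mask; decoding the primitive returns the same selection, which is the set of wrappers that would be switched to test mode.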
SCHEDULING USING THE SCAN CHAIN OPTION
In order to give the designer greater flexibility, the set of memories under test can also be set by loading the appropriate ActivationMask directly from the outside using a scan chain protocol. In order to jump to the appropriate test algorithm in the μ-program memory, the starting value of the μ-program memory Address Register can also be loaded into the BIST processor using the same protocol.
DIAGNOSIS
Fail map extraction is required to output the relevant data needed to determine why a failure occurred within a memory. This data is post-processed using diagnostic software to isolate the defective memory and location within the memory. Therefore, when a faulty memory is detected, the proposed approach allows collection of diagnostic information about the location of the faulty memories, the ports where the fault has been detected, the addresses of the faulty cells, and the detecting patterns. This information is stored in the result status bit, address generator, and background pattern generator of each port wrapper and can be scanned out via the Results-Scan-Chain. To allow even more detailed diagnostic capabilities, it is also possible to include in the Results-Scan-Chain the test primitive that triggered the detection of the fault. To reduce the scan chain length, depending on the result of the test (Result-Status-Bit), each port wrapper configures its portion of the Results-Scan-Chain in one of the following two ways (Fig. 4):
* Result-Status-Bit=1: The memory is not faulty; only the Result-Status-Bit is placed on the scan chain.
* Result-Status-Bit=0: The memory is faulty; the Result-Status-Bit is chained to the content of the address generator and the background pattern generator.
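The variable-length results chain can be modeled as a bit list that is built by the port wrappers and parsed by the diagnostic software. The sketch below assumes MSB-first serialization, a fixed and known register width per port wrapper, and the field order status bit, address, pattern; the dictionary keys and function names are hypothetical.

```python
def to_bits(value, width):
    """Serialize an integer MSB first (assumed bit order)."""
    return [(value >> i) & 1 for i in reversed(range(width))]


def from_bits(bits):
    value = 0
    for bit in bits:
        value = (value << 1) | bit
    return value


def pack_results(ports):
    """Build the chain contents from per-port results.

    Each port is a dict with keys: ok, addr, pattern, addr_bits, data_bits.
    A fault-free port contributes one bit; a faulty one adds its registers.
    """
    chain = []
    for port in ports:
        chain.append(1 if port["ok"] else 0)
        if not port["ok"]:
            chain += to_bits(port["addr"], port["addr_bits"])
            chain += to_bits(port["pattern"], port["data_bits"])
    return chain


def unpack_results(chain, widths):
    """Parse the chain given (addr_bits, data_bits) for each port wrapper.

    Returns None for a fault-free port, else (address, pattern).
    """
    position = 0
    report = []
    for addr_bits, data_bits in widths:
        ok = chain[position] == 1
        position += 1
        if ok:
            report.append(None)
            continue
        addr = from_bits(chain[position:position + addr_bits])
        position += addr_bits
        pattern = from_bits(chain[position:position + data_bits])
        position += data_bits
        report.append((addr, pattern))
    return report
```

With three port wrappers of 4-bit address and 4-bit data, one of which failed, the chain is 11 bits long instead of 27, which illustrates the saving obtained by shifting out the registers of faulty ports only.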
FURTHER OPTIMIZATION
To further reduce the BIST area overhead, the designer can share a single wrapper for a cluster of identical memories (same type, width, and size) to be tested in parallel.
This optimization is made at the port wrapper level. For each port wrapper only one address generator and one background pattern generator are needed. The only difference from the previously described port wrapper structure is that a shared port wrapper contains a pair of status bits and a comparator for each memory. In this way, when a fault is detected, the result status bit of the faulty memory is set, the memory is disconnected, and the wrapper keeps on testing the remaining memories of the cluster. Obviously, in this case the status of the address generator and pattern generator of the faulty memory are not preserved. To collect diagnostic information, the test must be reexecuted on the faulty memory only by properly setting its mode status bit.
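A behavioral sketch of the shared port wrapper is given below. It models only a read-and-compare sweep with one shared address sequence and one shared expected pattern; the function name `check_cluster`, the list-based memory model, and the `pattern_for` callback are assumptions introduced for the example.

```python
def check_cluster(memories, pattern_for):
    """Compare a cluster of identical memories against one shared pattern.

    memories: list of equally sized lists holding the cell contents.
    pattern_for: function mapping an address to the expected value.
    Returns one flag per memory, True when no mismatch was observed.
    """
    size = len(memories[0])
    result_ok = [True] * len(memories)   # per-memory result status
    connected = [True] * len(memories)   # per-memory mode status
    for addr in range(size):             # single shared address generator
        expected = pattern_for(addr)     # single shared pattern generator
        for i, memory in enumerate(memories):
            # One comparator per memory; a faulty memory is disconnected
            # and the sweep continues on the rest of the cluster.
            if connected[i] and memory[addr] != expected:
                result_ok[i] = False
                connected[i] = False
    return result_ok
```

Note that the function returns only pass/fail flags: as in the shared wrapper, the failing address is not retained per memory, so a second run restricted to the faulty memory would be needed to recover it.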
Finally, since a fault in the BIST logic can be detected only if it causes an error that is detectable as a memory fault by the test algorithm, the stuck-at fault coverage cannot be precisely computed a priori and, anyway, will be quite low. Therefore, to allow high fault coverage at the end of production, the BIST logic can be synthesized and tested using full scan.
CASE STUDY
A case study has been used to evaluate the proposed approach and gather experimental results. The target circuit, VC12AD, is part of a telecommunications ASIC designed by Italtel SpA. Both Italtel SpA and Siemens ICN have also used the same circuit as a benchmark for the evaluation of commercial BIST insertion tools. The target circuit has been described in VHDL and synthesized using the G10 LSI-Logic library, which provides a set of RAMs of different sizes.
The VC12AD counts up to 860,000 equivalent gates (excluding RAMs), plus 36 small-sized RAMs, for a total of 14,704 bits and 380,503 equivalent gates.
The case study aims at evaluating the BIST architecture complexity when applied to a set of memories with very different characteristics, and the area overhead after the BIST insertion.
The 36 RAMs of the circuit are grouped into four distinct macro areas whose characteristics are listed in Fig. 5.
BIST ARCHITECTURE
In the definition of the BIST architecture, we tried to minimize the number of wrappers by resorting, whenever possible, to clusters of memories (described earlier). As a consequence:
* Within C12A, the two modules tpa21x8 and the two modules spa*21x26 are treated as two clusters.
* Within C12D, the two modules spa21x34 and the two modules spa' are treated as two clusters.
* Within SYNDES, the memories are organized in four clusters of seven, seven, six, and one elements, respectively.
The memory clustering has been strongly influenced by the actual floor plan: for example, the three spa21x34 memories (two located inside C12D and one in PDH-INT) are too far apart to be included in a single cluster.
The overall VC12AD structure after BIST insertion is shown in Fig. 5 .
BIST SCHEDULING
Due to the different characteristics of the VC12AD memories (read/write ports, read-only ports, and write-only ports are present), it is not possible to adopt a single March algorithm for all of them. We therefore organized the BIST in four sessions, each executing a different March algorithm.
The total area overhead introduced by the port wrappers is 68,177 equivalent gates. This area is not proportional to the number of memory ports, but depends more on the port sizes and functionalities.
The BIST processor and μ-program memory area overhead (5431 and 4459 equivalent gates, respectively) are a fixed contribution and are not influenced by the number of memories present in the system.
The total area overhead is, in this case study, 17.02 percent. Although this result may seem quite high, it is necessary to consider that the target circuit has a lot of small memories, and therefore the overhead introduced by the wrapper is significant. With larger memories the overhead would be much lower.
The area overhead introduced by a commercial BIST insertion tool is 22.5 percent.
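The 17.02 percent figure can be reproduced from the gate counts quoted in this section under two assumptions that the text does not state explicitly: that the overhead is computed relative to the circuit size after BIST insertion, and that the baseline is the 380,503 equivalent gates reported for the target circuit. Any area not itemized in the text (for example, the dispatchers) is taken as already included in the quoted contributions. The following sketch is only a consistency check on the published numbers.

```python
# Gate counts quoted in the case study (equivalent gates).
PORT_WRAPPERS = 68_177
BIST_PROCESSOR = 5_431
PROGRAM_MEMORY = 4_459
BASELINE = 380_503  # assumed baseline: circuit size before BIST insertion


def overhead_percent(baseline, added):
    """Added area as a percentage of the total area after insertion (assumed)."""
    return 100.0 * added / (baseline + added)


ADDED = PORT_WRAPPERS + BIST_PROCESSOR + PROGRAM_MEMORY
OVERHEAD = round(overhead_percent(BASELINE, ADDED), 2)
```

Under these assumptions `ADDED` is 78,067 equivalent gates and `OVERHEAD` evaluates to 17.02, matching the reported value; the fixed contribution of the processor and μ-program memory accounts for about 13 percent of the added area, the rest being port wrappers.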
CONCLUSIONS
In this article we present a proprietary solution for a particular industrial scenario in which it is necessary to define the BIST strategy of a complex communication SoC, including several multiport memories of different sizes, access protocols, and timing. The proposed architecture consists of a single BIST processor, implemented as a μ-programmable machine and able to execute different test algorithms, and a wrapper for each memory (or cluster of memories), each wrapper including one port wrapper for each memory port and a special block named dispatcher. Each port wrapper contains standard memory BIST modules and an interface block to manage the communications between the memory and the BIST processor. The dispatcher collects the instructions from the test processor and delivers them to the port wrappers. The proposed scheme presents several advantages. It allows running concurrently the BIST of a set of memories with different numbers of ports, sizes, and access protocols, while minimizing the BIST area overhead and connectivity around each memory. In addition, the set of memories to be tested can be freely selected by the designer, as well as the test algorithm to be executed on each set.
IEEE Communications Magazine - September 2003
The proposed memory BIST architecture deals with memory modules only. If additional modules (e.g., random logic, legacy cores) have to be BISTed as well, more complex and sophisticated approaches will have to be adopted.
