For stacked integrated circuits, effective test access requires the design-for-test (DfT) features in the various dies to operate in a concerted way to transport test stimuli and responses from and to the external I/Os up and down through the stack. This 3D-DfT can be proprietary if all dies in the stack are made by a single company. However, in the likely case that the various dies in the stack originate from different companies, standardized 3D-DfT is required to guarantee inter-operability. IEEE Std P1838 is a standard-under-development that addresses exactly this issue. This paper presents a status report of P1838 and describes its three main hardware components: a serial control mechanism, a die wrapper register, and a flexible parallel port.
Introduction
Technology advances with respect to micro-bump inter-die interconnects, through-silicon vias and associated wafer thinning, and passive interposers enable a new generation of vertically-stacked ICs. We distinguish a number of popular die stack architectures.
• 2.5D-SIC: Multiple active dies are placed side-by-side on top of and interconnected through an interposer base die [1, 2] (see Figure 1 (a)).
• 3D-SIC: Multiple active dies are placed on top of each other in a single tower [3] (see Figure 1 (b)).
• 5.5D-SIC: Multiple towers, each consisting of one or more stacked active dies, are placed side-by-side on top of and interconnected through an interposer base die [4] or an active die (see Figure 1 (c)).
These new IC product compositions offer various compelling benefits: (1) heterogeneous integration, i.e., the ability to use the most efficient technology node for each die in the stack, for example optimized for digital logic, memory, or analog circuitry; (2) inter-die communication with high bandwidth, low latency, and low power consumption; and (3) higher component yields (and hence lower costs per component) in case an originally monolithic, large die is partitioned over multiple smaller dies that are stacked on top of each other, provided faulty dies can be adequately removed from the manufacturing process before stacking by means of a so-called pre-bond test.
Like all micro-electronic products, these die stacks need to be tested before they can be shipped with acceptable quality levels to their customers. We distinguish the following tests [5]: (1) pre-bond tests prior to stacking, (2) mid-bond tests on incomplete, partial stacks, (3) post-bond tests on complete yet still not packaged stacks, and (4) final tests on the final packaged product. The number of possible test flows grows quickly with the number of dies in the stack [6] and hence is the subject of automated trade-off evaluation and optimization [7].
A well-architected design-for-test (DfT) test access infrastructure is indispensable for achieving a high-quality test. Not only do we need conventional ('2D') DfT structures (such as internal scan chains, test data compression circuitry, IEEE Std 1500 wrappers around embedded cores, and built-in self-test (BIST) engines) that provide test access within a single die; we also need novel '3D' DfT structures that provide modular test access from (and to) the external stack I/Os to (and from) the various dies and inter-die interconnect levels, thereby transporting test stimuli and responses up and down through other dies on the way. Several ad-hoc 3D-DfT architectures have been proposed, based on IEEE Std 1149.1 [8] [9], IEEE Std 1500 [10] [11] [12] [13] [14], and IEEE Std 1687 [15] [16]. These architectures all have their specific strong and weak points. However, they do not inter-operate. Hence, there is a need for a per-die 3D-DfT standard, such that if compliant dies are brought together in a die stack, a basic minimum of test access features is guaranteed to work across the stack. IEEE Std P1838 is such a standard; it is currently still under development.
978-1-4673-9659-2/16/$31.00 ©2016 IEEE
21st IEEE European Test Symposium (ETS) Text
The rest of this paper describes IEEE Std P1838. Section 2 briefly describes the history and current status of the standard. Section 3 defines the scope and some key terms that are used in the draft standard. Subsequently, Sections 4, 5, and 6 describe the three main hardware components of IEEE Std P1838: its serial control mechanism, die wrapper register (DWR), and the flexible parallel port (FPP). Section 7 concludes this paper.
History and Status
The first three years of the Working Group's existence were spent on education of the team and definition of the scope of the standard and associated terminology (see Section 3). Three hardware components were identified and corresponding sub-groups ('tiger teams') were started to work out the details in parallel.
1. A (one-bit) serial control mechanism, based on the IEEE Std 1149.1 Test Access Port (TAP) [8] , for configuration of and low-bandwidth test data access to the DfT resources of this die and dies further up in the stack (see Section 4).
2. A die wrapper register (DWR), based on IEEE Std 1500 [10], consisting of wrapper cells at the die boundary that provide test controllability and observability and hence enable a modular test approach by supporting inward-facing (INTEST) and outward-facing (EXTEST) test modes (see Section 5).
3. A flexible parallel port (FPP), a new native P1838 development, which provides optional n-bit (with n ≥ 0 user-defined) test access to the DfT resources of this die and dies further up in the stack (see Section 6).
These three tiger teams have largely defined their hardware architecture and are currently in the process of capturing their ideas in standards' language: Rules, Recommendations, and Permissions. The Working Group at large is reviewing the draft texts produced by the three tiger teams, and at the same time discussing the requirements for a formal language that can serve as a P1838 specification and implementation description language.
Scope and Terminology
The aim of IEEE Std P1838 is to define a standardized and scalable generic test access architecture to and between dies in a multi-die stack, especially stacks with TSV-based interconnects such as 2.5D-, 3D-, and 5.5D-SICs. The focus of the standard is on testing the intra-die circuitry as well as the inter-die interconnects in pre-bond, mid-bond, and post-bond cases in pre-packaging, post-packaging, and board-level situations.
The standard is die-centric, i.e., compliance to the standard pertains to a die (and not to a stack of dies). Standardized die-level DfT features comprise a stack-level test access architecture. In this way, the standard enables interoperability between die maker and stack maker. The standard does not address stack-level challenges and solutions. The most prominent example of this is that the standard does not address compliance of the stack to IEEE Std 1149.1 Boundary Scan [8] for board-level interconnect testing (although the standard should certainly not prohibit application thereof).
P1838 aims to standardize (1) mandatory and optional on-chip hardware components and (2) a formal language in which implementation choices could be specified and described. The on-chip (3D-)DfT hardware is based on and works with digital scan-based test access. The P1838 Working Group aims to leverage existing (2D-)DfT wherever applicable and appropriate, including test access ports (such as IEEE Std 1149.1 [8] ), on-chip DfT such as internal scan chains and wrappers of embedded cores (as in IEEE Std 1500 [10] ), and on-chip design-for-debug and embedded instruments (such as in IEEE Std 1687 [15] ). The standard does not mandate specific defect or fault models, specific test generation methods, nor specific die-internal DfT features.
Stacking of dies requires that the vertical interconnects (micro-bumps and TSVs) are aligned with respect to footprint (i.e., matching x,y locations), mechanical properties (i.e., matching materials, diameter, height, etc.), and electrical properties (i.e., matching driver/receiver pairs). As a generic DfT-only standard, P1838 does not govern these items. Similar to IEEE Std 1149.1 [8] and IEEE Std 1500 [10], it only defines a DfT architecture:
• Number, name, type, and function of test I/Os
• On-chip DfT hardware and corresponding description 
Serial Control Mechanism
The main purpose of P1838's serial control mechanism is to configure the dies in a stack into one of their many test modes, while high-bandwidth test data access is handled by the (optional) FPP. In addition, the serial control mechanism also provides low-bandwidth test data access (at a rate of one bit per clock cycle) that remains accessible even when the die stack is soldered onto a printed circuit board and the FPP (which is typically multiplexed onto functional terminals) might no longer be directly accessible.
The P1838 Working Group has decided to base the P1838 serial control mechanism on IEEE Std 1149.1 [8].
A compliant die has each of its secondary interfaces equipped with a secondary TAP. A secondary TAP is meant to plug into a primary TAP; consequently, it consists of the same five terminals, but with reversed direction. For secondary interface n (with n ≥ 1), the secondary TAP input terminal is named TDI Sn, while its four output terminals are named TDO Sn, TCK Sn, TMS Sn, and TRSTN Sn. Figure 3 shows the serial control mechanism for a middle die with two secondary interfaces. In this figure, the primary interface (including the primary TAP) resides on the bottom side of the die, while the two secondary interfaces (including their secondary TAPs, i.e., TAP S1 and TAP S2 respectively) reside on the top side of the die.
• Instruction Register (IR): in Figure 3, the IR controls multiplexers m2 and m3.
• Configuration Registers: dedicated TDRs that configure specific P1838 test resources.
TAP Configuration Register: the TDR that determines which of this die's secondary interfaces are activated and have their serial control mechanisms included into the TDI-TDO path of this die. The TAP Configuration Register controls multiplexers m4, m5, m7, and m8.
FPP Configuration Register: configures P1838's optional flexible parallel port (see Section 6).
• Regular Test Data Registers (shown in purple in Figure 3). There could be other TDRs, e.g., for electronic chip identification; these are not shown in Figure 3.
The serial control mechanism transports instructions, configuration data, and actual test data (i.e., test stimuli and responses) between TDI and TDO of its primary TAP. This reconfigurable scan path consists of two concatenated parts (in Figure 3 a red star indicates the concatenation location): (1) registers on this die, and (2) registers in the die towers that are stacked onto this die (or their bypass). Multiplexers m5 and m8 determine whether the serial scan paths from the die towers at secondary interfaces 1 and 2, respectively, will be included into this die's serial scan path; their control signals come from the user-programmable TAP Configuration Register. If a tower is activated and its corresponding secondary interface is selected, its TDI-TDO scan path is included in the serial scan path of our die, and TCK, TMS, and TRSTN of our die are passed on to the corresponding terminals of the selected secondary TAP. If a secondary TAP Sn is deselected, its TMS Sn output receives a user-programmable value from the TAP Configuration Register, switched in by multiplexers m4 and m7, respectively; these multiplexers are controlled by the same signals that also control m5 and m8, respectively.
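The way the TAP Configuration Register includes or excludes towers from the serial scan path can be illustrated with a small behavioural model. This is only a sketch under stated assumptions: the class `Die`, its fields, and the register lengths are hypothetical and not taken from the draft standard, and retiming and pipeline flip-flops are ignored.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Die:
    """Hypothetical model of one die's serial control mechanism."""
    name: str
    register_bits: int  # length of the currently selected on-die register (IR or TDR)
    towers: List["Die"] = field(default_factory=list)     # die on each secondary interface
    tap_config: List[bool] = field(default_factory=list)  # one select bit per secondary interface

    def scan_path(self) -> List[str]:
        """Dies on the TDI-TDO path: this die first, then every selected tower."""
        path = [self.name]
        for tower, selected in zip(self.towers, self.tap_config):
            if selected:  # a deselected tower is bypassed
                path.extend(tower.scan_path())
        return path

    def scan_bits(self) -> int:
        """Total number of register bits between TDI and TDO of the primary TAP."""
        bits = self.register_bits
        for tower, selected in zip(self.towers, self.tap_config):
            if selected:
                bits += tower.scan_bits()
        return bits


# A middle die with two secondary interfaces; only the first one is selected.
die2 = Die("die2", register_bits=8)
die3 = Die("die3", register_bits=16)
die1 = Die("die1", register_bits=4, towers=[die2, die3], tap_config=[True, False])
print(die1.scan_path(), die1.scan_bits())
```

Reprogramming `tap_config` changes the path length seen at the primary TAP, which is why the configuration has to be known before test data is shifted.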
Generally, scan chains are sensitive to data loss due to setup/hold-time violations, as there is typically little to no propagation delay between subsequent flip-flops in the scan chain. In 3D die stacks, the die boundary crossings are prone to such timing violations, as the stacked dies typically originate from different design teams and/or fabrication facilities and, consequently, are quite likely to implement different clock-tree distribution approaches.
To protect the serial control mechanism against loss of scan data due to setup/hold-time violations at die boundary crossings, P1838 has adopted rules (similar to IEEE Std 1149.1) that incoming TDI data shall be acquired on the rising edge of TCK, while TDO outputs shall change on the falling edge of TCK. These rules hold for the primary as well as secondary TAPs. In Figure 3, TDO, TDO S1, and TDO S2 are all equipped with a retiming element driven by the falling edge of TCK. TDI always drives the IR or one of the selected TDRs, which all run on the rising edge of TCK. Dedicated pipeline flip-flops clocked on the rising edge of TCK are added after TDI S1 and TDI S2, to ensure that these TDI inputs acquire their incoming data on the rising edge of TCK. In Figure 3, these pipeline registers are fed by multiplexers m6 and m9 respectively, which receive SHIFT IR OR SHIFT DR from the TAP Controller as control input so that they can hold their data in the TAP Controller's Pause states.
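The hold behaviour of the pipeline flip-flops behind the secondary TDI inputs can be sketched as follows. The class and function names are hypothetical; the model only assumes what the text states, namely a rising-edge flip-flop whose input multiplexer recirculates the stored value whenever the shift-enable control is inactive.

```python
from typing import List


class PipelineFlop:
    """Rising-edge flip-flop fed by a hold multiplexer (illustrative model only)."""

    def __init__(self) -> None:
        self.q = 0

    def rising_edge(self, d: int, shift_enable: bool) -> int:
        # The multiplexer selects the new input while shifting and the
        # flip-flop's own output (hold) otherwise, e.g. in a Pause state.
        if shift_enable:
            self.q = d
        return self.q


def run(stream: List[int], enables: List[bool]) -> List[int]:
    """Apply one input bit per rising TCK edge and record the flip-flop output."""
    flop = PipelineFlop()
    return [flop.rising_edge(d, en) for d, en in zip(stream, enables)]


# Shift one bit, pause for two cycles, then resume shifting.
outputs = run([1, 0, 0, 0], [True, False, False, True])
print(outputs)
```

During the two paused cycles the stored bit is retained even though the input changes, so no scan data is lost when shifting resumes.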
Die Wrapper Register
IEEE Std P1838 mandates a die wrapper register (DWR) that provides controllability and observability to the inputs and outputs of a die. Currently most of the rules created for the DWR pertain solely to digital signals. Some digital signals, such as test signals, clocks and asynchronous signals are exempt from the DWR rules.
There is still ongoing discussion on whether this standard will be extended to other signal types (e.g., analog) for control or observability of those signals during test.
The DWR provides isolation to enable internal test (INTEST) of the die. It also enables external testing (EXTEST) of the interconnect (micro-bumps, TSVs, and/or interposer wires) between dies without requiring access to the entire die. These DWR requirements are much the same as the requirements that IEEE Std 1500 places on the Wrapper Boundary Register (WBR) of an embedded core [10, 17]. Utilizing only the DWR for interconnect testing enables die providers to protect their design IP, as only the wrapper design must be delivered to enable the test. It also enables full testing of the interconnect with a small amount of logic, so that the CPU time and memory required during test creation are much smaller than if the complete design of both dies had to be used.
Figure 4 shows the DWR in yellow. There must be a DWR configuration with a single scan chain in which the input of the scan chain is connected to the test input (TI) of the DWR and the output is connected to the test output (TO) of the DWR. During the various serial modes, the TI and TO of the DWR are connected to the primary TAP's TDI and TDO, respectively.
There is currently an ongoing discussion around possible requirements with regard to a configuration for each interface type; for instance, mandating a DWR segment for the primary interface and one for the secondary interface. If there are multiple secondary interfaces, a DWR segment for each of the additional interfaces may be required to enable a more optimized mid-bond test access path of the interconnect, as shown in Figure 5. Die 1 has one primary interface and two secondary interfaces. If only Die 2 has been stacked onto Die 1 and there is a desire to perform mid-bond testing, only the DWR segment from Secondary Interface 1 must be enabled. Beyond the DWR segmentation per (primary/secondary) interface, the total number of segments in a DWR is at the discretion of the implementer. There may be a desire to add more DWR segments so that the scan chains can be shortened. The FPP is utilized to access multiple DWR segments in parallel when they are not concatenated into a single DWR chain (see Section 6).
Die Wrapper Register Cell
The DWR comprises DWR cells. A DWR cell can be a shared cell which reuses a functional storage element (e.g., flip-flop). This generally occurs on registered terminals (i.e., terminals that have a register directly connected to the terminal). The DWR cell can also be a dedicated wrapper cell, as shown in Figure 6 . A dedicated wrapper cell uses one or more dedicated storage elements and must have a mode control (e.g., INTEST or EXTEST). Scan access enables the controllability and observability of each DWR cell.
There are two classes of DWR cells: fully-provisioned and partially-provisioned. The standard has rules that define the requirements of a fully-provisioned DWR cell, a partially-provisioned DWR cell, and when it is appropriate to use each type of DWR cell. A fully-provisioned DWR cell must adhere to the following requirements that are illustrated in Figure 6.
1. At least one storage element connected between the cell functional input (CFI) and cell functional output (CFO).
2. At least one storage element connected between the cell test input (CTI) and cell test output (CTO). This storage element can be the same storage element as described in Requirement 1.
3. The capability to service the capture event.
4. The capability to service the shift event.
5. The capability to enable the apply event.
The capture event allows the value on the CFI or CFO to be captured into a storage element. The apply event (shown in Figure 6 ) is when the test data becomes active at the CFO output of the DWR cell as test stimuli. The shift event moves the value from the storage element to CTO and from CTI into the storage element. Figure 6 shows the functional and test terminals of a dedicated wrapper cell.
A fully-provisioned cell is required on all digital inputs and outputs, except for digital signals that cause data to be loaded into a sequential element (e.g., clock, asynchronous reset) or dedicated test signals specified by the standard.
A partially-provisioned cell must, at a minimum, be able to service the shift event and either the capture or apply event. If a signal is asynchronous and needs direct control from the tester, then there is no reason to have control capability (i.e., apply) in the DWR cell during scan test. However, if there is a desire to check the connectivity of the signal to the terminal of the die under test, an observe-only cell (e.g., shift and capture capability only) can be added to the DWR scan chain and connected to the terminal. Beyond the basic requirements, there are many options permitted for a DWR cell, as discussed in Section 5.2.
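The capture, shift, and apply events can be made concrete with a small behavioural model of a DWR cell. This is a sketch under simplifying assumptions, not the cell defined by the draft standard: the class `DwrCell`, its flags, and the helper `shift_chain` are hypothetical, the cell has a single storage element, and capture is modelled from the functional input only.

```python
from typing import List


class DwrCell:
    """Illustrative DWR cell with one storage element."""

    def __init__(self, can_capture: bool = True, can_apply: bool = True) -> None:
        self.can_capture = can_capture
        self.can_apply = can_apply
        self.storage = 0
        self.test_mode = False  # mode control, e.g. asserted for INTEST or EXTEST

    def shift(self, cti: int) -> int:
        """Shift event: the stored value moves to CTO, CTI moves into storage."""
        cto = self.storage
        self.storage = cti
        return cto

    def capture(self, cfi: int) -> None:
        """Capture event: store the value seen on the functional input."""
        if not self.can_capture:
            raise ValueError("cell has no capture capability")
        self.storage = cfi

    def cfo(self, cfi: int) -> int:
        """Apply event: in test mode the stored value drives CFO."""
        if self.test_mode and self.can_apply:
            return self.storage
        return cfi  # otherwise the cell is transparent


def shift_chain(cells: List[DwrCell], bits: List[int]) -> List[int]:
    """Shift bits into a chain of cells, returning the bits that leave at the end."""
    out = []
    for bit in bits:
        carry = bit
        for cell in cells:
            carry = cell.shift(carry)
        out.append(carry)
    return out


chain = [DwrCell() for _ in range(3)]
shift_chain(chain, [1, 1, 0])           # load test stimuli
for cell in chain:
    cell.test_mode = True               # enable the apply event
stimuli = [cell.cfo(cfi=0) for cell in chain]
observe_only = DwrCell(can_apply=False)  # partially-provisioned, observe-only cell
```

In this model a fully-provisioned cell supports all three methods, whereas the observe-only cell keeps `shift` and `capture` but never overrides its functional output.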
As mentioned earlier, a shared DWR cell generally reuses the functional storage element of a registered input or output die terminal. One of the items under discussion in the IEEE Std P1838 Working Group is whether 'inland' wrapping should be allowed in this standard [18]. This methodology is often used for 2D wrapping of embedded cores to reduce test logic and to get true functional timing during test from the external logic to the internal logic through a minimal amount of combinational ('shore') logic. An example of an inland shared wrapper cell is shown in Figure 8(a). The AND gate connected to the unregistered input terminal would be tested during EXTEST, rather than INTEST. Commercial ATPG tools support inland wrapping automatically. The benefit of this methodology is that a dedicated wrapper cell (as shown in Figure 8(b)) is not required. This reduces area and prevents test logic from being added into what can be a timing-critical functional path. In addition, during EXTEST mode, it allows the true flop-to-flop functional path to be tested, enabling a more accurate at-speed test.
Another place where a fully-provisioned DWR cell is required is the control signal of a three-state gate. It is also required that, during shift, the state of the control signal of a three-state gate be persistent and that a safe state be maintained on the output of the three-state gate.
DWR Cell Naming Convention
The IEEE P1838 wrapper cell naming convention is inspired by IEEE Std 1500 and can be described by the following regular expression:
/DC( S[DF]\d+)( C([IO]([IO]|\d+))|N)?( U)?( O)?( G[01]?)?/
• The first character field, "DC", is mandatory and indicates that it is a die wrapper cell.
• The second character field, "( S[DF]\d+)", describes the shift storage element ( S). If the wrapper cell is dedicated, it is described with a "D"; if it reuses a functional storage element (shared), it is described with an "F" ([DF]). The number of shift storage elements in the wrapper cell must also be described (\d+).
• The third character field, "( C([IO]([IO]|\d+))|N)", describes the capture storage element ( C). The functional terminal from which the data is captured is described next ([IO]): the character is "I" if data is captured from the functional input of the DWR cell and "O" if it is captured from the functional output. The next set of characters ([IO]|\d+) describes into which storage element, with regard to the test input, data is captured. If there are multiple shift storage elements, the data can be captured into any of them. If the capturing storage element is the one closest to the test input or test output, then the "I" or "O" character, respectively, can be used. A number (\d+) must be used for any storage element that is not closest to the test input or test output. "N" is used when there is no capture capability in the DWR cell.
• The optional fourth character field ( U) describes if there is an update register.
• The optional fifth character field ( O) is used when the DWR cell only has observe capabilities.
• The optional sixth character field ( G[01]?) is used to describe a gate that outputs either a 0 or 1 when enabled.
The description, using the DWR cell naming convention, for the dedicated DWR cell shown in Figure 6 is DC SD1 COI. It is a dedicated DWR cell and there is only one shift element (SD1). Data is captured from the functional output into the storage element closest to the test input (COI).
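As an illustration of how such cell names could be checked and decoded by a tool, the following sketch translates the naming convention into a regular expression with named groups. Several points are assumptions rather than part of the draft standard: the function name and the returned dictionary keys are hypothetical, the field separator is accepted as either a space or an underscore, and "N" is accepted directly after the shift field, following the grouping of the regular expression as printed.

```python
import re
from typing import Optional

SEP = "[ _]"  # assumed field separator: a space or an underscore

DWR_CELL_NAME = re.compile(
    rf"DC{SEP}S(?P<shift_kind>[DF])(?P<shift_count>\d+)"                 # shift field
    rf"(?:{SEP}C(?P<capture_from>[IO])(?P<capture_into>[IO]|\d+)|N)?"   # capture field
    rf"(?P<update>{SEP}U)?"                                              # update register
    rf"(?P<observe_only>{SEP}O)?"                                        # observe-only cell
    rf"(?:{SEP}G(?P<gate_value>[01])?)?"                                 # output gate
)


def parse_dwr_cell_name(name: str) -> Optional[dict]:
    """Decode a DWR cell name, or return None if it does not follow the convention."""
    m = DWR_CELL_NAME.fullmatch(name)
    if m is None:
        return None
    return {
        "dedicated": m["shift_kind"] == "D",
        "shift_elements": int(m["shift_count"]),
        "capture_from": m["capture_from"],   # "I", "O", or None
        "capture_into": m["capture_into"],   # "I", "O", a number as text, or None
        "update": m["update"] is not None,
        "observe_only": m["observe_only"] is not None,
        "gate_value": m["gate_value"],       # "0", "1", or None
    }


print(parse_dwr_cell_name("DC_SD1_COI"))
```

For the example name discussed above, the decoded fields indicate a dedicated cell with one shift element that captures from the functional output into the element closest to the test input.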
Flexible Parallel Port
State-of-the-art integrated circuits can have millions of flip-flops on chip. Having only a serial port would require millions of clock cycles just to shift in one test pattern, leading to very long test times and hence to higher IC cost. Therefore, using a parallel port that can transport multiple stimulus and response bits simultaneously becomes highly desirable [12]. The on-chip scan chain is divided into multiple shorter scan chains that receive test stimuli from and transfer test responses to the parallel port. To avoid the overhead of having additional pins, the parallel port can share pins with existing functional pins through multiplexing. One particular focus of IEEE Std P1838 is to develop and standardize a flexible parallel port (FPP). Like a conventional parallel port [12], this FPP is expected to be used for high-throughput testing; however, the FPP provides additional flexibility and configurability compared to the conventional parallel port. The FPP provides a flexible template that covers common, advanced, and even exotic test scenarios. The design and optimization of IEEE P1838's FPP are described in this section.
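The benefit of a wider port can be estimated with simple arithmetic. The sketch below assumes perfectly balanced scan chains, one chain per port bit, and counts shift cycles for a single pattern only; the function name and the example numbers are illustrative and do not come from the draft standard.

```python
def shift_cycles(num_flip_flops: int, port_width: int = 1) -> int:
    """Clock cycles needed to shift one pattern into balanced scan chains."""
    if port_width < 1:
        raise ValueError("port_width must be at least 1")
    # Ceiling division: the longest chain determines the shift time.
    return -(-num_flip_flops // port_width)


serial = shift_cycles(2_000_000)            # serial port only
parallel = shift_cycles(2_000_000, 32)      # hypothetical 32-bit parallel port
print(serial, parallel)
```

Under these assumptions, the shift time per pattern scales inversely with the port width, which is the motivation for dividing the on-chip scan chain into multiple shorter chains.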
FPP Registered Lane
IEEE Std P1838's FPP registered lane is shown in Figure 9. Flip-flops at positions C and D can be omitted if they are not needed for horizontal pipelining, i.e., when the horizontal distance between two lanes in the same die is short. The template contains two lock-up latches, which are used when transferring data from die to die. The lock-up latches guarantee that transported data leave a die at the negative edge of the clock signal, thereby minimizing the chance of hold-time violations [8].
FPP Non-Registered Lane
The FPP's non-registered lane is shown in Figure 10.
Multiple FPP Lanes on One Die
Multiple registered and/or non-registered lanes can be used simultaneously on one die, as illustrated in Figure 11 . Multiple registered lanes, which share the same control and clock signals, are bundled together as a channel.
To control all the FPP lanes on a die, an FPP Configuration Register is essential. During the implementation of an FPP lane, a control signal that is assigned a specific, fixed logic value is hard-coded and does not need to be implemented in hardware.
The Flexibility of IEEE P1838's FPP

Case Study with IEEE P1838's FPP
To verify the feasibility and capability of IEEE P1838's FPP registered and non-registered lanes, a test case for a single-tower 3D stack is studied in this section.
In [12], a conventional parallel port (CPP) is proposed for testing single-tower 3D-SICs. This proposed CPP is redrawn in Figure 13 in the style of IEEE P1838's FPP. For testing a single-tower 3D-SIC with two dies, one CPP is required on each die. To achieve the mapping from FPP to CPP, two registered lanes have to be used to implement one CPP on each die: one FPP data lane for the upward direction and one for the downward direction.
When the on-die scan chains are bypassed, the test patterns have to go through a pipeline flip-flop to maintain timing robustness. Alternatively, when the test patterns go through the on-die scan chains, the pipeline flip-flop can be bypassed. Therefore, FPP CORE SEL and FPP REGPU BYP can be combined into one control bit 'Bypass'. After removing all the unnecessary components, terminals, and control signals, the upward FPP data lane is as shown in Figure 14(a). The test patterns can be transferred from the primary port (FPP PRI) to the secondary port (FPP SEC), while going through or bypassing the on-die scan chains depending on the value of the control signal 'Bypass'.
The mapping process of the downward FPP data lane to the CPP is similar to that of the upward FPP data lane (see Figure 14).
After finishing the configuration of the FPP data lanes and clock lanes, the test infrastructure that is intended to implement the same functionality as the CPP (see Figure 13) is shown in Figure 16. The same test infrastructure is repeated on the two stacked tiers with different clock lane implementations. The FPP-based test infrastructure can test Die 1 or Die 2 alone, or test Die 1 and Die 2 with the same or different test patterns. Note that besides the normal test capability, this FPP-based infrastructure can also perform a special function that is able to be performed by the CPP in Figure 13, namely the combination of bypass + turnaround, thanks to the two terminals FPP FROM SIDE and FPP TO SIDE. By setting both 'Bypass' and 'Turn' to 0, the test patterns do not go into the core logic and leave the upward data lane from FPP TO SIDE. They then enter the downward data lane at FPP FROM SIDE and come back to the primary port (FPP PRI) of the downward data lane, as indicated by the dashed line in Figure 16.
Figure 16: The test infrastructure that is intended to implement the same functionality as the CPP by using the initially-proposed FPP data lane.
Conclusion
3D-SICs and their 2.5D and 5.5D variants will soon be hitting the markets. Effective mid-bond and post-bond test access requires a 3D DfT architecture that transports the test control and test data signals up and down the stack. IEEE Std P1838 is a standard-under-development for such a 3D DfT architecture. This paper provided a status update of P1838, with a description of its three main hardware components: the serial control mechanism, the die wrapper register, and the flexible parallel port.
