Design for Testability in Hardware-Software Systems
CREATING TESTABLE designs is key to developing complex hardware and/or software systems that function reliably throughout their operational life. Without testability, design flaws may escape detection until a product is in the hands of users; equally, operational failures may prove difficult to detect and diagnose. Increased system complexity makes thorough assessment of system integrity by testing external black-box behavior almost impossible. System complexity also complicates test equipment and procedures. Design for testability (DFT) should increase a system's testability, resulting in improved quality while reducing time to market and test costs.
The term system means many things to different people. We consider a system to be an integrated set of hardware and/or software modules. Each module is an identifiable part of the system with strictly defined functionality. Hardware, software, or a mix of both can implement each module. Our view of DFT therefore relates to testing at this abstracted system level, and is a function of the combined testability of all system modules.
Traditionally, hardware designers and test engineers have focused on proving the correct manufacture of a design and on locating and repairing field failures. They have developed several highly structured and effective solutions to this problem, including scan design and self test. Design verification has been a less formal task, based on the designer's skills. However, designers have found that structured design-for-test features aiding manufacture and repair can significantly simplify design verification. These features reduce verification cycles from weeks to days in some cases.
0740-7475/96/$05.00 © 1996 IEEE
In contrast, software designers and test engineers have targeted design validation and verification. (Unlike hardware, software does not break during field use. Design errors, rather than incorrect replication or wear out, cause operational bugs.) Efforts have focused on improving specifications and programming styles rather than on adding explicit test facilities. For example, modular design, structured programming, formal specification, and object orientation have all proven effective in simplifying test.
Although these different approaches are effective when we can cleanly separate a design's hardware and software parts, problems arise when boundaries blur. For example, in the early design ...
The access problem
Limited access to individual modules often limits a complex system's testability. Design and implementation of a system comprising multiple modules is very attractive, because we can subdivide the system complexity into comprehensible parts. Nevertheless, after assembly, the complete system's behavior turns into one black box with the multiplied complexity of all its components.
For instance, we can model a module's behavior using a state machine, expressing behavior in terms of states, transitions, and conditions. Verifying all state transitions can test a module's valid behavior. In general, if a module has N states (state space N), there will be at least N state transitions, requiring at least N different tests. In a complex system of several modules, the number of states increases rapidly. The system's black-box behavior is the combination of all the modules' state spaces. If the system contains K modules with N states each, the composite system has state space N^K. We call this exponential growth state space explosion.
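The growth described above can be made concrete with a small calculation. The following Python sketch compares the lower bound on tests when K modules are tested separately (K times N, an inference from the per-module bound of N tests) with the composite black-box state space N^K. The function names and the choice N = 10 are illustrative assumptions, not taken from the article.

```python
def isolated_test_bound(n_states: int, k_modules: int) -> int:
    """Lower bound on tests when every module is tested on its own: K * N."""
    return k_modules * n_states


def black_box_state_space(n_states: int, k_modules: int) -> int:
    """State space of the assembled system seen as one black box: N ** K."""
    return n_states ** k_modules


# Illustrative values only: N = 10 states per module, growing module count K.
for k in (1, 2, 4, 8):
    print(k, isolated_test_bound(10, k), black_box_state_space(10, k))
```

Even for modest values, the separate-test bound grows linearly with the number of modules while the black-box state space grows exponentially, which is the motivation for module-level access.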
Clearly, the testability of modular systems improves considerably if we can test the modules separately. Hence, modular hardware-software designs should incorporate access paths for testing to enable the testing of separate modules. Researchers have widely applied this divide-and-conquer approach to testing complex, modular, digital circuits.
Design for system level testability
We base design for system level testability on a clear separation between implementation-independent system specification and the actual hardware-software system implementation. In the design process, we first create a specification of the system's functional behavior. Such a behavioral specification leads to a clear, thorough understanding of the system to be developed, one not blurred by implementation details.
This specification provides a solid basis for partitioning the system into hardware and software, and choosing an appropriate architecture.
To achieve system level design for testability, we must add system level test requirements to the specification. These requirements aim to improve the controllability and observability of system-embedded modules. Next, we must transform the implementation-independent test requirements into actual hardware and/or software requirements. Placing test requirements in the specification can have a serious impact on the actual system implementation. A design may implement test requirements using existing test facilities such as boundary scan paths. On the other hand, test requirements may also demand new hardware and software test facilities.
Separating specification from the actual implementation is a basic principle of modern design methods. Such methods include structured [4] and object-oriented [5] analysis and design, as well as hardware-software codesign. Therefore, design for system level testability fits well within these modern design methods.
System level testability in the specification
Our basic principle is that we can tackle system test complexity by partitioning the system into modules and inserting test functionality into the system. This permits system testing by first testing the individual modules and then the interactions between modules. Hardware testing (for example, ... Our strategy toward design for system level testability has two parts.
Partitioning the system. Structured, modular design methods automatically lead to improved testability. However, we can further improve testability by making it a major criterion for system partitioning.
Adding test functionality. This allows us to control and observe individual modules and the interactions among them for testing purposes. We first state test functionality in the system specification, without concern for implementation details. In the next step of the design process, we incorporate test functionality in the actual hardware-software system implementation.
Partitioning. There are many heuristics and rules of thumb for system partitioning. Minimizing dependencies between modules and minimizing parallelism inside a module are important to improving testability.
The partitioning criterion of minimum dependence between modules means that we should partition the system into independent modules. We achieve this by minimizing the interactions and communication between modules. During testing, we now can isolate a module from its environment with relative ease.
Minimizing parallelism inside a module offers the advantage of producing well-testable modules. A module's parallelism is an important complexity measure. As indicated in the previous section, the number of possible states increases exponentially if a module consists of interacting, parallel, finite-state machines. Since this state space explosion dramatically increases the number of required test scenarios, minimizing parallelism also reduces this number.
Ideally, we can model a system as a set of communicating processes. The minimum-dependence criterion aims to minimize process interactions. Minimum parallelism allows each module to correspond to a single sequential process. Figure 1 shows an example of a partitioned system with five modules. We have applied the minimum-dependence criterion to partition the system into a limited number of modules. We next applied the minimum-parallelism criterion to split complex modules into smaller ones. For instance, a complex module initially contained both modules B and C, but we decomposed this module to limit test complexity.
Adding test functionality. To test individual modules and their interactions, we offer test stimuli to the modules and observe the responses at their boundaries. This requires us to control the module boundaries and observe them directly in the system environment. In general, however, this is not possible; we require paths via other modules to offer test stimuli and observe the responses of a module under test. In Figure 1, we can neither control module C's boundaries nor directly observe them in the system environment. Testing thus requires test paths through other modules.
These limited control and observation capabilities seriously reduce testability for several reasons.
- We must set up and maintain test paths to and from the module under test. This may be infeasible or require significant effort.
- When the test detects an error, we do not know whether the error occurred in the module under test or in the paths.
- In real-time systems, the order and timing of events is critical. Thus, during test, we should be able to control the timing of incoming events and observe the timing of outgoing events. This is difficult to achieve without direct access to the module under test.
To improve testability, we add test functionality to the system specification and use three kinds of test functions.
Transparent test mode (TTM). We can eliminate the accessibility problem if the modules that constitute the path to the module under test are transparent. They are transparent in the sense that they convey signals without change. We achieve this by extending the module's behavior with an additional transparent-test operating mode. Whenever a module switches into this TTM, it passes incoming events directly to outgoing events in a predefined way, providing a transparent path from module inputs to module outputs.
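As a rough illustration of the transparent test mode, the Python sketch below models modules as event transformers with an extra mode switch. The class name `Module`, the helper `send_through`, and the string-based behaviors are hypothetical; the identity mapping is only one simple choice of "predefined way" for passing events.

```python
class Module:
    """A module whose normal behavior transforms each incoming event."""

    def __init__(self, name, behavior):
        self.name = name
        self.behavior = behavior    # normal-mode function: event -> event
        self.transparent = False    # TTM switch, set from the test environment

    def handle(self, event):
        # In TTM the module conveys the event unchanged; otherwise it
        # applies its normal functional behavior.
        if self.transparent:
            return event
        return self.behavior(event)


def send_through(path, event):
    """Pass an event through a chain of modules, first to last."""
    for module in path:
        event = module.handle(event)
    return event


a = Module("A", lambda e: e.upper())
b = Module("B", lambda e: e[::-1])
under_test = Module("C", lambda e: e + "!")

# Normal operation: A and B transform the stimulus before it reaches C.
normal = send_through([a, b, under_test], "stim")

# Test operation: A and B are switched to TTM, so C sees the raw stimulus.
a.transparent = b.transparent = True
in_test = send_through([a, b, under_test], "stim")
```

With A and B transparent, the tester effectively gains direct access to module C's input without any structural change to the path.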
Additionally, if it is not possible or desirable to make a test path from the tester to every access point of the module under test, we can include a test responder. This test responder more or less inverts the test path: It returns controllable signals from the module under test to the tester.
A disadvantage of these functions is that we often design them specifically ... Nonetheless, the TTM concept can be ...
Built-in self-test (BIST). We can equip a module with self-test, which reduces the required controllability and observability from the system environment. The module's BIST functionality offers test stimuli to the module and observes and evaluates the responses. We start and control the BIST in the module from the system environment. When the test ends, it returns a go/no-go response or diagnostic information to the system environment.
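The BIST idea can be sketched in a few lines of Python. The module below is a hypothetical adder that stores its own stimuli and expected responses; the class name, the test vectors, and the `faulty` flag used to inject an error are all assumptions made for illustration.

```python
class AdderModule:
    """Hypothetical module that carries its own built-in self-test."""

    def __init__(self, faulty=False):
        self.faulty = faulty    # injected fault, for demonstration only

    def add(self, x, y):
        result = x + y
        return result + 1 if self.faulty else result

    def self_test(self):
        """Apply stored stimuli, evaluate the responses, and return
        (go, diagnostics) to the caller in the system environment."""
        vectors = [((0, 0), 0), ((1, 2), 3), ((-5, 5), 0)]
        failures = []
        for stimulus, expected in vectors:
            actual = self.add(*stimulus)
            if actual != expected:
                failures.append((stimulus, expected, actual))
        return (not failures, failures)


go, diagnostics = AdderModule().self_test()
bad_go, bad_diagnostics = AdderModule(faulty=True).self_test()
```

The system environment only starts the test and reads back a go/no-go flag plus optional diagnostics; it never needs to drive or observe the module's boundaries directly.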
Point of control and observation (PCO). At the module boundaries, we can insert points allowing us to control and observe interconnections between modules directly in the system environment. We insert a PCO in an interconnection between two modules. As shown in Figure 2, a PCO has three operation modes, selected by the mode input. Table 1 lists how we use these different modes. (Readers can contrast this abstracted representation with the analog test access technique proposed for standardization as IEEE 1149.4.)
A PCO supports all three modes. Of course, it is also possible to implement only some of the modes. For instance, a point of observation (PO) supports only the transparent and observation modes. Besides using PCOs for observing and controlling interconnections, we can equip data stores with them. In observation mode, we can use the PCO to monitor the contents of a data store. In test mode, we can use the PCO to read and write a data store.
In a system, we can control each PCO individually via its mode input. Alternatively, a common mode select can control multiple PCOs.
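The three PCO modes can be sketched as a small function from inputs to outputs. In this Python sketch the names `Mode` and `pco` are hypothetical. The observation-mode behavior follows the description in Table 1; the test-mode behavior (control input drives the PCO output while the PCO input remains observable) is an assumption consistent with using the PCO to read and write in test mode.

```python
from enum import Enum


class Mode(Enum):
    TRANSPARENT = "transparent"
    OBSERVATION = "observation"
    TEST = "test"


def pco(mode, pco_input, control_input=None):
    """Return (pco_output, observation_output) for a single event."""
    if mode is Mode.TRANSPARENT:
        # Normal operation: the interconnection is left undisturbed.
        return pco_input, None
    if mode is Mode.OBSERVATION:
        # The input passes to both the PCO output and the observation output.
        return pco_input, pco_input
    if mode is Mode.TEST:
        # Assumption: the tester drives the output and still sees the input.
        return control_input, pco_input
    raise ValueError(f"unknown mode: {mode!r}")


transparent = pco(Mode.TRANSPARENT, "evt")
observed = pco(Mode.OBSERVATION, "evt")
tested = pco(Mode.TEST, "evt", control_input="stimulus")
```

A point of observation (PO) would correspond to the same function restricted to the first two modes.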
TTM and PCO functionalities offer paths between the system environment and embedded modules. In addition to test information, these paths can also transport system management information, such as programming updates and data.
Table 1, observation mode: This mode monitors the system during normal operation. The PCO input passes to both the PCO and observation outputs. The observation output monitors data passing through the PCO in the system environment. The control input is of no concern in this mode.
Figure 4 shows the TRIBUNE test configuration. The test system ships its testing data over a TCP/IP (Transmission Control Protocol/Internet Protocol) channel to the system under test. The TCP/IP channel uses a simple test management protocol to perform multiplexing, flow control, and identification of data and control signals. In the system under test, it demultiplexes and directs the signals to the appropriate PCOs.
TRIBUNE systems implement a limited number of PCOs, each placed between two communicating entities, that is, at a protocol interface. The PCOs have functions to observe and control the behavior of the protocol layers under test (implementation under test). Figure 5 shows a PCO with two identical switching cells to control information streams in both directions at the interface. The basis for this technique is the IEEE standard boundary scan concept [10] developed for hardware testing. In Figure 4, two PCOs are necessary to completely test the implementation under test (LSAS and SSCS). We could easily add more PCOs, although TRIBUNE purposes did not require them.
PCO design. The PCO control part uses separate signals to exchange data and control information (see Figure 5). The control signals can be transferred directly to the cells. We may, however, have to adapt the data to the coding of service primitives at the upper and lower interfaces of the implementation under test. These interface adapters depend on the implementation. Two symbolic switches control the information stream within a switching cell. Switch S1 controls whether or not the test system can observe the incoming information stream. A View (V) or Blind (B) control signal sets S1. Switch S2 controls whether or not the incoming information stream should transfer transparently to the adjacent layer. A Connect (C) or Disconnect (D) control signal sets S2. Four different switch combinations yield four possible modes (or states) for each switching cell.
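The four switching-cell modes follow directly from the two switches. The Python sketch below enumerates them; the function name `switching_cell` and the `injected` parameter are hypothetical, and the behavior in Disconnect mode (the test system may supply replacement data to the adjacent layer) is an assumption, since the text only states that the incoming stream is no longer passed on.

```python
from itertools import product


def switching_cell(s1, s2, incoming, injected=None):
    """Model one switching cell.

    s1: 'V' (View) or 'B' (Blind), whether the test system observes the stream.
    s2: 'C' (Connect) or 'D' (Disconnect), whether the stream passes on.
    Returns (to_adjacent_layer, to_test_system).
    """
    if s1 not in ("V", "B") or s2 not in ("C", "D"):
        raise ValueError("s1 must be 'V' or 'B'; s2 must be 'C' or 'D'")
    observed = incoming if s1 == "V" else None
    # Assumption: when disconnected, only tester-injected data reaches the layer.
    forwarded = incoming if s2 == "C" else injected
    return forwarded, observed


# Enumerate all four switch combinations for the same incoming primitive.
modes = {
    s1 + s2: switching_cell(s1, s2, "data", injected="test")
    for s1, s2 in product("VB", "CD")
}
```

Under these assumptions, BC corresponds to normal transparent operation, VC to passive monitoring, and VD to full control with observation of the incoming stream.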
Experiment evaluation. The TRIBUNE test architecture allows various test configurations not possible with conventional test techniques. The architecture not only allows direct testing of internal modules, it also facilitates interoperability testing of the communication stacks without using the application software. The test architecture (Figure 4) shows how we can test the combination of layers LSAS and SSCS using a multilayer, local test method. The test system can control both the upper (to LSAS) and lower (to SSCS) tester, which implies a substantial increase in the number of traversed states. In this way, we achieve a more balanced test coverage. Comparing this technique with conventional test methods, we estimate that it is possible to double the number of implementable and executable test purposes. Moreover, the actual tests can be shorter, due to significant reduction of the synchronizing sequences' lengths.
The addition of explicit test functions in the system under test does take extra money and time. We represent these costs as lines of programming code. Since we specified the TRIBUNE systems formally in the Specification and Description Language, we measure the code in units of SDL lines.
An average protocol layer takes an estimated 5,000 lines of SDL code. A single PCO contains about 800 lines, while the passive test control function needs 500 lines. Each additional PCO may require 80 new lines of code. Thus, a system under test containing a protocol stack with five layers and five PCOs will need 26,620 lines of code, of which test functions consume 6%.
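The estimate above can be reproduced with a short calculation. In this Python sketch the constants come from the figures quoted in the text; the function name `sdl_cost` and the treatment of the first PCO as 800 lines with 80 lines for each further one are an interpretation of those figures.

```python
LINES_PER_LAYER = 5000   # average protocol layer
FIRST_PCO = 800          # a single PCO
EXTRA_PCO = 80           # each additional PCO
TEST_CONTROL = 500       # passive test control function


def sdl_cost(layers, pcos):
    """Return (total_lines, test_lines, test_share) for a protocol stack."""
    functional = layers * LINES_PER_LAYER
    pco_lines = FIRST_PCO + (pcos - 1) * EXTRA_PCO if pcos > 0 else 0
    test = TEST_CONTROL + pco_lines
    total = functional + test
    return total, test, test / total


total, test, share = sdl_cost(layers=5, pcos=5)
print(total, test, f"{share:.1%}")
```

For five layers and five PCOs this yields 25,000 functional lines plus 1,620 test lines, which matches the 26,620-line total and the roughly 6% overhead stated above.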
Implementation aspects
We next discuss the step from specification to hardware-software implementation, focusing on how to implement the specified test functionality in hardware and/or software. When constructing the system's hardware-software architecture, we can incorporate a recursive test hierarchy into the system (Figure 6). For full test and diagnostic control, this test architecture should at least offer the functions in the following list. Preferably, these functions should be available at every test hierarchy level:
- initialization of system, subsystems, modules, or components (mode setting, reset)
- access to and control of system components at lower levels in the hierarchy
- transportation of test stimuli
- control of built-in test facilities
- collection of test results
- identification of components
In general, two opposing strategies exist for incorporating a hierarchical test architecture: centralized and distributed. In a centralized strategy, a single test control module at the top level accesses and controls all lower levels in the system. The distributed strategy distributes test control over the individual levels as much as possible. Although both strategies have advantages and disadvantages, we prefer the distributed strategy for the reasons discussed next.
The centralized strategy does not require us to equip every test level with module-specific test and maintenance knowledge, providing a simple and low-cost test architecture. However, because the central test module contains the implementation knowledge, we may not interchange modules that are functionally equal but implemented in different technologies. Hence, the centralized strategy is rather inflexible. Furthermore, it introduces significant communication overhead for transporting test data between hierarchy levels.
The distributed test strategy is more flexible, because it locates test knowledge at the individual system levels. This facilitates concurrent testing and thus reduces test time. Distribution of test functions also reduces the system test control module's complexity. Furthermore, locating implementation knowledge at the individual test levels stimulates implementation independence, because the system test controller can operate at an implementation-independent level. This also eliminates the need for complex and application-specific test interfaces. By providing standardized, general-purpose test interfaces, the distributed strategy facilitates use of commercially available products that are fully interchangeable. In this way, we can produce highly testable systems with little effort. These advantages show that the distributed approach is highly suitable for testing complex systems.
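A distributed, recursive test hierarchy can be sketched as a tree in which every level exposes the same generic interface and keeps its implementation-specific test knowledge local. The Python sketch below is a minimal illustration under that assumption; the class `HierarchyNode`, its method names, and the example components are hypothetical and do not describe the architecture in Figure 6.

```python
class HierarchyNode:
    """One level in a recursive test hierarchy with a generic interface."""

    def __init__(self, name, local_test=lambda: True, children=()):
        self.name = name
        self.local_test = local_test    # implementation-specific, kept local
        self.children = list(children)

    def identify(self):
        """Identification of components at this level and below."""
        names = [self.name]
        for child in self.children:
            names.extend(child.identify())
        return names

    def run_tests(self):
        """Run the local test, delegate downward, and collect the results."""
        results = {self.name: bool(self.local_test())}
        for child in self.children:
            results.update(child.run_tests())
        return results


board = HierarchyNode("board", children=[
    HierarchyNode("asic", local_test=lambda: True),
    HierarchyNode("firmware", local_test=lambda: False),
])
system = HierarchyNode("system", children=[board])
report = system.run_tests()
```

Because the top-level controller only calls `identify` and `run_tests`, a component could be replaced by a functionally equal one in a different technology without changing the levels above it, which is the interchangeability argument made for the distributed strategy.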
The centralized and distributed test strategies represent extremes. In practice, designers may often use a mixed strategy that incorporates features of both.
HARDWARE AND SOFTWARE designers have developed different DFT techniques. Hardware DFT focuses on implementation, while software DFT focuses on specification.
The application of these techniques alone does not produce satisfactory system level testability. The power of the individual DFT techniques may well need to come together in one overall DFT approach.
Modular systems, composed of multiple parts, offer no advantages to testers unless we can test the system parts separately. Modular design should imply modular testing. The next challenge will be developing a complete design process. Such a process should provide implementation-independent testability requirements in the specification, like transparent test mode, built-in self-test, and points of control and observation. We must incorporate these testability requirements in the hardware-software system implementation. This implies translating high-level testability requirements into existing or new test function implementations.
