Abstract: This paper proposes a test coprocessor for a 32-bit MicroBlaze CPU core. A microprogrammed architecture is used to implement the coprocessor control path, offering a flexible solution that ensures a straightforward expansion of the test command set. The current version supports a set of SVF-like commands that can control one built-in IEEE 1149.1 boundary-scan infrastructure. The proposed test coprocessor is useful in a wide range of online test applications, notably in mission-critical embedded systems, where online fault detection and diagnosis become particularly important.
I. INTRODUCTION
Over 20 years of wide industry acceptance enabled the development of many test controllers for the IEEE 1149.1 Standard Test Access Port and Boundary-Scan Architecture [1]. These test controllers range from high-end automated test systems to dedicated chips that are embedded into custom solutions, and support a variety of test specification formats, from SVF (Serial Vector Format [2]) to dedicated command sets. The later approval of IEEE 1149.6 [3], and more recently the approval of IEEE 1149.7 [4:7], reinforced the importance of looking into the expansion of the test commands commonly supported in the IEEE 1149.1 world, so as to fully exploit the potential offered by a much broader application domain that includes debugging and embedded instruments [5:8].
The range of tools available to implement 1149.7 is still very limited, which delays the identification of additional features needed by enhanced test controllers capable of handling 1149.1 / 1149.6 / 1149.7 / P1687 test programs. On the other hand, the ever-increasing integration density that enables SoC / SiP / PiP brings additional interest to the development of embedded test coprocessors [6], tailored to the specific requirements of each system or application domain. A standard Fast Simplex Link (FSL [9]) was used to interface the proposed coprocessor to a 32-bit MicroBlaze CPU, and the experimental prototype was implemented in a Xilinx Spartan 6 FPGA.
The following section introduces the subject of 1149.x test controllers and summarises their functional requirements. The control path and the complete test coprocessor architecture are presented in section III. Section IV addresses the test command design process, from ASMD chart specification to microprogram memory content. Section V presents experimental data and provides information on performance and logic resource usage. A final section outlines future research plans, and is followed by a list of references.
II. CONTROL OF 1149.X TEST INFRASTRUCTURES
The set of boundary-scan (BS) test cells is at the core of the IEEE 1149.x standards. By controlling the operating mode of these cells, and scanning in an appropriate test pattern, it is possible to set up the logic conditions necessary to activate and detect any structural fault. The short circuit shown in figure 1
RUNTEST: Forces the target IEEE 1149.1 bus to the specified run state for a specified number of clocks (either Test Clocks or System Clocks), a specified length of time, or both, then moves the target bus to the specified end state.
As an illustrative example, the following sequence may be used to detect the short circuit shown in figure 1
To address these shortcomings, our proposed test coprocessor offers a straightforward way of expanding the test command set. In order to cope with existing and anticipated requirements of test program generation for IEEE 1149.x-compliant boards, the architecture described in this paper supports the simple IEEE 1149.1 instruction set core that is represented in table II.
Test command Description

RESET
III. CONTROL PATH AND COPROCESSOR ARCHITECTURE
Any formal representation of the functionality of the commands shown in table II assumes a given hardware architecture, and makes evident the corresponding control and data flow operations. The latter will determine the blocks required in the coprocessor data path, which will be implemented with regular sequential circuits. There is a higher degree of freedom regarding the implementation of the control path, which can be hardwired or microprogrammed, and the formal representation of each test command will be influenced by the decision of implementing the control path as either a Moore or a Mealy state machine. The former will only update its outputs (the control signals to the data path) upon the rising edge of the system clock, while the latter can update its outputs at any moment. Figure 2 shows an excerpt of an ASMD chart that can be used to represent the behavior of the SHF test command. States 2 and 3 in figure 2 include conditional output boxes used to define Mealy outputs that depend on the condition indicated in the preceding decision box. On the other hand, states 2, 4 and 5 comprise non-empty state boxes used to define Moore outputs, which will be asserted while the controller remains in the corresponding state. Notice that:
⋅ State 2 contains two Moore assignments (load serializer and read FIFO), has two decision boxes, and three Mealy outputs
⋅ State 3 asserts no Moore outputs, has a single decision box, and two Mealy outputs
⋅ States 4 and 5 assert Moore outputs, and have no decision boxes / conditional outputs
978-1-4799-5743-9/14/$31.00 ©2014 IEEE
The representation in figure 2 can also be used to estimate the execution speed, by considering the time required to move from state 2 to state 4, assuming that 1) conditions B and C hold true, and 2) condition D is false five times. The corresponding state sequence will therefore be 2-3-5-3-5-3-5-3-5-3-5-3-4, requiring a total of 12 clock cycles (one per state transition).
The implementation of the same functional behavior as a Moore machine will increase the number of states, since each conditional output box will have to be converted into an equivalent state box, generating a corresponding extra state. The formal representation of the same ASMD chart, converted to a Moore machine, is shown in figure 3. This conversion increases the number of states by a factor of 2.25 (from 4 to 9), but does not cause a proportional degradation of cost or performance, be it in number of logic gates or microprogram memory positions (spatial resources), or in number of clock cycles (speed). Considering the same initial and final states, and the same assumptions as indicated for the Mealy case, the equivalent sequence will go through states 2-3-8-5-9-5-9-5-9-5-9-5-9-5-6-7, and require a total of 15 clock cycles (only 1.25 times more than the corresponding Mealy representation).
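The cycle counts quoted above can be checked with a short sketch that simply counts the transitions along each state sequence. The function name `clock_cycles` is an illustrative assumption; the two sequences and the state counts (4 and 9) are the ones given in the text.

```python
def clock_cycles(path: str) -> int:
    """One clock cycle is spent per state transition along the path."""
    states = path.split("-")
    return len(states) - 1


# State sequences quoted in the text for the Mealy chart (figure 2)
# and for its Moore conversion (figure 3).
MEALY_PATH = "2-3-5-3-5-3-5-3-5-3-5-3-4"
MOORE_PATH = "2-3-8-5-9-5-9-5-9-5-9-5-9-5-6-7"

mealy_cycles = clock_cycles(MEALY_PATH)
moore_cycles = clock_cycles(MOORE_PATH)

# Relative slowdown of the Moore version versus growth in number of states.
slowdown = moore_cycles / mealy_cycles
state_growth = 9 / 4

print(mealy_cycles, moore_cycles, slowdown, state_growth)
```

The ratio of cycle counts stays well below the ratio of state counts because only the extra states that lie on the executed path cost additional clock cycles.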
In order to compare the corresponding implementations in terms of logic resources, a choice will have to be made between a hardwired and a microprogrammed control path architecture. Our choice of a microprogrammed architecture is due to two main reasons: 1) A hardwired controller needs to be completely redesigned when a new test command is required, and the designer will have to build the HDL code for each new ASMD chart. 2) Even if major differences are not to be expected, a hardwired architecture will correspond to a new sum-of-products (or a similar canonical form) for each set of test commands, meaning that there will be variations in the critical path and maximum propagation delay.
The basic control path architecture for a microprogrammed implementation is shown in figure 4 (adapted from [10]). In this case, the most significant address bits of the microprogram memory are determined by the test command opcode (loaded into the Bank_reg latch), and the ASMD state encoding defines the least significant address bits. This architecture is the simplest, and enables a very straightforward implementation of any test command: each state in the ASMD chart will correspond to one word in the microprogram memory, and the operation flow can simply be specified as a sequence of "Continue", "Jump" or "Branch if" microinstructions, directing the state transition from beginning to end. However, it can only implement Moore machines, since each ASMD state selects a single microprogram memory position. Since the ASMD blocks frequently contain conditional output boxes, or more than one decision box, there is a need for preprocessing the ASMD chart, in order to ensure a pure Moore behavior. The total number of states increases, and so does the number of microprogram memory positions, as well as the number of system clock cycles needed to complete the execution of the corresponding test command (one clock cycle per ASMD chart state).
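As a minimal sketch of the addressing and sequencing scheme just described, the fragment below forms a microprogram address from an opcode bank and a state number, and computes the next state for the three microinstruction types. The names `MicroWord`, `address` and `next_state`, the field layout, and the convention that "Continue" and a non-taken "Branch if" move to the next consecutive state are assumptions made for illustration, not the encoding used in figure 4.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class MicroWord:
    op: str                          # "CONTINUE", "JUMP" or "BRANCH_IF"
    target: int = 0                  # destination state for JUMP / BRANCH_IF
    condition: Optional[str] = None  # data path condition tested by BRANCH_IF


def address(opcode: int, state: int, state_bits: int) -> int:
    """Opcode supplies the most significant bits, state the least significant."""
    if not 0 <= state < (1 << state_bits):
        raise ValueError("state does not fit in the state field")
    return (opcode << state_bits) | state


def next_state(word: MicroWord, state: int, conds: Dict[str, bool]) -> int:
    """Next ASMD state selected by one microinstruction (assumed semantics)."""
    if word.op == "CONTINUE":
        return state + 1
    if word.op == "JUMP":
        return word.target
    if word.op == "BRANCH_IF":
        return word.target if conds[word.condition] else state + 1
    raise ValueError(f"unknown microinstruction: {word.op}")


# Example: opcode 3 with a 4-bit state field, currently in state 5.
print(address(3, 5, 4))
print(next_state(MicroWord("BRANCH_IF", target=2, condition="cntr_zero"),
                 5, {"cntr_zero": True}))
```

Because the outputs of each word depend only on the current address, a table of this kind can only describe Moore behavior, which is the limitation noted above.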
The general rule for preprocessing the ASMD charts consists of eliminating all conditional output boxes, and splitting the states when more than one decision box is present. In addition, and since the most significant bits come directly from the test command opcode, the number of microprogram memory positions used to implement each command is fixed. There is a waste of FPGA floorspace, since the most complex command, with the longest ASMD chart representation, will dictate the number of positions that will be used for all other commands. The control path architecture of figure 4 is able to implement any Moore ASMD chart, and would take 9 memory positions to represent the example shown in figure 3. If this were the highest number of states in any command, we would need 4 least significant bits, meaning that the total storage requirements would be given by $2^4 \times O = 16 \times O$, where O is the number of opcodes comprised in the test command set.
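The storage estimate above can be reproduced with a short sketch. The function names and the per-command state counts in the usage example are hypothetical; only the 9-state chart of figure 3 comes from the text.

```python
from typing import List


def state_field_bits(max_states: int) -> int:
    """Least significant address bits needed to encode max_states states."""
    return max(1, (max_states - 1).bit_length())


def fixed_bank_storage(states_per_command: List[int]) -> int:
    """Memory positions used when every opcode gets a bank of equal size."""
    bank_size = 1 << state_field_bits(max(states_per_command))
    return bank_size * len(states_per_command)


def wasted_positions(states_per_command: List[int]) -> int:
    """Positions allocated by the fixed banks but never used by any state."""
    return fixed_bank_storage(states_per_command) - sum(states_per_command)


# Hypothetical command set: one 9-state command (as in figure 3)
# plus three shorter commands with assumed state counts.
example = [9, 4, 3, 2]
print(state_field_bits(9), fixed_bank_storage(example), wasted_positions(example))
```

With these assumed numbers, four banks of 16 positions are allocated although only 18 states exist, which illustrates the waste of memory positions mentioned in the text.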
If we want to enable the implementation of Mealy behaviors, the least significant address bits of the microprogram memory will have to be driven directly from data path conditions. The main drawback of this solution is that the number of microprogram memory positions will be equal to $S \times 2^D$, where S is the number of states, and D is the maximum number of decision boxes existing in a single state (16 memory positions for the example illustrated in figure 2).
An improved solution, although with the same execution speed as the basic Moore architecture, is represented in figure 5, and was used to implement the control path of our test coprocessor. Its main advantage over the basic Mealy and basic Moore architectures consists of eliminating the waste of memory positions, since the microcode storage requirements are in this case limited to the number of states in the ASMD chart (instead of being fixed, and dictated by the chart with the highest number of states). Table III summarises the pros and cons of three alternatives: Moore (no conditional outputs allowed, maximum state decomposition), Mealy 1 (one decision box and its corresponding conditional output boxes per state), and Mealy 2 (up to two decision boxes and four conditional output boxes).
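A small sketch may help to compare the two storage figures discussed above. The values S = 4 and D = 2 are those of figure 2, and the 9-state Moore chart is that of figure 3; the function names and the other state counts in the usage example are assumptions.

```python
from typing import List


def mealy_storage(num_states: int, max_decisions: int) -> int:
    """Basic Mealy scheme: conditions drive address bits, so S * 2**D words."""
    return num_states * (2 ** max_decisions)


def improved_storage(moore_states_per_command: List[int]) -> int:
    """Improved scheme: one word per state of each (Moore) ASMD chart."""
    return sum(moore_states_per_command)


# Figure 2 example: 4 states, at most 2 decision boxes in a single state.
print(mealy_storage(4, 2))

# Hypothetical command set: the 9-state chart of figure 3 plus assumed charts.
print(improved_storage([9, 4, 3, 2]))
```

In the improved scheme the total grows only with the number of states actually present in the charts, rather than with the size of the largest chart or with the number of conditions per state.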
The specific nature of scan test infrastructures dictates that the number of required micro-operations is very small, and accordingly the data path architecture will have a small number of elements, comprising counters, latches and serializers. The conditions associated with these data path elements consist essentially of detecting if the latches and counters reached one or zero. Our microprogrammed test coprocessor architecture is illustrated in figure 6. The data path needed to support the simplest shift operations requires basically two main types of blocks: counters, to keep track of the number of clock cycles required, and serializers, both for converting the parallel words coming from the MicroBlaze into the serial bitstreams that feed the board TDI (b_TDI), and for feeding the comparator that checks if each bit coming from the board TDO (b_TDO) matches its expected value.
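The shift data path described above can be modelled behaviorally as follows. This is a sketch under stated assumptions, not the circuit of figure 6: the LSB-first bit order, the 32-bit word width default, the function names, and the `board` callable standing in for the scan chain are all hypothetical.

```python
from typing import Callable, Iterator, List


def serialize(words: List[int], nbits: int, word_width: int = 32) -> Iterator[int]:
    """Yield nbits bits from parallel words, LSB first (assumed bit order)."""
    for i in range(nbits):
        yield (words[i // word_width] >> (i % word_width)) & 1


def shift_and_compare(tdi_words: List[int], expected_words: List[int],
                      nbits: int, board: Callable[[int], int]) -> List[int]:
    """Shift nbits into the board and return positions where TDO mismatches."""
    bit_counter = nbits          # models the data path bit counter
    mismatches = []
    tdi_bits = serialize(tdi_words, nbits)
    expected_bits = serialize(expected_words, nbits)
    for position, (tdi, expected) in enumerate(zip(tdi_bits, expected_bits)):
        tdo = board(tdi)         # one b_TCK cycle: drive b_TDI, sample b_TDO
        if tdo != expected:      # comparator output
            mismatches.append(position)
        bit_counter -= 1
    assert bit_counter == 0      # counter reaches zero when the shift ends
    return mismatches


def loopback(bit: int) -> int:
    """Hypothetical board model: b_TDO simply mirrors b_TDI."""
    return bit


print(shift_and_compare([0b1011], [0b1011], 4, loopback))
print(shift_and_compare([0b1011], [0b1001], 4, loopback))
```

A real scan chain delays each bit by the chain length, so the loopback model is only meant to exercise the serializer, counter and comparator roles.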
IV. TEST COMMAND DESIGN
The microprogrammed control path illustrated in figure 5 is able to implement the control flow associated with any ASMD chart that is specified in the form of a Moore machine. Each state in the chart corresponds to one microprogram memory position, which comprises the following three fields:
As an illustrative example, figure 7 shows an ASMD chart for the MTCK N test command, which is used for various types of built-in self-test functions (e.g. those that rely on pseudorandom pattern generation and parallel signature analysis modes). The MTCK command starts by loading the number of required b_TCK pulses (present in the FSL_S_Data bus) into one of the data path counters (cbits_cntr in this example), which will be decremented for each b_TCK pulse generated. State 1 requires two microprogram memory positions, since the decrement control signal for cbits_cntr may, or may not, be active in this state. Figure 8 represents the equivalent Moore ASMD chart, and shows that splitting a state, in order to ensure that a conditional output box is converted into a corresponding state box, does not necessarily increase the number of states in the ASMD chart. Each state in the ASMD chart represented in figure 8 now corresponds to a single microprogram memory position, and we are ready to move into the third test command design step, where the microprogram memory template is filled to represent all control and data flow operations associated with MTCK. Table IV shows the content of the four microprogram memory positions that are needed to specify the execution of MTCK.
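To make the MTCK flow more concrete, the following sketch runs a hypothetical four-word Moore microprogram that loads a counter and generates one b_TCK pulse per decrement. The word contents, signal names and state numbering are assumptions chosen for illustration; they are not the content of Table IV or the chart of figure 8.

```python
from typing import Tuple

# Hypothetical microprogram: (Moore outputs, (microinstruction, condition, target)).
MICROPROGRAM = [
    ({"load_cntr"},            ("CONTINUE", None, None)),       # 0: load N
    (set(),                    ("BRANCH_IF", "cntr_zero", 3)),  # 1: test counter
    ({"pulse_tck", "dec_cntr"}, ("JUMP", None, 1)),             # 2: pulse, decrement
    ({"done"},                 ("END", None, None)),            # 3: finished
]


def run_mtck(n_pulses: int) -> Tuple[int, int]:
    """Return (b_TCK pulses generated, clock cycles spent) for MTCK N."""
    cntr = pulses = cycles = state = 0
    while True:
        outputs, (op, cond, target) = MICROPROGRAM[state]
        cycles += 1                      # one clock cycle per state visited
        cntr_zero = cntr == 0            # condition sampled before any update
        if "load_cntr" in outputs:
            cntr = n_pulses              # models loading from FSL_S_Data
        if "pulse_tck" in outputs:
            pulses += 1
        if "dec_cntr" in outputs:
            cntr -= 1
        if op == "END":
            return pulses, cycles
        if op == "CONTINUE":
            state += 1
        elif op == "JUMP":
            state = target
        elif op == "BRANCH_IF":
            state = target if cntr_zero else state + 1


print(run_mtck(5))
```

In this assumed microprogram each pulse costs two clock cycles (test, then pulse and decrement); a chart that merges the test with the pulse state would trade that overhead for a different word layout.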
V. PERFORMANCE AND LOGIC RESOURCE USAGE
The architecture represented in figure 6 was implemented in a Spartan 6 FPGA, using a Digilent Nexys™3 board, and Xilinx's ISE Design Suite System Edition. The logic resources used and timing performance data are shown in tables V and VI. The rationale behind the collection of the data shown in these two tables can be summarised as follows:
⋅ The columns showing data for each test command (the four rightmost columns) correspond to the implementation of a single command, without the FSL interface
⋅ Since the FSL interface is predesigned and independent of the proposed microprogrammed architecture, tables V and VI include two columns showing the implementation data for all commands when no FSL interface is present, and when one 5-word 32-bit FIFO FSL interface is added (from the MicroBlaze to the test coprocessor)
