# A PRACTICAL WSI EXPERIMENTAL PROGRAMME

Ian.P.Jalowiecki<sup>1</sup>, Stephen.J.Hedge<sup>2</sup> and R.M.Lea<sup>3</sup>

## Introduction


At Brunel University, research has been underway for several years to assess the architectural, electrical and physical benefits and constraints of WASP, the wafer-scale implementation of the Associative String Processor (ASP). WASP is intended to implement a massively parallel processor entirely within the constraints of wafer-scale integration (WSI).

WASP 1 and WASP 2 were the technology demonstrators of the UK-funded Alvey programme (starting 1984), researching fundamental design methodologies for WSI. They are both examples of the Associative String Processor (ASP) architecture, developed by Brunel University. Further demonstrators are currently funded by a 3½-year US ONR IS&T programme (starting 1987), covering further technology demonstration, applications research, and fundamental packaging and manufacturing design issues.



## ASP Architecture

A generic ASP module comprises communicating ASP substrings, each comprising an ASP Data Buffer (ADB) and an ASP Control Unit (ACU), as shown in Figure 1.

Figure 1 ASP module

Each ASP substring is an SIMD parallel processing structure, comprising a string of identical APEs (Associative Processing Elements), as shown in Figure 2. Each APE incorporates a 32-bit data register, a 5-bit activity register, and a (32+5)-bit parallel comparator. Each APE also includes a single-bit full-adder, 4 status flags and logic for communicating with other APEs via an Inter-APE Communication Network. All APEs in a substring share common bit-parallel Data, Activity and Control busses and a single feedback line (Match Reply, MR). The ASP is based on content-matching; thus APEs are selected by comparing their registers with the states of the corresponding Data and Activity busses. Data I/O is supported by the Vector Data Buffer, a dual-port memory which has a bit-parallel interface to the ADB via the Secondary Data Exchange (SDX) port, and which can perform bit-serial exchanges with all APE data registers via the Primary Data Exchange (PDX) port.
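The content-matching selection described above can be illustrated with a minimal software sketch. It is only a behavioural model under stated assumptions: the register widths come from the text, but the class `APE`, the function `broadcast_match`, the `tagged` flag and the don't-care masks are hypothetical names and features introduced for illustration, not part of the WASP design as described here.

```python
from dataclasses import dataclass

DATA_BITS = 32      # width of the APE data register (from the text)
ACTIVITY_BITS = 5   # width of the APE activity register (from the text)


@dataclass
class APE:
    data: int = 0          # 32-bit data register
    activity: int = 0      # 5-bit activity register
    tagged: bool = False   # hypothetical stand-in for one of the 4 status flags


def broadcast_match(apes, data, activity,
                    data_mask=(1 << DATA_BITS) - 1,
                    activity_mask=(1 << ACTIVITY_BITS) - 1):
    """Compare every APE against the bus state and return the Match Reply.

    The masks are an assumption of this sketch: they mark which bus bits
    take part in the comparison, and all other bits are treated as don't-care.
    """
    for ape in apes:
        data_ok = ((ape.data ^ data) & data_mask) == 0
        activity_ok = ((ape.activity ^ activity) & activity_mask) == 0
        ape.tagged = data_ok and activity_ok
    # MR is a single shared feedback line: asserted if any APE matched.
    return any(ape.tagged for ape in apes)
```

For a substring built as `[APE(data=v) for v in (7, 42, 7, 99)]`, calling `broadcast_match(substring, data=7, activity=0)` asserts MR and tags the first and third APEs only. In the hardware every APE compares in parallel; the loop here is just a sequential stand-in for that.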

<sup>1</sup> Brunel University, Uxbridge, Middlesex, UB8 3PH, UK

<sup>2</sup> Aspex Microsystems Ltd., Brunel University, Uxbridge, Middlesex, UB8 3PH, UK

<sup>3</sup> Brunel University, Uxbridge, Middlesex, UB8 3PH, UK

Since the ASP comprises a long string of simply linked, small, identical content-addressable APEs, the ASP structure is highly amenable to defect/fault-tolerance, by simply adding spare APEs to the end of the required ASP substring length and by-passing faulty APE-blocks. Furthermore, the WASP module inter-connection strategy offers hierarchical defect/fault-tolerance by selective by-passing of faulty APE-blocks, faulty groups-of-blocks (i.e. a 'chip') or faulty substrings.
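The by-pass scheme can be sketched in software as a harvesting step over test results. The following is a minimal illustration, assuming each block has already been tested good or bad. The function names, the `apes_per_block` parameter and the dictionary of substrings are hypothetical, and the sketch models only two of the levels (APE-blocks and whole substrings).

```python
def build_logical_string(block_ok, apes_per_block, required_apes):
    """Return indices of the physical blocks chained into the working string.

    block_ok       -- per-block test results, in physical string order
    apes_per_block -- APEs in each block (hypothetical parameter)
    required_apes  -- target substring length in APEs
    Faulty blocks are by-passed; spare blocks at the end of the string make
    up the shortfall. Returns None if too few good blocks remain.
    """
    chain = []
    for index, ok in enumerate(block_ok):
        if len(chain) * apes_per_block >= required_apes:
            break  # required length reached; remaining spares stay unused
        if ok:
            chain.append(index)
    if len(chain) * apes_per_block < required_apes:
        return None
    return chain


def harvest(substrings, apes_per_block, required_apes):
    """Hierarchical by-pass: drop whole substrings that cannot be repaired."""
    result = {}
    for name, block_ok in substrings.items():
        chain = build_logical_string(block_ok, apes_per_block, required_apes)
        if chain is not None:
            result[name] = chain
    return result
```

With one faulty block in five, `build_logical_string([True, False, True, True, True], 36, 108)` returns `[0, 2, 3]`: the faulty block is skipped and a later block takes its place. A substring with too few good blocks is dropped by `harvest`, which corresponds to by-passing at the next level of the hierarchy.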

## WASP Architecture

Figure 2 ASP substring schematic

As indicated in Figure 3, a WASP device is physically composed from 3 different VLSI-sized blocks known as Data Routers (DRs), ASP substrings and Control Routers (CRs). The DR and CR blocks incorporate routing to connect ASP substring rows to a common Data Interface (DI) and a common Control Interface (CI) respectively. Moreover, both these blocks incorporate LKL and LKR ports to effect row-to-row extension of ASP substrings.

## WASP 1

A fundamental demonstration of this class of wafer-scale device was successfully made by WASP1, fabricated in 3Q88. This comprised individual ASP substrings, each with four ASP modules and a dedicated CR/CI module.

Manufacturing methods were based on standard fabrication technology, involving the use of standard steppers and VLSI die masks. Indeed, the WASP 1 & 2 demonstrators employ only one stepper reticle (i.e. maximum 13mm x 13mm) which is subdivided to achieve cost-effectiveness by manufacturing all blocks through selective exposure of a single reticle. Wafer fabrication is therefore based upon the selective exposure of shuttered portions of the reticle by the Canon FPA-1550 stepper.

Experiments carried out on this demonstrator included:

1. zoned clock and signal distribution
2. selective power isolation of modules failing through short-circuits
3. selective ASP module isolation and bypass (inter-module fault-tolerance)
4. selective bypassing of APE blocks (intra-module fault-tolerance)



Figure 3 Generic WASP device floorplan

After extensive testing, this method of DSW (Direct-Step-on-Wafer) reticle "stitching" (to interconnect ASP substring and CR/CI blocks) was fully proven. In addition, defect/fault-tolerance within and between ASP substring blocks and selective power isolation were successfully demonstrated, as was wafer-scale clock and signal distribution across the ~4cm devices.

## WASP 2

Two WASP2 variants have been fabricated, based on the successful ASP block from WASP1. These are described below and detailed in Table I.

### 1. WASP2A

WASP2A integrates 864 APEs in 6 substrings, each with four ASP blocks, a DR incorporating an ADB buffer memory, and a new CR/CI design. It was implemented as a less than full-wafer device, as a safe intermediate step towards a whole-wafer WASP. Four devices are on the wafer, with the remaining area occupied by test chips.

### 2. WASP2B

The WASP2B composition is representative of a whole-wafer WASP device, and comprises an array of 180 ASP blocks on a 6 inch wafer. This device tests some of the fundamental issues of whole-wafer monolithic integration, especially signal and power distribution.

These variants are fabricated in relaxed 2-micron design rules at Plessey Roborough. WASP 2A completed fabrication in 1Q90, whilst WASP 2B completed fabrication in 3Q90.

## WASP development

As (monolithic) WSI technology demonstrators, WASP 1 and 2 have been highly successful. However, as functional WASP prototypes, they leave much to be desired. Indeed, WASP 1 and 2 provide only a partial implementation of ASP substrings. Moreover, budgetary restrictions constrained designs to representative rather than realistic 'chip' blocks, suitable only for proof-of-principle testing.

**Table I** Characteristics of the WASP 2A and WASP 2B Wafer Scale Integration demonstrators

|                | WASP 2A       | WASP 2B       |
|----------------|---------------|---------------|
| #APEs          | 864           | 6480          |
| area           | 3.9cm x 3.9cm | 9.1cm x 9.8cm |
| transistors    | 1.26M         | 8.43M         |
| power          | 5.2 - 8.0W    | 29.6 - 51.2W  |
| external clock | 6MHz          | 6MHz          |
| internal clock | 12MHz         | 12MHz         |
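The figures in Table I can be cross-checked against the device descriptions with a few lines of arithmetic. This sketch assumes that the array of 180 on WASP 2B consists of the same ASP block used on WASP 2A, which is an inference from the numbers rather than an explicit statement in the text. The per-APE values are device-wide averages, so they include router and interface overhead. The dictionary layout and the helper `per_ape` are introduced here for illustration.

```python
# Figures quoted in the text and in Table I.
wasp2a = {"apes": 864, "transistors": 1.26e6, "power_w": (5.2, 8.0)}
wasp2b = {"apes": 6480, "transistors": 8.43e6, "power_w": (29.6, 51.2)}

# WASP 2A: 6 substrings, each with four ASP blocks.
apes_per_block = wasp2a["apes"] // (6 * 4)

# WASP 2B: an array of 180 (assumed here to be blocks of the same design).
assert 180 * apes_per_block == wasp2b["apes"]


def per_ape(device):
    """Device-wide averages per APE, router and interface overhead included."""
    low, high = device["power_w"]
    return {
        "transistors": device["transistors"] / device["apes"],
        "power_mw": (1000 * low / device["apes"],
                     1000 * high / device["apes"]),
    }
```

The block size works out at 36 APEs, and 180 such blocks give exactly the 6480 APEs listed for WASP 2B, so the two columns of the table are mutually consistent on this reading.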

Currently, a 3-phase prototype development of the 58mm x 58mm 15,360-APE WASP device is scheduled for 1991 through 1992. This project, funded under a US SDIO-IST contract, involves the design, fabrication and evaluation of:

- WASP 3 : constituting a bold step towards a full implementation of the ASP substring, with 320-APE ASP substring blocks, but with the DI and CI blocks designed only to facilitate testing and evaluation of ASP substring rows
- WASP 4 : consolidating ASP substring blocks and incorporating full CI and DI blocks
- WASP 5 : consolidating and debugging WASP 4 to deliver a definitive WASP.

The WASP3 ASP substring block design is now well advanced, based entirely on the principles pioneered in the early WASP devices. Special emphasis is being placed on the implementation of power and, especially, signal distribution. Novel methods are being investigated to enhance the reliability of large area bus structures, based on the experience of WASP2.

## Conclusions

WASP 1 experimental results have confirmed the feasibility of the basic architecture and design methodology. Furthermore, the WASP 2 development represents an expansion of the scope of the original programme, with the originally planned WASP 2A ULSI device being augmented by a full-wafer WASP 2B variant, assembled from the same reticles. Testing of the WASP2A has determined that it achieves most of its major objectives. Continued development is underway on the latest demonstrator in the series, WASP3.