Development of a highly programmable, restructurable VLSI IC is an extremely important step toward effecting maximum impact of VLSI. Not only do programmability and flexibility provide new creative opportunities for the system designer, they help overcome two major obstacles to pervasive use of VLSI: design cost and design cycle time.
Flexibility, reliability, and cost
Typically, a state-of-the-art LSI custom design consisting of 5K gates and 5K bits of read-only memory costs approximately $500 thousand and takes about 18 months to design and lay out. This comes to about $100 per gate in the design. If design costs fall an order of magnitude to $10 per gate over the next five years, a typical state-of-the-art VLSI custom system consisting of 50K gates and 50K bits of read-only memory will still cost $500 thousand to design. A very flexible VLSI chip that can be restructured to provide a wide range of capabilities and state-of-the-art performance presents a viable alternative to custom designs in low-volume applications.
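The arithmetic behind these figures can be checked with a short sketch. The function and variable names are illustrative only; the dollar amounts and gate counts are the ones quoted above.

```python
def design_cost(gates: int, cost_per_gate: float) -> float:
    """Total design cost in dollars for a chip with the given gate count."""
    return gates * cost_per_gate

# Current LSI custom design: $500 thousand for 5K gates.
lsi_cost_per_gate = 500_000 / 5_000          # dollars per gate

# Projected VLSI custom design: 50K gates at one-tenth the per-gate cost.
vlsi_cost = design_cost(50_000, lsi_cost_per_gate / 10)
```

A tenfold drop in per-gate cost is cancelled by the tenfold growth in gate count, which is why the total design cost does not fall.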
Reliability, testing, and maintenance considerations are extremely important in complex VLSI systems. Because poor reliability is reflected in the cost of service calls and returned products, it can, over the life of the system, incur costs greater than those of initial manufacturing. Traditionally, reliability has improved with each increase in the level of integration. However, reliability also benefits from accumulated learning, as shown in Figure 1. A generic programmable chip, in this case the Texas Instruments TMS-1000 microcomputer, increases in reliability as more are manufactured. Although specific programmations of the chip might not be made in significant volume, each programmation benefits from improvements in the reliability of the generic device.
Custom designs, on the other hand, even those using identical processes, do not benefit significantly from the high-volume reliability learning that applies to other chips. A restructurable VLSI circuit will provide a high degree of reliability learning, even if the volume of most programmation types is small.
Overview of the RIC
The RIC is a semicustom IC that serves much the same purpose as gate arrays and masterslices. The gate-array approach allows a logic diagram to be translated into silicon through software. The software creates an interconnect pattern among the gates so that the logic diagram is implemented in silicon. The gate-array approach is very flexible, but provides no special structure for implementing programmable digital systems. The RIC approach uses the large number of gates on a VLSI IC to build a chip that is highly flexible in implementing programmable digital systems. The RIC approach differs from gate arrays in that it commits the vast majority of its silicon to a specific design. The RIC's flexibility is achieved through the design of a programmable mechanism for controlling the hardware resources on the chip. A block diagram of the restructurable IC is shown in Figure 2.
The RIC is a multimicrocomputer that contains four 16 
Multichip structures
Multichip configurations are used to achieve functionality and performance beyond that
The data path in an MPS is 16 bits wide. It contains a dual-port register file of twenty-four 16-bit-wide registers. The data path contains a high-performance ALU. Two registers in the register file can be simultaneously accessed from two 16-bit-wide buses. The data path also has a hardware unit to shift, extract, and rotate data; it is called the SERU. In addition to its application to the usual shift and rotate operations in the data path, the SERU can be used to extract the fields of a machine instruction as it is emulated and then pass the extracted fields as parameters
The information in the ROM is accessible to all MPSs through a shared ROM bus. Instead of a separate ROM for each MPS, we use a centralized ROM because it allows easier code sharing and flexibility in the amount of code that can be dedicated to an MPS. The bus is arbitrated according to a round-robin scheduling discipline. When an MPS issues a ROM access, the ROM manager buffers the tag(s) associated with the sending MPS. Using the tag(s), the ROM manager also routes the microinstructions to the appropriate MPS(s).
For example, in the independent mode where all four MPSs are working as four independent processors, the routing logic isolates the carry chain into four independent segments. In the internal lockstep mode, the routing logic establishes a separate carry chain for each lockstep on the chip. Figure 8 shows the carry chain when MPSs 1 and 2 are working in a lockstep and MPSs 0 and 3 are working as independent processors (one's complement arithmetic demands the end-around carry). Figure 9 shows the carry chain when all four MPSs
Internal RAM
The internal RAM of the RIC is organized as four independent, byte-addressable memory modules. An NMOS RIC with a minimum geometry feature of one micron (lambda = 0.5 micron) could contain about 16K to 32K bytes of dynamic RAM. This RAM would occupy one-third to one-half of the chip area. The internal RAM subsystem of the RIC includes four independent memory modules and a data bus interconnecting the RAM to the four MPSs. The data bus is designed to support the three basic internal MPS structures (independent, lockstep, and pipeline). The memory subsystem contains a memory mapper, which automatically directs memory accesses to internal locations when the data is resident internally and to external locations when it is not. The memory subsystem is illustrated in Figure 10.
The data bus and memory modules. The data bus supports four concurrent accesses to memory, provided there is no interference between processors and memory. This bus allows a direct path from each MPS to its own memory module. If each MPS accesses its own memory module, four simultaneous memory accesses can occur. If MPSs access memory modules other than their own, they can cause memory interference. The bus is shared and multiple MPSs can access the same module; this can result in queued memory requests.
As shown in Figure 10, each memory module has a memory scheduling unit, or MSU, and a bus control unit, or BCU. When an MPS accesses its own memory module, it is directly connected through its BCU to its MSU. The MSU indicates whether there are pending memory requests; if there are none, the access occurs immediately. If accesses are pending, the MSU queues a tag indicating which MPS requested memory service. An MSU queues MPS requests on a first-come-first-served basis. The MSU signals the MPS when its request reaches the head of the queue. The MPS then reissues its request, and memory is accessed immediately.
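The first-come-first-served behavior of an MSU can be illustrated with a small software model. This is only a sketch: the class and method names are hypothetical, and it assumes that a module services one access at a time, which the text implies but does not state.

```python
from collections import deque

class MemorySchedulingUnit:
    """Toy model of one MSU: a first-come-first-served queue of MPS tags."""

    def __init__(self):
        self.in_service = None      # tag of the MPS whose access is in progress
        self.pending = deque()      # tags of waiting MPSs, oldest first

    def request(self, mps_tag: int) -> bool:
        """Return True if the access starts immediately, False if it is queued."""
        if self.in_service is None and not self.pending:
            self.in_service = mps_tag
            return True
        self.pending.append(mps_tag)
        return False

    def complete(self):
        """Finish the current access; return the tag to signal next, or None."""
        self.in_service = None
        return self.pending[0] if self.pending else None

    def reissue(self, mps_tag: int) -> bool:
        """The signalled MPS reissues its request and is served immediately."""
        if self.pending and self.pending[0] == mps_tag:
            self.pending.popleft()
            self.in_service = mps_tag
            return True
        return False
```

In this model, a request that arrives while an earlier one is in service is queued by tag, and only the MPS at the head of the queue succeeds when it reissues.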
When an MPS accesses a memory module other than its own, the BCUs are configured to make the connecting bus a shared bus, as shown in Figure 11. The MPS first waits for access to the shared bus. The shared bus is scheduled according to a round-robin-by-demand discipline; the first MPS or memory module gets access to the bus in round-robin order. After an MPS gains access to the bus, it sends the memory information and a destination tag indicating the destination memory module. The destination module sends the memory request's queue position on a separate bus. If 00 is sent, the memory request is processed immediately. Otherwise, the two-bit number indicates the number of pending requests ahead of the current one. Any memory module can have, at most, four memory requests pending, since an MPS can have only one memory request pending at a time.
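One way to picture round-robin-by-demand scheduling is as a search that starts just past the last client granted the bus and skips clients that are not asking for it. The function below is a sketch under that reading; its name, its arguments, and the numbering of bus clients are assumptions, not details given in the article.

```python
def next_grant(requests, last_granted):
    """Return the index of the next requester after last_granted, or None.

    requests: list of booleans, one per bus client; True means the client
    is currently asking for the bus. Idle clients are skipped, which is
    the "by demand" part of the discipline as modeled here.
    """
    n = len(requests)
    for step in range(1, n + 1):
        candidate = (last_granted + step) % n
        if requests[candidate]:
            return candidate
    return None
```

With clients 0 and 2 requesting and client 0 granted last, the sketch grants client 2 next; a lone requester is granted again on the following round.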
Each MPS has circuitry to monitor the bus. When the memory module with a pending MPS memory access request completes a memory access, that MPS decrements the number of pending requests by one. When an MPS decrements this number to zero, it means that its request is at the head of the queue. When this MPS gains control of the shared bus, it reissues its request and the request is processed immediately.
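The countdown that each MPS performs while its remote request is queued can be sketched as follows. The class and method names are hypothetical; the two-bit queue position and the decrement on each completed access come from the description above.

```python
class PendingRequest:
    """Toy model of the counter an MPS keeps for one queued remote access."""

    def __init__(self, module: int, position: int):
        if not 0 <= position <= 3:
            raise ValueError("queue position is a two-bit number")
        self.module = module        # destination memory module (0..3)
        self.ahead = position       # requests still ahead of this one

    def observe_completion(self, module: int) -> None:
        """Called for every access completion the MPS sees on the bus."""
        if module == self.module and self.ahead > 0:
            self.ahead -= 1

    def at_head(self) -> bool:
        """True once the request may be reissued and served immediately."""
        return self.ahead == 0
```

Completions at other modules leave the counter unchanged, so an MPS tracks only the queue of the module it is waiting on.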
The operations described above support the memory accesses made by independent, lockstepped, and pipelined MPSs.
The RIC memory system. The RIC memory system supports lockstepped and pipelined MPSs. Lockstepped MPSs can make simultaneous requests to their own memory modules. It is possible for lockstepped MPSs not to receive their requests for memory service simultaneously, since the queue length at one MPS memory module can differ from the queue length at another lockstepped MPS queue. To avoid this problem, lockstepped MPSs are synchronized. When lockstepped MPSs make a memory request, a wired-AND line, which connects all MPSs in the lockstep, is pulled low by each MPS. After an MPS has had its memory request serviced, it quits pulling this line down. When the last MPS memory request is serviced, the line rises to a logic one, indicating that the lockstep process can continue. Lockstepped processors can also access memory modules other than their own. In such a case, each lockstepped MPS issues its request when it gets access to the bus. The lockstepped memory access is synchronized as above.
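The wired-AND synchronization can be modeled in a few lines. The sketch below is illustrative only, with hypothetical names; it captures the rule that the line reads logic one only after every MPS in the lockstep has released it.

```python
class WiredAndLine:
    """Toy model of the wired-AND line shared by the MPSs in one lockstep."""

    def __init__(self, members):
        self.members = set(members)   # MPS numbers taking part in the lockstep
        self.pulling_low = set()      # MPSs currently holding the line down

    def start_request(self):
        """Every member pulls the line low when the lockstep issues a request."""
        self.pulling_low = set(self.members)

    def serviced(self, mps: int):
        """An MPS releases the line once its own request has been serviced."""
        self.pulling_low.discard(mps)

    def level(self) -> int:
        """The line reads 1 only when no member is pulling it low."""
        return 0 if self.pulling_low else 1
```

For a lockstep of MPSs 1 and 2, the line stays at 0 after MPS 1 is serviced and rises to 1 only when MPS 2 is serviced as well, so neither processor proceeds ahead of the other.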
In addition to accessing memory, pipelined modes send data over the data bus to other MPSs in the pipe. Figure 12 shows the BCU configuration for pipelined data transfers. In this BCU configuration, the data bus is segmented to allow all adjacent MPSs in the pipe to transfer data in parallel; this includes transfers to MPSs on different RICs.
Each memory module is addressed with a 16-bit address. This allows for eventual growth of up to 64K bytes of directly addressable space for each of four MPSs. However, an MPS supports both 16- and 32-bit addresses. Sixteen-bit addresses are used to directly access an MPS's own memory module. Thirty-two-bit addresses are used to access other memory modules or external memory. In accessing other memory modules, the most significant 14 bits comprise a tag indicating that the address is for an internal memory module. The next two significant bits select one of four memory modules. The remaining 16 bits point to an address in an internal memory module. If a 32-bit address does not point to an internal memory module directly, it is either an external or a mapped address, depending upon MPS control. If the address is designated as external, it is sent to the external memory interface for processing. Otherwise, it is sent to the memory mapper. The memory mapper uses an associative search to determine whether the address is internal or external. If it is internal, the associated internal address is sent to internal memory. If it is external, it is sent to the external memory interface.
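The 32-bit address handling described above can be sketched in software. Several details here are assumptions made only for illustration: the article does not give the value of the 14-bit internal tag, so `INTERNAL_TAG` is hypothetical, and a dictionary keyed by full address stands in for the mapper's associative search, whose real granularity is not stated.

```python
INTERNAL_TAG = 0x3FFF  # hypothetical 14-bit tag value; not given in the article

def decode_32bit(address: int):
    """Split a 32-bit address into (is_internal, module, offset)."""
    tag = (address >> 18) & 0x3FFF      # most significant 14 bits
    module = (address >> 16) & 0x3      # next two bits select one of four modules
    offset = address & 0xFFFF           # remaining 16 bits address within the module
    return tag == INTERNAL_TAG, module, offset

def route(address: int, designated_external: bool, mapper: dict):
    """Return ("internal", module, offset) or ("external", address)."""
    is_internal, module, offset = decode_32bit(address)
    if is_internal:
        return ("internal", module, offset)
    if designated_external:
        return ("external", address)
    # Mapped address: the dict lookup stands in for the associative search.
    if address in mapper:
        mapped_module, mapped_offset = mapper[address]
        return ("internal", mapped_module, mapped_offset)
    return ("external", address)
```

An address carrying the internal tag goes straight to the selected module; any other address goes either to the external interface or through the mapper, depending on how the MPS has designated it.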
External interface
The external interface for the RIC is designed to support multiple RIC configurations, interchip communication, and data-path communication between system memory and system I/O. Two versions of pin assignment are planned: an 82-pin version with two 16-bit data/address ports and a 114-pin version. They are the same, except that the 114-pin version has two 32-bit data/address ports. The 82-pin RIC is discussed below. Figure 13 illustrates the RIC pin assignment. There are five types of pin functions for the RIC:
* data/address,
* control,
* interrupt,
* status, and
* power/clock.
The number of pins dedicated to each function group is listed in Table 1.
Data port. Each of the two 16-bit data/address ports has 16 bidirectional lines for carrying data and addresses. Associated with each port is a pair of handshake signals for gaining control of shared resources. The arbitration method for the shared resources is either round robin by demand, master-slave, or determined by external circuitry. The round-robin and master-slave arbitration methods support a shared bus, while the external arbitration circuitry supports a network with a general topology. Each port has a pair of signals for synchronizing the sending and receiving of data and addresses on the bus. Also, each port has a bidirectional set of three signals to indicate bus status. These three signals indicate four types of read/write operations:
* access a user-specified RIC,
* access system RAM,
* access system I/O, and
* access the resource whose destination address is sent at the beginning of the access.
Finally, each port has a bidirectional pair of MPS tag identifiers. These are used to indicate the source MPS at the sender and/or the destination MPS at the receiver. The two data/address ports are independent. However, they can be combined into one port by internally performing the same operation to both ports concurrently and externally treating the two as one.
Figure 13. A restructurable IC pin assignment.
Status port. There are two identical status ports. Their main function is to provide the signal to lockstep two MPSs on different RICs. Status port 1 can be used to lockstep MPS 0, MPS 1, or to lockstep MPSs 1 and 0 to external MPSs. Status port 2 can be used to lockstep MPS 2, MPS 3, or internal locksteps, including MPS 2 and/or MPS 3 to external MPSs. In Figure 6,
Interrupt port. The interrupt port serves two purposes. First, it receives and processes interrupts in a manner similar to that of conventional microcomputers and microprocessors. Second, it provides for interchip communication. The interrupt concept has been generalized to include the capability of sending interrupts to other receivers, thus providing for interchip communication.
The purpose of interchip communication is to coordinate RICs to carry out a task, to initiate a task, and to transfer information. The interchip communication system is used to transmit commands and/or small amounts of data. The bulk data part of an information transfer is communicated between memories. For example, a disk read operation is initiated by using the interrupt port of a RIC to send commands to a disk controller. The data transfer between the disk system and the memory system is accomplished on a separate data path. The interrupt port contains pins for arbitration, information, and data transfer synchronization.
The interrupt port of the RIC has eight pins. Two are for arbitration of shared resources used during the sending of an interrupt. The same three arbitration modes used for the data ports apply to the interrupt port: round robin, master-slave, and general arbitration.
Four pins of the interrupt port are dedicated to data transfer. The data protocol has minimal specification and maximal user definition. In the interchip communication mode, the first information sent on these pins is an address. The length of the address is designated by the user.
When an interrupt is sent, all chips on a common interrupt bus receive and store the address. The status signals indicate whether the information lines carry address or data. The receiver buffers the address portion as long as the status indicates address bits are being sent. After the destination address has been sent, each receiver uses the address to access a bit in the chip's RAM, to determine whether this chip is an intended receiver of the interrupt. In the conventional interrupt scheme, the first informa-
The major goal of the RIC project is to create a highly flexible part that can be used to form a wide variety of specific hardware designs through programmation. The designed-in flexibility of the RIC provides for this programmation. The RIC's flexibility features include user-definable microlanguage and assembly language, user-programmable microcode, dynamic coordination of multiple internal processors, coordination of processors on multiple RICs, internal memory that can be used either as a cache or as an element of a virtual memory hierarchy, general topology for interchip communication and external data paths, and a user-definable interrupt mechanism.
A single RIC has been programmed to implement a VAX-11/780 instruction set processor. The
