Abstract: The Mod 2 Neurocomputer, the latest in a series of neurocomputing systems at the Naval Air Warfare Center Weapons Division, is a neural network processing system incorporating individual neural networks as subsystems in a layered hierarchical architecture. The Mod 2 is designed to support parallel processing of image data at sensor (real-time) rates. Basic concepts implemented in the Mod 2 are (1) maintaining data representations as frames of data processed as a whole at each layer, (2) a general interconnect design supporting data transfer requirements such as generation of parallel pathways, fan-up/fan-down, and feedforward and feedback, and (3) a neuroprocessing block supporting several neural network paradigms. The basis for the system implementation is the INTEL 80170NX neural network processor. Examples are given of the implementation strategy for neural substructures such as the multilayer perceptron and temporal and spatiotemporal image processing, as well as of the implementation of a multifunction processing system.
I. INTRODUCTION
A. Overview
This paper describes the architecture and system design of the Mod 2 Neural Network Processor, which is being built at the Naval Weapons Center. The Mod 2 has been designed to be a neural network processing system, incorporating separate individual neural networks as subsystems in a flexible, modular architecture. Both the individual neural network blocks and the system blocks integrate into a layered hierarchical structure.
The basic structure is a hierarchy of locally densely connected, globally sparsely connected networks. The locally densely connected network is implemented in a modular block structure based upon the INTEL 80170NX (ETANN) neural network chip. The building blocks of the system are the "hardware layers." Each hardware layer is based upon one INTEL 80170NX (ETANN) neural network processor, is designed to process data at a rate of 1.2 G-int/s, and incorporates 384K of data memory. Fig. 1 shows an overview of the Mod 2 system. The Mod 2 is designed to be, in concept, infinitely extensible. However, the initial implementation is being built with 12 hardware layers, each of which will comprise one or more neural layers. Physically, they will be arranged into groups of six hardware layers.
An individual hardware layer is designed to support any of several neural network paradigms. The basic paradigm is a recurrent variation on the multilayer perceptron, incorporating temporal feedback at each layer. This recurrence can be implemented either as single-layer (local) feedback or as information fed back from higher levels of processing. Iterative networks such as Hopfield networks can also be implemented in a clocked (discretely iterated) fashion.
Manuscript received July 11, 1991; revised October 24, 1991. The authors are with the Naval Weapons Center, China Lake, CA 93555. IEEE Log Number 9106979.
B. Background
The application for the Mod 2 processor is to perform real-time demonstrations of image processing using neural network technology. As an image signal processing system, the Mod 2 is designed to incorporate multiple interacting neural networks as subsystems within a block/functionally defined structure. The intended application requires that the processor be capable of real-time processing of incoming data and that it also minimize factors such as transport delay.
The Mod 2 is the fourth generation of Naval Weapons Center neural network processor developments, and the first to incorporate a system development architecture such as the one described in this paper.
The motivation for using neural network approaches lies in the biologically inspired features of their nonlinear processing paradigms with distributed information representations. The parallel nature of these paradigms severely limits the ability of conventional (von Neumann) processors to process image data in real time; even on very fast general-purpose computers, simulations of the processing are typically very slow. The Mod 2 has been designed to implement this powerful nonlinear pattern processing paradigm in a fast, flexible laboratory test-bed.
The basic inspiration for the system architecture of the Mod 2 is derived from studies of neurobiological systems. Many studies have shown that such disparate systems as the visual [1]-[3], the olfactory [4], and the auditory (see [5] for a good overview) are arranged as hierarchical layered systems that are locally densely connected but globally sparsely connected.
In the vision system, the basic features are local processing to extract features and structures, building to higher level representations which have large fields of view through the progressive fan-in of succeeding layers. It is the structure of these types of biological systems that has inspired the Mod 2 system architecture.
C. Development Goals
The objective of the Mod 2 development is to build a flexible, reconfigurable, fast general-purpose neural processor which can be easily tested in an interactive manner and integrated with an imaging sensor for real-time integrated processing demonstrations. In the past there have been many hardware component developments [6]-[8] but relatively few computational system developments [9], [10] for neural network systems. One of the design goals of the Mod 2 system is flexibility. The system is based upon the INTEL ETANN [6].
The Mod 2 has been designed primarily as a processing system. The interconnect structure and weight matrices for the system will be defined through off-line analysis and computation. The capability has been designed into the system to do on-line learning, although that is not a primary mode.
Flexibility in the architecture was important, since the image processing algorithms are being developed concurrently and interactively with the Mod 2. The Mod 2 was therefore defined so that the structures it supports match the simulated designs generically, while the specifics are refined in simulation. The Mod 2 must also be flexible enough to integrate with sensor systems having various data structures and formats.
At this stage in the development, the Mod 2 is designed to be a laboratory test platform. It is not designed to fit the very stringent form factor constraints of a missile processor.
D. Design Philosophy
The Mod 2 has been designed to support processor systems engineering in a very natural manner. The requirements allocation portion of the systems engineering process is an iterative top-down/bottom-up process to ensure that the functional requirements (top-down) can be allocated onto the hardware and that the hardware (bottom-up) can support the allocated functional requirements. The modular hierarchical structure of the Mod 2 naturally supports the block allocation of the processing requirements onto subsystems. The design engineer is required to define each subsystem as a functional block. The Mod 2 then supports this functional block as a unit, defined by its inputs, outputs, and throughput time allowance. A functional block may itself consist of one or more separate neural networks.
The systems engineering procedure assumes that a separate simulation (or emulation) of the neuroprocessing design will exist in a general-purpose computer system. The simulation would be developed in parallel with the Mod 2 subsystem block definition and allocation process, and would be expected to migrate onto the Mod 2 as the subsystem blocks mature, passing through a hybrid simulation/processor-in-the-loop stage. This hybrid stage can be very useful for development, since for large processing problems it has been found that even on very fast general-purpose computers the neural network system computational speed is often painfully slow.
Flexibility also drives the physical design. The Mod 2 is modular and expandable, with the processing hardware expandable in cages of up to six hardware layers. Within a physical card cage, the layers can be configured or partitioned in a relatively arbitrary fashion. Each layer supports several neural network paradigms, essentially based upon the perceptron-type model. It is important to note that the memory in each hardware layer directly supports spatiotemporal processing, which is especially important for processing images that evolve dynamically over time.
E. System Requirements
The qualitative system goal was to develop a processing system, composed of layers of processing hardware operating synchronously, with relatively arbitrary interconnect structures. It was required that the processor be capable of operating at real-time speeds synchronously with a sensor and accommodating the sensor data frame size.
Structurally, the system requirement is that the architecture support synchronous operation, processing imaging sensor data at the frame rate using a variety of architectural structures. Specific interconnect structures are defined during specific system development and are expected to remain constant during operation.
At the lowest level of the architecture, the hardware layer, the design requirements were driven primarily by the application to image processing. In these applications, the requirement was to process local information in the scene, with the same local processing being applied across the scene. More global representations were to be built up through the hierarchical structure.
Quantitatively, it was required that a hardware layer, the smallest processing block, be able to process up to a 64K data frame in 6.25 ms, where the data are of 8-bit accuracy. Local processing within a hardware layer is supported by the neuroprocessor device accepting a 64- or 128-wide input vector, with a 64-wide maximum output vector, while maintaining 8-bit accuracy. The neuroprocessor device must be capable of processing the 64- or 128-wide input vector in 6.5 µs, resulting in a 1.2 G-int/s rate. System throughput is determined by the number of layers, the frame rate, and any special requirements of a specific layer. The basic requirement is that each hardware layer accommodate up to a 64K data frame within the synchronous frame time allowance. Noise and accuracy are determined by the performance of the neuroprocessing device.
Analysis of the application requirements showed that each hardware layer required two separate input channels, local memory, and the capability to fan up or fan down the data frame size. Two typical uses of the separate input channels are 1) for direct and feedback inputs, and 2) for current and previous frame data. Local memory was required so that output data could be stored for integration with the next data frame, forming a spatiotemporal processor. The application requirements analysis also showed that several generic interconnect structures would be required. These include feedforward/feedback, fan-up/fan-down, branching into new channels, and combination of multiple inputs. Feedback (or recurrence) is considered here more in the context of signal processing or control systems. It is found in biological systems and is important in certain neural network paradigms [12]-[14].
II. BASIC CONCEPTS
The basic concept of the Mod 2 is to implement a block- or subsystem-defined processing system. Three features which are key to understanding the Mod 2 are frame-based processing, a general interconnect structure, and a neuroprocessing block (Fig. 2).
The frame-based concept is that the basic information representation is as a frame or matrix of data, and that all processing is completed upon a frame of data before any new data are accepted and before passing the completed results to the next layer or block. This concept is application-driven since image processing in real time requires a continuous pipelined processing approach.
The general interconnect structure concept is that, ideally, any layer can pass data to any other layer, including itself. This means that features such as fan-up/fan-down, branching/feedforward, channel combination, multilayer neural structures, and both local and global feedback can be supported. The multiple interconnects allow the feedforward and feedback interactions of multiple basic neural networks.
The neuroprocessing block concept is that the hardware layers can be configured into distinct subsystem blocks, which incorporate neuroprocessing, memory, input/output, etc. Each block functions as an autonomous unit, so that (once developed) it may be considered solely in terms of its inputs and outputs. The smallest subdivision is the hardware layer, which itself can be a subsystem block. A single hardware layer has a direct input for current-frame data, a recursive input for alternative or feedback information, a local memory for last-output feedback, and a neuroprocessing subsystem. 
III. IMPLEMENTATION

A. Physical Configuration
An overview of the physical configuration is shown in Fig. 3. The primary construction goal has been to build a laboratory processing system having maximum flexibility for functional demonstrations, as opposed to a specific design suitable to a specific weapon system. The neuroprocessing hardware layers are physically installed in standard VME-backplane card cages. Each physical cage is designed to hold up to six hardware layers, plus the necessary bus repeaters. The bus repeaters allow passing up to four LBUSes from one cage to another. Fig. 4 shows a more detailed view of the layer functional interaction.
The host computer handles setting up the system, communicating with the hardware layers over the VME-bus portion of the backplane (however, the host never interfaces directly to an LBUS), and monitoring system operations.
B. Functional Implementation
1) Frame-Based Processing:
Implementing frame-based processing requires synchronizing computation and data transfer, which are separate functional subblocks within a hardware layer. Within the hardware layer, computation is triggered by a flag stating that the input and output processes are completed. Similarly, when computation has been completed on a frame of data, the input/output processes are triggered. (Note that when a neuroprocessing block consists of several hardware layers, the designer must consider and evaluate the inherent transport delay, which is proportional to the number of hardware layers.)
This concept places a speed/margin requirement on all communication devices so that all information transfers can be completed within the frame time. This gives a lower bound on frame time, as well as an upper bound on propagation delay. Efficient implementation of this concept also implies a compute-while-pass data subarchitecture.
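The flag-triggered sequence of compute and transfer phases can be illustrated with a small software model. This is a hypothetical sketch, not the Mod 2 control logic: each hardware layer is stood in for by a Python function applied to a whole frame, and each loop iteration represents one frame time in which all layers first compute and then all transfers complete. In this idealized model a frame that arrives during tick k leaves the last of N chained layers at tick k + N, which is the transport delay proportional to the number of hardware layers noted above.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Frame = List[int]
Layer = Callable[[Frame], Frame]


def run_frame_synchronous(layers: Sequence[Layer],
                          frames: Sequence[Frame]) -> List[Tuple[int, Frame]]:
    """Idealized model of frame-synchronous hardware layers.

    Each loop iteration is one frame time: every layer first processes the
    whole frame held in its input memory, then all outputs are transferred
    to the next layer. Returns (tick, output frame) pairs from the last layer.
    """
    n = len(layers)
    held: List[Optional[Frame]] = [None] * n  # frame waiting in each layer's input memory
    results: List[Tuple[int, Frame]] = []
    for tick in range(len(frames) + n):
        # Compute phase: no transfer happens until every layer has finished its frame.
        computed = [layer(f) if f is not None else None
                    for layer, f in zip(layers, held)]
        if computed[-1] is not None:
            results.append((tick, computed[-1]))
        # Transfer phase: a new sensor frame enters layer 0, outputs shift down the chain.
        incoming = frames[tick] if tick < len(frames) else None
        held = [incoming] + computed[:-1]
    return results


if __name__ == "__main__":
    add_one = lambda f: [x + 1 for x in f]
    double = lambda f: [2 * x for x in f]
    print(run_frame_synchronous([add_one, double], [[1, 2], [3]]))
```

With two chained layers, the first frame emerges two ticks after it was delivered. The model runs computation and transfer one after the other for clarity; in the hardware they overlap (compute-while-pass).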
2) Interconnect Structure: The basic function of the interconnect structure is to provide a flexible and fast data frame transfer capability. Flexibility is achieved by having both multiple physical pathways and a data frame/subframe structure. Fig. 5 shows the basic interconnect functional requirements that the structure must support. The interconnect structure is implemented through a parallel set of local buses (LBUSes). On any LBUS, only one device is a transmitter, while any receiver on the bus can receive the data available. Each hardware layer has two primary inputs, to add further flexibility.
The data frame structure has been designed so that the available transfer time can support the transmission of four subframes from different transmission paths. Each transmitter sends out a complete data frame; at the receiving site, a receiver can select any of the four subframes.
3) Neuroprocessing Block: The function of a neuroprocessing block is to receive (single or multiple) inputs, transmit output data, and support any of several neuroprocessing paradigms. The smallest subdivision is the hardware layer, which has input and output and local memory and supports neuroprocessing paradigms as shown in Fig. 6 .
Since the neuroprocessing "engine" (the INTEL 80170NX) accepts inputs as 64- or 128-wide data vectors, there is a function in the hardware layer known as tiling. Tiling is the process of breaking the input frame down into smaller vectors which the neuroprocessor can handle. The neuroprocessor functions locally over portions of the data frame. Global representations may be built up through the layered structure.
C. Interconnect (LBUS) Structure
There are two bus systems in the complete Mod 2, the VME bus and the local bus (LBUS) system. Existing on the reserved portion of the VME backplane, the VME bus is used only for setup and monitoring, and will not be discussed in this section. The primary interlayer communication pathways are formed over the LBUSes, of which there are ten parallel buses in each cage.
The LBUS interconnect system is implemented on the user-defined portion of the VME backplane. The local bus system provides the flexible real-time interconnect structure. During real-time operations, only the LBUS system is used for data interconnections.
Each card cage can hold up to six hardware layers, so there are six local LBUSes, plus four global LBUSes. The local and global LBUSes are distinguished by the fact that, within a cage, a given hardware layer can transmit only over a local LBUS. A single layer can transmit on only one LBUS, while it can receive on any of the ten. The global LBUSes are used to bring data from outside a given card cage. Fig. 7 shows a schematic of the local bus interconnection concept.
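The bus rules stated above lend themselves to a setup check run before operation. The sketch below is illustrative only: the bus names (`local0` to `local5`, `global0` to `global3`), the `LayerSetup` record, and the `check_cage` helper are hypothetical, and the limit of eight quadrant receivers per layer is taken from the hardware layer description later in this section.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Hypothetical bus names: six local LBUSes and four global LBUSes per cage.
LOCAL_LBUSES = [f"local{i}" for i in range(6)]
GLOBAL_LBUSES = [f"global{i}" for i in range(4)]
ALL_LBUSES = set(LOCAL_LBUSES) | set(GLOBAL_LBUSES)


@dataclass
class LayerSetup:
    name: str
    transmit_on: str  # the single LBUS this layer drives
    receive_from: List[Tuple[str, int]] = field(default_factory=list)  # (LBUS, quadrant 0-3)


def check_cage(layers: List[LayerSetup]) -> List[str]:
    """Return the rule violations for one card cage (an empty list means OK)."""
    problems: List[str] = []
    if len(layers) > 6:
        problems.append("a cage holds at most six hardware layers")
    driver: Dict[str, str] = {}
    for layer in layers:
        # A layer transmits only on a local LBUS, and each LBUS has one transmitter.
        if layer.transmit_on not in LOCAL_LBUSES:
            problems.append(f"{layer.name}: can transmit only on a local LBUS")
        elif layer.transmit_on in driver:
            problems.append(f"{layer.name}: {layer.transmit_on} already driven by "
                            f"{driver[layer.transmit_on]}")
        else:
            driver[layer.transmit_on] = layer.name
        # Receivers may listen to any of the ten LBUSes, one quadrant each.
        if len(layer.receive_from) > 8:
            problems.append(f"{layer.name}: only eight quadrant receivers available")
        for bus, quadrant in layer.receive_from:
            if bus not in ALL_LBUSES or not 0 <= quadrant <= 3:
                problems.append(f"{layer.name}: invalid receiver setting ({bus}, {quadrant})")
    return problems


if __name__ == "__main__":
    bad = [LayerSetup("A", "local0"), LayerSetup("B", "local0"), LayerSetup("C", "global1")]
    for problem in check_cage(bad):
        print(problem)
```

For the configuration in the usage example, two layers drive `local0` and a third tries to transmit on a global bus, so two violations are reported.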
At the level of architectural definition, the LBUS design is a data-transfer procedural definition, independent of whether it is a digital or an analog implementation. The allocated design goal is to transfer a 64K frame in 3.28 ms, which gives adequate margin over the basic design requirement of 6.25 ms. Each bus is a dedicated, nonarbitrated data bus with a fixed message format, called the frame/quadrant structure. A frame of data is a 64K data block, which is subdivided into four 16K data quadrants.
For the Mod 2, the LBUS structure was chosen to be implemented as a set of digital buses. Each LBUS operates as a bit-parallel word-serial transfer, 8 bits wide, at a 20 Mbyte/s rate. This design was chosen over an analog bus system in order to both provide the needed flexibility and minimize system noise on the buses. The design trade-off was that this approach increased the complexity of the hardware interface design.
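The allocated 3.28 ms transfer goal follows directly from the bus width and rate. This short calculation assumes that "64K" means 65,536 bytes and that 20 Mbyte/s means 20 x 10^6 bytes per second; under those assumptions it reproduces the 3.28 ms figure and shows the margin over the 6.25 ms frame requirement.

```python
FRAME_BYTES = 64 * 1024            # one 64K frame of 8-bit samples
QUADRANT_BYTES = FRAME_BYTES // 4  # one 16K quadrant
LBUS_BYTES_PER_S = 20e6            # 8 bits wide at 20 Mbyte/s (assumed 10**6 bytes per Mbyte)
FRAME_TIME_S = 6.25e-3             # basic design requirement per frame

frame_transfer_s = FRAME_BYTES / LBUS_BYTES_PER_S
quadrant_transfer_s = QUADRANT_BYTES / LBUS_BYTES_PER_S
margin = FRAME_TIME_S / frame_transfer_s

print(f"frame transfer:    {frame_transfer_s * 1e3:.2f} ms")
print(f"quadrant transfer: {quadrant_transfer_s * 1e3:.2f} ms")
print(f"margin over the 6.25 ms frame time: {margin:.2f}x")
```

It prints a frame transfer time of 3.28 ms, a quadrant transfer time of 0.82 ms, and a margin of roughly 1.9 times.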
In a given hardware layer, the output data transmitter sends out the entire 64K data block. A receiver on another given hardware layer is set up to bring in any 16K quadrant from any LBUS, local or global. For example, in its simplest setting a receiver can pick up the entire 64K data block by picking up each of the four quadrants in sequence. Alternatively, it could be set to bring in four separate quadrants from four separate LBUSes. When the transfer is complete, the frame computation is allowed to begin.
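The quadrant selection described above can be sketched as follows. The example assumes, for illustration only, that a quadrant is a contiguous quarter of the transmitted data stream and that receiver k fills quadrant k of the local input memory; the function names are hypothetical, and the toy frames are 8 elements long rather than 64K.

```python
from typing import Dict, List, Sequence, Tuple


def quadrants(frame: Sequence[int]) -> List[List[int]]:
    """Split a transmitted frame into its four equal quadrants, in transmission order."""
    size = len(frame) // 4
    return [list(frame[k * size:(k + 1) * size]) for k in range(4)]


def receive_frame(lbuses: Dict[str, Sequence[int]],
                  settings: Sequence[Tuple[str, int]]) -> List[int]:
    """Assemble an input frame from four quadrant receivers.

    settings[k] = (LBUS name, quadrant index) chosen for receiver k;
    receiver k fills quadrant k of the local input memory.
    """
    if len(settings) != 4:
        raise ValueError("one setting per quadrant receiver is required")
    frame: List[int] = []
    for bus, q in settings:
        frame.extend(quadrants(lbuses[bus])[q])
    return frame


if __name__ == "__main__":
    buses = {"A": list(range(8)), "B": list(range(100, 108))}
    print(receive_frame(buses, [("A", q) for q in range(4)]))            # whole frame from A
    print(receive_frame(buses, [("A", 0), ("B", 1), ("A", 2), ("B", 3)]))  # mixed sources
```

The first call reproduces the "simplest setting" (all four quadrants from one bus); the second mixes quadrants from two buses.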
In the context of a neuroprocessing block, this structure will support either interblock or intrablock transfers. It is the responsibility of the system design engineer to ensure that the transfers between hardware layers within a single block are not picked up by another hardware layer outside that block, since a local LBUS spans the width of the cage. Because an LBUS ends at the cage boundary, LBUSes can be reassigned to alternative uses when data pass to another cage.
To pass data to another cage, a bus repeater can select any local LBUS to be repeated onto the global LBUS input lines of another cage.
D. Neuroprocessing Block
A neuroprocessing block is implemented through combinations of one or more hardware layers. Communication between hardware layers within a block is accomplished using the LBUS system, which also carries the block's external inputs and outputs.
E. Hardware Layer
In the Mod 2, a hardware layer is a neuroprocessing layer incorporating a single 80170NX ETANN. Fig. 8 shows schematically the modes that the ETANN is capable of supporting. Since an ETANN computes rapidly but has only 64 inputs (versus the 64K bytes defined in a data frame), the hardware layer performs the operation of tiling across the data space. The image processing operations required are all local operations, which suits this approach well. The tile shape and offset are both user selectable.
A hardware layer is physically composed of two circuit boards, known as the DATA and the ANN boards. The DATA board is in control, handles all interfacing to the external world, and contains the memories and reformatters. The ANN board accomplishes the neural processing. The ANN board is the only place where analog signals are resident; it converts inputs from digital into analog, and the outputs back again. It also incorporates the chip control logic. Within this architecture, the ANN board interfaces solely with the DATA board.
The major subsections of a hardware layer were shown in Fig. 4 . These subsections include input, output, local memory, control, and neuroprocessing.
The hardware layer incorporates two basic input channels, labeled "direct" and "recursive" inputs, both capable of receiving data frames from the LBUS structure. The "direct" input interfaces to the upper 64 inputs of the ETANN, while the "recursive" input interfaces to the lower 64 inputs. The recursive input can also receive its input directly from the output of the ETANN. Each of these inputs is buffered through a memory section, the direct and the recursive memory, respectively. Output from the neuroprocessor is available both to the recursive memory and to the output memory. Each memory is a 64K dual-bank design, so that data are not overwritten until the processing frame is completed. While a hardware layer is computing, the alternate bank of each input memory can be filled and the alternate bank of the output memory can be sent out. Upon completion of the computation, the banks switch and computation on the next frame begins.
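The dual-bank arrangement is a ping-pong (double) buffer. A minimal sketch, with hypothetical class and function names, models one such memory: during each frame the computation owns one bank while transfers use the other, and the banks switch once both are finished. The loop runs the two activities one after the other for clarity; in the hardware they proceed concurrently.

```python
from typing import Callable, List, Sequence


class DualBankMemory:
    """Ping-pong memory: the computation reads one bank while I/O uses the other."""

    def __init__(self, size: int) -> None:
        self.banks: List[List[int]] = [[0] * size, [0] * size]
        self.active = 0  # index of the bank the computation currently owns

    @property
    def compute_bank(self) -> List[int]:
        return self.banks[self.active]

    @property
    def io_bank(self) -> List[int]:
        return self.banks[1 - self.active]

    def swap(self) -> None:
        """Called once per frame, after both computation and transfer have completed."""
        self.active = 1 - self.active


def process_stream(frames: Sequence[Sequence[int]],
                   compute: Callable[[List[int]], int]) -> List[int]:
    """Sequential model of overlapped input transfer and computation."""
    memory = DualBankMemory(len(frames[0]))
    results: List[int] = []
    for k in range(len(frames) + 1):
        if k < len(frames):
            memory.io_bank[:] = frames[k]                 # receive frame k into the idle bank
        if k > 0:
            results.append(compute(memory.compute_bank))  # meanwhile process frame k - 1
        memory.swap()                                     # both finished: banks switch
    return results


if __name__ == "__main__":
    print(process_stream([[1, 2], [3, 4], [5, 6]], sum))
```

Each frame is processed one frame time after it is received, and no frame is overwritten while it is still being computed on.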
To support the tiling operation, there are three reformatters, one each for input, recursive, and output memories (Fig. 9 shows examples of data reformatting). In each case, the data window is a 64-element data window of arbitrary shape. That data window can be tiled across the 64K data space in slide values (vertical and horizontal) of 1-8. The typical expected window is a square 8 x 8 window, with a typical tile slide-by of 4. However, each data window can be different, except that if the output is simultaneously written to output and recursive memories, it must be written with the same window. Another common output window is the separation window, where subsections of the chip doing different tasks are sent to separate sections of the output memory.
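Tiling with an arbitrary 64-element window can be sketched as a list of (row, column) offsets stepped across the frame. In this hypothetical sketch the window must stay entirely inside the frame; how the Mod 2 reformatters treat frame edges is not described here, so that choice is an assumption. The default window and slide match the typical 8 x 8 window with a slide-by of 4.

```python
from typing import Iterator, List, Sequence, Tuple

Offset = Tuple[int, int]

# Typical window: a square 8 x 8 block, listed row by row (64 offsets).
SQUARE_8X8: List[Offset] = [(i, j) for i in range(8) for j in range(8)]


def tile_origins(height: int, width: int,
                 window: Sequence[Offset] = SQUARE_8X8,
                 slide_v: int = 4, slide_h: int = 4) -> Iterator[Offset]:
    """Yield the top-left origin of every window position (window kept inside the frame)."""
    if len(window) != 64:
        raise ValueError("the neuroprocessor window has exactly 64 elements")
    if not (1 <= slide_v <= 8 and 1 <= slide_h <= 8):
        raise ValueError("slide values must be between 1 and 8")
    span_v = max(dr for dr, _ in window) + 1  # window height in pixels
    span_h = max(dc for _, dc in window) + 1  # window width in pixels
    for r in range(0, height - span_v + 1, slide_v):
        for c in range(0, width - span_h + 1, slide_h):
            yield r, c


def gather(frame: Sequence[Sequence[int]], origin: Offset,
           window: Sequence[Offset] = SQUARE_8X8) -> List[int]:
    """Read one window into the 64-element vector presented to the neuroprocessor."""
    r, c = origin
    return [frame[r + dr][c + dc] for dr, dc in window]


if __name__ == "__main__":
    frame = [[16 * y + x for x in range(16)] for y in range(16)]
    print(len(list(tile_origins(16, 16))))  # origins at rows/cols 0, 4, 8
    print(gather(frame, (4, 8))[:3])
```

On a 16 x 16 test frame the default settings give nine window positions, and a 4 x 16 strip window (also 64 elements) fits only four positions down the frame, illustrating how the window shape changes the tiling.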
Data input and output may occur over either the VME (host) bus or one of the LBUSes. Owing to its more general nature, the VME bus is slower than the LBUS; consequently, it is not used as a real-time pathway. However, the VME side of the interface can be used for input or output to a hardware layer or subsystem block. Within a hardware layer, the LBUS interface has one frame-sized transmitter and eight quadrant-sized receivers (four for direct and four for recursive inputs). The transmitter sends out a full 64K data frame. Each receiver is independently assigned to one LBUS and one quadrant.
The basic neuroprocessing device is the commercially available INTEL 80170NX ETANN. This device is an analog neural processor featuring 64 or 128 inputs, 64 outputs, and 10240 weights (including the bias weights). The slope of the sigmoid function is user selectable. The chip features a forward propagation time of 3.6 µs, giving a potential capability of 2.27 G-int/s. Since it was decided early in the Mod 2 development that it would be extremely difficult to develop a general, low-noise analog data interconnect structure, it was decided to interface digitally to the 80170NX. This was a decision of expediency: it gives great flexibility, but limits the ability of the processor to be physically shrunk, as would be required for embedded weapons applications.
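The two throughput figures quoted in this section can be checked with a line of arithmetic each. The sketch assumes that one "interconnection" is one weight multiply-accumulate and counts only the 128 x 64 = 8192 non-bias weights of a full pass (the 10240-weight total includes bias weights).

```python
# Count only the 128 x 64 = 8192 non-bias weights as interconnections (assumption).
INPUTS, OUTPUTS = 128, 64


def interconnects_per_second(pass_time_s: float) -> float:
    """Weight multiply-accumulates evaluated per second for one full 128-in/64-out pass."""
    return INPUTS * OUTPUTS / pass_time_s


design = interconnects_per_second(6.5e-6)  # hardware-layer requirement: one vector per 6.5 us
chip = interconnects_per_second(3.6e-6)    # ETANN forward propagation time: 3.6 us

print(f"design requirement: {design / 1e9:.3f} G-int/s")
print(f"chip capability:    {chip / 1e9:.3f} G-int/s")
```

It gives about 1.26 and 2.28 G-int/s, consistent with the 1.2 and 2.27 G-int/s figures in the text.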
The ETANN was used since it is commercially available and meets the application requirements. The architecture can support the use of other neuroprocessing devices of comparable capacity; it is planned to do so when such devices become available.
F. Host/Monitor
The host computer handles setting up the system and communicating with the hardware layers over the VME-bus portion of the backplane. Note that the host never interfaces directly to an LBUS, which is intended solely for interlayer transfers. The basic purposes of the host are: a) during development, it serves as an interactive platform generating the initial data and gathering/monitoring results; b) during real-time operation with an external imaging sensor, it is basically a monitor. To accomplish this monitoring function, a limited number of hardware layers can be commanded to copy their outputs onto the VME bus, so that it can be read back into the host for evaluation, monitoring, or display. The limitation on the number of simultaneous output frames read back is set by the bandwidth limitation of the VME bus.
G. External Interfaces
There are two basic ways of using the Mod 2 processing system: non-real-time, with the host computer providing the input and accepting the output; and real-time, with an external sensor/seeker for input and output.
For non-real-time processor-in-the-loop testing, the Mod 2 is operated in conjunction with the host. Here, the Mod 2 is used to replace portions of the processing simulation. (The processing simulation has been developed on another computer. It is transported over to the host for this type of testing.) Input data sources can be externally provided to the host, or internally generated (synthesized).
With an external sensor, the sensor output is placed upon a global LBUS via a dedicated hardware interface. Then the hardware layers designated as input layers are set up to accept data from that bus. Similarly, data output from the Mod 2 is picked up from the LBUS system via another dedicated hardware interface. 
H. Training
The basic approach we are taking is that the Mod 2 is primarily a processor system. In this basic approach, training is to be done off-line. Typically, training is done with the processing simulation, generating a weight set. This weight set is then down-loaded into the 80170NX in a hardware layer. If needed, fine-tuning is done through further training, using the on-line training capability.
To load a weight set or to conduct on-line training, the normal procedure is to use another separate computational system, the Chip Characterization Station. However, if desired, another hardware board can be installed into the Mod 2 cage. Called the TestSet Board, this unit handles all functions associated with weight loading and modification for the ETANN.
Typically, off-line training is done using various versions of back-propagation [15], [16] to generate a weight set. On-line training is conducted through the use of the analog-weight-oriented Madaline III algorithm [17].
I. System Integration/Operation
System integration is equivalent to structural definition and programming in conventional computer and software system development. At the functional level with the Mod 2, system integration is setting up control and monitoring, external interfaces, interconnects, and the neuroprocessing layers.
The control function is accomplished by the host computer working over a control/monitor bus (the VME bus). During real-time operations with external sensors, the host basically functions to set up the Mod 2 and monitor its operations.
Monitoring is accomplished by trapping error flags from the hardware and by transferring specified output data frames to the host.
To integrate the interconnect structure, there are two basic functions to be accomplished: setting the physical pathways and defining the data frames. Data frames are defined by prerun control settings from the host to the hardware layers. The physical pathways are set by a physical connection on the DATA board between the LBUS inputs and the bus receivers.
In system integration, the hardware layer is set up by having its control functions defined from the host during initialization and by having its neuroprocessor weights set. Weights are loaded directly onto the neuroprocessing chip from the external simulation. Fig. 10 shows an implementation currently being developed, in which a system defined in terms of neural network subsystems is allocated onto the Mod 2. The system is a seeker image processing system and performs several functionally different tasks in parallel. The left side of Fig. 10 shows the functional block diagram, and the right side shows the allocation into hardware layers and the interconnection assignments. All of these layers operate synchronously in parallel on the data frames from the sensor; because each pathway contains a different number of hardware layers, the controller must be designed to accommodate the different latency of each path. This has not been a problem to date, since the outputs of the parallel pathways are used for different purposes.
IV. EXAMPLES
A. Block Allocation into Subsystems
B. Single-Layer Perceptron
The simplest structure to be implemented as a hardware layer is the single-layer perceptron, as shown in Fig. 11(a). In the example shown, it is desired to extract four features at each pixel location in the data frame. From the 16K input frame, the local processing design extracts four features at each of the central 16 pixels of an 8 x 8 pixel window. Consequently, the input window is moved across the 16K input frame in increments of four pixels ("slide by 4"). The outputs are stored by the reformatter into the four quadrants of the output memory, so that each 16K quadrant represents detection of its respective feature at that pixel location. Note that in this example, the output has fanned up to four times the size of the input. Since the details of how to tile the window can be rather involved but are of limited interest in the architectural description, they are omitted from the remaining examples.
Fig. 11(b) shows an example of combining two input frames. In this case, two 16K input frames from different sources are inserted into two different quadrants of input memory. The reformatter divides the 64-input window into two 32-element subwindows, one reading from each section, so that the network can combine the two data frames.
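The fan-up of Fig. 11(a) can be sketched as the scatter step that places each window's 64 outputs into four feature quadrants. The sketch assumes a square frame (128 x 128 for a 16K frame), a feature-major output ordering (output k is feature k // 16 at central pixel k % 16), and a `network` callable standing in for the ETANN; all of these are illustrative assumptions. Because the central 4 x 4 pixels of successive windows abut when the slide is 4, every pixel except a two-pixel border receives exactly one set of four features.

```python
from typing import Callable, List, Sequence

Image = List[List[float]]


def feature_layer(frame: Sequence[Sequence[float]],
                  network: Callable[[List[float]], List[float]]) -> List[Image]:
    """Slide an 8 x 8 window by 4 and scatter its 64 outputs into four feature maps.

    Assumed output ordering: output k is feature k // 16 at central pixel k % 16,
    with the 16 central pixels (rows/cols 2-5 of the window) in row-major order.
    """
    n = len(frame)  # square frame assumed, e.g. 128 x 128 for a 16K frame
    maps = [[[0.0] * n for _ in range(n)] for _ in range(4)]  # one map per output quadrant
    for r in range(0, n - 8 + 1, 4):
        for c in range(0, n - 8 + 1, 4):
            window = [frame[r + i][c + j] for i in range(8) for j in range(8)]
            for k, value in enumerate(network(window)):
                feature, pixel = divmod(k, 16)
                dr, dc = divmod(pixel, 4)
                maps[feature][r + 2 + dr][c + 2 + dc] = value
    return maps
```

With a 128 x 128 input, the result is four 128 x 128 maps (4 x 16K = 64K values), the fourfold fan-up described above; the outermost two rows and columns stay at zero because no window centre covers them.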
C. Multilayer Perceptron
A multilayer perceptron can be implemented in several ways in the Mod 2 (Fig. 12). The simplest is a mapping in which each hardware layer implements one neural network layer, in a very conventional concept. Any network layer with up to 128 inputs and 64 outputs can be implemented in this way. The mapping is most efficient for a fully connected network, and it can be very inefficient in terms of hardware usage if the network fans down rapidly, as in a 64/4/2 network.
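For the one-neural-layer-per-hardware-layer mapping, the hardware cost and weight utilization of a candidate network can be estimated directly. This hypothetical helper counts a 128-input, 64-output weight array per hardware layer and ignores bias weights; it illustrates why a rapidly fanning-down network such as 64/4/2 uses the hardware poorly.

```python
from typing import List, Sequence, Tuple

CHIP_INPUTS, CHIP_OUTPUTS = 128, 64


def one_layer_per_hardware_layer(sizes: Sequence[int]) -> Tuple[int, float]:
    """Map each neural layer of an MLP onto its own hardware layer.

    sizes lists the layer widths, e.g. [64, 4, 2]. Returns the number of hardware
    layers used and the fraction of the available 128 x 64 weights actually used
    (bias weights ignored).
    """
    pairs: List[Tuple[int, int]] = list(zip(sizes, sizes[1:]))
    for n_in, n_out in pairs:
        if n_in > CHIP_INPUTS or n_out > CHIP_OUTPUTS:
            raise ValueError(f"{n_in}-input/{n_out}-output layer exceeds one hardware layer")
    used = sum(n_in * n_out for n_in, n_out in pairs)
    available = len(pairs) * CHIP_INPUTS * CHIP_OUTPUTS
    return len(pairs), used / available


if __name__ == "__main__":
    print(one_layer_per_hardware_layer([128, 64]))
    print(one_layer_per_hardware_layer([64, 4, 2]))
```

A 128/64 network fills one hardware layer completely, whereas 64/4/2 occupies two hardware layers but uses only 264 of 16384 available weights (about 1.6 percent).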
There are at least two ways to implement multiple neural layers on a single hardware layer. The simplest is to use the two layers inside the ETANN to have two 64-in/64-out fully connected layers.
A second way to implement multiple layers is to iterate in the second layer of the chip, effectively synthesizing several sequentially smaller layers. Necessarily, each layer in the iteration must be somewhat smaller than the width of the basic chip. Intermediate-result neurons in the output data space are ignored. As long as the intermediate and final numbers of neurons total no more than 64, the network will fit into the ETANN.
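The iterated scheme can be modeled as a clocked network whose 64 neurons are split into disjoint blocks, one per synthesized layer. This is a simplified abstraction, not the ETANN's actual feedback wiring: tanh stands in for the chip's sigmoid, the names are hypothetical, and every pass updates all blocks from the previous pass's outputs, so synthesized layer k (counting from 0) becomes valid after k + 1 passes.

```python
import math
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def synthesize_layers(x: Sequence[float], layer_weights: Sequence[Matrix]) -> List[float]:
    """Emulate a stack of successively smaller layers on one 64-neuron array.

    layer_weights[k] has one row per neuron of synthesized layer k; layer 0 reads
    the external input x, layer k reads the outputs of layer k - 1. All layers
    share the 64-neuron array in disjoint blocks, and each pass (clock) updates
    every block from the previous pass's outputs.
    """
    sizes = [len(w) for w in layer_weights]
    if sum(sizes) > 64:
        raise ValueError("intermediate plus final neurons exceed 64")
    starts = [sum(sizes[:k]) for k in range(len(sizes))]
    state = [0.0] * sum(sizes)  # array outputs from the previous pass
    for _ in range(len(sizes)):  # one pass per synthesized layer
        new_state = [0.0] * len(state)
        for k, weights in enumerate(layer_weights):
            if k == 0:
                source = list(x)
            else:
                source = state[starts[k - 1]:starts[k - 1] + sizes[k - 1]]
            for i, row in enumerate(weights):
                new_state[starts[k] + i] = math.tanh(sum(w * s for w, s in zip(row, source)))
        state = new_state
    return state[starts[-1]:]  # final-layer neurons; intermediate results are ignored
```

After as many passes as there are synthesized layers, the final block holds the same result as an ordinary feedforward evaluation of the stack.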
Because the processing is synchronous with the frame rate, multiple hardware layers can be combined into a single block, which can be conceptualized as a single neural network. By defining the LBUS structure so that input and output are taken only from the LBUSes shown, the multiple layers are used by the system as a single layer.
D. Spatiotemporal Processing
As shown in Fig. 13(a), this example represents a network designed to process two different frames of data spatially and to combine the two frames (temporal processing). The specific application for this particular network is to evaluate feature change between the frames. The current frame is stored in the input memory, while the last frame is stored in the recursive memory. By bringing the same 64 elements of each frame into the network processor (through the direct and recursive inputs, respectively), the network processes the difference information between them both spatially and temporally.
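A single neuron of such a network can be sketched as follows. The weight choice is an assumption made for illustration: the same 64 spatial weights are applied with a positive sign to the direct input (current frame) and a negative sign to the recursive input (last frame), so the neuron responds to the spatially weighted change between frames. The function name and the `tanh` transfer function are hypothetical stand-ins.

```python
import math


def spatiotemporal_unit(current, previous, weights):
    """One neuron fed the same 64-element window from two frames.

    current  -- 64 values from the input memory (current frame)
    previous -- the same 64 pixel positions from the recursive memory (last frame)
    weights  -- 64 spatial weights, applied as +w to the direct input and
                -w to the recursive input (an assumed, illustrative choice)
    """
    direct = sum(w * x for w, x in zip(weights, current))
    recursive = sum(-w * x for w, x in zip(weights, previous))
    return math.tanh(direct + recursive)   # zero when nothing has changed
```

With identical frames the output is exactly zero; a pixel that brightens drives the output positive, and one that darkens drives it negative.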
Another spatiotemporal example is shown in Fig. 13(b), in which simple pixel-level feedback is incorporated in addition to the spatial processing of the network. In this example, the direct input to the network performs spatial processing, while the recursive input is used to bring in feedback from the previous output frame.
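The pixel-level feedback of Fig. 13(b) behaves like a recursive filter: each output pixel combines a spatially filtered input with the same pixel of the previous output frame. The sketch below is simplified to one-dimensional frames and assumes a single scalar feedback gain, zero padding at the frame edges, and `tanh` as the transfer function; the function name and parameters are hypothetical.

```python
import math


def feedback_sequence(frames, spatial_weights, feedback_gain):
    """Spatial processing plus pixel-level feedback from the previous output.

    frames          -- list of 1-D frames (equal-length lists), oldest first
    spatial_weights -- odd-length kernel applied to each frame (direct input)
    feedback_gain   -- weight on the same pixel of the previous output frame
                       (recursive input); a single scalar is assumed here
    Returns the list of output frames.
    """
    half = len(spatial_weights) // 2
    prev_out = [0.0] * len(frames[0])       # no feedback before the first frame
    outputs = []
    for frame in frames:
        out = []
        for i in range(len(frame)):
            spatial = sum(w * frame[i + k - half]
                          for k, w in enumerate(spatial_weights)
                          if 0 <= i + k - half < len(frame))   # zero padding
            out.append(math.tanh(spatial + feedback_gain * prev_out[i]))
        outputs.append(out)
        prev_out = out                      # this output feeds the next frame
    return outputs
```

With a positive gain, a response at a pixel persists into later frames and decays gradually, which is one way the feedback can help separate persistent changes from frame-to-frame noise.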
E. Image Processing
Each of the above examples has been drawn from image processing applications. The single-layer perceptron example has been used for spatial pattern detection and for spatial smoothing of image data. The spatiotemporal processing examples have been used for motion (i.e., change) detection and for motion detection in the presence of system noise. The synthesized multilayer example is also being used for spatial pattern detection. Spatiotemporal pattern detection has been accomplished by structures defined by multilayer versions of the spatiotemporal networks.
V. SUMMARY/STATUS
This paper has described the design concepts embodied in the Mod 2 neuroprocessor system. The Mod 2 is a processing system in which neural networks are used as subsystems, in an architecture which supports the flexible and efficient interaction of a hierarchy of neural subsystems. At the time of this writing, two hardware layers are operational, and 12 more are in the process of being built.
The Mod 2 is limited by the basic constraints under which it was developed: (1) application to real-time image processing tasks, (2) definition as a laboratory system from which a weapon-size embedded system can be derived, and (3) usage of commercially available neuroprocessing/other chips. The real-time constraint limits our interest in such iterative paradigms as Hopfield networks and simulated annealing. The laboratory test-bed condition forces the Mod 2 to be designed to maximize the flexibility of computational and interconnection pathways, which exacts a price in power and space. Usage of commercially available chips limits the paradigms that can be implemented, making winner-take-all structures, multiplicative structures, and higher-order correlation structures difficult or impossible to implement.
The Mod 2 is also limited by a number of constraints imposed by the practical considerations of hardware development. The frame data space was limited to 64K bytes, which limits the potential for direct fan-up of information output (if fan-up beyond 64K is required, the alternative is to spawn new parallel pathways). The number of LBUSes within a card cage was limited by the number of available backplane wires; at this point, this is not a serious limitation.
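The fan-up limit is simple arithmetic: with 16K input frames, a 64K-byte frame space allows at most a fourfold direct fan-up, which is exactly the fan-up of the single-layer perceptron example. The sketch below estimates how many parallel pathways a larger fan-up would require, assuming one byte per pixel and that each spawned pathway provides its own 64K-byte frame space; the helper name is hypothetical.

```python
FRAME_SPACE = 64 * 1024   # frame data space per pathway, in bytes


def pathways_needed(input_bytes, fan_up):
    """Parallel pathways needed to hold `fan_up` times the input frame.

    Assumes one byte per pixel and a separate 64K-byte frame space per pathway.
    """
    total = input_bytes * fan_up
    return -(-total // FRAME_SPACE)   # ceiling division without floats
```

For example, `pathways_needed(16 * 1024, 4)` gives 1, while a fivefold or eightfold fan-up of a 16K frame would need 2 pathways under these assumptions.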
Future directions for the Mod 2 first involve extensive field testing with realistic sensor systems. It would also be fruitful to implement some of the paradigms excluded by the limits mentioned above. Most of the planned development is intended to make the system easier to use, including hardware changes to the LBUS control structure and a revised system software design to allow easier interaction between the host and the neuroprocessor.

Following service in the Air Force working on the development of advanced missile seeker systems, he joined the Naval Weapons Center, China Lake, CA. There he has worked on control theory and systems design, on missile and weapon system design and analysis, and in signal processing development. Currently, he is the project manager of the Missile-Borne Neural Network Demonstration project. His research interests include advanced nonlinear signal processing techniques, advanced parallel computer systems, and the development of advanced missile seeker systems.
Mr. Mumford is a member of Sigma Xi, SIAM, and the Mathematical Association of America.

Center and was the chief architect for Naval Warfare Simulation models. Since 1979, he has been advisor to the Technical Director for Artificial Intelligence and in charge of several applied artificial intelligence projects for the Naval Weapons Center. Since 1986, the main thrust of his research has been in artificial neural networks (ANNs). He was responsible for, and continues to lead, the Navy's applied research effort in neural networks under the Office of Naval Technology. In addition, he is the chief scientist for the Missile-Borne Neural Network Demonstration project. Dr. Andes is an active member of the International Neural Network Society and has been an invited speaker at several national and international meetings around the world. His current research work includes the building of a third-generation neural computer, theoretical models of visual/cortical processing, and several other special applications of neural computing to naval tactical systems.

Lynn R. Kern received the B.S. degree in electrical systems and sciences engineering from Southern Illinois University, Carbondale, IL, in 1984. He is currently with the Naval Weapons Center, China Lake, CA. From 1984 through 1985, he served as the Systems Engineer on the Full Scale Aircraft Target Program in the development and production of the QF-4 full scale drone. Between 1985 and 1989, he served as project engineer researching unique passive detection concepts for IR radiation applicable to proximity fusing. Since 1989, Mr. Kern has been involved in the development of artificial neural network hardware, including the development of several new analog neural components and computational systems. He is currently the Lead Hardware Engineer on the Missile-Borne Integrated Neural Network Demonstration project, responsible for the hardware development of the Mod 2 Neurocomputer system. His research interests include analog signal processing using neural networks, with emphasis on wide-band image processing from two-dimensional arrays.
