In this paper we develop techniques that are suitable for the parallel implementation of Fuzzy ARTMAP networks. Speedup and learning performance results are provided for execution on a DECmpp/Sx-1208 parallel processor consisting of a DEC RISC Workstation Front-End (FE) and a MasPar MP-1 Back-End (BE) with 8,192 processors. Experiments with the parallel implementation were conducted on the Letters benchmark database developed by Frey and Slate. The results indicate a speedup on the order of 1000-fold, which allows a combined training and testing time of under four minutes.
INTRODUCTION
Adaptive resonance theory (ART) was introduced by Grossberg in 1976 as a means of describing how recognition categories are self-organized in neural networks [1]. Since then, a number of specific neural network architectures based on ART have been proposed, many of them originating from Carpenter, Grossberg, and their colleagues at Boston University. A major division among the ART architectures developed so far is between unsupervised and supervised architectures. A prominent member of the class of unsupervised architectures is Fuzzy ART [2], while a prominent member of the class of supervised architectures is Fuzzy ARTMAP [3]. Fuzzy ART can cluster arbitrary collections of binary or analog input patterns, while Fuzzy ARTMAP can implement any mapping from an input space of arbitrary dimensionality to an output space of arbitrary dimensionality. The two architectures are closely related, because several components of Fuzzy ARTMAP are Fuzzy ART modules.
Our primary focus in this paper is the parallel implementation of Fuzzy ARTMAP. Analog and digital VLSI implementations of ART architectures simpler than Fuzzy ARTMAP (e.g., ART1) have appeared in the literature [7]. To the best of our knowledge, however, no effort has been made so far to implement Fuzzy ARTMAP on general-purpose massively parallel hardware.
The parallel implementation of Fuzzy ARTMAP runs on a DECmpp/Sx-1208 parallel processor consisting of a DEC RISC Workstation Front-End (FE) and a MasPar MP-1 Back-End (BE) with 8,192 processors. Processing is divided into FE routines that perform input/output (I/O), integer scaling, and data set randomization, and BE routines that train and test the network. Experiments performed on the Letters benchmark database [8], developed by Frey and Slate, indicated a speedup of approximately 1000-fold, which resulted in a combined training and testing time of under four minutes.
The organization of the paper is as follows. In Section 2 we describe the Fuzzy ART neural network, since it constitutes the building block in the design of a Fuzzy ARTMAP neural network. In Section 3 we continue with the description of the Fuzzy ARTMAP neural network. In Section 4 we present the environment in which Fuzzy ARTMAP is implemented. In Section 5 we emphasize issues pertaining to the parallel implementation of Fuzzy ARTMAP. Finally, in Section 6 we discuss the experimental results, and in Section 7 we provide some concluding remarks.
FUZZY ART
A brief overview of the Fuzzy ART architecture is provided in the following sections. For a more detailed discussion of this architecture, the reader should consult G. A. Carpenter [2].
Fuzzy ART Architecture
The Fuzzy ART neural network architecture is shown in Figure 1. It consists of two subsystems: the attentional subsystem and the orienting subsystem. The attentional subsystem consists of two fields of nodes, denoted $F_1^a$ and $F_2^a$. The $F_1^a$ field is called the input field because input patterns are applied to it. The $F_2^a$ field is called the category or class representation field because it is the field where category representations are formed; these categories represent the clusters to which the input patterns belong. The orienting subsystem consists of a single node (called the reset node), which accepts inputs from the $F_1^a$ field, the $F_2^a$ field (this input is not shown in Figure 1), and the input pattern applied across the $F_1^a$ field. The output of the reset node affects the nodes in the $F_2^a$ field.
Some preprocessing of the input patterns of the pattern clustering task takes place before they are presented to Fuzzy ART. The first preprocessing stage takes as input an $M_a$-dimensional input pattern from the pattern clustering task and transforms it into an output vector $a = (a_1, \ldots, a_{M_a})$, every component of which lies in the interval $[0, 1]$ (i.e., $0 \le a_i \le 1$ for $1 \le i \le M_a$). The second preprocessing stage accepts as input the vector $a$ and produces the $2M_a$-dimensional vector $I$ given by

$$ I = (a, a^c) = (a_1, \ldots, a_{M_a}, a_1^c, \ldots, a_{M_a}^c) \qquad (1) $$

where

$$ a_i^c = 1 - a_i, \qquad 1 \le i \le M_a. \qquad (2) $$

The above transformation is called complement coding. The complement coding operation is performed in Fuzzy ART at a preprocessor field designated $F_0^a$ (see Figure 1). We will refer to the vector $I$ formed in this fashion as the input pattern.
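Equations (1) and (2) can be illustrated with a minimal Python sketch. The names `complement_code` and `size` are our own illustrative choices rather than routines from the implementation described later; the example also shows why complement coding is convenient: the size $|I|$ of a complement-coded pattern always equals $M_a$.

```python
def complement_code(a):
    """Return I = (a, a^c), where a_i^c = 1 - a_i (equations (1) and (2))."""
    if any(not 0.0 <= x <= 1.0 for x in a):
        raise ValueError("every component of a must lie in [0, 1]")
    return list(a) + [1.0 - x for x in a]


def size(v):
    """Size |v| of a vector in Fuzzy ART: the sum of its components."""
    return sum(v)


I = complement_code([0.25, 0.5, 1.0])
print(I)        # [0.25, 0.5, 1.0, 0.75, 0.5, 0.0]
print(size(I))  # 3.0, i.e. M_a for M_a = 3
```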
We denote a node in the $F_1^a$ field by the index $i$ ($i \in \{1, 2, \ldots, 2M_a\}$) and a node in the $F_2^a$ field by the index $j$ ($j \in \{1, 2, \ldots, N_a\}$). Every node $i$ in the $F_1^a$ field is connected via a bottom-up weight to every node $j$ in the $F_2^a$ field; this weight is denoted $W_{ij}^a$.
Also, every node $j$ in the $F_2^a$ field is connected via a top-down weight to every node $i$ in the $F_1^a$ field; this weight is denoted $w_{ji}^a$.
The vector whose components are equal to the top-down weights emanating from node $j$ in the $F_2^a$ field is designated $w_j^a$ and is referred to as a template. Note that $w_j^a = (w_{j1}^a, w_{j2}^a, \ldots, w_{j,2M_a}^a)$ for $j = 1, \ldots, N_a$. The vector of bottom-up weights converging to a node $j$ in the $F_2^a$ field is designated $W_j^a$. In Fuzzy ART the bottom-up and top-down weights corresponding to a node $j$ in $F_2^a$ are equal; hence, in the forthcoming discussion, we will primarily refer to the top-down weights of the Fuzzy ART architecture. Initially, the top-down weights of Fuzzy ART are chosen to be equal to the "all-ones" vector. These initial choices correspond to the values of the weights prior to the presentation of any input pattern to the Fuzzy ART architecture.
Before proceeding, it is important to introduce the notations $w_j^{a,old}$ and $w_j^{a,new}$. Quite often, templates in Fuzzy ART are discussed with respect to an input pattern $I$ presented at the $F_1^a$ field. The notation $w_j^{a,old}$ denotes the template of node $j$ in the $F_2^a$ field of Fuzzy ART prior to the presentation of $I$. The notation $w_j^{a,new}$ denotes the template of node $j$ in the $F_2^a$ field after the presentation of $I$. Similarly, any other quantities defined with superscripts $\{a, old\}$ or $\{a, new\}$ will indicate values of these quantities prior to or after a pattern presentation to Fuzzy ART, respectively.
Operation of Fuzzy ART
As mentioned previously, we will use $I$ to indicate an input pattern applied at $F_1^a$ and $w_j^a$ to indicate the template of node $j$ in $F_2^a$. In addition, we will use $|I|$ and $|w_j^a|$ to denote the sizes of $I$ and $w_j^a$, respectively. The size of a vector in Fuzzy ART is defined to be the sum of its components. Furthermore, we define $I \wedge w_j^a$ to be the vector whose $i$-th component is the minimum of the $i$-th component of $I$ and the $i$-th component of $w_j^a$. The operation $\wedge$ is called the fuzzy-min operation.
Let us assume that an input pattern $I$ is presented at the $F_1^a$ field of Fuzzy ART. The appearance of pattern $I$ across the $F_1^a$ field produces bottom-up inputs that affect the nodes in the $F_2^a$ field. These bottom-up inputs are given by the equation

$$ T_j^a(I) = \frac{|I \wedge w_j^{a,old}|}{\beta_a + |w_j^{a,old}|} \qquad (3) $$

where $\beta_a$, which takes values in the interval $(0, \infty)$, is called the choice parameter. If in the above equation $w_j^{a,old}$ is equal to the "all-ones" vector, then node $j$ is referred to as an uncommitted node; otherwise, it is referred to as a committed node.
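A minimal sketch of the fuzzy-min operation, the vector size, and the bottom-up input of equation (3) follows. The function names are hypothetical, and floating-point arithmetic is used here for clarity even though the implementation described later works with scaled integers.

```python
def fuzzy_min(x, y):
    """Fuzzy-min x ∧ y: component-wise minimum of two vectors."""
    return [min(xi, yi) for xi, yi in zip(x, y)]


def size(v):
    """Size |v|: the sum of the components of v."""
    return sum(v)


def choice(I, w_old, beta_a):
    """Bottom-up input T_j^a(I) of equation (3)."""
    if beta_a <= 0:
        raise ValueError("the choice parameter must lie in (0, inf)")
    return size(fuzzy_min(I, w_old)) / (beta_a + size(w_old))


def is_uncommitted(w):
    """A node is uncommitted while its template is still all ones."""
    return all(x == 1.0 for x in w)


I = [0.25, 0.5, 0.75, 0.5]          # complement coding of a = (0.25, 0.5)
print(choice(I, [1.0] * 4, 0.5))    # uncommitted node: 2 / 4.5
print(choice(I, I, 0.5))            # template equal to I: 2 / 2.5
```

Note that an uncommitted node always receives $T_j^a(I) = M_a / (\beta_a + 2M_a)$, so a committed template that closely matches $I$ beats it.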
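The bottom-up inputs activate a competition process among the $F_2^a$ nodes, which eventually leads to the activation of a single node in $F_2^a$, namely the node that receives the maximum bottom-up input from $F_1^a$. Let us assume that node $j_m$ in $F_2^a$ has been activated through this process. The template $w_{j_m}^{a,old}$ of this node is then compared with $I$, and node $j_m$ is accepted as the representative of $I$ only if it passes the vigilance criterion

$$ \frac{|I \wedge w_{j_m}^{a,old}|}{|I|} \ge \rho_a \qquad (4) $$

where $\rho_a \in [0, 1]$ is the vigilance parameter. If the criterion fails, the reset node disqualifies node $j_m$ for the remainder of $I$'s presentation and the competition is repeated among the remaining $F_2^a$ nodes. If the criterion is satisfied, the template of node $j_m$ is modified according to

$$ w_{j_m}^{a,new} = I \wedge w_{j_m}^{a,old}. \qquad (5) $$

It is worth mentioning that in equation (5) we might have $w_{j_m}^{a,new} = w_{j_m}^{a,old}$; in this case we say that no learning occurs for the weights of node $j_m$. Also note that equation (5) is actually a special case of the learning equations of Fuzzy ART, referred to as fast learning. In this paper we only consider the fast learning case. We say that node $j_m$ has coded input pattern $I$ if, during $I$'s presentation at $F_1^a$, node $j_m$ in $F_2^a$ is chosen to represent $I$ and $j_m$'s top-down weights are modified as equation (5) prescribes. Note that the weights converging to or emanating from an $F_2^a$ node other than $j_m$ (i.e., the chosen node) remain unchanged during $I$'s presentation.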
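The search-and-learn cycle above (competition, vigilance test (4), reset, and fast learning (5)) can be sketched in Python as follows. This is an illustrative model under stated assumptions: inputs are already complement-coded, `present` is a hypothetical name, and the repeated competition after each reset is realized by visiting the nodes in decreasing order of $T_j^a$, which selects the same node.

```python
def fuzzy_min(x, y):
    return [min(xi, yi) for xi, yi in zip(x, y)]


def size(v):
    return sum(v)


def present(I, templates, rho_a, beta_a):
    """Present one complement-coded pattern I to Fuzzy ART with fast learning.

    `templates` is a list of templates that is modified in place. An all-ones
    (uncommitted) template is appended if none exists, so some node always
    passes the vigilance test. Returns the index j_m of the node that codes I.
    """
    if not any(all(x == 1.0 for x in w) for w in templates):
        templates.append([1.0] * len(I))
    T = [size(fuzzy_min(I, w)) / (beta_a + size(w)) for w in templates]
    # Visiting nodes in decreasing order of T is equivalent to repeating the
    # competition after each reset (ties go to the lower index).
    for j in sorted(range(len(templates)), key=lambda j: -T[j]):
        if size(fuzzy_min(I, templates[j])) / size(I) >= rho_a:  # equation (4)
            templates[j] = fuzzy_min(I, templates[j])            # equation (5)
            return j
        # Otherwise the reset node disqualifies j for the rest of I's presentation.
    raise RuntimeError("unreachable for rho_a <= 1: the uncommitted node always passes")
```

For example, with $\rho_a = 0.9$ and $\beta_a = 0.5$, presenting the coding of $a = (0.25, 0.5)$ twice reuses node 0, while the dissimilar coding of $a = (1, 0)$ recruits the uncommitted node.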
Fuzzy ARTMAP
A brief overview of the Fuzzy ARTMAP architecture is provided in the following sections. For a more detailed discussion of this architecture, the reader should consult G. A. Carpenter et al.
Fuzzy ARTMAP Architecture
A block diagram of the Fuzzy ARTMAP architecture is provided in Figure 2. Note that two of the three modules in Fuzzy ARTMAP are Fuzzy ART architectures. These modules are designated ARTa and ARTb in Figure 2. The ARTa module accepts as inputs the input patterns, while the ARTb module accepts as inputs the output patterns of the pattern classification task. All of the details in Section 2 are valid for the ARTa module without change. The same holds for the ARTb module if the superscript $a$ of Section 2 is replaced with the superscript $b$, to emphasize that we are referring to weights and parameter values of the ARTb module. One of the differences between the ARTa and ARTb modules in Fuzzy ARTMAP is that for pattern classification tasks (many-to-one maps) it is not necessary to apply complement coding to the output patterns presented to the ARTb module.
As illustrated in Figure 2, Fuzzy ARTMAP contains a module that is designated the inter-ART module. The purpose of this module is to make sure that the appropriate mapping is established between the input patterns presented to ARTa and the output patterns presented to ARTb. There are connections (weights) between every node in the $F_2^a$ field of ARTa and all nodes in the $F^{ab}$ field of the inter-ART module. The weight vector with components emanating from node $j$ in $F_2^a$ and converging to the nodes of the $F^{ab}$ field is denoted $w_j^{ab} = (w_{j1}^{ab}, \ldots, w_{jk}^{ab}, \ldots, w_{jN_b}^{ab})$, where $N_b$ is the number of nodes in $F^{ab}$ (which is equal to the number of nodes in $F_2^b$). There are also fixed bidirectional connections between a node $k$ in $F^{ab}$ and its corresponding node $k$ in $F_2^b$.
Operation of Fuzzy ARTMAP
The operation of the Fuzzy ART modules in Fuzzy ARTMAP is slightly different from the operation of Fuzzy ART described in Section 2. For instance, resets in the ARTa module of Fuzzy ARTMAP occur either because the category chosen in $F_2^a$ does not match the input pattern presented at $F_1^a$, or because the appropriate map has not been established between an input pattern presented at ARTa and its corresponding output pattern presented at ARTb. This latter type of reset is enforced by the inter-ART module via its connections with the orienting subsystem in ARTa (see Figure 2). The reset is accomplished by forcing ARTa to increase its vigilance parameter above the level that is necessary to cause a reset of the activated node in the $F_2^a$ field. Hence, in the ARTa module of Fuzzy ARTMAP we identify two vigilance parameter values: a baseline vigilance parameter $\bar{\rho}_a$, which is the vigilance parameter of ARTa prior to the presentation of an input/output pair to Fuzzy ARTMAP, and a vigilance parameter $\rho_a$, which is the vigilance parameter established in ARTa via the resets enforced by the inter-ART module. Also, the node activated in $F_2^b$ due to the presentation of an output pattern at $F_1^b$ can be either the node receiving the maximum bottom-up input from $F_1^b$ or the node designated by the $F^{ab}$ field in the inter-ART module. The latter type of activation is enforced by the connections between the $F^{ab}$ field and the $F_2^b$ field.
All of the equations in Section 2 for the Fuzzy ART module are valid for the ARTa and ARTb modules in Fuzzy ARTMAP. In particular, the bottom-up inputs to the $F_2^a$ field and the $F_2^b$ field are given by

$$ T_j^a(I) = \frac{|I \wedge w_j^{a,old}|}{\beta_a + |w_j^{a,old}|} \qquad (6) $$

and

$$ T_k^b(O) = \frac{|O \wedge w_k^{b,old}|}{\beta_b + |w_k^{b,old}|} \qquad (7) $$

where in equation (7) $O$ stands for the output pattern associated with the input pattern $I$, while the rest of the ARTb quantities are defined as they were defined for the ARTa module in Section 2. Similarly, the vigilance ratios for ARTa and ARTb are computed as follows:

$$ \frac{|I \wedge w_j^{a,old}|}{|I|} \qquad (8) $$

and

$$ \frac{|O \wedge w_k^{b,old}|}{|O|} \qquad (9) $$
The modifications of the weight vectors $w_j^{ab}$ can be explained as follows: a weight vector emanating from a node in $F_2^a$ to all the nodes in $F^{ab}$ is initially the "all-ones" vector and, after training that involves this $F_2^a$ node, all of its connections to $F^{ab}$, except one, are reduced to the value of zero.
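The map-field rule above can be written as a fuzzy-min between $w_j^{ab}$ and a one-hot vector for the $F_2^b$ node $k$ that ARTa node $j$ is associated with. The sketch below assumes that form (the function names are illustrative); once a node has been trained, its prediction is simply the index of the single surviving connection.

```python
def init_map_weights(n_b):
    """w_j^ab before node j of F2a takes part in training: the all-ones vector."""
    return [1] * n_b


def learn_map(w_ab, k):
    """Assumed fast-learning rule: w_j^ab ∧ e_k, where e_k is one-hot at node k."""
    return [min(w, 1 if i == k else 0) for i, w in enumerate(w_ab)]


def predicted_category(w_ab):
    """Index of the surviving connection, or None while node j is untrained."""
    surviving = [k for k, w in enumerate(w_ab) if w]
    return surviving[0] if len(surviving) == 1 else None
```

Because match tracking prevents a trained node from being associated with a different $k$, later applications of `learn_map` use the same $k$ and leave the weights unchanged.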
Operating Phases of Fuzzy ARTMAP
Fuzzy ARTMAP may operate in two different phases: training and performance (testing). The training phase of Fuzzy ARTMAP works as follows. Given the training list $\{I^1, O^1\}, \{I^2, O^2\}, \ldots, \{I^P, O^P\}$, we want Fuzzy ARTMAP to map every input pattern of the training list to its corresponding output pattern. To achieve this goal, the training list is presented repeatedly to the Fuzzy ARTMAP architecture. That is, $I^1$ is presented to ARTa and $O^1$ to ARTb, then $I^2$ to ARTa and $O^2$ to ARTb, and eventually $I^P$ to ARTa and $O^P$ to ARTb; this corresponds to one list presentation. The training list is presented as many times as necessary for Fuzzy ARTMAP to classify the input patterns. The classification (mapping) task is considered accomplished (i.e., learning is complete) when the weights do not change during a list presentation. This training scenario is called off-line training.
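The off-line training procedure can be sketched for the classification case that this paper targets. To keep the sketch short, we assume that each output pattern is a class label, so that ARTb and the $F^{ab}$ weights reduce to one stored label per committed ARTa node (the single surviving map-field connection). Inputs are assumed to be already complement-coded, and `EPS` is a hypothetical stand-in for the smallest vigilance increment. This is an illustrative Python model of the algorithm, not the MPL implementation described later.

```python
EPS = 1e-9  # hypothetical smallest vigilance increment used by match tracking


def fuzzy_min(x, y):
    return [min(xi, yi) for xi, yi in zip(x, y)]


def size(v):
    return sum(v)


def train_pattern(I, label, W, labels, rho_bar, beta_a):
    """Present one pair {I, label}; return True if any weight changed."""
    rho = rho_bar  # vigilance restarts at the baseline for every pair
    cands = [(size(fuzzy_min(I, w)) / (beta_a + size(w)), j) for j, w in enumerate(W)]
    cands.append((size(I) / (beta_a + len(I)), len(W)))  # uncommitted (all-ones) node
    for _, j in sorted(cands, key=lambda c: (-c[0], c[1])):
        if j == len(W):  # uncommitted node: |I ∧ 1| / |I| = 1
            if rho > 1.0:
                raise ValueError("identical inputs are mapped to different outputs")
            W.append(list(I))
            labels.append(label)
            return True
        match = size(fuzzy_min(I, W[j])) / size(I)
        if match < rho:
            continue  # ordinary ARTa reset
        if labels[j] != label:
            rho = match + EPS  # match tracking enforced by the inter-ART module
            continue
        new = fuzzy_min(I, W[j])
        changed = new != W[j]
        W[j] = new
        return changed
    raise RuntimeError("unreachable: the uncommitted node is always a candidate")


def train(pairs, rho_bar, beta_a, max_epochs=100):
    """Off-line training: repeat list presentations until no weight changes."""
    W, labels = [], []
    for epoch in range(1, max_epochs + 1):
        changed = False
        for I, label in pairs:
            changed |= train_pattern(I, label, W, labels, rho_bar, beta_a)
        if not changed:
            return W, labels, epoch
    raise RuntimeError("training did not converge within max_epochs")
```

Raising the vigilance just above the current match, rather than by a fixed amount, excludes exactly the offending node and any node that matches no better, which is the minimal change needed to continue the search.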
In the performance phase of Fuzzy ARTMAP the learning process is disengaged, and input/output patterns from a test list are presented in order to evaluate the classification performance of Fuzzy ARTMAP. In particular, during the performance evaluation only the input patterns of the test list are presented to the ARTa module of Fuzzy ARTMAP. Every input pattern from the test list will choose a node in the $F_2^a$ field. If the output pattern to which the activated node in $F_2^a$ is mapped matches the output pattern to which the presented pattern should be mapped, then Fuzzy ARTMAP has classified the test input pattern correctly; otherwise Fuzzy ARTMAP has committed a misclassification error.
IMPLEMENTATION ENVIRONMENT
The routines are implemented on a DECmpp/Sx model 1208 computer, which consists of a DEC RISC Workstation as the Front end and a MasPar MP-1 model 1208 as the Back end. Communication between the Front end and Back end computers is through DMA channels over a VME bus.
The Back end machine, as described by MasPar [9], is a SIMD massively parallel machine consisting of 512 clusters of 4 x 4 processor elements (PEs) arranged in a 16 x 32 cluster array. This gives 8,192 PEs in a 128 x 64 PE array. To control the PE array and transfer data to and from it, the MP-1 has a 32-bit RISC processor with 32 32-bit registers, 128 kilobytes of data memory, and 1 megabyte of RAM. Each PE consists of a 4-bit processor with 40 32-bit registers and 16 kilobytes of RAM. A key component of any implementation is the movement of data to and from the processors. In a single-processor system this movement is implicit; in a parallel processing environment it can be the dominant contribution to the execution time. Data can be transferred between PEs either over the X-Net (a mesh-type connection) or through the Global Router. X-Net communications are made in a straight line in any one of the 8 major compass directions, with toroidal wraparound. Global Router communications are direct point-to-point simultaneous transfers between units. They are simultaneous in the sense that all transfers complete before the processors execute the next instruction; however, they take place over a switched network that allows many transfers to occur in parallel, and in the worst case the transfers can be serialized. Although the Global Router is a very powerful mechanism, it may not be the best available choice [11]. One restriction of the router is that only one input and one output communication can occur from any given 16-processor cluster at one time, which is why the cluster is identified as an element of the architecture. For more details on the MP-1, see reference [9].
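The array geometry and the cluster restriction on the Global Router can be made concrete with a small model. We assume 64 rows of 128 PEs (consistent with the 1.56% figure given later for one reserved row) and row-major cluster numbering; both the numbering and the simplified round-counting model are our own assumptions, not MasPar documentation.

```python
CLUSTER_ROWS, CLUSTER_COLS = 16, 32    # 512 clusters
CLUSTER_DIM = 4                        # each cluster is a 4 x 4 block of PEs
PE_ROWS = CLUSTER_ROWS * CLUSTER_DIM   # 64
PE_COLS = CLUSTER_COLS * CLUSTER_DIM   # 128


def cluster_of(row, col):
    """Cluster number of the PE at (row, col); row-major numbering is assumed."""
    return (row // CLUSTER_DIM) * CLUSTER_COLS + col // CLUSTER_DIM


def router_rounds_lower_bound(transfers):
    """Lower bound on Global Router rounds for ((r, c), (r, c)) transfers.

    Each cluster can send one and receive one message per round, so the
    busiest cluster determines how far the transfers are serialized.
    """
    sends, recvs = {}, {}
    for (sr, sc), (dr, dc) in transfers:
        s, d = cluster_of(sr, sc), cluster_of(dr, dc)
        sends[s] = sends.get(s, 0) + 1
        recvs[d] = recvs.get(d, 0) + 1
    return max(list(sends.values()) + list(recvs.values()), default=0)
```

In the worst case, when all 16 PEs of one cluster send at once, the bound is 16 rounds, which is the serialization the text warns about.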
The programming environment for this implementation is MPL [10]. MPL is an extended version of ANSI C that adds the data type modifier plural, which refers to any variable operated on or stored in the PEs. It also adds keywords to control the active set of processors and to implement inter-processor communication. In MPL, control statements take on the additional role of determining which processors are executing instructions at a given time. If the variables that form part of the control expression are all singular, the active set is not modified. If any of them are plural, the active set depends on the value of the plural variable contained in each individual PE. A processor element either executes the sequence of instructions inside the block governed by the control statement or does nothing, and the code within a control structure is issued as long as any processor is active within it. That is, if a plural variable is used in a control structure such as if (plural var) {...} else {...}, both blocks of code will probably be executed.
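The effect of a plural control expression can be modeled with masks. The Python function below is a hypothetical emulation, not MPL: each branch is issued to the whole array, only PEs in the active set update their values, and a branch is skipped only when its active set is empty.

```python
def plural_if_else(values, cond, then_fn, else_fn):
    """Emulate `if (plural cond) {...} else {...}` on a SIMD PE array.

    Returns the new per-PE values and the list of branches that were issued.
    """
    active = [cond(v) for v in values]
    out = list(values)
    issued = []
    if any(active):                 # 'then' block runs while any PE is active
        issued.append("then")
        out = [then_fn(v) if a else v for v, a in zip(out, active)]
    if not all(active):             # 'else' block runs for the complementary set
        issued.append("else")
        out = [v if a else else_fn(v) for v, a in zip(out, active)]
    return out, issued
```

When the condition is true on some PEs and false on others, both branches are issued, so the array spends the time of both.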
To allow the implementation to be used for a problem of any size, the routines were virtualized by treating the processor array as if it were three-dimensional, made up of additional layers of the total processor array. Communication between layers was through the shared memory and a limited set of registers. Only the processors were layered, not the PE memory.
For an algorithm to benefit from parallel processing, it must exhibit a high degree of calculation independence. A neural network exhibits this characteristic in that the product of each weight with its input can be calculated simultaneously and the sum can be calculated in any order. In addition, when the inputs are vectors, each element of the vector can be processed simultaneously. The key purpose of using a parallel computer is to reduce the total amount of time required to perform the computation. The primary hardware-independent measure of this gain is speedup, which is defined as the total number of calculations divided by the total number of steps. A massively parallel computer (normally defined as a SIMD machine with 1024 or more processors or a MIMD machine with 64 or more processors) can conveniently be considered as having an infinite number of processors, which allows a program design that takes advantage of maximum calculation independence; where the number of physical processors is exceeded, it can be extended by virtualization. While an infinite number of processors allows a large number of calculations to be performed simultaneously, it does not address the data flow or the communication of the results; in this area the design of the program becomes architecture dependent. A Single Instruction Multiple Data (SIMD) machine executes the same operation on all active data streams at the same time.
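As a worked example of this definition, consider an $n$-term dot product on an idealized machine with unlimited processors and unit-cost operations (our assumptions): the $n$ products take one step, and pairwise (tree) summation of the products takes $\lceil \log_2 n \rceil$ steps. The function name is illustrative.

```python
def dot_product_speedup(n):
    """Speedup = total calculations / total steps for an n-term dot product."""
    calculations = n + (n - 1)            # n products and n - 1 additions
    steps = 1 + (n - 1).bit_length()      # one product step + ceil(log2 n) addition steps
    return calculations / steps


print(dot_product_speedup(1024))  # 2047 / 11, about 186
```

The expression `(n - 1).bit_length()` computes $\lceil \log_2 n \rceil$ exactly for $n \ge 1$, avoiding floating-point rounding in the logarithm.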
While this reduces the number of instruction decode units and provides a highly synchronized environment, it limits processor independence. The primary advantage of a SIMD architecture is the simplicity of its design. The Multiple Instruction Multiple Data (MIMD) architecture allows each processor to perform an independent instruction sequence on its data stream; however, inter-process synchronization and communication become a bigger challenge. The DECmpp/Sx is MIMD from the viewpoint of the Front end/Back end relationship, but SIMD for the massively parallel array. Most applications use at least two program files: one with the main program written in ANSI C, which calls functions written in MPL in the second file. The ANSI C routines execute on the Front end and the MPL routines execute on the Back end. Nothing dictates where the main program runs: it can be either a C or an MPL routine, since it is simply the first routine entered and the last exited, and all the routines are linked together into one executable. Generally, the Front end is used for the main routine, as it is the device that communicates with the user and handles general I/O, and it is more efficient at executing strictly sequential actions such as reading character streams. This MIMD configuration can execute in either synchronous or asynchronous mode, as primitives supporting both are provided.
PARALLEL IMPLEMENTATION
This implementation attempts to take full advantage of the parallel machine while also reducing the complexity of the operations (integer vs. floating point), as a demonstration of the feasibility of special-purpose, highly parallel architectures for Fuzzy ARTMAP neural networks. The program is split into two files, fuzzmapfe.c and fuzzmapp.m, which contain the Front end (FE) and Back end (BE) routines, respectively. The FE routines perform all the I/O and the pre- and post-scaling operations.
They also perform the randomization of the data set between trials. The BE routines perform the training and testing of the neural network. The BE is configured as two sets of vector processors: the first set handles all of the ARTa calculations while the second set simultaneously handles all of the ARTb calculations. Each set is further broken down into N template vector processors, where N is the number of committed templates + 1. Each template vector processor takes ceiling(M/2) physical processors, where M is the dimension of the input patterns before complement coding. Thus each physical processor handles two elements of the input plus their complements, times the number of virtual layers. The justification for this organization is that communication between adjacent processors was faster than from memory to a processor, while operations from registers were the fastest; since there are more total operations on a per-template basis than on a per-element basis, a compromise was chosen to promote processor efficiency. Each additional virtual processor layer generates an additional set of execution steps. The division of the physical processors into ARTa and ARTb sets was made somewhat arbitrarily, following the configuration of the PE array: the last row was set aside for ARTb processing. Although this reduces the number of processors available for ARTa processing, the maximum loss is only 1.56%, which was considered better than leaving 99% of the array idle during a separate ARTb processing step; dedicating an entire row also simplified the program. If this ratio of ARTa to ARTb processors is inadequate, the program generates an error message to that effect and has to be modified. However, since the program computes the maximum number of required ARTa templates as 1 + the number of training patterns (the result if $\bar{\rho}_a = 1$) and expects the maximum number of ARTb templates to be an input parameter, this error should not occur.
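The allocation arithmetic can be checked with a short sketch. `template_layout` is a hypothetical helper; it assumes 64 rows of 128 PEs, with one row reserved for ARTb, and that template vector processors are packed contiguously. For the Letters data ($M = 16$) it gives 8 PEs per template and 1008 ARTa templates per layer, which agrees with the 987/1008 figure reported in the conclusions.

```python
import math

PE_ROWS, PE_COLS = 64, 128   # assumed orientation of the 8,192-PE array


def template_layout(M, n_templates):
    """Return (PEs per template, ARTa templates per layer, layers needed)."""
    pes_per_template = math.ceil(M / 2)        # 2 elements + 2 complements per PE
    arta_pes = (PE_ROWS - 1) * PE_COLS         # last row is reserved for ARTb
    per_layer = arta_pes // pes_per_template
    layers = math.ceil(n_templates / per_layer)
    return pes_per_template, per_layer, layers


print(template_layout(16, 987 + 1))       # (8, 1008, 1)
print(template_layout(16, 16000 + 1))     # (8, 1008, 16): rho_bar_a = 1 worst case
print(f"{PE_COLS / (PE_ROWS * PE_COLS):.2%} of the PEs go to ARTb")  # 1.56%
```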
To demonstrate the potential to use simplified hardware (integer arithmetic), as well as the processing speed to be gained on the PEs, all of the values are scaled before submission to the PE array and rescaled afterwards. All operations were performed using 32-bit integer arithmetic except the computation of the ratio; in this case the numerator is scaled to 64 bits so that the resulting division yields a 32-bit quotient.
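The ratio computation can be sketched in fixed point. The paper does not state how many fractional bits TR carries, so `FRAC_BITS = 31` is a hypothetical choice. The point is that widening TW before the division keeps the quotient within 32 bits, because $TW = |I \wedge w_j^a| \le |w_j^a| = TWO$ and the ratio is therefore below 1. Only the ordering of the TR values matters for selecting the winning template.

```python
FRAC_BITS = 31                 # hypothetical fixed-point precision of TR
INT32_MAX = (1 << 31) - 1


def template_ratio(tw, beta, two):
    """Fixed-point TR = TW / (beta + TWO) using a widened 64-bit numerator.

    tw, beta and two are non-negative 32-bit integers with beta >= 1 and
    tw <= two, so the true ratio is below 1 and the quotient fits in 32 bits.
    """
    if not all(0 <= x <= INT32_MAX for x in (tw, beta, two)):
        raise ValueError("operands must be non-negative 32-bit integers")
    if beta < 1 or tw > two:
        raise ValueError("expected beta >= 1 and tw <= two")
    numerator = tw << FRAC_BITS            # widened: below 2**62, fits in 64 bits
    return numerator // (beta + two)       # tw < beta + two, so below 2**31
```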
The FE processor initiates the program by reading in the parameters and the data set, which it scales and places into arrays for passing to the BE processor. The FE also calculates the dimensions for the PE array configuration. After all input and output values, for both training and testing, are in place, the FE calls the BE main routine. The BE copies all the parameters and data to the PE array and the ACU. If an additional randomly ordered run is required, it initiates an asynchronous call to the randomize routine on the FE, which executes concurrently with training. It then calls the training routine, which finishes configuring the PE array and then calculates the templates.
The templates are calculated iteratively, pattern by pattern, until all patterns are presented with no further change to the template set. Each pattern is generated by reading the input element by element from the data array; each value is broadcast to every template processor, where the element and its complement are stored in the appropriate vector register. The vector element location is determined by a modulo operation, which activates only the processors that act on the specified vector element. Note that the complement is recalculated for each pattern presentation, since this is faster than reading and broadcasting an additional element. After the input pattern is generated, the corresponding output pattern is generated in the same way.
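The element-by-element loading can be sketched as follows. The function name and the `SCALE = 15` constant are assumptions for illustration (15 is the largest attribute value in the Letters experiments; the scale factor actually used by the implementation is not stated). Element $e$ goes to PE $\lfloor e/2 \rfloor$ of the template vector processor, in slot $e \bmod 2$, and its complement is computed locally rather than broadcast.

```python
SCALE = 15  # hypothetical integer scale: the complement of x is SCALE - x


def load_pattern(a, M):
    """Distribute an M-element pattern over one template vector processor.

    PE p holds [a[2p], a[2p+1], complement of a[2p], complement of a[2p+1]].
    Unused slots (odd M) stay 0 so they add nothing to any size.
    """
    n_pe = (M + 1) // 2                  # ceiling(M/2) PEs per template
    regs = [[0, 0, 0, 0] for _ in range(n_pe)]
    for e, value in enumerate(a):        # values are broadcast one at a time
        pe, slot = divmod(e, 2)          # modulo arithmetic picks the active PE
        regs[pe][slot] = value
        regs[pe][slot + 2] = SCALE - value   # complement recomputed locally
    return regs
```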
If more than one layer of processors is needed to handle all the templates, this section is repeated for each layer. Simultaneously, the fuzzy-min operation is performed between the pattern and the committed templates and the first uncommitted template, for both input and output patterns. Since each PE provides processing for 2 elements and their complements, this operation is repeated 4 times. The magnitude of the intersection is calculated first by summing the four elements in each processor and then by pairwise summing the partial sums from each processor of the template vector; the pairwise summation takes log2(M/2) steps to complete. This magnitude TW is saved for each template until the best template is selected. TW is then divided by the sum of beta ($\beta_a$) and the old template magnitude TWO and saved as TR (template ratio); this is a scaled value, as discussed above. If TW is greater than $\rho_a M$ and TR is greater than the value $T_m$ saved from the previous layers, then TR and the virtual processor number are saved in $T_m$ and num. After all template layers are processed, the $T_m$ register holds the largest value across the layers. The template identified by the virtual processor number also meets the vigilance criterion, but we still need to select across the set of processors.
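The two-stage intersection computation and the per-layer bookkeeping can be sketched as below. The helper names are hypothetical, and the vigilance comparison is written as $TW \ge \rho_a M$ to match equation (4), although the text describes it as "greater than"; the two differ only at equality.

```python
def partial_intersections(pattern_regs, template_regs):
    """Per-PE step: fuzzy-min of the four register pairs, summed locally."""
    return [sum(min(x, w) for x, w in zip(p, t))
            for p, t in zip(pattern_regs, template_regs)]


def tree_sum(values):
    """Pairwise summation across the PEs of one template; returns (TW, steps)."""
    vals, steps = list(values), 0
    while len(vals) > 1:
        paired = [vals[i] + vals[i + 1] for i in range(0, len(vals) - 1, 2)]
        if len(vals) % 2:
            paired.append(vals[-1])      # an odd PE out waits for the next step
        vals, steps = paired, steps + 1
    return vals[0], steps


def keep_best(best, tw, tr, vp, rho_m):
    """Update (T_m, num) if this template passes vigilance and beats T_m."""
    t_m, _ = best
    return (tr, vp) if tw >= rho_m and tr > t_m else best
```

With $M = 16$ there are 8 partial sums per template, and `tree_sum` finishes in $\log_2 8 = 3$ steps.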
Selecting $T_m$ across all processors is accomplished by a pairwise binary reduction, first between all processors of each row, which leaves $T_m$ and num at processor 0 of each row, and then a column reduction that leaves the answer at processor 0. Note that the last row is not included in the final step, since it is used to process ARTb only and its solution was complete after the row reduction. The temporary variable num in processor 0 for ARTa, and in processor Sb for ARTb, contains pointers to the respective input and output templates. This pointer is used to fetch the mapping index for ARTa, which is compared to the template number of ARTb. If the ARTa result points to the uncommitted template, its mapping index is set to the ARTb template number, the template magnitude TW is saved as the old template magnitude TWO, and the intersection elements are saved as the template elements.
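The reduction can be sketched as a pairwise binary reduction over (T, num) pairs. This Python version is a hypothetical sequential model of what the PEs do concurrently: at each stride, position $i$ keeps the larger of its own pair and the pair at position $i + \text{stride}$, with ties keeping the lower position.

```python
def reduce_argmax(pairs):
    """Binary reduction over (T, num) pairs; the winner ends at position 0."""
    vals = list(pairs)
    stride = 1
    while stride < len(vals):
        for i in range(0, len(vals) - stride, 2 * stride):
            if vals[i + stride][0] > vals[i][0]:
                vals[i] = vals[i + stride]
        stride *= 2
    return vals[0]


def grid_argmax(rows):
    """Row reductions (concurrent on the machine), then a column reduction."""
    return reduce_argmax([reduce_argmax(row) for row in rows])
```

A row of 128 PEs needs 7 strides and the column of 63 ARTa rows needs 6, so the whole selection costs 13 reduction steps regardless of how many templates are active.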
If the mapping index does not agree with the ARTb template number, the selection process must be repeated with a revised $\rho_a$. This is implemented by setting $\rho_a M = TW + \epsilon$, where TW is the intersection magnitude of the template that yielded $T_m$ and $\epsilon$ is the smallest possible value. Using the previously calculated values of TW and TR, the selection of $T_m$ is continued, resetting $\rho_a M$ as required, until a suitable mapping index is obtained. Note that if two patterns are identical but map to different outputs, this process would never converge; a test for this condition is included, and if it occurs an error message is generated and the program is aborted. Once the correct template is found, the intersection is recalculated, since that is faster than saving all the intersections. This intersection replaces the previous elements of the template, and the old template magnitude is replaced by the current template magnitude. This completes the operations for one pattern presentation.
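The match-tracking loop reuses the saved TW and TR values rather than recomputing intersections. The sketch below is a hypothetical model in integer units, with `eps = 1` standing for the smallest possible increment; an empty candidate set corresponds to the non-converging case of identical inputs with different outputs.

```python
def select_with_match_tracking(TW, TR, map_index, target, rho_m, eps=1):
    """Choose an ARTa template whose mapping agrees with ARTb's template `target`.

    TW[j]: saved intersection size |I ∧ w_j| (scaled integers)
    TR[j]: saved choice value for template j
    map_index[j]: ARTb template that j maps to, or None for the uncommitted one
    rho_m: current vigilance threshold rho_a * M in the same integer units
    Returns (chosen template, final rho_m).
    """
    while True:
        passing = [j for j in range(len(TW)) if TW[j] >= rho_m]
        if not passing:
            raise ValueError("no template passes: identical inputs with different outputs?")
        J = max(passing, key=lambda j: (TR[j], -j))   # ties go to the lower index
        if map_index[J] is None or map_index[J] == target:
            return J, rho_m
        rho_m = TW[J] + eps    # raise vigilance just above J's match
```

Each pass strictly raises `rho_m` above the rejected template's TW, so that template and every template matching no better drop out, and the loop terminates after at most one pass per template.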
After all patterns have been processed in the current iteration, the old magnitude of each template is compared to its saved magnitude. If any do not match, the saved magnitudes are updated to the old magnitudes and another iteration is initiated. This approach is much faster than setting a flag each time a change occurs, as it takes at most L (the number of layers) steps per iteration, whereas a flag could be set as often as once per training pattern in every iteration. Once all saved magnitudes match the old magnitudes, training is complete and the program returns to the parallel main routine.
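The convergence test relies on a property of fast learning that is worth making explicit: since $w_j^{a,new} = I \wedge w_j^{a,old}$ never increases a component, a template changed during an iteration exactly when its magnitude decreased. The hypothetical helper below performs this comparison.

```python
def iteration_converged(old_mag, saved_mag):
    """End-of-iteration test: compare each template's magnitude with its saved copy.

    Saved magnitudes are refreshed in place whenever another iteration is needed;
    a change in the number of templates also counts as a change.
    """
    if len(old_mag) == len(saved_mag) and all(o == s for o, s in zip(old_mag, saved_mag)):
        return True
    saved_mag[:] = old_mag
    return False
```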
After learning, the BE synchronizes its actions with the FE and passes the learning results to the FE. If output of the learned templates has been requested, the BE initiates an asynchronous process on the FE to scale, format, and output those templates. Concurrently, the BE proceeds into the test phase. The test phase is very similar to the learning phase, except that no templates are modified. The test patterns are presented in the same way as in the learning phase. However, after $T_m$ is determined and a template is selected, the unmatched counter is incremented if the uncommitted template is the best match; the mismatched counter is incremented if the mapping index of ARTa does not agree with the ARTb template number; and the matched counter is incremented if the mapping index of ARTa agrees with the ARTb template number. After each testing pattern has been presented once, the results are returned to the BE main routine.
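The test-phase bookkeeping can be sketched as follows, in the same hypothetical integer units as the training sketches. We read the description as choosing the best template that passes the baseline vigilance, with no match tracking and no learning; the three outcomes correspond to the unmatched, mismatched, and matched counters.

```python
from collections import Counter


def classify(TW, TR, map_index, target, rho_m):
    """Test-phase outcome for one pattern; no template is modified."""
    passing = [j for j in range(len(TW)) if TW[j] >= rho_m]
    J = max(passing, key=lambda j: (TR[j], -j))   # uncommitted template always passes
    if map_index[J] is None:
        return "unmatched"
    return "matched" if map_index[J] == target else "mismatched"


# Hypothetical per-pattern data: (TW, TR, map_index, ARTb template, rho_m)
tests = [
    ([10, 16], [60, 40], [3, None], 3, 8),
    ([10, 16], [60, 40], [3, None], 5, 8),
    ([4, 16], [60, 40], [3, None], 3, 8),
]
print(Counter(classify(*t) for t in tests))
```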
The BE main routine synchronizes with the FE, transfers the results to the FE, and initiates an asynchronous test report routine on the FE. If additional passes based on a reordering of the data set were requested, the BE now transfers the updated data set from the FE, synchronizes with the FE, and initiates another random shuffling of the data set concurrently with another learn, report, test, and report cycle. This continues until all requested passes are completed or the program times out. Since it is common for larger jobs to be submitted than can be completed within the preset time limits, the program is designed to generate multiple output files and close them incrementally during the process. It also accepts an input parameter that tells it where to restart: the program can be restarted before the initiation of any arbitrary random pass, which it does by repeatedly executing the random shuffling routine until it reaches the prescribed pass. It is assumed that, since these shuffles are performed on the FE prior to activating the BE, they will not be counted against the maximum time limit.
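Restarting at an arbitrary pass only requires that the shuffles be reproducible. The sketch assumes a seeded pseudo-random generator (Python's `random.Random`; the generator used by the original FE routine is not specified) and uses the half-exchange shuffle described in the conclusions, in which each element of the first half is exchanged with a randomly selected element of the second half. The function names are hypothetical.

```python
import random


def half_exchange_shuffle(order, rng):
    """Swap each element of the first half with a random element of the second half."""
    half = len(order) // 2
    for i in range(half):
        j = rng.randrange(half, len(order))
        order[i], order[j] = order[j], order[i]


def ordering_for_pass(patterns, pass_number, seed):
    """Rebuild the data order used by pass `pass_number` by replaying shuffles."""
    rng = random.Random(seed)
    order = list(patterns)
    for _ in range(pass_number):
        half_exchange_shuffle(order, rng)
    return order
```

Replaying from the same seed yields the same sequence of orderings as an uninterrupted run, which is what makes the restart parameter safe.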
EXPERIMENTAL RESULTS
The experiments presented in this paper were performed on the Letters benchmark developed by Frey and Slate [8]. This benchmark consists of a database of 20,000 patterns derived from 20,000 unique black-and-white pixel images. Sixteen numerical feature attributes were obtained from each character image, and each attribute value was scaled to the range 0 to 15. Each pattern consists of the desired output character and the sixteen attribute values. The output characters were changed to the integer values 1 through 26 to represent the letters A to Z, so the output was represented as one element with values from 1 to 26. A special FE process was created for this database to allow the input of the integer values rather than normalized values, which allowed a smaller database size than a 6-digit decimal fraction representation. This FE process first normalized the input data and then scaled it; the output used the reverse operation to allow direct comparison with the input data. The experiment consisted of twenty separate runs using the same order of the data set, varying only the baseline vigilance $\bar{\rho}_a$ of the ARTa module. Values of $\bar{\rho}_a$ were selected to minimize the number of classification errors during the test run: the test was initially run for values of 0, 0.5, and 1, with further values selected at the midpoints of previous intervals until a local minimum was found. These data are presented in Table 1. The data set was presented by using the first 16,000 patterns to train the network and the last 4,000 patterns to test it.
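The preprocessing of the Letters data can be sketched as follows. The comma-separated line layout (class letter followed by 16 integer attributes) is an assumption about the input file, and the function names are hypothetical.

```python
import string


def parse_line(line):
    """Parse 'T,2,8,...' (letter followed by 16 attributes) into (class, attributes)."""
    letter, *fields = line.strip().split(",")
    attributes = [int(f) for f in fields]
    if len(attributes) != 16 or not all(0 <= v <= 15 for v in attributes):
        raise ValueError("expected 16 integer attributes in 0..15")
    return string.ascii_uppercase.index(letter) + 1, attributes   # A -> 1, ..., Z -> 26


def normalize(attributes, max_value=15):
    """Map integer attributes 0..15 into [0, 1] before scaling and complement coding."""
    return [v / max_value for v in attributes]


def split(patterns, n_train=16000):
    """First 16,000 patterns for training, the remaining 4,000 for testing."""
    return patterns[:n_train], patterns[n_train:]
```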
CONCLUSIONS AND PLANS FOR FURTHER INVESTIGATION
The initial results generated more questions than answers. However, they provided some insight to guide further investigation. The data suggest the presence of multiple minima, indicating that a more exhaustive study with multiple orderings is necessary before any conclusions are drawn. They also show that the sensitivity to $\rho_a$ is stepwise rather than continuous. This is because each input element has only 16 possible values over the range 0-15, which allows an exhaustive study of $\rho_a$: it need only be set to $\rho_a = n/(15M)$, where n = 0, 1, ..., 15M is an integer and M = 16 is the number of attributes. Because the number of templates changes by only 3 between $\rho_a$ = 0 and 0.5, and the number of iterations required to train remains constant, most of the study should concentrate on the range 0.5 to 1.0. For that reason we studied 130 values of $\rho_a$: 121 of them between 0.5 and 1.0, one for each attainable value in that range, and the remaining 9 chosen to illuminate the range between 0 and 0.5. For each of twenty different data sets we evaluated all 130 values, for a total of 2,600 computer runs. We were also interested in the temporary values of $\rho_a$ used during resets (match tracking), so we recorded the template number and the value of $\rho_a$ used for each of these occurrences over all of the runs. After these data are collected and analyzed, the results and conclusions will be updated. Each run took between three and six minutes to complete, approximately 10 hours in total per data set. Each data set was generated by splitting the data into halves and exchanging each element of the first half with a randomly selected element of the second half; successive data sets were generated by repeating this shuffling process. This exhaustive study, whose results are presented in Figures 3 and 4, showed that adjusting $\rho_a$ can provide a template set with a higher probability of correctly identifying the input. This appears to occur for $\rho_a$ between 0.70 and 0.85, but selecting an optimum would require an exhaustive search within this range, and the optimum also depends on the ordering of the data set, as the variations between runs indicate. Figure 3 shows the mean and two-sigma confidence region of the percentage of correct mappings versus $\rho_a$ for the twenty different shuffles. Figure 4 shows the mean and two-sigma confidence region of the number of templates versus $\rho_a$.
This implementation demonstrates that this type of neural network can take advantage of parallel processing. Furthermore, since comparison is its primary operation, it can take advantage of simple processors. Because most of its processing is identical across templates, it is also well suited to the SIMD architecture. This also implies that a node, or template, has a low computational or hardware cost. From this viewpoint, a one-to-one comparison between a Fuzzy ARTMAP node and a Back Propagation node is not entirely valid. From a computational cost viewpoint, one Back This, combined with the fast learning characteristics and the simple parallel architecture, could make Fuzzy ARTMAP a feasible component for adaptive control mechanisms or other intelligent applications where fast learning is required. As shown in Table 2, for $\rho_a$ = 0.771 the number of templates required was 987. This set of templates produced correct answers within 0.05% of the peak found in the study, yet used almost 200 fewer templates. It would probably be chosen as optimal for the presented computer architecture, since it used 987 of the 1,008 processor sets (98% of the available processors) for each presentation of each pattern, yet performed better than most larger template sets. A smaller template set would save no operations unless it contained fewer than 512 templates.
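The stepwise sensitivity to $\rho_a$ follows from the integer coding. The Python sketch below assumes complement coding (standard in Fuzzy ARTMAP, though not restated in this section), so each input I = (a, 15 - a) has |I| = 15M, and it takes the fuzzy AND as an element-wise minimum; the match ratio |I ∧ w|/|I| can then only take the values n/(15M). It enumerates that grid for M = 16 attributes, confirms the 121 attainable values between 0.5 and 1.0, and computes the match of every template against one input, which is the min-and-sum comparison each processor set would perform in parallel on a SIMD machine. All function names are hypothetical, and the sketch covers only the match computation, not category choice, resets, or learning.

```python
from fractions import Fraction
from typing import List, Sequence

LEVELS = 15  # attribute values are integers 0..15


def complement_code(a: Sequence[int]) -> List[int]:
    """Integer complement coding (a, 15 - a); |I| is always 15 * len(a)."""
    return list(a) + [LEVELS - x for x in a]


def match_ratio(I: Sequence[int], w: Sequence[int]) -> Fraction:
    """Fuzzy ART match |I ^ w| / |I|, with the fuzzy AND as element-wise min."""
    return Fraction(sum(min(i, wk) for i, wk in zip(I, w)), sum(I))


def match_all(I: Sequence[int], templates: Sequence[Sequence[int]]) -> List[Fraction]:
    """Match of every template against one input. On a SIMD machine each
    template would sit on its own processor set; here a list stands in."""
    return [match_ratio(I, w) for w in templates]


def vigilance_grid(M: int) -> List[Fraction]:
    """Every value the match ratio can take: n / (15 M) for n = 0 .. 15 M."""
    return [Fraction(n, LEVELS * M) for n in range(LEVELS * M + 1)]


M = 16
grid = vigilance_grid(M)
print(sum(1 for r in grid if Fraction(1, 2) <= r <= 1))  # 121

I = complement_code([3, 15, 0, 7] * 4)
templates = [complement_code([2, 10, 1, 7] * 4), complement_code([3, 15, 0, 7] * 4)]
print(match_all(I, templates))  # [Fraction(53, 60), Fraction(1, 1)]
```

Exact fractions are used so that grid membership can be checked without rounding error. Because the vigilance test compares the match against $\rho_a$ with "greater than or equal", any $\rho_a$ strictly between two adjacent grid values behaves exactly like the upper of the two, so only the grid values need to be studied.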
Further investigation is planned to determine more precisely the speedup that can be expected from the parallel processing approach. In addition, further analysis will be performed to determine whether the range for an exhaustive study can be predicted from the size of the template set generated by a run with $\rho_a$ = 0.
