Overview
This report describes the last six months' progress on the Rutgers CAM Project.
The overall design of the system is complete at the architectural level and is described in section 2. The machine, shown in Figure 1 on page 2, is composed of two kinds of cells: the CAM cells, which include both memory and processor and support local processing within each cell; and the tree cells, which have a smaller instruction set and provide global processing over the CAM cells.
We have completed a parameterized design of the basic CAM cell. The parameters are technology dependent and concern not the basic form of the cell but such characteristics as the width of the data word within the cell. The instruction set for the CAM cells has been designed and is described in section 3. An instruction level simulator has been designed, implemented, and used to simulate algorithms on this architecture.
Progress has been made on the final specification of the CPS and is described in section 5. We have a partial instruction level simulator for this component, but we have not yet settled on the final instruction set.
The gate level simulator described in section 4 is almost complete. It will be used to evaluate the design details of both tree and CAM cells as well as the CPS.
The machine architecture has been driven by the design of algorithms whose requirements are reflected in the resulting instruction set(s). A few of these algorithms are described in section 6.
We have begun the design of a high level language, which not only will take advantage of the potential parallelism, but whose compiler will be an expert (rule-based) system that will take advantage of the associative properties of the CAM to support the compiling process.
A discussion of our approach to the compilation process is contained in section 7.
2 Hardware Design
Figure 1 shows the Rutgers CAM architecture as a collection tree sitting over a set of CAM cells.
[Figure residue, "final state": 4 5 6 7 8 9 10 11 4 3 6 7 6 9 10 9]
These operations require data to be processed in the collection tree. The technique we have adopted performs a scan in two phases: an up phase, during which data is processed, stored, and propagated up the collection tree; and a down phase, during which the stored data and a value injected at the root are processed and propagated down the collection tree. Each TREE cell is connected to its left child, right child, and parent, and contains one internal register.
During each phase of a scan, tree cells perform two parallel operations determined by the type of scan (i.e., up or down) and the operation broadcast to the cells. During the up phase each tree cell stores the data from its left child into its internal register, applies the specified operation to the data from its children, and routes this result to its parent. The down phase works similarly: each tree cell routes the data from its parent to its left child, applies the specified operation to the data from its parent and internal register, and routes the result to its right child. A syntactic description of this computation and communication is provided in Figure 5. Note that, as in the above description, it is assumed that the values contained in a cell's internal register, v_i, during the down phase were stored during the corresponding up phase of the operation.
Activity information must also be propagated within the collection tree; however, in contrast to segmentation information, which is only passed up the tree, activity information must also be passed down the tree. The activity information is encoded as a 1-bit field that is conceptually attached to each data value passed in the tree. A value of 1 indicates that the data is significant and must be used in the computation, while a value of 0 indicates that the data is irrelevant and should not be used.
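The two-phase scan described above can be illustrated with a small software model. The following is a minimal sketch, not the project's simulator: it assumes a complete binary tree over a power-of-two number of leaves, ignores segment and activity bits, and uses hypothetical names (`tree_scan`, `registers`, `root_value`) that do not come from the report.

```python
from operator import add


def tree_scan(leaves, op, root_value):
    """Model an up/down scan over a complete binary collection tree.

    Returns (values delivered to the leaves, value that reached the root).
    Assumes len(leaves) is a power of two (an assumption of this sketch).
    """
    n = len(leaves)
    if n == 0 or n & (n - 1):
        raise ValueError("this sketch needs a power-of-two number of leaves")

    # Up phase: each tree cell stores its left child's value in its
    # internal register and sends op(left, right) to its parent.
    level = list(leaves)
    registers = []  # registers[d][k]: register of cell k, d levels above the leaves
    while len(level) > 1:
        pairs = range(len(level) // 2)
        registers.append([level[2 * k] for k in pairs])
        level = [op(level[2 * k], level[2 * k + 1]) for k in pairs]
    root_result = level[0]

    # Down phase: each tree cell passes the parent's value to its left
    # child and op(parent value, register) to its right child.
    down = [root_value]
    for regs in reversed(registers):
        nxt = []
        for parent_val, reg in zip(down, regs):
            nxt.append(parent_val)
            nxt.append(op(parent_val, reg))
        down = nxt
    return down, root_result


prefix, total = tree_scan([1, 2, 3, 4, 5, 6, 7, 8], add, 0)
print(prefix, total)
```

With addition as the operation and 0 injected at the root, each leaf receives the sum of all leaves to its left, and the root receives the overall sum.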
For unsegmented scans the activity information could be handled in the same manner as segment information, with the exception that activity information must also be propagated down the tree. However, this is not possible for segmented scans, where activity information is dependent on segment information. Activity information, as was the case with segment information, can be given unified semantics that encode activity in a single bit field indicating whether the associated data is relevant. Figure 7 shows the relationship between the activity and segment bits for segmented scans with activity control. Subscripts l, r, and p denote information about the left child, right child, and parent of a node, while superscripts u and d denote the attachment of activity information to the up and down phases of a scan. Figure 8 shows the data paths that communicate the segment and activity information to a TREE cell. These cells compute their own segment and activity as specified in Figure 7 and transmit this information as indicated by the edges. The segment and activity control over the tree as a whole is determined by this hardware and the setting of the segment and activity bits in the CAM cells, as well as a single activity bit, a^d, introduced at the root of the collection tree.
The addressing modes for the instruction set are quite simple. For vector instructions, besides the optional activity control, segmentation control, starting value, and default value, each instruction may have one or two source operands and a destination address. The source operands come from memory (direct addressing) or from an accumulator, which is a general-purpose register. Since we currently assume that each CAM cell has only one general-purpose register, only one operand can be from the accumulator. We also assume that there is only one data path to memory. Thus, if an instruction has two source operands, one should be from the accumulator and the other should be from memory.
The destination address can be either memory (also direct addressing) or the accumulator. However, if an instruction has two source operands, its destination address must be the accumulator. Scalar instructions may also have two source operands and a destination address, besides the optional activity control and the number of bit positions to be shifted (for shift instructions). Like vector instructions, scalar instructions are subject to the one-data-path and one-accumulator restrictions. However, a source operand in a scalar instruction can be an immediate value (i.e., immediate addressing mode).
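The operand restrictions above can be summarized as a small validity check. This is an illustrative sketch only: the function name `check_operands` and the operand tags "mem", "acc", and "imm" are hypothetical, and the rule that a two-source instruction must write to the accumulator is applied to vector instructions alone, since the text states it only for them.

```python
def check_operands(kind, sources, dest):
    """Return a list of violated rules; an empty list means the combination is allowed.

    kind    -- "vector" or "scalar"
    sources -- list of one or two tags from "mem", "acc", "imm"
    dest    -- "mem" or "acc"
    """
    if kind not in ("vector", "scalar"):
        raise ValueError(f"unknown instruction kind: {kind!r}")
    errors = []
    if not 1 <= len(sources) <= 2:
        errors.append("an instruction has one or two source operands")
    # Immediate operands are available to scalar instructions only.
    allowed = {"mem", "acc", "imm"} if kind == "scalar" else {"mem", "acc"}
    for s in sources:
        if s not in allowed:
            errors.append(f"{s!r} is not a valid {kind} source operand")
    # One general-purpose register per CAM cell.
    if sources.count("acc") > 1:
        errors.append("only one source operand can come from the accumulator")
    # One data path to memory.
    if sources.count("mem") > 1:
        errors.append("only one source operand can come from memory")
    if dest not in ("mem", "acc"):
        errors.append("the destination must be memory or the accumulator")
    # Stated in the text for vector instructions; not assumed for scalar ones here.
    if kind == "vector" and len(sources) == 2 and dest != "acc":
        errors.append("a two-source vector instruction must write to the accumulator")
    return errors


print(check_operands("vector", ["acc", "mem"], "acc"))
print(check_operands("vector", ["mem", "mem"], "mem"))
```

The first call describes an allowed combination; the second violates both the single memory path and the accumulator-destination rule.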
CPS Simulation

Input:
The expression, the type of each item (TY), and the precedences of each item (LP and RP).
Output:
The parse tree represented by a vector, PA, containing the parent address of each item.
Method:
1. Initialization:
Clear PA_i of every CAM cell.
2. Check if only one CAM cell has PA = 0. If so (i.e., the cell contains the root of the parse tree), output the parse tree and then stop. Otherwise, go to the next step.
[Table residue: [i] = i; 1 2 2 2 2 3 3 6 6 6 6 6 1 1 2]
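The parent-vector output and the termination test in step 2 can be sketched in software. This is a hedged illustration, not the CAM implementation: it assumes item addresses are 1-based so that PA = 0 can mean "no parent", and the helper names `find_roots`, `is_complete_parse_tree`, and `children` are hypothetical.

```python
def find_roots(pa):
    """Return the 1-based addresses of all items whose parent address is 0."""
    return [i for i, parent in enumerate(pa, start=1) if parent == 0]


def is_complete_parse_tree(pa):
    """Step 2 test: parsing is finished when exactly one item has PA = 0."""
    return len(find_roots(pa)) == 1


def children(pa):
    """Invert the parent vector: map each parent address to its child addresses."""
    kids = {}
    for i, parent in enumerate(pa, start=1):
        if parent != 0:
            kids.setdefault(parent, []).append(i)
    return kids


# Items of "a + b * c" at addresses 1..5: a, +, b, *, c.
finished = [2, 0, 4, 2, 4]    # '+' is the root; '*' hangs under '+'
unfinished = [2, 0, 4, 0, 4]  # '*' has not yet been attached to '+'
print(is_complete_parse_tree(finished), children(finished))
print(is_complete_parse_tree(unfinished), find_roots(unfinished))
```

On the CAM, the test for a single cell with PA = 0 would be an associative operation over all cells rather than a sequential loop; the loop here only models the result.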
Input:
M,,_,i and V_i, where 0 ≤ i ≤ N-1, and the new value k.
Output:
The updated M,r¢,i, 0 ≤ i ≤ N-1.
Method:
It includes the following steps:
Find the right boundary of the region, i.e., find the largest l, where l > j, such that
Find the left boundary of the region, i.e., find the smallest l, where l ≤ j, such that
Table 4 on page 24 gives the result of applying the algorithm to the input matrix given in Table 3, by selecting a particular column.
Algorithm 3. Region separation by renumbering.
Input:
The M × N matrix to be processed, stored in memory location S of each CAM cell, and the selected column k.
Output:
The processed matrix stored in memory location R of each CAM cell.
Method:
The required M × N CAM cells are divided into M rows of N cells each, and their addresses are assumed to be two dimensional, i.e., CAM_{i,j}, where 0 ≤ i ≤ M-1 and 0 ≤ j ≤ N-1.
Beginning with each of the possible representations for the graph, there are several different algorithms that could be used to perform the operations needed. Many of these present the option of producing the result in different ways, i.e., as a modification of the original data structure, as a new data structure, or as a representation of the abstract result in a different data structure.
There are two major thrusts to our high-level language research. The simple graph for an abstract program gives rise to a "blown up" graph of possible implementations. An optimal path through the latter is then found.
The graph can be reduced by precomputing the optimal path from each input node to each output node. The reduced graph can then be used as a primitive.
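The reduction described above can be sketched as follows. The report does not say how "optimal" is measured, so this sketch assumes additive, nonnegative edge costs and uses Dijkstra's algorithm as a stand-in. The function names and the example graph (node names and costs) are hypothetical.

```python
import heapq


def cheapest_costs(graph, source):
    """Dijkstra's algorithm: cheapest cost from source to every reachable node.

    graph maps a node to a list of (neighbor, nonnegative cost) pairs.
    """
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def reduce_graph(graph, inputs, outputs):
    """Replace the interior of the graph by one edge per reachable (input, output) pair."""
    reduced = {}
    for s in inputs:
        dist = cheapest_costs(graph, s)
        reduced[s] = [(t, dist[t]) for t in outputs if t in dist]
    return reduced


# Hypothetical implementation graph: two data-structure choices for one abstract step.
impl_graph = {
    "in": [("list", 2), ("matrix", 5)],
    "list": [("out", 6)],
    "matrix": [("out", 1)],
}
print(reduce_graph(impl_graph, ["in"], ["out"]))
```

The reduced graph keeps only the input and output nodes, each pair joined by a single edge carrying the precomputed best cost, so it can be reused as a building block inside a larger implementation graph.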
