IEEE TRANSACTIONS ON COMPUTERS, VOL. C-33, NO. 11, NOVEMBER 1984
estimates of the values for Pa, BW, and U, if Cv is small. The second
of these, the MC model, is the more complex model but, according
to comparisons with simulations, provides accurate estimates of the
values for Pa, BW, and U for a wide range of Cv. The ER model
requires the values of M, N, r, and X as inputs. The MC model
requires, in addition, the value of X2. The explicit dependence of the
MC model on X2 (and hence Cv) can be observed in (12). This was
confirmed empirically; specifically, it was shown that BW decreases
as Cv increases. The fact that the second moment is an important
feature of memory interference should not be completely unexpected,
as the behavior of similar systems, e.g., networks of
queues, also depends on the variance of the underlying stochastic
processes (in the case of queues, the variances of the interarrival
time and service time).
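As a reminder of why the second moment and the variability of the connection time are tied together, the standard identity below may help. It assumes that X and X2 denote the first and second moments of the connection time and that the garbled symbol is the coefficient of variation (written $C_v$ here); the overbar notation is introduced only for this illustration.

```latex
C_v^2 \;=\; \frac{\operatorname{Var}(X)}{\bar{X}^2}
      \;=\; \frac{\overline{X^2} - \bar{X}^2}{\bar{X}^2}
% Constant connection time:     \overline{X^2} = \bar{X}^2   =>  C_v = 0
% Exponential connection time:  \overline{X^2} = 2\bar{X}^2  =>  C_v = 1
```

For a fixed mean, a larger second moment therefore means a larger coefficient of variation, which is the sense in which a model that depends on X2 also depends on the variability of the connection time.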
ACKNOWLEDGMENT
The authors gratefully acknowledge comments and suggestions
made by B. A. Makrucki.
REFERENCES
[1] C. E. Skinner and J. R. Asher, "Effects of storage contention on system
performance," IBM Syst. J., vol. 8, no. 4, pp. 319-333, 1969.
[2] W. D. Strecker, "Analysis of the instruction execution rate in certain
computer structures," Ph.D. dissertation, Carnegie-Mellon Univ., Pitts-
burgh, PA, 1970.
[3] D. P. Bhandarkar, "Analysis of memory interference in multiprocessors,"
IEEE Trans. Comput., vol. C-24, pp. 897-908, Sept. 1975.
[4] F. Baskett and A. J. Smith, "Interference in multiprocessor computer
systems with interleaved memory," Commun. ACM, vol. 19,
pp. 327-334, June 1976.
[5] C. H. Hoogendoorn, "A general model for memory interference in
multiprocessors," IEEE Trans. Comput., vol. C-26, pp. 998-1005,
Oct. 1977.
[6] J. S. Emer and E. S. Davidson, "Control store organization for multiple
stream pipelined processors," in Proc. 1978 Int. Conf. on Parallel Processing,
Aug. 1978, pp. 43-48.
[7] B. R. Rau, "Interleaved memory bandwidth in a model of a multi-
processors," IEEE Trans. Comput., vol. C-28, pp. 678-681, Sept. 1979.
[8] J. H. Patel, "Processor-memory interconnections for multiprocessors,"
IEEE Trans. Comput., vol. C-30, pp. 771-780, Oct. 1981.
[9] T. N. Mudge and B. A. Makrucki, "Probabilistic analysis of a crossbar
switch," in Proc. IEEE 9th Annu. Symp. Comput. Arch., Apr. 1982,
pp. 311-319.
[10] D. W. L. Yen, J. H. Patel, and E. S. Davidson, "Memory interference in
synchronous multiprocessor systems," IEEE Trans. Comput., vol. C-31,
pp. 1116-1121, Nov. 1982.
[11] L. N. Bhuyan and C. W. Lee, "An interference analysis of interconnection
networks," in Proc. 1983 Int. Conf. on Parallel Processing,
Aug. 1983, pp. 2-9.
[12] F. A. Briggs and E. S. Davidson, "Organization of semiconductor memories
for parallel-pipelined processors," IEEE Trans. Comput., vol. C-26,
pp. 162-169, Feb. 1977.
[13] M. A. Marsan and M. Gerla, "Markov models for multiple bus multi-
processor systems," IEEE Trans. Comput., vol. C-31, pp. 239-248,
Mar. 1982.
[14] I. H. Onyuksel and K. B. Irani, "A Markov queueing network model for
performance evaluation of bus-deficient multiprocessor systems," in
Proc. 1983 Int. Conf. on Parallel Processing, Aug. 1983, pp. 437-439.
[15] T. N. Mudge, J. P. Hayes, G. D. Buzzard, and D. C. Winsor, "Analysis
of multiple-bus interconnection networks," in Proc. 1984 Int. Conf. on
Parallel Processing, Aug. 1984, pp. 228-232.
[16] S. H. Fuller, "Performance evaluation," in Introduction to Computer
Architecture, H. S. Stone, Ed. Chicago, IL: Science Research, 1975,
pp. 474-546.
[17] D. P. Bhandarkar and S. H. Fuller, "Markov chain models for analyzing
memory interference in multiprocessor computer systems," in Proc. 1st
Annu. Symp. Comput. Arch., Dec. 1973, pp. 1-6.
[18] J. H. Patel, "Analysis of multiprocessors with private cache memories,"
IEEE Trans. Comput., vol. C-31, pp. 296-304, Apr. 1982.
[19] T. N. Mudge and H. B. Al-Sadoun, "Memory interference models with
variable connection time," Comput. Res. Lab., Dep. Elec. Eng. Comput.
Sci., Univ. Michigan, Ann Arbor, MI, Rep. CRL-TR-16-84, Mar. 1984.
An Efficient Implementation of
Search Trees on ⌈lg N + 1⌉ Processors
MICHAEL J. CAREY AND CLARK D. THOMPSON
Abstract -A scheme for maintaining a balanced search tree on
⌈lg N + 1⌉ parallel processors is described. The scheme is almost fully
pipelined: ⌈lg N + 1⌉/2 search, insert, and delete operations may run
concurrently. Each processor executes O(1) instructions of a top-down
2-3-4 tree manipulation algorithm before passing the operation along to
the next processor in the pipeline. Thus, the total delay per tree operation
is O(lg N), and one tree operation completes every O(1) time units.
Index Terms -Algorithms for VLSI, dictionary search, pipelining,
search trees, special-purpose architectures.
I. INTRODUCTION
The problem of implementing a search tree on a digital computer
has received intense study. If all the data in the tree will fit into main
memory, the conventional approach is to maintain the data in the
form of a balanced binary tree [1]. Using the balanced tree format,
a uniprocessor can perform search tree operations such as insert,
delete, exact-match search, and range search by executing just
O(lg N) instructions, where N is the current number of entries in the
tree.¹ In this correspondence, we show how to design a multiprocessor
system that is O(lg N) times as fast as the conventional
approach. Our system can accept a new search tree operation once
every O(1) instruction times. This increased throughput makes our
scheme attractive for the implementation of centralized databases
handling a large volume of queries. A more complete presentation
of our scheme is available as [8].
In brief, the idea is to use a linear array of ⌈lg N + 1⌉ processors
to maintain a balanced tree structure for up to N items. Each item
is assumed to consist of a primary key K and an uninterpreted data
field I(K). Our scheme handles the following operations on the data
in the tree: insertions, deletions, exact-match searches, and range
queries. Each operation completes after O(lg N) delay, where the
unit of time is taken to be the instruction cycle time of an individual
processor. The scheme is almost fully pipelined, allowing as many
as ⌈lg N + 1⌉/2 operations to be at varying stages of execution at
any point in time, so one operation can complete every O(1) time
units. The processors in the system operate independently, each
executing its own instruction stream, so the system is an MIMD
architecture.
Each of the processors in our linear array is furnished with a
private memory unit. Processor P1 has memory capable of storing a
single tree node. Processor Pi, 2 ≤ i ≤ ⌈lg N + 1⌉, has twice the
amount of memory of its predecessor Pi-1. The last processor
P⌈lg N + 1⌉ must have memory sufficient to hold all of the data that
are to be stored in the machine. With bidirectional communication
paths added between the processors, the resulting machine architecture
is shown for N = 16 in Fig. 1. (Also shown is an example of
tree storage layout in our scheme.) This is essentially the same
architecture used by Armstrong [3] and by Tanaka, Nozaka, and
Masuyama [19]. In distinction to these earlier machines, however,
our processors execute a top-down version of a 2-3-4 tree manipu-
lation algorithm [13], in which each processor takes care of one
level of the tree. By using this algorithm, our machine provides a
richer set of database operations than either of its predecessors.
Manuscript received March 5, 1984; revised July 16, 1984. This work was
supported by the National Science Foundation under Grant ECS-8110684,
the Air Force Office of Scientific Research under Grant AFOSR-78-3596, the
Naval Electronic Systems Command under Contract NESC-N00039-81-C-0569,
and a California MICRO Fellowship.
M. J. Carey is with the Department of Computer Science, University of
Wisconsin, Madison, WI 53706.
C. D. Thompson is with the Division of Computer Science, University of
California, Berkeley, CA 94720.
¹We follow the Knuthian practice of writing lg N for log₂ N.
0018-9340/84/1100-1038$01.00 © 1984 IEEE
[Figure 1 omitted; recoverable labels: "requests", "replies", and memory units such as M2 and M3.]
Fig. 1. Parallel architecture for balanced tree maintenance.
Also, only the last processor in our machine, P⌈lg N + 1⌉, stores the actual
data items I(K).
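The sizing rule described above (one node in P1, doubling at every later processor) can be tabulated with a few lines of Python. This is only an illustrative sketch; the function name `pipeline_layout` is hypothetical and memory is counted in words, one tree node per word.

```python
def pipeline_layout(n_items: int) -> list[int]:
    """Return the local memory size, in words, of P1..Pm for up to n_items items.

    m = ceil(lg N + 1) is computed with integer arithmetic:
    ceil(lg N) equals (N - 1).bit_length() for N >= 1.
    """
    if n_items < 1:
        raise ValueError("n_items must be at least 1")
    m = (n_items - 1).bit_length() + 1          # number of processors
    return [2 ** (i - 1) for i in range(1, m + 1)]


print(pipeline_layout(16))  # [1, 2, 4, 8, 16]
```

For N = 16 this gives five processors, and the last one has room for all 16 data items, matching the configuration the text describes for Fig. 1.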
II. RELATED RESEARCH
Our pipelined search tree implements many of the operations
performed by the VLSI "search trees," "dictionary machines," and
"database machines" of Leiserson [14], Bentley and Kung [6], Song
[17], [18], Dohi, Suzuki, and Matsui [10], Ottmann, Rosenberg, and
Stockmeyer [15], Atallah and Kosaraju [4], Bonuccelli, Lodi,
Luccio, Maestrini, and Pagli [7], and Somani and Agarwal [16].
These "tree machines" are based on the use of O(N) processors
organized in tree-like configurations, and they can handle a large
variety of search operations, including "partial match" queries [12],
in O(lg N) time. In contrast, our scheme requires only O(lg N)
processors, but queries can only be for exact or range matches to the
"primary key" field of each data item. In applications for which our
scheme's operation set is sufficiently powerful, our pipelined search
tree machine will be smaller, cheaper, and thus superior to an
O(N)-processor tree machine.
Another related scheme is the ⌈lg N + 1⌉-processor heapsort/
search tree database system of Tanaka, Nozaka, and Masuyama
[19]. Their idea is to use one ⌈lg N + 1⌉-processor pipeline to heapsort
a stream of records, and a second machine with similar structure
to arrange the sorted stream into the form of a binary search tree.
The main difference between their search tree and ours is that we
provide on-line rebalancing, allowing insertions and deletions
to run concurrently with search operations without unbalancing
the tree.
A final related scheme was proposed as this correspondence was
being revised for publication. Fisher [11] developed a scheme for
maintaining a "trie" on a pipeline of l processors, where l is the length
of the longest item to be stored and each processor stores one byte of
each item. Fisher's scheme (like ours) is superior to O(N)-processor
tree machines for simple database applications, namely those in which
the only operations are insert, delete, and exact search.
III. PROCESSOR ARCHITECTURE
In the introduction, we defined our architecture as consisting of
a linear array of ⌈lg N + 1⌉ processors, each with its own memory.
In stating our time bounds, we have assumed that each processor can
execute one instruction in one unit of time. Before proceeding with
the description of our tree maintenance algorithms, we briefly describe
the capabilities required for the processors in our scheme.
Memory: Each word of data in the processors and memories of
our scheme consists of w bits, where w is large enough to represent
either a key and two ⌈lg N + 1⌉-bit pointers, or else a key, an
uninterpreted data field, and one ⌈lg N + 1⌉-bit pointer. A pointer
can refer to one of N different memory words, or to nil. Processor
Pi, 1 ≤ i ≤ ⌈lg N + 1⌉, has 2^(i-1) words of storage in its local
memory. It has O(1) internal registers, some of which are loaded
with predefined constants. Finally, it has its own read-only control
store with O(1) instructions.
Instruction Set: The instruction set contains register-register
ADD, SUBTRACT, and MOVE instructions, a BRANCH-ON-ZERO(reg,
addr) instruction, instructions to READ and WRITE into each of the N
(or fewer) locations in each processor's local memory, and SEND and
RECEIVE instructions to communicate with the processor's nearest
neighbors. For the two processors on the "ends" of the linear array,
the SEND and RECEIVE instructions are used to communicate with the
outside world. All instructions except RECEIVE are fetched, interpreted,
and executed in unit time. An execution of RECEIVE, on the
other hand, is not complete until a message is received from the
specified processor. This can take an arbitrarily long amount of
time.
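The blocking behavior of RECEIVE can be imitated in software with threads and FIFO queues. The sketch below is an analogy rather than a model of the hardware: each "processor" is a thread, each link is a `queue.Queue`, and the O(1) work per stage is replaced by a hypothetical increment. The names `processor` and `run_chain` are made up for this illustration.

```python
import queue
import threading


def processor(inbox: queue.Queue, outbox: queue.Queue) -> None:
    """One stage: RECEIVE a message, do a constant amount of work, SEND it on."""
    while True:
        msg = inbox.get()        # RECEIVE: blocks until the neighbor sends something
        if msg is None:          # None is this sketch's shutdown marker
            outbox.put(None)
            return
        outbox.put(msg + 1)      # SEND to the next neighbor (placeholder work: +1)


def run_chain(n_procs: int, values: list[int]) -> list[int]:
    """Push values through a linear array of n_procs stages and collect the outputs."""
    links = [queue.Queue() for _ in range(n_procs + 1)]
    threads = [
        threading.Thread(target=processor, args=(links[i], links[i + 1]))
        for i in range(n_procs)
    ]
    for t in threads:
        t.start()
    for v in values:             # the "outside world" feeds the first processor
        links[0].put(v)
    links[0].put(None)
    results = []
    while (out := links[-1].get()) is not None:
        results.append(out)
    for t in threads:
        t.join()
    return results


print(run_chain(5, [0, 10, 20]))  # [5, 15, 25]
```

Because every link is first-in first-out, operations leave the array in the order they entered it, and several of them can be in different stages at the same time.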
IV. OPERATIONS FOR TREE MAINTENANCE
In this section, the pipelined 2-3-4 tree manipulation operations
are described. We remind the reader that a 2-3-4 tree is a balanced
search tree where two, three, or four pointers (and one, two, or three
search keys) appear in each internal (index) node, and all data items
appear in external (leaf) nodes [13]. The tree manipulations are
simple variants of the algorithms for the more common 2-3 trees
and B+ trees [1], [5], [9]. The advantage of 2-3-4 trees over 2-3
trees is that the manipulations can be performed in a top-down
fashion [2], [13]. Top-down operations are ideal for our architecture,
as they make pipelined operation both possible and simple.
Since 2-3-4 tree manipulations have been described elsewhere,
our description will be informal, with actual pointer and key
manipulations omitted.
A. Searching
The SEARCH operation for the parallel 2-3-4 tree scheme is a
simple pipelined version of normal B+ tree searching. Hence, when
processor Pi receives a "SEARCH(key n, using pointer p)" message,
it should do the following.
Case 1: Pi contains internal nodes (i < ⌈lg N + 1⌉). Follow the
pointer p to the appropriate index node in local memory. Use the
key value n to select the appropriate pointer (p') to follow from
here. Send SEARCH(n, p') to Pi+1.
Case 2: Pi contains data nodes (i = ⌈lg N + 1⌉). Given key n
and pointer p, see if pointer p points at a data node containing
key n. If so, send the data to the outside world. If not, send out a
message indicating that the desired data were not found.
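The two cases above can be traced with a small sequential sketch. It is not the authors' code: it assumes each processor's memory is a Python list indexed by pointer, that an index node is a pair `(keys, child_pointers)`, that a data node is a pair `(key, data)`, and that a key equal to a splitting key is routed to the left. All of these layout choices are assumptions made for illustration.

```python
from bisect import bisect_left


def search(levels, key):
    """Route SEARCH(key, p) through the levels; return the data, or None if absent."""
    p = 0                                        # P1 stores a single node, at address 0
    for memory in levels[:-1]:                   # Case 1: processors holding index nodes
        keys, children = memory[p]
        p = children[bisect_left(keys, key)]     # pointer p' sent on to the next processor
    found_key, data = levels[-1][p]              # Case 2: the last processor holds data nodes
    return data if found_key == key else None


# A three-processor example with four items.
levels = [
    [([20], [0, 1])],                            # P1: root index node
    [([10], [0, 1]), ([30], [2, 3])],            # P2: two index nodes
    [(10, "a"), (20, "b"), (30, "c"), (40, "d")],  # P3: data nodes
]

print(search(levels, 30))  # c
print(search(levels, 25))  # None
```

In the machine itself each loop iteration would run on a different processor, so a new search could enter P1 while earlier ones are still working their way down.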
B. Insertion
The INSERT operation for the parallel 2-3-4 tree scheme is a
pipelined version of the top-down node-splitting insert algorithm of
Guibas and Sedgewick [2], [13]. When the search encounters a
4-node, the transformation shown in Fig. 2 is applied, ensuring that
future node splits will not cause upwardly propagating splits. Note
that the insertion transformation results in an increase in the actual
tree height when applied at the root node.² The figure depicts the
transformation in terms of 2-3-4 trees, with optional pointers
drawn in dotted lines and the search path pointer indicated via a
small black square. Although the figure shows the insertion path as
being the leftmost path, the transformation applies in the obvious
way regardless of the path.
Hence, when processor Pi receives an "INSERT(key n, using
pointer p)" message, it should do the following.
Case 1: Pi contains internal nodes (i < ⌈lg N + 1⌉). Pi follows
the pointer p to the appropriate index node in its local memory. It
then uses the key value n to select the appropriate pointer p'
to follow from here. Next it sends INSERT_TRANSFORM(p') to Pi+1.
Pi+1 will apply the insertion transformation if it is applicable, splitting
the next node on the search path for key n, and sending
INSERT_TRANSFORM_REPLY(m, np) to Pi. This reply informs Pi of the
²The root node is defined as the first 2-, 3-, or 4-node on the path going from
P1 to P⌈lg N + 1⌉.
Fig. 2. Insertion transformation.
new splitting key³ m and new offspring pointer np resulting from a
node split if one occurred. That is, if np ≠ nil, Pi increases the size
of its current index node p. If p is a 4-node, such an action would
lead to the formation of a 5-node (an error). By induction, however,
this is impossible since INSERT_TRANSFORM(p) was previously
requested by processor Pi-1. (We consider the basis i = 1 of
this induction in the next section, in the discussion of storage
requirements.)
If processor Pi modifies the current node p in response to an
INSERT_TRANSFORM_REPLY, it uses n once again to select the appropriate
path p' to follow from here. Finally, Pi sends INSERT(n, p') to
Pi+1.
Case 2: Pi contains data nodes (i = ⌈lg N + 1⌉). If the indicated
data item is not already present and if there is room for another
item in the tree, P⌈lg N + 1⌉ installs it in a new data node. It then sends
P⌈lg N⌉ a pointer np to the new node in an INSERT_REPLY(np) message,
and finally sends the outside world an acknowledgment. If there is
no room for the item, or if the indicated data item is already present,
P⌈lg N + 1⌉ sends P⌈lg N⌉ an INSERT_REPLY(nil) message, then sends the
outside world an error response.
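The request/reply exchange for a node split can be sketched as two small functions, one for each side of the link. This is a hedged illustration of the general top-down 4-node split, not a transcription of Fig. 2: the `Node` class, the function names `insert_transform` and `absorb`, and the use of object references in place of pointers are all assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    keys: list
    children: list = field(default_factory=list)


def insert_transform(child: Node):
    """Child side: split `child` if it is a 4-node.

    Returns (m, new_sibling), mirroring INSERT_TRANSFORM_REPLY(m, np),
    or (None, None) when no split was needed.
    """
    if len(child.keys) < 3:
        return None, None
    m = child.keys[1]                                   # middle key moves up
    sibling = Node(child.keys[2:], child.children[2:])  # right half becomes a new 2-node
    child.keys, child.children = child.keys[:1], child.children[:2]
    return m, sibling


def absorb(parent: Node, idx: int, m, sibling: Node) -> None:
    """Parent side: add splitting key m and the new pointer next to child idx."""
    if len(parent.keys) >= 3:
        raise ValueError("parent is a 4-node; absorbing would create a 5-node")
    parent.keys.insert(idx, m)
    parent.children.insert(idx + 1, sibling)


full = Node([10, 20, 30], ["a", "b", "c", "d"])
parent = Node([50], [full, "z"])
m, sibling = insert_transform(full)
if sibling is not None:
    absorb(parent, 0, m, sibling)
print(parent.keys, full.keys, sibling.keys)  # [20, 50] [10] [30]
```

The `ValueError` branch corresponds to the 5-node case that the induction argument rules out: a parent on the search path was itself split, if necessary, one step earlier.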
C. Deletion
The DELETE operation is a modified version of the Guibas and
Sedgewick top-down deletion algorithm [13]. The modification is
based on the observation that old keys may be used to guide searches
in B+ trees [9] since all predecessors of a deleted key are predecessors
of its successor key, and all successors of its successor key
are also successors of the deleted key. Thus, it is not necessary to
delete the instances of a data item's key from the index portion of
a 2-3-4 tree when deleting the item.
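A tiny example may make the observation concrete. It assumes a single index level whose splitting keys are the largest keys of the groups to their left, which is one common B+ tree convention and not necessarily the exact layout used by the authors; the names `separators`, `leaves`, and `find` are hypothetical.

```python
from bisect import bisect_left

separators = [20, 40]                      # keys kept in the index portion
leaves = [[10, 20], [30, 40], [50, 60]]    # data items, grouped under the index node


def find(key) -> bool:
    """Use the index keys to pick a group, then look for the key in that group."""
    return key in leaves[bisect_left(separators, key)]


leaves[0].remove(20)                       # delete item 20 but leave separator 20 in place

print([find(k) for k in (10, 30, 40, 50, 60)])  # [True, True, True, True, True]
print(find(20))                                 # False
```

The stale separator 20 still sends every remaining key to the correct group, so the index does not have to be touched when only a data item disappears.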
The basic idea of top-down deletion is that when a node with the
minimum allowable number of keys is encountered, a transformation
that adds keys must be performed to ensure that deletions
cannot propagate upwards [2], [13]. No paper in the literature has
described these transformations in terms of standard 2-3-4 trees or
B+ trees in a particularly comprehensible manner, so they will be
described here in some detail. There are three such transformations,
depicted in Figs. 3, 4, and 5 for 2-3-4 trees. In all cases, the
transformations ensure that the next node on the search path has at
least three descendants. As with Fig. 2, Figs. 3-5 depict the deletion
path as being the leftmost path; the transformations can be gener-
alized in the obvious way regardless of the path. Note that deletion
transformation I (and only deletion transformation I) can result in
a decrease in the tree height when applied at the root node.
Hence, when processor Pi receives a "DELETE(key n, using
pointer p)" message, it should do the following.
Case 1: Pi contains internal nodes (i < ⌈lg N + 1⌉). Processor
Pi follows the pointer p to the appropriate index node in local
memory, using the key value n to select the appropriate pointer p'
to follow from here. Then it sends DELETE_TRANSFORM(m, p', p") to
Pi+1, where p" is the adjacent path pointer⁴ for p' and m is the
splitting key for p' and p". Processor Pi+1 applies a deletion transformation
if one is applicable, either merging the nodes indicated by
p' and p" or moving one of the offspring of the p" node into the p'
node (i.e., redistribution). Next, processor Pi+1 sends
DELETE_TRANSFORM_REPLY(m', np) back to Pi. This reply informs Pi of the
new splitting key (m') resulting from a transformation (if one
occurred). Also, the reply contains a pointer np to either nil, p', or
p", depending on which node (if any) was deleted by the transformation.
Processor Pi uses this information to update its current
index node.
³A "splitting key" is a key which Pi stores in its index node to guide future
searches to one of two index nodes stored in Pi+1.
⁴The "adjacent path pointer" sent by Pi points to an index node in Pi+1 which
is an immediate neighbor of the index node in Pi+1 on the current search path.
Fig. 3. Deletion transformation I.
Fig. 4. Deletion transformation II.
Fig. 5. Deletion transformation III.
Once its index node is updated, Pi again uses n to select the
appropriate path p' to follow from here. (The path may be differ-
ent if the current index node p was rearranged in response to
the DELETE_TRANSFORM_REPLY message.) Finally, Pi sends
DELETE(n,p') to Pi+1.
Case 2: Pi contains data nodes (i = ⌈lg N + 1⌉). If the indicated
data item is present, P⌈lg N + 1⌉ deletes its data node, sends P⌈lg N⌉
a DELETE_REPLY(no error) message, and sends the outside world an
acknowledgment. If the indicated data item is not present, P⌈lg N + 1⌉
sends P⌈lg N⌉ a DELETE_REPLY(error) message, then sends the outside
world an error response.
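The merge and redistribution steps named in Case 1 can be sketched as follows. Since the figures are not reproduced here, this is a generic illustration and is not claimed to match transformations I, II, and III one for one. It assumes the adjacent node p" is the right-hand neighbor of p', and the `Node` class, the function name `delete_transform`, and the return convention are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    keys: list
    children: list = field(default_factory=list)


def delete_transform(m, path_node: Node, adjacent: Node):
    """Ensure path_node (p') has at least three offspring.

    m is the splitting key between path_node and adjacent (p'').
    Returns (new_splitting_key, removed_node):
      (m, None)          nothing to do
      (None, adjacent)   the two nodes were merged; the parent drops m and p''
      (m_new, None)      one offspring moved over; the parent replaces m by m_new
    """
    if len(path_node.keys) >= 2:                 # already a 3-node or 4-node
        return m, None
    if len(adjacent.keys) == 1:                  # two 2-nodes: merge into one 4-node
        path_node.keys = path_node.keys + [m] + adjacent.keys
        path_node.children = path_node.children + adjacent.children
        return None, adjacent
    # Redistribution: move the adjacent node's first offspring across.
    path_node.keys.append(m)
    path_node.children.append(adjacent.children.pop(0))
    return adjacent.keys.pop(0), None


left = Node([10], ["a", "b"])
right = Node([30, 40], ["c", "d", "e"])
print(delete_transform(20, left, right)[0], left.keys, right.keys)  # 30 [10, 20] [40]
```

Either way the node on the search path ends up with at least three offspring, so removing one item below it cannot force a change further up the tree.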
D. Range Queries
Range queries can be accommodated with a slight additional
amount of storage overhead. Immediately after a SEARCH(n) operation
is initiated, any number of NEXT_UPTO(m) operations can be
requested. When P⌈lg N + 1⌉ receives a NEXT_UPTO(m) message, it looks
for the smallest key k which is larger than the key of the previously
output item. If there is no such item, or if k > m, P⌈lg N + 1⌉ sends
out an error indication. The external interface to our search tree
should respond to this error indication by stopping the generation of
NEXT_UPTO requests, thereby terminating the range query. About
⌈lg N + 1⌉/2 NEXT_UPTO requests will still be in the pipeline at the
time the range query is terminated. A range query returning j items
can thus be executed by means of a total of j + ⌈lg N + 1⌉/2 search
tree operations.
Note that P⌈lg N + 1⌉'s search for an item that is NEXT_UPTO(m) will
be very time consuming if that item is not in the same node as the
previous output. For this reason, additional pointer fields should be
provided with each item (or with each node), forming the items into
a doubly linked list or sequence set [9]. Thus, with two extra pointers
per item, and one additional message type, our scheme will
execute range queries in time proportional to lg N plus the number
of items in the range. For completeness, an analogously defined
PREV_DOWNTO(m) command should probably be added to any realization
of our scheme that implements the NEXT_UPTO(m) command.
Once the doubly linked lists are in place, only a minimal amount of
extra code is required in P⌈lg N + 1⌉ to implement PREV_DOWNTO.
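A software analogy of the range query protocol is sketched below. It models only the last processor and the external interface, ignores the requests still in flight when the query ends, and keeps just the forward links of the list (the backward links needed for PREV_DOWNTO would be built the same way). The class `LastProcessor`, the function `range_query`, and the choice to return `None` as the error indication are assumptions of this sketch, as is the requirement that the starting key be present.

```python
class LastProcessor:
    """Holds the data items plus a 'next' link per item (a sequence set)."""

    def __init__(self, items: dict):
        self.data = dict(items)
        ordered = sorted(self.data)
        self.next_key = dict(zip(ordered, ordered[1:] + [None]))
        self.last_output = None

    def search(self, n):
        if n not in self.data:
            return None                      # "not found" indication
        self.last_output = n
        return n, self.data[n]

    def next_upto(self, m):
        if self.last_output is None:
            return None
        k = self.next_key[self.last_output]  # follow the link instead of searching
        if k is None or k > m:
            return None                      # error indication: the range is exhausted
        self.last_output = k
        return k, self.data[k]


def range_query(proc: LastProcessor, lo, hi) -> list:
    """External interface: SEARCH(lo), then NEXT_UPTO(hi) until an error comes back."""
    first = proc.search(lo)
    if first is None:
        return []
    results = [first]
    while (item := proc.next_upto(hi)) is not None:
        results.append(item)
    return results


proc = LastProcessor({10: "a", 20: "b", 30: "c", 40: "d"})
print(range_query(proc, 10, 30))  # [(10, 'a'), (20, 'b'), (30, 'c')]
```

Each NEXT_UPTO step follows one link, which is why the cost of a range query grows with the number of items returned rather than with repeated searches from the root.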
V. STORAGE REQUIREMENTS
In this section we prove that our scheme has enough memory to
hold all possible 2-3-4 trees of N or fewer items. As defined in
Section III, processor $P_i$, $1 \le i \le \lceil \lg N + 1 \rceil$, has $2^{i-1}$ words of
storage in its local memory. Trees start out on processor $P_{\lceil \lg N + 1 \rceil}$,
growing upwards towards $P_1$ as items are inserted. Let $M_i(k)$ be the
maximum number of words used by $P_i$ when there are $k$ items in the
tree. Since a key, an item, and a pointer field fit into the word length
specified in Section III, the last processor $P_{\lceil \lg N + 1 \rceil}$ uses just as many
words as there are items in the tree:
$$M_{\lceil \lg N + 1 \rceil}(k) = k.$$
The number of words used by processor $P_{\lceil \lg N \rceil}$ is somewhat variable,
depending on the number of 2-, 3-, and 4-nodes in the last
processor. The worst case storage requirement is obtained when
$P_{\lceil \lg N + 1 \rceil}$ has no 3-nodes or 4-nodes:
$$M_{\lceil \lg N \rceil}(k) = \max\{1, \lfloor k/2 \rfloor\}.$$
The "max" function is used to reflect the fact that there is always at
least one occupied word on each level, even in an empty tree.
In general, $M_i(k)$ can be expressed in terms of $M_{i+1}(k)$:
$$M_i(k) = \max\{1, \lfloor M_{i+1}(k)/2 \rfloor\}, \quad \text{for all } i < \lceil \lg N + 1 \rceil.$$
Solving the above recurrence, and noting that both $\lfloor \lfloor k/2^j \rfloor / 2 \rfloor$ and
$\lfloor k/2^{j+1} \rfloor$ are equal to the binary value of $k$ right-shifted by
$j + 1$ bits, we find that
$$M_i(k) = \max\{1, \lfloor k/2^{\lceil \lg N + 1 \rceil - i} \rfloor\} \le \max\{1, 2^{i-1} k/N\}.$$
Evaluating $M_i(N)$, we find that $P_i$ needs no more than $2^{i-1}$ words
to store any N-item 2-3-4 tree. Furthermore, since $M_i(k)$ is monotonically
nondecreasing in $k$, the architecture described in
Section III is capable of storing any k-item 2-3-4 tree, k < N. (If
range query support is desired, an additional N words of memory
will be required in processor $P_{\lceil \lg N + 1 \rceil}$.)
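The recurrence can also be checked numerically. The sketch below evaluates it bottom-up and compares the result with the memory sizes from Section III; the function names `words_used` and `fits` are hypothetical, and only trees with at least one item are examined.

```python
def words_used(k: int, n_items: int) -> list[int]:
    """Worst-case words used by P1..Pm for a k-item tree, from the recurrence."""
    m = (n_items - 1).bit_length() + 1        # m = ceil(lg N + 1)
    usage = [k]                               # the last processor uses k words
    for _ in range(m - 1):
        usage.append(max(1, usage[-1] // 2))  # each earlier level: max{1, floor(M/2)}
    return usage[::-1]                        # ordered from P1 to Pm


def fits(n_items: int) -> bool:
    """True if every k-item tree, 1 <= k <= N, fits in memories of size 2^(i-1)."""
    m = (n_items - 1).bit_length() + 1
    capacity = [2 ** (i - 1) for i in range(1, m + 1)]
    return all(
        all(used <= cap for used, cap in zip(words_used(k, n_items), capacity))
        for k in range(1, n_items + 1)
    )


print(words_used(16, 16))                    # [1, 2, 4, 8, 16]
print(all(fits(n) for n in range(1, 300)))   # True
```

For N = 16 the worst case fills every memory exactly, which shows that the doubling memory sizes are no larger than this analysis requires.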
VI. CONCLUSIONS
We have described a scheme for maintaining a 2-3-4 tree using
a pipeline of ⌈lg N + 1⌉ processors. Since the pipeline operates
using a request/reply paradigm, half of the processors can be processing
requests at any given point in time. The factor of two comes
from the fact that until a processor Pi receives its reply from Pi+1,
the keys and/or pointers in Pi may be incorrect. Thus, the level
of concurrency in a ⌈lg N + 1⌉-processor configuration executing
an arbitrary sequence of SEARCH, INSERT, and DELETE commands
is ⌈lg N + 1⌉/2, as claimed in the Abstract. (A higher degree
of concurrency, up to ⌈lg N + 1⌉, could be obtained on a
long string of SEARCH commands, as these do not involve
INSERT_TRANSFORM or DELETE_TRANSFORM messages.) Since our
scheme requires O(lg N) time per tree operation, but allows O(lg N)
concurrency on the operations, one operation completes every O(1)
time units. This scheme could be a useful component for index
maintenance in a machine architecture specialized for information
storage and retrieval.
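The throughput claim can be written out as a short calculation. The constant $c$ below is a hypothetical upper bound on the time one processor spends on one operation; it is introduced only for this illustration and does not appear in the original analysis.

```latex
\begin{align*}
  \text{delay per operation}      &\le c \,\lceil \lg N + 1 \rceil \\
  \text{operations in progress}   &=   \lceil \lg N + 1 \rceil / 2 \\
  \text{time between completions} &\le \frac{c \,\lceil \lg N + 1 \rceil}{\lceil \lg N + 1 \rceil / 2} = 2c = O(1)
\end{align*}
% Example: N = 16 gives 5 processors, a delay of at most 5c,
% and one completed operation every 2c time units.
```

The number of processors cancels, so the completion rate does not degrade as N grows; only the delay of an individual operation does.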
ACKNOWLEDGMENT
The authors gratefully acknowledge the assistance of R. McCord
of Tolerant Systems and J. B y, an anonymous referee with a
distinctive style and a perspicacious mind.
REFERENCES
[1] A. Aho, J. Hopcroft, and J. Ullman, The Design and Analysis of Computer
Algorithms. Reading, MA: Addison-Wesley, 1974.
[2] J. Allchin, A. Keller, and G. Wiederhold, "FLASH: A language-
independent, portable file access system," in Proc. ACM SIGMOD Int.
Conf. on the Management of Data, 1980.
[3] P. K. Armstrong, U.S. Patent 4131 947, Dec. 26, 1978.
[4] M. J. Atallah and S. R. Kosaraju, "A generalized dictionary machine for
VLSI," Dep. Elec. Eng. Comput. Sci., Johns Hopkins Univ., Baltimore,
MD, Tech. Rep.; also, IEEE Trans. Comput., 1984, to be published.
[5] R. Bayer and E. McCreight, "Organization and maintenance of large
ordered indices," Acta Informatica, vol. 1, no. 3, 1972.
[6] J. L. Bentley and H. T. Kung, "Two papers on a tree-structured parallel
computer," Dep. Comput. Sci., Carnegie-Mellon Univ., Pittsburgh, PA,
Rep. CMU-CS-79-142, 1979.
[7] M. A. Bonuccelli, E. Lodi, F. Luccio, P. Maestrini, and L. Pagli, "A
VLSI tree machine for relational data bases," in Proc. 10th Annu. ACM
Int. Symp. on Comput. Arch., June 1983, pp. 67-73.
[8] M. J. Carey and C. D. Thompson, "An efficient implementation of search
trees on O(log N) processors," Dep. Comput. Sci., Univ. California,
Berkeley, CA, Rep. UCB/CSD 82/101, Nov. 1982.
[9] D. Comer, "The ubiquitous B-tree," Comput. Surveys, vol. 11, no. 2,
June 1979.
[10] Y. Dohi, A. Suzuki, and N. Matsui, "Hardware sorter and its application
to data base machine," in Proc. 9th Annu. ACM Symp. on Comput.
Arch., SIGARCH Newsletter, vol. 10, no. 3, pp. 218-225, Apr. 1982.
[11] A. L. Fisher, "Dictionary machines with a small number of processors,"
in Proc. 11th Annu. ACM Int. Symp. on Comput. Arch., June 1984.
[12] P. Flajolet and C. Puech, "Tree structures for partial match retrieval," in
Proc. 24th Annu. IEEE Comput. Soc. Symp. on Foundations of Comput.
Sci., Nov. 1983, pp. 282-288.
[13] L. J. Guibas and R. Sedgewick, "A dichromatic framework for balanced
trees," in Proc. 19th Annu. IEEE Comput. Soc. Symp. on Foundations of
Comput. Sci., Oct. 1978, pp. 8-21.
[14] C. E. Leiserson, "Systolic priority queues," Dep. Comput. Sci.,
Carnegie-Mellon Univ., Pittsburgh, PA, Rep. CMU-CS-79-115, Apr.
1979.
[15] T. A. Ottmann, A. L. Rosenberg, and L. J. Stockmeyer, "A dictionary
machine (for VLSI)," IEEE Trans. Comput., vol. C-31, pp. 892-897,
Sept. 1982.
[16] A. K. Somani and V. K. Agarwal, "An unsorted dictionary machine for
VLSI," VLSI Design Lab., McGill Univ., Montreal, P.Q., Canada, 1983.
[17] S. W. Song, "A highly concurrent tree machine for database applications,"
in Proc. 1980 IEEE Int. Conf. on Parallel Processing,
Aug. 1980, pp. 259-268.
[18] S. W. Song, "On a high-performance VLSI solution to database problems,"
Ph.D. dissertation, Dep. Comput. Sci., Carnegie-Mellon Univ., Pittsburgh,
PA, CMU-CS-81-142, Aug. 1981.
[19] Y. Tanaka, Y. Nozaka, and A. Masuyama, "Pipeline searching and
sorting modules as components of data flow database computer," in Proc.
Int. Fed. Inform. Processing, Oct. 1980, pp. 427-432.
A Multiprocessor Architecture for Generating Fractal Surfaces
STEPHEN L. STEPOWAY, DAVID L. WELLS,
AND GERALD R. KANE
Abstract -Fractal surfaces have recently been shown to be a useful
model for generating images of terrain in computer graphics. Unfortunately,
the generation of fractal images is very costly in CPU time. A
multiprocessor architecture is described which takes advantage of the
parallelism inherent in fractals to speed the generation of images. The
performance of the processing array is analyzed along with the suitability
of implementation in VLSI.
Index Terms -Architecture, computer graphics, fractal surfaces, parallel
processing, VLSI.
Manuscript received March 1, 1984; revised July 14, 1984.
S. L. Stepoway and D. L. Wells are with the Department of Computer
Science and Engineering, Southern Methodist University, Dallas, TX 75275.
G. R. Kane is with the Department of Electrical Engineering, Southern
Methodist University, Dallas, TX 75275.
0018-9340/84/1100-1041$01.00 © 1984 IEEE