A new class of interconnection networks, the hypernetworks, has been proposed recently. Hypernetworks are characterized by hypergraphs. Compared with point-to-point networks, they allow for increased resource sharing and communication bandwidth utilization, and they are especially suitable for optical interconnects. In this paper, we propose a scheme for deriving new hypernetworks using hypergraph duals. As an example, we investigate the dual, $K_n^*$, of the $n$-vertex complete graph $K_n$, and show that it has many desirable properties. We also present a set of fundamental data communication algorithms for $K_n^*$. Our results indicate that the $K_n^*$ hypernetwork is a useful and promising interconnection structure for high-performance parallel and distributed computing systems.
Introduction
Interprocessor communication performance is one of the most critical aspects of high-performance parallel and distributed computing systems. Designing high-bandwidth, low-latency, and scalable interconnection networks is a great challenge faced by architecture designers. In recent years, we have seen a trend of seeking interconnection alternatives that combine the best features of low-dimensional networks, such as lower wire densities and higher wire sharing, with the best features of high-dimensional networks, such as smaller network diameters and higher potential scalability. Evidently, such alternatives are no longer pure point-to-point networks. One of the major driving forces behind these changes is the advance of optical interconnection technologies. Photons are uncharged particles and do not naturally interact. Consequently, optical interconnects have many desirable characteristics: high speed (the speed of light), increased fanout, high bandwidth, high reliability, support for longer interconnection lengths, low power requirements, and immunity to EMI with reduced crosstalk. These characteristics have significant implications for system configuration and complexity [5, 6, 7]. For example, multiple-bus configurations with increased scalability are possible because of relaxed fanout and distance constraints. The optical fanout, which is the maximum number of processors that can be attached to an optical connecting device, is bounded not by capacitance but by the power that must be delivered to each receiver to maintain a specified bit-error rate, referred to as the optical power budget. Processors can be arranged at increased physical distances. Resource sharing, achieved through multiple access to optical interconnect devices using time-division multiplexing (TDM), wavelength-division multiplexing (WDM), code-division multiplexing (CDM), space-division multiplexing (SDM), or hybrid multiplexing [8, 9], is a fundamental advantage of optical networks.
The emerging optical interconnect technologies will revolutionize interconnection network topologies.
Realizing that conventional graph theory is no longer adequate for the design and analysis of the new generation of interconnection structures based on optical interconnect devices, a new class of interconnection networks, the hypernetworks, was proposed recently [12]. The class of hypernetworks is a generalization of point-to-point networks, and it contains point-to-point networks as a subclass. In a hypernetwork, the physical communication medium (a hyperlink) is accessible to multiple processors. Relaxing the limit on the number of processors that can be connected by a link provides more design alternatives, so that greater flexibility in trading off conflicting design goals is possible. The underlying graph-theoretic tool for investigating hypernetworks is hypergraph theory [3]. Hypergraphs are used to model hypernetworks. Hypernetwork design has been formulated as a constrained optimization problem of constructing hypergraphs.
In this paper, we propose a scheme for constructing a new hypernetwork from an existing one using the concept of the dual in hypergraph theory. We show that the dual $H^*$ of any given hypergraph $H$ is a hypergraph that has some properties related to the properties of $H$. Thus, based on the properties of $H$, one can investigate the properties of $H^*$. Since the structures of $H$ and its dual $H^*$ can be drastically different, finding hypergraph duals can be considered a general approach to the design of new hypernetworks. We demonstrate this approach by investigating the structure of the dual $K_n^*$ of an $n$-vertex complete point-to-point network $K_n$. We present a set of fundamental data communication algorithms for $K_n^*$. Our results indicate that the $K_n^*$ hypernetwork is a useful and promising interconnection network for high-performance parallel and distributed computing systems.
Preliminaries
Hypergraphs are used as the underlying graph models of hypernetworks. A hypergraph [3] $H = (V, E)$ consists of a set $V = \{v_1, v_2, \ldots, v_N\}$ of vertices and a set $E = \{e_1, e_2, \ldots, e_m\}$ of hyperedges such that each $e_i$ is a non-empty subset of $V$ and $\bigcup_{i=1}^{m} e_i = V$. An edge $e$ contains a vertex $v$ if $v \in e$. If $e_i \subseteq e_j$ implies that $i = j$, then $H$ is a simple hypergraph. When the cardinality of an edge $e$, denoted $|e|$, is 1, it corresponds to a self-loop edge. If all the edges have cardinality 2, then $H$ is a graph that corresponds to a point-to-point network. In this paper, we only consider simple hypergraphs and graphs. A hypergraph of $n$ vertices and $m$ hyperedges can also be defined by its $n \times m$ incidence matrix $A$, with columns representing edges and rows representing vertices, such that $a_{i,j} = 0$ if $v_i \notin e_j$ and $a_{i,j} = 1$ if $v_i \in e_j$.
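The incidence-matrix view gives a compact way to illustrate the dual construction this paper relies on. The following is a minimal sketch, assuming the standard hypergraph-theory definition of the dual (vertices and hyperedges exchange roles, so the incidence matrix is transposed). The helper names `incidence_matrix` and `dual` are illustrative and are not taken from the paper.

```python
from itertools import combinations

def incidence_matrix(num_vertices, edges):
    """Rows are vertices, columns are hyperedges; entry is 1 iff vertex i lies in edge j."""
    return [[1 if v in e else 0 for e in edges] for v in range(1, num_vertices + 1)]

def dual(matrix):
    """Transpose the incidence matrix: each old edge becomes a vertex and vice versa."""
    return [list(col) for col in zip(*matrix)]

# Complete graph K_4: 4 vertices and 6 edges of cardinality 2.
k4_edges = [set(pair) for pair in combinations(range(1, 5), 2)]
a = incidence_matrix(4, k4_edges)
a_dual = dual(a)   # 6 vertices (rows) and 4 hyperedges (columns)
```

In this small case every row of `a_dual` contains two ones and every column contains three, and taking the dual twice returns the original matrix.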
For a subset $J$ of $\{1, 2, \ldots, m\}$, we call the hypergraph $H'(V', E')$ such that $E' = \{e_i \mid i \in J\}$ and $V' = \bigcup_{e_i \in E'} e_i$ the partial hypergraph of $H$ generated by the set $J$. For a subset $U$ of $V$, we call the hypergraph $H''(V'', E'')$ such that $E'' = \{e_i \cap U \mid 1 \le i \le m,\ e_i \cap U \ne \emptyset\}$ and $V'' = \bigcup_{e \in E''} e$ the sub-hypergraph induced by the set $U$. A hypernetwork $M$ is a network whose underlying structure is a hypergraph $H$, in which each vertex $v_i$ corresponds to a unique processor $P_i$ of $M$, and each hyperedge $e_j$ corresponds to a connector that connects the processors represented by the vertices in $e_j$. A connector is loosely defined as an electronic or photonic component through which messages are transmitted between connected processors, not necessarily simultaneously, in constant time. We call a connector a hyperlink.
The simplest implementation of a hyperlink is a bus. Basically, there are two optical bus configurations: dual bus and folded bus. In a dual-bus system, every processor is connected to two unidirectional buses, and one bus attachment consists of a transmitter (e.g., a laser diode) and a receiver (e.g., a photodiode). The two buses transmit in opposite directions so that there is a path from every processor to every other processor in the system. In a folded-bus system, each processor is attached to the bus twice, with one attachment for reading and the other for writing. The bus is divided into two sections, the upstream section for processors to send data and the downstream section for processors to receive data. With TDM or CDM, the performance of dual-bus and folded-bus systems can be improved. A photonic crossbar switch is a hyperlink. A star coupler [9, 10], which uses WDM and can be considered either a generalized bus structure or a photonic switch, is another implementation of a hyperlink. In the rest of this paper, the following pairs of terms are used interchangeably: hyperedges and hyperlinks, vertices and processors, point-to-point networks and graphs, and hypernetworks and hypergraphs.
The problem of designing efficient interconnection networks can be considered a constrained optimization problem. For example, the goal of designing point-to-point networks is to find well-structured graphs whose ranks are fixed (at the constant 2), with small degrees and diameters. In hypernetwork design, relaxing the limit on the number of processors that can be connected by a hyperlink (i.e., the rank of the hyperlink) provides more design alternatives, so that greater flexibility in trading off conflicting design goals is possible. We consider using the dual of a point-to-point graph as a hypernetwork. Properly labeling the vertices and hyperedges in a hypergraph can greatly simplify its use as a communication network. Vertex labels are used as processor addresses. Similarly, hyperedge labels are used as the unique names of hyperlinks. There are many ways to label the vertices and hyperedges of $K_n^*$. Although all the different labeling schemes of $K_n^*$ are equivalent because of the symmetries of $K_n^*$ (Proposition 3), we choose to define the $K_n^*$ hypernetwork using an interesting scheme by which the connectivity of $K_n^*$ can be concisely derived.
Definition 1 Let $N_n = n(n-1)/2$ for $n > 0$. The $K_n^*$ hypernetwork, $n \ge 3$, is a hypergraph that consists of $N_n$ vertices, $v_1, v_2, \ldots, v_{N_n}$, and $n$ hyperlinks, $e_1, e_2, \ldots, e_n$. The connectivity of $K_n^*$ can be recursively defined as follows: $K_n^*$ is constructed from $K_{n-1}^*$ by adding $n-1$ more vertices $v_{N_{n-1}+1}, v_{N_{n-1}+2}, \ldots, v_{N_{n-1}+n-1} = v_{N_n}$, and one more hyperlink $e_n$, such that all the newly added $n-1$ vertices are connected to $e_n$ and $v_{N_{n-1}+m}$ is connected to hyperlink $e_m$, $1 \le m \le n-1$.

For a vertex $v_i$ in $K_n^*$, we use $i$ as its vertex label. Similarly, we use $j$ as the label of hyperedge $e_j$ of $K_n^*$. By a simple induction on $n$, it is easy to show that $K_n^*$ is the dual of a complete graph of $n$ vertices. By the properties of $K_n$ and the above Propositions, we observe the following fact:
Fact 1 $K_n^*$ is 2-regular, $(n-1)$-uniform, linear, and vertex and hyperedge symmetric; the diameter of $K_n^*$ is 1 if $n = 3$, and 2 if $n > 3$.
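The properties listed in Fact 1 can be checked mechanically for small $n$. The sketch below builds the hyperlinks by repeating the step of Definition 1; starting the recursion from a single empty hyperlink $e_1$ is an assumption made here for convenience, and the names `build_kn_star` and `diameter` are hypothetical.

```python
from itertools import combinations

def build_kn_star(n):
    """Return {hyperlink label: set of vertex labels} by repeating the step of Definition 1."""
    links = {1: set()}          # assumed starting point: hyperlink e_1 with no vertices yet
    count = 0                   # number of vertices created so far, i.e. N_{r-1}
    for r in range(2, n + 1):
        links[r] = set()
        for m in range(1, r):   # new vertex v_{N_{r-1}+m} joins e_r and e_m
            v = count + m
            links[r].add(v)
            links[m].add(v)
        count += r - 1
    return links

def diameter(links, num_vertices):
    """Largest hop count between two vertices; one hop is one shared hyperlink."""
    adj = {v: set() for v in range(1, num_vertices + 1)}
    for members in links.values():
        for v in members:
            adj[v] |= members - {v}
    worst = 0
    for s in adj:               # breadth-first search from every vertex
        dist = {s: 0}
        frontier = [s]
        while frontier:
            nxt = []
            for x in frontier:
                for y in adj[x]:
                    if y not in dist:
                        dist[y] = dist[x] + 1
                        nxt.append(y)
            frontier = nxt
        worst = max(worst, max(dist.values()))
    return worst

n = 6
links = build_kn_star(n)
num_vertices = n * (n - 1) // 2
degree = {v: sum(v in e for e in links.values()) for v in range(1, num_vertices + 1)}
is_linear = all(len(links[a] & links[b]) <= 1 for a, b in combinations(links, 2))
```

For $n = 6$ this yields 15 vertices of degree 2, six hyperlinks of five vertices each, pairwise intersections of size at most one, and a diameter of 2.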
In the following alternative definition, the connectivity of the $K_n^*$ hypernetwork is explicitly specified.
Definition 2 Let $N_n = n(n-1)/2$ for $n > 0$. The $K_n^*$ hypernetwork, where $n \ge 3$, is a hypergraph that consists of $N_n$ vertices, $v_1, v_2, \ldots, v_{N_n}$, and $n$ hyperlinks, $e_1, e_2, \ldots, e_n$. For any two distinct vertices $v_i$ and $v_j$, let $u_i = \min\{r \mid N_r \ge i\}$, $u_j = \min\{s \mid N_s \ge j\}$, $l_i = i - N_{u_i - 1}$, and $l_j = j - N_{u_j - 1}$. Vertices $v_i$ and $v_j$ are connected by a hyperlink if and only if one of the following conditions holds: (1) $u_i = u_j$; (2) $u_i = l_j$; (3) $l_i = u_j$; or (4) $l_i = l_j$. Furthermore, if (1) or (2) holds then $v_i, v_j \in e_{u_i}$, and if (3) or (4) holds then $v_i, v_j \in e_{l_i}$.
By a simple induction on $n$, one can easily see that Definitions 1 and 2 use the same vertex and hyperedge labeling schemes and that they are equivalent. It is easy to verify that any vertex $v_i$ of $K_n^*$ is connected to exactly two hyperedges $e_l$ and $e_u$, where $u = \min\{r \mid N_r \ge i\}$ and $l = i - N_{u-1}$. We call hyperedges $e_l$ and $e_u$ the lower and upper hyperedge of $v_i$, respectively. For any $l$ and $u$ such that $1 \le l < u \le n$, there is a unique vertex $v_i$ that is connected to hyperedges $e_l$ and $e_u$, and furthermore, $i = N_{u-1} + l$. Therefore, a vertex $v_i$ of $K_n^*$ can be uniquely identified by an ordered pair $\langle l, u \rangle$, $1 \le l < u \le n$. The notation $\langle l, u \rangle$ can be interpreted in another way. If we group those vertices that share the same upper hyperlink, $n-1$ groups (also called blocks) are formed. The $k$-th ($k > 0$) block contains $k$ vertices. Vertices within each block are labeled based on the location of their lower hyperlinks in the block. Given vertex $\langle l, u \rangle$, $u-1$ is the number of the block in which it resides, and $l$ is the rank of this vertex within the block. As shown in the next section, being able to address processors by hyperlinks is a useful property of the $K_n^*$ hypernetwork for the design and analysis of parallel algorithms. Figure 1 shows the bus implementation of the $K_6^*$ hypernetwork, whose corresponding $K_6$ is shown in Figure 2.
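The addressing scheme can be sketched in a few lines of code. This is a minimal illustration rather than the authors' implementation; the function names `N`, `index_to_pair`, `pair_to_index`, and `connected` are hypothetical.

```python
def N(n):
    """N_n = n(n-1)/2."""
    return n * (n - 1) // 2

def index_to_pair(i):
    """Map vertex index i >= 1 to (l, u): u = min{r : N_r >= i}, l = i - N_{u-1}."""
    u = 2
    while N(u) < i:
        u += 1
    return i - N(u - 1), u

def pair_to_index(l, u):
    """Inverse map: i = N_{u-1} + l for 1 <= l < u."""
    return N(u - 1) + l

def connected(i, j):
    """Definition 2: distinct v_i and v_j share a hyperlink iff their pairs share an entry."""
    li, ui = index_to_pair(i)
    lj, uj = index_to_pair(j)
    return i != j and (ui == uj or ui == lj or li == uj or li == lj)
```

For example, `index_to_pair(7)` gives `(1, 5)`: vertex $v_7$ is the first vertex of block 4, attached to hyperlinks $e_1$ and $e_5$.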
The uniformity (i.e., all hyperlinks contain the same number of processors), regularity (i.e., all processors are included in the same number of hyperlinks), and linearity (i.e., no two hyperlinks share more than one processor) of the $K_n^*$ hypernetwork have important implications. Consider bus-based implementations of hypernetworks. Here, uniformity and linearity imply that the bus loads are evenly distributed and minimized, and regularity implies simplified processor design, since all the processors have the same interface circuitry. Vertex (hyperedge) symmetry is important for a hypergraph to be used as a hypernetwork, since it allows all the processors (hyperlinks) to be treated as identical. Both Definitions 1 and 2 can be used to expand an existing $K_n^*$ hypernetwork to a $K_{n+1}^*$ hypernetwork without modifying the connections in $K_n^*$. The property that a larger hypernetwork can be easily constructed from smaller hypernetworks in the same class, when enhancement is desired, is called the expandability of a hypernetwork. Clearly, the $K_n^*$ hypernetwork is easy to expand. The incremental expandability of $K_n^*$ is discussed in Section 5. Proposition 5 indicates that $K_n^*$ can be partitioned into several smaller hypernetworks in the $K_n^*$ family. This property is useful in designing parallel algorithms for $K_n^*$ using the divide-and-conquer paradigm.
The $K_n^*$ hypernetwork may become infeasible when $n$ is large. To improve scalability, we can use $K_n^*$ as a building block to construct more complicated hypernetworks.

In this section, we demonstrate how to use the vertex and hyperedge labels to design data communication algorithms for the $K_n^*$ hypernetwork. For simplicity, we assume a bidirectional bus implementation of hyperlinks. We also assume that transmitting a word between two processors connected by a bus takes constant time. Since a bus is shared by all its connected processors, at most one pair of processors can communicate at any time instant. Bus communication can be either synchronous or asynchronous. In asynchronous mode, arbiters are needed to allocate the bus to processors in an on-line fashion. We assume synchronous mode. Bus allocations, although made dynamically, are predetermined by an off-line scheduling algorithm. This bus operational mode has been used in [4] for analyzing a multiple-bus interprocessor connection structure. We consider four types of communication operations: one-to-one, one-to-many, many-to-one, and many-to-many communications. We show that the performance of our algorithms is either optimal (ROUTE and BROADCAST) or optimal within a constant factor (PERMUTATION, REDUCTION, TOTAL EXCHANGE and PREFIX). These communication algorithms constitute a powerful set of tools for designing parallel algorithms on the $K_n^*$ hypernetwork.
One-to-One Communications
We consider two fundamental one-to-one communication operations, shortest path routing between two processors, and data exchange using a permutation.
Shortest path routing
The following algorithm can be used for data routing from $v_i = \langle l_i, u_i \rangle$ to $v_j = \langle l_j, u_j \rangle$ in $K_n^*$.
procedure ROUTE($\langle l_i, u_i \rangle$, $\langle l_j, u_j \rangle$)
begin
  if $u_i = u_j$ or $u_i = l_j$ then
    $\langle l_i, u_i \rangle$ sends the message to $\langle l_j, u_j \rangle$ using hyperlink $e_{u_i}$
  else if $l_i = u_j$ or $l_i = l_j$ then
    $\langle l_i, u_i \rangle$ sends the message to $\langle l_j, u_j \rangle$ using hyperlink $e_{l_i}$
  else (* $\langle l_i, u_i \rangle$ and $\langle l_j, u_j \rangle$ do not share a hyperlink *)
    $l = \min\{l_i, l_j\}$;
    if $l_i = l$ then
      $\langle l_i, u_i \rangle$ sends the message to $\langle l_j, u_j \rangle$ through the path $\langle l_i, u_i \rangle, e_{l_i}, \langle l_i, u_j \rangle, e_{u_j}, \langle l_j, u_j \rangle$
    else
      $\langle l_i, u_i \rangle$ sends the message to $\langle l_j, u_j \rangle$ through the path $\langle l_i, u_i \rangle, e_{u_i}, \langle l_j, u_i \rangle, e_{l_j}, \langle l_j, u_j \rangle$
end

It is easy to verify that for any given pair of processors $v_i$ and $v_j$ in the $K_n^*$ hypernetwork, algorithm ROUTE routes a message from $v_i$ to $v_j$, or vice versa, along a shortest path.
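A runnable rendering of the routing rule may help when checking the case analysis. This is a sketch under stated assumptions: processors are represented as `(l, u)` tuples, hyperlinks as integer labels, and the trivial case of identical source and destination (which the procedure does not address) simply returns the source. The function name `route` is illustrative.

```python
def route(src, dst):
    """Return the path [processor, hyperlink, processor, ...] from src to dst.

    Processors are (l, u) pairs with l < u; hyperlinks are integer labels.
    """
    (li, ui), (lj, uj) = src, dst
    if src == dst:                    # assumption: nothing to send
        return [src]
    if ui == uj or ui == lj:          # both processors lie on hyperlink e_{u_i}
        return [src, ui, dst]
    if li == uj or li == lj:          # both processors lie on hyperlink e_{l_i}
        return [src, li, dst]
    # No shared hyperlink: relay through one intermediate processor.
    if li < lj:
        return [src, li, (li, uj), uj, dst]
    return [src, ui, (lj, ui), lj, dst]
```

For instance, `route((1, 2), (3, 4))` returns `[(1, 2), 1, (1, 4), 4, (3, 4)]`: the message travels over $e_1$ to $\langle 1, 4 \rangle$ and then over $e_4$ to its destination.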
Permutation
A permutation is a bijection on the set of processors in $K_n^*$. In a permutation communication operation, each processor $\langle a, b \rangle$ sends a message to another processor $\langle a', b' \rangle$, and each processor receives a message from exactly one processor. We use a set of $N_n$ ordered processor pairs $(\langle a, b \rangle, \langle a', b' \rangle)$ to represent a permutation. In each pair $(\langle a, b \rangle, \langle a', b' \rangle)$, $\langle a, b \rangle$ and $\langle a', b' \rangle$ are called the source processor and destination processor of the pair, respectively. We use $A_{\langle a, b \rangle, \langle a', b' \rangle}$ to denote a message to be sent from $\langle a, b \rangle$ to $\langle a', b' \rangle$. A permutation is called a total permutation if $\langle a, b \rangle \ne \langle a', b' \rangle$ for all pairs; otherwise, it is called a partial permutation. We only consider total permutations, since a partial permutation can be carried out using a total permutation by masking out those processors that are mapped to themselves.
We present an algorithm PERMUTATION which performs a permutation operation efficiently. For each of these three cases, algorithm PERMUTATION routes the messages strictly along the paths shown in Figure 3. Based on these path patterns, we call a message a two-step message if it follows a path of length 2 (cases ii and iii); otherwise, it is called a one-step message (case i). Note that the source and destination processors of a two-step message may be distance 1 apart. Algorithm PERMUTATION consists of two phases. In the first phase, all one-step messages are sent to their destinations, and all two-step messages are routed to the intermediate processors of their routing paths. In the second phase, all two-step messages are sent to their destinations. In each phase, a hyperlink may be used to transmit more than one message.
procedure PERMUTATION
begin
    Use $e_k$ to sequentially transmit the two-step messages with $e_k$ assigned for their second step
  endfor
end
Observing Figure 3, we see the following. In the first phase, $e_k$ is used to transmit messages of source processors $\langle a, k \rangle$. Since there are at most $n-1$ such source processors in a permutation, the number of messages to be transmitted using $e_k$ is at most $n-1$. Thus, the total number of parallel message transmission steps in the first phase is no more than $n-1$. In the second phase, each $e_k$ is used to transmit messages to destination processors $\langle a', k \rangle$, and there are at most $n-1$ such destination processors in a permutation. The total number of parallel message transmission steps in the second phase is also at most $n-1$. Hence, the total number of steps performed by PERMUTATION is at most $2(n-1)$. There are $N_n = n(n-1)/2$ messages. Each message is destined for a distinct processor in a total permutation, and all these $n(n-1)/2$ messages need to be transmitted. At least $(n-1)/2$ parallel message transmission steps are required in the worst case because there are only $n$ hyperlinks in $K_n^*$. Hence, the performance of PERMUTATION is optimal within a constant factor.
Careful readers may notice that hyperlink $e_1$ is not used in PERMUTATION. If we let $e_1$ share some of the communication load, the permutation performance can be slightly improved. In fact, by evenly distributing the communication load among hyperlinks, the performance of all algorithms presented in this paper, excluding ROUTE and BROADCAST, can be slightly improved. However, the modified algorithms would be more complicated.
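The counting argument can be condensed into one line. The symbols $T_{\mathrm{PERM}}$ (the number of steps used by PERMUTATION) and $T_{\min}$ (the number of steps any algorithm needs in the worst case) are introduced here only for illustration, and the lower bound assumes, as in the text, that each hyperlink carries at most one message per step.

```latex
T_{\mathrm{PERM}} \le (n-1) + (n-1) = 2(n-1), \qquad
T_{\min} \ge \frac{N_n}{n} = \frac{n(n-1)/2}{n} = \frac{n-1}{2}, \qquad
\text{hence}\quad T_{\mathrm{PERM}} \le 4\,T_{\min}.
```

Under these assumptions the constant factor separating PERMUTATION from the lower bound is at most 4.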
One-to-Many Communication
Consider the following algorithm for broadcasting a message from any processor $v = \langle l, u \rangle$ to all the other processors in $K_n^*$.
procedure BROADCAST($\langle l, u \rangle$)
begin
  $\langle l, u \rangle$ broadcasts the message to all the processors connected by $e_u$;
  for all the processors $\langle a, b \rangle$ such that $a = u$ or $b = u$ do in parallel
    if $a = u$ then $\langle a, b \rangle$ broadcasts the message to processors in $\{\langle i, b \rangle \mid i > a\}$ using $e_b$;
    if $b = u$ then $\langle a, b \rangle$ broadcasts the message to processors in $\{\langle a, j \rangle \mid j \ne b\}$ using $e_a$
  endfor
end
The processors in the $K_n^*$ hypernetwork can be partitioned into five mutually disjoint groups:

group 1: $\{\langle a, b \rangle \mid a = l \wedge b = u\}$
group 2: $\{\langle a, b \rangle \mid a = u\}$
group 3: $\{\langle a, b \rangle \mid a \ne l \wedge b = u\}$
group 4: $\{\langle a, b \rangle \mid a > u\}$
group 5: $\{\langle a, b \rangle \mid a < u \wedge b \ne u\}$

Group 1 contains one processor, the source processor. After the first step, processors in group 2 and group 3 receive the message. In the second step, each processor in group 4 receives the message from a processor in group 2 via the first if statement, and each processor in group 5 receives the message from a processor in group 3, or from the source itself, via the second if statement. BROADCAST therefore takes two steps, which is optimal because the diameter of $K_n^*$ is 2 for $n > 3$.
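The two-step behavior can be checked with a small simulation. The sketch below assumes the receiving sets are $\{\langle i, b \rangle \mid i > u\}$ for senders of the form $\langle u, b \rangle$ and $\{\langle a, j \rangle \mid j \ne u\}$ for senders of the form $\langle a, u \rangle$; the inequality in the first set is a reading of the procedure, not a quotation. The function name `broadcast` is hypothetical.

```python
def broadcast(n, source):
    """Simulate the two-step broadcast; return {processor: step at which it holds the message}."""
    l, u = source
    procs = [(a, b) for b in range(2, n + 1) for a in range(1, b)]
    got = {source: 0}
    # Step 1: the source broadcasts on its upper hyperlink e_u.
    for p in procs:
        if u in p and p not in got:
            got[p] = 1
    # Step 2: every processor on e_u rebroadcasts on its other hyperlink.
    for (a, b) in [p for p in procs if u in p]:
        if a == u:      # sender <u, b> uses e_b to reach <i, b> with i > u
            targets = [(i, b) for i in range(u + 1, b)]
        else:           # sender <a, u> uses e_a to reach <a, j> with j != u
            targets = [(a, j) for j in range(a + 1, n + 1) if j != u]
        for t in targets:
            got.setdefault(t, 2)
    return got
```

Calling `broadcast(6, (2, 4))` reaches all 15 processors, and no processor waits longer than the second step.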
Many-to-One Communication
A reduction (or census, or fan-in) function is defined as a commutative and associative operation on a set of values, such as finding the maximum, addition, logical or, etc. It can be carried out using a many-to-one communication operation. The following is an algorithm for performing a reduction operation specified by the operator $+$ on a set of $N_n$ values $A_1, A_2, \ldots, A_{N_n}$ stored in $v_1, v_2, \ldots, v_{N_n}$, and putting the final result in $v_1$.
Many-to-Many Communication
We consider two cases: all-to-all communication and prefix computation. In all-to-all communication, each processor sends a message to all the other processors. It is also called the total exchange operation. The prefix computation can be considered a many-to-many operation since many results are computed using many operands.
All-to-all communication
We can obtain an all-to-all communication algorithm by modifying the algorithm REDUCTION. The operator used is set union. After $n-1$ steps, $v_1$ receives all messages. Then, using two additional steps, $v_1$ broadcasts all the $N_n$ messages to all processors in $K_n^*$. A drawback of this algorithm is that each step transmits $O(N_n)$ messages along a hyperlink. We give another algorithm with improved performance.
procedure TOTAL EXCHANGE
begin
  (* Phase 1: intra-block total-exchange *)
  for $j = 3$ to $n$ do in parallel
    for $i = 1$ to $j-1$ do
      Processor $\langle i, j \rangle$ broadcasts its message to processors in $\{\langle a, j \rangle \mid a \ne i\}$ using $e_j$
    endfor
  endfor;
  Denote the set of messages processor $\langle i, j \rangle$ has by $S_{\langle i, j \rangle}$;
  (* Phase 2: inter-block total-exchange *)
  for $i = 2$ to $n$ do
    $\langle 1, i \rangle$ broadcasts $S_{\langle 1, i \rangle}$ to processors in $\{\langle 1, b \rangle \mid b \ne i\}$ using $e_1$;
    for all the processors in $\{\langle 1, b \rangle \mid b \ne i\}$ do in parallel
      $\langle 1, b \rangle$ broadcasts the $S_{\langle 1, i \rangle}$ it received to processors in $\{\langle a, b \rangle \mid a \ne 1\}$ using $e_b$
    endfor
  endfor
end

Algorithm TOTAL EXCHANGE has two phases. The first phase consists of $n-1$ parallel intra-block broadcasting operations. The second phase consists of $n-1$ iterations; each iteration has two parallel communication steps, one for inter-block broadcasting and the other for intra-block broadcasting. For $K_6^*$, the communication patterns of TOTAL EXCHANGE are shown in Figure 4.
The correctness of the algorithm follows directly from the definition of $K_n^*$ and the communication patterns used.
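The two phases can be traced with a short simulation that tracks which messages each processor holds. This is a sketch, not the authors' code: it assumes each processor's message is identified by its own `(l, u)` label, counts one parallel step per broadcast round, and uses the hypothetical function name `total_exchange`.

```python
def total_exchange(n):
    """Simulate TOTAL EXCHANGE; return (message set per processor, parallel step count)."""
    procs = [(a, b) for b in range(2, n + 1) for a in range(1, b)]
    have = {p: {p} for p in procs}      # each processor starts with its own message
    steps = 0
    # Phase 1: inside every block j, processors take turns broadcasting on e_j.
    for turn in range(1, n):            # blocks run in parallel; the largest needs n-1 turns
        for j in range(3, n + 1):
            if turn <= j - 1:
                for a in range(1, j):
                    have[(a, j)].add((turn, j))
        steps += 1
    block = {j: set(have[(1, j)]) for j in range(2, n + 1)}   # S_<1,j> after phase 1
    # Phase 2: <1,i> spreads its set over e_1, then each <1,b> relays it over e_b.
    for i in range(2, n + 1):
        for b in range(2, n + 1):
            if b != i:
                for a in range(1, b):
                    have[(a, b)] |= block[i]
        steps += 2
    return have, steps
```

For $n = 6$ the simulation ends with every processor holding all 15 messages after $(n-1) + 2(n-1) = 15$ counted steps, matching the phase description.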
The number of parallel communication steps performed by TOTAL EXCHANGE is optimal within a constant factor, since the $\Omega(n)$ lower bound on the number of communication steps for a many-to-one communication operation also holds for the total-exchange operation. Each processor is connected to two hyperlinks, and it needs to receive $N_n - 1$ messages.

Algorithm PREFIX consists of three phases. The first phase performs the prefix computation for all blocks in parallel. This phase only requires $n-2$ intra-block communications using different hyperlinks. The second phase performs one parallel inter-block broadcasting operation. The third phase also uses $n-2$ parallel intra-block broadcasting operations. The communication patterns of PREFIX are shown in Figure 5. Generalizing these patterns using the definition of $K_n^*$, we can conclude that algorithm PREFIX carries out a prefix computation using $2n-3$ parallel communication steps, with one operand or partial result value broadcast along a hyperlink per step.
The $\Omega(n)$ lower bound on the number of communication steps of REDUCTION also holds for the prefix computation. Since one value is broadcast per hyperlink in each communication step of algorithm PREFIX, the communication performance of PREFIX is optimal within a constant factor.
Incomplete $K_n^*$ Hypernetwork
We observe that the gap, $N_n - N_{n-1} = n-1$, between $K_{n-1}^*$ and $K_n^*$ is not a constant. It is desirable that hypernetworks can be expanded with incremental size increases. For any given $N$ such that $N_{n-1} < N < N_n$, we can construct a sub-hypergraph $H$ of $K_n^*$ such that $|V(H)| = N$ and $|E(H)| = n$. Such a sub-hypergraph is called an incomplete $K_n^*$ hypergraph.
Definition 3 The incomplete $K_n^*$ hypernetwork, where $n \ge 3$, of $N$ vertices such that $N_{n-1} < N < N_n$ is the sub-hypergraph of $K_n^*$ induced by the vertex set $\{v_1, v_2, \ldots, v_N\}$. In other words, an incomplete $K_n^*$ hypernetwork of $N$ vertices, $N_{n-1} < N < N_n$, is defined by the incidence matrix obtained from the incidence matrix of $K_n^*$ by deleting the rows corresponding to vertices $v_{N+1}, v_{N+2}, \ldots, v_{N_n}$.

The vertices in an incomplete $K_n^*$ can be divided into $n-1$ blocks. The $i$-th block has $i$ vertices for $1 \le i < n-1$, as in $K_n^*$, and the $(n-1)$-th block has at least one vertex and at most $n-2$ vertices. For convenience, we call the $(n-1)$-th block of an incomplete $K_n^*$ its incomplete block. We use $k_n$ to denote the number of vertices in the incomplete block of an incomplete $K_n^*$. An incomplete $K_n^*$ is linear and 2-regular, but it is not uniform, and it is not vertex and hyperedge symmetric. It is not difficult to prove that the diameter of the incomplete $K_n^*$ hypernetwork, where $n \ge 3$, is 2.
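The induced sub-hypergraph of Definition 3 is easy to build from the $\langle l, u \rangle$ labels. The sketch below is illustrative only; it assumes vertices are enumerated block by block so that $\langle l, u \rangle$ has index $N_{u-1} + l$, and the function name `incomplete_kn_star` is hypothetical.

```python
def incomplete_kn_star(n, num_vertices):
    """Hyperlinks of the sub-hypergraph of K_n^* induced by v_1..v_N, as {label: set of (l, u)}."""
    full = n * (n - 1) // 2
    prev = (n - 1) * (n - 2) // 2
    if not prev < num_vertices < full:
        raise ValueError("need N_{n-1} < N < N_n")
    links = {k: set() for k in range(1, n + 1)}
    index = 0
    for u in range(2, n + 1):
        for l in range(1, u):
            index += 1                  # index equals N_{u-1} + l
            if index <= num_vertices:   # keep only the first N vertices
                links[l].add((l, u))
                links[u].add((l, u))
    return links

links = incomplete_kn_star(6, 13)
sizes = {k: len(members) for k, members in links.items()}
```

For 13 processors with $n = 6$, the hyperlink sizes come out as 5, 5, 5, 4, 4 and 3, which shows the loss of uniformity, while every processor still lies on exactly two hyperlinks.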
It is easy to verify that the shortest path routing algorithm ROUTE and the data broadcasting algorithm BROADCAST presented in the previous section can be directly used for the incomplete $K_n^*$ hypernetwork. Consider the reduction operation. Since an incomplete $K_n^*$ is not symmetric, we cannot use procedure TRANSFORM to relabel the processors. We adapt REDUCTION given in the previous section to an incomplete $K_n^*$ by adding one operation: send the final result from $\langle 1, 2 \rangle$ to the final destination $\langle l, u \rangle$ using hyperlinks in at most two additional steps. It is simple to verify that the all-to-all data communication algorithm TOTAL EXCHANGE presented in the previous section can be used for the incomplete $K_n^*$ hypernetwork. This is done by treating all the processors $v_j$ such that $j > N$ in the $K_n^*$ hypernetwork as dummy processors that do not participate in communications.

For each of these cases, algorithm PERMUTATION INC routes the messages strictly along the paths shown in Figure 6. Algorithm PERMUTATION INC is similar to PERMUTATION. It consists of three phases. In the first phase, all one-step messages are sent to their destinations, and all two-step and three-step messages are routed to the next processors on their routing paths. In the second phase, all two-step messages are sent to their destinations, and all three-step messages are sent to the third processors on their routing paths. Then, in the third phase, all three-step messages reach their destinations. As in algorithm PERMUTATION, in each phase of algorithm PERMUTATION INC, messages are transmitted on different hyperlinks in parallel, and messages on the same hyperlink are transmitted sequentially. Observe Figure 6.
In the first phase, the number of messages transmitted using hyperlink $e_n$ (cases i, v and vii) is at most $k_n$, which is less than $n-1$ since the incomplete block has at most $n-2$ processors, and the number of messages transmitted on any other hyperlink $e_b$ is also no more than $n-2$, since the number of messages with $\langle a, b \rangle$ as their source processors is at most $n-2$. Hence, the first phase has no more than $n-2$ parallel message transmission steps. In the second phase, the number of messages transmitted using $e_{b'}$, $b' \ne 1$ and $b' \ne 2$, is at most $n-2$, because there are at most $n-2$ two-step messages with destination processors $\langle a', b' \rangle$ such that $b' \ne 1$ and $b' \ne 2$. (Note that there is actually no processor $\langle a', b' \rangle$ with $b' = 1$ in $K_n^*$.) The messages transmitted using $e_1$ satisfy conditions iv and v, and the total number of such messages is no more than $k_n + 1$. The messages transmitted using $e_2$ satisfy vi and vii or $b' = 2$, and there are at most $2(k_n - \lceil k_n/2 \rceil) + 1 \le k_n + 1$ such messages. Therefore, the second phase has no more than $\max\{n-2, k_n+1\}$ parallel message transmission steps. In the third phase, the number of messages transmitted using $e_n$ is no more than $k_n$, because there are at most $k_n$ three-step messages with processors in the incomplete block as destination processors. The number of messages transmitted using any other $e_{b'}$ is also no more than $k_n$, because the number of three-step messages with processors in the incomplete block as source processors is no more than $k_n$. The total number of parallel message transmission steps performed by PERMUTATION INC for any permutation operation is no more than $(n-2) + \max\{n-2, k_n+1\} + k_n$. The worst case is $k_n = n-2$, which results in $3n-5$ steps. Compared with the $\Omega(n)$ lower bound, the performance of PERMUTATION INC is optimal within a constant factor.
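The worst-case count can be verified by direct substitution. The following worked step uses only the quantities named in the analysis above; nothing beyond that analysis is assumed.

```latex
\Bigl[(n-2) + \max\{\,n-2,\; k_n+1\,\} + k_n\Bigr]_{k_n = n-2}
  = (n-2) + \max\{\,n-2,\; n-1\,\} + (n-2)
  = (n-2) + (n-1) + (n-2)
  = 3n - 5 .
```

Since the expression is non-decreasing in $k_n$ and $k_n \le n-2$, this substitution gives the largest possible value.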
The algorithm PREFIX given in the previous section also cannot be directly applied to an incomplete $K_n^*$ hypernetwork. We have to modify it to obtain an algorithm with similar performance. This phase requires $n-3$ intra-block communications using different hyperlinks. The second phase and the third phase are the same as the corresponding phases in PREFIX, but all communication and computation are restricted to the first $n-2$ blocks of the incomplete $K_n^*$. These two phases have one and $n-3$ parallel communication steps, respectively. The fourth phase requires 3 communication steps to broadcast the partial result of $\langle n-2, n-1 \rangle$, the rightmost processor of the $(n-2)$-th block, to the $(n-1)$-th block, which is the incomplete block.
It is easy to verify that algorithm PREFIX INC carries out a prefix computation on an incomplete $K_n^*$ hypernetwork in $2n-2$ parallel communication steps, which is optimal within a constant factor.
For an incomplete $K_6^*$ of 13 processors, the communication patterns are shown in Figure 7.
Discussions
We say that a linear hypernetwork is non-trivial if it has at least 4 vertices, at least 2 hyperlinks, Among all the hypergraphs derived from duals of point-to-point graphs, the dual, $K_n^*$, of the $n$-vertex complete graph $K_n$ has the smallest $m/N$ ratio when $N$ is fixed, and the smallest diameter, where $m$ and $N$ are the numbers of hyperedges and vertices, respectively. We have discussed the $K_n^*$ hypernetwork in much detail. Between the high cost/performance of the fully connected network $K_n$ and the low cost/performance of a linearly connected network (a ring) lies a set of point-to-point networks that constitute a wide range of trade-offs in cost and performance. For example, $H$ can be a point-to-point network such as a hypercube, a star graph [1], a chordal ring (including barrel shifters) [2], etc. The duals of these point-to-point networks also constitute a wide range of trade-offs in cost and performance.
In any point-to-point network, the number of links is at least equal to the number of processors, except in a tree, in which the number of links is one less than the number of processors. A trivial lower bound on the time complexity of a parallel algorithm on a point-to-point network is the best sequential time divided by the number of processors. But in a hypernetwork, it is desirable for the number of hyperlinks to be less than the number of processors for reasons of cost-effectiveness. In such a situation, the number of hyperlinks, the rank of the hyperlinks and the hypernetwork degree are important factors in determining the lower bounds on the time complexities of parallel algorithms, as demonstrated in our algorithm analysis. If we replace each bus by a crossbar switch, more efficient algorithms for the communication and computing problems we considered are possible. For example, using crossbar switches as hyperlinks of $K_n^*$, reduction and prefix operations can be implemented in $O(\log n)$ time, which is optimal. The $O(n^2)$ time complexity of the total-exchange operation on the $K_n^*$ hypernetwork cannot be improved because of the constant degree of $K_n^*$. We do not know whether the time complexity of the permutation operation on the $K_n^*$ hypernetwork with crossbar switch hyperlinks can be reduced to $O(\log n)$.
Most discussions in this paper are restricted to constant-degree (more specifically, degree 2) linear hypernetworks. Our approach can be easily generalized to the design and analysis of variable-degree and/or non-linear hypernetworks. Hypernetwork design is formulated as a constrained hypergraph construction optimization problem. Hypergraph theory plays a central role in hypernetwork design and analysis. Simple hypergraph theory concepts, such as Steiner triple systems and hypergraph duals, have led to several interesting hypernetwork topologies, as demonstrated in [13] and this paper. It has been pointed out in [12] that hypernetwork designs are also related to block design problems in combinatorial mathematics, which in turn are related to algebra and number theory.
