Abstract-The parallel computation model upon which the proposed algorithms are based is the hyper-bus broadcast network. The hyper-bus broadcast network consists of processors which are connected by global buses only. Based on this architecture, we first design two O(1) time basic operations: finding the maximum and minimum of N numbers, each of size O(log N) bits, and computing the matrix multiplication of two N × N matrices. Then, based on these two basic operations, three of the most important instances of the algebraic path problem, the connectivity problem, and several related problems are all solved in O(log N) time. These include the all-pair shortest paths, the minimum-weight spanning tree, the transitive closure, the connected component, the biconnected component, the articulation point, and the bridge problems, in either an undirected or a directed graph.
INTRODUCTION
THE algebraic path problem can be viewed as a collection of all related path problems in graph theory. The all-pair shortest paths problem, the minimum-weight spanning tree problem, and the transitive closure problem are all instances of the algebraic path problem [5], [10], [23]. The connectivity problem of a graph is to find the minimum number of vertices (edges) whose removal disconnects the graph. Both the connected component problem and the biconnected component problem are instances of it. In addition, the connected recognition problem, the biconnected recognition problem, the articulation point problem, and the bridge problem are all related to the connectivity problem. Many applications in graph theory can be derived from the problems mentioned above.
Parallel processing is one of the most effective ways to increase the computational power of a computer system. Many algorithms have been developed for various parallel computation models. Algorithms developed for the shared memory model are very efficient, but the model is hard to implement with current technology. With the advance of VLSI technology, more practical parallel processing systems, such as mesh-connected machines and cube-connected machines, have been built from interconnection networks [9], [11]. Unfortunately, for mesh-connected machines, global communication can incur extra time complexity because of the large communication diameter of the system. To overcome long-distance communication, many researchers have recently proposed equipping existing parallel processing systems with global buses (the resulting systems are usually called broadcast-based networks) [1], [3], [4], [5], [6], [7], [12], [13], [16], [17], [18], [21], [22], [25], [27], [28]. There are various kinds of broadcast-based networks, such as the broadcast communication model (BCM) [7], [16], [17], [27], the multichannel broadcast network (MCBN) [20], [28], the mesh-connected computers with multiple broadcasting (MCCMB) [1], [4], the generalized mesh-connected computers with multiple buses (GMCCMB) [6], the mesh-connected computers with hyperbus broadcasting (MCCHB) [12], and the reconfigurable network (RN) [3], [5], [13], [18], [21], [22], [25].
A hyper-bus broadcast network (HBBN) is one of the broadcast-based networks. It consists of processors that share only some global buses; there are no local links between processors. Compared to the other existing broadcast-based networks, it has the following two properties. First, instead of increasing the number of dimensions used, it can solve the bus-contention problem caused by the BCM and the MCBN by increasing the number of global buses. Second, since it has no local links between processors, it saves silicon area when it is directly implemented on a VLSI chip. Moreover, because the HBBN can be regarded as a reduced model of the MCCMB, GMCCMB, MCCHB, and RN, an algorithm derived for the HBBN can be easily modified to run on these models with the same time and processor complexities.

To demonstrate the computational power of the HBBN, several interesting graph algorithms are derived in this paper. We first design three efficient algorithms for solving the all-pair shortest paths problem, the minimum-weight spanning tree problem, and the transitive closure problem, which are three of the most important instances of the algebraic path problem. Then, algorithms for the connectivity problem and several related problems are derived; these include the connected component problem, the biconnected component problem, the articulation point problem, and the bridge problem, in either an undirected or a directed graph. Note that, if our algorithms for solving the all-pair shortest paths problem and the minimum-weight spanning tree problem are directly implemented on the reconfigurable networks, they are more efficient than those derived by Chen et al. [5]. That is, we achieve the same time complexity but reduce the number [...] based on the concurrent-write bus resolution scheme, or by a factor of N based on the extended concurrent-write bus resolution scheme, where c is a constant and c ≥ 1.

The rest of this paper is organized as follows. Section 2 describes the hyper-bus broadcast network upon which our algorithms are based. Section 3 deals with two basic operations: the maximum/minimum operation and the matrix multiplication operation. Section 4 develops several applications based on the basic operations proposed in Section 3. Finally, some concluding remarks are included in the last section.

----------------
• H.-R. Tsai [...]
1045-9219/97/$10.00 © 1997 IEEE
HYPER-BUS BROADCAST NETWORK
A one-dimensional (1D) HBBN (which can be regarded as a single-channel broadcast communication model) contains N processors. Each processor is identified by a unique one-tuple index i, 0 ≤ i < N. The processor with index i is denoted by $P_i$. Each processor $P_i$ has a port denoted by $S_0$, and each port $S_0$ is connected to the broadcasting bus.
Similarly, an r-dimensional (rD) HBBN contains M processors, physically arranged in a linear order, where $M = N_{r-1} \times \cdots \times N_1 \times N_0$. Each processor is connected to r broadcasting buses for r ≥ 1. Physically, each processor is identified by a unique one-tuple index i and denoted by $P_i$, $0 \le i < M$. Logically, each processor is identified by a unique r-tuple index $(i_{r-1}, \ldots, i_1, i_0)$, $0 \le i_j < N_j$, and denoted by $P_{i_{r-1}, \ldots, i_1, i_0}$. Each processor has r ports denoted by $S_j$, and each port $S_j$ is connected to the $i_j$-dimension global bus (one for each dimension) for 0 ≤ j < r. By using the row-major order mapping, we can easily transform the rD logical connections into the 1D physical connections. Fig. 1a shows the physical connections of a 2D 4 × 4 HBBN, and Fig. 1b shows its logical connections. A logical 3D 4 × 4 × 4 HBBN is shown in Fig. 2.

The number of ports in each processor is limited. From a VLSI viewpoint, the more output ports a chip has, the harder it is to build. Hence, to save silicon area, we bound the number of ports in each processor (i.e., the number of dimensions) within the range [1, log M]. This keeps the architecture reasonable and implementable. For example, when r = 4 (i.e., a four-dimensional HBBN), the number of ports in each processor equals that of a two-dimensional mesh-connected computer; when r = log M (i.e., a log M-dimensional HBBN), the number of ports in each processor is log M and the architecture can be regarded as a cube-connected machine.
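The row-major mapping between logical and physical indices can be sketched as follows. This is an illustrative sketch rather than part of the original presentation: the function names `logical_to_physical` and `physical_to_logical` are hypothetical, and the index tuple is assumed to be ordered from the highest dimension down to dimension 0.

```python
def logical_to_physical(index, dims):
    """Map a logical r-tuple index to a physical index in row-major order.

    index: (i_{r-1}, ..., i_1, i_0), assumed ordering (highest dimension first)
    dims:  (N_{r-1}, ..., N_1, N_0), the size of each dimension
    """
    if len(index) != len(dims):
        raise ValueError("index and dims must have the same length")
    physical = 0
    for i, n in zip(index, dims):
        if not 0 <= i < n:
            raise ValueError("index component out of range")
        physical = physical * n + i
    return physical


def physical_to_logical(physical, dims):
    """Inverse mapping: recover the logical r-tuple from a physical index."""
    index = []
    for n in reversed(dims):
        index.append(physical % n)
        physical //= n
    return tuple(reversed(index))


if __name__ == "__main__":
    print(logical_to_physical((2, 3), (4, 4)))       # 2D 4 x 4 HBBN
    print(physical_to_logical(27, (4, 4, 4)))        # 3D 4 x 4 x 4 HBBN
```

For the 2D 4 × 4 HBBN, the logical index (2, 3) maps to the physical index 2 · 4 + 3 = 11.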
Assume the HBBN works on a word model. In a unit of time, each processor can either perform arithmetic and logic operations or communicate with others by broadcasting data on a bus. Multiple processors may broadcast data on different buses simultaneously in one time unit. If more than one processor attempts to broadcast different data on the same bus simultaneously, a resolution scheme must be applied; otherwise, a collision occurs and the data received is unpredictable. Multiple processors may also broadcast the same data on the same bus simultaneously in one time unit. Two models of bus arbitration are considered to deal with the bus-contention problem. In the first model, named the concurrent-write model, the global bus width is assumed to be log N bits and the concurrent-write operation may write a "0" or "1" to only one specified bit of the global bus. In the second model, named the extended concurrent-write model, the concurrent-write operation is extended to all bits of the global bus; that is, it can write any individual bit of a global bus. The concurrent-write model corresponds to the well-known weak conflict resolution rule of the CRCW PRAM (concurrent read and concurrent write parallel random access machine) model proposed by Kucera [14]. To fully utilize the available global bus, the extended concurrent-write model has also been used by some researchers to solve various problems [3], [13]. In practice, the concurrent-write ability is implemented in the content-addressable array parallel processor proposed by Shu et al. [25].
An HBBN is operated in an SIMD (single instruction stream, multiple data streams) model. An enable/disable mask can be used to select the subset of processing elements that are to perform an instruction. Only the enabled processors perform the instruction; the remaining processors are idle. To simplify the presentation of our algorithms, let $var(i_{r-1}, \ldots, i_1, i_0)$ denote the local variable var (memory or register) in the processor with index $(i_{r-1}, \ldots, i_1, i_0)$. For example, sum(0, 0, 1) is the local variable sum of processor $P_{0,0,1}$. The complexity of an algorithm is assumed to be the sum of the maximal computation time among all processors and the maximal communication time among all processors. This assumption has also been used by many researchers [1], [3], [4], [5], [7], [13], [15], [16], [17], [18], [20], [21], [22], [25], [27], [28].
BASIC OPERATIONS
Some data operations are described in this section. They are used to develop several efficient algorithms in the following section. For the sake of completeness, we first review in detail the maximum/minimum operation proposed by Kao and Horng [13]. Then, several efficient basic operations are derived.
LEMMA 1 [26] . 
The Maximum/Minimum Operation
Given N numbers $A_0, A_1, \ldots, A_{N-1}$, each of size log N-bit, the maximum (minimum) operation is to find the maximum (minimum) number among these N numbers. Without loss of generality, assume that each $A_j$ is a log N-bit unsigned and distinct integer represented in the base-2 number system as follows:

$$A_j = \sum_{l=0}^{\log N - 1} a_{j,l}\, 2^{l}, \tag{1}$$

where $a_{j,l} \in \{0, 1\}$ and $0 \le j < N$.

Instead of using the base-2 number system, each $A_j$ can be represented in the base-w number system as follows:

$$A_j = \sum_{l=0}^{T-1} a_{j,l}\, w^{l}, \tag{2}$$

where $T = \lfloor \log_w N \rfloor + 1$, $0 \le a_{j,l} < w$, $0 \le A_j < N$, and $0 \le j < N$. Based on (2), the maximum (minimum) number of these N unsigned integers can be found as follows. First, each processor $P_j$, $0 \le j < N$, computes the digits $a_{j,l}$, $0 \le l < T$, from $A_j$ by using the division operation in O(T) time. That is, each $A_j$ is represented by T digits and each digit is bounded within the interval [0, w − 1]. Then, apply the prune-and-search technique to the $a_{j,l}$, where $0 \le j < N$ and $0 \le l < T$, to find the maximum (minimum) number of these N unsigned integers. That is, at the lth iteration, $A_j$ ($A_i$) is pruned if the digit $a_{j,l}$ of $A_j$ is less (greater) than the digit $a_{i,l}$ of $A_i$. This process is repeated from the most significant digit to the least significant digit, after which the maximum (minimum) number of these N unsigned integers is obtained. Based on the concurrent-write ability of the global bus, this approach, as proposed by Kao and Horng [13], can be easily implemented on a 1D N HBBN with the extended concurrent-write bus resolution scheme. Assume that the bus width is w-bit. We have the following lemma.

LEMMA 3 [13], [26]. Given N integer numbers each of size log N-bit, the maximum (minimum) can be found in O(T) time on a 1D N HBBN with the extended concurrent-write bus resolution scheme, where the bus width of the global bus is w-bit for 2 ≤ w < N and $T = \lfloor \log_w N \rfloor + 1$.
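The digit-wise prune-and-search idea can be illustrated with a sequential simulation. This is a minimal sketch under stated assumptions, not the bus-based implementation of [13]: the function names `num_digits` and `prune_and_search_max` are hypothetical, the inputs are assumed to be distinct unsigned integers in [0, N), and the largest digit among the surviving candidates is computed with Python's built-in `max` instead of being resolved on a global bus.

```python
def num_digits(N, w):
    """Return T = floor(log_w N) + 1 using integer arithmetic only."""
    T, power = 0, 1
    while power <= N:
        power *= w
        T += 1
    return T


def prune_and_search_max(A, w):
    """Return (maximum, index) of A by pruning on base-w digits.

    Assumes the values in A are distinct integers in [0, len(A)).
    """
    N = len(A)
    T = num_digits(N, w)
    # digits[j][l] is the l-th base-w digit of A[j]; l = 0 is least significant.
    digits = [[(a // w ** l) % w for l in range(T)] for a in A]
    active = list(range(N))  # every candidate starts in the active state
    # Scan from the most significant digit to the least significant digit.
    for l in reversed(range(T)):
        best = max(digits[j][l] for j in active)
        # Candidates whose digit is smaller than the best digit are pruned.
        active = [j for j in active if digits[j][l] == best]
    winner = active[0]
    return A[winner], winner


if __name__ == "__main__":
    # Data of the N = 8, w = 3 example in the text.
    print(prune_and_search_max([3, 7, 2, 0, 6, 5, 4, 1], 3))
```

With N = 8 and w = 3, T = 2 digits are examined, and the data 3, 7, 2, 0, 6, 5, 4, 1 yield the maximum 7 at index 1.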
In the following, another result for the same problem is derived, based on the concurrent-write bus resolution scheme (i.e., the weak conflict resolution rule). That is, the concurrent-write operation is limited to one specified bit of the global bus. By increasing the number of processors used, we show how to implement this idea on a 2D w × N HBBN with the concurrent-write bus resolution scheme. Initially, the state of each processor is set to "active," all global buses are cleared to 0, and each $A_{i_0}$ is stored in processor $P_{0,i_0}$, where $0 \le i_0 < N$. Finally, the maximum (minimum) number and its associated index are stored in the local variables max(0, 0) and mid(0, 0) of processor $P_{0,0}$, respectively. The detailed maximum algorithm (MAA) is given as follows; the minimum algorithm can be designed similarly. Assume N = 8 and w = 3. Fig. 3 shows an example of algorithm MAA executed on the data 3, 7, 2, 0, 6, 5, 4, and 1.
Algorithm MAA(A, max, mid); /* A is an input variable. max and mid are output variables. */
0: begin.
1: Processor [...], $0 \le i_1 < w$, through the $i_1$-dimension global bus; then, processor [...]

PROOF. We have N data items, each located in one processor. There are at most T digits. We compare all data from the most significant digit to the least significant digit in Step 2. During the lth iteration of Step 2, only those data having the maximal $a_{i_0,l}$ survive and the others are pruned. That is, at most, [...]

Lemmas 3 and 4 can be easily modified to process nondistinct data, signed integer data, and real number data, respectively. The interested reader can refer to the literature [13].

Fig. 3. [...] for l = 0, after Step 2.2, (h) processor's state for l = 0, after Step 2.3, (i) max(0,0) = 7 and mid(0,0) = 1, after Step 3. AS: active state, IS: inactive state.
The Matrix Multiplication Operation
[...] Step 2, compute [...] Step 3, compute [...]
LEMMA 5. Given two N × N matrices A and B, if the operator ⊕ is the logical OR (AND) operator, then the matrix multiplication operation of A and B can be computed in O(1) time on a 3D N × N × N HBBN, with either the concurrent-write or the extended concurrent-write bus resolution scheme.
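To make the O(1) bound of Lemma 5 plausible, the following sketch simulates a Boolean matrix product in which every (i, j, k) triple acts independently. It is an illustration under assumptions, not the algorithm of the paper: the product is assumed to be $c_{i,j} = \bigoplus_{k} (a_{i,k} \otimes b_{k,j})$ with ⊕ = OR and ⊗ = AND, the function name `boolean_matmul_concurrent_write` is hypothetical, and one cleared bus bit per pair (i, j) is assumed to receive concurrent writes of the same value 1.

```python
def boolean_matmul_concurrent_write(A, B):
    """Boolean product C = A * B with OR as addition and AND as multiplication.

    Each (i, j, k) triple plays the role of one hypothetical processor.
    All writers to bus[i][j] write the same value 1, so the order in
    which the triples are visited does not affect the result.
    """
    n = len(A)
    bus = [[0] * n for _ in range(n)]  # all bus bits are cleared to 0
    for i in range(n):
        for j in range(n):
            for k in range(n):
                # One AND per processor; a 1 is written only on success.
                if A[i][k] and B[k][j]:
                    bus[i][j] = 1
    return bus


if __name__ == "__main__":
    A = [[1, 1], [0, 0]]
    B = [[0, 0], [0, 1]]
    print(boolean_matmul_concurrent_write(A, B))
```

Because each triple performs a constant amount of work and the only interaction is writing an identical value, the sequential loops here stand in for a single parallel step on N³ processors.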

LEMMA 6. Given two N × N matrices A and B, if the operator ⊕ is the + (addition) operator, then the matrix multiplication operation of A and B can be computed in O(log N) time on a 4D N × N × N × N HBBN, with either the concurrent-write or the extended concurrent-write bus resolution scheme.
LEMMA 7. Given two N × N matrices A and B, if the operator ⊕ is the maximum (minimum) operator, then the matrix multiplication operation of A and B can be computed in O([...]
APPLICATIONS
In this section, we will develop several applications for graph problems using the basic operations described in the previous section. These include the algebraic path problem, the connectivity problem and several related problems.
The Algebraic Path Problem
The algebraic path problem is defined in terms of a weighted graph G = (V, E, w), where V is the set of vertices, E is the set of edges, and w(e) is the associated weight of each edge e ∈ E [10], [23]. The edge weight w(e) from vertex i to vertex j is defined in a semiring (or dioid) (H, +, ×), where + (addition) and × (multiplication) are closed binary associative operators over the set of elements H with 0 and 1 the respective identity elements. + is commutative, × is distributive with respect to +, and 0 is an absorptive element (a × 0 = 0 × a = 0, a ∈ H) with respect to ×. The weight of a path from vertex i to vertex j is defined as the product of the edge weights in the path. The weight of an empty path is defined as 1, which is the identity with respect to ×.
Based on the idea stated by Akl [2], $c^{l}_{i,j}$ of (4) at each iteration l can be computed using the matrix multiplication operation described in the previous section, by replacing the operators ⊕ and ⊗ with + and ×, respectively. That is, $c^{l}_{i,j}$ of (4) [...]

The applications of the algebraic path problem vary with the definition of the semiring. For example, the semiring (R ∪ {∞}, min, +) is for the all-pair shortest paths problem, the semiring (R ∪ {∞}, min, max) is for the minimum-weight spanning tree problem, and the semiring ({0, 1}, ∨, ∧) is for the transitive closure problem, where R ∪ {∞}, ∨, and ∧ denote the set of positive real numbers extended by plus infinity, the logical OR operation, and the logical AND operation, respectively. In other words, by properly replacing the operators + and × of (4), both the time and processor complexities of some instances of the algebraic path problem can be reduced enormously. In the following, we show how to solve three important and typical instances of the algebraic path problem: the all-pair shortest paths, the minimum-weight spanning tree, and the transitive closure problems.
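How one iteration scheme serves all three semirings can be sketched sequentially. The sketch rests on assumptions that should be read as such: iteration (4) is taken to be the usual path-doubling recurrence $c^{l}_{i,j} = \bigoplus_{k} (c^{l-1}_{i,k} \otimes c^{l-1}_{k,j})$, the diagonal of the input matrix is assumed to hold the identity of ⊗, and the names `semiring_matmul` and `algebraic_closure` are hypothetical.

```python
import math
import operator


def semiring_matmul(A, B, add, mul, zero):
    """Matrix product in which `add` and `mul` replace + and x."""
    n = len(A)
    C = [[zero] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            acc = zero  # identity of `add`
            for k in range(n):
                acc = add(acc, mul(A[i][k], B[k][j]))
            C[i][j] = acc
    return C


def algebraic_closure(W, add, mul, zero):
    """Square W until paths of up to n - 1 edges are covered.

    Assumes W[i][i] is the identity of `mul`, so each squaring keeps
    the paths already found while doubling the path length covered.
    """
    n = len(W)
    C = [row[:] for row in W]
    span = 1  # C currently accounts for paths of at most `span` edges
    while span < n - 1:
        C = semiring_matmul(C, C, add, mul, zero)
        span *= 2
    return C


if __name__ == "__main__":
    INF = math.inf
    W = [[0, 4, INF], [4, 0, 1], [INF, 1, 0]]
    print(algebraic_closure(W, min, operator.add, INF))   # (min, +): shortest paths
    print(algebraic_closure(W, min, max, INF))            # (min, max): bottleneck paths
    R = [[True, True, False], [False, True, True], [False, False, True]]
    print(algebraic_closure(R, operator.or_, operator.and_, False))  # (OR, AND)
```

The loop performs about log N squarings, which matches the structure of log N matrix multiplications behind the time bounds quoted in the text.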
The All-Pair Shortest Paths Problem
The all-pair shortest paths problem of G is to find, for each pair of vertices, a path of minimum total length. Initially, the adjacency matrix of G is given by [...]
The Minimum-Weight Spanning Tree Problem
The minimum-weight spanning tree problem of G is to find a spanning tree of minimum total weight. Initially, the adjacency matrix of G is given by [...]

Based on the approach proposed by Maggs and Plotkin [19], we can rewrite (4) by replacing the operators + and × with minimum and maximum, respectively. Hence, the minimum-weight spanning tree problem is also an instance of the algebraic path problem and the iteration step can be represented by [...]
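The link between the (min, max) semiring and the spanning tree can be illustrated sequentially. The sketch below relies on a standard fact: when all edge weights are distinct, an edge belongs to the minimum-weight spanning tree exactly when its weight equals the smallest possible maximum edge weight over all paths between its endpoints. The names `minimax_closure` and `mst_edges` are hypothetical, and the input convention (zero diagonal, `math.inf` for missing edges, distinct positive weights) is an assumption of this example.

```python
import math


def minimax_closure(W):
    """Bottleneck weights: repeated squaring with min as + and max as x."""
    n = len(W)
    C = [row[:] for row in W]
    span = 1  # C currently accounts for paths of at most `span` edges
    while span < n - 1:
        C = [[min(max(C[i][k], C[k][j]) for k in range(n))
              for j in range(n)] for i in range(n)]
        span *= 2
    return C


def mst_edges(W):
    """Return MST edges (i, j) with i < j, assuming distinct edge weights."""
    n = len(W)
    C = minimax_closure(W)
    return [(i, j)
            for i in range(n) for j in range(i + 1, n)
            # An edge is kept when no path offers a smaller bottleneck.
            if W[i][j] < math.inf and W[i][j] == C[i][j]]


if __name__ == "__main__":
    INF = math.inf
    # Edges: 0-1 (1), 1-2 (2), 0-2 (3), 2-3 (4)
    W = [[0, 1, 3, INF],
         [1, 0, 2, INF],
         [3, 2, 0, 4],
         [INF, INF, 4, 0]]
    print(mst_edges(W))
```

In the example, the edge 0-2 of weight 3 is rejected because the path 0-1-2 has the smaller bottleneck 2, which leaves the three tree edges (0, 1), (1, 2), and (2, 3).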
The Transitive Closure Problem
The transitive closure of G, denoted as A*, is defined as [...]

Based on a well-known technique, we can rewrite (4) by replacing the operators + and × with logical OR and logical AND, respectively. Hence, the transitive closure problem is also an instance of the algebraic path problem and the iteration step can be represented by [...]

$c^{l}_{i,j}$ of (7) [...]
The Connectivity and Several Related Problems
Based on the minimum-weight spanning tree and the transitive closure algorithms proposed previously, the connectivity problem and some related problems can be solved in the following subsections. We first discuss two O(log N) time algorithms for solving the connected recognition and connected component problems. Then, based on the idea proposed by Savage and JáJá [24], the articulation point, bridge, biconnected recognition, and biconnected component problems can all be solved in O(log N) time as well. All algorithms developed in this subsection are based upon the concurrent-write bus resolution scheme. The same results can also be derived on the extended concurrent-write bus resolution scheme with the same number of processors.
The Connected Recognition and Component Problems
The Articulation Point and Bridge Problems
A vertex i ∈ V of G is an articulation point if and only if G is disconnected after removing the vertex i and its incident edges. The articulation points of G can be found by the following two major steps. First, for each vertex k, 0 ≤ k < N, construct a subgraph $G_k$ from G by removing the vertex k and its incident edges. Then, for each subgraph $G_k$, apply Theorem 5 to test whether $G_k$ is a connected graph or not. If $G_k$ is a disconnected graph, then the vertex k is an articulation point; otherwise, it is not.

An edge e ∈ E is a bridge if and only if G is disconnected after removing the edge e from G. Since every bridge belongs to every spanning tree of G, only the N − 1 edges of one spanning tree need to be tested. The bridges of G can be determined by the following four major steps. First, apply Theorem 3 to find all N − 1 edges of a minimum-weight spanning tree of G. Next, apply Lemma 2 to rank these N − 1 edges from 1 to N − 1. Then, for each edge $e_l$ with its associated rank l, 1 ≤ l < N, construct a subgraph $G_{e_l}$ from G by removing the edge $e_l$. Finally, for each subgraph $G_{e_l}$, 1 ≤ l < N, apply Theorem 5 to determine whether $e_l$ is a bridge or not. That is, if $G_{e_l}$ is a disconnected graph, then the edge $e_l$ is a bridge; otherwise, it is not.
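The remove-and-test steps above can be sketched sequentially. This is an illustration under assumptions, not the parallel algorithm: a breadth-first search stands in for the connectivity test of Theorem 5, a union-find pass over the edges stands in for the spanning tree of Theorem 3 (any spanning tree contains every bridge), G is assumed to be connected and undirected, and all function names are hypothetical.

```python
from collections import deque


def is_connected(n, edges, skip_vertex=None):
    """BFS connectivity test on vertices 0..n-1, ignoring skip_vertex."""
    verts = [v for v in range(n) if v != skip_vertex]
    if not verts:
        return True
    adj = {v: [] for v in verts}
    for u, v in edges:
        if u != skip_vertex and v != skip_vertex:
            adj[u].append(v)
            adj[v].append(u)
    seen = {verts[0]}
    queue = deque([verts[0]])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return len(seen) == len(verts)


def spanning_tree_edges(n, edges):
    """Pick the edges of some spanning tree with a union-find structure."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    tree = []
    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            tree.append((u, v))
    return tree


def articulation_points(n, edges):
    """Vertex k is an articulation point if G without k is disconnected."""
    return [k for k in range(n) if not is_connected(n, edges, skip_vertex=k)]


def bridges(n, edges):
    """Tree edge e is a bridge if G without e is disconnected."""
    return [e for e in spanning_tree_edges(n, edges)
            if not is_connected(n, [f for f in edges if f != e])]


if __name__ == "__main__":
    # A triangle 0-1-2 with a pendant vertex 3 attached to vertex 2.
    edges = [(0, 1), (1, 2), (0, 2), (2, 3)]
    print(articulation_points(4, edges))
    print(bridges(4, edges))
```

In the example graph, vertex 2 is the only articulation point and (2, 3) is the only bridge; each of the N vertex tests and N − 1 edge tests is independent, which is what allows them to run side by side in the parallel setting.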
The Biconnected Recognition and Component Problems
A graph G is a biconnected graph if G − {k} is connected for every vertex k ∈ V. That is, a graph is biconnected if and only if it has no articulation points. Following the solution of the articulation point problem stated previously, the biconnected recognition problem can be solved in a straightforward manner. The biconnected components problem of G is to find all maximal biconnected subgraphs. The problem can be solved by the following four major steps. First, for each vertex [...]
CONCLUDING REMARKS
The hyper-bus broadcast network is well suited to reducing the long-distance communications present in interconnected parallel processing systems and is practically implementable, as all processors are linked only by global buses. It can be regarded as a reduced computation model of the MCCMB, GMCCMB, MCCHB, and RN. Hence, all proposed algorithms can be easily modified to run on these models with the same time and processor complexities. In this paper, based on the proposed basic operations, we have shown that the algebraic path problem can be solved in O(log² N) time. By properly modifying the operators of the basic operation, three important instances of the algebraic path problem, namely the all-pair shortest paths, the minimum-weight spanning tree, and the transitive closure problems, can be solved in O(log N) time. Furthermore, we have also derived O(log N) time algorithms for the connectivity problem and several related problems. Note that these algorithms can be easily modified for directed graphs. Most of the results derived in this paper are far better than those derived by Yang et al. [27], [28] and Chen et al. [5]. The detailed comparisons are listed in Table 1. With this architecture, we believe that many other problems can also be solved in polylogarithmic time.
