Abstract. This paper presents an algorithm for generating all the permutations defined by linear transformations on a shuffle-exchange network of $2^n$ processors in $2n-1$ passes. The proposed algorithm generates any such permutation in $O(n \log n)$ elementary steps. The subclass of bit-permutations is generated in $O(n)$ steps.
1. Introduction. The shuffle-exchange (SE) network is an efficient tool for implementing various types of parallel processes [2], [6]. The SE network is composed of $N = 2^n$ processors, where each processor is represented by a binary $n$-tuple $(x_1, x_2, \ldots, x_n)$. In the SHUFFLE operation, processor $(x_1, x_2, \ldots, x_n)$ transfers information to processor $(x_2, \ldots, x_n, x_1)$. In the EXCHANGE operation, processors $(x_1, x_2, \ldots, x_{n-1}, 0)$ and $(x_1, x_2, \ldots, x_{n-1}, 1)$ may exchange information, independently of other pairs of this form.
One SHUFFLE followed by one EXCHANGE is called a pass. Between the SHUFFLE phase and the EXCHANGE phase of a pass there is a computational phase during which the active pairs of the upcoming EXCHANGE are determined. Prior to the first pass there is normally a preprocessing stage. The overall procedure consisting of the preprocessing stage and all the passes is often referred to as the routing algorithm.
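The two operations and a single pass can be illustrated with a minimal sketch. It assumes that processor $(x_1, \ldots, x_n)$ is encoded as an integer with $x_1$ as the most significant bit; the names `shuffle`, `one_pass`, and `active_pairs` are hypothetical and are not taken from the paper, and the computational phase that selects the active pairs is not modeled.

```python
def shuffle(x: int, n: int) -> int:
    """SHUFFLE: (x1, x2, ..., xn) -> (x2, ..., xn, x1), a cyclic left rotation.

    Assumes 0 <= x < 2**n and that x1 is the most significant bit.
    """
    mask = (1 << n) - 1
    return ((x << 1) & mask) | (x >> (n - 1))


def one_pass(data, n, active_pairs):
    """Apply one pass (SHUFFLE, then EXCHANGE) to the items held by the processors.

    data[i] is the item held by processor i. active_pairs lists the indices k
    of the pairs (2k, 2k + 1) that swap their items in the EXCHANGE phase;
    how these pairs are chosen is left to the caller.
    """
    size = 1 << n
    after_shuffle = [None] * size
    for src in range(size):
        after_shuffle[shuffle(src, n)] = data[src]
    for k in active_pairs:
        lo, hi = 2 * k, 2 * k + 1  # addresses that differ only in the last bit
        after_shuffle[lo], after_shuffle[hi] = after_shuffle[hi], after_shuffle[lo]
    return after_shuffle


if __name__ == "__main__":
    n = 3
    items = list(range(1 << n))
    print(one_pass(items, n, active_pairs=[]))   # shuffle only
    print(one_pass(items, n, active_pairs=[0]))  # processors 0 and 1 also swap
```

With $n = 3$ and no active pairs, the item that starts at processor 4 (binary 100) ends at processor 1 (binary 001), as the definition of SHUFFLE requires.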
An important problem in this context is the design of efficient routing algorithms that implement permutations in an SE network in a minimal number of passes. In general, a transformation on an SE network associates with each processor a destination processor for the purpose of information transfer. This paper deals with the realization of nonsingular linear transformations, i.e., permutations for which each bit of the destination processor is a linear combination of the bits of the origin processor. It is well known ([3] and [4]) that such permutations can be realized in $2n-1$ passes, using a routing algorithm of $O(n^2)$ steps.
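To make the notion of a nonsingular linear transformation concrete, the following sketch computes the destination $F = ST$ over GF(2) for a row vector $S$ and checks nonsingularity by Gaussian elimination. The function names and the example matrix are hypothetical illustrations, and the code shows only the definition, not the routing algorithm of this paper.

```python
from itertools import product


def apply_linear(s, T):
    """Return F = S T over GF(2), with S a row vector of n bits."""
    n = len(s)
    return tuple(sum(s[i] * T[i][j] for i in range(n)) % 2 for j in range(n))


def is_nonsingular(T):
    """Check that the square 0-1 matrix T is invertible over GF(2)."""
    n = len(T)
    rows = [int("".join(map(str, r)), 2) for r in T]  # pack each row into an int
    rank = 0
    for bit in reversed(range(n)):
        pivot = next((i for i in range(rank, n) if (rows[i] >> bit) & 1), None)
        if pivot is None:
            return False  # no pivot in this column: rank is below n
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        for i in range(n):
            if i != rank and (rows[i] >> bit) & 1:
                rows[i] ^= rows[rank]  # row addition over GF(2)
        rank += 1
    return True


if __name__ == "__main__":
    T = [[1, 1, 0],
         [0, 1, 1],
         [0, 0, 1]]  # example matrix chosen for illustration only
    print(is_nonsingular(T))
    images = {apply_linear(s, T) for s in product((0, 1), repeat=3)}
    print(len(images))  # a nonsingular T maps the 8 processors to 8 distinct ones
```

A singular matrix would send two processors to the same destination, which is why only nonsingular transformations define permutations.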
In §2 we show how to realize these permutations in $2n-1$ passes, using a routing algorithm of $O(n \log n)$ steps. In §3 we show that if the permutation is merely a bit permutation, then only $O(n)$ steps are required.
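A bit permutation is the special case in which the transformation matrix is a permutation matrix. The sketch below assumes the convention $f(j) = s(p(j))$ with 0-based indices; the paper's own indexing convention may differ, and the helper names are hypothetical.

```python
from itertools import product


def permute_bits(s, p):
    """Bit permutation under the assumed convention f(j) = s(p(j)), 0-based."""
    return tuple(s[p[j]] for j in range(len(p)))


def permutation_matrix(p):
    """0-1 matrix T with T[i][j] = 1 exactly when i == p[j]."""
    n = len(p)
    return [[1 if i == p[j] else 0 for j in range(n)] for i in range(n)]


def apply_linear(s, T):
    """Return F = S T over GF(2), with S a row vector of n bits."""
    n = len(s)
    return tuple(sum(s[i] * T[i][j] for i in range(n)) % 2 for j in range(n))


if __name__ == "__main__":
    p = [2, 0, 1]  # example permutation chosen for illustration only
    T = permutation_matrix(p)
    agree = all(
        permute_bits(s, p) == apply_linear(s, T)
        for s in product((0, 1), repeat=len(p))
    )
    print(agree)
```

Every row and every column of such a matrix contains exactly one nonzero entry, so each destination bit copies a single origin bit.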
Following Linial and Tarsi [3] , we employ the combinatorial model described below.
DEFINITION 1 [3]. A 0-1 matrix $A$ of order $N \times m$, $N = 2^n$, $m \ge n$, is said to be balanced if all the rows in any $n$ consecutive columns of $A$ are distinct.
DEFINITION 2. The standard matrix is an $N \times n$ matrix $D$ whose $i$th row is the base-2 representation of $i$, $0 \le i \le N-1$.
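Definitions 1 and 2 translate directly into a short check. The sketch below assumes matrices are stored as lists of rows of 0/1 integers, with the most significant bit of $i$ in the first column of the standard matrix; the names `standard_matrix` and `is_balanced` are hypothetical.

```python
def standard_matrix(n):
    """Standard matrix D: row i is the n-bit base-2 representation of i."""
    return [[(i >> (n - 1 - j)) & 1 for j in range(n)] for i in range(1 << n)]


def is_balanced(A, n):
    """Definition 1: A is N x m with N = 2**n, m >= n, and the rows of every
    window of n consecutive columns are pairwise distinct."""
    size = 1 << n
    if len(A) != size:
        return False
    m = len(A[0])
    if m < n or any(len(row) != m for row in A):
        return False
    for start in range(m - n + 1):
        window = {tuple(row[start:start + n]) for row in A}
        if len(window) != size:
            return False  # two rows coincide on these n columns
    return True


if __name__ == "__main__":
    D = standard_matrix(2)
    print(is_balanced(D, 2))
    print(is_balanced([row + row for row in D], 2))               # D next to D
    print(is_balanced([[r[0], r[0], r[1]] for r in D], 2))        # repeated column
```

The check costs $O(N m)$ set insertions of $n$-tuples, which is adequate for small illustrative cases only.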
In terms of these definitions our problem can be stated as follows [3]: Given a balanced $N \times n$ matrix $A$, find a (possibly empty) matrix [...]
[...] $B_n$ be linearly independent vectors. Then there exists an integer $k$, $1 \le k \le n$, and binary coefficients $b_j$, $1 \le j \le n-1$, such that [...]
[...] case it is clear that $C_{r+1}$ is nonsingular. In the latter case, noting that $C_r$, $1 \le r < n$, has at most two nonzero entries in every column, we can view $C_r$ as the incidence matrix of the graph $G(C_r)$ [...]
(2) Two $n$-tuples, $S$ and $F = ST$, whose initial values represent, respectively, the ID of the said processor and that of the destination processor as defined by the given linear transformation. In the SHUFFLE and the EXCHANGE operations that follow, each processor transfers its current $S$ and $F$ and receives new values for $S$ and $F$.
That is, for a given processor $S = (s(1), \ldots, s(n))$ and its destination processor $F = (f(1), \ldots, f(n))$, the path in the SE network via which the transformation $F = ST$ is implemented by Procedure 1 is given by the sequence of processors corresponding to successive $n$-tuples from the row $SB = s(1)$ [...]
Proof. First, observe that every column of the matrix $B_r$, $0 \le r \le n-1$, has at most two nonzero entries and, thus, can be viewed as the incidence matrix of the graph $G(B_r)$ defined in §2. Note that $G_r$, as defined by Construction [...]
If, on the other hand, connecting vertex $p(n-m)$ to vertex $n-m-1$, after operation (i), does not create a cycle in the piece containing vertex $p(n-m)$, it certainly does not create a cycle with vertex 0 and the resulting graph is again a tree.
Since $G(B_0)$ is a tree, it follows that $G(B_m)$ is a tree for all $0 \le m \le n-1$.
Construction 2 leads to Procedure 2, given below, for realizing bit-permutations. In this procedure, which is simpler than Procedure 1, each processor has at each stage the following information:
(1) an $(n-1)$-tuple $U = (u(1), \ldots, u(n-1))$ as in Procedure 1;
(2) an $n$-tuple $S$ as in Procedure 1;
(3) the permutation $P = (p(1), \ldots, p(n))$.
