Sorting Omega Networks Simulated with P Systems: Optimal Data Layouts by Ceterchi, Rodica et al.
Sorting Omega Networks Simulated with
P Systems: Optimal Data Layouts
Rodica Ceterchi¹, Mario J. Pérez-Jiménez², Alexandru Ioan Tomescu¹
1 Faculty of Mathematics and Computer Science, University of Bucharest
Academiei 14, RO-010014, Bucharest, Romania
2 Research Group on Natural Computing
Department of Computer Science and Artificial Intelligence
University of Sevilla
Avda. Reina Mercedes s/n, 41012 Sevilla, Spain
E-mails: rceterchi@gmail.com, marper@us.es, alexandru.tomescu@gmail.com
Summary. The paper introduces some sorting networks and their simulation with P
systems, in which each processor/membrane can hold more than one piece of data, and
perform operations on them internally. Several data layouts are discussed in this context,
and an optimal one is proposed, together with its implementation as a P system with
dynamic communication graphs.
1 Introduction
Paper [9] proposed two models to sort a sequence of N numbers, based on the
bitonic sorting network. The first one consisted of N membranes, each storing two
numbers; one number was an element of the sequence, and the other one was an
auxiliary register used to route values. A number x was codified as the number of
occurrences of a symbol a in a membrane. Moreover, the membranes were arranged
on a 2D mesh, where only communication between neighboring membranes on
the mesh is permitted. This model, which uses a variant of P systems called P systems
with dynamic communication graphs (see [8]), closely follows the implementation
of the bitonic sort on the 2D mesh.
The second model consisted of only one membrane, where all the N numbers
were encoded as occurrences of N different symbols. Restrictions on communica-
tion were no longer imposed, as if the underlying communication graph were the
complete graph.
In this paper we introduce a model in between the two. First of all, observe
that the first model has the advantage of a codifying alphabet of fixed size, while
the second has the advantage of a small communication overhead. The model we
put forth in this paper captures these two benefits. Each membrane holds a fixed
number of values, and each of the membranes can communicate with any other.
Additionally, in order to minimize the communication between membranes, we
use a periodic remap of values to membranes, according to the steps of the omega
network.
The problem of mapping values to processors has been previously addressed in
the context of parallel sorting algorithms. The bitonic sorting network, which can
sort $N$ keys in time $O(\log^2 N)$, is probably one of the best-known parallel
sorting algorithms. However, modern architectures differ greatly from the
theoretical models under which such good results were obtained. As coarse-grained
processors can store internally more than one value, the following problem arises:
how to map $N$ keys to $P$ processors ($N > P$), such that inter-processor
communication is minimized. In the bitonic sorting algorithm, and for $N \geq P^2$, the
solution given in [13, 14] consisted in alternating a blocked layout with a cyclic
layout, thus performing the minimal number of remaps. This paper gives an
optimal mapping strategy for the bitonic sort for any $N > P$, and then applies this
result to P systems.
The paper is organized as follows. Section 2 presents preliminaries on bitonic
sorting networks and defines omega networks. Section 3 approaches the problem
of mapping N keys among P processors, each processor manipulating n = N/P
keys, such that overall communication is minimized. Optimal data layouts for the
omega network are proposed along the lines of [20], and some essential results are
proved about them. Section 4 discusses internal processing in one processor,
and how we model it in our implementation with P systems. Section 5 introduces
the P system which simulates the omega network with optimal data layouts, and
the algorithms which generate the sequence of dynamic communication graphs of
this model. Complexity issues are addressed at the end of Sections 3 and 5.
2 Preliminaries on Bitonic Sorting Networks and Omega
Networks
A bitonic sequence is a concatenation of two monotonic sequences, one ascending,
and the other one descending, or a sequence such that a cyclic shift of its elements
would put them in such a form.
The key components of a bitonic network are the bitonic splitters and the
bitonic mergers. The splitter of size N takes as input a bitonic sequence of length
N and partitions it into two bitonic sequences of equal length, such that all the
elements in the first sequence are smaller than (or greater than) all the elements
in the second sequence. A bitonic merger of size N consists of a splitter of size N
and of two mergers of size N/2, of opposite direction. It accepts as input a bitonic
sequence and sorts it in ascending or descending order (direction).
As any sequence of two numbers is bitonic, the sorting network uses bitonic
mergers of increasing size and alternating direction to construct bitonic sequences
of increasing length. The last such merger, of size N , renders the whole sequence
of N numbers sorted.
Fig. 1. A bitonic sorting network of size N = 8. The network can be partitioned in three
stages, each containing bitonic mergers of size 2, 4, and 8, respectively.
Fig. 2. Network devices: (a) increasing comparator, with inputs a, b and outputs
c = min(a, b), d = max(a, b); (b) decreasing comparator, with inputs a, b and
outputs c = max(a, b), d = min(a, b).
Following [15] it is customary to represent a network as an ordered set of
N lines (wires) connected by a set of compare-exchange devices (comparators, for
brevity). A comparator has two input terminals, a and b, and produces two output
terminals c and d. If the comparator is increasing, Fig. 2(a), then c = min(a, b)
and d = max(a, b), while if the comparator is decreasing, Fig. 2(b), c = max(a, b)
and d = min(a, b). A bitonic sorting network for N = 8 is represented in Fig. 1.
We introduce some more notations regarding the serial and parallel connections
of networks T1 and T2, of size N . Their serial connection, T1T2, is a network in
which the i-th output terminal of T1 is connected to the i-th input terminal of T2.
The parallel connection, T1 ◦ T2, is the union of T1 and T2, with terminal i of T1
becoming terminal i of T1 ◦ T2, and terminal i of T2 becoming terminal i + N of
T1 ◦ T2 (i = 0, . . . , N − 1).
Definition 1 (Omega network, Fig. 3(d)). Let $D_k$, $k \geq 1$, be a one-step
network of $N = 2^k$ lines with a device between each pair of lines $(i, i + N/2)$,
for $i = 0, \ldots, N/2 - 1$. Then the omega network $OM_k$ is recursively defined as
$OM_k = D_k (OM_{k-1} \circ OM_{k-1})$.
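Definition 1 can be unfolded mechanically. The Python sketch below lists, for each parallel step of $OM_k$, the pairs of lines joined by a device. The function name `omega_steps` is ours, and taking $OM_0$ (a single line) to be the empty network is an assumption made only to close the recursion; the definition itself does not state a base case.

```python
def omega_steps(k):
    """Return the steps of OM_k as lists of (low, high) line pairs.

    Assumption (not in the definition): OM_0 is the empty network.
    """
    if k == 0:
        return []
    half = 1 << (k - 1)                           # N/2 for N = 2^k
    first = [(i, i + half) for i in range(half)]  # the one-step network D_k
    sub = omega_steps(k - 1)                      # steps of OM_{k-1}
    # Parallel connection: the second copy of OM_{k-1} is shifted by N/2 lines.
    rest = [step + [(a + half, b + half) for a, b in step] for step in sub]
    return [first] + rest


for t, step in enumerate(omega_steps(3), start=1):
    print("step", t, step)
```

For k = 3 the first step pairs lines at distance 4, the second at distance 2, and the third at distance 1, which is the shape of the network labelled $OM_3$ in Fig. 3(d).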
In [6] the striking similarity between the bitonic merger (Fig. 3(a), 3(b)) and
the balanced merger (Fig. 3(c)) is investigated. Although prior research [11] showed
that there is no permutation of lines that transforms the bitonic merger into a
balanced merger, a framework is developed under which it is shown that the two
mergers are isomorphic graphs, also isomorphic to the graph of the omega
network (Fig. 3(d)).
Fig. 3. The bitonic merger, the balanced merger, and the omega network of size 8:
(a) the bitonic merger, classical representation; (b) the bitonic merger after a
permutation of lines; (c) the balanced merger; (d) the omega network $OM_3$,
drawn as $D_3$ followed by two copies of $OM_2$.
As a serial connection of $\log N$ identical networks in the class of omega networks
forms a sorting network [6], in what follows we will concentrate mainly on the
omega network.
3 How They Communicate
A sorting network is a fine-grained theoretical model, containing exactly one input
key on each wire. Additionally, comparators require communication between wires,
which can sometimes be more time consuming than the comparison operation it-
self [1, 3, 10, 16]. When redesigning parallel sorting algorithms for coarse-grained
PRAM, one has to pay particular attention to both communication and compu-
tation.
Given $N$ keys and $P$ processors ($N > P$), we have to map $n = N/P$ keys
to each processor, such that overall communication is minimized. Ionescu and
Schauser [13, 14] investigated this problem for the bitonic sorting algorithm. As
initially suggested in [10], they proposed a smart periodical switch between a
blocked layout and a cyclic layout. They observed that in each stage of the sorting
algorithm, the last $\log n$ steps can be performed locally under a blocked layout,
while under the cyclic layout the first $\log n$ steps are local. A necessary condition
for the two layouts to span enough depth to cover an entire stage of the network is
$N \geq P^2$. In addition, the two layouts are particular to the sorting network being
implemented. We shall see, for example, that the balanced merger [11, 12], which,
like the bitonic merger, belongs to the class of omega networks, also admits data
layouts optimizing overall communication.
An approach from the opposite side was put forth by Lee and Batcher [17].
They used a parity strategy for a shared-memory model with N = 2P to store
even-parity keys in local memory, while only odd-parity keys were recirculated.
This decreased by a factor of 2 the number of shared memory references.
The main contribution of this paper is a general scheme to map N values to P
processors, for any N > P and for any sorting network with the topology of the
omega network. Our idea captures the essence of the alternating smart layout
of [14], and makes it generally applicable, even when $N < P^2$. The number of data
layouts is no longer two; it depends on the granularity of the processors.
3.1 Optimal Data Layouts for the Omega Network
In the following, without explicitly mentioning it, we assume we have to sort
$N = 2^k$ keys using $P$ processors, $N > P$, each processor holding $n = N/P$ keys.
Any number $i \in \{0, \ldots, 2^k - 1\}$ has a bit representation $i = a_1 a_2 \cdots a_k$, $a_1$ being
the most significant bit, and $a_k$ the least significant one. To simplify notation,
we say that a sequence of bits $a_j \cdots a_i$, where $i, j \in \{1, \ldots, k\}$ and $j > i$, stands
for the void sequence. The number of parallel steps of $OM_k$ is $k$, and step $t$ of
the omega network $OM_k$ contains devices linking lines whose bit representations
differ only in bit $t$, with $1 \leq t \leq k$. For any $t \in \{1, \ldots, k\}$, consider the function
$bc_t : \{0, 1, \ldots, 2^k - 1\} \to \{0, 1, \ldots, 2^k - 1\}$, the bit complement of the $t$-th bit,
defined by $bc_t(a_1 a_2 \cdots a_t \cdots a_k) = a_1 a_2 \cdots \bar{a}_t \cdots a_k$. The function $bc_t$ is bijective
and is an involution, that is, $bc_t(bc_t(i)) = i$.
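As a small illustration of the bit-complement function, the following Python sketch implements $bc_t$ with an exclusive-or. The name `bc` and the explicit parameter `k` (the number of bits) are our choices; the only convention taken from the text is that bit 1 is the most significant bit.

```python
def bc(i, t, k):
    """Complement bit t of the k-bit number i (bit 1 is the most significant)."""
    if not 1 <= t <= k:
        raise ValueError("t must satisfy 1 <= t <= k")
    return i ^ (1 << (k - t))  # bit t sits at binary weight 2^(k - t)


# With k = 5, line 0 is paired with line 16 at step 1 and with line 1 at step 5.
print(bc(0, 1, 5), bc(0, 5, 5))
```

Applying `bc` twice with the same `t` returns the original number, and for fixed `t` the function permutes the set of line indices.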
First, we give a formal definition of a data layout.
Definition 2 (Data layout). A data layout of N values to P processors is a
function D : {0, . . . , N − 1} → {0, . . . , P − 1}.
We introduce the following data layouts, as suggested in [10, 14].
Fig. 4. Three data layouts for the omega network $OM_5$. (a) An omega network of
size 32; lines marked with the same shape are assigned to the same processor in
one data layout. (b) Keys mapped to processor 0 in each of the three data layouts:
{0, 8, 16, 24} in the first, {0, 2, 4, 6} in the second, and {0, 1, 2, 3} in the third.
Definition 3 (Blocked layout). A blocked layout for mapping $N$ keys on $P$
processors is a function $D_b : \{0, \ldots, N - 1\} \to \{0, \ldots, P - 1\}$, such that
$D_b(i) = \lfloor i/n \rfloor$, where $n = N/P$.
Definition 4 (Cyclic layout). A cyclic layout for mapping $N$ keys on $P$
processors is a function $D_c : \{0, \ldots, N - 1\} \to \{0, \ldots, P - 1\}$, such that
$D_c(i) = i \bmod P$.
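The two layouts of Definitions 3 and 4 are short enough to state as code. The sketch below is only an illustration; the function names `blocked`, `cyclic` and `keys_on` are ours, and N is assumed to be a multiple of P.

```python
def blocked(i, N, P):
    """Blocked layout D_b: key i goes to processor floor(i / n), n = N / P."""
    return i // (N // P)


def cyclic(i, N, P):
    """Cyclic layout D_c: key i goes to processor i mod P."""
    return i % P


def keys_on(layout, p, N, P):
    """All keys that the given layout maps to processor p."""
    return [i for i in range(N) if layout(i, N, P) == p]


N, P = 32, 8
print(keys_on(blocked, 0, N, P))  # consecutive keys
print(keys_on(cyclic, 0, N, P))   # keys spaced P apart
```

With N = 32 and P = 8, processor 0 holds four consecutive keys under the blocked layout and four keys spaced 8 apart under the cyclic layout.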
We note that Definition 5 in [13] and the definition of the cyclic layout given
in Section 2.1 of [14] are incorrect: if we map the $i$-th key to processor $i \bmod n$,
where $n = N/P$, then we need $n \leq P$, which implies $N \leq P^2$, which
clearly is not the case considered.
In a blocked layout, the first $\log N - \log n$ steps require remote communication,
while the last $\log n$ steps are local. In a cyclic layout, the situation is reversed: the
first $\log N - \log n$ steps are local, while the last $\log n$ steps are remote. The idea
proposed in [14] when mapping $N \geq P^2$ values in the bitonic sort is to periodically
switch between the two layouts, such that all steps are local. Moreover, as the
stages in a bitonic sort have increasing size, the authors propose an improved
“smart” remap such that a layout spans multiple stages of the algorithm,
achieving a total of $\log P + 1$ remaps.
Our paper better highlights the reasoning behind these remaps, in the case of
the bitonic sort. Consider the omega network $OM_k$, and suppose we choose to
map key 0 to processor 0. If each processor can hold $2^m$ values, which other keys
are mapped to processor 0? As we can see, at step 1 we have a device linking line
0 with line $0 + 2^{k-1}$. At step 2 we have a device linking line 0 with line $2^{k-2}$, and
a device linking line $2^{k-1}$ and line $2^{k-1} + 2^{k-2}$. We also note that in step 1 lines
$2^{k-2}$ and $2^{k-1} + 2^{k-2}$ were also linked with a device. We continue until step $m$,
where we identify $2^m$ lines linked by $2^{m-1}$ devices. It would be natural to map
these lines to processor 0, as all comparisons at step $m$ are local. However, one
more question remains: are all comparisons at steps 1 through $m - 1$ also local?
As we shall see, the answer is yes.
The following lemma follows directly from the definition of $OM_k$.
Lemma 1. At each step $1 \leq t \leq k$ of $OM_k$, and for any $0 \leq i < 2^k$, line $i$ is
linked by a device only with line $bc_t(i)$.
Lemma 2. In $OM_k$, for any $1 \leq m \leq k$, $0 \leq t \leq k - m$ and $0 \leq i < 2^{k-m}$, in
steps $t + 1, \ldots, t + m$ there is no device linking lines in the set
$P^{t,m}_i = \{a_1 a_2 \cdots a_k \mid a_1 \cdots a_t a_{t+m+1} \cdots a_k = i$, where $a_1 \cdots a_k$ is a bit representation$\}$
with lines from $\{0, \ldots, 2^k - 1\} \setminus P^{t,m}_i$.
Proof. Suppose there are $1 \leq r \leq m$, $l \in P^{t,m}_i$ and $l' \notin P^{t,m}_i$ such that at step
$t + r$ there is a device linking $l$ and $l'$. From Lemma 1 we have that $l' = bc_{t+r}(l)$.
Since $l'$ differs from $l$ only in bit $t + r$, which is not among the bits that define $i$,
this implies $l' \in P^{t,m}_i$, a contradiction.
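Lemma 2 can be checked exhaustively for small parameters. The Python sketch below is such a check, not a proof; the helper names `bc`, `block_index`, `p_set` and `lemma2_holds` are ours. `block_index(u, t, m, k)` returns the index $i$ of the set $P^{t,m}_i$ that contains line $u$, obtained by deleting bits $t+1, \ldots, t+m$ from the bit representation of $u$.

```python
def bc(i, t, k):
    """Complement bit t of the k-bit number i (bit 1 is the most significant)."""
    return i ^ (1 << (k - t))


def block_index(u, t, m, k):
    """Index i such that u belongs to P^{t,m}_i: drop bits t+1..t+m of u."""
    low_len = k - t - m                  # number of bits a_{t+m+1} .. a_k
    high = u >> (k - t)                  # bits a_1 .. a_t
    low = u & ((1 << low_len) - 1)       # bits a_{t+m+1} .. a_k
    return (high << low_len) | low


def p_set(i, t, m, k):
    """The set P^{t,m}_i as a sorted list of line indices."""
    return [u for u in range(1 << k) if block_index(u, t, m, k) == i]


def lemma2_holds(t, m, k):
    """True if no device of steps t+1..t+m leaves the set of its line."""
    return all(
        block_index(bc(u, t + r, k), t, m, k) == block_index(u, t, m, k)
        for u in range(1 << k)
        for r in range(1, m + 1)
    )


print(p_set(0, 0, 2, 5), p_set(0, 2, 2, 5))
print(all(lemma2_holds(t, 2, 5) for t in range(0, 4)))
```

For k = 5 and m = 2 the two printed sets agree with the first two groups of keys listed for processor 0 in Fig. 4(b).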
We can therefore derive the data layouts for the omega network. Suppose we
have $N = 2^k$, $n = 2^m$, and $P = 2^{k-m}$. We first assign to each processor $P_i$ all
values in the set $P^{0,m}_i$, for $0 \leq i \leq P - 1$. By Lemma 2 we have that the first
$\log n = m$ steps are entirely local. After $m$ steps, we remap to each processor $P_i$
all the values in the set $P^{m,m}_i$, and perform the next $m$ steps locally, and so on.
We can now give the definition of our proposed data layout.
Definition 5. Given $N = 2^k$ keys and $P = 2^{k-m}$ processors, each of which can
store $n = 2^m$ values, $m \geq 1$, the sequence of optimal data layouts consists of
$\lceil \log N / \log n \rceil = \lceil k/m \rceil$ data layouts. In each data layout $D_s$,
$0 \leq s \leq \lceil k/m \rceil - 1$, values in the set $P^{sm,m}_i$ are mapped to processor $P_i$, for all
$0 \leq i \leq 2^{k-m} - 1$. More formally, for any $0 \leq u < 2^k$ such that $u \in P^{sm,m}_i$,
we have $D_s(u) = i$.
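The sequence of layouts in Definition 5 can be generated as follows. This is a sketch under one explicit assumption: when $m$ does not divide $k$, the set $P^{sm,m}_i$ of the last layout would reach past bit $k$, so the sketch clamps the offset of the last layout to $k - m$ (the largest value of $t$ allowed in Lemma 2). For k = 5, m = 2 this choice reproduces the keys shown for processor 0 in Fig. 4(b); the function names are ours.

```python
def layout_offsets(k, m):
    """Offsets t of the successive layouts; the last one is clamped to k - m
    (an assumption of this sketch, consistent with Fig. 4(b))."""
    count = -(-k // m)                      # ceil(k / m) without floats
    return [min(s * m, k - m) for s in range(count)]


def layout(u, t, m, k):
    """Processor of key u in the layout with offset t: drop bits t+1..t+m."""
    low_len = k - t - m
    return ((u >> (k - t)) << low_len) | (u & ((1 << low_len) - 1))


def keys_of(p, t, m, k):
    """Keys mapped to processor p in the layout with offset t."""
    return [u for u in range(1 << k) if layout(u, t, m, k) == p]


k, m = 5, 2
for s, t in enumerate(layout_offsets(k, m)):
    print("layout", s, "processor 0 holds", keys_of(0, t, m, k))
```

When $m$ divides $k$ the clamp has no effect and the offsets are exactly $0, m, 2m, \ldots$ as in the definition.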
The following is a consequence of Lemma 1 of [14].
Lemma 3. The maximum number of successive steps of the omega network that
can be executed locally, under any data layout is log n, where n = N/P .
Membrane   Data layout 1     Data layout 2     Data layout 3
000        0, 8, 16, 24      0, 2, 4, 6        0, 1, 2, 3
001        1, 9, 17, 25      1, 3, 5, 7        4, 5, 6, 7
010        2, 10, 18, 26     8, 10, 12, 14     8, 9, 10, 11
011        3, 11, 19, 27     9, 11, 13, 15     12, 13, 14, 15
100        4, 12, 20, 28     16, 18, 20, 22    16, 17, 18, 19
101        5, 13, 21, 29     17, 19, 21, 23    20, 21, 22, 23
110        6, 14, 22, 30     24, 26, 28, 30    24, 25, 26, 27
111        7, 15, 23, 31     25, 27, 29, 31    28, 29, 30, 31
Fig. 5. The three data layouts for the omega network in Figure 4(a). Each entry
lists the keys (line indices) held by the membrane in that data layout.
In each data layout $D_s$, $0 \leq s \leq \lceil k/m \rceil - 2$, $\log n = m$ steps are local. For
$s = \lceil k/m \rceil - 1$, the remaining $k - m(\lceil k/m \rceil - 1)$ steps of the network (that is,
$k \bmod m$ steps when $m$ does not divide $k$) are local. From Lemma 3
we have that the proposed data layouts for the omega network are optimal.
In the case $N \geq P^2$, we notice that $2m \geq k$, hence two data layouts are enough
to cover the whole omega network. However, they do not coincide with $D_b$ or $D_c$,
as in the blocked layout, the last $m$ steps are local, while in the cyclic layout, the
first $k - m$ steps are local.
3.2 Computation Complexity
In each data layout, a processor holds $n$ values and performs $\log n$ steps locally,
taking time $O(n \log n)$. As we have $\lceil \log N / \log n \rceil$ data layouts, we get an overall
time complexity of the omega network of $O(n \log N)$. From [6] we have that a
serial connection of $\log N$ omega networks of size $N$ is enough to sort a sequence
of $N$ numbers. Hence, the complexity to sort $N$ numbers using $P$ processors, each
holding $n = N/P$ values, using our proposed data layouts, is $O(n \log^2 N)$.
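Spelled out, the bound multiplies the cost per layout by the number of layouts and then by the number of omega networks. The chain below only restates the paragraph above; it uses $\lceil x \rceil \leq x + 1$ and $\log n \leq \log N$, and assumes $n \geq 2$ so that $\log n > 0$.

```latex
\[
\left\lceil \frac{\log N}{\log n} \right\rceil \cdot n \log n
\;\leq\; \left( \frac{\log N}{\log n} + 1 \right) n \log n
\;=\; n \log N + n \log n
\;\leq\; 2\, n \log N ,
\]
\[
\underbrace{\log N}_{\text{omega networks}} \cdot\; O(n \log N) \;=\; O(n \log^2 N).
\]
```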
This remark has a quite profound significance. In the fine-grained theoretical
model we have $n = 1$, and its complexity is $O(\log^2 N)$. The complexity of the
network using a more coarse-grained model grows linearly with the granularity $n$
of the model. At the opposite end, when $n = N$ and the entire sorting
network is simulated locally, we have a complexity of $O(N \log^2 N)$, which is worse
than $O(N \log N)$, the complexity of most sequential sorting algorithms. It would
be desirable to choose $n$ such that this bound is not surpassed in the parallel
model. We impose $n \log^2 N \leq N \log N$, which implies $n \leq N / \log N$.
An algorithm to find the minimum of a bitonic sequence of size $n$ in time
$O(\log n)$ was introduced in [14]. This gives a time complexity of each data layout
of $O(n)$. In the case of a network obtained from a serial connection of bitonic
mergers, this observation gives an overall time complexity of $O(\frac{n}{\log n} \log^2 N)$.
4 What Happens Inside One Processor/Membrane
One processor (and the membrane which simulates it) will be capable of holding
$n = N/P = 2^m$ pieces of data. We label the data with indices in the set
$\{0, 1, \ldots, n - 1\}$. For any such index we consider its writing as a binary string of
length $m$, for instance $i = x_1 x_2 \cdots x_t \cdots x_m$.
Inside one processor, several comparisons are performed, in parallel, between
the $n$ pieces of data, in the following manner: for every bit $t$ (starting with 1,
the most significant bit, and ending with $m$) we compare and exchange if necessary
(to obtain an increasing order) all pairs of values codified with $a_i$ and $a_{bc_t(i)}$.
More precisely, we have the following algorithm to be performed inside each
processor/membrane:
  for $t \leftarrow 1$ to $m$ do
    forall $i < bc_t(i)$ in parallel do
      compare($a_i$, $a_{bc_t(i)}$);
Algorithm 1: A parallel algorithm for the bitonic merger
where by compare($a_i$, $a_j$) we denote sorting in an ascending manner the values
codified by $a_i$ and $a_j$, i.e. we end by having the minimum of the two values codified
by $a_i$ and the maximum by $a_j$.
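Algorithm 1 translates almost line by line into Python. In the sketch below (the name `omega_local` is ours) the values held by one processor are kept in a list indexed by $i$, and the inner `forall` is run as an ordinary loop; this gives the same result as the parallel version because the pairs $(i, bc_t(i))$ of one step are disjoint.

```python
def omega_local(values):
    """Apply the m steps of Algorithm 1 to a list of n = 2^m values."""
    n = len(values)
    m = n.bit_length() - 1
    if n != 1 << m:
        raise ValueError("the number of values must be a power of two")
    a = list(values)
    for t in range(1, m + 1):
        mask = 1 << (m - t)              # bit t, counted from the most significant
        for i in range(n):
            j = i ^ mask                 # j = bc_t(i)
            if i < j and a[i] > a[j]:    # compare(a_i, a_j): min to i, max to j
                a[i], a[j] = a[j], a[i]
    return a


# A bitonic input (ascending, then descending) comes out sorted.
print(omega_local([1, 3, 5, 7, 8, 6, 4, 2]))
```

On the bitonic input shown, the three steps compare positions at distance 4, 2 and 1 in turn, which is the classical bitonic merge.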
The procedure compare($a_i$, $a_j$) works in a membrane in the following manner:
let $s_i, s_j$ and $t_i, t_j$ be four auxiliary symbols, for the sources and the targets of a
comparator. The set of rules
$\{a_k \to s_k \mid k = i, j\} \cup \{s_i s_j \to t_i t_j,\ s_i \to t_j,\ s_j \to t_j\} \cup \{t_k \to a_k \mid k = i, j\}$
implements an increasing comparator between values codified by $a_i$ and $a_j$. We first
rewrite the $a$ symbols to $s$ symbols, next we have the comparator which writes the
minimum to $t_i$ and the maximum to $t_j$, and then we rewrite these back to $a_i$ and
$a_j$ respectively.
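To see why these rules compute a minimum and a maximum, the effect of the three phases on symbol counts can be traced in a few lines of Python. This is a sketch, not a P system simulator: it assumes that the pair rule $s_i s_j \to t_i t_j$ is applied as many times as possible before the single-symbol rules $s_i \to t_j$ and $s_j \to t_j$ consume the leftovers (a priority that the rule set alone does not spell out), and it represents a membrane as a `Counter` keyed by hypothetical pairs such as `("a", i)`.

```python
from collections import Counter


def compare(membrane, i, j):
    """Return a new multiset in which a_i holds the min and a_j the max."""
    # Phase 1: a_k -> s_k, every occurrence is rewritten.
    s_i, s_j = membrane[("a", i)], membrane[("a", j)]
    # Phase 2: s_i s_j -> t_i t_j is applied first (assumed priority);
    # each application produces one t_i and one t_j.
    pairs = min(s_i, s_j)
    t_i = pairs
    # Leftover s_i or s_j symbols are all rewritten to t_j.
    t_j = pairs + (s_i - pairs) + (s_j - pairs)
    # Phase 3: t_k -> a_k.
    result = Counter(membrane)
    result[("a", i)] = t_i
    result[("a", j)] = t_j
    return result


before = Counter({("a", 0): 5, ("a", 1): 3})
after = compare(before, 0, 1)
print(after[("a", 0)], after[("a", 1)])
```

The pair rule fires min(x, y) times, so $t_i$ ends with the minimum; $t_j$ receives the same number plus the |x − y| leftover symbols, which adds up to the maximum.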
For all the comparisons which are to be done in parallel, take auxiliary
alphabets $S = \{s_0, \ldots, s_{n-1}\}$ and $T = \{t_0, \ldots, t_{n-1}\}$. We rewrite all initial symbols to
symbols in $S$:
$\{a_i \to s_i \mid i = 0, 1, \ldots, n - 1\}$.
Next we put the comparators between appropriate pairs:
$\{s_i s_j \to t_i t_j,\ s_i \to t_j,\ s_j \to t_j \mid i = 0, 1, \ldots, n - 1,\ i < j = bc_t(i)\}$.
Then we rewrite back to the original alphabet:
$\{t_i \to a_i \mid i = 0, 1, \ldots, n - 1\}$.
The parallel comparisons at each step $t$
  forall $i < bc_t(i)$ in parallel do
    compare($a_i$, $a_{bc_t(i)}$);
will thus be simulated in a membrane $P$ by the rules
$\{a_i \to s_i \mid i = 0, 1, \ldots, n - 1\}$
$\cup\ \{s_i s_j \to t_i t_j,\ s_i \to t_j,\ s_j \to t_j \mid i = 0, 1, \ldots, n - 1,\ i < j = bc_t(i)\}$
$\cup\ \{t_i \to a_i \mid i = 0, 1, \ldots, n - 1\}$.
5 A P System which Simulates the Omega Network
In this section we introduce a P system with dynamic communication [7], along
the same general lines as the model proposed in [8, 9]. For each of the processors
Pi, i ∈ {0, 1, . . . , P − 1} we have an associated membrane, which we label i. The
graphs we consider are sub-graphs of the complete graph, KP , or of the identity
graph.
Note that at a certain step of the sorting algorithm not all edges are involved in
communication. Therefore we call active sub-graphs of KP those graphs containing
only such edges. We also introduce the identity graph, with
V (Id) = {0, 1, . . . , P − 1},
E(Id) = {(i, i) | 0 ≤ i ≤ P − 1}
for modeling internal processing steps.
In order to describe the evolution of such a P system, we use pairs of the type
[graph, rules]. Here graph is a sub-graph of $K_P$ or Id, and rules is a mapping from
the set of all edges of graph, E(graph), to the set of all symbol/object rewriting
rules for routing or comparison operations.
The formal definition of the P system is
$$\Pi = (V = \{a_0, \ldots, a_{n-1}\} \cup A,\ \langle [a_0^{x_0^0}, a_1^{x_1^0}, \ldots, a_{n-1}^{x_{n-1}^0}]_0, \ldots, [a_0^{x_0^{P-1}}, a_1^{x_1^{P-1}}, \ldots, a_{n-1}^{x_{n-1}^{P-1}}]_{P-1} \rangle,\ R_\mu),$$
where the membrane indices are $\{0, 1, \ldots, P - 1\}$. The alphabet $\{a_0, \ldots, a_{n-1}\}$ is
of fixed size, and the set $A$ contains the auxiliary symbols necessary to simulate
the omega network, as indicated in Section 4. Numbers $x_i^j$ with $0 \leq i \leq n - 1$ are
the values stored on the wires mapped to processor $j$, $0 \leq j \leq P - 1$, in the first
data layout. Each of them is codified as the number of occurrences of a symbol $a_i$
inside membrane $j$. Finally, $R_\mu$ is the finite sequence of pairs [graph, rules] which
guides the computation.
We will see in the sequel that $R_\mu$ is generated algorithmically, by concatenating
sequences of pairs [graph, rules] (see footnote 3).
Lemma 4. Given $N = 2^k$ keys and $P = 2^{k-m}$ membranes, each of which can store
$n = 2^m$ values, $m \geq 1$, after the computation for the data layout $D_s$ is finished, symbol
$a_i$ of membrane $j$ codifies the value corresponding to wire $u \in \{0, \ldots, N - 1\}$, where
the bit representation of $u$ is $u = j_1 \ldots j_{sm} i_1 \ldots i_m j_{sm+1} \ldots j_{k-m}$. By $j_1 \ldots j_{k-m}$
and by $i_1 \ldots i_m$ we denote the bit representations of $j$ and $i$, respectively.
Proof. The proof is immediate by Definitions 1 and 5 and Lemma 2.
We observe that the remap of values from one data layout to the next can be done
in $P + 1$ steps. When passing from data layout $D_{s-1}$ to $D_s$, with $0 < s \leq \lceil k/m \rceil - 1$,
in each step $j$, $0 \leq j \leq P - 1$, membrane $j$ sends its contents along the edges of
the communication graph $C^j_s$. To avoid collisions in the destination membranes,
it also performs a rewriting of symbols from $a_t$ to $a'_t$, for all $t \in \{0, \ldots, n - 1\}$.
In the last step $P + 1$, all auxiliary symbols $a'_t$ will be rewritten back to $a_t$ in all
membranes, and the local computation can begin in each membrane.
3 We denote the empty sequence by $\lambda$, and the concatenation of two sequences by “·”.
We give below two algorithms generating the communication graphs $C^j_s$, and
the rules associated to each edge.
  $E(C^j_s) \leftarrow \emptyset$;
  for $j \leftarrow 0$ to $P - 1$ do
    for $i \leftarrow 0$ to $n - 1$ do
      let $j$ have bit representation $j_1 \cdots j_{sm} j_{sm+1} \cdots j_{k-m}$;
      let $i$ have bit representation $i_1 \cdots i_m$;
      // the destination membrane of the value encoded by $a_i$ in membrane $j$
      $z \leftarrow j_1 \cdots j_{sm} i_1 \cdots i_m j_{(s+1)m+1} \cdots j_{k-m}$;
      // the destination symbol of the value encoded by $a_i$ in membrane $j$
      $t \leftarrow j_{sm+1} \cdots j_{sm+m}$;
      $E(C^j_s) := E(C^j_s) \cup \{(j, z)\}$;
      $rules_{C^j_s}((j, z)) := a_i \to a'_t$;
Algorithm 2: Generation of the sequence of $P$ communication graphs when
passing from data layout $D_{s-1}$ to $D_s$, with $0 < s \leq \lceil k/m \rceil - 1$.
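The remap can also be computed directly from two consecutive layouts, without transcribing the bit formulas of Algorithm 2. The Python sketch below takes that route and is therefore an illustration of what the communication step must achieve, not a rendering of the algorithm itself. It identifies a layout by its offset $t$ (the bits $t+1, \ldots, t+m$ of a wire index select the symbol, the remaining bits select the membrane); the names `split`, `join` and `remap` are ours.

```python
def split(u, t, m, k):
    """(membrane, symbol index) of wire u in the layout with offset t."""
    low_len = k - t - m
    local = (u >> low_len) & ((1 << m) - 1)      # bits t+1 .. t+m of u
    membrane = ((u >> (k - t)) << low_len) | (u & ((1 << low_len) - 1))
    return membrane, local


def join(membrane, local, t, m, k):
    """Inverse of split: the wire held as symbol `local` of `membrane`."""
    low_len = k - t - m
    high = membrane >> low_len                   # first t bits of the wire
    low = membrane & ((1 << low_len) - 1)        # last k - t - m bits
    return (high << (k - t)) | (local << low_len) | low


def remap(t_old, t_new, m, k):
    """Map (j, i) to (z, t): a_i of membrane j becomes a'_t of membrane z."""
    P, n = 1 << (k - m), 1 << m
    return {
        (j, i): split(join(j, i, t_old, m, k), t_new, m, k)
        for j in range(P)
        for i in range(n)
    }


table = remap(0, 2, 2, 5)
print(table[(0, 1)])   # where symbol a_1 of membrane 0 is sent
```

For k = 5, m = 2, symbol $a_1$ of membrane 0 holds wire 8 in the first layout; in the second layout wire 8 is the first key of membrane 010, in agreement with Fig. 5.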
  for $j \leftarrow 0$ to $P - 1$ do
    rules-endcomm$((j, j)) := \{a'_i \to a_i \mid 0 \leq i \leq n - 1\}$;
Algorithm 3: Generation of the rules associated to the identity graph which
rewrite back the auxiliary symbols $a'_t$ when passing from any data layout $D_{s-1}$
to $D_s$, with $0 < s \leq \lceil k/m \rceil - 1$.
We assume that the sequence denoted by SimOM is the sequence of pairs
[graph, rules] which simulates the omega network of size $n$, $OM_m$ ($n = 2^m$). Its
construction was indicated in Section 4 and is expressed algorithmically below.
  SimOM $\leftarrow \lambda$;
  for $t \leftarrow 1$ to $m = \log n$ do
    forall $p \leftarrow 0$ to $P - 1$ in parallel do
      $rules_{t,1}((p, p)) \leftarrow \{a_i \to s_i \mid i = 0, 1, \ldots, n - 1\}$;
      $rules_{t,2}((p, p)) \leftarrow \{s_i s_j \to t_i t_j,\ s_i \to t_j,\ s_j \to t_j \mid i = 0, 1, \ldots, n - 1,\ i < j = bc_t(i)\}$;
      $rules_{t,3}((p, p)) \leftarrow \{t_i \to a_i \mid i = 0, 1, \ldots, n - 1\}$;
    SimOM $\leftarrow$ SimOM $\cdot$ [Id, $rules_{t,1}$] $\cdot$ [Id, $rules_{t,2}$] $\cdot$ [Id, $rules_{t,3}$];
Algorithm 4: Generation of the sequence SimOM which simulates the omega
network of size $n$.
We can now give the algorithm which generates the whole sequence Rµ guiding
the computation.
  $R_\mu \leftarrow \lambda$;
  for $s \leftarrow 1$ to $\lceil k/m \rceil - 1$ do
    $R_\mu \leftarrow R_\mu \cdot$ SimOM;
    for $j \leftarrow 0$ to $P - 1$ do
      $R_\mu \leftarrow R_\mu \cdot [C^j_s, rules_{C^j_s}]$;
    $R_\mu \leftarrow R_\mu \cdot$ [Id, rules-endcomm];
  $R_\mu \leftarrow R_\mu \cdot$ SimOM;
Algorithm 5: Generation of the sequence $R_\mu$ which guides the computation.
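The shape of the resulting sequence is easy to inspect in code. The Python sketch below assembles $R_\mu$ as a list of (graph, rules) labels only; it does not build the rules themselves. The names `sim_om` and `r_mu` and the string labels are ours, and the final SimOM is placed after the loop, which is how we read the indentation of Algorithm 5.

```python
def sim_om(m):
    """Labels of the 3 * m pairs of SimOM (three rule sets per step t)."""
    seq = []
    for t in range(1, m + 1):
        seq += [("Id", f"rules_{t},1"), ("Id", f"rules_{t},2"), ("Id", f"rules_{t},3")]
    return seq


def r_mu(k, m):
    """Labels of the pairs of R_mu for N = 2^k keys and n = 2^m values per membrane."""
    P = 1 << (k - m)
    layouts = -(-k // m)                        # ceil(k / m)
    seq = []
    for s in range(1, layouts):
        seq += sim_om(m)                        # local computation in layout s - 1
        seq += [(f"C^{j}_{s}", f"rules_C^{j}_{s}") for j in range(P)]
        seq.append(("Id", "rules-endcomm"))     # rewrite a'_t back to a_t
    seq += sim_om(m)                            # local computation in the last layout
    return seq


print(len(sim_om(2)), len(r_mu(5, 2)))
```

For k = 5 and m = 2 there are three layouts, so the sequence contains three copies of SimOM (6 pairs each) and two remaps of P + 1 = 9 pairs each.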
5.1 Computation complexity
Observe that the length of the sequence SimOM is $3 \log n$. As we have $\frac{\log N}{\log n}$ data
layouts, and in each data layout $3 \log n$ steps are needed for SimOM and
another $P + 1$ steps are needed for communication, the length of $R_\mu$ is, up to
lower-order terms, $3 \log N + \frac{N \log N}{n \log n}$. A sorting network can be obtained by a serial
connection of $\log N$ omega networks, hence our model can sort in time
$O(\log^2 N + \frac{N \log^2 N}{n \log n})$. Note that when
$n = N$ all computation is local, and the complexity is the best possible, $O(\log^2 N)$.
When $n = 2$ the complexity increases to $O(N \log^2 N)$.
References
1. A. Aggarwal, A.K. Chandra, M. Snir, “Communication Complexity of PRAMs”,
Theoretical Computer Science, vol. 71, no.1, pp. 3 - 28, Mar. 1990.
2. M. Ajtai, J. Komlos, and E. Szemeredi, “An O(N logN) Sorting Network”, Proc.
15th Ann. ACM Symp. Theory of Computing, pp. 1-9, May 1983.
3. A. Alexandrov, M. Ionescu, K.E. Schauser, C. Scheiman, “LogGP: Incorporating
Long Messages into the LogP model”, Journal of parallel and distributed computing,
vol. 44, no. 1, pp. 71-79, 1997.
4. A. Alhazov, D. Sburlan, “Static Sorting P Systems”, Chapter 8 in Applications of
Membrane Computing, (G. Ciobanu, Gh. Păun, M.J. Pérez-Jiménez Eds.), Springer,
2005.
5. K.E. Batcher, “Sorting networks and their applications”, Proc. AFIPS Spring Joint
Comput. Conf., vol. 32, pp. 307-314, Apr. 1968.
6. G. Bilardi, “Merging and Sorting Networks with the Topology of the Omega Net-
work”, IEEE Transactions on Computers, vol. 38, no. 10, pp. 1396-1403, Oct. 1989.
7. R. Ceterchi, C. Martín-Vide, “Dynamic P Systems”, LNCS, vol. 2597, pp. 146 - 186,
2003.
8. R. Ceterchi, M.J. Pérez-Jiménez, “On two-dimensional mesh networks and their
simulation with P systems”, LNCS, vol. 3365, pp. 259-277, 2005.
9. R. Ceterchi, M.J. Pérez-Jiménez, A.I. Tomescu, “Simulating the Bitonic Sort Using P
Systems”, G. Eleftherakis et al. (Eds.): WMC8 2007, LNCS, vol. 4860, pp. 172-192,
2007.
10. D.E. Culler, R.M. Karp, D.A. Patterson, A. Sahay, K.E. Schauser, E. Santos, R.
Subramonian, and T. von Eicken, “LogP: Towards a Realistic Model of Parallel
Computation”, Proc. Fourth ACM SIGPLAN Symposium on Principles and Practice
of Parallel Programming, pp.1-12, May 1993.
11. M. Dowd, Y. Perl, M. Saks, L. Rudolph, “The balanced sorting network”, Proc.
Second annual ACM symp. on Principles of distributed computing, pp. 161-172, 1983.
12. M. Dowd, Y. Perl, M. Saks, L. Rudolph, “The periodic balanced sorting network”,
JACM, vol. 36. no. 4, pp. 738-757, 1989.
13. M.F. Ionescu, “Optimizing Parallel Bitonic Sort”, Tech. Report TRCS96-14, Dept.
of Comp. Sci., Univ. of California, Santa Barbara, July 1996.
14. M.F. Ionescu, K.E. Schauser, “Optimizing parallel bitonic sort”, Proc. 11th Int’l
Parallel Processing Symp., pp. 303-309, 1997.
15. D.E. Knuth, The art of computer programming, volume 3: sorting and searching,
second ed. Redwood City, CA: Addison Wesley Longman, 1998.
16. C. Kruskal, L. Rudolph, M. Snir. “A complexity theory of efficient parallel algo-
rithms”, Theoretical Computer Science, vol.71, no.1, pp. 95 - 132, Mar. 1990.
17. J.D. Lee, K.E. Batcher, “Minimizing Communication in the Bitonic Sort”, IEEE
Trans. on Parallel and Distributed Systems, vol. 11, no. 5, pp. 459-474, May 2000.
18. F. Leighton, “Tight Bounds on the Complexity of Parallel Sorting,” IEEE Trans.
Computers, vol. 34, no. 4, pp. 344-354, Apr. 1985.
19. M.S. Paterson, “Improved Sorting Networks with O(logN) Depth,” Algorithmica,
vol. 5, pp. 75-92, 1990.
20. A.I. Tomescu, “Optimal Data Layouts for Omega Networks”, manuscript.
