Shape reconstruction using instruction systolic array by Partheepan Kandaswamy (1254243) et al.
Shape Reconstruction using Instruction Systolic Array
Partheepan Kandaswamy, James A Flint, Vassilios A Choularas 
Wolfson School of Mechanical, Electrical and Manufacturing Engineering 
Loughborough University 
Loughborough, LE11 3TU 
UK 
p.kandaswamy@lboro.ac.uk 
 
 
Abstract— This paper describes a novel, 2D mesh 
architecture prototype based on the Instruction Systolic 
Array (ISA) paradigm for distributed computing on 
fabrics. We discuss a real-time shape sensing and 
reconstruction application executing on this architecture 
and demonstrate a physical design for a wearable system 
based on the ISA concept, constructed from off-the-shelf 
microcontrollers and sensors. Results demonstrate the 
application executes in 39 ms on our prototype ISA 
implementation thus confirming the viability of the 
proposed architecture for fabric-resident computing 
devices. 
 
Keywords—Instruction Systolic Array; distributed sensor 
networks; System on fabrics; wearable electronics; fabric-resident 
computing device. 
I. INTRODUCTION  
The Instruction Systolic Array (ISA) builds on the systolic 
array concept introduced by Kung and Leiserson in 1978 [1]. 
An ISA is a network composed of a large number of identical, 
locally connected elementary processing elements (P). It is a 
parallel computer model and an architectural concept well 
suited to systems that need high bandwidth, and it offers 
architectural benefits for wearable applications. To 
demonstrate the on-fabric ISA concept, we chose the shape 
reconstruction application of Hermanis et al. [2]. Their 
method is based on three-axis acceleration and magnetic sensor 
nodes that are embedded into the fabric and measure local 
orientation. Running the shape reconstruction algorithm on 
the ISA allows the global shape to be computed quickly from 
the local orientation measurements of a number of sensors. 
II. BACKGROUND  
The ISA is widely used as an architectural concept in 
very-large-scale integration (VLSI) technology [3] [4]. It is 
more flexible than data systolic arrays, which are chiefly 
regarded as special-purpose architectures. The ISA relies on 
short interconnections for both data communication and 
control communication. Its key properties are local 
communication for data and control flow, modularity and 
scalability, local data handling and a logical mapping. 
 
In an ISA, instructions and selector bits, rather than data, 
are pumped systolically through a processor array. This 
arrangement allows different algorithms to be executed on the 
same processor array. The instruction stream and the selector 
bit stream are combined during execution, as shown in Fig. 1: 
a processing element executes the instruction it currently 
holds only if its selector bit is 1. The fundamental model is 
a mesh-connected n × n array of identical processors, each 
capable of executing instructions from a small instruction 
set. Instructions and selector bits together control the 
processing elements. Instructions flow from the top to the 
bottom of the array (north to south), whereas selector bits 
flow from left to right (west to east). 
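
The interplay of the two streams can be illustrated with a 
small simulation. The sketch below is not taken from the 
prototype: it assumes a synchronous array with zero-based 
indices, in which an instruction fed into column j at step k 
reaches row i at step k + i, and a selector bit fed into row i 
at step k reaches column j at step k + j. The function name 
simulate_isa and the "NOP" marker are hypothetical. 

```python
def simulate_isa(instructions, selectors, n):
    """Trace which processing elements execute which instruction.

    instructions[k][j]: instruction fed into column j at step k (north edge).
    selectors[i][k]:    selector bit fed into row i at step k (west edge).
    Returns a list of (step, row, column, instruction) tuples.
    Assumes both streams contain the same number of steps.
    """
    steps = len(instructions)
    trace = []
    # The last word needs (n - 1) extra steps to reach the far corner.
    for t in range(steps + n - 1):
        for i in range(n):
            for j in range(n):
                k_instr = t - i  # the instruction has moved i rows south
                k_sel = t - j    # the selector bit has moved j columns east
                if not (0 <= k_instr < steps and 0 <= k_sel < steps):
                    continue  # nothing has arrived yet, or a stream has ended
                instr = instructions[k_instr][j]
                if selectors[i][k_sel] == 1 and instr != "NOP":
                    trace.append((t, i, j, instr))
    return trace


# Hypothetical 2 x 2 example: two instruction rows, one masked selector bit.
program = [["A", "A"], ["B", "B"]]
mask = [[1, 1], [1, 0]]
for event in simulate_isa(program, mask, 2):
    print(event)
```

Because the two streams are skewed differently for each 
element, a finite program reaches off-diagonal elements only 
partially unless it is padded, which is one reason for flushing 
no-operation words before and after the useful instructions. 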
 
 
Fig 1: Execution of an ISA 
III. NOVEL ARCHITECTURE FOR ON-FABRIC PARALLEL 
PROCESSING 
The processing elements are arranged in a systolic manner 
as shown in Fig. 2. Each processing element is connected to 
its neighbouring processing elements and is also closely 
coupled to the sensors. Because wiring a suitably sized sensor 
array is likely to require a large number of physical 
interconnections, a compromise was made: the I2C bus was 
selected and is shared between the ISA inter-element 
connections and the sensors. Each processing element has 
four I2C buses, one per direction. The north and west ports act 
as slaves and the east and south ports act as masters. 
The northern boundary of the array is connected to the 
instruction stream flow controller (the external host), which 
stores the array of instructions to be passed to the 
processing elements. The western boundary of the array is 
connected to the selector bit flow controller (also the 
external host), which stores the array of selector bits to be 
passed to the processing elements. The processing elements 
and the sensors are closely coupled, as shown in Fig. 2, so 
that processing can be done locally.  
 
Fig 2: System Architecture-Processor Array (P-Processing element, S-
Sensor) 
IV. SHAPE RECONSTRUCTION ALGORITHM 
Acceleration/magnetic sensors provide only two vector 
observations, which is the minimum needed for full orientation 
determination, so no real minimisation problem can be defined. 
Hermanis et al. therefore proposed a triad-based computation 
that is fast, singularity free and computationally simple. 
They constructed two triads of orthonormal unit vectors, one 
formed from the general reference frame and the other from the 
sensor reference frame. The equations of the triads are given 
in [2]. These triads are then used to form the global earth 
reference matrix and the sensor measurement matrix. From these 
the authors compute the rotation matrix, which describes the 
sensor orientation relative to the global reference frame. The 
rotation matrix is then used to calculate the orientation of 
each surface segment relative to its initial position, which 
corresponds to the sensor orientation relative to the earth 
reference frame. 
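
A minimal sketch of a triad-based orientation estimate is 
given below. It follows the textbook TRIAD construction rather 
than the exact equations of [2], whose axis conventions may 
differ. The function names and the example reference vectors 
are assumptions made for illustration only. 

```python
import math


def normalize(v):
    """Scale a 3-vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cross(a, b):
    """Cross product of two 3-vectors."""
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def triad(v1, v2):
    """Build three orthonormal vectors from two non-parallel observations."""
    t1 = normalize(v1)
    t2 = normalize(cross(v1, v2))
    t3 = cross(t1, t2)
    return [t1, t2, t3]


def rotation_from_triads(ref, obs):
    """Return R such that R applied to obs[k] gives ref[k] for each k."""
    return [[sum(ref[k][i] * obs[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def estimate_rotation(acc_ref, mag_ref, acc_obs, mag_obs):
    """Sensor-to-reference rotation from acceleration and magnetic vectors."""
    return rotation_from_triads(triad(acc_ref, mag_ref),
                                triad(acc_obs, mag_obs))


# Assumed reference vectors (gravity along z, magnetic field in the x-z plane)
# and the readings of a sensor turned 90 degrees about the vertical axis.
rotation = estimate_rotation([0.0, 0.0, 1.0], [1.0, 0.0, 0.5],
                             [0.0, 0.0, 1.0], [0.0, -1.0, 0.5])
for row in rotation:
    print(row)
```

Only cross products, normalisations and one 3 × 3 matrix 
product are needed, which suits a small microcontroller without 
an iterative solver. 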
Acceleration and magnetic sensor nodes are arranged in a 
regular grid along the surface. The model of the surface is 
divided into n rigid segments, each described by four 
direction vectors. From the segment structure it can be 
deduced that, if the location of a single control point is 
known, any other control point on the same segment row or 
column can be calculated by adding or subtracting the 
corresponding segment direction vectors. 
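
The propagation of control points along a row or column can 
be sketched as follows. This is an illustrative assumption 
about the data layout, not the implementation used on the 
prototype: each segment contributes one direction vector, 
obtained by rotating a rest-pose edge vector with the 
segment's rotation matrix, and the helper names and the 8.5 
unit edge length are hypothetical. 

```python
def mat_vec(rotation, vector):
    """Multiply a 3x3 matrix (list of rows) by a 3-vector."""
    return [sum(rotation[i][k] * vector[k] for k in range(3))
            for i in range(3)]


def segment_direction(rotation, rest_direction):
    """Direction vector of a segment edge after rotation from its rest pose."""
    return mat_vec(rotation, rest_direction)


def control_points_along(start, directions):
    """Accumulate control points from a known start point.

    Pass negated direction vectors to walk in the opposite sense,
    which corresponds to subtracting instead of adding.
    """
    points = [list(start)]
    for d in directions:
        last = points[-1]
        points.append([last[k] + d[k] for k in range(3)])
    return points


# Two segments in a row: the first lies flat, the second is turned
# 90 degrees about the y axis (assumed example values).
identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
turn_y = [[0, 0, 1], [0, 1, 0], [-1, 0, 0]]
edge = [8.5, 0.0, 0.0]
directions = [segment_direction(identity, edge),
              segment_direction(turn_y, edge)]
print(control_points_along([0.0, 0.0, 0.0], directions))
```

Each step needs only the previous control point and one 
direction vector, so the computation maps naturally onto 
neighbouring processing elements that exchange direction 
vectors. 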
V. EXPERIMENTAL SETUP 
A prototype of the ISA concept was built using 
commercially available ARM Cortex-M0+ LPC824 
microcontrollers as the processing elements and as the 
instruction and selector bit flow controllers, as shown in 
Fig. 3. The aims were to demonstrate the feasibility of our 
system, validate its applicability to a wearable-computing 
application, estimate its performance on that application and 
propose ways of improving it. 
To implement the surface reconstruction application using the 
ISA, a sensor network with 16 sensors was embedded in a 
piece of fabric. Inter-node communication is via I2C, which 
allows the sensors to be connected using four wires. A sensor 
node built with the LSM303DLHC acceleration/magnetic sensor 
is shown in Fig. 3. The LSM303DLHC is used for orientation 
estimation as described in Section IV. Each sensor is 
connected to its own microcontroller, as shown in the system 
architecture of Fig. 2. The microcontroller serves as the 
interface between the sensor node and the host computer, and 
all the computations take place locally in the 
microcontrollers. In a full implementation the umbilical shown 
in Fig. 3 will not be needed. Each microcontroller is assigned 
a unique ID that identifies its position in the grid and is 
used when calculating the control points. Once the 
microcontroller has received its ID, it starts to receive 
orientation data from the sensor. The orientation data is then 
averaged and stored for the calculation of directional 
vectors. These directional vectors are shared between 
neighbouring microcontrollers for the calculation of control 
points. Once the control points have been calculated for each 
sensor, they are sent to the host computer via a serial port 
for 3D visualisation of the sensed object. The cycle of the 
ISA computing the control points and the host drawing the 
visualisation repeats indefinitely. 
 
Fig 3: Prototype board (left) and Sensors embedded in fabric (cm-centimeter) 
(right) 
VI. PROGRAMMING THE ISA 
The ISA application is programmed onto the chosen 
microcontrollers, all of which are flashed with the same 
program. Each microcontroller is aware of its position in the 
array; at the array boundary, sending of instructions and 
selector bits is disabled because the end of the array has 
been reached. The operation of the microcontroller program is 
illustrated in Fig. 4. Each processing element waits in 
listening mode for the frame bytes that carry the instruction 
and the selector bit to arrive on its north and west slave I2C 
ports. Once both the instruction and the selector bit have 
been received, they are decoded and executed in an interrupt 
service routine. After execution, the instruction and the 
selector bit are forwarded to the neighbours through the south 
and east master I2C ports. The diagonal of instructions and 
corresponding selector bits used to implement the shape 
reconstruction application is shown in Fig. 4. A set of 
no-operation instructions is flushed through the array before 
and after the instruction and selector bit diagonal. 

[Fig. 2 labels: instruction controller (instruction stream 
flow); selector bit controller (selector bit flow); processing 
elements P1,1 to P4,4; sensors S1,1 to S4,4.] 
[Fig. 3 labels: ARM Cortex-M0+ LPC824 microcontrollers; 
umbilical; LSM303DLHC sensor mounted on an Adafruit board; 
sensor stitched to fabric; patches to cover wiring; 
overlapping and stitched on the edges; dimensions 35 cm and 
8.5 cm.] 
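
The receive, execute and forward cycle of a single processing 
element can be sketched as below. This is a host-side model 
written for illustration, not the firmware of the prototype: 
the class name, the handler table and the "NOP" marker are 
hypothetical, and the I2C transfers are replaced by ordinary 
function arguments and return values. 

```python
NOP = "NOP"


class ProcessingElement:
    """Model of one array node at zero-based position (row, col)."""

    def __init__(self, row, col, n, handlers):
        self.row = row
        self.col = col
        self.n = n                # the array is n x n
        self.handlers = handlers  # maps instruction name to a callable
        self.executed = []        # record of instructions actually run

    def on_frame(self, instruction, selector_bit):
        """Handle one instruction (north port) and selector bit (west port).

        Returns (south_frame, east_frame). An entry is None at the array
        boundary, where forwarding is disabled.
        """
        # Execute only when the selector bit enables this element.
        if selector_bit == 1 and instruction != NOP:
            self.handlers[instruction](self)
            self.executed.append(instruction)
        # Forward regardless of whether the instruction was executed.
        south = instruction if self.row < self.n - 1 else None
        east = selector_bit if self.col < self.n - 1 else None
        return south, east


# Element in the first row and last column of a 4 x 4 array.
handlers = {"ID": lambda pe: None, "Rx": lambda pe: None}
pe = ProcessingElement(0, 3, 4, handlers)
print(pe.on_frame("ID", 1))   # executed, forwarded south only
print(pe.on_frame("Rx", 0))   # masked, still forwarded south
print(pe.executed)
```

Keeping the forwarding step independent of the selector bit 
means a masked element still passes the streams on, so the 
elements further south and east receive their words on 
schedule. 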
 
Fig 4: A diagonal of instruction and corresponding selector bit for the shape 
reconstruction application and the illustration of the instructions 
VII. RESULTS 
A network of 16 sensor nodes was experimentally tested. 
Sensors were arranged on a 4 × 4 grid sewn onto a layer of 
fabric, with a spacing of 8.5 cm both lengthways and across, 
as shown in Fig. 3.  
 
To evaluate the accuracy of the proposed shape sensing method, 
two experiments were performed. The first involved wrapping 
the fabric around a vertically placed cylindrical object, as 
shown in Fig. 5(a), and then reconstructing its shape; the 
reconstructed shape is shown in Fig. 5(b). 
 
      
Fig 5: (a) Fabric wrapped on a vertical object (b) Reconstructed shape 
 
Fig. 6 plots time against the instruction/selector bit cycle. 
It shows the time difference between an instruction being 
received and the selector bit being sent on between the 
processing elements. A few instructions take longer to execute 
because of their implementation complexity. For example, the 
6th instruction on P(1,2) and the 7th instruction on P(2,3), 
each of which is a sensor read, take a few milliseconds to 
read from the sensor and to average and store the result in a 
register for further computation. Delays also occur because of 
I2C bus communication.  
Fig 6:  Performance analysis for P(1,2) and P(2,3) 
VIII. CONCLUSION 
The wearable shape reconstruction application has been 
successfully implemented on our proposed ISA architecture, 
constructed from off-the-shelf microcontrollers and sensors. 
Results demonstrate that the application executes in 39 ms on 
our prototype ISA implementation, confirming the viability of 
the proposed architecture for fabric-resident computing 
devices. Future work will focus on extending the supported 
instructions, optimising the I2C medium and allowing more 
concurrency, at node level, between computation and 
communication. 
REFERENCES 
[1] H. T. Kung, C. E. Leiserson, Systolic Arrays (for VLSI), Technical 
Report CS 79-103, Carnegie Mellon University, 1978. 
[2] A. Hermanis, R. Cacurs, M. Greitans, “Acceleration and magnetic 
sensor network for shape sensing,” IEEE Sensors Journal, vol. 16, no. 5, 
pp. 1271–1280, March 2016. 
[3] M. Kunde, H.W. Lang, M. Schimmler, H. Schmeck, and H. Schroder, 
“The instruction systolic array and its relation to other models of parallel 
computers,” Parallel Computing, vol. 7, pp. 25–39, 1988. 
[4] H.W. Lang, “The instruction systolic array - a parallel architecture for 
VLSI,” Integration, vol. 4, pp. 65–74,  1986. 
 
 
 
 
[Fig. 6 plot. Horizontal axis: Instruction/Selector Bit Cycle (0 to 22). 
Vertical axis: Time (ms) (0 to 40). Series: Select Bit Sent (P1,2), 
Instruction Received (P1,2), Select Bit Sent (P2,3), Instruction 
Received (P2,3).]
              
 
[Fig. 4 content]
Instruction legend:
ID : Assign identification number to controller
Rx : Receive data from sensor
↑↓ : Swap data between North-South ports
←→ : Swap data between East-West ports
→ : Send data to east port
Tx : Send control points to host computer
◌ : No operation
The legend also lists instructions for calculation of the directional 
vector, sending data to the west port and calculation of control points.
Instruction stream: cycles 1 to 23 across four columns, arranged as a 
diagonal that begins with ID and Rx and ends with Tx, with no-operation 
padding before and after.
Selector bits (cycles 23 down to 1, left to right):
23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
