Special purpose parallel computer architecture for real-time control and simulation in robotic applications by Bejczy, Antal K. & Fijany, Amir
United States Patent [19]
US005218709A
[11] Patent Number: 5,218,709
Fijany et al. [45] Date of Patent: Jun. 8, 1993 
SPECIAL PURPOSE PARALLEL COMPUTER ARCHITECTURE FOR REAL-TIME CONTROL AND SIMULATION IN ROBOTIC APPLICATIONS
Inventors: Amir Fijany, Sherman Oaks; Antal K. Bejczy, Pasadena, both of Calif.
Assignee: The United States of America as represented by the Administrator of the National Aeronautics and Space Administration, Washington, D.C.
Appl. NO.: 458JSO 
Filed: Dec. 28, 1989
Int. Cl.5: G06F 9/00; G06F 15/16
U.S. Cl.: 395/800; 364/231.9; 364/DIG. 1
Field of Search: 395/800, 375, 650; 364/736
References Cited 
U.S. PATENT DOCUMENTS 
3,753,238 8/1973 Tutelman ................ 340/172.5
4,441,152 4/1984 Matsuura et al. ......... 364/200
4,467,436 8/1984 Chance et al.
4,470,114 9/1984 Gerhold
4,574,345 3/1986 Konesky ................. 364/200
4,574,394 3/1986 Hiksztynski et al. ...... 382/41
4,591,981 5/1986 Kassabov ................ 364/200
4,633,392 12/1986 Vincent et al. ......... 364/200
4,684,862 8/1987 Rohrle .................. 318/568
4,736,291 4/1988 Jmnin ................... 364/200
4,814,973 3/1989 Hillis .................. 395/800
4,873,626 10/1989 Gifford ................ 395/325
4,891,787 1/1990 Gifford ................. 395/375
5,016,163 5/1991 Jesshope et al. ......... 395/800
5,050,065 9/1991 Dartois et al. .......... 395/650
OTHER PUBLICATIONS 
A. Fijany et al., “Parallel Algorithms & Arch. for Manipulator Inverse Dynamics” Advan. Robotics: 1989 Proc. of the 14th Inter. Con. on Adv. Rob., Columbus, Ohio, Jun. 13-15, 1989, pp. 202-233.
Primary Examiner: Parshotam S. Lall
Assistant Examiner: Ayni Mohamed
Attorney, Agent, or Firm: John H. Kusmiss; Thomas H. Jones; Guy M. Miller
[57] ABSTRACT
A Real-time Robotic Controller and Simulator (RRCS) with an MIMD-SIMD parallel architecture for interfacing with an external host computer provides a high degree of parallelism in computation for robotics control and simulation. A host processor receives instructions from, and transmits answers to, the external host computer. A plurality of SIMD microprocessors, each being an SIMD parallel processor, are capable of exploiting fine-grain parallelism and are able to operate asynchronously to form an MIMD architecture. Each SIMD processor comprises an SIMD architecture capable of performing two matrix-vector operations in parallel while fully exploiting parallelism in each operation. A system bus connects the host processor to the plurality of SIMD microprocessors, and a common clock provides a continuous sequence of clock pulses. A ring structure interconnects the plurality of SIMD microprocessors, is connected to the clock for providing clock pulses to the SIMD microprocessors, and provides a path for the flow of data and instructions between the SIMD microprocessors. The host processor includes logic for controlling the RRCS by interpreting instructions sent by the external host computer, decomposing the instructions into a series of computations to be performed by the SIMD microprocessors, using the system bus to distribute associated data among the SIMD microprocessors, and initiating activity of the SIMD microprocessors to perform the computations on the data by procedure call.
14 Claims, 2 Drawing Sheets
https://ntrs.nasa.gov/search.jsp?R=19930020419 2020-03-24T07:32:46+00:00Z
[Drawings: U.S. Patent 5,218,709, June 8, 1993, Sheets 1 and 2 of 2]
ORIGIN OF THE INVENTION
The invention described herein was made in the performance of work under a NASA contract, and is subject to the provisions of Public Law 96-517 (35 USC 202) in which the Contractor has elected not to retain title.
TECHNICAL FIELD
The invention relates to computer architectures and, more particularly, to a Real-time Robotic Controller and Simulator (RRCS) which is a MIMD-SIMD parallel architecture for interfacing with an external host computer and for providing a high degree of parallelism in computations for robotic control and simulation in response to instructions from the external host computer, comprising: host processor means for receiving instructions from the external host computer and for transmitting answers to the external host computer; a plurality of SIMD micro-processors, each SIMD processor being a SIMD parallel processor capable of exploiting fine grain parallelism, the SIMD processors further being able to operate asynchronously to form a MIMD architecture, each SIMD processor comprising a SIMD architecture capable of performing two matrix-vector operations in parallel while fully exploiting parallelism in each operation, each SIMD micro-processor comprising,
a single control unit including a program counter, a program memory, pipeline registers, and control and timing logic, the pipeline registers including provisions to receive instruction portions corresponding to an address field, an instruction field, and a control field, respectively, of instructions to be executed,
a control bus,
an address bus,
a data bus,
an instruction bus,
a memory,
a plurality of processor elements,
host interface means for interfacing between the host processor means, the control unit, the control bus, the address bus, the data bus, the instruction bus, and the memory,
means for obtaining the sequence of clock pulses from the ring structure means and for connecting it to the control unit and each of the processor elements, and
neighbor interface means for allowing each SIMD processor to communicate with the next adjacent SIMD processors on its “right” and “left” along the ring structure means;
a system bus connecting the host processor means to the plurality of SIMD micro-processors; a common clock providing a continuous sequence of clock pulses; and, ring structure means interconnecting the plurality of SIMD micro-processors and connected to the clock for providing the clock pulses to the SIMD micro-processors and for providing a path for the flow of data and instructions between the SIMD micro-processors; and wherein, the host processor means further comprises means for controlling the RRCS by interpreting instructions sent by the external host computer, decomposing the instructions into a series of computations to be performed by the SIMD micro-processors, using the system bus to distribute associated data among the SIMD micro-processors, and initiating activity of the SIMD micro-processors to perform the computations on the data by procedure call, the basic synchronization mechanism of the RRCS being data driven but the host processor means further comprising means for causing clock based synchronization among the SIMD processors.
In the preferred embodiment, there are six processor elements, and the control and timing logic includes logic for causing the processor elements to alternatively perform six independent primitive operations in parallel or form two groups of three processor elements to perform two basic (matrix-vector) operations in parallel. Also, the control and timing logic includes logic for causing all six processor elements to form a single group to exploit parallelism in matrix multiplication. Further, in computing vector addition or scalar-vector multiplication by a group of the processor elements, the control and timing logic includes logic for causing parallelism in the operation to be exploited by performing an independent addition or multiplication by each of the processor elements while the operation of the three processor elements within the same group is synchronized. Moreover, in computing matrix-vector multiplication or vector-cross-product by a group of three processor elements, the control and timing logic includes logic for causing parallelism to be exploited by the performing of three independent vector-dot products, wherein the group of processor elements is distinguished by global synchronization and sharing of common data.
BACKGROUND ART
In the field of computers used for real-time control and simulation, there are often unique problems. This is particularly true in the field of robotics wherein the movement of multijointed arms, and the like, creates a computationally intensive environment. Inadequate computing power has always been the major obstacle in real-time implementation of advanced robotic schemes, due to the computational cost of the evaluation of required kinematic and dynamic models. Dynamic simulation of the robot arm requires even more computing power than does control. The problem becomes more difficult for direct-drive arms, representing even faster dynamics, and for redundant and multiple arms, which involve more degrees of freedom (DOF). Thus, it is widely recognized that parallel computing is the key solution for achieving the required computing power for real-time control and simulation.
The parallel operation of digital computers is certainly not a new concept. So-called multi-processing has been employed in various computationally intensive environments for a long time. Likewise, it is well known to employ a large number of computing nodes linked on a token passing ring, or the like, to solve major computational problems through a technique called distributed processing wherein one node is in charge of distributing the workload among the various other nodes. Large commercial networks on the order of several hundred computers on a common ring have been employed in the off-hours, for example, to perform “ray tracing” graphics generation. When so employed, the network of smaller work stations or nodes can have a combined computing power equal to one of the multimillion dollar super computers. It is to be understood (and is well known to those skilled in the art) that the foregoing examples of multi-processing and distributed processing
employ a number of stand-alone computers which accomplish their common task through the sending and receiving of messages between the computers which assign tasks, pass results, etc.
In investigating the problems particularly unique to advanced robotics, the inventors herein quickly demonstrated that for kinematic and dynamic problems, particularly those required for real-time control, a pipelined architecture of digital computers cannot reduce the computation time; that is, for these problems, concurrency can only be achieved by exploiting parallelism. There are, however, several problems in exploiting parallelism in this particular application which must be considered. Studies on the matter resulted in the conclusion that there is a high degree of parallelism inherent in these particular computational problems. The difficulty in exploiting this parallelism results from the fact that it exists in different forms and at different levels in the computations. Attempts at exploiting this inherent parallelism with conventional parallel architectures have failed since these architectures are capable of exploiting only one type of parallelism; namely, either Single Instruction-Multiple Data (SIMD) or Multiple Instruction-Multiple Data (MIMD).
Some thought has been given by others skilled in the art to providing a parallel computer architecture which is particularly suitable for robotic uses. In this regard, for example, there is the 1984 U.S. Pat. No. 4,467,436 of Chance et al. entitled ROBOT ARM CONTROLLER WITH COMMON BUS MEMORY. A very similar architecture is shown in the 1986 patent of Konesky (U.S. Pat. No. 4,574,345), which is not stated to be specifically for robotic use. Neither addresses the problem of optimizing a computational environment where both SIMD and MIMD parallelism exist in the same problem.
There are two important features of such control computation problems that should be considered simultaneously: the asymptotic computation complexity and the size of the problem (i.e. the degrees of freedom involved). The computation complexity of almost all of these problems is of order O(n), where n represents the number of degrees of freedom. The inventors herein have shown that these problems all belong to a common class; that is, there are existing parallel algorithms with asymptotic computation complexity O(log2(n)) for solving these problems. However, this parallelism is coarse grained and, because n is small, leads to a rather small speed-up even for highly redundant robot arms. These observations imply that for these problems the following constraints apply:
a) Reducing the coefficients of the polynomial complexity is more important than reducing the asymptotic complexity, and
b) Parallelism is best exploited if the architecture is capable of employing the features of both MIMD and SIMD types of parallel processing.
At the lowest level, parallelism exists in matrix and vector operations; this parallelism is at least as significant as (and, for some problems, more significant than) logarithmic parallelism. The difficulty in exploiting this type of parallelism results from the small dimensions of the matrices and vectors. Hence, unlike most other scientific computations, matrix-vector operations related to robotic simulation and control cannot be performed efficiently by classical array processors such as pipeline processors or systolic arrays.
STATEMENT OF THE INVENTION
Accordingly, it is an object of this invention to provide a hierarchical approach for mapping robotic real-time simulation and control problems onto an architecture capable of exploiting parallelism using both SIMD and MIMD processing.
It is another object of this invention to provide an architecture capable of exploiting parallelism using both SIMD and MIMD processing for use with robotic real-time simulation and control problems and employing a new class of parallel algorithms.
Other objects and benefits of this invention will become apparent from the detailed description which follows hereinafter when taken in conjunction with the drawing figures which accompany it.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is a simplified functional block diagram of the Real-time Robotic Controller and Simulator of this invention.
FIG. 2 is a functional block diagram of one of the SIMD micro-processors employed in the RRCS of this invention.
DETAILED DESCRIPTION OF THE INVENTION
The Real-time Robotic Controller and Simulator (RRCS) of this invention is shown in functional block diagram form in FIG. 1 where it is generally indicated as 10. Unlike known prior art parallel computer architectures which are designed as either a MIMD or a SIMD architecture, the principal point of novelty of the specialized computer architecture of the RRCS 10 of this invention is that it is designed to be a combined MIMD-SIMD architecture. The RRCS 10 employs a plurality of individual SIMD processors 12 (labelled “1” through “n” in the drawing figure), capable in concert of MIMD parallel processing. Each SIMD processor 12 uses a SIMD architecture (to be defined in detail hereinafter) capable of performing two matrix-vector operations in parallel while fully exploiting parallelism in each operation. In this sense, the architecture of each SIMD processor 12 is optimized for the size of the matrices and vectors involved in the robotics problems of particular interest. Furthermore, the design of the RRCS 10 allows the full exploitation of parallelism at several levels in the computation, while minimizing overhead.
For purposes of interfacing to the outside world, the RRCS 10 in its embodiment as depicted in FIG. 1 is configured to appear as a processor which can be interfaced to a host computer as a part of the bus memory. In addition to the SIMD processors 12, the RRCS 10 includes a host processor 14 which, as will be seen shortly, handles the interface with the external host computer 18, controls the activity of the SIMD processors 12, and performs the required input/output operations. As mentioned above, each SIMD processor 12 is a SIMD parallel processor capable of exploiting fine-grain parallelism; and, the SIMD processors 12 operate asynchronously and form a MIMD architecture. Therefore, from an architectural point of view, the RRCS 10 is a MIMD-SIMD parallel architecture.
The SIMD processors 12 are interconnected through a ring structure 16 of a type well known in the art which provides for a reliable clock distribution among the SIMD processors 12 from the single clock 28 as well as a high speed communications path among them. The perfect shuffle topology, which is required for exploiting logarithmic parallelism, is provided by message passing along the communications path of the ring structure 16, which reduces the overhead typically associated with message passing. The basic synchronization mechanism of the RRCS 10 is data driven; however, the fact that all the SIMD processors 12 are driven by the same clock, i.e. the clock 28, and the regularity of the computations allow clock based synchronization among the SIMD processors 12. This is a major point of deviation from the typical multi-processor computer architecture in which each processor or node is totally self-contained (i.e. includes its own asynchronous clock) and the only cohesive aspect of the system is the passing of messages between the nodes which, otherwise, operate independently on the common system tasks.
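The perfect shuffle topology mentioned above is what logarithmic parallelism relies on; the RRCS emulates it by message passing on the ring structure 16. As a minimal sketch only, assuming the number of participating nodes is a power of two and ignoring ring timing entirely, the Python below shows the perfect shuffle permutation (a one-bit left rotation of a node's index) and a shuffle-exchange sum that delivers the total to every node in log2(n) steps. The function names are hypothetical and are not taken from the patent.

```python
def perfect_shuffle(i: int, bits: int) -> int:
    """Perfect shuffle of node index i among 2**bits nodes: rotate its bits left by one."""
    msb = (i >> (bits - 1)) & 1
    return ((i << 1) & ((1 << bits) - 1)) | msb


def shuffle_exchange_sum(values):
    """Sum 2**k values in k shuffle-exchange steps; every node ends holding the total.

    Returns (per-node results, number of steps)."""
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("number of values must be a power of two")
    bits = n.bit_length() - 1
    data = list(values)
    for _ in range(bits):                      # log2(n) steps
        shuffled = [None] * n
        for i, v in enumerate(data):           # shuffle: node i sends to perfect_shuffle(i)
            shuffled[perfect_shuffle(i, bits)] = v
        # exchange: nodes whose indices differ only in the lowest bit add their values
        data = [shuffled[i] + shuffled[i ^ 1] for i in range(n)]
    return data, bits
```

For eight values, three shuffle-exchange steps suffice where a single processor would need seven sequential additions; for the small n typical of robot arms this gain is modest, which is consistent with the coarse-grain limitation noted in the background.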
In the RRCS 10, the host processor 14 controls the whole architecture by interpreting instructions sent by the external host computer 18 at the interface 20. These instructions are decomposed by the host processor 14 into a series of computations to be performed by the SIMD processors 12. The SIMD processors 12 can be considered as the memories of the host processor 14. Depending on the required computation in each case, the host processor 14 uses the system bus 22 to distribute the associated data among the SIMD processors 12 and initiate their activities by procedure call in a manner well known to those skilled in the art. The activity of the SIMD processors 12 is then carried out independently of the host processor 14. The end of the computation is indicated by the SIMD processors 12 to the host processor 14 (over the system bus 22), which then transfers the results to the external host computer 18 via interface 20. At first, it may appear that what has just been described is no different from any other multi-processor system in which one computer allocates tasks among the others, which then perform their portion of the task and provide their inputs to the central distribution point. In this regard, it must be remembered that once the host processor 14 distributes the data to the SIMD processors 12 and initiates their activity by procedure call, the SIMD processors 12 then perform “independently” of the host processor 14. This is not to say that, depending on the computation being performed, they do not work in concert. Certainly, because of the common clock 28 driving them, they can work synchronously and not always asynchronously, as is the case in the typical prior art multi-processor system. This novel aspect of the present invention will be returned to in more detail shortly.
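The host-processor control flow described above (decompose the instruction, distribute data over the system bus 22, start each SIMD processor 12 by procedure call, wait for its end-of-computation signal, return the results) can be pictured with a small Python sketch. Here threads stand in for the SIMD processors and a completed future stands in for the completion signal; the function names, the worker count, and the choice of matrix-vector jobs are assumptions for illustration, not the patent's host software. In CPython these threads illustrate the control flow only and do not execute the arithmetic truly in parallel.

```python
from concurrent.futures import ThreadPoolExecutor


def simd_matvec(matrix, vector):
    """Stand-in for the work of one SIMD processor: a small matrix-vector product."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def host_execute(jobs, num_simd_processors=4):
    """Hypothetical host-processor loop.

    jobs: the external instruction already decomposed into (matrix, vector) pairs.
    Returns the per-job results in the original order."""
    with ThreadPoolExecutor(max_workers=num_simd_processors) as pool:
        # Distribute each job's data and start a SIMD processor on it ("procedure call").
        futures = [pool.submit(simd_matvec, matrix, vector) for matrix, vector in jobs]
        # Waiting on each future plays the role of the end-of-computation signal.
        results = [future.result() for future in futures]
    # The collected results would be passed back to the external host computer.
    return results
```

For example, `host_execute([([[1, 0], [0, 1]], [3, 4]), ([[2, 0], [0, 2]], [1, 1])])` returns `[[3, 4], [2, 2]]`.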
Turning now to FIG. 2, the preferred architecture of a typical SIMD processor 12 of the present invention is shown in functional block diagram form. Each SIMD processor 12 comprises a single control unit 24 and six processor elements 26. As indicated in the figure, the clock signal from the common clock 28 is obtained from the ring structure 16 and connected to the control unit 24 as well as each of the processor elements 26. Each SIMD processor 12 also includes a host interface 30 which interfaces with the host processor 14 as well as the control unit 24, the various buses within the SIMD processor 12, and the memory 32 of the SIMD processor 12. The control unit 24 includes a program counter 34, a program memory 36, pipeline registers 38, and control and timing logic 40. The SIMD processor 12 is basically a micro-processor and, therefore, in the interest of simplicity and the avoidance of redundancy, those aspects of micro-processor operation which are well known to those skilled in the art will not be addressed in any detail herein; only those aspects which are unique to the SIMD processor 12 in particular and the RRCS in general will be addressed in any detail.
The pipeline registers 38 include provisions to receive instruction portions designated as “AF”, “IF”, and “CF”, as indicated in the drawing figure. These correspond to an address field, an instruction field, and a control field of each instruction to be executed, respectively. Each SIMD processor 12 includes a control bus 42, an address bus 44, a data bus 46, and an instruction bus 48. There are also neighbor interfaces 50 by means of which each SIMD processor 12 communicates with the other SIMD processors 12 on its “right” and “left” along the ring structure 16 in the usual manner. As will be understood by those skilled in the art from an inspection of the drawing of FIG. 2, the control and timing logic 40 can use the facilities as described above and connected as shown to control the operation of the various processor elements 26 in the manner now to be described.
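The split of each instruction into address (AF), instruction (IF), and control (CF) fields can be pictured as bit fields of a single microinstruction word. The patent elsewhere gives a 24-bit, horizontally decoded microcode word but no field widths, so the equal 8/8/8 layout and the `pack`/`unpack` helpers below are purely hypothetical.

```python
from typing import NamedTuple

# Hypothetical field widths: only the 24-bit total is given in the text.
AF_BITS, IF_BITS, CF_BITS = 8, 8, 8


class MicroInstruction(NamedTuple):
    af: int   # address field
    if_: int  # instruction field ("if" is a Python keyword)
    cf: int   # control field


def pack(mi: MicroInstruction) -> int:
    """Pack the three fields into one word, AF in the high bits and CF in the low bits."""
    for value, width in ((mi.af, AF_BITS), (mi.if_, IF_BITS), (mi.cf, CF_BITS)):
        if not 0 <= value < (1 << width):
            raise ValueError(f"field value {value} does not fit in {width} bits")
    return (mi.af << (IF_BITS + CF_BITS)) | (mi.if_ << CF_BITS) | mi.cf


def unpack(word: int) -> MicroInstruction:
    """Split a word back into its fields, as the pipeline registers would receive them."""
    cf = word & ((1 << CF_BITS) - 1)
    if_ = (word >> CF_BITS) & ((1 << IF_BITS) - 1)
    af = (word >> (IF_BITS + CF_BITS)) & ((1 << AF_BITS) - 1)
    return MicroInstruction(af, if_, cf)
```

With horizontal decoding, each field (and each bit within the control field) would drive hardware directly, so no further decode step appears in this sketch.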
Under the control of the control unit 24, the processor elements 26 can perform six independent primitive operations in parallel or form two groups of three processor elements 26 to perform two basic (matrix-vector) operations in parallel. For other computation problems in robotics, e.g. forward kinematics, Jacobian, and forward dynamics, matrix multiplication is required. In this case, under the control of the control unit 24, all six processor elements 26 form a single group to exploit parallelism in matrix multiplication. In computing vector addition or scalar-vector multiplication by a group of processor elements 26, parallelism in the operation is exploited by performing an independent addition or multiplication by each of the processor elements 26, but the operation of the three processor elements 26 within the same group is synchronized by the control unit 24. In computing matrix-vector multiplication or vector-cross-product by a group of three processor elements 26, parallelism is exploited by performing three independent vector-dot products (a series of multiply and add/subtract operations). In this case, the group of processor elements 26 is distinguished by its global synchronization and sharing of the common data. For example, in computing matrix-vector multiplication, while each row of the matrix is read for a corresponding processor element 26, the components of the vector are read for all processor elements 26 of the same group. This provides the possibility of exploiting parallelism in read operations since, once data is fetched, it can be read by several processor elements 26. Also, in performing two similar or different matrix-vector operations, the operation of the two groups of processor elements 26 is synchronized by the control unit 24. Furthermore, if the operations share some data, then the common data can be read by the processor elements 26 of the different groups, thereby increasing the parallelism in read operations.
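The grouping just described can be checked with a short Python model: one group of three processor elements computes a 3x3 matrix-vector product as three independent dot products over a shared vector, and a vector cross product is recast as the same kind of operation using the skew-symmetric matrix of its first operand. This is a sketch under the assumption that each element handles exactly one row; it runs sequentially and models only which data each element receives, not the synchronization performed by the control unit 24. The function names are hypothetical.

```python
def dot(u, v):
    """Work of one processor element: a series of multiply and add operations."""
    return sum(a * b for a, b in zip(u, v))


def group_matvec(matrix, vector):
    """One group of three processor elements: element k receives row k of the matrix,
    while the vector is fetched once and read by all three elements."""
    shared = tuple(vector)
    return [dot(row, shared) for row in matrix]


def group_cross(a, b):
    """a x b computed as three dot products using the skew-symmetric matrix of a."""
    skew = [[0, -a[2], a[1]],
            [a[2], 0, -a[0]],
            [-a[1], a[0], 0]]
    return group_matvec(skew, b)


def two_groups(op1, op2):
    """Six elements split into two groups of three, each running one operation.
    In this model the two operations simply run one after the other."""
    return op1(), op2()
```

For instance, `two_groups(lambda: group_matvec(R, v), lambda: group_cross(w, r))` models the two groups working side by side, with `R`, `v`, `w`, and `r` standing for whatever 3x3 matrix and 3-vectors a kinematics step supplies.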
The above-described organization provides the required flexibility since the processor elements 26 can be grouped by the control unit 24 according to the needs of the particular operation. Problems may arise as a result, however. For one, if processor elements 26 perform different operations in parallel, then different instructions for different processor elements 26 are needed. One solution might be to provide individual instructions for each of the processor elements 26; however, that solution will lead to complexity and increase the width of the control unit's microprogrammed instructions. For another, the memory organization and data routing among the processor elements 26, which are the classical issues in designing SIMD architectures, become even more difficult because of the required flexibility.
Two features of the RRCS 10 of this invention can be exploited for solving the above problems. The first feature, which is common to all the problems considered, is the locality of the operations; that is, the computation is performed on a small amount of data, which reduces the size of the required memory. Hence, a cache memory can be used as the basic memory 32 of the SIMD processor 12, which provides very fast access. The second feature is that, in any type of operation, one instruction may be used by several processor elements 26 since the number of possible instructions is limited (i.e. add/subtract, multiply, and multiply and add/subtract). Hence, once an instruction is issued, it can be used by several processor elements 26. Exploiting the above features, the problem of the flexibility of the processor elements 26 and memory is solved by the control unit 24 time multiplexing the operations of the processor elements 26. The control unit 24 is designed to operate several times faster than the processor elements 26, which means that the control unit 24 is capable of fetching the data and sending the instructions much faster than the processor elements 26 perform the operations.
Each processor element 26 employed in the above-described architecture in tested embodiments thereof is a simple processor (Ser. No. 74516) capable of performing primitive arithmetic operations (add/subtract, multiply, multiply and add/subtract, division, etc.). The processor can be run with a clock frequency of 6 MHz, while the primitive operations (except division) require nine to twelve clock cycles to be completed. The processor has only one bus, which means that two clock cycles are needed for loading the operands. Also, each operation is defined by a sequence of two instructions. At each clock cycle, one operand along with one instruction can be loaded. The combination of the two loaded instructions defines the type of operation to be performed. The control unit 24 runs with the same frequency as the processors (i.e. the processor elements 26), which means that, by exploiting parallelism in reading data and instructions, all the processor elements 26 can be activated by the control unit 24 within a few clock cycles and perform different operations in parallel. This scheme, besides providing the desired flexibility, reduces the complexity of the microprogram. In fact, the microcode for the architecture as built and tested is only twenty-four bits wide with only horizontal decoding (i.e. without any vertical decoding), which greatly simplifies the microprogramming.
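A rough schedule makes the time-multiplexing argument concrete. Assume the control unit 24 loads the six processor elements 26 one after another with no shared-operand reads (the read sharing described earlier would shorten this), that a multiply takes nine cycles (the low end of the stated range), and that each element starts computing as soon as its own two load cycles finish. Under those assumptions, which are not stated in the patent, the Python below computes when each element is done. Nine cycles at 6 MHz is 1.5 µs, which agrees with the multiplication time quoted in the next paragraph.

```python
CLOCK_HZ = 6_000_000     # processor element clock frequency given in the text
LOAD_CYCLES = 2          # one operand (plus one instruction) per cycle over the single bus
MULTIPLY_CYCLES = 9      # low end of the stated nine-to-twelve cycle range (assumed)


def staggered_finish_cycles(num_pes=6, op_cycles=MULTIPLY_CYCLES, load_cycles=LOAD_CYCLES):
    """Finish cycle of each element when the control unit loads them one after another
    and each element starts computing as soon as its own load is done."""
    return [(k + 1) * load_cycles + op_cycles for k in range(num_pes)]


def sequential_cycles(num_ops=6, op_cycles=MULTIPLY_CYCLES, load_cycles=LOAD_CYCLES):
    """Cycles for the same operations on a single element, one at a time."""
    return num_ops * (load_cycles + op_cycles)


def cycles_to_us(cycles, clock_hz=CLOCK_HZ):
    """Convert a cycle count to microseconds."""
    return cycles * 1_000_000 / clock_hz
```

Under these assumptions six multiplies finish by cycle 21 (3.5 µs), against 66 cycles (11 µs) for one element working alone, because loading (2 cycles) is much shorter than execution (9 cycles) and the control unit can keep issuing while earlier elements compute.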
The processor elements 26 are capable of performing 16-bit fractional fixed-point operations. Using fractional instead of integer arithmetic simplifies the scaling problems since multiplication does not create overflow and the processor elements 26 perform the rounding under the control of the control unit 24; however, a different scaling scheme is required for each specific arm of the robotic apparatus under control or simulation. It should be emphasized that the performance of the above-described architecture of the RRCS 10 of this invention and its unique manner of operation do not result from the speed of the processor elements 26 or the performing of fixed-point operations. In fact, the processor elements 26 are very slow, since they perform a fixed-point multiplication in 1.5 µsec. As is well known, many commercially available floating-point processors are more than an order of magnitude faster in performing floating-point operations. The overall increased performance of the architecture of this invention in the particular environment for which it was particularly and uniquely designed results from (1) its capability of exploiting parallelism at different levels in the computation, (2) employing many processor elements 26, and (3) minimizing overhead.
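The statement that fractional arithmetic avoids multiplication overflow can be illustrated with the common Q15 format (one sign bit, fifteen fraction bits). The patent says only "16-bit fractional fixed-point", so the Q15 layout and the round-half-up rule below are assumptions. Because both factors lie in [-1, 1), their product lies in (-1, 1]; the single case that reaches +1, (-1) x (-1), is saturated to the largest representable value.

```python
FRAC_BITS = 15
MIN_Q15 = -(1 << 15)        # represents -1.0
MAX_Q15 = (1 << 15) - 1     # represents 1 - 2**-15


def saturate(x: int) -> int:
    """Clamp to the 16-bit signed range."""
    return max(MIN_Q15, min(MAX_Q15, x))


def to_q15(x: float) -> int:
    """Quantize a real value to Q15; Python's round() rounds half to even."""
    return saturate(round(x * (1 << FRAC_BITS)))


def from_q15(q: int) -> float:
    """Convert a Q15 integer back to a real value."""
    return q / (1 << FRAC_BITS)


def q15_mul(a: int, b: int) -> int:
    """Multiply two Q15 numbers, rounding the 30-fraction-bit product to 15 bits."""
    product = a * b                                             # exact; 30 fraction bits
    rounded = (product + (1 << (FRAC_BITS - 1))) >> FRAC_BITS  # round half up
    return saturate(rounded)                                    # only (-1) * (-1) is clipped
```

For example, `from_q15(q15_mul(to_q15(-0.5), to_q15(0.75)))` gives `-0.375`. Inputs that are physically larger than 1 (joint torques, link lengths) must first be scaled into [-1, 1), which is the per-arm scaling scheme the text refers to.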
Wherefore, having thus described our invention, 
what is claimed is: 
1. A Real-time Robotic Controller and Simulator 
(RRCS) for interfacing with an external host computer 
and providing a high degree of parallelism in computa- 
tions for robotic control and simulation in response to 
instructions from the external host computer, said Real- 
time Robotic Controller and Simulator comprising: 
a) host processor means for receiving instructions from the external host computer and for transmitting answers to the external host computer;
b) a plurality of parallel SIMD processors, said SIMD processors operable asynchronously in an MIMD architecture, each said SIMD processor comprising a SIMD architecture capable of performing two matrix-vector operations in parallel while fully exploiting parallelism in each operation;
c) a system bus connecting said host processor means to said plurality of SIMD processors;
d) a common clock providing a continuous sequence of clock pulses; and,
e) ring structure means interconnecting said plurality 
of SIMD processors and connected to said clock 
for providing said clock pulses to said SIMD pro- 
cessors and for providing a path for the flow of 
data and instructions between said SIMD proces- 
sors; and wherein, 
f) said host processor means further comprises means
for controlling the RRCS by interpreting instruc- 
tions sent by the external host computer, decom- 
posing said instructions into a series of computa- 
tions to be performed by said SIMD processors, 
using said system bus to distribute associated data 
among said SIMD processors, and initiating activ- 
ity of said SIMD processors to perform said com- 
putations on said data; and wherein, 
g) each said SIMD processor further comprises,
g1) a single control unit including a program counter, a program memory, pipeline registers, and control and timing logic, said pipeline registers including provisions to receive instruction portions corresponding to an address field, an instruction field, and a control field, respectively, of instructions to be executed,
g2) a control bus,
g3) an address bus,
g4) a data bus,
g5) an instruction bus,
g6) a memory,
g7) a plurality of processor elements,
g8) host interface means for interfacing between 
said host processor means, said control unit, said 
control bus, said address bus, said data bus, said 
instruction bus, and said memory, 
g9) means for obtaining said sequence of clock 
pulses from said ring structure means and for 
connecting it to said control unit and each of said processor elements, and
g10) neighbor interface means for allowing each said SIMD processor to communicate with next adjacent SIMD processors along said ring structure means.
said host processor means further comprises means for causing clock based synchronization among said SIMD processors.
10. A Real-time Robotic Controller and Simulator (RRCS) for interfacing with an external host computer and providing a high degree of parallelism in computations for robotic control and simulation in response to instructions from the external host computer, said Real-time Robotic Controller and Simulator comprising:
a) host processor means for receiving instructions 
from the external host computer and for transmit- 
ting answers to the external host computer; 
b) a plurality of parallel SIMD processors, said 
SIMD processors further being able to operate 
asynchronously to form a MIMD architecture, 
each said SIMD processor capable of performing 
two matrix-vector operations in parallel while fully 
exploiting parallelism in each operation; 
c) ring structure means interconnecting said plurality 
clock for providing clock pulses to said SIMD 
processors and for providing a path for the flow of 
data and instructions between said SIMD proces- 
sors; 
dl)  a single control unit including a program 
counter, a program memory, pipeline registers, 
and control and timing logic, said pipeline regis- 
ters including provisions to receive instruction 
portions corresponding to an address field, an 
instruction field, and a control field, respec- 
tively, of instructions to be executed, 
2. The Real-time Robtic  Controller and Simulator 
(RRCS) of claim 1 wherein there are six said processor 
elements and wherein additionally: 
said control and timing logic includes logic for -us- 10 
ing said processor elements to alternatively per- 
form six independent primitive operations in paral- 
le1 or form two groups of three processor elements 
to perform two basic (matrix-vector) operations in 15 
parallel. 
3. The Real-time Robotic Controller and Simulator 
(RRCS) of claim 1 wherein there are six said processor 
elements and wherein additionally: 
said control and timing logic includes logic for Caw- 2o of SIMD processors to a h g  all six said processor elements to form a single 
groups to exploit parallelism in matrix multiplica- 
tion. 
4. The Real-time Robotic Controller and Simulator 
(RRCS) of claim 1 wherein there are six said processor 25 
elements and wherein: 
in computing vector addition or scalar-vector multi- 
plication by a group of said processor elements said 
control and timing logic includes logic for causing 
parallelism in the operation to be exploited by per- 30 
forming an independent addition or multiplication 
by each of said processor elements while the opera- 
tion of three processor elements within a same 
group are synchronized. 
5. The Real-time Robotic Controller and Simulator 35 
d) each said SIMD processor comprising, 
d2) a control bus, 
d3) an address bus, 
d4) a data bus, 
d5) an instruction bus, 
d6) a memory, 
d7) a plurality of processor elements, 
d8) host herface means for interfacing between 
said host processor means, said control unit, said 
control bus, said address bus, said data bus, said 
instruction bus, and said memory. 
d9) means for obtaining said sequence of clock 
pulses from said ring structure means and for 
connecting it to said control unit and each of said 
processor elements, and 
d10) neighbor interface means for allowing each 
said SIMD processor to communicate with next 
adjacent SIMD processors dong said ring struc- 
e) a system bus connecting said host processor means 
to said plurality of SIMD processors; and 
f) said host processor means further comprises means 
for controlling the RRCS by interpreting instruc- 
posing said instructions into a series of computa- 
tions to be performed by said SIMD processors, 
'using said system bus to distribute associated data 
among said SIMD processors, and initiating activ- 
ity of said SIMD processors to perfom said corn- 
putations on said data. 
11. The Real-time Robotic Controller and Simulator 
(RRCS) of claim 10 wherein there are six said processor 
said control and timing logic includes logic for caus- 
ing said processor elements to alternatively per- 
form six independent primitive operations in paral- 
le1 or form two groups of three processor elements 
(RRCS) of claim 1 wherein there are six said processor 
elements and wherein: 
in computing matrix-vector multiplication or vector- 
cross-product by a group of three said processor 
elements said control and timing logic includes 40 
logic for causing parallelism to be exploited by the 
performing of three independent vector-dot prod- 
ucts wherein said group of said processor elements 
is distinguished by global synchronization and 
sharing of common data. 
6. The Real-time Robotic Controller and Simulator 
(RRCS) of claim 1 wherein said SIMD architecture of 
each said SIMD processor comprises plural processing 
elements comprising means for performing plural oper- 
ations simultaneously within said SIMD processor. 
7. The Real-time Robotic Controller and Simulator 
(RRCS) of claim 6 wherein said SIMD architecture of 
each said SIMD processor further comprises means for 
pendent primitive operations simultaneously by respac- 
tive plural groups of said processing elements. 
8. The Real-time Robotic Controller and Simulator 
(RRCS) of claim 6 wherein said plural groups comprise 
one of: (a) s u  groups of one processor element each for 60 
performing six primitive operations simultaneously, and 
@) two groups of three processor elements each for 
performing two three-dimensional operations ShUlta- 
neously. elements and wherein additionally: 
(RRCS) of claim 1 wherein a basic synchronization 
mechanism of the RRCS is data driven but wherein 
additionally: 
45 
50 ture means; 
causing said Processor dements perfom inde- 55 tiom m t  by the e x m a l  host computer, d m m -  
9. The Real-time Robotic Controller and Simulator 65 
11 
5,2 18,709 
12 
to perform two basic (matrix-vector) operations in 
parallel. 
12. The Real-time Robotic Controller and Simulator (RRCS) of claim 10 wherein there are six said processor elements and wherein additionally:
said control and timing logic includes logic for causing all six said processor elements to form a single group to exploit parallelism in matrix multiplication.
13. The Real-time Robotic Controller and Simulator (RRCS) of claim 10 wherein there are six said processor elements and wherein:
in computing vector addition or scalar-vector multiplication by a group of said processor elements said control and timing logic includes logic for causing parallelism in the operation to be exploited by performing an independent addition or multiplication by each of said processor elements while the operation of three processor elements within a same group are synchronized.
14. The Real-time Robotic Controller and Simulator (RRCS) of claim 10 wherein there are six said processor elements and wherein:
in computing matrix-vector multiplication or vector-cross-product by a group of three said processor elements said control and timing logic includes logic for causing parallelism to be exploited by the performing of three independent vector-dot products wherein said group of said processor elements is distinguished by global synchronization and sharing of common data.
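The division of work among the six processor elements (PEs) described in claims 2 to 5 and 11 to 14 can be illustrated with a small Python model. This is a sketch under stated assumptions, not the patented hardware: each PE is modeled as an ordinary function call executed sequentially, lockstep synchronization within a group is implied rather than simulated, and every function name (pe_dot, group_matvec, skew, matmul_schedule, and so on) is hypothetical. The cross product is expressed through the skew-symmetric matrix of the first vector, which is one standard way to turn it into three independent dot products; the claims do not say which formulation is used. The round-robin assignment of the nine dot products of a 3x3 product to a single six-PE group (claims 3 and 12) is likewise an assumed schedule.

```python
"""Hypothetical model of six processor elements (PEs) in one SIMD processor.

Assumptions: PEs are plain function calls run one after another; lockstep
synchronization within a group is implied, not simulated. Names are illustrative.
"""
from typing import Dict, List, Sequence, Tuple

Vec3 = List[float]
Mat3 = List[List[float]]
N_PE = 6  # six processor elements per SIMD processor, as in the dependent claims


def pe_dot(row: Sequence[float], v: Sequence[float]) -> float:
    """Work assigned to one PE: a single three-element dot product."""
    return sum(a * b for a, b in zip(row, v))


def group_vec_add(u: Vec3, v: Vec3) -> Vec3:
    """Vector addition by a group of three PEs: PE i adds component i."""
    return [u[i] + v[i] for i in range(3)]


def group_scale(s: float, v: Vec3) -> Vec3:
    """Scalar-vector multiplication: PE i multiplies component i by s."""
    return [s * v[i] for i in range(3)]


def group_matvec(m: Mat3, v: Vec3) -> Vec3:
    """Matrix-vector product by three PEs sharing v: PE i does row i of m."""
    return [pe_dot(m[i], v) for i in range(3)]


def skew(a: Vec3) -> Mat3:
    """Skew-symmetric matrix [a]x, chosen so that [a]x v equals a x v."""
    ax, ay, az = a
    return [[0.0, -az, ay],
            [az, 0.0, -ax],
            [-ay, ax, 0.0]]


def group_cross(a: Vec3, b: Vec3) -> Vec3:
    """Cross product as three independent dot products (rows of [a]x with b)."""
    return group_matvec(skew(a), b)


def two_group_matvecs(m1: Mat3, v1: Vec3, m2: Mat3, v2: Vec3) -> Tuple[Vec3, Vec3]:
    """Six PEs split into two groups of three, one matrix-vector product each."""
    return group_matvec(m1, v1), group_matvec(m2, v2)


def matmul_schedule(n_pe: int = N_PE) -> Dict[int, List[Tuple[int, int]]]:
    """Assumed round-robin assignment of the nine entries (i, j) of a 3x3
    product to the PEs of a single group."""
    schedule: Dict[int, List[Tuple[int, int]]] = {pe: [] for pe in range(n_pe)}
    for k in range(9):
        schedule[k % n_pe].append((k // 3, k % 3))
    return schedule


def single_group_matmul(a: Mat3, b: Mat3) -> Mat3:
    """3x3 matrix product with all six PEs acting as one group."""
    cols = [[b[r][j] for r in range(3)] for j in range(3)]
    c = [[0.0] * 3 for _ in range(3)]
    for tasks in matmul_schedule().values():
        for i, j in tasks:  # each task is one dot product on one PE
            c[i][j] = pe_dot(a[i], cols[j])
    return c


if __name__ == "__main__":
    print(group_cross([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))  # [-3.0, 6.0, -3.0]
```

Running the module prints the cross product of (1, 2, 3) and (4, 5, 6). Reducing each three-dimensional primitive to independent per-component or per-row work is what allows one group of three PEs to run in lockstep on shared operands, which is consistent with the synchronization and data sharing recited in claims 5 and 14.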
