Direct kinematics solution architectures for industrial robot manipulators: Bit-serial versus parallel by Kim, K. & Lee, J.
3rd NASA Symposium on VLSI Design 1991
N94-18358
Direct Kinematics Solution Architectures
for Industrial Robot Manipulators:
Bit-Serial Versus Parallel
J. Lee
Department of Electrical Engineering
University of Houston
Houston, TX 77204-4793
K. Kim
Superconducting Super Collider Lab.
2550 Beckleymeade Avenue
Dallas, TX 75237
Abstract - This paper investigates a VLSI architecture for robot direct kinematic
computation suitable for industrial robot manipulators. The Denavit-Hartenberg
transformations are reviewed to exploit a proper processing element, namely an
augmented CORDIC. Specifically, two distinct implementations are elaborated on:
the bit-serial and the parallel. The performance of each scheme is analyzed with
respect to the time to compute one location of the end-effector of a 6-link
manipulator, and the number of transistors required.
1 CORDIC Techniques
The matrix $A_j$ describing the jth link is proposed to be implemented via 4 CORDICs:
two in parallel for the w-axis operation, and another two in parallel for the x-axis.
Since the rotation and translation are disjoint from each other, the 4 CORDICs can be
arranged as a 2-stage cascade [5].
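Let the jth joint orientation vector be denoted by $p_j$, where $p_j = A_j p_{j-1}$.
Consider an intermediate vector $\hat{p}_j$ between $p_j$ and $p_{j-1}$:

$$p_j = \mathrm{Trans}(w_{j-1}, d_j)\,\mathrm{Rot}(w_{j-1}, \theta_j)\,\hat{p}_j \quad : \text{stage 1} \tag{1}$$

$$\hat{p}_j = \mathrm{Trans}(x_j, a_j)\,\mathrm{Rot}(x_j, \alpha_j)\,p_{j-1} \quad : \text{stage 2} \tag{2}$$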
One set of transformations for each stage, i.e. $\mathrm{Trans}(w, d)\,\mathrm{Rot}(w, \theta)$,
is a block-diagonal matrix and can be implemented orthogonally by two 2x2 matrix
transformations. Note that it is implementable through one augmented PE, rather than
two different PEs, observing that $\mathrm{Trans}(w, d)$ is a trivial operation. Then,
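$$p_j = \begin{bmatrix} x_j \\ y_j \\ w_j \\ 1 \end{bmatrix} = \begin{bmatrix} \mathrm{Rot}(w_j, \theta_j) & 0 \\ 0 & \mathrm{Trans}(w_j, d_j) \end{bmatrix} \hat{p}_j. \tag{3}$$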
$p_j$ (and similarly $\hat{p}_j$) is decomposed into two blocks, e.g. the first two
elements of $p_j$ become one vector $X_j$:

$$p_j = [X_j;\, w_j,\, 1]' = [\mathrm{Rot}(w_j, \theta_j)\,\hat{X}_j;\, \hat{w}_j + d_j,\, 1]', \tag{4}$$

where $w_j$ is the w-axis component of the vector $p_j$, and $X_j$ holds the x- and
y-axis components rotated by $\theta_j$; $\hat{X}_j$ and $\hat{w}_j$ are the
corresponding blocks of $\hat{p}_j$. In a similar way, for $\hat{p}_j$ we can choose a
rotated vector of the y- and w-axes disjointly through axis shuffling. Finally, the
n pairs of rotation and translation can be implemented via a 2n-stage cascade. We will
name each stage a macro-PE (or an augmented PE), which can be 2n-pipelined to compose
an n-link computation processor. So as not to differentiate the two sets of
transformations, w-axis and x-axis respectively, we employ index i in a unified
notation for a macro-PE: for a reference axis $w_i$, there are a rotation of
$\theta_i$ and a translation of $d_i$, with $X_{i-1} = (x_{i-1}, y_{i-1})$ for an
input and $X_i = (x_i, y_i)$ for an output.
Each macro-PE, including one $\mathrm{Trans}(w_i, d_i)$ and one $\mathrm{Rot}(w_i, \theta_i)$,
can be implemented as in Figure 1.a. One joint processor is shown in Figure 1.b.
Finally, for a 6-joint system, Figure 1.c shows a fully pipelined structure.
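The two-stage cascade for one link can be illustrated with a short sketch. It is not the authors' circuit: it assumes the standard Denavit-Hartenberg composition (x-axis operations applied first, then w-axis operations), a counter-clockwise rotation convention, and floating-point `cos`/`sin` in place of CORDIC hardware. The names `macro_pe`, `link_transform`, `dh_matrix` and `apply_matrix` are hypothetical.

```python
import math

def macro_pe(u, v, r, angle, shift):
    """One macro-PE: rotate the pair (u, v) about the reference axis and
    translate the reference-axis component r by shift."""
    c, s = math.cos(angle), math.sin(angle)
    return (c * u - s * v, s * u + c * v, r + shift)

def link_transform(p_prev, theta, d, a, alpha):
    """Two-stage cascade for one link (assumed order: x-axis stage, then w-axis stage)."""
    x, y, w = p_prev
    # x is the reference axis: axis shuffling makes (y, w) the rotated pair.
    y, w, x = macro_pe(y, w, x, alpha, a)
    # w is the reference axis: (x, y) is the rotated pair.
    x, y, w = macro_pe(x, y, w, theta, d)
    return (x, y, w)

def dh_matrix(theta, d, a, alpha):
    """Reference 4x4 Denavit-Hartenberg matrix, used only to check the cascade."""
    ct, st = math.cos(theta), math.sin(theta)
    ca, sa = math.cos(alpha), math.sin(alpha)
    return [
        [ct, -st * ca, st * sa, a * ct],
        [st, ct * ca, -ct * sa, a * st],
        [0.0, sa, ca, d],
        [0.0, 0.0, 0.0, 1.0],
    ]

def apply_matrix(m, p):
    """Multiply a 4x4 homogeneous matrix by the point p = (x, y, w)."""
    v = (p[0], p[1], p[2], 1.0)
    return tuple(sum(m[r][c] * v[c] for c in range(4)) for r in range(3))
```

Under these assumptions the two macro-PE calls give the same result as one multiplication by the full 4x4 matrix, which is why each stage only needs a planar rotation plus a single addition.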
From this point, we will concentrate on the implementation of a macro-PE. Observing
that the Rot and Trans functions are disjoint from each other, let us consider the
rotation first. This vector rotation of $X_i = (x_i, y_i)$ by the angle $\theta_i$ can
be realized by an iterative algorithm called CORDIC [4] instead of computing
trigonometric functions and applying matrix multiplication. CORDIC realizes a vector
rotation by a partial sum of micro-angle rotations with a pre-fixed sequence of
angles. When the rotation macro-angle is represented as a sum of decomposed
micro-angles, i.e. $\theta_i = \sum_{k=0}^{n-1} \theta_{i,k}$,

$$\mathrm{Rot}(w_i, \theta_i) = \prod_{k=0}^{n-1} k_{i,k} \begin{bmatrix} 1 & \tan\theta_{i,k} \\ -\tan\theta_{i,k} & 1 \end{bmatrix}, \tag{5}$$

where $k_{i,k} = \cos\theta_{i,k}$ is a micro-scale composing a final scale factor,
explained later. Such a specific form of the pre-fixed micro-angle sequence as
$\tan^{-1} 2^{-k}$ is attractive for VLSI implementation since it is composed only of
additions, shifts, and an arctangent lookup table. For simplicity of notation, the
subscript i indexing a certain stage will be omitted (i is reused below as the
micro-iteration index), and X, Y and Z stand for abridged notations for those having
subscript i.
Non-redundant: The micro-iterations of the conventional (hereafter called
non-redundant) CORDIC are three linear recursive equations: X-recurrence (X-rec.),
Y-recurrence (Y-rec.) and Z-recurrence (Z-rec.) [4].

$$\begin{aligned} X[i+1] &= X[i] + \sigma_i 2^{-i} Y[i] \\ Y[i+1] &= Y[i] - \sigma_i 2^{-i} X[i] \\ Z[i+1] &= Z[i] - \sigma_i \tan^{-1} 2^{-i} \end{aligned} \tag{6}$$

With an initial value of $Z[0] = \theta$, CORDIC rotates the initial values X[0] and
Y[0] to the final values X[n] and Y[n], while driving Z[i] toward zero, so that Z[n]
is forced to be zero. With n iterations, n-bit accuracy of X and Y in the output can
be achieved.
Figure 1: CORDIC-based Pipelined Architecture for Direct Kinematics Computation: a.
A macro-PE, one stage from an orientation to an intermediate, b. 2-stage cascade, an
$A_i$ transformation module for a link, c. A complete pipelined computation module for
a 6-link system.
For a known angle, the direction of the rotation, $\sigma_i$, can be pre-computed or
calculated one by one on the fly using the following selection function.

$$\sigma_i = \begin{cases} 1 & \text{if } Z[i] \ge 0 \\ -1 & \text{if } Z[i] < 0 \end{cases} \tag{7}$$

The CORDIC rotation does not preserve the input norm. To get a rotated vector having
the same length as the input (X[0], Y[0]), X[n] (Y[n]) needs to be compensated by a
scaling factor K,

$$K = \frac{\|[X[n], Y[n]]'\|}{\|[X[0], Y[0]]'\|} = \prod_{i=0}^{n-1} \sqrt{1 + \sigma_i^2 2^{-2i}}, \tag{8}$$

where $\|\cdot\|$ stands for the norm of the vector. Note that K is constant for the
non-redundant scheme since $\sigma_i$ is in {-1, 1}.
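The recurrences (6), the selection function (7) and the scale factor (8) can be tried out with a small floating-point sketch. This is an illustration under stated assumptions, not the hardware data path: shifts are modelled as multiplications by powers of two, the arctangent table is replaced by `math.atan`, the compensation is an explicit division by K, and the function name `cordic_rotate` is hypothetical.

```python
import math

def cordic_rotate(x, y, theta, n=24):
    """Non-redundant CORDIC rotation; returns the scale-compensated vector
    and the residual angle Z[n]."""
    z = theta
    k = 1.0
    for i in range(n):
        sigma = 1 if z >= 0 else -1                   # selection function, eq. (7)
        t = 2.0 ** -i                                 # a right shift by i bits in hardware
        x, y = x + sigma * t * y, y - sigma * t * x   # X- and Y-recurrence, eq. (6)
        z -= sigma * math.atan(t)                     # Z-recurrence (lookup table in hardware)
        k *= math.sqrt(1.0 + t * t)                   # scale factor K, eq. (8)
    return x / k, y / k, z
```

With the sign convention of eq. (6), a positive angle turns the vector clockwise, so the input (1, 0) ends near (cos θ, -sin θ). The fixed angle sequence only converges for |θ| up to about 1.74 rad; larger angles would need a pre-rotation, which the sketch leaves out.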
Redundant: Non-redundant CORDIC is inherently slow, with a delay of $O(n^2)$ due to
its recursiveness and serial dependency, since a micro-rotation with delay $O(n)$ must
be finished before processing the next micro-rotation. The delay of a macro-rotation
(n micro-rotations) can be improved from $O(n^2)$ to $O(n)$ by using redundant
arithmetic (carry-free addition such as carry-save or signed-digit addition) and
determining the direction of the rotation $\hat{\sigma}_i$ based on an estimate
instead of an exact value [9]. The redundant arithmetic gives an addition delay of
$O(1)$ instead of $O(n)$, and the estimation of the direction is necessary so as not
to erode the advantage of $O(1)$. This requires modification of the recurrences and
the selection function. This redundant CORDIC scheme produces the output about 4 times
faster than the non-redundant one. However, it introduces additional cost since the
scale factor K becomes variable, depending on the macro-angle, by allowing
$\hat{\sigma}_i$ to be in {-1, 0, 1}.
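As a rough illustration of why carry-free addition has a delay independent of word length, the sketch below models carry-save addition on non-negative Python integers. The function names are hypothetical, and it covers only the carry-save variant, not the signed-digit adders used in [9].

```python
def carry_save_add(s, c, a):
    """3:2 compression: fold operand a into the redundant pair (s, c).
    Every bit position is computed independently, so no carry ripples."""
    new_s = s ^ c ^ a                              # bitwise sum without carries
    new_c = ((s & c) | (s & a) | (c & a)) << 1     # carries, saved for the next step
    return new_s, new_c

def resolve(s, c):
    """Single carry-propagate addition, needed only when a conventional value is required."""
    return s + c
```

Because the running value is held as the pair (s, c), its sign cannot be read from a single most significant bit. That is why the direction of rotation has to come from an estimate in the redundant schemes.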
Constant-Factor-Redundant: To reduce the implementation cost of redundant CORDIC,
it would be good to have a constant scale factor by forcing $\hat{\sigma}_i$ to be in
{-1, 1}. However, since $\hat{\sigma}_i$ is determined from an estimate, a question of
assured convergence arises. A scheme was proposed that appends correcting iteration
stages at proper positions [10]. Following this idea, the number of extra correcting
iterations is further reduced by dividing the micro-iterations (for i = 0 to
i = n - 1) into two groups: one group where the direction of the rotation is in
{-1, 1} for i = 0 to i = n/2, and the other in {-1, 0, 1} for i = (n + 1)/2 to
i = n - 1. This reduces the number of correcting iterations by 50 % since a correcting
iteration is not needed for the second half of the micro-iterations, and we still
obtain a constant scale factor K since the value of K in n-bit precision does not
depend on the $\hat{\sigma}_i$ value for (n + 1)/2 <= i <= (n - 1). The Z-recurrence
can also be modified so that $\hat{\sigma}_i$ is determined quickly by looking at a
few most significant bits. This new scheme is called Constant-Factor-Redundant CORDIC
(CFR-CORDIC). The modified recurrences and selection functions for the scheme are
described below.

$$\begin{aligned} X[i+1] &= X[i] + \hat{\sigma}_i 2^{-i} Y[i] \\ Y[i+1] &= Y[i] - \hat{\sigma}_i 2^{-i} X[i] \\ U[i+1] &= 2\left(U[i] - \hat{\sigma}_i 2^{i} \tan^{-1} 2^{-i}\right) \end{aligned} \tag{9}$$
where U[i], which is equal to $2^i Z[i]$, is used for implementation simplicity, and
the selection function is given as follows:

$$\hat{\sigma}_i = \begin{cases} 1 & \text{if } \hat{U}[i] > 0 \text{ or } (\hat{U}[i] = 0 \wedge i \le n/2) \\ 0 & \text{if } \hat{U}[i] = 0 \wedge i > n/2 \\ -1 & \text{if } \hat{U}[i] < 0 \end{cases} \tag{10}$$

When t fractional bits are used in the estimate value, i.e., $\hat{U}[i]$ is computed
using t fractional bits of the redundant representation of U[i], the following
correcting iterations need to be included, where the interval between the indexes of
correcting iterations should be less than or equal to (t - 1), up to the last
iteration index equal to n/2. When a correcting stage is necessary at the jth step of
micro-iteration,

$$U^c[j+1] = U[j+1] - 2\,\sigma^c_j\, 2^{j} \tan^{-1} 2^{-j} \tag{11}$$

with the direction of the rotation $\sigma^c_j$ determined from the same selection
function of eq. (10), except being decided based on U[j + 1] instead of $\hat{U}[i]$.
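The claim that K stays constant to n-bit precision even when the second half of the directions may be zero can be checked numerically. The sketch below assumes n = 16 and takes the second group to be i > n/2, following eq. (10); the helper `scale_factor` is a hypothetical name. Since only the square of the direction enters eq. (8), the two extreme cases are "always rotate" and "never rotate" in the second half.

```python
import math

def scale_factor(sigmas):
    """Scale factor of eq. (8) for a given sequence of directions."""
    return math.prod(math.sqrt(1.0 + (s * s) * 4.0 ** -i) for i, s in enumerate(sigmas))

n = 16
first_half = [1] * (n // 2 + 1)                          # i = 0 .. n/2, directions in {-1, 1}
all_rotate = first_half + [1] * (n - len(first_half))    # second half always rotates
all_skip = first_half + [0] * (n - len(first_half))      # second half never rotates

# Relative difference between the largest and smallest possible K.
spread = scale_factor(all_rotate) / scale_factor(all_skip) - 1.0
```

For this n the spread stays below $2^{-n}$, which agrees with treating K as a constant at n-bit precision.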
So far, we have discussed the recursive structures of several CORDIC schemes to
implement the rotation part in the basic PE, as depicted in Figure 1. The PE,
augmented by a translator, necessitates a scaling operation at each stage, because
shuffling of the output at each stage makes continuous accumulation of the scaling
factor too complex to be processed at the final stage. The scaling operation has been
solved either in an explicit way or an implicit one. The explicit way is dividing the
rotated vector by a constant, which is known in advance for the non-redundant scheme
and has to be calculated while running the micro-steps of CORDIC otherwise [4,9]. The
division can be processed by another CORDIC (in a linear mode) or a divider. The
implicit approach reconfigures the sequence of micro-iterations of the CORDIC,
eventually to have a different norm from that without scaling micro-iterations.
Scaling micro-iterations aim in general at making the adjusted scaling factor of the
form $2^i$ or 1, which can be easily set to the unit size. Each micro-iteration can be
composed of i) reduction axis-scaling [11], ii) repetition of vector-scaling,
iii) expansion axis-scaling, or combinations thereof [12]. Relevant issues regarding
solution search are to be studied further, beyond the greedy method or the decomposed
one [13]. In summary, the explicit scaling almost doubles the system complexity, while
the implicit scaling increases it by 25 % for the non-redundant and about 30 % for the
redundant scheme.
2 Application to Direct Kinematics
In this section, we design an architecture for the direct kinematics computation,
based on CFR-CORDIC, with a parallel data path. To analyze its performance, we define
a new measure, namely the one-position calculation time. Via this measure, we also
analyze the performance of a similarly implementable bit-serial architecture.
2.1 Performance Measure
Let us define the following parameters.
$b_i$ : the number of bits in each input x, y and w
$b_f$ : the number of bits in each output
$n_l$ : the number of links (= 6)
$f_c$ : the available data shift rate
$\Delta$ : the step time per micro CORDIC iteration
$f_i$ : the input bit rate
Additionally, we define a measure parameter $T_\Delta$,

$$T_\Delta = \text{step-time}(\Delta) \cdot \text{number of steps},$$

to compare the performance of various schemes. For a discrete element implementation,
$\Delta$ corresponds to one single external clock time $1/f_c$. Note that $\Delta$
varies depending on the particular implementation of a macro-PE. Without loss of
generality, let us define the unit of $\Delta$ to be 1 for one-bit full addition time.
The input processing rate can be alternatively interpreted as

$$\frac{f_i}{b_i} \le \frac{1}{T_\Delta}, \tag{12}$$

which limits the maximum rate of input vector sampling that can be processed by an
implemented processor.
2.2 Performance Comparison
Bit Serial: A macro-PE using a serial data path and arithmetic units for CORDIC is
shown in Figure 2 [6]. Figure 2.a shows the symmetric components of a bit-serial PE in
x, y and w representation, and Figure 2.b gives the detail of each block (X-recurrence
or Y-recurrence) employing bit-serial arithmetic. The W-recurrence is in Figure 2.c,
and the Z-recurrence in Figure 2.d. The x and y components of the input vector
$X_{i-1}$ are taken initially as X[0] and Y[0], and the initial angle Z[0] is set to
the corresponding joint angle. After performing n micro-iterations, CORDIC produces
n-bit precision outputs leading to $X_i$.
In the serial scheme without macro-pipelining, denote the basic step time as
$\Delta_1$, which is equivalent to $\Delta$. Using one adder recursively $n_l$ times
to process $n_l$ links,

$$T_{\Delta_1} = \Delta_1 \cdot n_l \left(b_f + b_i (b_i + \log_2 b_i)\right),$$

where the output has a $b_f$-bit buffer.
CFR-Redundant Parallel: To increase the throughput of the previous scheme, the
bit-serial PEs can be replaced by those using parallel arithmetic. When parallel
arithmetic and non-redundant CORDIC are adopted, the corresponding parameter becomes

$$T_{\Delta_2} = \Delta_2 \cdot n_l (b_i + \log_2 b_i),$$

where $\Delta_2$ equals the time for one micro-rotation (time for the variable shifter
plus time for carry-propagate addition), approximately $2 \log_2 b_i$ assuming a fast
variable shifter and carry-propagate adder. The step time can be further shortened by
adopting CFR-CORDIC, where a carry-free adder (signed-digit adder) replaces the
carry-propagate adder. Figure 3.a shows the macro-PE in components, and Figure 3.b
gives the detail of each block (X-recurrence or Y-recurrence) employing
parallel/redundant arithmetic. The Z-recurrence is in Figure 3.c.

Figure 2: A bit-serial PE: a. A macro-PE with X-, Y- and W-recurrence, b. Detail of
either block, c. W-recurrence, d. Z-recurrence.
Figure 3: A parallel/redundant PE: a. A macro-PE with X- and Y-recurrence, b. Detail
of either block, c. Z-recurrence.
Description        $\Delta_i/\Delta$   $T_\Delta$      Processing rate   TRs estimate
Bit-serial         1                   1200 $\Delta$   600K              2K
  (parallel)                                           4M                12K
Parallel (CFR)     5                   500 $\Delta$    2M                6K
  (parallel)                                           10M               40K
Table 1: Time and complexity comparison
In this case, the sign of Z[i] at the ith micro-iteration cannot be detected by
looking at the most significant bit since Z[i] is in redundant number representation.
To determine the sign of Z[i] quickly by looking at a few significant bits, CFR-CORDIC
uses an estimate of the shifted Z[i] (U[i]) using t fractional bits. As discussed
earlier, the number of fractional bits used for the estimate also determines the
frequency of correcting iterations: the more fractional bits are used, the fewer
correcting iterations are required. Let the number of correcting iterations be denoted
by $\eta$. The corresponding $T_\Delta$ becomes

$$T_{\Delta_3} = \Delta_3 \cdot n_l (b_i + \log_2 b_i + \eta),$$

where $\Delta_3$ equals the time for carry-free addition plus the maximum of the times
for a selection function and a variable shifter, approximately $(1 + \log_2 b_i)$.
Note that a practical number of correcting iterations is much smaller than $b_i$, e.g.
1 for the 16-bit resolution. Hence, we can approximate $T_{\Delta_3}$ by that for the
redundant scheme without a correcting iteration.
For the case $b_i = 12$, $b_f = 16$, the estimated $T_\Delta$ is summarized in Table 1.
To get first-order estimates of available speed and area, we use the figure that one
full adder (also a one-bit shifter) requires approximately 50 TRs and one 20 nsec
clock cycle [14].
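The three $T_\Delta$ expressions can be evaluated directly for the stated case. The sketch below is a plain transcription of the formulas, in units of one full-adder delay. It assumes one correcting iteration ($\eta = 1$) and uses the approximate step times given in the text. The function names are hypothetical.

```python
import math

def t_bit_serial(n_l, b_i, b_f, delta=1.0):
    """One-position time for the bit-serial scheme (step time Delta_1 = Delta)."""
    return delta * n_l * (b_f + b_i * (b_i + math.log2(b_i)))

def t_parallel(n_l, b_i):
    """Parallel, non-redundant: variable shifter plus carry-propagate adder per step."""
    delta = 2.0 * math.log2(b_i)
    return delta * n_l * (b_i + math.log2(b_i))

def t_cfr(n_l, b_i, eta=1):
    """Parallel CFR-CORDIC: carry-free adder plus shifter/selection per step."""
    delta = 1.0 + math.log2(b_i)
    return delta * n_l * (b_i + math.log2(b_i) + eta)

n_l, b_i, b_f = 6, 12, 16
serial = t_bit_serial(n_l, b_i, b_f)
parallel = t_parallel(n_l, b_i)
cfr = t_cfr(n_l, b_i)
```

Under these assumptions the bit-serial value comes out near the 1200 $\Delta$ entry of Table 1. The CFR value is somewhat below 500 $\Delta$ and moves close to it when $\Delta_3$ is rounded up to 5, as in the table.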
3 Conclusion
We have examined various kinds of CORDIC schemes as a macro-PE module for the
direct kinematics processor, and showed that its micro-level regularity is suitable
for VLSI implementation, depicted along with specific schematics which include the
conventional non-redundant, the redundant and the Constant-Factor-Redundant schemes.
The cost-effectiveness of selected architectures has been analyzed using bit-serial,
parallel or pipelined structures with respect to the time and the number of modules
required to compute one location of the end-effector for a 6-link manipulator, given a
set of angle measurements. The comparison table exhibits the CORDIC-based robotics
processor as a prospective solution in VLSI for a wide range of kinematics calculation
requirements, trading off size against speed.
References
[1] J. Denavit and R. Hartenberg, "A Kinematic Notation for Lower-Pair Mechanisms
Based on Matrices," Journal of Applied Mechanics, pp. 215-221, 1955.
[2] P. Nanua, K. Waldron and V. Murthy, "Direct Kinematic Solution of a Stewart
Platform," IEEE Trans. on Robotics and Automation, Vol. 6, No. 4, pp. 438-444, Aug. 1990.
[3] D. Moldovan and G. Lee, "On the Use of Parallel Architectures for Robotic
Manipulators: The Kinematics Problem," Int. J. Robotics and Automation, Vol. 1, No. 2,
pp. 47-53.
[4] J. Walther, "A Unified Algorithm for Elementary Functions," AFIPS Spring Joint
Computer Conference, pp. 379-385, 1971.
[5] C. Lee, "CORDIC-based Architectures for Robot Direct Kinematics and Jacobian
Computation," 3rd Int. Symp. Intelligent Control, pp. 609-614, 1988.
[6] R. Harber et al., "Bit-serial CORDIC Circuits for Use in a VLSI Silicon Compiler,"
Int. Conf. Circuit and System, pp. 154-157, 1989.
[7] M. Kameyama, T. Matsumoto and H. Hideki, "Implementation of a High Performance
LSI for Inverse Kinematics Computation," IEEE Int. Conf. Robotics and Automation,
pp. 757-762, 1989.
[8] H. Kung, "Let's Design Algorithms for VLSI Systems," Caltech Conf. VLSI, pp. 65-90,
1979.
[9] M. Ercegovac and T. Lang, "Redundant and On-Line CORDIC: Application to Matrix
Triangularization and SVD," IEEE Trans. on Computers, Vol. C-39, No. 6, pp. 725-740,
June 1990.
[10] N. Takagi, T. Asada and S. Yajima, "Redundant CORDIC Methods with a Constant
Scale Factor for Sine and Cosine Computation," submitted to IEEE Trans. on Computers,
1989.
[11] G. Haviland and A. Tuszynski, "A CORDIC Arithmetic Processor Chip," IEEE Trans.
[12] J. Delosme, "VLSI Implementation of Rotations in Pseudo-Euclidean Spaces," Proc.
[13] J. Lee and T. Lang, "Matrix Triangularization by Fixed-Point Redundant CORDIC
with a Constant Scale Factor," Proc. SPIE Conference on Advanced Signal Processing
Algorithms, Architectures, and Implementations, July 1990.
[14] J. Harding, T. Lang and J. Lee, "A Comparison of Redundant CORDIC Rotation
Engines," Int. Conf. Computer Design 91, Oct. 1991.
