A high performance multiplier processor for use with aerospace microcomputers by Pierce, P. E.
A HIGH PERFORMANCE MULTIPLIER PROCESSOR
FOR USE WITH AEROSPACE MICROCOMPUTERS
P. E. Pierce
Sandia National Laboratories
Albuquerque, New Mexico
An MC68000-based microcomputer including a hardware multiplier
processor has been designed and prototyped for a re-entry
vehicle navigation and control application. In this paper,
the microcomputer is discussed with emphasis on the multiplier
processor architecture, software control and theory of operation.
The MC68000 CPU of the microcomputer cannot by itself satisfy the
real-time multiply processing requirements of a high-accuracy
re-entry vehicle (RV) navigator. The throughput of the standalone
CPU for multiply-intensive applications is increased approximately
seven times by the addition of a board-level Hardware Multiplier
Processor (HMP). Although the HMP was designed for the MC68000
microcomputer, it can be used with any 16- or 32-bit CPU with
minimal modifications.
The memory-mapped HMP performs 16- and 32-bit multiplications
and can optionally add the full product to, or subtract it from,
the previous accumulator contents. The circuitry is sufficiently
fast to allow the MC68000 running at 8 MHz to write single or
double precision variables to the HMP using memory-to-memory
transfers and perform an operation with no wait states and no
overhead time for command passing.
The result of multiply and accumulate operations may be
transferred in its entirety, or scaled by up to 2^±30 and rounded
automatically prior to transfer to the destination location
specified by the CPU. Worst case CPU wait times introduced
are 3.3 µsec for double precision scale by 2^-14 and round
to single precision, and 6.3 µsec for quadruple precision scale
by 2^-30 and round to double precision.
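The multiply, accumulate, scale and round arithmetic described above can be modeled in a few lines of Python. This is only an illustrative sketch: the function names, the operand format with 30 fractional bits, and the round-half-up rule are assumptions, since the abstract does not state the number format or the rounding mode used by the hardware.

```python
def multiply_accumulate(acc: int, a: int, b: int, subtract: bool = False) -> int:
    """Add the full product a*b to the accumulator, or subtract it."""
    product = a * b
    return acc - product if subtract else acc + product


def scale_and_round(value: int, exponent: int) -> int:
    """Scale by 2**exponent; when shifting right, round half up (assumed mode)."""
    if exponent >= 0:
        return value << exponent
    shift = -exponent
    # Add half of the discarded weight before shifting.
    return (value + (1 << (shift - 1))) >> shift


# Assumed format: 32-bit fixed point with 30 fractional bits.
ONE = 1 << 30

acc = 0
acc = multiply_accumulate(acc, ONE // 2, ONE // 4)   # 0.5 * 0.25
acc = multiply_accumulate(acc, ONE // 8, ONE)        # + 0.125 * 1.0
result = scale_and_round(acc, -30)                   # back to 30 fractional bits

print(result / ONE)  # 0.25
```

Keeping the full product in the accumulator and rounding only once, at the final transfer, is what allows several products to be summed without accumulating a rounding error per term.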
The Hardware Multiplier Processor incorporates serial/parallel
hardware multiplier ICs, a translation PROM and address-controlled
logic to implement the previously mentioned arithmetic
functions. The use of serial arithmetic circuitry yields a
processor of small physical size, low power and significant
flexibility. The computation time of the HMP is shorter than
most of the general memory addressing modes of the host CPU.
The nine least significant CPU address bits, in conjunction
with the translation PROM, control all HMP functions. The
translation PROM provides the function-related serial clock
count to the clock control logic, which in turn controls all
HMP timing.
https://ntrs.nasa.gov/search.jsp?R=19810003161 2020-03-21T16:10:05+00:00Z
SANDIA AEROSPACE COMPUTER VERSION 4
(SANDAC IV)
ARCHITECTURE
MC68000 CPU
32 BIT DATA AND ADDRESS REGISTERS
56 INSTRUCTIONS
11 ADDRESSING MODES
MEMORY MAPPED I/O
16 BIT DATA BUS
16 M BYTE ADDRESS SPACE
HARDWARE MULTIPLIER PROCESSOR (HMP)
VECTORED INTERRUPTS
POWER REQUIREMENTS
+5 V @ 3 A TYPICAL
PHYSICAL
EXPANDABLE MODULAR CONSTRUCTION
STACKABLE PIN-SOCKET INTERMODULE BUS
17.8 CM x 15.9 CM x 1.27 CM MODULES
SANDAC IV CPU MODULE
MC68000 CPU
16K BYTE EPROM MEMORY
16K BYTE NON-VOLATILE CMOS RAM
POWER MONITOR & RESET CIRCUIT
SANDAC IV I/O MODULE
4 CHANNEL OPTO-ISOLATED USART SERIAL I/O
8 CHANNEL PRIORITY INTERRUPT CONTROLLER
5 CHANNEL PROGRAMMABLE 16 BIT TIMER/COUNTER
16 BIT MEMORY MAPPED «K WORD) I/O
SANDAC IV HMP MODULE
MEMORY MAPPED REGISTERS AND FUNCTIONS
SINGLE PRECISION (16 BIT) AND DOUBLE PRECISION
(32 BIT) FUNCTIONS
MULTIPLY WITH OPTIONAL ADD OR SUBTRACT TO PREVIOUS
ACCUMULATOR CONTENTS
SCALE BY 2^±N AND ROUND
OVERFLOW DETECTION RELATING TO ACCUMULATION
ADDRESSING ERROR DETECTION
CONTROL FUNCTIONS DERIVED FROM LATCHED ADDRESS BITS
HOST CPU ALLOWED TO PROCEED IN PARALLEL WITH HMP
AUTOMATIC HOLD-OFF OF HOST CPU IF HMP BUSY
HARDWARE MULTIPLIER PROCESSOR
BLOCK DIAGRAM
[Figure not reproduced. Signal labels: SPROD, DPROD, ACCUMULATOR OUT, ROUND PULSE, ERRORS]
HARDWARE MULTIPLIER PROCESSOR
CONTROL BLOCK DIAGRAM
[Figure not reproduced. Labels: FUNCTION CONTROL (A4-A8), DATA TRANSFER ACKNOWLEDGE, CLOCKS]
EXAMPLE HMP ADDRESS MAPPED FUNCTIONS
ADDRESS
(HEX)     FUNCTION
FE00      READ-CLEAR STATUS REGISTER
FE02      READ/WRITE P4
FE04      READ/WRITE P3
FE06      READ/WRITE P2
FE08      READ/WRITE P1
FE0A      READ/WRITE M4
FE0C      READ/WRITE M3
FE0E      WRITE M2
FE0E      READ STATUS REGISTER
FE1E      WRITE M2, S.P. MULTIPLY & ADD
FE3E      WRITE M2, CLEAR ACCUM., S.P. MULT. & ADD
FE5E      WRITE M2, S.P. MULTIPLY & SUBTRACT
FE7E      WRITE M2, CLEAR ACCUM., S.P. MULT. & SUB.
FE8E      WRITE M2
FE90      WRITE M1, D.P. MULTIPLY & ADD
EXAMPLE HMP ADDRESS MAPPED FUNCTIONS
ADDRESS
(HEX)     FUNCTION
FF00      D.P. P REG x 2^-14 & S.P. ROUND
FF1A      D.P. P REG x 2^-1 & S.P. ROUND
FF1C      D.P. P REG x 2^0 & S.P. ROUND
FF1E      D.P. P REG x 2^1 & S.P. ROUND
FF38      D.P. P REG x 2^14 & S.P. ROUND
FF80      Q.P. P REG x 2^-30 & D.P. ROUND
FFBA      Q.P. P REG x 2^-1 & D.P. ROUND
FFBC      Q.P. P REG x 2^0 & D.P. ROUND
FFBE      Q.P. P REG x 2^1 & D.P. ROUND
FFF8      Q.P. P REG x 2^30 & D.P. ROUND
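The listed scale-and-round addresses appear to follow a regular pattern: each step in the scale exponent moves the address by 2, starting at FF00 for the D.P. group and FF80 for the Q.P. group. The sketch below encodes that inferred pattern. The function name is hypothetical, and any address not listed in the table is an extrapolation rather than documented behavior.

```python
DP_BASE, DP_LIMIT = 0xFF00, 14   # D.P. P register, rounded to S.P.
QP_BASE, QP_LIMIT = 0xFF80, 30   # Q.P. P register, rounded to D.P.


def scale_round_address(exponent: int, quad: bool = False) -> int:
    """Return the HMP read address for 'P REG x 2**exponent & ROUND'.

    Assumes the address advances by 2 per exponent step, which matches
    the ten example entries in the table.
    """
    base, limit = (QP_BASE, QP_LIMIT) if quad else (DP_BASE, DP_LIMIT)
    if not -limit <= exponent <= limit:
        raise ValueError(f"exponent must be within +/-{limit}")
    return base + 2 * (exponent + limit)


for n in (-14, -1, 0, 1, 14):
    print(f"D.P. x 2^{n:+d} -> {scale_round_address(n):04X}")
for n in (-30, -1, 0, 1, 30):
    print(f"Q.P. x 2^{n:+d} -> {scale_round_address(n, quad=True):04X}")
```

Encoding the exponent in the address is what lets a single read both select the scale factor and fetch the rounded result, with no separate command write.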
FUNCTION EXECUTION TIME
FUNCTION                            EXECUTION TIME
S.P. MULTIPLY & ACCUMULATE          2.38 µs
D.P. MULTIPLY & ACCUMULATE          4.38 µs
D.P. SCALE 2^-14 & S.P. ROUND       3.31 µs
D.P. SCALE 2^0 & S.P. ROUND         2.44 µs
D.P. SCALE 2^14 & S.P. ROUND        1.56 µs
Q.P. SCALE 2^-30 & D.P. ROUND       6.31 µs
Q.P. SCALE 2^0 & D.P. ROUND         4.44 µs
Q.P. SCALE 2^30 & D.P. ROUND        2.56 µs
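The scale-and-round times at the two ends of each range are consistent with a fixed cost plus a constant increment per exponent step: (3.31 - 1.56) / 28 and (6.31 - 2.56) / 60 both equal 0.0625 µs. The sketch below is a simple linear model built on that observation. It is an inference from the listed times, not a statement from the source, and the function name is hypothetical. A 62.5 ns step would correspond to a 16 MHz serial clock, but the clock rate is not given in the text.

```python
STEP_US = 0.0625  # inferred time per exponent step, in microseconds


def scale_round_time_us(exponent: int, quad: bool = False) -> float:
    """Estimate scale-and-round time from the fastest listed case.

    The fastest case is the largest positive exponent: 1.56 us at 2^14
    for D.P. and 2.56 us at 2^30 for Q.P.
    """
    fastest_us, limit = (2.56, 30) if quad else (1.56, 14)
    if not -limit <= exponent <= limit:
        raise ValueError(f"exponent must be within +/-{limit}")
    return fastest_us + STEP_US * (limit - exponent)


print(f"{scale_round_time_us(-14):.2f}")             # 3.31
print(f"{scale_round_time_us(-30, quad=True):.2f}")  # 6.31
```

For the 2^0 cases the model gives 2.435 µs and 4.435 µs, that is, roughly 2.44 µs and 4.44 µs.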
SANDAC IV BENCHMARK EQUATION
A11 = B11·C11 + B12·C21 + B13·C31 + K
NOTE: ALL TERMS ARE 32 BIT FIXED POINT.
CONFIGURATION                       EXECUTION TIME
MC68000 CPU @ 8 MHZ                 235 µs
(SUBROUTINE SOLUTION)
MC68000 CPU @ 8 MHZ + HMP           31 µs
(HMP SOLUTION)
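The two benchmark times can be checked against the "approximately seven times" throughput claim in the abstract. The short calculation below assumes that both figures are in microseconds; the constant names are illustrative.

```python
CPU_ONLY_US = 235.0   # MC68000 at 8 MHz, subroutine solution
WITH_HMP_US = 31.0    # MC68000 at 8 MHz plus HMP

speedup = CPU_ONLY_US / WITH_HMP_US
print(f"speedup = {speedup:.1f}x")  # speedup = 7.6x
```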
BENCHMARK EQUATION MACRO INSTRUCTION SOLUTION
A11 = B11·C11 + B12·C21 + B13·C31 + K
SOURCE CODE:
LQPP   K            /LOAD Q.P. CONSTANT
DPMA   B11, C11     /D.P. MULTIPLY & ADD
DPMA   B12, C21     /D.P. MULTIPLY & ADD
DPMA   B13, C31     /D.P. MULTIPLY & ADD
DPSRM  0, A11       /QUAD P. SCALE, ROUND & MOVE
BENCHMARK EQUATION MACRO EXPANSION
A11 = B11·C11 + B12·C21 + B13·C31 + K
ASSEMBLER EXPANSION:
MACRO             MC68000 MNEMONICS      COMMENT
LQPP  #0, K       MOVE.L 0, FE02         /LOAD Q.P. CONSTANT
                  MOVE.L K, FE06
DPMA  B11, C11    MOVE.L B11, FE0A       /D.P. MULTIPLY & ADD
                  MOVE.L C11, FE8E
DPMA  B12, C21    MOVE.L B12, FE0A       /D.P. MULTIPLY & ADD
                  MOVE.L C21, FE8E
DPMA  B13, C31    MOVE.L B13, FE0A       /D.P. MULTIPLY & ADD
                  MOVE.L C31, FE8E
DPSRM 0, A11      MOVE.L FFBC, A11       /QUAD P. SCALE, ROUND & MOVE
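The expansion above maps each macro onto one or two long-word moves to fixed HMP addresses. A small Python sketch of such an expander is shown here to make the mapping explicit. The function names mirror the macro names but are otherwise hypothetical, operands are passed through as plain text, and the DPSRM address formula (FF80 plus 2 per exponent step, offset by 30) is inferred from the example address table rather than stated in the source.

```python
def lqpp(high: str, low: str) -> list[str]:
    """Load a quad precision constant: upper half at FE02, lower half at FE06."""
    return [f"MOVE.L {high}, FE02", f"MOVE.L {low}, FE06"]


def dpma(multiplicand: str, multiplier: str) -> list[str]:
    """Double precision multiply and add; the write at FE8E starts the operation."""
    return [f"MOVE.L {multiplicand}, FE0A", f"MOVE.L {multiplier}, FE8E"]


def dpsrm(exponent: int, dest: str) -> list[str]:
    """Quad precision scale by 2**exponent, round to double precision and move."""
    address = 0xFF80 + 2 * (exponent + 30)   # inferred address pattern
    return [f"MOVE.L {address:04X}, {dest}"]


program = (
    lqpp("0", "K")
    + dpma("B11", "C11")
    + dpma("B12", "C21")
    + dpma("B13", "C31")
    + dpsrm(0, "A11")
)
print("\n".join(program))
```

The whole equation reduces to nine MOVE.L instructions, which is why the summary attributes execution time mainly to CPU memory access time.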
SUMMARY
EFFECTIVELY EXPANDS HOST CPU INSTRUCTION SET
EASY INCORPORATION INTO ANY 16 BIT SYSTEM
HIGH PERFORMANCE DUE TO SIMULTANEOUS DATA & COMMAND
TRANSFER BY HOST CPU
SERIAL ARITHMETIC APPROACH REDUCES COMPONENT COUNT
EQUATION EXECUTION TIME PRIMARILY DEPENDENT ON CPU
MEMORY ACCESS TIME
STRAIGHTFORWARD SOFTWARE CONTROL
SINGLE µP CPU PLUS HMP PROVIDES PERFORMANCE COMPARABLE
TO BIPOLAR BIT-SLICE DESIGNS
