An implementation and evaluation of a minimal-component minimal-power microcomputer system using Rockwell's AAMP by Nelson, Eric.
AN IMPLEMENTATION AND EVALUATION OF A
MINIMAL-COMPONENT MINIMAL-POWER MICROCOMPUTER
SYSTEM USING ROCKWELL'S AAMP
by
Eric Nelson
B.S., Kansas State University, 1985
A. G. S., Colby Community Junior College, 1983
A MASTER'S THESIS
submitted in partial fulfillment of the
requirements for the degree
MASTER OF SCIENCE
Department of Electrical and Computer Engineering
KANSAS STATE UNIVERSITY
Manhattan, Kansas
1987
Approved by:
Acknowledgements
This work was sponsored and funded by the Base and
Installation Security Systems Program Office, Electronics
System Division of the Air Force Systems Command, Hanscom
Air Force Base, MA 01731 through the Systems Engineering
Division, Organization 5238, Sandia National Laboratories,
Kirtland Air Force Base, Albuquerque, New Mexico.
I would like to thank the professors who served on my
committee, Dr. S. Dyer and Dr. J. G. Thompson, and
especially Dr. D. H. Lenhert, who provided regular
consultation on the project. I would also like to thank
Ken Albin who provided information and insight into the
workings of the AAMP.
Finally, thanks are extended to Gail Navinsky who did
my laundry during a busy semester and provided brownies to
my committee.
CONTENTS
Page
List of Figures vi
List of Tables vii
1. Introduction 1
2. Description of AAMP 3
2.1 Hardware 3
2.2 Packaging 4
2.3 Architecture 5
2.4 Instruction Set 9
3. Minimal Component AAMP Board 11
3.1 Objectives of Design 11
3.2 Memory 15
3.3 Address Decoding 16
3.4 I/O 18
3.5 Inverter/Oscillator 21
4. System Power Consumption 27
4.1 AAMP 29
4.2 Oscillator 29
4.3 Support 31
4.4 Wait State Operation 34
5. Software Evaluation 39
5.1 Verification of output 39
5.2 Speed optimization 40
5.3 Accuracy of estimates 42
5.4 Timing Estimates 42
6. Conclusions 45
References 47
APPENDIX A - Pin Assignment for AAMP 48
APPENDIX B - Instruction Set Timing Estimates 52
APPENDIX C - Printed Circuit Board Layout 57
APPENDIX D - Software Listings 67
D.1 Fractional 16-bit Precision
-Loop Coding 68
D.2 Fractional 16-bit Precision
-Inline Coding 78
D.3 Standard Precision Floating Point
-Loop Coding 83
D.4 Standard Precision Floating Point
-Inline Coding 91
D.5 Extended Precision Floating Point
-Loop Coding 94
D.6 Extended Precision Floating Point
-Inline Coding 103
LIST OF FIGURES
1. Addressing Using CENV and DENV Pointers 7
2. Block Diagram of Microcomputer System 13
3. Circuit Diagram of Microcomputer System 14
4. Memory Map 16
5. System Timing 17
6. Circuit for Software Confirmation 20
7. Transfer Function of Inverters 22
8. Phase Shift in Parallel Resonant Oscillator 23
9. Circuit Compensating for Gate Delay 24
10. Power Consumption of System 28
11. External Oscillator Circuit Used for System 30
12. Configuration for On-chip AAMP Oscillator 31
13. Power Consumption of Support Components 32
14. Effective Input Circuits of D Flip-flops 33
15. Wait State Circuitry Maintaining Two Alarm
Outputs 34
16. Wait State Circuitry Converting Alarm
Output to Controller 36
17. Power Consumption in High-impedance Mode 38
18. Half-Handshake Controller for GPIO Interface 40
A.1 Pin Assignment for 68 pin PGA AAMP 48
C.1 Assembly Drawing 58
C.2 Layer 1 -Component Side 59
C.3 Layer 2 -Ground Plane 60
C.4 Layer 3 61
C.5 Layer 4 62
C.5 Layer 5 -Ground Plane 63
C.5 Layer 6 -Circuit Side 64
C.6 Keepout Areas For Power and Ground Planes 65
C.7 Silkscreened Text on Component Side 66
LIST OF TABLES
1. Executive Entry Table 6
2. Opcode to Stack Depth Mapping 10
3. Algorithms used for Software Evaluation 43
4. Timing Estimates For Digital Filtering
Sub-programs 44
A.1 AAMP Pin Assignments by Functions 49
B.1 Timing Estimates for Instruction Set 53
1. Introduction
In the past several years, Kansas State University has
designed and developed several ultra-low-power analog-to-
digital (A/D) converters. These converters typically consumed
far less power than the signal processing sections of
integrated systems and provided better resolution than
16-bit single-precision processors could maintain. The
Advanced Architecture Microprocessor (AAMP), Rockwell's
CMOS floating-point processor, provides solutions to both of
these problems. This thesis describes an AAMP-based
microcomputer system with a measured power
consumption of under 25 milliwatts at the suggested
operating frequency of 2.6 MHz. Use of the AAMP also
supplies the system with a capacity for 32- and 48-bit
floating-point arithmetic, the dynamic range and extended
precision of which should allow the processor to maintain
precision from the A/D converter through several stages of
multiplication without truncation errors.
Previous work at Kansas State University involving the
AAMP is presented in a thesis by K. L. Albin [1], which
documents the architecture and codes some typical signal
processing algorithms, and in a thesis by G. S. Mauersberger
detailing a large-scale AAMP-based microcomputer design which
he built and tested. Mike Gaches also designed a board using
the AAMP which could be a direct replacement for 8086
boards presently in use. The system described in this
thesis was also originally designed by Mike Gaches, with
several changes having been made since that time.
This thesis begins with a brief description of the
AAMP detailing hardware and architecture features which
were important in the design and evaluation of this system.
Details of the design are discussed with separate sections
provided for each sub-system. Power consumption figures
for the system and most components of the system are shown
and discussed. Finally, listings of several typical
signal processing code segments are presented, with an
estimated execution time given for each.
2. Description of AAMP
The following section provides an overview of the
hardware and software features of the AAMP with emphasis on
those aspects utilized in the present design. Much more
thorough descriptions of these are given in the documents
provided by Rockwell [3] and in the previous theses dealing
with the AAMP [1, 2].
2.1 Hardware
The AAMP is a 16-bit floating-point microprocessor
built in either 2-micron CMOS or 2-micron CMOS/SOS. Data
transfer is done through a 24-bit address, 16-bit data,
non-multiplexed bus with the AAMP supporting either
synchronous or asynchronous data transfer. When operated
in synchronous mode, as in this design, the AAMP provides
selection of bus timing parameters. By strapping the two
setup select pins (S0, S1) high or low, the user can select
the total time required for a complete bus read cycle, which
may allow the use of slower memory devices while still
maintaining processing speed.
The AAMP also has provisions which allow straight-
forward bus arbitration schemes to be used in large, shared
bus, multiprocessor systems. When deselected by OE being
held high, all bus outputs from the AAMP are tri-stated
allowing other users complete access to a system bus. The
Bus Request (BR) pin is, however, still active and may be
used to poll a master controller device.
Although the on-chip oscillator is rated for use from
4 to 20 MHz, use of an external oscillator can extend the
operating range from DC to 30 MHz. The on-chip oscillator
was, however, found to operate at frequencies down to 1 MHz,
although its power consumption at the low end was relatively
high compared with that of an external oscillator.
2.2 Packaging
The AAMP is presently being packaged in two different
forms, a chip carrier package and a pin grid array (PGA)
package. The most common package, and the one used in this
design, is the 1.1" by 1.1" 68-pin PGA package. Early
versions of the AAMP had a different pin assignment than
the present version. Appendix A shows the pin assignment
for an early CMOS/SOS version, the Bulk CMOS version used
in the prototype, and for the new, top cavity device.
Current PGA versions also have an alignment pin at the C-3
location.
Bulk CMOS AAMPs are currently being produced by
Rockwell's Semiconductor Products Division (SPD) and by
American Microsystems Incorporated (AMI). For radiation
hardened applications, CMOS/SOS versions of the AAMP are
still being produced by Rockwell's Microelectronics
Research and Development Center (MRDC).
Future plans by Rockwell for changes in the AAMP
include a shrink to 1.2 microns, which should allow a 50 MHz
operating frequency. Rockwell also plans to begin designing,
in December 1986, a new AAMP with a 48-bit ALU, a 6-deep
stack cache, a block move instruction, and improved
microcoding of the multiply and divide operations.
2.3 Architecture
The stack architecture of the AAMP was designed to
ease translation from the intermediate level output of high
level language compilers to assembly language. The AAMP
utilizes a stack architecture in which all logic and
arithmetic operations are performed on the top members of
the stack with the result being returned back to the top of
the stack. To lessen the need for constant bus accesses to
obtain each operand of an operation and to return the
result, the AAMP maintains an on-chip cache of the top four
elements of the stack.
Programs can be run in either Executive or User
mode, with Executive mode generally being used for
initialization and control of program transfers to the
various User programs. The first nine words of memory
contain entries, called the Executive Entry Table, which set
the values for the various environment pointers and define
vectors for start-up, bus error, exception, trap, and
interrupt routines. Upon invoking a program in Executive
mode, the first four of these entries are read, while the
rest are read only as needed. A similar table, called the
User Processor State Descriptor (PSD) Table, is used
upon a switch to User mode. All software
evaluation for this system was done with programs written
in the Executive mode. Table 1 shows the items listed in
the Executive Entry Table.
Table 1. Executive Entry Table
$0000 Continuation Status Pointer
$0001 Initial Executive Stack limit
$0002 Initial Executive Top of Stack
$0003 Initial Executive Procedure Identifier
$0004 Bus Error Procedure Identifier
$0005 NMI Procedure Identifier
$0006 INT Procedure Identifier
$0007 Trap Procedure Identifier
$0008 Exception Procedure Identifier
To aid in accessing a 24-bit address bus with 16-bit
words, the AAMP provides environment pointers.
Effective addresses are formed by using these pointers as
the top eight or the top nine bits of the address and a
normal 16-bit word for the lower address bits (Figure 1).
On data accesses, the Data Environment pointer (DENV)
provides the upper eight bits for all but the Universal
addressing mode, in which all 24 bits are taken from the top
of the stack.
[Figure 1: DENV (8 bits) followed by a 16-bit address within
the environment; CENV (9 bits) followed by the 16-bit program
counter, the lowest bit of which specifies the high or low
byte of the code word.]
Figure 1. Addressing Using DENV and CENV Pointers
Code addressing requires 25 bits as each 16-bit code
word often contains two separate single-byte opcodes. The
Code Environment pointer (CENV) provides the upper nine
bits of address. The 16-bit program counter provides the
lower bits, the lowest of which specifies high or low byte
within a 16-bit word of code. CENV and DENV are
automatically set to zero in the Executive mode.
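
The address formation described above can be sketched in a few lines of Python. This is only an illustrative model of the bit layout given in the text (8-bit DENV or 9-bit CENV above a 16-bit word); the function names are hypothetical, and which value of the byte-select bit means "high" or "low" byte is not stated here, so the bit is returned as-is.

```python
def data_address(denv, offset):
    """Form a 24-bit data address: DENV supplies the upper 8 bits,
    a normal 16-bit word supplies the lower 16 bits."""
    if not 0 <= denv < (1 << 8):
        raise ValueError("DENV must fit in 8 bits")
    if not 0 <= offset < (1 << 16):
        raise ValueError("offset must fit in 16 bits")
    return (denv << 16) | offset


def code_address(cenv, pc):
    """Form a 25-bit code byte address from the 9-bit CENV and the
    16-bit program counter, then split it into a 24-bit word address
    and the single bit that selects a byte within the code word."""
    if not 0 <= cenv < (1 << 9):
        raise ValueError("CENV must fit in 9 bits")
    if not 0 <= pc < (1 << 16):
        raise ValueError("program counter must fit in 16 bits")
    byte_address = (cenv << 16) | pc      # 9 + 16 = 25 bits
    return byte_address >> 1, byte_address & 1
```

With CENV and DENV both zero, as in Executive mode, `data_address(0, x)` is simply `x` and the code word address is the program counter shifted right by one bit.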
Also maintained within the processor is a Local
Environment (LENV) pointer, a 16-bit word which in
combination with DENV, points to the top of the Local
Environment, a portion of memory at the bottom of the
active stack frame set aside for quick access. Address
calculation within the Local Environment is very efficient
requiring a minimum of bus accesses since only the offset
into the environment must be specified. When using Local
environment addressing, this offset is specified by the
lower nibble within the opcode itself. While using Local
Extended addressing, the offset is specified by a single
byte following the opcode.
Another powerful feature of the AAMP's architecture is
its dynamic memory allocation and parameter passage. Prior
to a CALL to a new procedure, the calling procedure may
copy into its stack a list of parameters to be passed to
the called routine. Upon invocation of the CALL
instruction, a user-specified number of arguments on the
top of the stack become the bottom values of the new
procedure's Local Environment and can be efficiently
accessed as local variables. Upon return to the original
procedure, a user-specified number of elements from the top
of the called procedure's stack are copied back to the
caller's stack. This makes parameter passage extremely
efficient and nearly automatic.
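
A rough software model of this parameter passage is sketched below. It is an assumption-laden illustration, not a description of the AAMP microcode: stacks are modeled as Python lists with the top at the end, the arguments are assumed to leave the caller's stack and keep their order in the new Local Environment, and the names `call_procedure` and `add_pair` are hypothetical.

```python
def call_procedure(caller_stack, n_args, procedure, n_results):
    """Model a CALL: the top n_args entries of the caller's stack become
    the bottom of the callee's Local Environment; on return, the top
    n_results entries of the callee's stack are copied back."""
    if n_args > len(caller_stack):
        raise ValueError("not enough arguments on the caller's stack")
    split = len(caller_stack) - n_args
    remaining = caller_stack[:split]       # caller's stack below the arguments
    local_env = caller_stack[split:]       # arguments passed to the callee
    callee_stack = procedure(list(local_env))
    if n_results > len(callee_stack):
        raise ValueError("callee stack holds fewer values than requested")
    results = callee_stack[len(callee_stack) - n_results:]
    return remaining + results


def add_pair(local_env):
    """Example callee: push the sum of its two local variables."""
    return local_env + [local_env[0] + local_env[1]]
```

For instance, calling `add_pair` with two arguments and one result turns a caller stack of `[10, 2, 3]` into `[10, 5]`.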
2.4 Instruction Set
The AAMP comes with a very powerful instruction set.
Included on-chip are 32- and 48-bit floating point add,
subtract, multiply, and divide operations which make the
AAMP a very capable single chip processor. These long
operations, however, have execution rates which depend upon
the data being processed, making exact timing prediction
almost impossible. Appendix B provides a listing,
generated by a program written by N. M. Mykris, showing
typical times of execution for the entire instruction set.
Variable length instructions are indicated with an equation
which shows dependence upon variables.
A major concern when estimating time of execution for
programs is that of "stack thrashing". Upon the loading of
an opcode into the micro-engine of the AAMP, a bit-mapping
is performed which determines how many items must be on the
stack in order for the impending operation to take place.
This bit-mapping is optimal in most cases. In a few cases,
however, unnecessary stack updating is performed, during
which data is read from or written to memory and the
internal cache is rotated until the desired internal stack
depth is achieved. These operations are very time consuming
and can drastically slow down a non-optimized piece of code.
By carefully selecting an appropriate order of execution and
the proper opcodes, stack thrashing can, however, be
minimized. Table 2, borrowed from K. L. Albin [1], shows the
bit-mapping and required stack depth for each instruction.
Opcodes    Stack depth allowed
00-1F      0-3
20-3F      0-2
40-5F      1-4
60-7F      2-2
80-9F      4-4
A0-BF      3-4
C0-DF      2-4
E0-FF      2-4
Table 2. Opcode to stack depth mapping
(from K. L. Albin [1])
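
To make the use of Table 2 concrete, the following sketch looks up the allowed cache depth for an opcode and reports how far the on-chip stack cache would have to be adjusted before the instruction can run. It assumes the table entries are inclusive minimum and maximum cache depths selected by the top three opcode bits; the function names are hypothetical and the cost of each adjustment in clock cycles is not modeled.

```python
# Allowed (min, max) cache depth for each block of 32 opcodes, from Table 2.
DEPTH_RANGES = [
    (0, 3),  # 00-1F
    (0, 2),  # 20-3F
    (1, 4),  # 40-5F
    (2, 2),  # 60-7F
    (4, 4),  # 80-9F
    (3, 4),  # A0-BF
    (2, 4),  # C0-DF
    (2, 4),  # E0-FF
]


def allowed_depth(opcode):
    """Return the (min, max) cache depth permitted for a one-byte opcode."""
    if not 0 <= opcode <= 0xFF:
        raise ValueError("opcode must be a single byte")
    return DEPTH_RANGES[opcode >> 5]


def cache_adjustment(opcode, depth):
    """Positive: entries that must be read in from memory.
    Negative: entries that must be written out to memory.
    Zero: the instruction can proceed without stack updating."""
    low, high = allowed_depth(opcode)
    if depth < low:
        return low - depth
    if depth > high:
        return high - depth
    return 0
```

Summing the absolute adjustments over a candidate instruction sequence gives a crude way to compare two orderings for stack thrashing.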
3. Minimal Component AAMP Board
This section describes an AAMP-based microcomputer
system designed and tested at Kansas State University for
Sandia National Laboratories. The system described in this
section is the one used for power consumption measurements.
For software testing, the system was modified to allow the
output of data for verification of proper program
execution.
3.1 Objectives of Design
The proposed use for the minimal component AAMP board
is to provide the signal processing section of an ultra-low
powered, "shirt pocket" sized helicopter noise detection
system. The design was specified to minimize parts count
and, most importantly, to minimize power usage.
Since the frequencies of interest for the detection of
a helicopter are very low, 10 to 38 Hz [6], sampling by the
A/D portion of the system can be done at a low rate.
Consequently, data rates into the signal processing portion
are relatively low. This allows the system clock frequency
to be set low and aids in minimizing the power usage
of CMOS parts, whose power consumption is almost directly
proportional to switching frequency.
The AAMP has a specified frequency range of 4 to
20 MHz and is generally used as a high-speed
microprocessor. Although the AAMP's speed is not fully
utilized in this design, its powerful instruction set and
low-power CMOS design make it the microprocessor of
choice.
Sampled data is provided to the system through two
16-bit buffered inputs. The only outputs from the system
are two flip-flops, one for each channel, to be set upon
the detection of a helicopter. A block diagram of the
system is shown in Figure 2 and a circuit diagram is shown
in Figure 3. The system upon which all testing has been
done was constructed using wire-wrap techniques.
Armando Corrales and Jim Heise have designed a printed
circuit board which places the entire system, less the
input buffers which may not be needed in a final system, on
a 3" by 4" board. The design was done in six layers and is
shown in Appendix C.
The original design and parts selection for the
minimal component board were performed by Mr. Gaches in
the Fall of 1985. Since the original design, changes have
been made in the chip select/address decode circuit and in
the approach taken to set the alarms.
The support components for the board can be classified
into four subsystems: address decoding, memory,
input/output, and clock. Each section, as assembled in the
test system, is discussed separately below.
Figure 2. Block Diagram of Microcomputer System
Figure 3. Circuit Diagram of Microcomputer System
3.2 Memory
Program memory space for the system is provided by
two National Semiconductor NM27C32 4096 x 8 bit UV-erasable
CMOS EPROMs connected side by side to provide a 16-bit-wide
data path. Upon assertion of RST low, occurring
either on power-up or during a manual reset, the AAMP reads
the Executive Entry Table from the lowest ten words of
memory. Figure 4 shows the system memory map along with
the location of these RST pointers. The function of these
pointers is explained in the Architecture section and
typical values are used in the Software section.
System RAM space is provided by two Hitachi HM6116ALP-20
2048 x 8 bit, 200 nsec static CMOS RAMs, also
connected side by side to provide a 16-bit data path.
Addressing for the RAM is also shown in Figure 4.
3.3 Address Decoding
System address decoding and timing are performed by a
National Semiconductor MM74HC138 3-to-8 line decoder. A
system timing diagram is shown in Figure 5. Chip enable
for a specific component occurs upon the combination of
XRQ/XAK high, BG/BR low, and a corresponding address. Under
this configuration, the AAMP has control of the bus at all
times (chosen by tying BG to BR), data transfer is
synchronous (chosen by connecting XAK to XRQ), and a
complete bus transaction requires 8 clock cycles (set by
strapping mode select pins S1 and S2 to ground).
[Figure 4: memory map of the system, with the Executive Entry
Table at $0000 to $0008 followed by procedure header and
code, and the stack region showing the active stack frame,
its stack mark (program counter, CENV, PROCID, LENV), and
the Local Environment entries.]
Figure 4. Memory Map
[Figure 5: timing diagram showing Y0, BG/BR, address,
XRQ/XAK, the chip enables (CS for RAM; CE, OE for EPROM; CK
for alarms; G1, G2 for buffers), R/W, data on write, and data
on read, annotated with Tcx, Tpr, Tce/Toe, Tdv, and Toh.]
Figure 5. System Timing
The most restrictive of the system timing equations pertains to a
read from the EPROM. The equation is as follows:
$$T_{cx(max)} + T_{pr(max)} + T_{ce(max)} + T_{dv} \le 4\,T_{cyc}$$
where $T_{cyc}$ is the period at the oscillator input Y0,
$T_{cx}$ is the time from a rising clock edge to the
assertion of XRQ,
$T_{pr}$ is the propagation delay through the decoder,
$T_{ce}$ is the time from chip enable until valid data is
available from memory, and
$T_{dv}$ is the length of time that valid data must be
held prior to a data read by the AAMP.
Substitution of values from the AAMP Reference
Manual [3] and from the National Semiconductor CMOS Databook
revealed that for a 450 nsec EPROM, maximum frequency is
6.78 MHz, while for a 350 nsec EPROM, maximum frequency is
8.16 MHz. All other system timing equations are less
restrictive. National Semiconductor has promised a 200
nsec version, due to be available in January 1987, which
speculatively could raise the maximum frequency to 11.8
MHz.
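
The arithmetic behind these figures can be sketched as follows. The individual values of Tcx, Tpr, and Tdv are not listed in this section, so the sketch lumps them into a single overhead of about 140 nsec; that number is an assumption back-calculated from the three quoted frequencies, not a data sheet value, and the function name is hypothetical.

```python
def max_clock_mhz(t_ce_ns, t_overhead_ns=140.0, cycles=4):
    """Largest clock frequency (MHz) satisfying
    Tcx + Tpr + Tce + Tdv <= cycles * Tcyc,
    where t_overhead_ns stands in for Tcx + Tpr + Tdv (assumed value)."""
    total_ns = t_overhead_ns + t_ce_ns
    if total_ns <= 0:
        raise ValueError("total delay must be positive")
    # cycles / total_ns is in GHz; multiply by 1000 to express it in MHz.
    return cycles * 1000.0 / total_ns


for access_time_ns in (450, 350, 200):
    print(access_time_ns, "nsec EPROM:",
          round(max_clock_mhz(access_time_ns), 2), "MHz")
```

Under the assumed overhead, the three EPROM access times give results that agree with the 6.78, 8.16, and 11.8 MHz figures to within rounding.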
3.4 I/O
The two 16-bit input channels of the system are each
provided by two National Semiconductor MM74HC541 octal
tri-state buffers. Addressing for these is also shown in the
memory map. For software testing, one of these input
channels was changed to an output channel so that proper
program execution could be confirmed. This was done by
replacing two of the input buffers with MM74HC573 octal
latches and inverting the existing address decoding to
latch valid data from the system data bus (Figure 6).
The original design of the system specified the alarm
circuit to be two JK flip-flops in toggle mode using the
address decoder for activation. This design was discarded
since any noise in the system could have caused the flip-
flops to toggle and since the microprocessor would not then
know in which state the flip-flop had landed. Instead, two
D flip-flops, activated by the address decoder and with
inputs connected to the data bus were used. Under this
configuration, the state of the alarm is controlled with a
much greater confidence of outcome.
The type of D flip-flop was then chosen on the basis
of power consumption. The choices were, using National
Semiconductor parts, the MM74HC74, a 14-pin dual D flip-
flop with separate preset and clear, or the MM74HC174, a
16-pin hex-package D flip-flop with single clear and no
preset. On the basis of lower power consumption, the
MM74HC174, although in a larger package and having extra
components, was chosen. The reasons for this choice are
explained in the Power Consumption section.
Figure 6. Circuit for Software Confirmation
3.5 Inverter/Oscillator
Since the AAMP's on-chip oscillator is rated for
operation from 4 to 20 MHz and since the system may be
working at frequencies below this, an external oscillator
circuit was used. The circuit used is a parallel resonant
gate oscillator which utilizes a CMOS inverter as a driver
and another inverter to "square up" the waveform. The
particular inverter chip to use is dependent upon the
system's frequency of operation.
A standard (54C/74C family) CMOS inverter will use
less power and make a more stable driver for an oscillator
than will a High Speed (HC) CMOS part. This can be
explained by examining the transfer characteristics of each
device. Standard CMOS parts have a smooth linear region
with constant although low gain whereas HC parts have very
high gain with the switching region nonlinear and suffering
from massive jitter. See Figure 7.
These non-linear portions of the waveform for the HC
parts inject higher frequencies into the loop causing the
oscillator to operate at an overtone of the crystal
frequency. The overtone problems seemed to vanish when
using crystal frequencies above 4 MHz making HC parts an
acceptable choice for higher frequency oscillators.
Unbuffered high-speed CMOS (HCU) parts, which are
touted for use in gate oscillator circuits, were also
tested. The transfer function for the HCU parts was smooth
and linear with little jitter, and the output of an HCU
oscillator was a very square waveform with short
rise and fall times. Power consumption for the HCU parts
was, however, very high, and further testing was not deemed
useful.
[Figure 7 panels: Standard CMOS, x = 1 Volt/div,
y = 1 Volt/div; High-Speed CMOS, x = 1 Volt/div,
y = 0.1 Volt/div.]
Figure 7. Transfer Function of Inverters
In order for an oscillator to operate in the parallel
resonant mode, there must be a 360° phase change around
the loop, 180° of which must occur in each half of the
circuit. See Figure 8.
[Figure 8: model of the oscillator loop and the circuit used,
with 180° of phase shift in each half of the loop; the
circuit used shows the inverter, a 10 M resistor, the
crystal, and capacitors C1 and C2.]
Figure 8. Phase Shift in Parallel Resonant Oscillator
(adapted from Holmbeck)
In this system, the feedback loop consists of a
crystal connected in parallel with a large-valued resistor.
The tie-down capacitors at each terminal of the crystal
serve to match the effective load capacitance of the circuit
with that of the crystal, while the resistor acts to force
the gate into its linear conducting region. At high
frequencies, the propagation delay through the inverter
causes the following phase shift:
$$\theta = f \cdot t_p \cdot 360^\circ$$
where f is the oscillator frequency and t_p is the
propagation delay through the inverter.
For a 3 MHz oscillator using the worst-case
propagation delay of 90 nsec for standard CMOS, this value
is 97.2°. To compensate for this inductive phase change
(output current lags input voltage), a capacitor may be
placed in the loop as shown in Figure 9.
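
The 97.2° figure can be checked with a short calculation. The sketch below simply evaluates frequency times propagation delay times 360°; the function name is hypothetical.

```python
def gate_phase_shift_deg(freq_hz, t_pd_s):
    """Phase shift (degrees) contributed by an inverter's propagation
    delay: the fraction of one period consumed by the delay, times 360."""
    if freq_hz < 0 or t_pd_s < 0:
        raise ValueError("frequency and delay must be non-negative")
    return freq_hz * t_pd_s * 360.0


# Worst-case standard CMOS delay of 90 nsec at 3 MHz, as quoted in the text.
print(round(gate_phase_shift_deg(3e6, 90e-9), 1))
```

Because the shift grows linearly with frequency, the same 90 nsec delay already costs about a quarter of a cycle near 2.8 MHz, which is consistent with the standard CMOS oscillator becoming marginal around 3 MHz.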
Figure 9. Circuit Compensating for Gate Delay
A suggested value for Cf is 1/Ceq [8], where Ceq is the
input impedance viewed from the output of the gate into the
crystal feedback network. Adding this component to the
circuit did increase the useful bandwidth of the
system; however, power consumption also increased.
Instead, the value of C2 was raised and the value of C1 was
decreased. This was done to increase Ceq and minimize
the value needed for the now-missing Cf while maintaining a
balanced load across the crystal.
Using this type of capacitive phase shifting, the
standard CMOS inverter oscillator was found to be very
reliable up to 3 MHz. Above 3 MHz, the slow rise time and
large propagation delay of the standard CMOS became so
dominant that the waveform no longer attained the AAMP's
required external oscillator input voltage swing of
0.6 V maximum on the low cycle to 4.2 V minimum on the high
cycle.
The inverters are also used to produce a delayed reset
on power-up. Rockwell suggests that the reset should
remain low for around 1000 clock cycles prior to the
microprocessor being enabled which would dictate a reset
time of around 1 millisecond for the lowest clock speed
used. However, the power supplies used in the laboratory
require nearly 4 milliseconds to reach 5 volts when
warm and nearly 20 milliseconds to reach 5 volts when
started cold. Taking this into account and designing the
reset to be delayed by a sufficiently long time resulted in
such a gradually rising output from the RC network that
upon switching from high to low the output of a single-
buffered network chattered. The problem is solved by
triple-buffering the reset signal from the RC network to
the Reset input on the AAMP. This raises the total gain
and causes the switching to be more abrupt, disallowing
chattering at the output.
In conclusion, while operating at frequencies below
3 MHz, a National Semiconductor MM74C04 hex inverter should
be used, and while operating above 3 MHz, an MM74HC04 hex
inverter should be used. These chips are pin-for-pin
compatible and can easily be interchanged.
4. System Power Consumption
Power consumption testing for the system was
performed by breaking into the power bus and connecting a
Fluke 8010A digital multimeter in series with the supply
voltage to the component under test and measuring the
current. To obtain an instruction mix which should be
typical for the proposed application, the Standard Widrow
Adaptive Linear Predictor algorithm, originally coded for
the AAMP by K. L. Albin [1], was used.
Data input to the system was provided by simply tying
the inputs to ground. This prevented peripheral devices
from affecting consumption readings but also allowed many
flip-flops to remain unswitched and possibly caused a lower
power reading than real data input should cause. For this
reason and since power consumption for a particular
component varies from chip to chip, the figures presented
here should be used for comparisons and not as a guaranteed
rating.
This section discusses the power usage of the system.
Parts selections made on the basis of power consumption are
discussed in more detail here and a type of oscillator is
suggested for each portion of the system's frequency range.
Figure 10 shows the power supply current versus
operating frequency for the total system, the AAMP, and, if
used, the external oscillator circuit. All system
components run on a five volt supply, so conversion from
current to power consumption requires a multiplication by
5. A straight line has been fit to each data set, indicating
the trend for the data in the reliable region of each
configuration. Although worst-case timing showed that the
system should only be expected to operate up to 8.16 MHz,
testing of power consumption was continued up to 10 MHz.
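
The current-to-power conversion and the straight-line fit can be sketched as below. The least-squares routine is a generic textbook fit and is only assumed to resemble the fitting actually used; no measured data points are reproduced here, and the function names are hypothetical.

```python
SUPPLY_VOLTS = 5.0  # all system components run from a five volt supply


def power_mw(current_ma):
    """Convert a supply current in mA to power in mW at 5 V."""
    return SUPPLY_VOLTS * current_ma


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x
```

As a point of reference, the under-25-milliwatt figure quoted in the introduction corresponds to a supply current of under 5 mA. In a fit of current against frequency, the intercept is the y-intercept discussed in the following sections.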
[Figure 10: power supply current versus operating frequency
(MHz) for the bulk CMOS AAMP system, with traces for the
total system, the AAMP, and the oscillator under the standard
CMOS, HC, and on-chip oscillator configurations.]
Figure 10. Power Consumption of System
Straight-line approximations for the external
oscillators show a good fit. Consumption for the system
using the on-chip oscillator, however, showed a slight
exponential increase but is still reasonably approximated
with a straight-line fit.
4.1 AAMP
The AAMP itself shows a y-intercept value
uncharacteristic of CMOS parts. The varying y-intercept
values for the three oscillator configurations can be
related to the amount of gain required by the AAMP's clock
input gate. While using the AAMP's on-chip oscillator, the
buffer at the Y0 pin is required to act like a linear
device. In this mode, rise times are long and the gate
spends a large amount of time between the on and off
states. Nearly all power consumption by CMOS gates can be
attributed to the amount of time spent in this state. The
sharp square wave output of the high-speed CMOS oscillator,
on the other hand, causes the y-intercept of the AAMP to be
lowest in this configuration. Standard CMOS waveforms have
slower rise times and cause a slightly higher y-intercept.
4.2 Oscillator
As seen in the plot, the Standard CMOS oscillator was
only useful below 3 MHz but in this region showed the
lowest power usage. The oscillator circuit is shown in
Figure 11.
Trimmer capacitors were used in the test system. The
values were adjusted to minimize power consumption and
maximize voltage swing of the output. These two conditions
generally occurred simultaneously, with the value of C1
often around three-fourths the value of C2 and their series
combination being approximately equal to the crystal's
specified load capacitance.
Figure 11. External Oscillator Circuit Used for System
The circuit of Figure 11 was also used for the HC
oscillator resulting in about the same capacitor ratio but
with the series combination now being equal to around twice
the rated load capacitance. As seen in the plot, the HC
external oscillator should be used in the frequency range
from 4 to 6 MHz.
In the range of frequencies from 3 to 4 MHz and again
above 6 MHz, the AAMP's on-chip oscillator should be used.
Although its operation is reliable throughout the tested
range of 1 to 10 MHz, power consumption is generally higher
than that of the others outside this suggested range. The
on-chip oscillator of the AAMP uses the same equivalent
circuitry as the external oscillator in Figure 11. The
suggested circuit for using the on-chip oscillator requires
no load capacitors [3]. Power consumption can, however, be
lessened by attaching a small load capacitor, with a value
near the specified load capacitance of the crystal, to the
Y1 pin (Figure 12).
[Schematic; labeled components: AAMP, C, X-TAL]
Figure 12. Configuration for On-chip AAMP Oscillator
4.3 Support Components
Power consumption for support components is shown in
Figure 13. Values for power usage by support components
are well fit by a straight line with no noticeable
y-intercept. During power measurements the data was static,
creating an artificially low amount of gate switching in
the RAM. The slope for the RAM will, of course, change with
the amount of gate switching on writes.
The type of flip-flop used for the alarm was chosen
on the basis of power consumption. The configuration used
calls for the flip-flop to load the values from the data
bus upon a write from the microprocessor with the clock
input to the flip-flop held high except in the rare case of
the detection of an intruder.
[Plot: Support Parts, versus Operating Frequency (MHz)]
Figure 13. Power Consumption of Support Components
Under these conditions, and although no output
switching was being performed, the MM74HC74 dual D Flip-Flop
used around five times as much power as the MM74HC174
hex D Flip-Flop. A call to Larry Wakeman, Applications
Engineer for National Semiconductor, confirmed that the
MM74HC74 dual package was a poorly designed CMOS part,
with flaws that make it non-ideal for ultra-low power
systems. These problems are shown in Figure 14.
The dual package routes the input through two buffers
and directly into two logic gates, whereas the hex package
runs the input through a single inverter and then uses a
transmission gate to virtually isolate the single-buffered
input from the rest of the circuit. The more densely
packed hex package also requires smaller internal
component size than the less space-restrictive design of
the dual package. Figure 14 shows the effective circuitry
of the two packages.
[Schematic: input buffering ahead of the STATE HOLDING
LOGIC for the MM74HC174 (left) and the MM74HC74 (right)]
Figure 14. Effective Input Circuits of D Flip-Flops
Although power consumption for either component, when
not being clocked but under the highest possible bus rate,
is under one milliwatt, the MM74HC174 hex flip-flop was
chosen for the design. The extra flip-flops may also be
used to control a proposed wait state mode.
4.4 Wait State Operation
The following section describes a technique by which,
through hardware, the need for exact timing of algorithms
may be eliminated. The method would allow the AAMP to
perform all necessary calculations for a sampling interval,
then go to a high-impedance, reduced-power state until
awakened by a signal from a controller upon the next
completed sample conversion. The added hardware is shown
in Figure 15. The flip-flop controller was taken from the
spares of the MM74HC174 hex package. Therefore, no active
components are added to the design; the circuit additions
are only resistors, diodes, and added traces.
[Schematic; labels: SYSTEM DATA BUS, +5V, ADDRESS DECODER,
CH I ALARM, CH II ALARM, TO OE ON AAMP, FROM EXTERNAL TIMER]
Figure 15. Wait State Circuitry Maintaining Two Alarm
Outputs
During the high impedance state, the controller
flip-flop will hold the OE pin high until a new clock signal
is received from an external timer, at which time the
controller flip-flop will be cleared. Timer and decoder
clocking of the flip-flops are wire-ANDed to allow either
to pull the clock low. During clocking of the flip-flops
by the timer, the present states of the alarms are
maintained by the identity feedback resistors while the
input to the controller is pulled low by a high-valued
pull-down resistor.
Programming to reach the high impedance state would
simply require that a hexadecimal four be ORed with the
alarm state and written to the alarms as the final
instruction of the routine. Upon being awakened from the
wait mode, a dummy read should be performed and program
execution may then be restarted.
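The final write of the routine can be sketched as follows.
This is an illustration in Python rather than AAMP code;
it assumes the two alarm outputs occupy the two low-order
bits of the latch, which the text implies but does not
state, and the names used are hypothetical.

```python
WAIT_BIT = 0x04  # the "hexadecimal four" named in the text


def alarm_word_for_wait(alarm_state):
    """Value to write to the alarm latch to request the wait state.

    alarm_state holds the current alarm bits (assumed to be bits 0 and 1);
    ORing in WAIT_BIT sets the controller flip-flop without changing them.
    """
    return alarm_state | WAIT_BIT


# Channel I alarm set, channel II clear, then request the wait state.
word = alarm_word_for_wait(0b01)
```

Because the alarm bits pass through unchanged, the write
that enters the wait state also refreshes both alarm
outputs.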
The present version of the system has two alarm
outputs. If it were possible to run the system with only
one alarm output, a more straightforward wait
controller could be built. Figure 16 shows the simplified
controller, which is made possible by using the MM74HC74,
which has individual asynchronous clears. Software using
this version would require a write to the alarms upon
completing the processing of a set of samples and a dummy
read upon being awakened. This method, although using the
less efficient MM74HC74, would probably require less power
since pull-down and identity feedback resistors would no
longer be needed.
[Schematic; labels: SYSTEM DATA BUS, +5V, XAK/XRQ, ADDRESS
DECODER, ALARM OUT, TO OE ON AAMP, FROM EXTERNAL TIMER]
Figure 16. Wait State Circuitry Converting Alarm Output
to Controller
The operation is the same when using either type of
controller. With the OE pin held high, the AAMP will try
to assert a transfer request (XRQ) signal, which would be
read at the transfer acknowledge (XAK) pin and would allow
program execution to continue. Instead, XRQ is tri-stated
and program execution is suspended. In this mode of
suspension, all output pins except Bus Request (BR) are
tri-stated and only the oscillator portion of the processor
is switching. The AAMP, however, buffers the clock signal
several times before using it throughout the processor, and
power consumption therefore remains considerable during
this high impedance state.
Power consumption for the total system in the high
impedance state is shown in Figure 17. Dashed lines
indicate total power for normal operation, while solid
lines show a best-fit line for wait mode power consumption.
While using the AAMP oscillator, consumption stays at a
near-constant eight milliamps, with even a slight decrease
as the frequency of operation
approaches the frequency for which the oscillator was
primarily designed. When using the on-chip oscillator, the
break-even point for using the high impedance wait mode as
compared to continuous operation occurs around 3.5 MHz,
while with either of the external oscillators a
slight decrease is seen at all frequencies.
Effective power consumption for the system while using
wait state mode can be calculated by adding each phase
multiplied by its duty cycle. Programming time overhead is
not large, requiring only an additional 15.5 clock cycles
for the masking of the data bit and a non bus-access
function during wake-up.
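The duty-cycle calculation described above amounts to a
weighted average of the two phases. A minimal sketch
follows; the function name and the example figures are
hypothetical and are not measurements from this system.

```python
def effective_power(active_power, wait_power, active_fraction):
    """Average power when the processor is active for active_fraction
    of each sampling interval and in the wait state for the remainder."""
    if not 0.0 <= active_fraction <= 1.0:
        raise ValueError("active_fraction must lie between 0 and 1")
    return (active_power * active_fraction
            + wait_power * (1.0 - active_fraction))


# Hypothetical example: 20 units while running, 8 units while waiting,
# with the processor busy for one quarter of each interval.
average = effective_power(20.0, 8.0, 0.25)
```

The 15.5-cycle overhead mentioned above would be counted
as part of the active fraction.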
This method allows accurate timing of the sampling rate to
be performed by an external device, which should be possible
if, like previous low-power A/D's built at KSU, the A/D
chosen for the system utilizes a microcontroller with fixed
times for instruction execution.
[Plot: power versus Operating Frequency (MHz); curves
labeled Bulk CMOS AAMP and CMOS Oscillator]
Figure 17. Power Consumption of High Impedance Mode
-dashed lines represent normal operation
Software Evaluation
In order to properly evaluate the performance of the
AAMP as a signal processor, various digital filtering sub-
programs were coded. The time of execution figures listed
here should provide rough estimates of processing speed and
should provide a useful basis for comparing the AAMP to
other signal processors.
5.1 Verification of output
Several of the algorithms have been tested on the
prototype board and the actual time of execution is shown
for those. Program segments from which timing figures were
taken are shown in Appendix D. Testing was performed with
the system operating at 1 MHz, which was chosen to allow
easy conversion to the number of cycles and to allow the
test equipment, an HP 9845B with GPIO, to keep up with data
transfers during debugging and testing of program segments.
Time of execution for the segments tested on the
prototype system was measured using an HP 1611A Logic
State Analyzer. Output from the filter was monitored for
proper execution using an HP 9845B computer, with data
exchanged through a half-handshake board-to-GPIO
controller (Figure 18).
[Schematic]
Figure 18. Half-handshake Controller for GPIO Interface
HP's GPIO provides a programmable control register
which allows the user to determine mode of transfer and
type of handshaking. This register is, however, programmed
by installing the appropriate wire-wrap jumpers and cannot
be changed during program execution. Therefore, a
controller was used which allows the AAMP system to run at
full speed, requiring only that the host computer poll for
data transfers.
5.2 Speed optimization
Appendix D shows listings of the programs with time of
execution for each block of code identified. Coding of the
algorithms was done in the most straightforward approach
possible, with speed optimization done only on a
near-sighted basis.
In loop coding, speed optimization for storage and
retrieval of counters and intermediate results was
accomplished by using the Local Environment storage area,
which contains the 16 quickest access locations. To speed
line coding, arrays of variables were placed in the Local
Environment Extended. Segments were made interchangeable
by maintaining these memory assignments in both forms of
coding.
Addressing of array elements while loop coding is
accomplished by calculating the base address of the LENV,
adding a constant offset component to specify which array
is being accessed, and adding I to find the Ith element of
the array. The AAMP has quick and efficient commands for
these three actions.
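The three-step address calculation can be sketched as
below. The array offsets are those listed for the local
variables in Appendix D; the base address and the names
are hypothetical, chosen only for illustration.

```python
# Offsets of the arrays within the Local Environment (from Appendix D).
X_OFFSET = 0x10  # input buffer x(0)..x(F)
A_OFFSET = 0x20  # coefficient table a(0)..a(F)
Y_OFFSET = 0x30  # output buffer y(0)..y(F)


def element_address(lenv_base, array_offset, i):
    """Base of the LENV, plus the constant array offset, plus the index I."""
    return lenv_base + array_offset + i


# Hypothetical LENV base address of $7000; address of a(3).
addr = element_address(0x7000, A_OFFSET, 3)
```

In the listings of Appendix D these steps correspond to
the LOCL command, the constant operand of REFSC or ASNSC,
and the index read with REFSL.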
A method to increase speed of execution which was
tried but with little success was that of leaving a copy of
the count variable on the stack at the end of each
iteration. Using this method, calculated times for
execution were much lower than for the final versions, but
observed times were much greater.
This difference between calculated and observed times may
be attributed to stack thrashing. The prime instruction for
creating copies of the count variable is the DUP command
which, as indicated by Table 2 of the Architecture section,
is a non-optimal stack depth command. Using DUP causes the
processor to read or write from its internal cache to RAM,
each time transferring members within the internal cache
itself, until exactly two elements are left in the internal
stack. This stack thrashing caused execution times to
increase greatly. Therefore, the use of DUP should be
avoided except when the stack depth is already two or when
the time to calculate the data on the stack is more than
that encountered during a stack penalty (approx. 24
cycles for a one-item stack update).
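The rule of thumb above can be written as a simple test.
This is a sketch of the stated guideline only; the
function name is hypothetical and the 24-cycle penalty is
the approximate figure quoted in the text.

```python
STACK_PENALTY_CYCLES = 24  # approximate cost of a one-item stack update


def dup_is_worthwhile(stack_depth, recompute_cycles):
    """True when DUP is preferable to recomputing or re-reading a value.

    DUP is acceptable when the internal stack already holds exactly two
    items, or when regenerating the value costs more than the penalty.
    """
    return stack_depth == 2 or recompute_cycles > STACK_PENALTY_CYCLES


# Re-reading a local variable with REFSL costs about 12 cycles, so with a
# stack depth of three the guideline favors the re-read over DUP.
use_dup = dup_is_worthwhile(3, 12)
```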
5.3 Accuracy of estimates
Timing estimates, especially for loop coding, were not
extremely accurate, generally falling only within ±20% of
the observed. Therefore, the timing figures listed should
only be used as a rough comparison with other processors.
Precise prediction of timing is not possible since some
parameters are data-dependent while others are dependent
upon the recent history of the processor. The large
discrepancy between calculated and observed times for loop
coding can be explained by considering that any error in
the prediction of timing within the loop is multiplied by
the number of times through the loop. Timing estimates for
inline coding were consistent with the observed.
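The error figures quoted with the Appendix D listings
appear to be computed relative to the observed time. The
sketch below assumes that convention; the function name is
hypothetical.

```python
def percent_error(calculated, observed):
    """Signed error of the estimate, as a percentage of the observed time."""
    return 100.0 * (calculated - observed) / observed


# FIR filter inner loop from Appendix D: 269.5 cycles calculated,
# 224 cycles observed.
fir_error = percent_error(269.5, 224.0)
```

A positive result means the estimate was higher than the
observed time, and a negative result means it was lower.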
5.4 Timing Estimates
Table 4 shows time of execution for the various
digital filtering sub-programs which have been coded. Each
subprogram was coded in three data formats: Single
Precision Fractional, Single Precision Floating (24-bit
mantissa, 8-bit exponent), and Double Precision Floating
(40-bit mantissa, 8-bit exponent). Where applicable,
segments were written using both loop- and linear-coding.
Overhead time for loop set-up is included in the total time
for 16 iterations, while time-per-N figures exclude this
overhead.
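The relation between the per-iteration and 16-coefficient
figures can be sketched as below. The function name is
hypothetical; the example uses the calculated FIR loop
figures from Appendix D (269.5 cycles per pass and 32
cycles of set-up).

```python
def loop_total_cycles(cycles_per_pass, setup_cycles, passes=16):
    """Total time for a loop: one-time set-up plus the repeated body."""
    return setup_cycles + cycles_per_pass * passes


# Single precision fractional FIR filter, loop coded.
fir_total = loop_total_cycles(269.5, 32.0)
```

Note that the Filter Update loop makes 15 passes rather
than 16, since one fewer move is needed to shift a
16-element buffer.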
Table 3 shows the algorithms used for the software
evaluation. In the fixed-point fractional data format of
programming, alignment of data is performed in assembly
language whereas in both floating point implementations,
data is automatically aligned in the much more efficient
microcoding. This explains why the fixed-point ratio
calculation required more time than the floating-point
versions. In other sections of code, the time of execution
increases with precision.
Table 3. Algorithms used for Software Evaluation

FIR Filter
    $Y = \sum_{i=0}^{N} a(i)\, x(i)$
Filter Update
    $x(i+1) = x(i), \quad i = 0 \text{ to } N-1$
Ratio Calculation
    $r = x^2 / y^2$
Decision
    $d = \begin{cases} 1, & r > \mathrm{THETA} \\ 0, & r < \mathrm{THETA} \end{cases}$
Weight Update
    $a(i) = a(i) + da \cdot y(i), \quad i = 0 \text{ to } n$
IIR Filter
    $v(m) = (1 - \mathrm{BETA})\, v(m-1) + \mathrm{BETA}\, x^2$
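For reference, the algorithms of Table 3 can be written in
a few lines of a high-level language. The sketch below is
illustrative Python, not the AAMP code of Appendix D. The
function names are hypothetical, the filter update is
combined with the insertion of the new sample, and the
decision for r equal to THETA (which the table does not
define) is assumed to be 0.

```python
def fir(a, x):
    """FIR filter: sum of a(i) * x(i)."""
    return sum(ai * xi for ai, xi in zip(a, x))


def filter_update(x, new_sample):
    """Shift the delay line one sample: x(i+1) = x(i), newest at x(0)."""
    return [new_sample] + x[:-1]


def ratio(x2, y2):
    """Ratio calculation r = x^2 / y^2."""
    return x2 / y2


def decision(r, theta):
    """Decision d: 1 when r exceeds THETA, otherwise 0 (assumed for r == THETA)."""
    return 1 if r > theta else 0


def weight_update(a, da, y):
    """Weight update a(i) = a(i) + da * y(i)."""
    return [ai + da * yi for ai, yi in zip(a, y)]


def iir(v_prev, x2, beta):
    """IIR filter v(m) = (1 - BETA) * v(m-1) + BETA * x^2."""
    return (1.0 - beta) * v_prev + beta * x2
```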
Table 4 Time of Execution of Several Digital Filters

                    Single Prec         Single Prec         Double Prec
                    Fractional          Floating            Floating
Operation           loop      line      loop      line      loop      line
FIR Filter
  time/N            269.5     137       635.5     493       1104      936
  16-coeff          4344      2213      10209.5   7918.5    17720.5   15016
Filter Update
  time/N            143.5     33        153.5     43        193       107.5
  16-coeff          2163.5    495       2473      643       2911      1612.5
Ratio Calculation        1236                996                 2048
Decision                 49                  103                 195
Weight Update
  time/N            297       169       658.5     530.5     1146.5    1043
  16-coeff          4768.5    2704      10552.5   8488      18360.5   16696
IIR Filter          -         378       -         1132.5    -         2255
CONCLUSIONS
Although the AAMP was designed largely for use in
high-speed, large embedded-processing systems, the AAMP
makes a very efficient microprocessor for minimal
component, minimal power systems. Its capacity for
floating point arithmetic and the inherent low power
consumption of CMOS make the AAMP a very attractive
processor for applications in remote intruder detection
systems. Bus protocol using the synchronous mode requires
a minimum number of glue chips (one), while bus transaction
time may be adjusted to allow the use of slow memories
while maintaining a high ALU speed.
Once the change-over from register-oriented assembly
language programming to stack-oriented programming is made,
code can actually be written more quickly using the AAMP's
high-level commands. Local variables, although slower than true
registers, may be used for quick access and temporary
storage. The various built-in data types of the AAMP make
coding for various precisions of data simple and closely
parallel.
For this thesis, a system intended to provide the
digital signal processing portion of an ultra-low power
intruder detection system was built and tested. Very few
problems were encountered while using the AAMP and, as a
rule, items stated in the reference manual were found to be
true. However, a problem was found in the intermediate
Bulk CMOS packages: pull-up resistors were not placed on
some test pins, and no mention of this was made in the AAMP
manual. Pull-ups were added to the circuit, solving the
problem, and Rockwell has since changed packaging and
included the pull-ups in the new design.
Excellent support for the project was provided by
Rockwell through constant contact with K. L. Albin. Much
of the information given for packaging and availability of
the AAMP was gathered during a Sandia-sponsored trip to
Cedar Rapids in October, 1986. Several meetings were
arranged where Dave Best and several other Rockwell
personnel gave presentations on the AAMP.
With the type of support for the AAMP which was shown
and with the outstanding characteristics of the AAMP
itself, the AAMP should definitely be considered when
designing any "modern" system.
REFERENCES
[1] Kenneth L. Albin, "An Evaluation of Rockwell's
Advanced Architecture Microprocessor for Digital Signal
Processing Applications," (Master's Thesis, Kansas State
University, 1984).
[2] Gary S. Mauersberger, "The Design and Hardware
Evaluation of an Advanced 16-bit, Low-Power, High
Performance Microcomputer System for Digital Signal
Processing," (Master's Thesis, Kansas State University,
1985).
[3] Advanced Architecture Microprocessor Reference
Manual, (Avionics Group, Rockwell International
Corporation, Cedar Rapids, Iowa, 1985).
[4] Statement by Ken Albin, Technical Staff Member,
Avionics Group, telephone interview, Rockwell International,
Cedar Rapids, IA 52498, Nov 21, 1986.
[5] N. M. Mykris, Avionics Group, a program to calculate
Bulk CMOS AAMP Instruction Execution Times, (Rockwell
International Corporation, Cedar Rapids, IA).
[6] Nasir Ahmed, T. Natarajan, Discrete-Time Signals and
Systems, (Reston, Va: Reston Publishing, 1983).
[7] J. D. Holmbeck, "Frequency Tolerance Limitations
With Logic Gate Clock Oscillators," (Proceedings of the
thirty-first annual Frequency Control Symposium, Fort
Monmouth, NJ, 1977, pp. 390-95).
[8] Thomas B. Mills, Application Note 340, Logic
Databook Volume 1, National Semiconductor Corporation, pp.
2-138, 1984.
APPENDIX A
The latest version of the AAMP has a different pin
assignment than the previous versions tested. The following
is a table of the pin assignments for the three versions of
the AAMP seen at Kansas State. The original CMOS/SOS
version was used by G. S. Mauersberger [2] and by Mike Gaches
in earlier designs. The implementation described in this
thesis uses a Bulk CMOS AAMP from a transition stage in
packaging. The AAMP is now packaged as a top-cavity (TC)
device and the pin-out has again changed. The printed
circuit board design in Appendix C uses this new design.
Present and future versions of the AAMP will follow the
newest assignment. Figure A-1, taken from the AAMP
reference manual [3], shows the pin assignment and physical
dimensions for the 68-pin pin-grid-array package.
[Package drawing (BOTTOM VIEW): pin grid with rows A-L and
columns 1-11, with physical dimensions]
Figure A-1 Pin assignment for 68 pin PGA AAMP
(from AAMP Reference Manual [3])
Table A-1 AAMP Pin Assignments by Function

SIGNAL     CMOS/SOS   BULK CMOS         TC BULK CMOS
Supply
VDD        B11
GND        L6
VDD, GND              B11 F1 L6 L2 A5   B2 B1 D1 K6 B10 A10 F2 F1 G10 A2
Address
A23        K2         K2                K1
A22        L3         L3                J1
A21        K3         K3                J2
A20        L4         L4                H1
A19        K4         K4                H2
A18        L5         L5                G1
A17        K5         K5                G2
A16        K6         K6                E1
A15        B1         B1                L10
A14        B2         B2                L9
A13        C1         C1                K9
A12        C2         C2                L8
A11        D1         D1                K8
A10        D2         D2                L7
A09        E1         E1                K7
A08        E2         E2                L6
A07        F2         F2                L5
A06        G1         G1                K5
A05        G2         G2                L4
A04        H1         H1                K4
A03        H2         H2                L3
A02        J1         J1                K3
A01        J2         J2                L2
A00        K1         K1                K2
Data
D15        A2         A2                K10
D14        B3         B3                K11
D13        A3         A3                J10
D12        B4         B4                J11
D11        A4         A4                H10
D10        B5         B5                H11
D09        B6         B6                G11
D08        A6         A6                F10
D07        B7         B7                F11
D06        A7         A7                E10
D05        B8         B8                E11
D04        A8         A8                D10
D03        B9         B9                D11
D02        A9         A9                C10
D01        B10        B10               C11
D00        A10        A10               B11
Clock
Y0         D10        D10               B8
Y1         C11        C11               A9
CLK        C10        C10               B9
Interrupt
IRQ        L8         L8                D2
NMI        K8         K8                C1
RST        L9         L9                C2
Control
BR         J11        J11               B3
BG         E11        E11               A7
XRQ        K10        K10               A3
XAK        F10        F10               B6
XER        F11        F11               A6
R/W        J10        J10               A4
S0         D11        D11               A8
S1         E10        E10               B7
OE                    H10               B4
Monitor
E/U        G10        G10               B5
C/D        G11        G11               A5
Test
SOUT       L7         L7
L/S        K7         K7
SCLK       K9         K9
SIN                   L10
HLD        K11
HB         H11
LB         H10
No connects
NC         F1 L2 L10 A5
Alignment
NC                    H11               E2 C3
APPENDIX B
Instruction Set Timing Estimates
The following list, adapted from a listing created by
a computer program written by N. M. Mykris [5], shows the
calculated time of execution for the AAMP's entire
instruction set. Variable-length instructions are
indicated by an equation showing how to calculate time of
execution. For these variable-length instructions, the
number of alignments (A) and normalizations (N) is
dependent upon the data, while the number of shifts (S) and
the length (L) of the segment shifted are determined by the
programmer. Typical values for these four variables are
indicated with each listing. Time of execution for LOCNL
is dependent upon the nesting level of called subroutines
from the main routine. Time of execution for RETURN
depends upon the number of arguments returned from the
routine.
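As an illustration of how the timing equations are used,
the sketch below evaluates three of them with the typical
values given in the listing. The equations are those
printed in Table B.1 for ADDF, SHIFT, and RETURN; the
function names are hypothetical.

```python
def addf_cycles(alignments, normalizations):
    """ADDF: Time = 125 + 4 * (A + N) cycles."""
    return 125 + 4 * (alignments + normalizations)


def shift_cycles(shifts):
    """SHIFT: Time = 21 + 4 * S cycles."""
    return 21 + 4 * shifts


def return_cycles(arguments):
    """RETURN: Time = 63 + 24 * (number of arguments) cycles."""
    return 63 + 24 * arguments


# Typical values from the listing: A=6 and N=1 for ADDF, S=8 for SHIFT,
# and one returned argument for RETURN.
typical = (addf_cycles(6, 1), shift_cycles(8), return_cycles(1))
```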
The left-hand column of the listing shows the
percentage of each command used to attain a standard Gibson
benchmark. Throughput per MHz for the specified mix is
shown at the end of the list.
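The throughput figure can be checked by weighting each
instruction's time by its share of the mix. The sketch
below is illustrative Python, not the Mykris program. The
percentages and times are read from Table B.1; two of the
pairings (18.00% with REFSX and 16.60% with SKIP) are
assumptions made where the printed listing is unclear,
chosen because they bring the mix to 100%.

```python
# (instruction, fraction of mix, cycles) as read from Table B.1.
GIBSON_MIX = [
    ("REFSL", 0.312, 12.0),
    ("REFSX", 0.180, 16.0),   # assumed share, see the note above
    ("SKIP",  0.166, 10.0),   # assumed pairing, see the note above
    ("ADDF",  0.069, 153.0),  # typical case A=6, N=1
    ("ADD",   0.061, 9.0),
    ("EXCH",  0.053, 13.0),
    ("SHIFT", 0.044, 53.0),   # typical case S=8
    ("GR",    0.038, 13.0),
    ("MPYF",  0.038, 293.0),  # typical case N=1
    ("AND",   0.016, 5.5),
    ("DIVF",  0.015, 313.0),  # typical case N=1
    ("MPY",   0.006, 93.0),
    ("DIV",   0.002, 109.0),
]


def throughput_kops_per_mhz(mix):
    """Thousands of operations per second at a 1 MHz clock."""
    average_cycles = sum(share * cycles for _, share, cycles in mix)
    return 1000.0 / average_cycles


kops = throughput_kops_per_mhz(GIBSON_MIX)
```

Under these assumptions the weighted average is about 39.6
cycles per instruction, which agrees with the throughput
given at the end of the list.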
Table B.1 Timing Estimates for Instruction Set

               ABS      "50", Time =  13.0 cyc
               ABSD     "DC", Time =  15.0 cyc
               ABSF     "DE", Time =   5.5 cyc
               ABSFE    "AE", Time =   9.0 cyc
Mix =  6.10%,  ADD      "E4", Time =   9.0 cyc
               ADDD     "80", Time =  13.0 cyc
Mix =  6.90%,  ADDF     "84", Time = 153.0 cyc   A=6, N=1
                        Time = 125 + 4*(A + N) cyc
               ADDFE    "92", Time = 229.0 cyc   A=6, N=1
                        Time = 173 + 8*(A + N) cyc
Mix =  1.60%,  AND      "E8", Time =   5.5 cyc
               ARS      "B8", Time =  49.0 cyc   S=8
                        Time = 17 + 4*S cyc
               ASNBX    "A4", Time =  43.0 cyc
               ASND     "A8", Time =  14.0 cyc
               ASNDC    "A9", Time =  19.5 cyc
               ASNDI    "F6", Time =  25.0 cyc
               ASNDL    "C",  Time =  14.0 cyc
               ASNDLE   "8B", Time =  19.5 cyc
               ASNDU    "8C", Time =  26.0 cyc
               ASNDX    "AA", Time =  22.0 cyc
               ASNDXI   "9E", Time =  25.0 cyc
               ASNF           Time = 108.5 cyc   L=8, S=8
                        Time = 108.5 + 4*(S - L) cyc
               ASNS     "D3", Time =  10.0 cyc
               ASNSC    "D4", Time =  15.5 cyc
               ASNSI    "54", Time =  21 cyc
               ASNSL    "4",  Time =  10 cyc
               ASNSLE   "5C", Time =  15.5 cyc
               ASNSU    "A7", Time =  22 cyc
               ASNSX    "A6", Time =  14 cyc
               ASNSXI   "D5", Time =  21 cyc
               ASNT     "98", Time =  18 cyc
               ASNTC    "99", Time =  23 cyc
               ASNTI    "B6", Time =  29 cyc
               ASNTLE   "B5", Time =  23 cyc
               ASNTU    "9A", Time =  44 cyc
               ASNTX    "9B", Time =  44 cyc
               ASNTXI   "9C", Time =  33 cyc
               CALL     "5D", Time =  68 cyc
               CALLI    "23", Time =  75 cyc
               CALLP    "5E", Time =  81 cyc
               CALLPI   "1F", Time =  88 cyc
               CALLU    "64", Time =  68.5 cyc
               CVTBIT   "F8", Time =  13.0 cyc
               CVTDF    "D9", Time = 137.0 cyc   N=15
                        Time = 77 + 4*N cyc
               CVTDFE   "6C", Time = 225.0 cyc   N=15
                        Time = 105 + 8*N cyc
               CVTDS    "DA", Time =  13.5 cyc
               CVTFD    "DB", Time = 113.5 cyc   A=15
                        Time = 53.5 + 4*A cyc
               CVTFED   "AF", Time = 113.5 cyc   A=15
                        Time = 53.5 + 4*A cyc
               CVTFEF   "B4", Time =  49.0 cyc
               CVTFFE   "6D", Time =  13.0 cyc
               CVTSD    "65", Time =   9.0 cyc
               DECS     "7F", Time =  20.0 cyc
               DECSI    "7E", Time =  31.0 cyc
               DECSLE   "7D", Time =  25.5 cyc
Mix =  0.20%,  DIV      "FA", Time = 109.0 cyc
               DIVD     "97", Time = 313.0 cyc
Mix =  1.50%,  DIVF     "87", Time = 313.0 cyc   N=1
                        Time = 309 + 4*N cyc
               DIVFE    "95", Time = 706.0 cyc   N=1
                        Time = 698 + 8*N cyc
               DIVI     "E7", Time = 109.0 cyc
               DIVID    "83", Time = 317.0 cyc
               DO       "8F", Time =  45.0 cyc
               DUP      "6A", Time =   5.5 cyc
               DUPD     "6B", Time =   9.0 cyc
               DUPT     "79", Time =  49.5 cyc
               ENDO     "9F", Time =  41.0 cyc
               EQ       "EB", Time =  13.0 cyc
               EQD      "88", Time =  15.0 cyc
               EQT      "90", Time =  29.5 cyc
Mix =  5.30%,  EXCH     "ED", Time =  13.0 cyc
               EXCHD    "8D", Time =  25.0 cyc
               EXCHT    "9D", Time =  47.0 cyc
Mix =  3.80%,  GR       "EC", Time =  13.0 cyc
               GRD      "89", Time =  17.0 cyc
               GRF      "8A", Time =  49.0 cyc
               GRFE     "91", Time =  53.0 cyc
               HIGHER   "F5", Time =  13.0 cyc
               INCS     "7C", Time =  20.0 cyc
               INCSI    "7B", Time =  31.0 cyc
               INCSLE   "7A", Time =  25.5 cyc
               INSERT   "8E", Time =  93.0 cyc   L=8, S=8
                        Time = 93 + 4*(S - L) cyc
               INTE     "1B", Time =   5.5 cyc
               LIT16    "1A", Time =  16.5 cyc
               LIT24    "24", Time =  22.0 cyc
               LIT32    "25", Time =  27.5 cyc
               LIT4A    "1",  Time =   5.5 cyc
               LIT4B    "2",  Time =   5.5 cyc
               LIT48    "26", Time =  68.0 cyc
               LIT8     "18", Time =  11.0 cyc
               LIT8N    "19", Time =  11.0 cyc
               LITD0    "27", Time =   9.0 cyc
               LOCL     "53", Time =   5.5 cyc
               LOCNL    "D2", Time =  48.5 cyc   1 stack
                        Time = 24.5 + 24*(Number of stacks) cyc
               LOCU     "66", Time =   9.0 cyc
               LOCX     "FF", Time =   5.5 cyc
Mix =  0.60%,  MPY      "F9", Time =  93.0 cyc
               MPYD     "96", Time = 301.0 cyc
Mix =  3.80%,  MPYF     "86", Time = 293.0 cyc   N=1
                        Time = 289 + 4*N cyc
               MPYFE    "94", Time = 539.0 cyc   N=1
                        Time = 531 + 8*N cyc
               MPYI     "E6", Time =  93.0 cyc
               MPYID    "82", Time = 301.0 cyc
               NEG      "51", Time =   9.0 cyc
               NEGD     "DD", Time =  13.0 cyc
               NEGF     "DF", Time =  13.0 cyc
               NEGFE    "AD", Time =  17.0 cyc
               NOP      "20", Time =   5.5 cyc
               NOT      "F4", Time =   5.5 cyc
               OR       "E9", Time =   5.5 cyc
               POP      "52", Time =   5.5 cyc
               POPD     "B7", Time =   9.0 cyc
               REFBX    "D1", Time =  42.5 cyc
               REFD     "67", Time =  18.0 cyc
               REFDC    "68", Time =  23.5 cyc
               REFDI    "21", Time =  29.0 cyc
               REFDL    "3",  Time =  18.0 cyc
               REFDLE   "22", Time =  23.5 cyc
               REFDU    "D6", Time =  29.0 cyc
               REFDX    "D7", Time =  26.0 cyc
               REFDXI   "69", Time =  29.0 cyc
               REFS     "55", Time =  12.0 cyc
               REFSC    "56", Time =  17.5 cyc
               REFSI    "1C", Time =  23.0 cyc
Mix = 31.20%,  REFSL    "0",  Time =  12.0 cyc
               REFSLE   "1E", Time =  17.5 cyc
               REFSU    "D8", Time =  23.0 cyc
Mix = 18.00%,  REFSX    "D0", Time =  16.0 cyc
               REFSXI   "57", Time =  23.0 cyc
               REFT     "75", Time =  57.5 cyc
               REFTC    "76", Time =  59.0 cyc
               REFTI    "74", Time =  93.5 cyc
               REFTLE   "77", Time =  84.0 cyc
               REFTU    "78", Time =  36.0 cyc
               REFTX    "6F", Time =  40.5 cyc
               REFTXI   "6E", Time =  79.5 cyc
               RETURN   "5F", Time =  87.0 cyc   1 Arg
                        Time = 63 + 24*(Number of Args) cyc
Mix =  4.40%,  SHIFT    "FB", Time =  53.0 cyc   S=8
                        Time = 21 + 4*S cyc
               SHIFTL   "FD", Time =  49.0 cyc   S=8
                        Time = 17 + 4*S cyc
               SHIFTR   "FC", Time =  49.0 cyc   S=8
                        Time = 17 + 4*S cyc
Mix = 16.60%,  SKIP     "59", Time =  10.0 cyc
               SKIPI    "1D", Time =  12.0 cyc
               SKIPNZ   "EF", Time =  16.0 cyc
               SKIPNZI  "5B", Time =  17.0 cyc
               SKIPZ    "EE", Time =  16.0 cyc
               SKIPZI   "5A", Time =  17.0 cyc
               SUB      "E5", Time =   9.0 cyc
               SUBD     "81", Time =  13.0 cyc
               SUBF     "85", Time = 161.0 cyc   A=6, N=1
                        Time = 133 + 4*(A + N) cyc
               SUBFE    "93", Time = 237.0 cyc   A=6, N=1
                        Time = 181 + 8*(A + N) cyc
               SWAPSU   "AB", Time =  28.0 cyc
               XOR      "EA", Time =   5.5 cyc
               XTRACT   "AC", Time =  85.0 cyc   L=8, S=8
                        Time = 85 + 4*(S - L) cyc
Instruction throughput based on the given percentages:
25.254 KOPS
Appendix C
The printed circuit layout shown in this section was
created by Armando Corrales and Jim Heise on an HP9845C
computer using the Engineering Graphics System (EGS).
Specifications for board connections were not known at the
time the board was laid out so traces simply lead to the
edge of the board and stop. The design uses the new Top-
Cavity PGA pin assignment for the AAMP.
Not included on this board is the input buffer section
which may or may not be needed in a final system. This
design also differs from the prototype circuit in that the
Flip-Flop used is the MM74HC74 which, as pointed out in the
power consumption section, was found to use slightly more
power than the MM74HC174. Room is, however, available for
replacing this 14-pin chip with the 16-pin chip if deemed
necessary.
[Assembly drawing of the board]
KANSAS STATE UNIVERSITY
DEPARTMENT OF ELECTRICAL AND COMPUTER ENGINEERING
MINIMAL COMPONENT AAMP BOARD
ARMANDO CORRALES SEPTEMBER 8, 1986 SCALE 1.75/1
Figure C.1 Assembly Drawing
Figure C.2 Layer 1 - Component Side
Figure C.3 Layer 2 - Ground Plane
Figure C.4 Layer 3
Figure C.5 Layer 4
Figure C.6 Layer 5 - Ground Plane
Figure C.7 Layer 6 - Circuit Side
Figure C.8 Keep-out Areas For Power and Ground Planes
Figure C.9 Silkscreened Text on Component Side
APPENDIX D
The following sub-programs were combined in an order
which allowed results to be monitored to confirm proper
program execution. The fixed precision listings show the
program as it was written for testing. Other tested code
was substituted into this original order of execution.
To make subprograms interchangeable, a filler opcode
was placed in the final high byte of code when the sub-
program consisted of an odd number of bytes. This also
made it possible to differentiate between blocks of code
during time measurements. For the same reason mentioned
with the DUP command in the Architecture section, the use
of NOP's was avoided and instead the INTE command, which
does not cause stack thrashing, was used. Timing estimate
and observation figures were altered to exclude these
filler commands.
Data types for fixed-point implementations are shown
in program headers. The values of S, L, and R in the
notation (S / L / R) represent the presence or absence of a
sign bit, the number of bits left of the decimal point and
the number of bits right of the decimal respectively.
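The (S / L / R) notation can be illustrated with a short
conversion routine. The sketch below is illustrative
Python and assumes two's complement encoding and rounding
to the nearest step, neither of which is stated in the
text; the function name is hypothetical.

```python
def to_fixed(value, sign_bits, left_bits, right_bits):
    """Encode value in the (S / L / R) fixed-point format.

    Returns the raw bit pattern as a non-negative integer, using two's
    complement for negative values when a sign bit is present.
    """
    width = sign_bits + left_bits + right_bits
    magnitude_bits = left_bits + right_bits
    scaled = round(value * (1 << right_bits))
    lowest = -(1 << magnitude_bits) if sign_bits else 0
    highest = (1 << magnitude_bits) - 1
    if not lowest <= scaled <= highest:
        raise OverflowError("value does not fit the requested format")
    return scaled & ((1 << width) - 1)


# Coefficient a(7) = .21941864 in the (1 / 0 / 15) format used by the
# fractional listings; the coefficient table gives 1C 16 for this entry.
word = to_fixed(0.21941864, 1, 0, 15)
```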
D.1 Fractional 16-bit precision - loop coding
Executive Entry Table
$0000 00 00 Cont. Status pointer
$0001 71 00 Init. Exec Stack limit
$0002 74 FF Init. Exec Top of Stack
$0003 00 40 Init. Exec PROCID
$0004 00 00 bus error PROCID
$0005 00 00 NMI PROCID
$0006 00 00 INT PROCID
$0007 00 00 Trap PROCID
$0008 00 00 Exception PROCID
Local variables
I        - Lenv(1)
Yquick   - Lenv(2)
x^2      - Lenv(3,4)
THETA    - Lenv(5)
v(m-1)   - Lenv(6)
y^2      - Lenv(7,8)
x(0)     - Lenv(10)    input buffer 16 long
x(F)     - Lenv(1F)
a(0)     - Lenv(20)    coefficient table
a(F)     - Lenv(2F)
y(0)     - Lenv(30)    output buffer 16 long
y(F)     - Lenv(3F)
none used
"    "
"    "
"    "
"    "
Initial Coefficient Table
[Band-pass filter]
$0010 01 7B    a(0) =  .01155124
$0011 09 3F    a(1) =  .07222172
$0012 09 9D    a(2) =  .07476273
$0013 FA 1B    a(3) = -.04603866
$0014 E8 54    a(4) = -.18493122
$0015 EA 2F    a(5) = -.17043359
$0016 03 00    a(6) =  .02344914
$0017 1C 16    a(7) =  .21941864
$0018 E3 EA    a(8) = -.21941864
$0019 FD 00    a(9) = -.02344914
$001A 15 D1    a(A) =  .17043359
$001B 17 A1    a(B) =  .18493122
$001C 05 EA    a(C) =  .04603866
$001D F6 6E    a(D) = -.07476273
$001E F6 C2    a(E) = -.07222172
$001F FE 86    a(F) = -.01155124
This block of code copies the initial coefficients
into the Local Environment for efficient access in the FIR
Filter subprogram. Data is assumed to be in the data
format used in the Filter routine.
This is the first executable code of the program;
therefore, immediately after invocation of the program, the
executive stack mark, consisting of the program counter
(PC), the Code Environment (CENV), the Procedure Identifier
(PROCID), and the Local Environment pointer (LENV), is
copied into the four memory locations immediately above the
start of the Local Environment.
   $0020 00 32  procedure header (# local vars)
Block Move
   address                                              #cycles
   $0021 10 18  18 10   LIT8 16      I = 16             11
L1              11      LIT4A.1                         5.5
   $0022 E5 11  E5      SUB          I = I-1            9
                41      ASNSL.1      save I             10
   $0023 01 41  01      REFSL.1                         12
   $0024 10 56  56 10   REFSC $10    get table(I)       17.5
                01      REFSL.1                         12
   $0025 53 01  53      LOCL                            5.5
   $0026 20 D4  D4 20   ASNSC $20    store a(I)         15.5
                01      REFSL.1                         12
   $0027 01 01  01      REFSL.1                         12
   $0028 0E 19  19 0E   LIT8N 0E                        11
                EF      SKIPNZ L1    I = 0?             16
   $0029 52 EF  52      POP          kill old counter   5.5
!* Time of execution *!
              Calculated   Observed   Error
Inside loop   138          144        -4%
Set-up        16.5
Total         2225         2624       -15%
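The calculated figures in the timing table can be reproduced from the per-instruction cycle counts. The sketch below is illustrative only: the function names are hypothetical, and the error definition (calculated minus observed, relative to observed) is an assumption inferred from the Block Move figures rather than a definition stated in the text.

```python
def loop_estimate(setup: float, per_pass: float, passes: int) -> float:
    """Estimated cycles for a loop: set-up cost plus one body cost per pass."""
    return setup + per_pass * passes


def percent_error(calculated: float, observed: float) -> float:
    """Assumed error measure: (calculated - observed) / observed, in percent."""
    return 100.0 * (calculated - observed) / observed


if __name__ == "__main__":
    # Block Move: 16.5 cycles of set-up, 138 cycles per pass, 16 passes.
    print(loop_estimate(16.5, 138, 16))
    print(round(percent_error(138, 144)))
    print(round(percent_error(2225, 2624)))
```

The same two functions apply to the other loop-coded segments by substituting their set-up, loop, and pass counts.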
Input x(0)
This section of code inputs data from input channel one and
stores it to x(0).
                                                        #cycles
   $002A 00 1C  1C 1000 REFSI $1000  get input          23
   $002B 5C 10  5C 10   ASNSLE $10   store to x(0)      15.5
   $002C 1B 10  1B      INTE
!* Time of execution *!
                 Calculated   Observed   Error %
Time for input   38.5         38.5       0%
FIR Filter
This block of code performs an FIR filter on the array of
input data and writes the result into the Local
Environment. Data and coefficients are in the (1/0/15)
data format.
Y = SUM a(i) * x(i)    i = 1 .. 16
   address                                              #cycles
   $002D 42 10  10      LIT4A.0      Yquick = 0         5.5
                42      ASNSL.2                         10
   $002E 10 18  18 10   LIT8 16      I = 16             11
L2 $002F E5 11  11      LIT4A.1                         5.5
                E5      SUB          I = I-1            9
   $0030 01 41  41      ASNSL.1                         10
                01      REFSL.1                         12
   $0031 56 53  53      LOCL                            5.5
                56 10   REFSC $10    get x(I)           17.5
   $0032 01 10  01      REFSL.1                         12
   $0033 56 53  53      LOCL                            5.5
                56 20   REFSC $20    get a(I)           17.5
   $0034 F9 20  F9      MPY          a(I) * x(I)        93
   $0035 E4 02  02      REFSL.2                         12
                E4      ADD          add to y           9
   $0036 01 42  42      ASNSL.2                         10
                01      REFSL.1                         12
   $0037 19 01  01      REFSL.1                         12
                19 13   LIT8N 13                        11
   $0038 EF 13  EF      SKIPNZ L2    I = 0?             16
   $0039 1B 52  52      POP          kill old counter   5.5
                1B      INTE
!* Time of execution *!
              Calculated   Observed   Error %
Inside loop   269.5        224        20%
Set-up        32
Total         4344         3701       17%
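As a reference for what the loop computes, the sum Y = SUM a(i) * x(i) can be sketched in a few lines. This is a floating-point model under stated assumptions, not the AAMP routine: it ignores the (1/0/15) scaling and any overflow behaviour of MPY and ADD, and the function name fir is hypothetical. The index counts down to zero to mirror the SUB / SKIPNZ structure of the listing.

```python
def fir(a, x):
    """Return SUM a[i] * x[i], visiting the taps from the last index to 0."""
    if len(a) != len(x):
        raise ValueError("coefficient and sample buffers must match in length")
    y = 0.0                  # Yquick = 0
    i = len(a)               # I = 16 for the 16-tap filter
    while i != 0:            # loop until the counter reaches zero
        i -= 1               # I = I - 1
        y += a[i] * x[i]     # add a(I) * x(I) to y
    return y


if __name__ == "__main__":
    taps = [0.5, 0.25, 0.125]
    print(fir(taps, [1.0, 0.0, 0.0]))   # a unit impulse at x(0) selects a(0)
```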
Filter Update
This block of code moves the array of time delayed input
data to one sample greater delay and leaves x(0) empty for
the next input of data. Data is maintained in the (1/0/15)
format.
x(i + 1) = x(i)    i = 1 .. 15
                                                        #cycles
   $003A 11 2F  2F      LIT4B.F      I = 15             5.5
LU              11      LIT4A.1                         5.5
   $003B 41 E5  E5      SUB          I = I-1            9
                41      ASNSL.1      save I             10
   $003C 53 01  01      REFSL.1                         12
                53      LOCL                            5.5
   $003D 10 56  56 10   REFSC $10    get x(i)           17.5
   $003E 53 01  01      REFSL.1                         12
                53      LOCL                            5.5
   $003F 11 D4  D4 11   ASNSC $11    save x(i + 1)      15.5
   $0040 01 01  01      REFSL.1                         12
                01      REFSL.1                         12
   $0041 0F 19  19 0F   LIT8N 0F                        11
                EF      SKIPNZ LU    I = 0?             16
   $0042 52 EF  52      POP          kill old counter   5.5
!* Time of execution *!
              Calculated   Observed   Error
Inside loop   143.5        156        -8%
Set-up        11
Total         2163.5       2585       -16%
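The update above is a delay-line shift. A minimal sketch, assuming the buffer is held in a Python list and using the hypothetical name shift_delay_line, shows why the loop runs from the oldest sample down to x(0): copying in that order never overwrites a sample before it has been moved.

```python
def shift_delay_line(x):
    """Move every sample one position later in place; x[0] is left for new input."""
    for i in range(len(x) - 2, -1, -1):   # i = 14 ... 0 for a 16-entry buffer
        x[i + 1] = x[i]
    return x


if __name__ == "__main__":
    buffer = [10, 20, 30, 40]
    print(shift_delay_line(buffer))   # the old last sample is discarded
```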
Output buffer Update
This section of code shifts the array of time delayed
output data to one sample greater delay. Data type is
maintained.
y(0) = Yquick
y(i + 1) = y(i)    i = 1 .. 15
                                                        #cycles
   $0043 5C 02  02      REFSL.2                         12
                5C 30   ASNSLE $30   y(0) = Yquick      15.5
   $0044 2F 30  2F      LIT4B.F      I = 15             5.5
LO              11      LIT4A.1                         5.5
   $0045 E5 11  E5      SUB          I = I-1            9
                41      ASNSL.1      save I             10
   $0046 01 41  01      REFSL.1                         12
                53      LOCL                            5.5
   $0047 56 53  56 30   REFSC $30    get y(i)           17.5
   $0048 01 30  01      REFSL.1                         12
                53      LOCL                            5.5
   $0049 D4 53  D4 31   ASNSC $31    save y(i + 1)      15.5
   $004A 01 11  01      REFSL.1                         12
                01      REFSL.1                         12
   $004B 19 01  19 0F   LIT8N 0F                        11
   $004C EF 0F  EF      SKIPNZ LO    I = 0?             16
                52      POP          kill old counter   5.5
   $004D 1B 52  1B      INTE
!* Time of execution *!
              Calculated
Inside loop   143.5
Set-up        38.5
Total         2334.5
Ratio Calculation
r = x^2 / y^2
To make r comparable to THETA (0/8/8) in the decision
section, x is shifted 4 right prior to squaring. The full
32 bits of the intermediate result are maintained through
the divide, with a single precision value left on the stack upon
exit.
Overflow will occur if y^2 is outside the range
x^2 / 256 < y^2 < 256 x^2
address                                                 #cycles
(0/8/8) set-up
ls word
$004E 10 1E  1E 10   REFSLE $10   get x(0)              17.5
             10      LIT4A.0      abcd...mnop           5.5
$004F 2C 10  2C      LIT4B.C                            5.5
$0050 0C 18  18 0C   LIT8 0C                            11
             8E      INSERT       mnop000...000         93
ms word
$0051 1E 8E  1E 10   REFSLE $10   get x(0)              17.5
$0052 14 10  14      LIT4A.4      abcd...mnop           5.5
             B8      ARS          ssssabcd...jkl        49
$0053 ED B8  ED      EXCH         ls-ms order           13
End of (0/8/8) set-up
$0054 96 6B  6B      DUPD                               9
             96      MPYD         x^2                   301
$0055 1E C3  C3      ASNDL.3      store temp            14
             1E 30   REFSLE $30   get y(0)              17.5
$0056 10 30  10      LIT4A.0      fractional CVTSD      5.5
$0057 96 6B  6B      DUPD                               9
             96      MPYD         y^2                   301
$0058 8D 33  33      REFDL.3      get x^2 back          18
             8D      EXCHD                              25
$0059 52 97  97      DIVD                               313
             52      POP          fractional CVTDS      5.5
!* Time of execution *!
                     Calculated   Observed   Error %
(0/8/8) set-up       217.5        220        1%
time of execution    1236         1132       -10%
The same (0/8/8) set-up could be used to shift THETA
in the decision, with the shift value then equal to 8
instead of 4.
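The overflow condition stated for the ratio calculation can be checked numerically. The sketch below is a floating-point illustration under the assumption that a (0/8/8) result can hold values strictly between 1/256 and 256, which is the range implied by the inequality in the text; the function name ratio_in_range is hypothetical.

```python
def ratio_in_range(x: float, y: float) -> bool:
    """True when r = x^2 / y^2 satisfies x^2/256 < y^2 < 256 * x^2."""
    x2 = x * x
    y2 = y * y
    return x2 / 256 < y2 < 256 * x2


if __name__ == "__main__":
    print(ratio_in_range(0.5, 0.25))   # r = 4, inside the range
    print(ratio_in_range(1.0, 0.05))   # r = 400, too large for 8 integer bits
    print(ratio_in_range(0.05, 1.0))   # r = 0.0025, below 1/256
```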
Decision
The following section compares the result of the ratio
calculation with a threshold, placing a boolean flag
indicating T or F on the top of the stack.
d = 1    r > THETA
d = 0    r < THETA
                                                        #cycles
             05      REFSL.5      get THETA             12
$005A ED 05  ED      EXCH                               13
             EC      GR           T if THETA > r        13
$005B 11 EC  11      LIT4A.1                            5.5
             EA      XOR          T if THETA < r        5.5
$005C 1B EA  1B      INTE
!* Time of execution *!
                    Calculated   Observed   Error
time of execution   49           54.5       -10%
Weight Update
The following section of code uses past output values to
adapt the coefficients of an FIR filter. In this case,
a(i) is assumed to be in the (1/4/11) format.
a(i) = a(i) + da * y(i)
                                                        #cycles
   $005D 10 18  18 10   LIT8 16      I = 16             11
LW              11      LIT4A.1                         5.5
   $005E E5 11  E5      SUB          I = I-1            9
                41      ASNSL.1                         10
   $005F 01 41  01      REFSL.1      get y(i)           12
                53      LOCL                            5.5
   $0060 56 53  56 30   REFSC $30                       17.5
   $0061 1A 30  1A dada LIT16 da                        16.5
   $0062 da da  F9      MPY                             93
   $0063 01 F9  01      REFSL.1      get a(i)           12
                53      LOCL                            5.5
   $0064 56 53  56 20   REFSC $20                       17.5
   $0065 E4 20  E4      ADD                             9
                01      REFSL.1                         12
   $0066 53 01  53      LOCL         store              5.5
   $0067 20 D4  D4 20   ASNSC $20    a(i)+da*y(i)       15.5
                01      REFSL.1                         12
   $0068 01 01  01      REFSL.1                         12
   $0069 13 19  19 13   LIT8N 13                        11
                EF      SKIPNZ LW                       16
   $006A 52 EF  52      POP                             5.5
!* Time of execution *!
              Calculated   Observed   Error
Inside loop   297          256        13%
Set-up        16.5
Total         4768.5       4483.5     6%
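The update rule a(i) = a(i) + da * y(i) can be written directly as a short loop. The sketch below is a floating-point model only: it does not reproduce the (1/4/11) scaling or the AAMP stack operations, and the function name update_weights is hypothetical.

```python
def update_weights(a, y, da):
    """Apply a[i] = a[i] + da * y[i] in place to every coefficient."""
    if len(a) != len(y):
        raise ValueError("coefficient table and output buffer must match in length")
    for i in range(len(a)):
        a[i] = a[i] + da * y[i]
    return a


if __name__ == "__main__":
    print(update_weights([1.0, 2.0], [0.5, -1.0], 0.5))
```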
IIR Filter
v(m) = (1-BETA) * v(m-1) + BETA * x * x
Assume BETA is fixed and can be referenced immediate.
The temporary value x^2 was calculated in the RATIO CALCULATION
section. 116 microseconds could be saved by recalling the
value and not recalculating.
                                                        #cycles
$0069 10 1E  1E 10   REFSLE $10   get x(0)              17.5
$006A 10 1E  1E 10   REFSLE $10                         17.5
             F9      MPY          x^2                   93
or replacing above code
             04      REFSL.4      get top 16 bits       12
                                  of 32-bit x^2
$006B 1A F9  1A 3E68 LIT16 BETA   BETA = .98            16.5
$006C 3E 68  F9      MPY                                93
$006D 06 F9  06      REFSL.6      get v(m-1)            12
$006E 47 1A  1A 0147 LIT16 (1-BETA) = .02               16.5
$006F F9 01  F9      MPY                                93
             E4      ADD                                9
$0070 46 E4  46      ASNSL.6      store v(m)            10
!* Time of execution *!
              Calculated   Observed   Error
normal data   378          372        2%
input = 220
overflow 401
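The recursion v(m) = (1-BETA) * v(m-1) + BETA * x * x can be sketched as a single function. This is a floating-point illustration that follows the formula exactly as printed; it does not model the fixed-point constants loaded by LIT16, and the name iir_update and the default value of beta are illustrative assumptions.

```python
def iir_update(v_prev: float, x: float, beta: float = 0.98) -> float:
    """Return v(m) = (1 - beta) * v(m-1) + beta * x * x."""
    return (1.0 - beta) * v_prev + beta * x * x


if __name__ == "__main__":
    v = 0.0
    for sample in [0.5, 0.5, 0.5]:
        v = iir_update(v, sample)
        print(v)
```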
D.2 Fractional 16-bit Precision - Inline Coding
FIR Filter
This section of code performs an FIR filter on the array
of input data and writes the result into the Local
Environment. Data and coefficients are in the (1/0/15)
data format.
Y = SUM a(i) * x(i)    i = 1 .. 16
                                                        #cycles
$002D 00 18  18 00   LIT8 00      y = 0                 11
$002E 20 1E  1E 20   REFSLE $20   get a(0)              17.5
$002F 10 1E  1E 10   REFSLE $10   get x(0)              17.5
             F9      MPY                                93
$0030 E4 F9  E4      ADD          y = y + *             9
$0031 21 1E  1E 21   REFSLE $21   get a(1)              17.5
$0032 11 1E  1E 11   REFSLE $11   get x(1)              17.5
             F9      MPY                                93
$0033 E4 F9  E4      ADD          y = y + *             9
. . .
$005B 2F 1E  1E 2F   REFSLE $2F   get a(F)              17.5
$005C 1F 1E  1E 1F   REFSLE $1F   get x(F)              17.5
             F9      MPY                                93
$005D E4 F9  E4      ADD          y = y + *             9
$005E 30 5C  5C 30   ASNSLE $30   store to y(0)         15.5
!* Time of execution *!
                  Calculated   Observed   Error
Set-up            26.5
time per N        137
16 coefficients   2213         2316       -4%
Filter Update
This block of code moves the array of time delayed input
data to one sample greater delay and leaves x(0) empty for
the next input of data. Data is maintained in the (1/0/15)
format.
x(i + 1) = x(i)    i = 1 .. 15
                                                        #cycles
$005F 1E 1E  1E 1E   REFSLE $1E   get x(E)              17.5
$0060 1F 5C  5C 1F   ASNSLE $1F   store to x(F)         15.5
$0061 1D 1E  1E 1D   REFSLE $1D   get x(D)              17.5
$0062 1E 5C  5C 1E   ASNSLE $1E   store to x(E)         15.5
. . .
$0071 10 1E  1E 10   REFSLE $10   get x(0)              17.5
$0072 11 5C  5C 11   ASNSLE $11   store to x(1)         15.5
!* Time of execution *!
                  Calculated   Observed   Error
time per N        33
16 coefficients   495
(15 iterations)
Ratio Calculation
r = (x * x) / (y * y)
This routine retains the full 32 bits after
multiplies. Numerator and denominator are then normalized
in 8-bit increments until the leading byte is non-zero.
The final divide is a full 32-bit by 32-bit fractional
divide. Finally, the LSB is discarded, leaving the result,
a 16-bit unsigned fraction (0/0/16).
     offset                                             #cycles
     $0000 10 1E  1E 10   REFSLE $10   get x            17.5
                  10      LIT4A.0      CVTSD fractional 5.5
     $0001 6B 10  6B      DUPD                          9
                  96      MPYD         x^2              301
     $0002 C3 96  C3      ASNDL.3                       14
     $0003 10 1E  1E 10   REFSLE $30                    17.5
                  10      LIT4A.0                       5.5
     $0004 6B 10  6B      DUPD                          9
                  96      MPYD                          301
     $0005 C7 96  C7      ASNDL.7                       14
                  04      REFSL.4                       12
     $0006 08 04  08      REFSL.8                       12
                  E9      OR                            5.5
     $0007 5B E9  5B 1A   SKIPNZI Over If top word zero 17
!* initial set-up takes 740.5 cycles *!
     $0008 03 1A  03      REFSL.3      move full word   12
                  44      ASNSL.4                       10
     $0009 07 44  07      REFSL.7                       12
                  48      ASNSL.8                       10
     $000A 10 48  10      LIT4A.0      clear LSB's      5.5
                  43      ASNSL.3                       10
     $000B 10 43  10      LIT4A.0                       5.5
                  47      ASNSL.7                       10
!* double shift takes 75 cycles *!
Over $000C 04 47  04      REFSL.4      test top byte    12
     $000D E9 08  08      REFSL.8                       12
                  E9      OR                            5.5
     $000E 00 1A  1A FF00 LIT16 FF00   mask out bottom  16.5
     $000F E8 FF  E8      AND                           5.5
     $0010 1A 5B  5B 1A   SKIPNZI Out                   17
!* single shift test takes 68.5 cycles *!
     $0011 28 07  07      REFSL.4                       12
                  28      LIT4B.8      shift MSB        5.5
                  FD      SHIFTL                        49
     $0012 13 FD  13      LIT4A.3                       5.5
     $0013 11 53  53      LOCL         get top 8 bits   5.5
                  11      LIT4A.1      from LSB         5.5
                  D1      REFBX                         42.5
     $0014 D1 E9  E9      OR           store            5.5
                  44      ASNSL.4                       10
     $0015 03 44  03      REFSL.3                       12
                  28      LIT4B.8      shift LSB        5.5
     $0016 FD 28  FD      SHIFTL                        49
                  43      ASNSL.3                       10
     $0017 08 43  08      REFSL.8                       12
                  28      LIT4B.8      shift MSB        5.5
     $0018 FD 28  FD      SHIFTL                        49
                  17      LIT4A.7                       5.5
     $0019 53 17  53      LOCL         get top 8 bits   5.5
                  11      LIT4A.1      from LSB         5.5
     $001A D1 11  D1      REFBX                         42.5
                  E9      OR           store            5.5
     $001B 48 E9  48      ASNSL.8                       10
                  07      REFSL.7                       12
     $001C 28 07  28      LIT4B.8      shift LSB        5.5
                  FD      SHIFTL                        49
     $001D 47 FD  47      ASNSL.7                       10
!* single shift takes 435 cycles *!
Out               33      REFDL.3      get x^2          18
     $001E 37 33  37      REFDL.7      get y^2          18
                  97      DIVD         fractional divide 313
     $001F 52 97  52      POP          save top 16 bits 5.5
!* 32-bit divide and memory accesses take 354.5 cycles *!
Calculated time of execution:
Zero Shifts    1163.5
One Shift      1598.5
Two Shifts     1238.5
Three Shifts   1673.5
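The normalization strategy in this routine can be summarized in a short sketch: a 16-bit "double shift" when the top words of both values are zero, followed by an 8-bit "single shift" when the top byte is still zero. The code below is an integer model under stated assumptions, not the AAMP code itself: x2 and y2 are treated as unsigned 32-bit values, y2 is assumed non-zero, x2 is assumed smaller than y2 so that the quotient fits the (0/0/16) format, and the function names are hypothetical.

```python
def normalize(x2: int, y2: int):
    """Shift both values left together until the leading byte of (x2 | y2) is non-zero.

    Returns the shifted pair and the number of 8-bit shifts applied (0 to 3).
    """
    shifts = 0
    if ((x2 | y2) >> 16) == 0:             # top word zero: double (16-bit) shift
        x2, y2, shifts = x2 << 16, y2 << 16, shifts + 2
    if ((x2 | y2) & 0xFF000000) == 0:      # top byte still zero: single (8-bit) shift
        x2, y2, shifts = x2 << 8, y2 << 8, shifts + 1
    return x2, y2, shifts


def ratio_u16(x2: int, y2: int) -> int:
    """Unsigned 16-bit fraction of x2 / y2; assumes 0 <= x2 < y2."""
    return (x2 << 16) // y2


if __name__ == "__main__":
    nx, ny, count = normalize(0x1234, 0x5678)
    print(hex(nx), hex(ny), count)
    print(ratio_u16(nx, ny))
```

Because both operands are shifted by the same amount, the quotient is unchanged; the shifts only move the significant bits into the positions used by the final divide.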
Weight Update
The following section of code uses past output values to
adapt the coefficients of an FIR filter. In this case,
a(i) is assumed to be in the (1/4/11) format.
a(i) = a(i) + da * y(i)
offset                                                  #cycles
$0000 30 1E  1E 30   REFSLE $30   get y(0)              17.5
$0001 da 1A  1A dada LIT16 da                           16.5
$0002 F9 da  F9      MPY          y(0) * da             93
$0003 20 1E  1E 20   REFSLE $20   get a(0)              17.5
             E4      ADD          y(0) + da * y(0)      9
$0004 5C E4  5C 20   ASNSLE $20                         15.5
$0005 1E 20  1E 21   REFSLE $21   get y(1)              17.5
$0006 1A 21  1A dada LIT16 da                           16.5
$0007 da da  F9      MPY          y(1) * da             93
$0008 1E F9  1E 21   REFSLE $21                         17.5
$0009 E4 21  E4      ADD          y(1) + da * y(1)      9
$000A 21 5C  5C 21   ASNSLE $21                         15.5
. . .
$004B 1E 2E  1E 2F   REFSLE $2F   get y(F)              17.5
$004C 1A 2F  1A dada LIT16 da                           16.5
$004D da da  F9      MPY          y(F) * da             93
$004E 1E F9  1E 2F   REFSLE $2F                         17.5
$004F E4 2F  E4      ADD          y(F) + da * y(F)      9
$0050 2F 5C  5C 2F   ASNSLE $2F                         15.5
!* Time of execution *!
                  Calculated   Observed   Error
time per N        169
16 coefficients   2704
D.3 Standard Precision Floating Point -Loop Coding
Executive Entry Table
$0000 00 00 Cont. Status pointer
$0001 71 00 Init. Exec Stack limit
$0002 74 FF Init. Exec Top of Stack
$0003 00 60 Init. Exec PROCID
$0004 00 00 bus error PROCID none used
$0005 00 00 NMI PROCID " "
$0006 00 00 INT PROCID " "
$0007 00 00 Trap PROCID " "
$0008 00 00 Exception PROCID " "
Local variables
I      - Lenv(1)
Yquick - Lenv(2,3)
y^2    - Lenv(4,5)
THETA  - Lenv(6,7)
v(m-1) - Lenv(8,9)
da     - Lenv(A,B)
BETA   - Lenv(C,D)
1-BETA - Lenv(E,F)
x(0)   - Lenv(10,11)   input buffer 16 long
x(F)   - Lenv(2E,2F)
a(0)   - Lenv(30,31)   coefficient table
a(F)   - Lenv(4E,4F)
y(0)   - Lenv(50,51)   output buffer 16 long
y(F)   - Lenv(6E,6F)
Initial Coefficient Table
[Band-pass filter]
$0010 82 D3   a(0) =  .01155124
$0011 01 7A
$0012 8F 60   a(1) =  .07222172
$0013 09 3E
$0014 D3 3E   a(2) =  .07476273
$0015 09 91
$0016 67 B7   a(3) = -.04603866
$0017 FA 1B
$0018 2C 7E   a(4) = -.18493122
$0019 E8 54
$001A 3B 6F   a(5) = -.17043359
$001B EA 2F
$001C 61 A0   a(6) =  .02344914
$001D 03 00
$001E E8 F6   a(7) =  .21941864
$001F 1C 15
$0020 17 05   a(8) = -.21941864
$0021 E3 EA
$0022 9E 60   a(9) = -.02344914
$0023 FC FF
$0024 C4 91   a(A) =  .17043359
$0025 15 D0
$0026 D3 82   a(B) =  .18493122
$0027 17 A6
$0028 98 49   a(C) =  .04603866
$0029 05 E4
$002A 2C C2   a(D) = -.07476273
$002B F6 6E
$002C 70 50   a(E) = -.07222172
$002D F6 C1
$002E 7D 2D   a(F) = -.01155124
$002F FE 85
This block of code copies the initial coefficients
into the Local Environment for efficient access in the FIR
Filter subprogram. Data is initially stored in ROM in
extended precision Fractional data format and is converted
to floating point by this routine.
This is the first executable code of the program;
therefore, immediately after invocation of the program, the
executive stack mark, consisting of the program counter
(PC), the Code Environment (CENV), the Procedure identifier
(PROCID), and the Local Environment pointer (LENV), is
copied into the four memory locations immediately above the
start of the Local Environment.
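The numeric value held in each two-word ROM entry can be recovered with a short sketch. It assumes that the extended precision fractional format is a 32-bit two's-complement fraction scaled by 2^31, with the least significant word at the lower address and the most significant word at the higher address; it does not model the AAMP floating-point encoding produced by CVTDF. The function name frac32_to_float is hypothetical.

```python
def frac32_to_float(high: int, low: int) -> float:
    """Interpret two 16-bit words as a signed 32-bit fraction in [-1, 1)."""
    word = ((high & 0xFFFF) << 16) | (low & 0xFFFF)
    if word & 0x80000000:          # sign bit set: undo two's complement
        word -= 1 << 32
    return word / float(1 << 31)   # assumed scaling by 2^31


if __name__ == "__main__":
    print(frac32_to_float(0x017A, 0x82D3))   # table entry a(0)
    print(frac32_to_float(0xFA1B, 0x67B7))   # table entry a(3)
```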
   $0030 00 70  procedure header (# local variables)
Block Move
   Address                                              #cycles
   $0031 20 18  18 20   LIT8 32      I = 32             11
L1              12      LIT4A.2                         5.5
   $0032 E5 12  E5      SUB          I = I-2            9
                41      ASNSL.1      save I             10
   $0033 01 41  01      REFSL.1                         12
   $0034 10 68  68 10   REFDC $10    get table(I)       23.5
                D9      CVTDF                           137
   $0035 01 D9  01      REFSL.1                         12
                53      LOCL                            5.5
   $0036 A9 53  A9 30   ASNDC $30    store a(I)         19.5
   $0037 01 30  01      REFSL.1                         12
                01      REFSL.1                         12
   $0038 19 01  19 0F   LIT8N 0F                        11
   $0039 EF 0F  EF      SKIPNZ L1    I = 0?             16
                52      POP          kill old counter   5.5
   $003A 1B 52  1B      INTE
!* Time of execution *!
              Calculated   Observed   Error
Inside loop   285          296        -4%
Set-up        16.5
Total         4576.5       4444       2%
FIR Filter
This block of code performs an FIR filter on the array of
input data and writes the result into the Local
Environment. Data and coefficients are in floating-point
data format.
Y = SUM a(i) * x(i)    i = 1 .. 16
                                                        #cycles
                10      LIT4A.0      Yquick = 0         5.5
   $003B 10 10  10      LIT4A.0                         5.5
                C2      ASNDL.2                         14
   $003C 18 C2  18 20   LIT8 20      I = 32             11
L2 $003D 12 20  12      LIT4A.2                         5.5
                E5      SUB          I = I-2            9
   $003E 41 E5  41      ASNSL.1                         10
                01      REFSL.1                         12
   $003F 53 01  53      LOCL                            5.5
   $0040 10 68  68 10   REFDC $10    get x(I)           23.5
                01      REFSL.1                         12
   $0041 53 01  53      LOCL                            5.5
   $0042 30 68  68 30   REFDC $30    get a(I)           23.5
                86      MPYF         a(I) * x(I)        293
   $0043 32 86  32      REFDL.2                         18
                84      ADDF                            153
   $0044 C2 84  C2      ASNDL.2      add to y           14
                01      REFSL.1                         12
   $0045 01 01  01      REFSL.1                         12
   $0046 13 19  19 13   LIT8N 13                        11
                EF      SKIPNZ L2    I = 0?             16
   $0047 52 EF  52      POP          kill old counter   5.5
!* Time of execution *!
              Calculated   Observed   Error
Inside loop   635.5        748        -15%
Set-up        41.5
Total         10209.5      12533      -19%
Filter Update
This block of code moves the array of time delayed input
data to one sample greater delay and leaves x(0) empty for
the next input of data.
x(i + 1) = x(i)    i = 1 .. 15
   address                                              #cycles
   $0048 1E 18  18 1E   LIT8 30      I = 30             11
LU              12      LIT4A.2                         5.5
   $0049 E5 12  E5      SUB          I = I-2            9
                41      ASNSL.1      save I             10
   $004A 01 41  01      REFSL.1                         12
                53      LOCL                            5.5
   $004B 68 53  68 10   REFDC $10    get x(i)           23.5
   $004C 01 10  01      REFSL.1                         12
                53      LOCL                            5.5
   $004D A9 53  A9 12   ASNDC $12    save x(i + 2)      19.5
   $004E 01 12  01      REFSL.1                         12
                01      REFSL.1                         12
   $004F 19 01  19 0F   LIT8N 0F                        11
   $0050 EF 0F  EF      SKIPNZ LU    I = 0?             16
                52      POP          kill old counter   5.5
Input
The following segment inputs data from the input channel,
converts it to floating point, and stores it as x(0) in the Local
Environment Extended.
   $0051 1C 52  1C 1000 REFSI $1000  get input          23
   $0052 10 00  10      LIT4A.0      fractional CVTSD   5.5
   $0053 D9 10  D9      CVTDF                           137
   $0054 10 F7  F7 10   ASNDLE $10                      19.5
!* Time of execution *!
              Calculated   Observed   Error %
Inside loop   153.5        -
Set-up        16.5         -
Total         2473         2800       -12%
input         185          201        -8%
Ratio Calculation
r = x^2 / y^2
Assume x and y are stored in Local Environment Extended;
r is left on the stack upon completion of the segment.
offset                                                  #cycles
$0000 50 22  22 50   REFDLE $50   get y(0)              23.5
             6B      DUPD                               9
$0001 86 6B  86      MPYF                               293
             C4      ASNDL.4      store y^2 temp        14
$0002 22 C4  22 10   REFDLE $10   get x(0)              23.5
$0003 6B 10  6B      DUPD                               9
             86      MPYF         x^2                   293
$0004 34 86  34      REFDL.4      get y^2 back          18
             87      DIVF                               313
Calculated time of execution = 996 cycles
Decision
d = 1    r > THETA
d = 0    r < THETA
d is boolean
Assume THETA stored in Lenv(6,7)
Assume that r was left on stack from RATIO CALCULATION
segment.
offset                                                  #cycles
$0006 36 87  36      REFDL.6                            18
$0007 8A 8D  8D      EXCHD                              25
             8A      GRF                                49
$0008 EA 11  11      LIT4A.1      toggle flag           5.5
             EA      XOR                                5.5
Calculated time of execution = 103 cycles
Weight Update
The following section of code uses past output values to
adapt the coefficients of an FIR filter. Data is in
floating-point notation.
a(i) = a(i) + da * y(i)
Assume da stored in Lenv(A,B)
   offset                                               #cycles
   $0000 20 18  18 20   LIT8 32      I = 32             11
LW              12      LIT4A.2                         5.5
   $0001 E5 12  E5      SUB          I = I-2            9
                41      ASNSL.1                         10
   $0002 01 41  01      REFSL.1                         12
                53      LOCL                            5.5
   $0003 68 53  68 50   REFDC $50    get y(i)           23.5
   $0004 3A 50  3A      REFDL.A                         18
                86      MPYF                            293
   $0005 01 F9  01      REFSL.1                         12
                53      LOCL                            5.5
   $0006 68 53  68 30   REFDC $30    get a(i)           23.5
   $0007 84 30  84      ADDF                            153
                01      REFSL.1                         12
   $0008 53 01  53      LOCL         store              5.5
   $0009 30 A9  A9 30   ASNDC $30    a(i)+da*y(i)       19.5
                01      REFSL.1                         12
   $000A 01 01  01      REFSL.1                         12
   $000B 16 19  19 16   LIT8N 16                        11
                EF      SKIPNZ LW                       16
   $000C 52 EF  52      POP                             5.5
!* Time of execution *!
              Calculated   Observed   Error %
Inside loop   658.5        -
Set-up        16.5         -
Total         10552.5      -
IIR Filter
v(m) = (1-BETA) * v(m-1) + BETA * x * x
Assume BETA and 1-BETA stored in Lenv
offset                                                  #cycles
$0000 10 22  22 10   REFDLE $10   get x(0)              23.5
             6B      DUPD                               9
$0001 86 6B  86      MPYF         x^2                   293
             3C      REFDL.C      get BETA              18
$0002 86 3C  86      MPYF                               293
             38      REFDL.8      get v(m-1)            18
$0003 3E 38  3E      REFDL.E      get (1-BETA)          18
             86      MPYF                               293
$0004 84 86  84      ADDF                               153
             C8      ASNDL.8      store v(m)            14
$0005 1B C8  1B      INTE
Calculated time of execution = 1132.5 cycles
D.4 Standard Precision Floating Point - Inline Coding
FIR Filter
This block of code performs an FIR filter on the array of
input data and writes the result into the Local
Environment. Data and coefficients are in floating-point
data format.
Y = SUM a(i) * x(i)    i = 1 .. 16
offset                                                  #cycles
             10      LIT4A.0                            5.5
$0000 10 10  10      LIT4A.0      y = 0                 5.5
$0001 30 22  22 30   REFDLE $30   get a(0)              23.5
$0002 10 22  22 10   REFDLE $10   get x(0)              23.5
             86      MPYF                               293
$0003 84 86  84      ADDF         y = y + *             153
$0004 32 22  22 32   REFDLE $32   get a(1)              23.5
$0005 12 22  22 12   REFDLE $12   get x(1)              23.5
             86      MPYF                               293
$0006 84 86  84      ADDF         y = y + *             153
. . .
$002E 3E 22  22 4E   REFDLE $4E   get a(F)              23.5
$002F 2E 22  22 2E   REFDLE $2E   get x(F)              23.5
             86      MPYF                               293
$0030 84 86  84      ADDF         y = y + *             153
$0031 50 F7  F7 50   ASNDLE $50   store to y(0)         19.5
!* Time of execution *!
                  Calculated   Observed   Error
Set-up            30.5
time per N        493
16 coefficients   7918.5
Filter Update
This block of code moves the array of time delayed input
data to one sample greater delay and leaves x(0) empty for
the next input of data.
x(i + 1) = x(i)    i = 1 .. 15
offset                                                  #cycles
$0000 2C 22  22 2C   REFDLE $2C   get x(E)              23.5
$0001 2E F7  F7 2E   ASNDLE $2E   store to x(F)         19.5
$0002 2A 22  22 2A   REFDLE $2A   get x(D)              23.5
$0003 2C F7  F7 2C   ASNDLE $2C   store to x(E)         19.5
. . .
$001E 10 22  22 10   REFDLE $10   get x(0)              23.5
$001F 12 F7  F7 12   ASNDLE $12   store to x(1)         19.5
!* Time of execution *!
                  Calculated   Observed   Error %
time per N        43
16 coefficients   646
(15 iterations)
Weight Update
The following section of code uses past output values to
adapt the coefficients of an FIR filter. Data is in
floating-point notation.
a(i) = a(i) + da * y(i)
Assume da stored in Lenv(A,B)
offset                                                  #cycles
$0000 50 22 22 50 REFDLE $50 get y(0) 23.5
3A REFDL.A get da 18
$0001 86 3A 86 MPYF y(0) * da 293
$0002 30 22 22 30 REFDLE $30 get a(0) 23.5
84 ADDF 153
$0003 F7 84 F7 30 ASNDLE $30 save new a(0) 19.5
$0004 22 50 22 52 REFDLE $52 get y(l) 23.5
$0005 3A 52 3A REFDL.A get da 18
86 MPYF y(l) * da 293
$0006 22 86 22 32 REFDLE $32 23.5
$0007 84 32 84 ADDF 153
$0008 32 F7 F7 32 ASNDLE $32 save new a(l) 19.5
$003A 22 6C 22 6E REFDLE $6E get y(F) 23.5
$003B 3A 52 3A REFDL.A get da 18
86 MPYF y(F) * da 293
$003C 22 86 22 4E REFDLE $4E 23.5
$003D 84 4E 84 ADDF y(F) + da * y(F) 153
$003E 4E F7 F7 4E ASNDLE $4E 19.5
!* Time of execution *!
Calculated Observed Error %
time per N 530.5 —
16 coefficients 8488 —
D.5 Extended Precision Floating Point - Loop Coding
Executive Entry Table
$0000 00 00 Cont. Status pointer
$0001 71 00 Init. Exec Stack limit
$0002 74 FF Init. Exec Top of Stack
$0003 00 80 Init. Exec PROCID
$0004 00 00 bus error PROCID none used
$0005 00 00 NMI PROCID " "
$0006 00 00 INT PROCID " "
$0007 00 00 Trap PROCID " "
$0008 00 00 Exception PROCID " "
Local variables
I      - Lenv(1)
Yquick - Lenv(2,3,4)
y^2    - Lenv(5,6,7)
THETA  - Lenv(8,9,A)
v(m-1) - Lenv(B,C,D)
da     - Lenv(A0,A1,A2)
BETA   - Lenv(A3,A4,A5)
1-BETA - Lenv(A6,A7,A8)
x(0)   - Lenv(10,11,12)   input buffer 16 long
x(F)   - Lenv(3D,3E,3F)
a(0)   - Lenv(40,41,42)   coefficient table
a(F)   - Lenv(6D,6E,6F)
y(0)   - Lenv(70,71,72)   output buffer 16 long
y(F)   - Lenv(9D,9E,9F)
Initial Coefficient Table
[Band-pass filter]
$0010 82 D3   a(0) =  .01155124
$0011 01 7A
$0013 8F 60   a(1) =  .07222172
$0014 09 3E
$0016 D3 3E   a(2) =  .07476273
$0017 09 91
$0019 67 B7   a(3) = -.04603866
$001A FA 1B
$001C 2C 7E   a(4) = -.18493122
$001D E8 54
$001F 3B 6F   a(5) = -.17043359
$0020 EA 2F
$0022 61 A0   a(6) =  .02344914
$0023 03 00
$0025 E8 F6   a(7) =  .21941864
$0026 1C 15
$0028 17 05   a(8) = -.21941864
$0029 E3 EA
$002B 9E 60   a(9) = -.02344914
$002C FC FF
$002E C4 91   a(A) =  .17043359
$002F 15 D0
$0031 D3 82   a(B) =  .18493122
$0032 17 A6
$0034 98 49   a(C) =  .04603866
$0035 05 E4
$0037 2C C2   a(D) = -.07476273
$0038 F6 6E
$003A 70 50   a(E) = -.07222172
$003B F6 C1
$003D 7D 2D   a(F) = -.01155124
$003E FE 85
This block of code copies the initial coefficients
into the Local Environment for efficient access in the FIR
Filter subprogram. Data is initially stored in ROM in
extended precision Fractional data format and is converted
to floating-point extended by this routine.
This is the first executable code of the program;
therefore, immediately after invocation of the program, the
executive stack mark, consisting of the program counter
(PC), the Code Environment (CENV), the Procedure identifier
(PROCID), and the Local Environment pointer (LENV), is
copied into the four memory locations immediately above the
start of the Local Environment.
$0040 00 A8   procedure header (# local variables)
Block Move
   Address                                              #cycles
   $0041 30 18  18 30   LIT8 48      I = 48             11
L1              13      LIT4A.3                         5.5
   $0042 E5 13  E5      SUB          I = I-3            9
                41      ASNSL.1      save I             10
   $0043 01 41  01      REFSL.1                         12
   $0044 10 68  68 10   REFDC $10    get table(I)       23.5
                6C      CVTDFE                          225
   $0045 01 D9  01      REFSL.1                         12
                53      LOCL                            5.5
   $0046 99 53  99 40   ASNDC $40    store a(I)         23.5
   $0047 01 40  01      REFSL.1                         12
                01      REFSL.1                         12
   $0048 19 01  19 0F   LIT8N 0F                        11
   $0049 EF 0F  EF      SKIPNZ L1    I = 0?             16
                52      POP          kill old counter   5.5
   $004A 1B 52  1B      INTE
!* Time of execution *!
              Calculated   Observed   Error %
Inside loop   377
Set-up        16.5
Total         6048.5
FIR Filter
This block of code performs an FIR filter on the array of
input data and writes the result into the Local
Environment. Data and coefficients are in floating-point
data format.
Y = SUM a(i) * x(i)    i = 1 .. 16
   address                                              #cycles
   $004B 10 10  10      LIT4A.0      Yquick = 0         5.5
                10      LIT4A.0                         5.5
   $004C B5 10  10      LIT4A.0                         5.5
                B5 02   ASNTLE 02                       23.5
   $004D 18 02  18 30   LIT8 30      I = 48             11
L2 $004E 13 30  13      LIT4A.3                         5.5
                E5      SUB          I = I-3            9
   $004F 41 E5  41      ASNSL.1                         10
                01      REFSL.1                         12
   $0050 53 01  53      LOCL                            5.5
   $0051 10 76  76 10   REFTC $10    get x(I)           59
                01      REFSL.1                         12
   $0052 53 01  53      LOCL                            5.5
   $0053 40 76  76 40   REFTC $40    get a(I)           59
                94      MPYFE        a(I) * x(I)        539
   $0054 77 94  77 02   REFTLE 02                       84
   $0055 92 02  92      ADDFE                           229
   $0056 02 B5  B5 02   ASNTLE 02    add to y           23.5
                01      REFSL.1                         12
   $0057 01 01  01      REFSL.1                         12
   $0058 16 19  19 16   LIT8N 16                        11
                EF      SKIPNZ L2    I = 0?             16
   $0059 52 EF  52      POP          kill old counter   5.5
!* Time of execution *!
              Calculated   Observed   Error
Inside loop   1104
Set-up        56.5
Total         17720.5
Filter Update
This block of code moves the array of time delayed input
data to one sample greater delay and leaves x(0) empty for
the next input of data.
x(i + 1) = x(i)    i = 1 .. 15
                                                        #cycles
   $005A 2D 18  18 2D   LIT8 45      I = 45             11
LU              13      LIT4A.3                         5.5
   $005B E5 13  E5      SUB          I = I-3            9
                41      ASNSL.1      save I             10
   $005C 01 41  01      REFSL.1                         12
                53      LOCL                            5.5
   $005D 76 53  76 10   REFTC $10    get x(i)           59
   $005E 01 10  01      REFSL.1                         12
                53      LOCL                            5.5
   $005F 99 53  99 13   ASNTC $13    save x(i + 2)      23.5
   $0060 01 13  01      REFSL.1                         12
                01      REFSL.1                         12
   $0061 19 01  19 0F   LIT8N 0F                        11
   $0062 EF 0F  EF      SKIPNZ LU    I = 0?             16
                52      POP          kill old counter   5.5
Input
The following segment inputs data from the input channel,
converts it to floating-point extended, and stores it as
x(0) in the Local Environment Extended.
   $0063 1C 52  1C 1000 REFSI $1000  get input          23
   $0064 10 00  10      LIT4A.0      fractional CVTSD   5.5
   $0065 6C 10  6C      CVTDFE                          225
   $0066 10 B5  B5 10   ASNTLE $10                      23.5
!* Time of execution *!
              Calculated   Observed   Error
Inside loop   193
Set-up        16.5
Total         2911
input         277
Ratio Calculation
r = x^2 / y^2
Assume x and y are stored in Local Environment Extended;
r is left on the stack upon completion of the segment.
offset                                                  #cycles
$0000 70 77  77 70   REFTLE $70   get y(0)              84
             79      DUPT                               49.5
$0001 94 79  94      MPYFE                              539
$0002 05 B5  B5 05   ASNTLE 05    store y^2 temp        23.5
$0003 77 C4  77 10   REFTLE $10   get x(0)              84
$0004 79 10  79      DUPT                               49.5
             94      MPYFE        x^2                   539
$0005 77 94  77 05   REFTLE 05    get y^2 back          23.5
$0006 95 05  95      DIVFE                              706
Calculated time of execution = 2098 cycles
Decision
d = 1    r > THETA
d = 0    r < THETA
d is boolean
Assume THETA stored in Lenv(8,9,A)
Assume that r was left on stack from RATIO CALCULATION
segment.
                                                        #cycles
$0007 08 77  77 08   REFTLE 08                          84
             9D      EXCHT                              47
$0008 91 9D  91      GRFE                               53
             11      LIT4A.1      toggle flag           5.5
$0009 EA 11  EA      XOR                                5.5
Calculated time of execution = 195 cycles
Weight Update
The following section of code uses past output values to
adapt the coefficients of an FIR filter. Coefficients are
in floating-point extended format.
a(i) = a(i) + da * y(i)
Assume da stored in Lenv(A0,A1,A2)
   offset                                               #cycles
   $0000 30 18  18 30   LIT8 48      I = 48             11
LW              13      LIT4A.3                         5.5
   $0001 E5 13  E5      SUB          I = I-3            9
                41      ASNSL.1                         10
   $0002 01 41  01      REFSL.1                         12
                53      LOCL                            5.5
   $0003 76 53  76 70   REFTC $70    get y(i)           59
   $0004 77 70  77 A0   REFTLE $A0                      84
   $0005 94 A0  94      MPYFE                           539
                01      REFSL.1                         12
   $0006 53 01  53      LOCL                            5.5
   $0007 40 76  76 40   REFTC $40    get a(i)           84
                92      ADDFE                           229
   $0008 01 92  01      REFSL.1                         12
                53      LOCL         store              5.5
   $0009 99 53  99 40   ASNTC $40    a(i)+da*y(i)       23.5
   $000A 01 40  01      REFSL.1                         12
                01      REFSL.1                         12
   $000B 19 01  19 17   LIT8N 17                        11
                EF      SKIPNZ LW                       16
   $000C EF 17  52      POP                             5.5
   $000D 1B 52  1B      INTE
!* Time of execution *!
              Calculated   Observed   Error %
Inside loop   1146.5       -
Set-up        16.5         -
Total         18360.5      -
IIR Filter
v(m) = (1-BETA) * v(m-1) + BETA * x * x
Assume BETA and 1-BETA stored in Lenv
offset                                                  #cycles
$0000 10 77  77 10   REFTLE $10   get x(0)              84
             79      DUPT                               49.5
$0001 94 79  94      MPYFE        x^2                   539
$0002 A3 77  77 A3   REFTLE $A3   get BETA              84
             94      MPYFE                              539
$0003 77 94  77 0B   REFTLE $0B   get v(m-1)            84
$0004 77 0B  77 A6   REFTLE $A6   get 1-BETA            84
$0005 94 A6  94      MPYFE                              539
             92      ADDFE                              229
$0006 B5 92  B5 0B   ASNTLE $0B   store v(m)            23.5
$0007 1B 0B  1B      INTE
Calculated time of execution = 2255 cycles
D.6 Extended Precision Floating Point - Inline Coding
FIR Filter
This block of code performs an FIR filter on the array of
input data and writes the result into the Local
Environment Extended. Data and coefficients are in
floating-point data format.
Y = SUM a(i) * x(i)    i = 1 .. 16
offset                                                  #cycles
10 LIT4A.0 5.5
$0000 10 10 10 LIT4A.0 5.5
10 LIT4A.0 y = 5.5
$0001 77 10 77 40 REFTLE $40 get a(0) 84
$0002 77 40 77 10 REFTLE $10 get X(0) 84
$0003 94 10 94 MPYFE 539
92 ADDFE y=y+* 229
$0004 77 92 77 43 REFTLE $43 get a(l) 84
$0005 77 43 77 13 REFTLE $13 get X(l) 84
$0006 94 13 94 MPYFE 539
84 ADDFE y = y + * 229
$002E 77 92 77 6D REFTLE $6D get a(F) 84
$002F 77 6D 77 3D REFTLE $3D get X(F) 84
$0030 94 3D 94 MPYFE 539
92 ADDFE y = y + * 229
$0031 B5 92 B5 70 ASNTLE $70 store to y(0) 23.5
$0032 IB 70 IB INTE
!* Time of execution *!
Calculated Observed Error %
Set-up 40 —
time per N 936 —
16 coefficients 15016 —
Filter Update
This block of code moves the array of time delayed input
data to one sample greater delay and leaves x(0) empty for
the next input of data.
x(i + 1) = x(i)    i = 1 .. 15
offset                                                  #cycles
$0000 3A 77  77 3A   REFTLE $3A   get x(E)              84
$0001 3D B5  B5 3D   ASNTLE $3D   store to x(F)         23.5
$0002 37 77  77 37   REFTLE $37   get x(D)              84
$0003 3A B5  B5 3A   ASNTLE $3A   store to x(E)         23.5
. . .
$001E 10 77  77 10   REFTLE $10   get x(0)              84
$001F 13 B5  B5 13   ASNTLE $13   store to x(1)         23.5
!* Time of execution *!
                  Calculated   Observed   Error
time per N        107.5
16 coefficients   1612.5
(15 iterations)
Weight Update
The following section of code uses past output values to
adapt the coefficients of an FIR filter. Data is in
floating-point notation.
a(i) = a(i) + da * y(i)
Assume da stored in Lenv(A0,A1,A2)
offset                                                  #cycles
$0000 70 77  77 70   REFTLE $70   get y(0)              84
$0001 A0 77  77 A0   REFTLE $A0   get da                84
             94      MPYFE                              539
$0002 77 94  77 40   REFTLE $40   get a(0)              84
$0003 92 40  92      ADDFE                              229
$0004 40 B5  B5 40   ASNTLE $40   store a(0)            23.5
$0005 73 77  77 73   REFTLE $73   get y(1)              84
$0006 A0 77  77 A0   REFTLE $A0   get da                84
             94      MPYFE                              539
$0007 77 94  77 43   REFTLE $43   get a(1)              84
$0008 92 43  92      ADDFE                              229
$0009 40 B5  B5 40   ASNTLE $43   store a(1)            23.5
. . .
$0044 9D 77  77 9D   REFTLE $9D   get y(F)              84
$0045 A0 77  77 A0   REFTLE $A0   get da                84
             94      MPYFE                              539
$0046 77 94  77 6D   REFTLE $6D   get a(F)              84
$0047 92 6D  92      ADDFE                              229
$0048 6D B5  B5 6D   ASNTLE $6D   store a(F)            23.5
!* Time of execution *!
                  Calculated   Observed   Error %
time per N        1043.5
16 coefficients   16696
AN IMPLEMENTATION AND EVALUATION OF A
MINIMAL-COMPONENT MINIMAL-POWER MICROCOMPUTER
SYSTEM USING ROCKWELL'S AAMP
by
Eric Nelson
B. S. , Kansas State University, 1985
A. G. S., Colby Community Junior College, 1983
AN ABSTRACT OF A MASTER'S THESIS
submitted in partial fulfillment of the
requirements for the degree
MASTER OF SCIENCE
Department of Electrical and Computer Engineering
KANSAS STATE UNIVERSITY
Manhattan, Kansas
1987
ABSTRACT
This thesis describes a hardware implementation of a
minimal component, minimal power, microcomputer system
using Rockwell's Advanced Architecture Microprocessor
(AAMP), a high-performance 16-bit floating-point CMOS
microprocessor.
The research was funded through a contract between
Kansas State University and Sandia National Laboratories
which has traditionally supported ultra-low power designs
of A/D converters and microcomputer systems.
The system described is intended for use as the signal
processing portion of an ultra-low power, highly portable,
intruder detection system; therefore, low power consumption
and minimal parts count were the main objectives in design.
The thesis briefly describes the AAMP with more
emphasis placed on items used in the design. Power
consumption is shown for the system and for each component
individually, with parts selections based on these figures
explained. Performance of the AAMP as a signal processor
is tested using several signal processing code segments
with time of execution shown for various types of data.
The final product is a floating-point capable
microcomputer system with power consumption at a proposed
2.6 MHz operation of under 25 mW.

