Digital Signal Processing Capabilities of the Fujitsu MB8764 by Creech, Harold B.
University of Central Florida 
STARS 
Retrospective Theses and Dissertations 
1985 
Digital Signal Processing Capabilities of the Fujitsu MB8764 
Harold B. Creech 
University of Central Florida 
 Part of the Engineering Commons 
This Masters Thesis (Open Access) is brought to you for free and open access by STARS. It has been accepted for 
inclusion in Retrospective Theses and Dissertations by an authorized administrator of STARS. For more information, 
please contact STARS@ucf.edu. 
STARS Citation 
Creech, Harold B., "Digital Signal Processing Capabilities of the Fujitsu MB8764" (1985). Retrospective 
Theses and Dissertations. 4809. 
https://stars.library.ucf.edu/rtd/4809 
DIGITAL SIGNAL PROCESSING CAPABILITIES
OF THE FUJITSU MB8764 
BY 
HAROLD B. CREECH
B.S.E.E., United States Coast Guard Academy, 1977 
RESEARCH REPORT 
Submitted in partial fulfillment of the requirements
for the degree of Master of Science in Engineering 
in the Graduate Studies Program of the College of Engineering 
University of Central Florida 
Orlando, Florida 
Summer Term
1985 
ABSTRACT 
The Fujitsu MB8764 digital signal processing chip is designed
to operate with a machine cycle rate of up to 10 MHz. The chip's
ability to perform a 16-by-16 bit multiply and add operation in one
machine cycle makes it a good candidate for real time digital signal
processing. Unlike the Intel 2920, the Fujitsu MB8764 does not have
an onboard analog-to-digital or digital-to-analog converter.
Therefore, this paper will be restricted to the operation of this
device with digital data input and output.
The use of the MB8764 as a digital filter is analyzed.
Performance limitations due to finite word length, memory size
and configuration, and clock rate are considered. The MB8764
capabilities in computing fast Fourier transforms are discussed.
Development of a working digital filter with the MB8764 work
station is presented.
ACKNOWLEDGMENTS
I would like to thank Barry Mattox and Michael Gorlicki at
Martin Marietta aerospace division for granting me access to the
only MB8764 development system in Orlando; to Dr. Fred O. Simons for
advising me in my graduate studies; to my wife, Terry, for giving me
the time to write this paper; and to Jesus Christ, my savior, who makes
all things possible.
TABLE OF CONTENTS
INTRODUCTION
DESCRIPTION OF THE MB8764
DESIGNING A DIGITAL FILTER ON THE MB8764
THE MB8764 DEVELOPMENT SYSTEM
COMPUTING THE DISCRETE FOURIER TRANSFORM ON THE MB8764
CONCLUSIONS
LIST OF REFERENCES
INTRODUCTION 
Many microprocessors are available today that are specifically
designed for digital signal processing. The Fujitsu MB8764 is one of
the newest digital signal processing chips on the market, and has
incorporated recent advances in VLSI technology into its design.
Two widely used chips that may be compared with the Fujitsu MB8764
are the Intel 2920 and the TMS 320.
A comparison between the Intel 2920 and the Fujitsu MB8764
shows the MB8764 to be a much faster chip with a more extensive
instruction set. The Intel 2920 offers 24-bit internal precision,
which is much better than the 16-bit precision offered by the
MB8764. The Intel 2920 also offers an onboard ADC and DAC for
analog input and output. The MB8764 accepts digital input and output
only. Internal RAM and program ROM are much larger in the MB8764,
and only the MB8764 permits their external expansion.
The TMS 320 is a much closer match to the MB8764 than the Intel
2920. The MB8764 is once again the faster machine with a 0.1 µsec
instruction cycle compared to the TMS 320's 0.2 µsec instruction
cycle. Specifications from the manufacturers show the TMS 320 and
MB8764 implementing a second-order filter in 2.2 µsec and 0.7 µsec
respectively. Both the TMS 320 and the MB8764 use an assembly
language level instruction set and neither accepts analog inputs.
Internal accuracy of the TMS 320 is 16 bits but its design makes it
possible to implement double-precision operations. The design of
the MB8764 makes it impractical to implement double precision. The
MB8764 offers more than twice as much internal RAM as the TMS 320
but only two-thirds the internal instruction ROM.
The MB8764 can be favorably compared to both of these widely 
distributed chips. It excels in the area of execution speed but is 
deficient in its internal accuracy. 
DESCRIPTION OF THE MB8764
Introduction 
The Fujitsu MB8764 digital signal processing chip is a VLSI,
CMOS design optimized to provide high-speed processing with flexible
memory operation and input/output operation. Internal and external
buses provide 16-bit data transfer, and the ALU provides a 26-bit
result to the accumulator. The instruction list provides the chip
user with a variety of instructions, most of which are specifically
designed to simplify the implementation of digital signal processing
functions. Internal memory provides for a program ROM of 1024-by-24
bits, and RAM storage of 256-by-16 bits. Both ROM and RAM are
expandable externally. These features are all provided on an 88-pin
chip less than 31 mm square. This chapter will describe the basic
operation of the blocks that make up the MB8764. Figure 1 is a
block diagram of the MB8764. The material in this section comes
from references (1) and (2).
Registers on the chip can be divided into four groups: data
registers, counter registers, index/address registers, and flags
(see Figure 2). The functions of these registers will be explained
in the following sections.
[Block diagram not reproduced. Its labeled blocks are the sequence
control block, the decoder, IRAM (ARAM and BRAM), and the arithmetic
and logic block.]
Figure 1. MB8764 Block Diagram.
1. Data registers
   A, B            operation input registers
   D               accumulator
   EI              external input register
   EO              external output register
2. Control registers
   PC              program counter
   PCS             program counter stack
   C0, C1          loop counters
   X, XS, Y, YS    index registers and their stacks
   DMC             DMA counter
   PGM             ERAM page register
   PGT             ROM page register
   VP              virtual shift pointer
   U               unit address register
   EIA             external input address register
   EA              external address register
I/O flags
   F0              input flag 0
   F1              input flag 1
   IF              EI flag
   OF              EO flag
   DMM             DMA mode
   ADM             address mode
ALU flags
   PL              D positive
   MI              D negative
   ZR              D zero
   OV              D overflow
   CLP             clipping mode
Other flags
   VP              virtual pointer mode
Figure 2. MB8764 Registers.
Clock Generator 
The clock generator requires an external clock source or a 
crystal oscillator of 20 MHz or less for its input, and outputs a 
50% duty clock source at one-half the input clock frequency. The 
output is used to time all internal operations; one internal clock 
cycle equals one instruction cycle. The majority of instructions 
require one instruction cycle to operate, or 0.1 µsec when using a
20 MHz external clock. 
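The relationship between the external clock and the instruction cycle can be sketched in a few lines of Python. This is only an illustration of the arithmetic stated above; the function name `instruction_cycle_us` is hypothetical and is not part of any Fujitsu tool.

```python
def instruction_cycle_us(external_clock_mhz):
    """Return the instruction cycle time in microseconds.

    Uses only what the text states: the clock generator halves the
    external clock (20 MHz or less), and one internal clock cycle
    equals one instruction cycle.
    """
    if not 0 < external_clock_mhz <= 20:
        raise ValueError("external clock must be above 0 and at most 20 MHz")
    internal_clock_mhz = external_clock_mhz / 2
    return 1.0 / internal_clock_mhz


for mhz in (20, 16, 10):
    print(mhz, "MHz external clock:", instruction_cycle_us(mhz), "usec per cycle")
```

A slower crystal lengthens every timing figure quoted in this report in the same proportion.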
Arithmetic and Logic Block
The arithmetic and logic block accepts input into registers A,
B, and D. Instructions in the MB8764 are classified as:
1) Arithmetic or logic instructions, and
2) Control instructions.
Arithmetic and logic instructions are executed in the arithmetic and
logic block by the ALU, with the exception of multiplication
instructions. All arithmetic and logic instructions can be executed
together with a control instruction; this type of instruction is
called a compound instruction. A compound instruction that does not
include a multiplication instruction performs: 1) the control
instruction specified, and 2) the arithmetic and logic instruction
based on the register contents as of the previous instruction cycle.
An example is shown below. (Assume the B register has $0002 in it.)
Step 1 LDI:NOP #$0001 Put $0001 into A. No math operation.
Step 2 LDI:ADD #$0005 Put $0005 into A. Add $0001 to $0002.
The D register contains $0003 in step 3.
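The timing rule in this example (the arithmetic half of a compound instruction sees the registers as they stood before the control half of the same instruction took effect) can be mimicked with a small Python model. This is a sketch under stated assumptions, not an MB8764 simulator: the helper names `ldi` and `step` are hypothetical, and ADD is assumed to place A + B in D, which is what the example above implies.

```python
def ldi(value):
    """Hypothetical control half: load an immediate value into A."""
    def control(regs):
        regs["A"] = value
    return control


def step(regs, control, alu_op):
    """Execute one compound instruction written as control:alu_op."""
    previous = dict(regs)   # register contents as of the previous cycle
    control(regs)           # control half updates the registers
    if alu_op == "ADD":     # ALU half works on the previous contents
        regs["D"] = (previous["A"] + previous["B"]) & 0xFFFF
    elif alu_op != "NOP":
        raise ValueError("only ADD and NOP are modeled in this sketch")


regs = {"A": 0, "B": 0x0002, "D": 0}
step(regs, ldi(0x0001), "NOP")   # Step 1  LDI:NOP #$0001
step(regs, ldi(0x0005), "ADD")   # Step 2  LDI:ADD #$0005
print(regs)                      # A holds $0005, D holds $0001 + $0002
```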
The multiplication of the contents of register A by register B
is performed during each instruction cycle, regardless of the
instruction. A multiplier circuit separate from the ALU and using
Booth's second-order algorithm performs the multiplication. Booth's
algorithm is a simple and direct method for multiplication of signed
binary numbers (3). The intermediate results of register A
multiplied by register B are stored in temporary storage registers
TR0 and TR1. When a multiplication instruction is given, the ALU
completes the multiplication by adding TR0 and TR1. The results of
multiplying two 16-bit registers would ideally result in a 32-bit
number. The ALU provides a 26-bit result to the D register by
rounding the addition of the two 27-bit registers TR0 and TR1 and
deleting bit 25.
[Diagram not reproduced. It shows the 16-bit multiplier inputs A and B
(bits 15 to 0), the multiplier outputs TR0 and TR1 (bits 26 to 0), the
ALU sum TR0+TR1 (bits 27 to 0), and the ALU result transferred to D.]
Figure 3. Multiplication.
The round-off causes an error of less than plus or minus $2^{-24}$.
It is necessary to delete bit 25 of the ALU result to obtain the
correct two's complement number. An error results only in the case
of -2 x -2, where zero is input into the D register; the overflow
flag OV is set to show that an overflow has occurred. A compound
instruction that involves a multiplication instruction performs:
1) The control instruction specified, and
2) The multiplication based on the register contents two
instruction cycles before.
An example is shown below.
Step 1 LAB:NOP $01,$02 Data is moved from ARAM to A and BRAM
to B.
Step 2 LAB:NOP $02,$03 New data is moved into A and B. Step
1 data enters the multiplier circuit.
Step 3 LAB:MUL $03,$04 New data is moved into A and B. Step
1 data multiplication is completed.
In step 4 the D register will contain (A from step 1) x (B from step 1).
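The two-cycle delay between loading the operands and obtaining the product can be illustrated with a short Python model of the pipeline. It is a simplified sketch: TR0 and TR1 are collapsed into a single value `tr`, plain integers stand in for the fixed-point words, and the function name `simulate` is hypothetical.

```python
def simulate(program):
    """Run (operands, alu_op) steps and return D after each step.

    Per instruction cycle:
      - a MUL copies the multiplier output into D,
      - the multiplier forms the product of the A and B values that
        were loaded in the previous cycle,
      - the LAB half loads new operands into A and B.
    """
    a = b = tr = d = 0
    d_history = []
    for (load_a, load_b), alu_op in program:
        next_d = tr if alu_op == "MUL" else d
        next_tr = a * b
        a, b, tr, d = load_a, load_b, next_tr, next_d
        d_history.append(d)
    return d_history


program = [
    ((3, 4), "NOP"),   # Step 1: operands loaded
    ((5, 6), "NOP"),   # Step 2: step 1 data enters the multiplier
    ((7, 8), "MUL"),   # Step 3: step 1 product completed into D
    ((0, 0), "MUL"),   # Step 4: step 2 product follows one cycle later
]
print(simulate(program))
```

The last two entries show why products can be delivered on consecutive cycles even though each one takes two cycles from operand load to result.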
Division operations in the MB8764 are carried out in the ALU
without the help of a specialized circuit. It requires 17 machine
cycles to perform division. All other operations performed in the
ALU require one machine cycle.
ALU operations are fixed point with the A and B registers
having a range from -2 to 1.999938965, and the D register having a
range of -4 to 3.999999881. Passing data from the 26-bit D register
to the 16-bit internal bus is done as shown in Figure 4.
Figure 4. D Register to Internal Bus Transfer. 
If bit location 24 in the D register is not zero an error of +/- 2
occurs, and the OV flag is set. The CLP flag, when set, minimizes
the error by forcing data transferred to the internal bus to binary
0111111111111111 in the case of a positive overflow, and to
1000000000000000 in the case of a negative overflow.
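The quoted register ranges and the clipping behavior can be checked numerically. The Python sketch below assumes 14 fractional bits for the 16-bit A and B registers and 23 fractional bits for the 26-bit D register; these splits are inferred from the ranges given above rather than stated in the text. The exact bit selection of Figure 4 is not modeled, and the clipping function simply saturates an integer to a 16-bit two's complement word. Both function names are hypothetical.

```python
def fixed_point_range(total_bits, frac_bits):
    """Return (lowest, highest) value of a two's complement fixed-point word."""
    scale = 1 << frac_bits
    lowest = -(1 << (total_bits - 1)) / scale
    highest = ((1 << (total_bits - 1)) - 1) / scale
    return lowest, highest


def clip_to_bus(value):
    """Saturate a signed integer to a 16-bit two's complement bus word."""
    if value > 0x7FFF:
        return 0x7FFF          # positive overflow
    if value < -0x8000:
        return 0x8000          # negative overflow
    return value & 0xFFFF


print(fixed_point_range(16, 14))            # A and B registers
print(fixed_point_range(26, 23))            # D register
print(format(clip_to_bus(40000), "016b"))   # positive overflow pattern
print(format(clip_to_bus(-40000), "016b"))  # negative overflow pattern
```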
Sequence Control Block 
The sequence control block controls the execution of the
program code for the MB8764. Execution is carried out in a pipeline
style as shown in Figure 5, a timing diagram of a typical
instruction. In the first machine cycle, step one, an instruction
from program ROM is placed into IR0 (Instruction Register zero).
During step two preliminary operations are performed based on the
instruction in IR0. In step three IR1 (Instruction Register one)
receives the IR0 contents, and signals are passed to complete the
operation based on the IR1 contents. When step four begins the
instruction has completed execution, and results are in place. The
steps just outlined are stepped through by a count of the internal
clock with interruptions made as necessary for proper program
execution.
A 10-bit PC (Program Counter) register addresses the program
ROM through the DPR (ROM Pointer Register). The program counter is
reset and held at zero when a pulse is sent on the hardware reset
pin RST. Program execution begins with the first clock pulse after
the RST pulse is removed; with each internal clock period the
program counter is incremented by one, unless interrupted.
Instruction sequence
n-2   LAB:NOP $0,$0
n-1   LAB:NOP $1,$1
n     LAB:MLT $2,$2
n+1   LAB:MSM $3,$3
Timing diagram of instruction sequence (one column per internal clock)
IROM pointer             n-2   n-1   n     n+1   n+2   n+3   n+4
IR0                            n-2   n-1   n     n+1   n+2   n+3
IR1                                  n-2   n-1   n     n+1   n+2
                                                 MLT   MSM
A, B register contents                     n-2   n-1   n     n+1
                                           $0,$0 $1,$1 $2,$2 $3,$3
TR0, TR1 output                                  n-2   n-1   n
from multiplier                                  $0x$0 $1x$1 $2x$2
D register                                             n     n+1
                                                       $0x$0 $1x$1
Figure 5. Timing Diagram.
Interruptions to the program counter incrementing occur when a
multicycle instruction is being executed. A cycle counter within
the sequence control block determines the proper interruption
length. Interruptions in the PC also occur when input/output
operations are performed during program operations that use the
external data bus or associated registers.
Program execution can be manipulated by changing the PC
register value. The following instructions are used to control
program execution through the PC register.
1) Jump, and jump on condition instructions replace the PC
contents with the address of the instruction to jump to.
2) Jump to subroutine instructions load the first address of
the subroutine into the PC register and save the current PC
value in the PCS (Program Counter Stack Register).
3) Return from subroutine instructions return the value stored
in the PCS register to the PC register and increment PC.
Nesting of subroutines is not possible using the JSR
instruction, because there is only one stack register. Jump
instructions can be used for the same end. PC contents can be saved
in RAM at the time of a jump, incremented by one and recalled by
another jump instruction at the end of the subroutine.
Two loop counters, C1 (eight bits) and C2 (four bits) are
located within the sequence control block. They are decremented by
one with each pass through the loop and are used with a JOC
instruction to control program execution.
Program instructions may be obtained from EROM or IROM, with
the status of the IRM pin determining the selection. A switch
between internal ROM and external ROM can only be carried out when
the hardware reset signal is on; thus IROM and EROM cannot be used
in the same program. IROM is a 1024-by-24 bit ROM; EROM is
expandable to 4096-by-24 bits with bank switching. IROM is a mask
ROM programmed to the designer's specification by Fujitsu.
External ROM is not required to be a mask ROM but can be an EPROM,
allowing for field production of a design.
ROM can be used to store data in any location within the ROM
except location zero. This location must hold an instruction
because the PC accesses it after every RST pulse. ROM data is
limited to 16-bit words because only the 16 least significant bits
of the 24-bit ROM word are read. The 8 high-order bits are set to
one. The ROM address is specified by a 10-bit input into the DPR
from the address calculation block.
Decoder Block 
Instruction registers IR0 and IR1 introduced in the previous
section are the inputs to the decoder block. With each increment of
the program counter, new data is passed from these registers to the
look ahead decoder and decode registers respectively. When an
instruction code is loaded into the look ahead decoder (LAD),
interpretation of the code begins. The number of cycles necessary to
complete the instruction is decoded, ALU operations are interpreted,
and effective addresses are calculated. With the next clock pulse the
instruction code moves on to the decode (DEC) register. The
instruction is further decoded and then executed. The time
necessary to complete the instruction execution in this step
determines the number of clock cycles necessary to execute the
instruction.
RAM 
Internal RAM (IRAM) is divided into two equal parts of
128-by-16 bits. ARAM is located in the first 128 addresses, BRAM in
the last 128. These RAM areas can be operated independently of one
another or as a single unit called IRAM. External RAM (ERAM) of
1024-by-16 bits may be accessed from the chip. The ERAM is
considered as either an extension of BRAM or IRAM. Address
selection is made through the address calculation block, and memory
data is passed directly to the A register, B register, or IBUS.
Address Calculation Block 
The many modes of memory access permitted by the control
commands are supported by the address calculation block. The two
independent areas in RAM are accessed by two independent address
indexing sections; this architecture can be seen in Figure 1.
Register X and its stack XS are used for indexing ARAM only.
Indexing calculations are made in the 7-bit adder AD1, and the
result is passed to the ARAM pointer DPA. Register Y and its stack
YS, both 8-bit registers, are used for indexing BRAM or IRAM. AD2
is used for indexing calculations, and the result is passed to the
BRAM-IRAM pointer DPB. The calculation of ERAM addresses follows
that of the BRAM addresses except for the final result, which is
passed to the ERAM pointer DPE. Two higher-order bits of the ERAM
addresses are provided by a page register PGM.
The virtual shift mode is an optional indexing mode which may
be specified at any time within a program. In this mode only the
four low-order bits of the Y register are used in indexing an
address. In the computation of the effective address no carry is
made to the fifth bit. This mode provides a 16-word loop index at a
desired location in IRAM or ERAM.
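The effect of suppressing the carry into the fifth bit can be illustrated with a short Python sketch. The exact way the MB8764 combines the base address with the Y register is not given in the text, so the formula below is an assumption made for illustration, and the function name `virtual_shift_address` is hypothetical. The point it shows is that the address wraps around inside a 16-word block instead of running past its end.

```python
def virtual_shift_address(base, y):
    """Effective 8-bit address when only the low 4 bits take part in indexing.

    Assumption for illustration: the low 4 bits of base + y are kept,
    any carry out of bit 3 is discarded, and the upper bits of the base
    address are left unchanged.
    """
    low = (base + (y & 0xF)) & 0xF
    high = base & 0xF0
    return high | low


base = 0x2E
for y in range(4):
    print(hex(virtual_shift_address(base, y)), "versus ordinary", hex(base + y))
```

With the base at $2E, ordinary indexing would step to $30 and $31, while the wrapped index returns to $20 and $21, so a 16-word table can be walked as a circular buffer.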
The address calculation block provides the ROM pointer DPR with
the address of ROM constant data. The table address register TBA, which
can be indexed by the X register, provides the seven low-order bits
to DPR. The table page register PGT provides the three most significant
bits to the DPR.
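The way a page register and a low-order address combine into a 10-bit pointer can be written out directly. The Python sketch below follows the bit widths given in the text (three page bits plus seven table bits for ROM constants, two page bits plus eight address bits for ERAM); the function names are hypothetical, and indexing of TBA by the X register is left out.

```python
def rom_table_address(pgt, tba):
    """10-bit ROM constant address: 3 bits from PGT, 7 bits from TBA."""
    return ((pgt & 0b111) << 7) | (tba & 0x7F)


def eram_address(pgm, low_address):
    """10-bit ERAM address: 2 bits from PGM, 8 bits from the BRAM-style address."""
    return ((pgm & 0b11) << 8) | (low_address & 0xFF)


print(hex(rom_table_address(0b101, 0x12)))   # page 5, entry $12
print(eram_address(0b11, 0xFF))              # last word of a 1024-word ERAM
```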
Input/Output Interface
Data being input to or output from the MB8764 passes through
the chip's input/output interface. The interface allows the
selection of three different input modes and two different output
modes. An input/output controller operating independently of
internal program execution regulates the flow of data. Eight
hardware pins, four input and four output, connect the controller
with external circuits. Four internal flags also provide an input
to the controller. Two of these internal flags, the DMM and ADM,
determine the input mode. These flags can be set or cleared by
program control. The three input modes are:
1) The program read mode, or P mode, DMM=0, ADM=0,
2) The non-address-attached direct memory access (DMA) read
mode, or D mode, DMM=1, ADM=0,
3) The address-attached DMA read mode, or A mode, DMM=1, ADM=1.
The program read mode allows data to be read from an external
circuit to the EI register. There the data may be manipulated by
the DSP as needed. The non-address-attached DMA read mode performs
the same function as the program read mode but in addition
automatically transfers the data to the internal RAM address
indicated by the DMC register. In the address-attached DMA read
mode an address is transferred to the EIA register along with the
data going to the EI register. The address passed is the address
used for storing the data in IRAM.
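The selection of an input mode from the two flags amounts to a small lookup, sketched below in Python. The table contains only the three combinations listed in the text; the remaining combination (DMM=0, ADM=1) is not described there, so this sketch rejects it rather than guess. The names `INPUT_MODES` and `input_mode` are hypothetical.

```python
# (DMM, ADM) flag values mapped to the input mode named in the text.
INPUT_MODES = {
    (0, 0): "P mode (program read)",
    (1, 0): "D mode (non-address-attached DMA read)",
    (1, 1): "A mode (address-attached DMA read)",
}


def input_mode(dmm, adm):
    """Return the input mode selected by the DMM and ADM flags."""
    try:
        return INPUT_MODES[(dmm, adm)]
    except KeyError:
        raise ValueError("flag combination not described in the text") from None


for flags in sorted(INPUT_MODES):
    print(flags, "->", input_mode(*flags))
```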
In all input modes three pins, the AIF, RCK, and ACT pins,
control the transfer of information into the MB8764. The AIF pin
signals to the controller that the external device is ready to pass
information to the MB8764. A zero level on the ACT pin signals that
the MB8764 is ready to accept information. The RCK pin provides the
write clock for the information transfer.
The two output modes are selected by the value that is entered
into the fifth most significant bit of the EA register. If the bit
equals zero, the E mode is selected; if one, the I mode is selected.
The E mode uses an external signal to clock the signal into the
external circuit. The I mode provides a clock signal from the WCK
pin to the external circuit. The output process is begun with a
request to output from the REQ pin. The external circuit provides
its response to the request to send data to the BCT pin.
Information transfer is clocked as discussed earlier. Address
information and/or data can be passed to the external circuit. In
the I mode, the AOF pin controls the type of data sent. In the E
mode, the ASL pin is used to provide the same function.
Summary
The Fujitsu MB8764 performs basic arithmetic functions, with
the exception of division, at a very high rate. Its speed in
processing arithmetic functions is due to:
1) An instruction cycle of 0.1 µsec (with 20 MHz clock),
2) A parallel pipeline structure with a multiplier circuit
separate from the ALU,
3) An ability to execute compound statements.
Claims to a 0.1 µsec multiplication operation may be misunderstood.
Actual time from input of the multiplicands into the A and B
registers to the result being placed into the D register is 0.2 µsec.
But, due to the pipeline structure, multiplication operations can be
carried out one directly after another, giving rise to the 0.1 µsec
multiply claim. The ALU provides a 26-bit result into the D
register but only 16 bits of this result are easily accessed; thus,
under normal operations, the internal accuracy of the chip is
limited to 16 bits.
External expansion capabilities of the ROM allow the user to
develop his own working device without having Fujitsu create an
internal mask ROM. A limitation when expanding ROM externally is
that the chip is unable to access from internal and external ROM in
the same program.
Data transfer within RAM, although adequate, could be made more
flexible by allowing MOV instructions to specifically address ARAM
and BRAM rather than IRAM as a whole.
Input/output operations allow a variety of modes to the user
and require just a few lines of code to implement; thus they do not
slow down program execution appreciably.
DESIGNING A DIGITAL FILTER
ON THE MB8764
Introduction
Digital filters provide advantages over analog filters in some
applications. They provide the designer with a more reliable and
more flexible filter that is reproducible to exact specifications.
Two characteristics of digital devices limit the implementation of
digital filters: finite processing speed and finite word length. A
digital device must operate on discrete data at a finite rate of
speed. For adequate performance input data is limited to
frequencies of less than one-fifth the sample rate of the device.
Finite word length limits the poles and zeroes of the filter to a
finite number of points. This becomes critical in cases of high
sample frequency to maximum signal frequency ratios.
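The restriction of poles to a finite set of points can be made concrete by quantizing the denominator coefficients of a second-order section and recomputing its poles. The Python sketch below assumes coefficients stored with 14 fractional bits, a split inferred from the A and B register range quoted in the previous chapter; the pole radius and angles are arbitrary illustrative values, and all function names are hypothetical.

```python
import cmath
import math


def quantize(value, frac_bits=14):
    """Round a coefficient to the nearest multiple of 2**-frac_bits."""
    scale = 1 << frac_bits
    return round(value * scale) / scale


def poles(b1, b2):
    """Roots of z**2 + b1*z + b2 (denominator of a second-order section)."""
    disc = cmath.sqrt(b1 * b1 - 4.0 * b2)
    return (-b1 + disc) / 2.0, (-b1 - disc) / 2.0


def pole_shift(radius, angle):
    """Distance the upper pole moves when b1 and b2 are quantized."""
    b1 = -2.0 * radius * math.cos(angle)
    b2 = radius * radius
    ideal = poles(b1, b2)[0]
    realized = poles(quantize(b1), quantize(b2))[0]
    return abs(realized - ideal)


# A pole angle near zero corresponds to a signal frequency that is low
# compared with the sample rate.
for angle in (0.5, 0.05, 0.01):
    print("pole angle", angle, "rad, shift", pole_shift(0.95, angle))
```

Because b1 and b2 can only take values on a fixed grid, the realizable poles form a grid as well, and that grid is sparse near the real axis where low-frequency poles lie.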
Just as analog filter designers must consider the arrangement
of discrete components, digital filter designers must consider the
digital filter structure. The structure of a digital filter affects
its speed of operation, its sensitivity to finite word length, and
its ease of implementation. A rule of thumb that should be applied
to all IIR (Infinite Impulse Response) digital filter structures is
to implement the filter in sections no greater than second-order.
This reduces the sensitivity the device has to errors in the filter
coefficients. A cascade or parallel combination of these second
order sections is most often used by designers.
Numerous structures are available to implement second order
sections. The direct structures are most frequently used because of
their simplicity and speed. This chapter will show the capabilities
of the MB8764 to implement digital filters designed as cascades or
parallels of direct structured biquadratic sections. The advantages
and disadvantages of the various designs as implemented on the
MB8764 will be discussed.
Implementing a Biquadratic 
Four direct structures will be analyzed and judged on their
ability to implement a biquadratic section on the MB8764. The four
structures are judged by the following points:
1) Time delay between input and output,
2) Length of program, and
3) Memory space required.
Clock rate for the MB8764 is assumed to be at its maximum; thus one
instruction cycle equals 0.1 µsec. Each structure's model, MB8764
memory map, and computation loop program are shown in Figures 6, 7,
8, and 9.
The 1D Structure
The 1D direct structure computes the output y(k) in terms of an
effective input m(k). Two equations define its operations:
$$m(k) = x(k) - b_1 m(k-1) - b_2 m(k-2)$$
$$y(k) = a_0 m(k) + a_1 m(k-1) + a_2 m(k-2)$$
The program for a 1D structure requires 18 machine cycles or
1.8 µsec to complete one loop. Input to output delay equals 1.0
µsec. All locations in ARAM are used, with three occupied by active
data. Seven locations in BRAM are used (see Figure 6).
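The two 1D equations translate directly into a short floating-point reference model, given here in Python as a sketch for checking coefficients and expected outputs. It does not model the MB8764 word length or instruction timing, and the function name `biquad_1d` is hypothetical.

```python
def biquad_1d(samples, a, b):
    """Second-order section in the 1D structure.

    a = (a0, a1, a2) are the numerator coefficients and b = (b1, b2)
    the denominator coefficients, with the signs used in the equations
    above. State starts at zero.
    """
    a0, a1, a2 = a
    b1, b2 = b
    m1 = m2 = 0.0                       # m(k-1), m(k-2)
    output = []
    for x in samples:
        m = x - b1 * m1 - b2 * m2       # effective input m(k)
        output.append(a0 * m + a1 * m1 + a2 * m2)
        m2, m1 = m1, m                  # age the stored values
    return output


impulse = [1.0, 0.0, 0.0, 0.0]
print(biquad_1d(impulse, (1.0, 0.0, 0.0), (-0.5, 0.0)))
```

Only the two delayed values of m(k) need to be stored, which corresponds to the small amount of active data noted for this structure.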
2D Structure 
The 2D structure first accepts the input, then computes output
using results from the previous cycle. The governing equations are:
$$y(k) = a_0 x(k) + p_1(k-1)$$
$$p_1(k) = a_1 x(k) - b_1 y(k) + p_2(k-1)$$
$$p_2(k) = a_2 x(k) - b_2 y(k)$$
The program requires 19 machine cycles or 1.9 µsec for computations
between inputs. Output occurs 0.8 µsec after input. Two locations
in ARAM and seven locations in BRAM are used. ARAM is not cycled
(see Figure 7).
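A floating-point reference model of the 2D structure is sketched below in Python. It assumes the sign convention of the 1D equations, in which the feedback terms b1 and b2 are subtracted; word length and timing of the MB8764 are not modeled, and the function name `biquad_2d` is hypothetical.

```python
def biquad_2d(samples, a, b):
    """Second-order section in the 2D structure.

    The output needs only one multiply and one add after the input
    arrives; p1 and p2 are then updated for the next sample.
    """
    a0, a1, a2 = a
    b1, b2 = b
    p1 = p2 = 0.0                       # p1(k-1), p2(k-1)
    output = []
    for x in samples:
        y = a0 * x + p1
        p1 = a1 * x - b1 * y + p2       # uses p2(k-1) before it is replaced
        p2 = a2 * x - b2 * y
        output.append(y)
    return output


impulse = [1.0, 0.0, 0.0, 0.0]
print(biquad_2d(impulse, (1.0, 2.0, 3.0), (0.0, 0.0)))
```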
3D Structure 
In the 3D structure all possible calculations are performed
before the input is received. The governing equation is:
$$y(k) = a_0 x(k) + a_1 x(k-1) + a_2 x(k-2) - b_1 y(k-1) - b_2 y(k-2)$$
The computation loop requires 1.6 µsec. The delay between input and
output is 0.8 µsec. Six locations in ARAM are active and cycled
through the whole ARAM. Six locations in BRAM are used
(see Figure 8).
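The idea of finishing every term that does not depend on the new input before that input arrives can be shown in a floating-point Python sketch of the 3D structure. The difference equation used is the standard direct form with stored past inputs and outputs, taken here as an assumption consistent with the memory map of Figure 8; the function name `biquad_3d` is hypothetical.

```python
def biquad_3d(samples, a, b):
    """Second-order section in the 3D structure."""
    a0, a1, a2 = a
    b1, b2 = b
    x1 = x2 = y1 = y2 = 0.0             # x(k-1), x(k-2), y(k-1), y(k-2)
    output = []
    for x in samples:
        # Everything below can be computed while waiting for x(k).
        partial = a1 * x1 + a2 * x2 - b1 * y1 - b2 * y2
        # One multiply and one add remain once x(k) is available.
        y = a0 * x + partial
        x2, x1 = x1, x
        y2, y1 = y1, y
        output.append(y)
    return output


impulse = [1.0, 0.0, 0.0, 0.0]
print(biquad_3d(impulse, (1.0, 0.0, 0.0), (-0.5, 0.0)))
```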
4D Structure
The 4D structure is the transpose of the 3D structure. The
governing equations are:
$$r_0(k) = x(k) + r_1(k-1)$$
$$y(k) = a_0 r_0(k) + q_1(k-1)$$
$$q_1(k) = a_1 r_0(k) + a_2 r_0(k-1)$$
$$r_1(k) = -b_1 r_0(k) - b_2 r_0(k-1)$$
This is the slowest of the four structures, requiring 2.0 µsec for
the program loop, and 1.1 µsec from input to output. Six locations
are rotated through ARAM, five locations are used in BRAM.
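The four 4D equations can be exercised with the floating-point Python sketch below. As with any reference model of this kind it ignores the MB8764 word length and timing; the function name `biquad_4d` is hypothetical.

```python
def biquad_4d(samples, a, b):
    """Second-order section in the 4D structure."""
    a0, a1, a2 = a
    b1, b2 = b
    r0_prev = 0.0                       # r0(k-1)
    q1 = r1 = 0.0                       # q1(k-1), r1(k-1)
    output = []
    for x in samples:
        r0 = x + r1
        y = a0 * r0 + q1
        q1 = a1 * r0 + a2 * r0_prev     # becomes q1(k-1) on the next pass
        r1 = -b1 * r0 - b2 * r0_prev    # becomes r1(k-1) on the next pass
        r0_prev = r0
        output.append(y)
    return output


impulse = [1.0, 0.0, 0.0, 0.0]
print(biquad_4d(impulse, (1.0, 2.0, 3.0), (0.0, 0.0)))
print(biquad_4d(impulse, (1.0, 0.0, 0.0), (-0.5, 0.0)))
```

Two additions stand between the input and the output in this structure, which is in line with its longer input to output delay.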
Structure Comparison Results
The 3D structure offers the fastest processing time of the four
structures and shares the shortest input to output delay with the 2D
structure. The 2D structure uses the least memory locations and
would be the best choice in applications where the designer does not
want to cycle through ARAM. In each of the programs four
instructions are required for input/output and loop control. These
four instructions require, as a minimum, six instruction cycles to
be processed. A loop has been built into the input data
instructions which causes the program to wait until new input data
is received. The loop allows the speed at which data is input to
control the program sample rate; thus there is no need to control
sample rate by inserting lines of code. In the case where program
length corresponds to the input data rate, the loop may be removed,
allowing for a 0. 2 I.I sec faster program loop. Rerroving the loop will 
allow the same input to be acted on more than once if timing of the 
L1 
L2 
MODEL: 
•<k) 
22 
MEMORY MAP: 
'L ' -~,. ARAl1~ BRAM 
-· 
T 
) 
Code 
LAB:HOP $n1 'v··, $n1 ·1x···, 
- ~ .•, - ~ .... 
LREi: NOP $ w., f v ., $ n ) ( x ) -'-~ J} _._ I 
LAfi: f1L T $03('r'), $01 (X) 
L RE;: r1sr1 $n4 { •r $rr:;. Of1 
- ) J - ... . ·' 
r1ou: ML T D,$FF 
NOP:MRD 
1..IOC: HOP L2, IF 
nou •$800,EA 
MOU;HOF' EI, A 
MOU: sur1 $00 ( y), A 
MOU:HOP O,B,$FE 
MBA:HOP $00(X),$FE 
MOU:NOP $FF,O 
n~:v: MSM •$7F,•$00 
MOU:HOP D,EO 
JMP:HOP L1 
(1(1 
01 
02 
03 
i \ 
m ( k) \ 
I'll"' 
ao 
i 1 
a~. 
~ 
t11 
b2 
00 
01 
02 
03 
04 
"4.J 7E 
7F r---r--17E 
GIUj?F 
NOTE: f1rr·c1u.1s depict mc•vement c•f var· i at• I e 
des: i gnat i or1 e:aus:ed by i r1dex i ng . 
Cc•mment s 
*a1 inter A; m(k-1) inter B 
*02 inter A; rri(k-2) intc1 fi 
*01 x m(k-l);-b1 to A;m(k-1) to Ei 
*(02 x m(k-2))+(01 x m(k-l));etc 
*et ore re~; rJ ! t o f · I o st i n ~; t r ; et c 
*(-b1 x rn(k-l))-(b2 x m(k-2)) 
*loop here unti I input received 
*~et output ~ode and 5equence 
*input to A 
*m(k) found; oo to A 
*m(k) to 6 and to BRAM 
* "' ( k ) t o A RAM 
*(02 x m(k-2))+(01 x m(k-1)) to 0 
*x=x-l;y=y;y(k) calculated 
*ouput y(k) 
*~tart loop again at L1 
Figure 6. ID Filter Structure Model and Program. 
Block diagram: 
Program: 
Code 
L 1 JOC : MOF' L 1 , IF 
MDU •$800 .. ER 
MOU: NOP EI , $00 
L A Ei : ti (IP $ 0 0 .• $ 0 0 
MOU: tiOP $FF, D 
LAB: MSt1 $00, $0 l 
MOU:NOP $0,E0,$01 
LAB: f1L T $01 , 03 
HOP 
MOU:MRD $FE,A 
LAB:SUM $00,$02 
MOIJ: tiOP D, $FF 
LAB:MLT $01,$04 
ti OP 
NDP:MAD 
MOU:NOP D,SFE 
JMP:tiOP Ll 
23 
Me11or·y aap: 
y<k) ARAM BRAM 
00 x(k) 00 00 
01 y(k) a, 01 
,.. "-' ,.. _, a ... ~ 02 1FT I "' 03 
04 
P2 (k) 7E 
P 1 ( k) 7F 
Comments 
*waits for inpLJt 
*sets output mode and sequence 
*;.:: ( k ) t C• AF: AM 
*x(k) to A, a0 to B 
*p 1(k-1) to D 
*calr y(k);x(k) to A;a 1 to B 
*y(k) to output and ARAM 
* y ( k ) t c1 fi ; b 1 ( k ) t ci Ei ; x ( k ) x a 1 
* 
*p?(k-1) to A 
~ 
*calc P1(k);x(k) to A;a2 to B 
* p 1 ( k ) t o BF: AM 
*y(k) to A;b2 to B;x(k) x 02 
* 
*calc P2(k) 
* P2(k) to BRAM 
*start · loop again at L1 
Figure 7. 2D Structure MOO.el and Program. 
Block Diagram: 
00 
01 
02 
03 
04 
05 
?E 
7F 
24 
Mea.ory Map: 
r·\ 
AF:AW \ 
x (k) 
x(k-1) 
x (k-2) 
~ ( k) 
~(k-1) 
.. , ~ ) ~ \. ~.-l . 
\ 
•, 
\ ) 
,., .J 
BRAM 
a(t 
a, 
a'"> 
"' 
bl 
t•2 
00 
01 
02 
03 
04 
,.,,.J 
NOTE: Arrows depict movement of variable 
de::: i gno +. ion caused b1:1 i nde>=: i ng . 
Program: 
Ll 
L2 
Code 
LAB:NOP $01(X),$01 
LAB:HOP $02(X),$02 
LAB:MLT $01(X),$03 
LRB:MSM $05(X),$04 
LO I : MRD •$0000 
'"I IF : MF:D L2 
MOU •$800,ER 
MOU:NOP El, 6:$FF 
MOU:HOP $80,A 
MXY:HOP •$7F,•$00 
MBA:MSM $0l(X),$FF 
MOU:NOP D,E0:$FF 
MBA:HOP $01(X),$FF 
JMP L1 
Comment~; 
*x(k-1) to A;a1 to B 
*x(k-2) to A·a? to B 
J '" 
*y(k-1) to A;b1 to Bjx(k-1) x al 
*y(k-2) to R;b2 to B;cont calc of y(k) 
*y(k) colc made indep of 1 of L2 loops 
*waits for input;conl calc of y(k) 
*sets output mode and sequence 
*x(k) to B and BRAM 
*ao to A 
*shift x index back 
*x(k) from BRAM to ARAM;calc y(k) 
*y(k) to output and BRAM 
*y{k) from BRAM to ARAM 
*start loop again at L1 
Figure 8. 3D Structure and Program. 
Mode I: 
ro(k) 
Program: 
Code 
L 1 JOC: tiOP L I .• IF 
MOU: •$600,EA 
MOU: MOP E I , A 
MOU:NOP $FF,B 
NOP:ADD 
MOU: tiOP [I, fi, $FE 
MOU: tiOP $80, A 
MOU:tiOP $FD,O 
MBA:MSM $00(X),$FE 
MOU:tiOP D,EO 
LAB:MSM $00(X),$03 
LAB:NOP $01(X),$01 
LAB:MLT SOO(X),$01 
LAB:MRD $01(X),$02 
MOU:MLT D,$FF 
nXY:nsn •$FF,•$OO 
MOU:tiOP D,SFD 
JMP:HOP L1 
Figure 9.
Memory map:
ARAr1 BRAM 
00 ro ( k) 
01 r·o ( k-1) 
"_, 
Comments
*wait for input 
Q 1 ( k) 
ro ( k) 
r 1(k) 
" 
7[1 
7E 
7F 
*sets output mode and sequence
*x(k) to A
*r1(k-1) to B
*calc r0(k)
*r0(k) to B and BRAM
*a0 to A
*q1(k-1) to D
*calc y(k)
*output y(k)
*r0(k) to A;-b1 to B
*r0(k-1) to A;b2 to B
*r0(k) to A;a1 to B;-b1 x r0(k)
*r0(k-1) to A;a2 to B;calc r1(k)
*r1(k) to BRAM;a1 x r0(k)
*calc q1(k);shift X index back
*q1(k) to BRAM
*start loop again at L1
4D Structure Model and Program. 
external input is slower than the program loop execution rate. The 
use of an input loop cannot prevent the loss of some data samples in 
cases where the device sample rate is slower than the input data 
send rate. 
Programming a Multiple Biquadratic Section Filter 
The 3D structure was found to perform best on the MB8764. A
comparison will now be made between the Nth order filter programmed
as a cascade and a filter programmed as a parallel combination of 3D
biquadratic sections. The comparison will determine which is most
suitable for the MB8764, considering the advantages and disadvantages
of each approach. Each program will be judged on the following
aspects:
1) Time delay between input and output,
2) Length of program, and
3) Memory space required.
Of course there are countless ways to program the MB8764 to
have it accomplish the necessary calculations. The programs shown
are written for maximum speed in the computation loop. As in the 3D
biquadratic section programmed previously, variables and initial
conditions are assumed to be stored in the IRAM when the loop
begins. Initialization is accomplished outside the loop and thus is
not shown. It is necessary to set initial conditions even if they
are zero, because the MB8764 does not set all registers to zero when
powered up. There is no instruction which clears all the memory
locations; therefore they must be accessed one location at a time.
Cascade Realization 
The cascade model offers a method of splitting a large-
order filter into small sections, thus reducing the filter's 
sensitivity to coefficient quantization. Figure 10 shows 
the model for an Nth order cascade of 3D structure biquad-
ratic sections. To achieve the best results with the 
cascade model on a fixed-point processor such as the MB8764 
the designer must:
1) Balance the DC gain of the sections (this may 
be accomplished by proper pole zero pairing), 
2) Scale each section individually to prevent over-
flow within the section, and 
3) Arrange the sections in the order which 
minimizes the output noise. 
From a designer's viewpoint cascade realization can be 
difficult to implement, because pole zero pairing and section 
ordering are intricate steps. 
Figure 11 shows the program and Figure 12 shows the 
memory map used in implementing a cascade of 3D sections. 
The number of lines of code necessary to input initial
conditions and variables in the worst case is 16n+6,
where n equals the number of biquadratic sections in the
filter. Worst case implies that no variables or initial
conditions share the same values. An additional two or
three lines are needed for initialization. The computation
loop for the program requires 10n+11 machine cycles to
complete. The delay from time of input to time of output
is 4n+5 machine cycles. Data is shifted through the whole
ARAM, with 3n+3 locations being occupied with active data
at any one time. BRAM data is stationary, and 5n+1
locations are used.
Parallel Realization 
The parallel model is easier to design than the cascade 
model. Each section in the parallel model acts on the 
input x(k) and provides output to a summing junction (see 
Figure 13). Therefore, there is no concern about ordering 
of sections and no reason to pole zero pair each section. 
The steps necessary to implement the parallel form for an
Nth order binomial are:
1) Perform a partial fraction expansion of the Nth
order binomial and group the resulting terms into
biquadratic sections, and
2) Individually scale each section to prevent overflow
within the section.
A program implementing a parallel combination of 3D biquadratic
sections is shown in Figure 14. The program's memory map is
shown in Figure 15. Initial conditions and variable input
into IRAM require 14n+13 lines of code to enter for the
worst case. The computation loop for the program requires
7n+11 machine cycles to complete. The delay from input
instruction to output instruction, independent of the
filter's order n, is equal to seven machine cycles. The
ARAM is used just as in the cascade program. BRAM data
is stationary and uses 4n+5 locations.
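The cycle and memory counts quoted for the two realizations can be tabulated side by side. The following is a minimal Python sketch and is not part of the original report: the function names, dictionary keys, and the 0.1 µsec cycle constant are illustrative assumptions, while the formulas themselves are the ones stated in the text (n is the number of 3D biquadratic sections).

```python
# Illustrative sketch only: names are hypothetical, formulas are those
# quoted in the text for a filter built from n 3D biquadratic sections.

CYCLE_US = 0.1  # assumed machine cycle in microseconds


def cascade_costs(n):
    """Counts quoted for the cascade program."""
    return {
        "loop_cycles": 10 * n + 11,   # computation loop length
        "delay_cycles": 4 * n + 5,    # input-to-output delay
        "aram_words": 3 * n + 3,      # active ARAM locations
        "bram_words": 5 * n + 1,      # stationary BRAM locations
    }


def parallel_costs(n):
    """Counts quoted for the parallel program."""
    return {
        "loop_cycles": 7 * n + 11,
        "delay_cycles": 7,            # independent of n
        "aram_words": 3 * n + 3,      # used as in the cascade program
        "bram_words": 4 * n + 5,
    }


def max_sample_rate_khz(loop_cycles, cycle_us=CYCLE_US):
    """Highest sample rate if one sample is processed per loop pass."""
    return 1000.0 / (loop_cycles * cycle_us)


if __name__ == "__main__":
    for n in (1, 2, 4, 8):
        c = cascade_costs(n)
        p = parallel_costs(n)
        print(n, c["loop_cycles"], p["loop_cycles"],
              round(max_sample_rate_khz(c["loop_cycles"]), 2),
              round(max_sample_rate_khz(p["loop_cycles"]), 2))
```

Under these formulas the two loops differ by 3n machine cycles, so the gap between the realizations widens as sections are added.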
Comparing the Cascade and Parallel Programs 
In ease of design and in theoretical performance, the
parallel model is superior to the cascade model, offering
better signal-to-noise ratio (4) and fewer steps in the
design process (5). But the selection of the type of filter
to be used is usually based on the performance of the
program that is implementing it. The program implementing
a parallel combination of 3D sections ran faster on the MB8764
for any number of sections. (For a single section, or second-order
filter, the cascade and parallel designs are identical.)
The performance of the parallel example is due to the fact
that it has n-1 fewer multiplications to perform than the
cascade and that it can perform its summations very
efficiently. The fact that all calculations except one
multiplication and one addition are performed prior to
input of x(k) makes its input-to-output delay very short.
Figure 16 is a graph of the implementation time of an Nth
order parallel and cascade of 3D sections.
Both programs are implemented with a minimum of loops.
As a consequence much longer programs will be stored in the
instruction ROM. The advantage for minimizing loops is
[Block diagram: input x(k) passes through n cascaded 3D sections to output y(k).]
Figure 10. Cascade of 3D Sections.
Code: Com1J1ent: 
L1 LAB:HOP $01(X),$01 
LAB:NOP $02(X),$02 
LAB:MLT $04(X),$03 
LAB:MSM $05(X),$01 
LAB:MRD $01(X),$06 
LAB:MRO $05(X),$07 
MOU:MLT D,$FF 
MBA:MSM $03(X),$FF 
*-------------1 
*pre I iniinary 
*calculation 
*of 
*y1(k) 
I 
1----.--------
*-------------! begin pre I iminary 
*precolc Y1(k)I calculation of y2 (k) 
*to 03(:x:) I 
L2 
* 
lt: 
*This section is repeated from 
LAB:HOP $[3i+1](X),$[5i-2] *i=2 to n-1, where i represents the 
LAB:NOP $[3i+2](X),$[5i-1] *biquadratic section being coded, and 
LAB:MRD $[3i+1](X),$[5i+1] *n is the total number of biquadrotic 
LAB:MRD $[3i+2](X),$[5i+2] *sections. In thi' ~ection Yi(k) 
MOU:MLT D,$FF *pre I iminory i~ calculated and the 
M6A:MSM $[3i+3](X),$FF *calculations for Yi+l(k) are begun. 
* 
•: 
•: _____________ _ 
LAB:NOP $[3N+1](X),$[5n-2] * 
LAB:HOP $[3H+2](X),$[5n-1] *Calculations for prel iminory Yn(k) 
HOP:MRD 
NOP:MfiD 
MOU:NOP 0,$FF 
MAB:HOP $[3H](X),$FF 
"I IF : HOP L2 
MOU •$800,EA 
*con1p I et ed 
* 
* 
*--------------------------~ 
*u.aa it for input 
*set output 1ode and sequence . 
Figure 11. Cascade of 3D Sections Program. 
MOU:NOP Et,A:$FF 
MOU:NOP $80,6 
LAB:HOP $03(X),$FF 
MOU:MLT $85,B 
NOP:SUM 
MOU:NOP D,$FF:A 
LAB:HOP $[3i](X),$FF 
r10 U : ML T $ [ 5 i ], B 
MBA:SUM $[3(n-1)](X),$FF 
MOU:NOP D,$FF,A 
LRB:NOP $[3n](X),$FF 
MOU: r-ll T $[Sn], B 
MBA:SUM $[3(n-1)](X),$FF 
MOU:HOP O,$FF:EO 
M6A:NOP $[3n](X),$FF 
MXY:NOP $FF,$00 
JMP:HOP Ll 
*x(k) to A and BRAM 
*begin calculation of Y1(k) 
*continue calculations for Y1(k) 
*begin setup for Y2(k) calculations 
* 
* Y1(k) calculation is completed 
~: 
* ---------------
* 
*This section is repeated for 
~: i = 2 t o n- 1 . I t ca I e: u I at e e. t he C• '-' t put 
*of each section and places the 
*out~ut of the preuio~s section into 
*the proper location in ARAM. 
* 
*---------------
*final calculations to compute 
*filter output y(k) 
Jt: 
*output y(k) 
* 
*shift X index back 1 
*Ju~p back to beginning 
Figure 11. Continued.
[Memory map: ARAM and BRAM columns with locations numbered from 00; entries continue through 5n-5, 5n-4, 5n-3, 5n-2 and 5n-1, with upper locations FC, FD and FE and location FF used as temporary storage (TEMP STO).]
NOTE: Arrows show movement of
variable designation caused
by indexing.
Figure 12. Nth Order Cascade Filter Memory Map.
[Block diagram: input x(k) feeds n 3D sections in parallel, whose outputs are summed to form output y(k).]
NOTE: Only delayed values of x(k) are used
in the 3D sections to calculate y(k).
Figure 13. Parallel of 3D Sections.
Code: 
Cow.-.ent s: 
*-------------1 
* I 
*ca I cu I at i or. . I 
*of 
*y1(k) !---~--------
*-------------! begin 
* Y1(k) calculation of u .... ( k) 
... .!_ 
Ll LAB:HOP $0l(X),$01 
LAB:NOP $02(X),$02 
LAB:MLT $04(X)J$03 
LAB:MSM $05(X),$04 
LAB:MRD $01(X),$05 
LA5:MRD $02(X),$06 
MOU:MLT 0,$FF 
MBR:MSM $03(X),$FF *to O~i ( x ) __ __,.;.._.. ________ _ 
* 
* 
•: 
*This section is repeated from 
LAB:HOP $[3i+1](X),$[1i-1] *i=2 to n-1, where i represents the 
LAB:NOP $[3i+2](X),$[1i] *biquodratic section being coded, and 
LAB:MRD $01(X),$[4i+1] *n is the total number of biquadratic 
LAB:MRD $02(X),$[4i+2] *~ectione. In this ~ection Yi(k) 
MOU:MLT D,$FF *is calculated and the calculations 
MBA:MSM $[3i](X),$FF *for Yi+l(k) are begun. 
* 
* 
LA 6 : N 0 P $ [ 3 ti+ l ]( X ) , $ [ 1 n- 1 ] * 
LAB:HOP $[3N+2](X),$[4n] *Calculations for Yn(k) 
HOP:MRD *completed 
HOP:MRO * 
MOU:NOP D,$FF * 
Figure 14. Parallel of 3D Sections Program. 
MA6:NOP $[3n](X),$FF 
LAB:NOP $03(X),$00 
LAB:SUM $06(X),$00 
* 
* 
* 
Jt: 
* 
* 
~: 
~um 
al I 
~ ( k) I~; 
"' 
Only the A register 
information is used 
in the summation. 
LAB:SUM $3[n-1](X),$00 * 
NOP: sur1 
L 2 ""I I F : H 0 P L 2 
MOU •$ti00, EA 
MOU: tWF' EI , A: $FF 
MBA:NOP $00(X),$FF 
MXY:MSM •$7F,•$00 
MOU:NOP D,EOJ$FF 
... H1P: HOP L 1 
*I oe:t yJk).__.~•--'-J ....... m __ m~ __ :.d _____ _ 
,.. w a i t f c• r i rap u t 
*~et output mode and sequence 
*x(k) to A and BRAM 
*x(k) to ARAr1 
*calc Iy(k)'s + x(k) x Do 
:+:c11.Jt put ~ ( k) 
*ju~p back to beginning 
Figure 14. Continued.
[Memory map: ARAM and BRAM columns with locations numbered from 00; ARAM entries continue through 3n and BRAM entries through 4n+1, 4n+2, 4n+3 and 4n+4, with upper locations FC, FD and FE and location FF used as temporary storage (TEMP STO).]
NOTE: Arrows show movement of
variable designation caused
by indexing.
Figure 15. Nth Order Parallel Filter Memory Map.
[Graph: filter order (vertical axis, 2 to 20) versus minimum sample period in µsec (lower horizontal axis, 1 to 10), with the corresponding maximum sample frequency in KHZ marked along the upper horizontal axis, for the parallel and cascade programs.]
Figure 16. Nth Order Filter Implementation Time
Graph for Parallel and Cascade Filters on the MB8764.
speed in program execution. For example, an Nth order
parallel filter with loops requires 6(n-1) additional machine
cycles to execute and saves 7n-29 locations in the
instruction ROM (n = number of 3D sections).
MB8764 Capabilities in Implementing
Multiple Filter Programs
Designing a program to implement more than one filter
with multiple inputs and outputs is easily accomplished on
the MB8764. Due to the limited amount of memory available,
restrictions are placed on the number and complexity of
filters to be programmed together. Restrictions are also
placed on the sample rates of the filters, which must be
integer multiples of one another. Table 1 shows the
capability of the MB8764 to implement multiple filter
programs of 3D sections placed in parallel. The MB8764
has only one input/output port, which must be time-shared
in a multiple filter program. To accomplish this
time-sharing, the inputs must be synchronized to occur in a
specific order.
If all filters are of the same frequency, then
programming multiple filters in the one program is
accomplished in three steps.
1) Arrange the calculation loops for the programs
you wish to implement into a single list.
2) Remove the jump statement from the bottom of each
program except the last and have that jump
statement return to the top of the first
program. This makes the list of calculation
loops into a single loop.
3) Change the addressing within each program to
point to the section of ARAM and BRAM in which
initial conditions and variables are located.
Setting initial conditions and variables into IRAM is
accomplished for all filters before the calculation loop
is begun. Filters with sample rates that are integer
multiples of one another are implemented as in the steps
listed above with the addition of a step to install
counters and jump instructions to control program flow.
Chapter Summary
In this chapter the 3D biquadratic structure was found
to have the fastest calculation loop of the four direct
structures. The parallel implementation and cascade
implementation of an Nth order filter of 3D structures were
compared. The parallel structure was shown to be
superior in performance (see Table 2). From these
programming examples it can be seen that the MB8764 performs
mathematical functions very efficiently, but this efficiency
is reduced considerably when results must be moved out of
the D register to ARAM or when looping is used. For best
performance in speed, programs written for the MB8764
should use a minimum of transfer instructions and should
avoid looping.
TABLE 1
CAPABILITY OF THE MB8764 IN PERFORMING
MULTIPLE FILTER PROGRAMS

filter   max # of   max sample     approx memory use
order    filters    frequency      IROM    ARAM    BRAM
   2        16      39.06 KHZ       736      96     128
   4        10      40.00 KHZ       670      90     120
   6         8      39.06 KHZ       704      96     128
   8         6      42.74 KHZ       654      90     120
  10         5      43.48 KHZ       650      90     120
  12         4      47.17 KHZ       604      84     112
  14         4      41.67 KHZ       668      96     128

TABLE 2
COMPARISON BETWEEN A PARALLEL AND CASCADE FILTER
IMPLEMENTATION OF 3D BIQUADRATIC
SECTIONS ON THE MB8764

Feature                  Parallel            Cascade
min sample period        (11+7n)(0.1 µs)     (11+10n)(0.1 µs)
input to output delay    0.7 µs              (4n+5)(0.1 µs)
IROM locations used      25+21n              17+26n
ARAM locations used      3n+3                3n+3
BRAM locations used      4n+4                5n
note: for 0.1 µs machine cycle, n = # of 3D sections
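Most Table 1 entries can be reproduced from the parallel column of Table 2 by multiplying the per-filter figures by the number of filters sharing the chip. The Python sketch below illustrates that arithmetic; it is not from the original report. It assumes the filters are run one after another in a single loop, so that cycles and memory simply add, and the function name, keys, and 0.1 µsec cycle constant are hypothetical.

```python
# Illustrative sketch only: scales the per-filter parallel formulas of
# Table 2 by the number of filters. Names are hypothetical.

CYCLE_US = 0.1  # assumed machine cycle in microseconds


def multi_filter_budget(order, count, cycle_us=CYCLE_US):
    """Estimate the shared sample rate and data memory for `count`
    identical filters of the given even order."""
    n = order // 2                      # 3D sections per filter
    cycles = count * (11 + 7 * n)       # all loops run back to back
    return {
        "sample_khz": 1000.0 / (cycles * cycle_us),
        "aram_words": count * (3 * n + 3),
        "bram_words": count * (4 * n + 4),
    }


if __name__ == "__main__":
    for order, count in ((6, 8), (12, 4), (14, 4)):
        b = multi_filter_budget(order, count)
        print(order, count, round(b["sample_khz"], 2),
              b["aram_words"], b["bram_words"])
```

For example, eight sixth-order filters need 8 x 32 = 256 machine cycles per pass, which corresponds to the 39.06 KHZ entry in Table 1.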
Scaling is necessary in the design of a digital filter
to prevent overflow within fixed-point machines such as the
MB8764. The design of the MB8764 also helps to prevent
overflow during intermediate calculations in the arithmetic
and logic block. Internal ALU operations and the D
register provide twice the dynamic range of the ALU input
registers A and B. Thus the result of an intermediate
operation which overflows in a 16-bit register of the
MB8764 can remain valid in the D register, allowing
subsequent operations without overflow. If an overflow should
occur, the MB8764 can minimize the error through the use
of the CLP flag.
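The benefit of a wider result register, and of clipping when an overflow does occur, can be illustrated numerically. The Python sketch below is a simplified model and not a description of the MB8764's actual arithmetic: it assumes two's-complement integers, takes 26 bits as the accumulator width (the D register width quoted in this report's conclusions), and assumes that the clipping behavior amounts to saturation at the largest representable value. All function names are hypothetical.

```python
# Simplified model, assumptions only: saturation stands in for clipping,
# and a 26-bit accumulator stands in for the D register.


def wrap(value, bits):
    """Two's-complement wraparound to a word of `bits` bits."""
    value &= (1 << bits) - 1
    if value & (1 << (bits - 1)):
        value -= 1 << bits
    return value


def saturate(value, bits):
    """Clip to the largest or smallest value a `bits`-bit word can hold."""
    hi = (1 << (bits - 1)) - 1
    lo = -(1 << (bits - 1))
    return max(lo, min(hi, value))


def accumulate(terms, bits, on_overflow):
    """Running sum in which every partial sum is forced into `bits` bits."""
    acc = 0
    for term in terms:
        acc = on_overflow(acc + term, bits)
    return acc


if __name__ == "__main__":
    # Partial sum 50000 exceeds the 16-bit range; the final sum does not.
    terms = [30000, 20000, -25000]
    print("26-bit accumulator:", accumulate(terms, 26, saturate))
    print("16-bit accumulator:", accumulate(terms, 16, saturate))
    # A final result of 40000 cannot be held in 16 bits either way.
    print("wrapped:", wrap(40000, 16), "clipped:", saturate(40000, 16))
```

In this model the wide accumulator returns the exact sum, while the 16-bit clipped accumulator loses the part of the partial sum that exceeded its range. When the final value itself is out of range, clipping leaves a far smaller error than wraparound, which changes the sign of the result.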
THE MB8764 DEVELOPMENT SYSTEM
Introduction 
Once a program has been designed for a digital device
it is important that it be fully tested. This is especially
true for an MB8764 program that is to be input into the
internal mask ROM of the chip, as there is no adjustment
possible once the mask is produced. Any mistakes in the
mask ROM design must be accepted or the design must be
corrected and a new chip produced. Fujitsu MB8764 programs
can be tested on the MB8764 itself with the use of the
MB87902 software development tool kit. The tool kit
supplies a 16 MHz clock to the MB8764, giving it a machine
cycle of 0.125 µsec, or 25% slower than the minimum specified
MB8764 machine cycle of 0.1 µsec. The slower clock rate
is required for the MB8764 to make data transfers between
external RAM and the chip.
This chapter first gives a brief description of the
development system for the MB8764 and then follows through
the testing of a fourth-order Butterworth filter program.
The information on the MB8764 development system found in
this chapter is derived from references (6) and (7) and
from experiences the author had when using the development
system.
Description of the MB8764 Development System
The development system for the MB8764 can be divided
into two primary parts, a Fujitsu FM-16S microprocessor and
the Fujitsu FDSP KIT-8764 evaluation board. The
microprocessor is a standard Fujitsu model equipped with the
following hardware:
1) 10 mega-bit internal drive,
2) One 5¼" floppy disk,
3) A CP/M86 board and expansion RAM card,
4) CRT and printer.
Software provided includes: 1) Wordstar, a word processing
program used to create code and data files; 2) the MB8764
assembler (ASM64), which assembles the Wordstar code files
into the ROM executable code; and 3) the MON64 program, which
is actually two programs used to control the FDSP KIT-8764
evaluation board.
The FDSP KIT-8764 evaluation board is primarily a
standard MB8764 with support hardware to interface it with
the Fujitsu FM-16S microprocessor. It also provides the
designer with three sockets for EPROM programming and testing.
The support hardware includes:
1) A 1024-word instruction RAM, accessed by the MB8764
through the MB8764's external instruction port,
2) A 1024-word expansion RAM, which operates as a
standard MB8764 expansion RAM,
3) Two 512-word data RAMs, one for storing data to be
input into the MB8764 and the other for storing the
MB8764 output data,
4) An analog interface, which provides 12-bit ADC
and DAC for analog input/output, and
5) An interface circuit, to enable the FM-16S
microprocessor to control the board.
With the development system, a designer may choose the
MB8764 input to be an analog signal, a digital signal from
data RAM, or a digital signal from a user-supplied device.
The same choices apply to the MB8764 output. If the output
is directed to data RAM then 512 words of output data may
be accessed and viewed on the CRT. Program execution can
be stopped by the microprocessor at almost any point in the
program. While paused, the D, A, X, Y and CO registers can
be viewed as well as any addresses in the instruction RAM,
internal RAM, or external RAM. Any of the addresses or
registers that can be viewed may also be changed to
another value. If instruction code is altered, the new
program can be loaded back from the instruction RAM to a
disk file in the microprocessor. When a program passes
all tests, an EPROM is made or a floppy disk created with
the tested program on it. Fujitsu will use this EPROM or
floppy disk to create a custom MB8764 chip with an IROM
loaded with the program sent. If a mask IROM is not
required, EPROMs can be manufactured by the development
system and used as external IROM for the MB8764.
Testing a Program 
A fourth-order low pass Butterworth digital filter was
designed with the following specifications:
1) Cutoff frequency - 50 KHZ
2) Max loss in passband - 3 DB
3) Sample frequency - 250 KHZ
Conversion from analog to digital was made via the bilinear
transform. The filter was implemented as a cascade of two
biquadratic sections. The figure below shows the model and
equation.
[Figure: block diagram of the two cascaded biquadratic sections, with the transfer function H(z) of the filter.]
Butterworth Filter Model and Equation.
The step response and the frequency response of the model
were calculated on an HP 85 computer, and they verified the
model to be valid. The calculated step response data was
saved to compare to the output data from the real time
execution of the model on the MB8764.
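A design of this kind, together with its reference step response, can be reproduced with a few lines of code. The Python sketch below is not the author's HP 85 program. It assumes the standard bilinear transform with frequency prewarping applied to the two second-order factors of the normalized fourth-order Butterworth prototype, and it assumes coefficients are stored scaled by 2^14 (suggested by the $4000 constant loaded in Figure 18 to simulate an input of 1). All function names are hypothetical.

```python
import math

SCALE = 1 << 14  # assumed fixed-point scaling: 0x4000 represents 1.0


def butterworth4_biquads(fc_hz, fs_hz):
    """Return two sections (b0, b1, b2, a1, a2) for
    H(z) = (b0 + b1 z^-1 + b2 z^-2) / (1 + a1 z^-1 + a2 z^-2)."""
    k = math.tan(math.pi * fc_hz / fs_hz)        # prewarped cutoff
    sections = []
    for theta in (math.pi / 8.0, 3.0 * math.pi / 8.0):
        a = 2.0 * math.cos(theta)                # prototype s^2 + a s + 1
        d = 1.0 + a * k + k * k
        b0 = k * k / d
        a1 = 2.0 * (k * k - 1.0) / d
        a2 = (1.0 - a * k + k * k) / d
        sections.append((b0, 2.0 * b0, b0, a1, a2))
    return sections


def run_cascade(sections, samples):
    """Pass the samples through each biquadratic section in turn."""
    signal = list(samples)
    for b0, b1, b2, a1, a2 in sections:
        x1 = x2 = y1 = y2 = 0.0                  # zero initial conditions
        out = []
        for x0 in signal:
            y0 = b0 * x0 + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
            x2, x1 = x1, x0
            y2, y1 = y1, y0
            out.append(y0)
        signal = out
    return signal


if __name__ == "__main__":
    sections = butterworth4_biquads(50e3, 250e3)
    for sec in sections:
        print(["%.4f" % c for c in sec],
              [hex(round(abs(c) * SCALE)) for c in sec])
    step = run_cascade(sections, [1.0] * 40)
    print(["%.3f" % v for v in step[:10]])
```

Under these assumptions the first section's coefficient magnitudes come out near 0.3678, 0.3290 and 0.0646, which scale to the $178A, $150E and $0422 entries that are legible in the Figure 18 listing. The sign convention used for the stored feedback coefficients is not assumed here.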
An assembly level program was written for the filter
and edited in Wordstar on the Fujitsu FM-16S microcomputer.
The file created by Wordstar was checked for errors and
assembled when all errors had been corrected. Error codes
from the assembler were adequate, but documentation of
program format requirements was inadequate, with many
errors solved by trial and error. The assembler permits
some use of address labels and variables in the assembly
level program and converts them to proper values before
converting the program to machine code. Along with a
statement list the assembler provides the designer with a
dictionary list and a symbol list. These provide
documentation on the variables and labels used in the assembly
level program.
Machine code, created upon assembly of the filter
program without error, was stored in a .DEB file. The .DEB
file was loaded into the instruction RAM of the FDSP KIT-8764
using the DEBG1 program. The DEBG1 program can also be used
to read and write programs between EPROM and instruction RAM
or from instruction RAM back to the FM-16S microprocessor.
With program instructions loaded into the instruction
file, program DEBG1 was exited and the DEBG2 program loaded.
The filter program in the instruction RAM was now able to
run on the MB8764 under the control of the DEBG2 program.
Because the program called for the step response of the
filter, no input was generated within the program. The
following functions were accomplished through the use of
DEBG2 commands. Output was specified to be placed into the
output data RAM. Program execution was begun and then
paused to check output data RAM contents, register contents
and IRAM contents. Corrections were made to program code
until output data results were correct and the program was
operating properly. A special note is made that attempts
to store data in address FF of IRAM were not successful;
however, when storage was changed to register FC the
program ran correctly. A listing of the filter program
executed is shown in Figure 18. Figure 19 is a comparison
of calculated step response and MB8764 program step
response.
Summary
The MB8764 digital signal processing chip is well
supported by the MB8764 Support Tool development system.
Its ability to run programs at 80% of the maximum internal
clock rate of the MB8764 and to use the MB8764 chip instead
of a software simulation of the chip gives the designer a
chance to evaluate program results in real time.
Documentation of assembly language formatting requirements is
inadequate. Including formatting examples would greatly
improve the documentation.
PRG 
ORG 
CLR 
BUFIL 
CREECH,$10 
X:V:O 
LD 1 : tiOP •$0EiC5 
MOU:HDP A,$80 
LOI :NOP •$178A 
MOU:NOP A_,$81 
LD 1 : HOP •$OfiC5 
MOU: NOP A, $Ei2 
LOI :NOP •$150E 
r10U: NOP A.• $83 
LDI :HOP •$0422 
MOLi: HOP A,$Ei~ 
LOI :MOP •$1035 
MOU:NOP R,$85 
LD I : tiOP •$206Ei 
MOU:NOP A,$86 
LOI :NOP •$1035 
MOU:NOP A,$67 
LO I: NOP •$1001 
MOU: HOP A, $8ci 
LOI :NOP •$1007 
MOU:HOP A,$89 
LO I : NOP •$0000 
MOU:HOP A,$01 
·MOU: NOP A, $02 
MOU:HOP A,$04 
*required by a~~e~bler 
*assembler required sets code location 
*c I ears X ,..Y., Z r·eg ________ _ 
* 
* 
Thi E; see:t ion 
* 
equot ion coeff i~ients. 
* 
*NOTE:FORMRT REQUIREMENTS ARE STRICT 
*R SPACE AFTER A COMMA OR A COLON 
*CAUSES ASSEMBLY ERROR. 
* 
Jt: 
•: 
* 
* 
* 
* 
Thi~ ~ection 
* 
* 
sets in it i o I 
* 
Figure 18. Butterworth Filter Program Ready for Assembly. 
MOU:HOP R,$05 
MOU:NOP A,$07 
MOIJ:NOP A,$08 
L 1 LAB: HOP $01 (X), $01 
LAB:NOP $02(X),$02 
LAB:MLT $01(X),$03 
LAB:MSM $05(X),$01 
LAB:MSM $04(X),$06 
LRB:MRD $05(X),$07 
f10U: ML T D, $F C 
LRB:MSM $07(X),$08 
LAB:NOP $0B(X),$09 
NOP: MSf1 
HOF': f1RD 
MOU:NOP D,$FE 
MOU •$Ei00 .. EA 
LDI :NOP •$4000 
MOU:HOP A,$FD 
MBA:NOP $00(X),$FD 
LRB:NOP $00(X),$00 
MOU:NOP $FC,D 
HOP:MSM 
MOU:NOP D,$FC 
MBA:NOP $03(X),$FC 
LAB:HOP $03(X),$05 
MOU:HOP $FE,D 
MXY:MSM •$7F,•$00 
* condition~ to zero. 
* 
*calculation loop begins 
* c:alc:ulote 
* 
>t: 
pre f i m i nar~ 
y 1 { k) 1---------------
lcolc:ulate 
* I pre I i mi nor~ 
*pre I i m y 1 ( k ) t o B Fi A f1 I y 2 ( k ) 
*---------------------! 
* I 
* 
1----------------
Jt: Se. t C• lJ t p U t 111 C• d e. 
*simulate~ receipt of on input of 1 
* 
>t: 
*x(k) to A and ao to B 
* 
* x(k) x oo + {pre I irninary Yt(k)) 
* 
* 
Jt: 
*y2(k) calculated X index shifted back 1 
Figure 18. Continued. 
MOU:HOP O,EO:$FE 
MBA:NOP $07(X),$FE 
HOP 
HOP 
HOP 
NOP 
JMP:NOP Ll 
EHD 
*y2(k) output 
*~?(k) stored in 06(x) of ARAM 
~~ 
*NOTE:THIS PROGRAM HAS EXTRA LINES OF 
*CODE IN IT TO GIUE IT A SAMPLE RATE 
*OF 250 KHZ. 
* 
*returns to start of program loop 
*required by assembler 
Figure 18. Continued. 
[Graph: y(k) (vertical axis, 0.2 to 1.2) versus k (horizontal axis, k=0 to k=40), showing calc y(k) and MB8764 y(k).]
Note: Calculated y(k) and
MB8764 resultant y(k) plot
atop each other
Figure 19. Impulse Response of Butterworth Filter.
COMPUTING THE DISCRETE FOURIER TRANSFORM 
ON THE MB8764 
Introduction 
The discrete Fourier transform (DFT) can be represented
by the equation:
X(k) = \sum_{n=0}^{N-1} x(n) W_N^{nk},   k = 0, 1, 2, ..., N-1
The DFT can be computed directly from the equation above
or can be computed using the fast Fourier transform (FFT)
algorithm. Implementing a DFT with an FFT algorithm greatly
reduces the calculations necessary to perform the DFT. This
reduction, from approximately N^2 complex multiplications and
adds to N log_2 N complex multiplications and adds, enables a
computer to perform the transform in much less time. The
MB8764, which offers a 0.1 µsec multiply and add, is a good
candidate for performing real time DFTs. This chapter will
briefly discuss how the MB8764 can be used to perform the
DFT directly and via the FFT algorithm.
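The direct evaluation of the DFT equation, and the size of the saving offered by the FFT, can be sketched in a few lines. The Python below is an illustration only and is not the MB8764 program of Figure 20. The function names are hypothetical, and W_N is taken as e^{-j(2\pi/N)}, the definition used later in this chapter.

```python
import cmath


def direct_dft(x):
    """Evaluate X(k) = sum over n of x(n) * W_N^(n*k) for k = 0..N-1."""
    n_points = len(x)
    # Table of W_N^i, i = 0..N-1, playing the role of a coefficient table.
    w = [cmath.exp(-2j * cmath.pi * i / n_points) for i in range(n_points)]
    return [
        sum(x[n] * w[(n * k) % n_points] for n in range(n_points))
        for k in range(n_points)
    ]


def operation_counts(n_points):
    """Approximate complex multiply-add counts: direct DFT versus FFT."""
    stages = n_points.bit_length() - 1       # log2(N) for N a power of two
    return n_points ** 2, n_points * stages


if __name__ == "__main__":
    print(direct_dft([1, 0, 0, 0]))          # impulse input
    for n_points in (64, 128, 512):
        print(n_points, operation_counts(n_points))
```

For N = 64 the counts are 4096 for the direct form against 384 for the FFT, roughly a tenfold reduction.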
Implementing the DFT
A program which performs the primary computation loop
of a 64-point DFT of complex inputs is shown in Figure 20.
Inputs are assumed to be stored in BRAM. The first loop for
k=0 in the DFT equation is just a summation of the complex
inputs because the transform coefficients equal one. The
remaining loops use complex coefficients which are stored in
table ROM. The program can be expanded to perform up to a
512-point DFT but requires input data to be stored in ERAM
and additional lines of code to page through the table ROM
and RAM. The limit of 512 complex points is set by the
ERAM expansion limit of 1024 words. Paging of the ROM
is a very complex operation because of the order in which
the transform coefficients are accessed in the DFT equation.
Performing the FFT 
The FFT algorithm is developed from the DFT by
decomposing the DFT of N samples into N/2 DFTs of two
samples each. In the process of decomposition, the symmetry
and the periodicity of the DFT are taken advantage of in
order to reduce the number of calculations necessary to
compute the DFT. The required calculations are sometimes
referred to as butterfly computations. The equations that
must be implemented by each butterfly are:
X_{m+1}(p) = X_m(p) + W_N^r X_m(q)
X_{m+1}(q) = X_m(p) - W_N^r X_m(q)
where r is determined by the location of the butterfly and
W_N^r = e^{-j(2\pi/N)r} = \cos((2\pi/N)r) - j\sin((2\pi/N)r). Given the
number of sample points N, values for \cos((2\pi/N)r) and
\sin((2\pi/N)r), r = 0 to N/2, can be solved for and stored in ROM
as a table for use by the program (see reference 8).
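The butterfly equations and the coefficient table can be exercised in a short radix-2 program. The Python sketch below is a generic in-place decimation-in-time FFT written for illustration; it is not the MB8764 routine of Figure 21, and the function names are hypothetical. It assumes N is a power of two and uses a table of W_N^r for r = 0 to N/2 - 1.

```python
import cmath


def bit_reverse(index, bits):
    """Reverse the lowest `bits` bits of index."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (index & 1)
        index >>= 1
    return result


def butterfly(xp, xq, w):
    """Return (Xm+1(p), Xm+1(q)) from Xm(p), Xm(q) and W_N^r."""
    t = w * xq
    return xp + t, xp - t


def fft(x):
    """In-place radix-2 FFT on a bit-reverse shuffled copy of x."""
    n = len(x)
    bits = n.bit_length() - 1
    if n < 2 or n != 1 << bits:
        raise ValueError("length must be a power of two")
    data = [complex(x[bit_reverse(i, bits)]) for i in range(n)]
    table = [cmath.exp(-2j * cmath.pi * r / n) for r in range(n // 2)]
    span = 1                                   # distance between p and q
    while span < n:
        stride = n // (2 * span)               # step through the table
        for start in range(0, n, 2 * span):
            for j in range(span):
                p = start + j
                q = p + span
                data[p], data[q] = butterfly(data[p], data[q],
                                             table[j * stride])
        span *= 2
    return data


if __name__ == "__main__":
    print(fft([1, 0, 0, 0, 0, 0, 0, 0]))       # impulse input
```

Each pass of the outer loop is one stage of N/2 butterflies, and there are log_2 N stages.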
A program to implement the FFT algorithm would consist
of the following sections:
1) Data input. Data is input into the MB8764 after
being reverse bit shuffled.
2) Calculation. Calculating the results would
require calculation of (N/2) x log2 N butterflies. A
routine for the calculation of a butterfly is
shown in Figure 21. Twenty-six machine cycles
are necessary to execute the butterfly routine.
Additional machine cycles are required for loop
commands and indexing. The total number of machine
cycles for the calculation of a 64- or
128-point FFT is approximately 30 x (N/2) x log2 N.
3) Data output. The in-place FFT algorithm would
provide results to the same registers in which the inputs
were received. Output in A+iB form would require
no additional cycles because it can be performed
in the calculation loop. If output is desired in
another form, additional program steps may be
required.
Paging is not necessary if the number of registers in table
ROM is less than 128. Thus for more than a 128-point FFT
the designer must devise a method to perform the table ROM
paging. The 1024 word limit on ERAM expansion allows the
MB8764 to compute up to a 512-point FFT. A 64-point FFT can
be computed with no need for external expansion. For more
than 64 points external expansion is required.
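The cycle estimate quoted above converts directly into an execution time. The Python sketch below evaluates 30 x (N/2) x log2 N for the two sizes the text applies it to. The function names and the 0.1 µsec default cycle are illustrative assumptions, and the estimate does not include any paging overhead.

```python
# Illustrative sketch only: names are hypothetical; the 30-cycle figure
# per butterfly (26 cycles plus loop and indexing overhead) is the
# approximation quoted in the text.


def fft_cycles(n_points, cycles_per_butterfly=30):
    """Approximate machine cycles: cycles per butterfly x (N/2) x log2 N."""
    stages = n_points.bit_length() - 1       # log2(N) for a power of two
    return cycles_per_butterfly * (n_points // 2) * stages


def fft_time_us(n_points, cycle_us=0.1):
    """Approximate execution time in microseconds."""
    return fft_cycles(n_points) * cycle_us


if __name__ == "__main__":
    for n_points in (64, 128):
        print(n_points, fft_cycles(n_points),
              round(fft_time_us(n_points), 1))
```

For 64 points this gives 5760 machine cycles, or about 576 µsec with a 0.1 µsec cycle.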
Summary 
The MB8764 will perform both the DFT and the FFT
algorithm very efficiently for 64 points and requires no
external expansion. The DFT is not easily expanded up to
the 512 points because it accesses the table ROM in a
complex manner. For the 512-point DFT external expansion of
ROM to 2048 words is required. The FFT may also be
expanded to 512 points and requires no external ROM, but
will require some additional programming steps to provide
RAM and ROM paging. With a 0.1 µsec instruction cycle
a 64-point DFT can be performed in less than 9.0 msec
and a 64-point FFT can be computed in less than
600 µsec.
-----THIS CALCULATES FOR THE SECOHD THROUGH H-1 LOOP---
-----IHITIALIZE---- $03s1,~et PGT, $00•0,$01•0 
MOU:HOP •$3F,CO *loop counter fork initialized 
L2 MOU:NOP C0,$01 *k=l to N-1 loop;CO(k) saved 
MOU:NOP •$40,CO *loop counter for n initialized 
L 1 L T Ei : H 0 P $ 0 0 ( X ) , $ 0 0 ( 'r' ) * n = 0 t o H- 1 I o op 
MOU:HOP $00,0 
LTB:MSM $01(X),$00(Y) 
MOU: NOP $00 ( 'r') .• Ei 
NOP:MRD 
MOU:MLT D,$00 
LTB:NOP $00(X),$01(Y) 
MOU:HOP $01,A 
HOP: r1sn 
HOP: SUT1 
MOU:NOP D,$01 
MOU:HOP X,A 
r10l.J: NOP $03, fi 
MXY:ADO •$00,~$01 
MOU:NOP D,X 
JCO:HOP Ll 
MOU:HOP $00,EO 
CLH:NOP Y 
LO I : NOP •$0002 
MOU:ADD $04,CO 
MOU:HOP D,$03 
MOU:NOP $03,X 
MOLJ:NOP $01,EO 
LD I : HOP •$0000 
MOU:HOP A,$00 
MOU.: NOP A I $01 
JCO:NOP L2 
*: This sect i cm 
* calculates the 
* real and imaginary subtotals 
* and put~; 
* real result in address $00 
* imaginary result in $01 
* 
lt: 
*----------------------------
•:Updates the 
lt: f: and Y i nde:x: 
*registers for each new n 
*----------------------------
*jump to Ll 63 times then continue 
*output real part X(k) 
*clear Y 
*Ccrmput e new 
*va I ue for· 
•:x index 
* ~----~~~~~~~----
*output imaginary part X(k) 
* i n i t i a I i z i ng 
*addre~~ $00 and 
*address $01 
*loop back to L2 for 62 ti•es 
Figure 20. 64-Point DFT Program.
L1 MOU:tiOP Y,$04 
LTB:HOP $00(X),$00(Y) 
LTB:HOP $01(X),$01(Y) 
LT6:MLT $00(X),$01(Y) 
LTB:MRD $01{X),$00(Y) 
MOU:MLT D,$00:A 
MOU:MSM $05,Y 
MOU:NOP 0,$01 
MOU:NOP $00(Y),B 
nx~·: ADD •$00,•$02 
MOIJ:SUB 0,$7E(Y) 
MOLl:NOP D,$02 
MOU:NOP $01,A 
MOU:NOP $01, 6 
MOU: ADD Y,$05 
MOU:SUB 0,$7F(\') 
MOU:NOP D,$03 
MOU:HOP $04,Y 
MAB:NOP $02,$00(Y) 
MAB:NOP $03,$01(Y) 
-------------------
-------------------
-------------------
* store Y index 
*calculate real and i•ag. 
* 
*ports of Xm(q) x uHr 
* 
*real part to ARAM and A 
*change y index 
* i n1ag_t1ar·t to ARAr-1 
*real part Xm(p) to B 
*incrementing $05 Y inde x 
*real part Xm+l(p) to BRAM 
*real part Xm+l(q) to ARAM 
*i~ag part Xm(q) x uNr to A 
*imag part Xm(p) to E 
* 
*imog part Xrn+l(p) to BRAM 
*imag part Xrn+l(q) to ARAM 
•change bock Y index 
*real port Xm+l(q) to BRAM 
*imog part Xm+l(q) to BRAM 
*IHDEXING AHO LOOP COMMAHDS 
Figure 21. FFT Butterfly Routine for 6~-point FFT. 
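The butterfly in Figure 21 forms the product Xm(q) x W_N^r and then combines it with Xm(p) to produce Xm+1(p) and Xm+1(q). The Python sketch below shows the same computation under stated assumptions: the conventional decimation-in-time form is used, in which Xm+1(p) receives the sum and Xm+1(q) the difference, and the surrounding bit-reversal and stage loops stand in for the indexing and loop commands that the figure omits. All names are illustrative and none come from the MB8764 listing.

```python
import cmath


def butterfly(p_re, p_im, q_re, q_im, w_re, w_im):
    """One radix-2 butterfly on separate real and imaginary parts."""
    # Product Xm(q) * W_N^r: four real multiplies.
    t_re = q_re * w_re - q_im * w_im
    t_im = q_re * w_im + q_im * w_re
    # Assumed convention: p gets the sum, q gets the difference.
    return p_re + t_re, p_im + t_im, p_re - t_re, p_im - t_im


def fft(x):
    """Radix-2 decimation-in-time FFT; len(x) is assumed to be a power of two."""
    n = len(x)
    data = [complex(v) for v in x]
    # Reorder the input into bit-reversed order.
    j = 0
    for i in range(1, n):
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j |= bit
        if i < j:
            data[i], data[j] = data[j], data[i]
    # log2(n) stages of n/2 butterflies each.
    span = 1
    while span < n:
        for start in range(0, n, 2 * span):
            for r in range(span):
                w = cmath.exp(-2j * cmath.pi * r / (2 * span))
                p, q = start + r, start + r + span
                a_re, a_im, b_re, b_im = butterfly(
                    data[p].real, data[p].imag,
                    data[q].real, data[q].imag,
                    w.real, w.imag,
                )
                data[p] = complex(a_re, a_im)
                data[q] = complex(b_re, b_im)
        span *= 2
    return data
```

A 64-point transform built this way executes (64/2) x log2(64) = 192 butterflies, compared with 4096 complex multiply-accumulate steps for the direct DFT.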
CONCLUSIONS
The Fujitsu MB8764 digital signal processor was found
to be a powerful processor capable of performing very fast
multiply and sum routines. This speed enables it to
solve a second-order binomial equation in 1.6 µsec, a
64-point FFT in 0.6 msec, and a 64-point DFT in 9.0 msec.
An eighth-order digital IIR filter implemented in a
parallel form can operate with a sample rate of 149.25
kHz. The weakness in the Fujitsu chip lies in its
internal precision. With only 16 bits of internal precision,
sample rates greater than five times the maximum signal
frequency may be too great for the internal precision
of the Fujitsu. Increasing the chip's internal precision
to 24 bits is possible by using two words for internal
data transfer and coefficient storage, and by shifting
the D register so that the lower-order bits can be
transferred out. This procedure is cumbersome and would slow
down processing by at least a factor of ten. Double
precision operations are not possible because the D
register carries only 26 bits.
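The sensitivity to sample rate can be illustrated numerically. The Python sketch below is an illustration under assumptions, not a model of the MB8764: it supposes that a second-order section stores the coefficients r cos(theta) and r squared in 16-bit words with 15 fraction bits, and it uses a hypothetical pole radius of 0.99. It then compares the pole angle actually realized after rounding with the intended angle theta = 2 pi / (oversampling ratio).

```python
import math

FRACTION_BITS = 15  # assumed: 16-bit word with one bit left of the binary point


def quantize(value, fraction_bits=FRACTION_BITS):
    """Round a value to the nearest multiple of 2**-fraction_bits."""
    scale = 1 << fraction_bits
    return round(value * scale) / scale


def pole_angle_error(oversampling_ratio, radius=0.99):
    """Relative error in pole angle after coefficient rounding.

    oversampling_ratio is the sample rate divided by the pole frequency;
    radius is a hypothetical pole radius chosen only for illustration.
    """
    theta = 2.0 * math.pi / oversampling_ratio
    half_a1 = quantize(radius * math.cos(theta))  # stored r*cos(theta)
    a2 = quantize(radius * radius)                # stored r**2
    cos_q = max(-1.0, min(1.0, half_a1 / math.sqrt(a2)))
    theta_q = math.acos(cos_q)
    return abs(theta_q - theta) / theta


if __name__ == "__main__":
    for ratio in (5, 50, 500):
        print(ratio, pole_angle_error(ratio))
```

As the ratio of sample rate to signal frequency grows, the poles crowd toward z = 1, where cos(theta) changes very little with theta, so the same rounding step produces a far larger relative error in the realized frequency.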
The MB8764 allows for external expansion of ROM and 
RAM. When ERAM is used either the instruction cycle must 
be 1.25 µsec or less, or the ERAM speed switching option
must be utilized. This option, selected by an external 
pin, allows ERAM to be accessed at half the rate of the 
instruction cycle. A DFT, FFT or digital filter program 
which uses ERAM will run faster with the 1.25 µsec machine 
cycle than with a 0.1 µsec instruction cycle and the 
ERAM slow speed option selected. RAM and ROM are divided
into pages with the RAM having 256 words per page. This 
paging causes problems in any program that works with more 
than a page of data or coefficients. DFT calculations for 
more than 64 points, although possible on the MB8764, are 
difficult to program and slow to operate because of this 
paging problem.
The input/output features on the MB8764 can be used
to govern the sample rate of a digital filter. This is 
done by using a jump instruction that prevents program 
execution from continuing until an input is received. The 
address attached input mode allows specific coefficients 
of a digital filter to be changed during program execution. 
Thus a designer can produce a digital filter that reacts 
to various parameters and compensates its transfer function 
to accommodate the parameter changes. 
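The behavior described above can be sketched in Python. This is a conceptual illustration only: the event tuples, the function name, and the use of a simple FIR structure are assumptions made for the example and do not represent the MB8764 input modes or instruction set. Each pass through the loop stands in for the program waiting until an input arrives, so the arrival of samples sets the processing rate, and a coefficient event changes the filter between samples.

```python
def run_filter(events, coefficients):
    """Process input events in arrival order.

    events: iterable of ("sample", value) or ("coef", index, value) tuples
            (a hypothetical encoding chosen for this sketch).
    coefficients: initial FIR coefficients, newest-sample tap first.
    """
    coeffs = list(coefficients)
    history = [0.0] * len(coeffs)
    outputs = []
    for event in events:  # stands in for waiting on the next input
        if event[0] == "coef":
            # Address-attached input: the index selects which coefficient changes.
            _, index, value = event
            coeffs[index] = value
            continue
        _, sample = event
        history = [sample] + history[:-1]
        outputs.append(sum(c * h for c, h in zip(coeffs, history)))
    return outputs


if __name__ == "__main__":
    stream = [("sample", 1.0), ("sample", 1.0), ("coef", 0, 2.0), ("sample", 1.0)]
    print(run_filter(stream, [1.0, 0.5]))
```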
Instructions are designed to take advantage of the 
separate sides of RAM and their indexes. This makes 
programming on the MB8764 most efficient when ARAM and BRAM
or table ROM and BRAM can be used independently. When this
separation cannot be used by an application the MB8764
becomes awkward in its internal data transfer. Thus the
MB8764 is not a general purpose microprocessor but is
specifically designed for digital signal processing or
similar arithmetic operations.
The MB8764 helps to prevent overflow from occurring in
preliminary operations by providing two bits to the left
of the decimal point in the D register. The data format
in the input/output and storage registers allows for one
bit to the left of the decimal point. If input signals
are restricted to +/- one, scaling of the input signal
is unnecessary.
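The effect of the extra bit can be shown with a small range check. The Python sketch below assumes that the bits to the left of the point include the sign bit, so that one bit gives the range [-1, 1) and two bits give [-2, 2); this interpretation and the function names are assumptions made for illustration.

```python
def in_range(value, bits_left_of_point):
    """True if value fits a two's complement format with the given number
    of bits (sign included, by assumption) left of the binary point."""
    limit = 2.0 ** (bits_left_of_point - 1)
    return -limit <= value < limit


def accumulate(products, bits_left_of_point=2):
    """Sum products, raising OverflowError if a subtotal leaves the range."""
    total = 0.0
    for product in products:
        total += product
        if not in_range(total, bits_left_of_point):
            raise OverflowError("subtotal exceeds accumulator range")
    return total


if __name__ == "__main__":
    subtotal = accumulate([0.9, 0.8])
    print(in_range(subtotal, 2), in_range(subtotal, 1))
```

A subtotal such as 0.9 + 0.8 fits the wider accumulator format but not the storage format, so an intermediate result can exceed one in magnitude as long as the final value written back to memory does not.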
Specifications of the MB8764 claim it can implement
a second-order filter in 0.7 µsec. It should be noted
that the second-order filter to which this specification
applies is a second-order FIR filter.
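The distinction matters because the two filter types require different amounts of arithmetic per sample. The Python sketch below uses the textbook difference equations, with illustrative names and the sign convention y(n) = b0 x(n) + b1 x(n-1) + b2 x(n-2) - a1 y(n-1) - a2 y(n-2) assumed for the IIR case. The FIR form needs three multiplies per output sample, while the IIR form needs five and must also store past outputs.

```python
def fir2(samples, b0, b1, b2):
    """Second-order FIR filter: three multiplies per output sample."""
    x1 = x2 = 0.0
    out = []
    for x in samples:
        out.append(b0 * x + b1 * x1 + b2 * x2)
        x2, x1 = x1, x
    return out


def iir2(samples, b0, b1, b2, a1, a2):
    """Second-order IIR filter: five multiplies per output sample."""
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in samples:
        y = b0 * x + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
        out.append(y)
        x2, x1 = x1, x
        y2, y1 = y1, y
    return out


if __name__ == "__main__":
    impulse = [1.0, 0.0, 0.0, 0.0]
    print(fir2(impulse, 1.0, 2.0, 3.0))
    print(iir2(impulse, 1.0, 0.0, 0.0, -0.5, 0.0))
```

At a 0.1 µsec instruction cycle, 0.7 µsec corresponds to seven instruction cycles, a budget that is plausible for the three-multiply FIR form.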
REFERENCES
1. MB8764 Programming Manual. Tokyo, Japan: Fujitsu
Limited.
2. MB8764 Hardware Manual. Tokyo, Japan: Fujitsu
Limited.
3. Booth, Andrew D. "A Signed Binary Multiplication
Technique," Quarterly J. Mechanics and Applied
Math., Vol. 4, pp. 236-240. Reprinted by Earl
E. Swartzlander, Jr., ed. Benchmark Papers in
Engineering and Computer Science/21 Computer
Arithmetic. Stroudsburg, Pennsylvania: Dowden,
Hutchinson and Ross, 1980.
4. Canright, Robert Eldon, Jr. "Digital Filtering
with the iAPX 86/20." Research Paper, University
of Central Florida, 1983.
5. Phillips, Charles L., and Nagle, H. Troy, Jr. Digital
Control System Analysis and Design. Englewood Cliffs,
New Jersey: Prentice-Hall, 1984.
6. MB87902 The Fujitsu MB8764 Support Tool Outline.
Tokyo, Japan: Fujitsu Limited.
7. MB87902 Software Development Tool Kit for MB8764
Digital Signal Processor Detailed Description.
Tokyo, Japan: Fujitsu Limited, 1984.
8. Oppenheim, Alan V., and Schafer, Ronald W. Digital
Signal Processing. Englewood Cliffs, New Jersey:
Prentice-Hall, Inc., 1975.
