Algorithms and VLSI architectures for parametric additive synthesis by Spanier, Jonathan Robert
Durham E-Theses




Spanier, Jonathan Robert (1999) Algorithms and VLSI architectures for parametric additive synthesis,
Durham theses, Durham University. Available at Durham E-Theses Online: http://etheses.dur.ac.uk/4536/
Use policy
The full-text may be used and/or reproduced, and given to third parties in any format or medium, without prior permission or
charge, for personal research or study, educational, or not-for-profit purposes provided that:
• a full bibliographic reference is made to the original source
• a link is made to the metadata record in Durham E-Theses
• the full-text is not changed in any way
The full-text must not be sold in any format or medium without the formal permission of the copyright holders.
Please consult the full Durham E-Theses policy for further details.
Academic Support Office, Durham University, University Office, Old Elvet, Durham DH1 3HP
e-mail: e-theses.admin@dur.ac.uk Tel: +44 0191 334 6107
http://etheses.dur.ac.uk
THE UNIVERSITY OF DURHAM
SCHOOL OF ENGINEERING




Jonathan Robert Spanier 
A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS OF THE
COUNCIL OF THE UNIVERSITY OF DURHAM FOR THE DEGREE OF DOCTOR OF PHILOSOPHY (PH.D.).
MAY 1999
The copyright of this thesis rests with the author. No quotation from it should be
published without the written consent of the author and information derived from it
should be acknowledged.
Algorithms and VLSI Architectures for Parametric Additive Synthesis 
Jonathan Robert Spanier 
Ph.D. 1999 
Abstract 
A parametric additive synthesis approach to sound synthesis is advantageous as it
can model sounds in a large scale manner, unlike the classical sinusoidal additive
based synthesis paradigms. It is known that a large body of naturally occurring
sounds are resonant in character and thus fit the concept well. This thesis is
concerned with the computational optimisation of a super class of formant synthesis
which extends the sinusoidal parameters with a spread parameter known as
bandwidth. Here a modified formant algorithm is introduced which can be traced back
to work done at IRCAM, Paris.
When impulse driven, a filter based approach to modelling a formant limits the
computational work-load. It is assumed that the filter's coefficients are fixed at
initialisation, thus avoiding interpolation which can cause the filter to become chaotic.
A filter which is more complex than a second order section is required.
Temporal resolution of an impulse generator is achieved by using a two stage
polyphase decimator which drives many filterbanks. Each filterbank describes one
formant and is composed of sub-elements which allow variation of the formant's
parameters. A resource manager is discussed to overcome the possibility of all
sub-banks operating in unison. All filterbanks for one voice are connected in series to
the impulse generator and their outputs are summed and scaled accordingly.
An explorative study of number systems for DSP algorithms and their architectures
is presented. I invented a new theoretical mechanism for multi-level logic
based DSP. Its aims are to reduce the number of transistors and to increase their
functionality.
Synthesis algorithms and VLSI architectures are reviewed in a case study comparing
a filter based bit-serial sinusoidal generator with a CORDIC based sinusoidal generator.
They are both of similar size, but the latter is always guaranteed to be stable.
Declaration 
I hereby declare that this thesis is a record of work undertaken by myself, that it 
has not been the subject of any previous application for a degree, and that all
sources of information have been duly acknowledged. 
Jonathan Robert Spanier, May 1999 
© Copyright 1999, Jonathan Robert Spanier 
The copyright of this thesis rests with the author. No quotation from it should 
be published without the written consent of the copyright owner, and information 
derived from it should be acknowledged. 
Acknowledgments 
I would like to take this opportunity to thank both my supervisors, Professor Alan 
Purvis and Dr. Simon Johnson for the encouragement and support they provided 
over the course of study. 
The financial assistance provided by the University of Durham Studentship and 
the Royal Academy of Engineering International Travel Grant Scheme is gratefully 
acknowledged. 
I would like to thank Mike Ellison of the University IT Computing Service for helping
me to operate the Cadence Design Integration tools. My thanks go to members of 
the electronics workshop and to the departmental computing support; especially 
to Peter Friend, Ian Hutchinson, Peter Baxendale and Trevor Nancarrow for their 
advice and assistance in electronic and computing problems. Special thanks go to 
Matthew Jubb and John Glover for enlightening me in the joys of UNIX system 
administration and running Linux. 
I am indebted to my fellow postgraduates, especially to Steve Robinson, Martin 
Bradley and the rest of the Telecoms Networks Research Group. I would like to 
thank my colleagues in the Digital and Optical Signal Processing Group and the 
VLSI Research Group for the informative discussions we had over the three years. 
I would like to acknowledge the LaTeX community for providing such a useful and
worthwhile typesetting package. 
Special thanks also go to Austin Cassidy, Dr. Roger Woods and Dr. Saeed Vaseghi of 
The Queen's University of Belfast for providing UNIX support, portable computers 
and for all their encouragement. 
Finally I would like to thank my parents for providing steadfast support throughout 
all my endeavours and to give special thanks to my mother, who tirelessly proofread 
this thesis despite not understanding any of the technical aspects. 
Contents 
List of Figures v 
List of Tables vii 
List of Abbreviations viii 
1 Background and Structure 1 
1.1 Introduction 1 
1.2 Aims of this Thesis 2 
1.3 Summary of Thesis 3 
2 Introduction to Synthesis methods 5 
2.1 Subtractive Synthesis 6 
2.1.1 The Digital Controlled Oscillator 7 
2.1.2 The Digital Controlled Amplifier 10 
2.1.3 The Digital Controlled Filter 11 
2.2 Additive Synthesis 12 
2.2.1 Time-Multiplexed Wavetable Oscillators 13 
2.2.2 Inverse FFT-based Oscillators 14 
2.2.3 Multirate Additive Oscillators 14 
2.3 Frequency Modulation 15 
2.4 Data Reduction for Synthesis 16 
2.5 Waveguide Modelling 17 
2.6 Speech Modelling 18 
2.6.1 Time-Domain Modelling 18 
2.6.2 Spectral Modelling 19 
2.6.2.1 VOSIM 20 
2.6.2.2 Parallel/Cascade Formant Synthesis 21 
2.6.2.3 Forme d'Onde Formantique 24 
2.7 Summary 25 
3 Number Systems for VLSI Processing Elements 27
3.1 Introduction to Number Systems 27 
3.2 Conventional Radix Number System 30 
3.2.1 Weighted Number Systems 30 
3.2.2 Sign-Magnitude Representation 31 
3.2.3 Diminished-Radix Complement Representation 31 
3.2.4 Radix Complement Representation 32 
3.2.5 Properties of Sign-Magnitude, (r-1)'s and r's Complement Numbers 32
3.3 The Signed-Digit Number System 33 
3.3.1 Definition of Signed-Digit Numbers 33 
3.3.2 Conversion Between The Conventional Radix-r Number and 
its SD Form 34 
3.3.2.1 Conversion from Conventional to SD Systems . . . . 34 
3.3.2.2 Conversion from SD to Conventional Systems . . . . 35 
3.4 The Residue Number System 35 
3.4.1 The Multiplicative Inverse 36 
3.4.1.1 Euler's Formula 37 
3.4.2 Residue to Decimal Conversion 37 
3.4.2.1 Mixed-Radix Method 37 
3.4.2.2 Chinese Remainder Theorem 38 
3.4.3 The Quadratic Residue Number System 38 
3.4.3.1 Conversion To Complex Quadratic Residue Number 
System 38 
3.4.3.2 Conversion Back To Conventional Number System . 39 
3.5 Addition Elements 40 
3.5.1 Conventional Binary Addition and Subtraction 40 
3.5.2 Signed Digit Addition and Subtraction 41 
3.5.2.1 The Algorithm 41 
3.6 Array Multiplication Techniques 44 
3.6.1 Cellular Array Multipliers Conclusions 46 
3.7 Canonical Multiplier Recoding 47 
3.7.1 Introduction to Multiplication String Recoding Algorithms 47 
3.7.2 Canonical Signed-Digit Multiplier Recoding 48 
3.7.3 Canonical Recoding Algorithm 48 
3.7.4 The Booth's Multiplier Algorithm 49 
3.7.5 Efficiency of Multiplier Algorithms and Design Alternatives . 50 
3.8 Summary 51 
4 VLSI Technologies and Applications 53
4.1 Novel VLSI Technologies 54 
4.2 Current-Mode Logic 55 
4.2.1 Current Source 56 
4.2.2 Current Mirror 57 
4.2.3 Threshold Detector 58 
4.2.4 Bidirectional current input circuit 58 
4.2.5 Module VLSI Scaling 60 
4.3 Negative Differential Resistance 60 
4.4 Voltage-Mode Multiple-Valued Logic 62 
4.4.1 Operational Amplifier Approach 62 
4.4.2 Neuron MOSFET Approach 62 
4.5 DAD Logic 63 
4.5.1 Mathematical Theory behind DAD Logic 66 
4.5.2 Harmonic Oscillator 67 
4.5.3 Bipolar Level System 68 
4.5.4 Value Determination through Approximation 70 
4.5.4.1 Analogue Eigenvalue Methods 70 
4.5.4.2 The Shooting Method 71 
4.5.4.3 The Pruess Method 74 
4.5.4.4 Important Issues regarding Eigenvalue Determination 77 
4.6 Summary 77 
5 VLSI requirements for synthesis 79
5.1 DSP Systems Design 79 
5.1.1 Multiplier Constraints 79 
5.1.2 MAC Constraints 80 
5.1.3 ALU Constraints 81
5.1.4 Data Address Generator 81 
5.1.5 Program Sequencer 81 
5.1.6 Memory 82 
5.1.7 Architecture Summary 82 
5.2 Datapath length criteria 83 
5.2.1 Overflow and underflow reduction 83 
5.2.2 Truncation, Rounding and Unbiased Rounding 84 
5.2.3 Saturation Arithmetic 85 
5.2.4 Other noise problems 85 
5.3 Filter based Sine Generator 86 
5.3.1 Filter Structure 87 
5.3.2 Multiplier Design 88 
5.3.3 Accumulator Design 90 
5.3.4 Filter Design 90 
5.4 The CORDIC Sine Generator 91 
5.4.1 Derivation for Circular Mode 91 
5.4.2 The CORDIC Algorithm 92 
5.4.3 CORDIC Sine Algorithm 94 
5.4.4 Design of the CORDIC sine Generator 95 
5.4.5 Sequencer Design 98 
5.4.6 ROM Design 100 
5.4.7 CORDIC register Design 102 
5.4.8 Adder Design 103 
5.4.9 Construction and Testing 103 
5.5 Comparisons between Sinusoidal Generators 108 
6 Decaying Sinusoidal Additive Synthesis 110 
6.1 Analysis of a Second Order Filter Structure 111
6.2 Analysis of the FOF technique 112 
6.2.1 Implementation Approaches to the FOF wavefunction . . . . 114 
6.2.2 Wavetable Based FOF Synthesis 115 
6.2.3 Filter Based FOF Synthesis 117 
6.2.4 Full Hanning Window FOF envelope 120 
6.2.5 FOF filter structure with oscillation 122 
6.2.6 Miscellaneous Wavefunction Designs 124 
6.3 Approaches to formant synthesis 125 
7 Parametric Additive Synthesis Implementation 127 
7.1 Optimisation of the filter $t^2 e^{-\alpha t} \sin(\omega_0 t)$ 128
7.1.1 Analysis of the Transversal Element of the Filter Topology . 128 
7.1.2 Transversal Filter Structure 129 
7.1.3 IIR Filter Structure 130
7.2 Source Excitation Driver 131 
7.2.1 Discrete Summation Formulas for Impulse Generation . . . . 133 
7.2.2 Impulse generation using Decimation 134 
7.2.2.1 The Decimator Structure 134 
7.2.2.2 Decimator Design 135 
7.2.2.3 Reduced Coefficient Storage Design 137 
7.3 Synthesiser Structure 138 
7.3.1 The importance of Phase in Audio 138 
7.3.2 Parallel Formant Model 139 
8 Algorithm Scheduling and Miscellaneous Topics 143 
8.1 Algorithm Scheduling 143 
8.1.1 Controller Algorithm Overview 144 
8.1.2 Filterbank Scheduler Controller 145 
8.1.3 Number of Filters required 146 
8.2 Miscellaneous Topics 147 
8.2.1 Envelope Generation 147 
8.2.2 Subtractive Synthesis Extensions 148 
8.2.3 Synthesis by Rule 149 
8.3 User Interface Design 150 
8.4 Hardware/Software Segmentation 151 
9 Conclusions and Further Work 153 
9.1 Conclusions 153 
9.2 Further Work 156 
A The Laplace to Z Transform Algorithm 158 
List of Figures 
2.1 Simplified Subtractive Voice Architecture 7 
2.2 Block Diagram of a Wavetable Oscillator 8 
2.3 Second Order Digital Filter 11 
2.4 LPC Synthesiser 19 
2.5 Typical VOSIM waveform 20 
2.6 Block Diagram of Serial-parallel formant synthesiser 23 
2.7 FOF synthesiser topology 24 
2.8 Properties of the FOF wavefunction 25 
3.1 Block Diagrams of ripple carry adders for conventional binary arithmetic 41
3.2 Block Diagram of Part of the SD adder/subtractor 43 
3.3 The Structure of a 4x4 NMM 45
3.4 The Structure of a 4x2 AMM 45
4.1 Current Source Circuit Symbol and Schematic 56 
4.2 Current Mirror Circuit Symbols and Schematics 57 
4.3 Threshold Detector Circuit Symbol and Schematic 58 
4.4 Bidirectional Current Input Circuit Symbol and Schematic 59 
4.5 Bidirectional Current Input I-V characteristics 59 
4.6 Block Diagrams of Negative differential Resistance 61 
4.7 Schematic representation of a neuron MOS transistor 63 
4.8 Functional Diagram of DAD Logic Quantiser Operation 65 
4.9 Stationary state solutions for the Harmonic Oscillator Potential . . . 68 
5.1 A Simplified Structure of a DSP Processor 80 
5.2 Block Diagram of the IIR Sinusoidal Generator 87
5.3 Schematic Diagram of the 5 level Booth Multiplier Module 89 
5.4 Block Diagram of Sinusoidal Generator 97 
5.5 Block Diagram of the Barrel Shifter 98 
5.6 Schematic Diagram of a T flip-flop 99 
5.7 Schematic Diagram of first T flip-flop for mod counting 99 
5.8 Schematic Diagram of a modulo 21 counter 100 
5.9 Schematic Diagram of a Synchronous Reset Flip-Flop 103 
5.10 Comparison between Sine Function and CORDIC Simulation . . . . 105 
5.11 Error Difference between Sine Function and CORDIC Simulation . . 106 
5.12 Photomicrograph of the CORDIC Cosine Chip 108 
6.1 The FOF Wavefunction 112 
6.2 Power Spectrum of the FOF Wavefunction 114 
6.3 Simplified CSOUND FOF Implementation 116 
6.4 Time domain behaviour of FOF envelope generator 118 
6.5 Zero-Pole plot of the FOF envelope generator 119 
6.6 Heterodyned version of the FOF Algorithm 120 
6.7 Time domain and z-plane responses to Full Hanning FOF 121 
6.8 Time Domain response of the $t^2 e^{-\alpha t} \sin(\omega_0 t)$ wavefunction 123
6.9 z-plane response to $t^2 e^{-\alpha t} \sin(\omega_0 t)$ wavefunction 123
6.10 Magnitude and Phase Spectra for the $t^2 e^{-\alpha t} \sin(\omega_0 t)$ wavefunction 124
7.1 Flowgraph of Direct Form FIR Filter Structure 129 
7.2 Second Order Coupled IIR filter Form 131
7.3 Simple Impulse Generator Schematic 131 
7.4 Amplitude vs Frequency for the sine function 132 
7.5 Reduced Dynamic Memory Storage Polyphase Filter 136 
7.6 Block Diagram of $t^2 e^{-\alpha t} \sin(\omega_0 t)$ based formant synthesiser 140
8.1 Block Diagram of the controller for idle/running status 145 
8.2 Block Diagram of Resource Filter Allocation 146 
8.3 Schematic of a Bipolar Envelope Generator 149 
List of Tables 
3.1 Multiples of the multiplicand to be added after scanning a triplet of
multiplier bits in an overlapped pairwise scanning system 48
3.2 The Booth Multiplier Radix-2 String Recoding Algorithm 49 
5.1 Input-Output Functions for CORDIC Modes 93 
5.2 CORDIC Sine Chip Input-Output Connections 106 
5.3 Current Consumption of CORDIC Sine Chip 107 
6.1 Comparisons between Formant Filter Topologies 126
8.1 Decision Table for Bipolar Envelope Ramp Generator 148 
List of Abbreviations 
ADSR Attack Decay Sustain Release 
AES Audio Engineering Society 
ALU Arithmetic Logic Unit 
AM Amplitude Modulation
ASIC Application Specific Integrated 
Circuit 
CCRMA Center for Computer Research 
in Music and Acoustics 
CMOS Complementary Metal-Oxide Semiconductor
CORDIC COordinate Rotation DIgital Computer
DAC Digital to Analogue Converter
DCA Digital Controlled Amplifier 
DCF Digital Controlled Filter 
DCO Digital Controlled Oscillator 
DFT Discrete Fourier Transform 
DIL Dual In Line 
DSP Digital Signal Processor 
ES2 European Silicon Structures 
FFT Fast Fourier Transform 
FIFO First In First Out 
FIR Finite Impulse Response 
FM Frequency Modulation 
FOF Forme d'Onde Formantique 
FPGA Field Programmable Gate Array
HP Hewlett Packard 
IEEE The Institute of Electrical and Electronics Engineers
IIR Infinite Impulse Response
IMEC Inter-Universitair Micro-Elektronica 
Centrum 
IRCAM Institut de Recherche et Coordination Acoustique/Musique
LPC Linear Predictive Coding 
MAC Multiply-Accumulate unit 
MIDI Musical Instruments Digital 
Interface 
MOSFET Metal-Oxide Semiconductor Field Effect Transistor
MVL Multiple-Valued Logic 
NDR Negative Differential Resistance 
RNS Residue Number System 
ROM Read Only Memory 
SD Signed Digit 
SHARC Super Harvard ARchitecture 
Computer 
SQUID Superconducting QUantum Interference Device
STL Simulation and Test Language 
TTL Transistor Transistor Logic
VHDL VHSIC Hardware Description Language
VLSI Very Large Scale Integration or Integrated circuits
VOSIM VOcal SIMulation
CHAPTER 1
Background and Structure 
1.1 Introduction 
Since the dawn of time, mankind has used materials from the Earth to mould into 
instruments to hit, pluck, stroke or blow. Over the aeons, man has harnessed new 
developments in Science and Technology to transform or create new tones and new 
playing techniques. Initially, these came through material science, as in the tones 
created from different types of wood and varnish. Now, in the 20th century the 
ubiquitous silicon chip has taken the limelight. 
Man uses VLSI technology either to make a facsimile of an instrument by modelling
the instrument's properties or to create new ones which require new performance
criteria to control them. Research into this field requires a multidisciplinary approach
utilising the skills of musicians, physicists and computer scientists.
At the dawn of the millennium, musicians now have a plethora of different synthesis
tools to generate unique sounds, but the commercialisation of these algorithms has
been relegated to software implementation. Most manufacturers have only
implemented the easy algorithms, such as wavetable, subtractive and Frequency
Modulation synthesis. They are becoming reluctant to spend any more man years and
money on ASIC design, which will never repay their initial investment due to its
small customer base. However, most manufacturers have jumped onto the
multimedia concept. This leaves musicians with a quandary on whether to buy similar
hardware instruments or to go down the software extensibility route. The latter
approach seems good value due to its upgradability, but this assumes that
computational resources are fixed. In reality, algorithms consume valuable computation,
thus slowing down the synthesis engine to near or below real time. It is difficult for
musicians, who normally rely on their tactile and auditory responses, to respond in
the same way to purely acoustically based sounds.
1.2 Aims of this Thesis 
This thesis looks at the mapping of a parametric additive synthesis technique to a 
specialised hardware implementation, in order to discover an optimal implementable 
approach built around a recursive filter with the minimum number of arithmetic and 
storage operations. The parametric additive synthesis technique is simply a formant 
based model which reduces the number of parameters needed to model a sound, and 
is applicable because most instruments resonate. The alternative method would be a 
sinusoidal based additive synthesis which has an explosion of controllable parameters 
and would be difficult to implement. 
The standard approach to digital CMOS design has been shown to be effective in 
reducing design times predominantly because digital is easier to work with than 
analogue. In this thesis, I explore different forms of digital structures by increasing 
the number of encodings per wire. To this end, current mode CMOS logic and 
negative differential resistance devices are discussed. My novel theoretical approach 
uses a constant number of transistors to generate many discrete levels. This is closer 
to the behaviour of negative differential devices and should consume less power. 
A comparison between a bit-serial sinusoidal generator and a CORDIC parallel
based sinusoidal generator is made, and they are found to be of comparable size.
However, despite taking 21 clock cycles to generate a sinusoid, the CORDIC version
is always stable. In the bit-serial generator, the IIR filter coefficients set the
frequency of oscillation, which is sensitive to the pole quantisation of the filter's
transfer function. This form takes 25 clock cycles to perform one filter operation,
with both adders and multipliers operating in parallel.
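The sensitivity to pole quantisation can be illustrated with a short sketch. It assumes a direct-form two-pole resonator, y[n] = c·y[n-1] - y[n-2] with c = 2cos(ω), which is a textbook oscillator and not necessarily the structure used later in this thesis. The function names, the 48 kHz sampling rate and the 12-bit coefficient are hypothetical choices made for illustration only.

```python
import math

def quantise(value, frac_bits):
    """Round value to a fixed-point grid with frac_bits fractional bits."""
    scale = 1 << frac_bits
    return round(value * scale) / scale

def resonator_sine(freq_hz, fs, n_samples, frac_bits=None):
    """Generate sin(n*w) using y[n] = c*y[n-1] - y[n-2], with c = 2*cos(w)."""
    w = 2.0 * math.pi * freq_hz / fs
    c = 2.0 * math.cos(w)
    if frac_bits is not None:
        c = quantise(c, frac_bits)  # models a finite coefficient word length
    y1, y2 = math.sin(-w), math.sin(-2.0 * w)  # initial state y[-1], y[-2]
    out = []
    for _ in range(n_samples):
        y = c * y1 - y2
        out.append(y)
        y2, y1 = y1, y
    return out, c

def realised_frequency(c, fs):
    """Frequency implied by coefficient c, since the poles sit at exp(+/-jw)."""
    half = max(-1.0, min(1.0, c / 2.0))  # guard acos against rounding past 1
    return fs * math.acos(half) / (2.0 * math.pi)

samples, c_exact = resonator_sine(100.0, 48000.0, 200)
_, c_12bit = resonator_sine(100.0, 48000.0, 200, frac_bits=12)
print("exact coefficient :", realised_frequency(c_exact, 48000.0), "Hz")
print("12-bit coefficient:", realised_frequency(c_12bit, 48000.0), "Hz")
```

At low frequencies 2cos(ω) lies very close to 2, so a small rounding of the coefficient moves the pole angle, and hence the pitch, by a large relative amount.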
The standard FOF wavefunction approach is very greedy on memory and arithmetic 
resources. A filter based approach overcomes this problem. The filter structure 
chosen has 4 zeros in complex conjugate form on the real axis and 3 poles in complex 
conjugate pair form. This is radically different to any other FOF form. 
Octaviation can be performed by using two phase accumulators plus an adder. This 
forms an idealised impulse generator running at a high sampling rate. The impulses 
are decimated to a lower sampling rate, using an optimal two stage decimator. Using 
polyphase filters and FIR symmetry, a novel coefficient storage reduction scheme is 
proposed. 
The filters are controlled by an allocation system which utilises decrementing
counters and a linked list scheduler. The need for synthesis rules, envelope generation,
subtractive synthesis modules and intuitive user interface will all be highlighted.
1.3 Summary of Thesis 
The structure of this thesis is as follows:
Chapter 2 presents a review of common audio synthesis techniques for music
generation. It highlights the subtractive synthesis approach based on modular components
progressing through to sinusoidal additive synthesis. The remainder of the chapter
looks briefly at physical modelling and speech inspired synthesis techniques based
on formant modelling.
Chapter 3 examines the multitude of approaches to number representation and
the effect of these on numerical computation. Both of these are of paramount
importance in VLSI DSP design and implementation as they enable designers to
meet the performance criteria of the chosen algorithm. Conventional and redundant
number representations are discussed and their merits in addition and multiplication
are highlighted. Finally, array and Booth recoded multipliers are discussed.
Chapter 4 discusses different material structures for DSP, highlighting multivalued
logic architectures using current mode logic and voltage-mode logic paradigms.
Multivalued logic, based on negative differential resistance based devices, plus a new
concept which I have invented, are also presented.
Chapter 5 expands on Chapter 3 and deals with hardware implementation
requirements necessary for audio synthesis and discusses wordlength issues caused by
addition and multiplication. These increases in wordlength can be minimised using
truncation or rounding schemes. Problems caused by noise are also addressed. The
rest of the chapter then applies this knowledge to a comparison of two sinusoidal
oscillators implemented using either a bit-serial filter or a CORDIC based algorithm.
Chapter 6 examines a formant based synthesis technique by modelling the
time-domain impulse response of a second order filter. This creates a damped sinusoid
having no attack and the problem is overcome by looking at the FOF methodology.
The method is then discussed in terms of implementational metrics. Implementation
of this structure is then approximated by various filter topologies and some unusual
waveform generators are highlighted.
Chapter 7 chooses from Chapter 6 one filter topology with an elegant structure and
its implementation is discussed. The driver to this filter is designed and optimised.
Chapter 8 addresses scheduling of the filters and synthesis structures to enable
easy use of the technique. Rules to enable realistic simulation of singing voices are
discussed as are other approaches to enhance the sonic capabilities of the synthesis
model. The chapter ends with details of how the algorithm would be partitioned
between hardware and software.
Chapter 9 describes the conclusions reached in this work, and suggestions for further
research in this topic are discussed.
Appendix A discusses the mathematical derivation of the Laplace to Z transform.
CHAPTER 2
Introduction to Synthesis methods
Audio synthesis techniques provide a valuable playground in which to explore the
route from algorithm design to a full-custom VLSI architecture. This thesis will
concentrate on the synthesis of musical tones. This field has benefited from the
cross-fertilisation of ideas from engineers to musicians and evolved from original
work at Bell Telephone Laboratories. The work at Bell Telephone Laboratories was
originally in speech compression, but in the 1960's Max Mathews created the first
digital software generating musical tones known as MUSIC IV [82, 114]. The first
commercially available digital music synthesiser was introduced by Yamaha in the
early eighties. This instrument was based on research by John Chowning [11] of
Stanford University, which used the FM synthesis technique. Before the advent of
the DX series of synthesisers by Yamaha, musicians were using analogue based
subtractive synthesisers of Moog, ARP and others [137]. The digital approach provided
repeatability of sounds because the technology was not dependent on temperature
drifting of components as was found in the analogue systems. The FM algorithm also
provided musicians with bright, clean sounding timbres that were unique. However,
musicians now prefer the analogue systems because they provide warm sounding
timbres caused by their imperfections. Musical instrument manufacturers began
to explore newer techniques in order to give the musicians better control over the
sounds.
All the early digital synthesisers relied upon custom VLSI architectures to provide
the computational power necessary to generate multiple voices in real-time and
to make the instruments affordable. Nowadays, with new general purpose DSP
chips designed for sustained computation, manufacturers choose the programmable
(software) and hardware approach. We have gone full-circle, from off-line compiler
based software synthesis to real-time synthesis engines like the new CSOUND, based
around the Analog Devices floating point SHARC DSP chip [138]. Does this mean
that a thesis on VLSI and synthesis algorithms is worthwhile? I think so, because
software based engines require hardware which can support the computations which
must be completed within a sample interval and so require powerful new VLSI
designs. The other approach is the full-custom ASIC design which has a pay-off
when the number of units sold is very large. However, the latter approach
requires a longer gestation period. The ASIC approach is of value for sample-rate
converters [2] and mobile phones. For the music market this approach is becoming
less attractive. But optimising the algorithms will make software use the available
hardware resources more effectively.
This chapter will review the current synthesis techniques available today and
highlight their possible hardware configurations.
2.1 Subtractive Synthesis 
In 1964 Robert Moog constructed a transistor-based analogue voltage controlled
oscillator, and presented a paper on it at the sixteenth annual convention of the
AES [82]. Over a period of rapid growth, engineers realised the need for other
functions to emulate instruments and make new tones. New voltage controlled
functions were provided in the form of Voltage Controlled Filters and Amplifiers.
The subtractive synthesis technique uses filtering to change the harmonic structure
of the input which is normally of a rich harmonic content, for example, the sawtooth
waveform.
Figure 2.1: Simplified Subtractive Voice Architecture
An example of a typical single voice, dual oscillator subtractive synthesiser is shown
in figure 2.1. The keyboard generates two control signals; the gate which is an on/off
signal indicating the duration of the note and a signal proportional to the frequency
of the note value. The gate signal drives the DCF and/or DCA via an envelope
generator. The two oscillators (DCO) can either be controlled separately or chained
together, thus providing a richer evolving sound. These oscillators in analogue
synthesisers produce a range of waveforms, most notably, Sinusoidal, Triangular,
Pulse and Sawtooth. In the digital equivalent, the musician has an infinite choice of
waveform. Incidentally, modern day samplers are basically subtractive synthesisers
with oscillators having user definable waveforms.
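The signal flow of such a voice can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the architecture of figure 2.1: it uses a single sawtooth DCO, a one-pole low-pass filter standing in for the DCF, and a simple exponential follower of the gate as the envelope generator. The function name `subtractive_voice` and all parameter values are hypothetical.

```python
def subtractive_voice(freq_hz, gate, fs=48000.0, env_rate=0.001, cutoff=0.2):
    """Return one output sample per gate sample (gate values are 0 or 1)."""
    phase = 0.0   # DCO phase in the range [0, 1)
    env = 0.0     # envelope generator state
    lp = 0.0      # DCF (one-pole low-pass) state
    out = []
    for g in gate:
        saw = 2.0 * phase - 1.0               # DCO: harmonically rich sawtooth
        phase = (phase + freq_hz / fs) % 1.0
        env += env_rate * (g - env)           # envelope rises and falls with the gate
        lp += cutoff * env * (saw - lp)       # DCF: envelope opens the filter
        out.append(env * lp)                  # DCA: envelope scales the level
    return out

# A note held for 2000 samples, followed by a long release.
note = subtractive_voice(110.0, [1] * 2000 + [0] * 20000)
print(max(abs(x) for x in note))
```

Driving both the filter coefficient and the output gain from the same envelope mirrors the statement that the gate drives the DCF and/or DCA; a fuller model would give each its own envelope.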
The next three sub sections will concentrate on digital hardware to implement the
DCO, DCA and DCF modules in a subtractive synthesiser.
2.1.1 The Digital Controlled Oscillator 
There are many different ways of generating an oscillator; either filter based [31, 53]
or by wavetable lookup. This sub-section will discuss the wavetable based approach
to oscillator generation. Essentially, wavetable based oscillators are counters being
incremented by a variable representing frequency of oscillation. The output from
the counter is used as an address for a ROM containing a discrete-time version of the
analogue waveform. The counter can be thought of as a periodic ramp waveform,
hence each ramp count is transformed by looking up in a table. The contents of the
table (ROM) can be any shape the musician likes, not simply a sinusoid. The only
proviso is that the waveform in ROM is single-cycled; so the counter provides
the means of generating repetitive cycles of the waveform.
Figure 2.2: Block Diagram of a Wavetable Oscillator
There are two types of wavetable lookup oscillators [10]. One is the variable-rate
form which drives the counter, ROM and the output conversion elements. This
approach provides high-fidelity but is very expensive in terms of hardware. Due to the
variable-rate nature, the approach cannot be multiplexed and is thus unsuitable for
VLSI implementation. The other type is known as the fixed sample-rate oscillator
which is also known as the phase-accumulator oscillator. A block diagram of this
oscillator is shown in figure 2.2.
The phase-accumulator oscillator contains the phase angle increment and phase
angle registers. The phase angle register holds the instantaneous phase of the
waveform. Each clock interval, the phase angle increment is added to the contents of
the phase angle register, and when the output of the adder overflows, the adder
performs a modulo operation. This operation occurs naturally when the phase angle
exceeds $2^m - 1$, where $m$ is the number of bits in the adder. The clock signal is
generated via a crystal for stability. This technique requires interpolation when the
sample falls between two storage locations. Failure to correct for this non-integer
sample position will manifest itself as audio distortion. Figure 2.2 shows a truncating
oscillator; the $m - n$ bits can be thought of as a fractional part, used if necessary
for interpolation. In silicon, interpolating oscillators increase the chip real estate,
and hence thought is required as to whether the expense is justified. However, this
oscillator is suitable for VLSI implementation because the adder can be time-division
multiplexed, providing many oscillators. Multiple registers are required, but in VLSI
the adder consumes more space. By providing a system clock which is $k$ times the
sampling rate and logic which can operate at this speed, $k$ oscillators can be
generated at the sample rate.
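The behaviour of the truncating phase-accumulator oscillator can be illustrated with a minimal software sketch. It is not the hardware described in this thesis: the function names, the default word lengths and the use of a floating-point sine table are assumptions made purely for illustration. The sketch also assumes that the full $m$-bit accumulator range spans one cycle of the table, so that an increment $I$ gives a frequency of $I f_s / 2^m$.

```python
import math

def make_sine_table(n_bits):
    """Single-cycle sine table with 2**n_bits entries (illustrative ROM contents)."""
    size = 1 << n_bits
    return [math.sin(2 * math.pi * i / size) for i in range(size)]

def increment_for_frequency(freq_hz, sample_rate, m_bits):
    """Phase increment for a wanted frequency, assuming f = I * fs / 2**m."""
    return round(freq_hz * (1 << m_bits) / sample_rate)

def phase_accumulator_oscillator(increment, num_samples, m_bits=21, n_bits=10):
    """Truncating oscillator: the top n bits of the m-bit phase address the table."""
    table = make_sine_table(n_bits)
    mask = (1 << m_bits) - 1          # the adder wraps modulo 2**m
    phase = 0
    output = []
    for _ in range(num_samples):
        address = phase >> (m_bits - n_bits)   # discard the m - n fractional bits
        output.append(table[address])
        phase = (phase + increment) & mask     # overflow gives the modulo operation
    return output
```

For example, with `m_bits=8` and `n_bits=8` an increment of 64 steps through a quarter of the table per sample, so the output repeats every four samples. Time-division multiplexing would correspond to running this loop body once per oscillator within each sample period, with one phase register per oscillator.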
The frequency of the waveform, $f$, given the phase angle increment, $I$, is
$$f = \frac{I f_s}{L}$$
where $f_s$ is the sampling rate and $L$ is the table length. What is now required is the
number of bits in the adder to provide sufficient resolution for musical applications.
Using the method presented in Snell's paper [127], the size of the adder, $b$, is given
by equation 2.1
$$b = 1 + 3.32 \log_{10}\left(\frac{F_{max}}{F_{min}} + 1\right) \qquad (2.1)$$
where $F_{max}$ is the maximum desirable frequency of any sinusoidal component and
$F_{min}$ is the minimum step or change in frequency. From the work of many
researchers, the accepted values for these parameters are 20 kHz and 0.03 Hz respectively.
Substituting into equation 2.1 gives an adder word length of 21 bits. This is the
minimum resolution necessary for musical purposes. If $F_{max}$ is a power of two,
then the true frequency of the oscillator can be read directly from the phase angle
increment without using any extra hardware/software, providing a directly
human-readable value.
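The 21 bit figure can be checked numerically. The short sketch below evaluates the word-length expression of equation 2.1, $b = 1 + 3.32\log_{10}(F_{max}/F_{min} + 1)$, using a base-2 logarithm (3.32 is approximately $1/\log_{10} 2$) and rounding up to a whole number of bits. The function names are illustrative only, and the second function simply inverts the same expression to give the frequency step obtained with a given word length.

```python
import math

def adder_word_length(f_max, f_min):
    """Smallest whole number of bits satisfying b >= 1 + log2(f_max / f_min + 1)."""
    return math.ceil(1 + math.log2(f_max / f_min + 1))

def frequency_step(f_max, bits):
    """Frequency step obtained by inverting the word-length expression."""
    return f_max / (2 ** (bits - 1) - 1)

bits = adder_word_length(20000.0, 0.03)   # values quoted in the text
step = frequency_step(20000.0, bits)      # actual step with that word length
```

With the quoted values the expression evaluates to roughly 20.3, so 21 bits are needed, and the resulting step is slightly finer than the 0.03 Hz target.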
An expression which relates $b$ to the minimum interval the oscillator can be tuned to
requires knowledge of the non-linear, equally tempered musical scale. The minimum
interval is normally expressed in cents, where a cent is 1/1200 of an octave and one
octave is a doubling or halving of frequency. Rearranging equation 2.1 gives the
minimum frequency step as $F_{min} = F_{max}/(2^{b-1} - 1)$, so the minimum interval
(or tuning accuracy) about a particular note frequency $F$ is
$$C_{min} = 3986.31 \log_{10}\left(\frac{F_{max}}{F(2^{b-1} - 1)} + 1\right) \qquad (2.2)$$
Equation 2.2 shows that $C_{min}$ varies non-linearly with $F$, $F_{max}$ and $b$.
Therefore, to achieve adequate resolution across the entire audio spectrum, the designer
must choose a value for $b$ which provides better than 1 cent tuning resolution at the
lowest frequency the oscillator will run at.
As mentioned earlier, this type of wavetable lookup generates distortion if the sample
position in the phase accumulator is not an integer, i.e. it has a non-zero fractional
part, so that the correct sample lies between two memory locations. What is required is
a mathematical expression that predicts the noise as a function of the size of the table.
An empirical result was found by Moore [94]: the signal-to-noise ratio is
$6(k - 2)$ dB for a truncating oscillator and $6(k - 1)$ dB for a rounding
(to nearest integer) oscillator, with $k$ and $k + 1$ respectively being the bits of
precision in the oscillator (not including the sign bit). It is also assumed that the
table has $2^k$ entries. So, if interpolated wavetables are not wanted, the designer
must choose a larger wavetable to reduce the noise generated by the lookup process.
Moore also found that for sinusoidal interpolation there need to be 2(k - l)
bits of precision (excluding the sign bit) in a table of $2^k$ entries to provide a SNR
of 6/ dB. The interpolation formula for sinusoidal interpolation can be as follows:
take $I$, the integer part, and $f$, the fractional part of the phase angle respectively. Then
$$\sin(I + f) \approx \mathrm{sintable}[I] + f\,(\mathrm{sintable}[I + 1] - \mathrm{sintable}[I])$$
An alternative form, using the approximation $\sin(x) \approx x$ for small $x$ expressed in
radians, is
$$\sin(I + f) \approx \mathrm{sintable}[I] + f\,\mathrm{costable}[I]$$
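The effect of the first interpolation formula can be seen in a small numerical experiment. The sketch below compares a truncating lookup with a linearly interpolated lookup on the same sine table and records the worst-case error of each. It uses floating-point table entries, so it isolates the error due to the table length only and does not reproduce Moore's signal-to-noise figures. All names are illustrative.

```python
import math

def make_sine_table(k):
    """Single-cycle sine table with 2**k entries."""
    size = 1 << k
    return [math.sin(2 * math.pi * i / size) for i in range(size)]

def lookup_truncate(table, phase):
    """Ignore the fractional part of the phase."""
    return table[int(phase) % len(table)]

def lookup_interpolate(table, phase):
    """sintable[I] + f * (sintable[I + 1] - sintable[I])."""
    integer = int(phase)          # I, the integer part
    fraction = phase - integer    # f, the fractional part
    a = table[integer % len(table)]
    b = table[(integer + 1) % len(table)]
    return a + fraction * (b - a)

def max_error(lookup, table, steps=1000):
    """Worst absolute error over one cycle, sampled at `steps` phase values."""
    size = len(table)
    worst = 0.0
    for n in range(steps):
        phase = n * size / steps
        exact = math.sin(2 * math.pi * phase / size)
        worst = max(worst, abs(lookup(table, phase) - exact))
    return worst
```

For a 64 entry table, the truncating lookup is in error by almost one table step near the zero crossings, whereas the interpolated lookup is in error by roughly a hundredth of that, which is why interpolation allows a much shorter table for the same quality.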
So far, the analysis has not taken into account the simplifications that can be made
for symmetrical waveforms like sinusoids. Since a sine function is antisymmetric about
180° and its first half cycle is symmetric about 90°, storage space can be reduced by a
factor of 4. The savings in memory as the table length increases make this approach
very attractive.
For the 180° approach, the designer will implement half the table size and the sign
bit will control an inverter that converts the positive table value to its negative.
Strictly, one should be added to the inverted result to implement two's complement,
but the resulting error can be assumed to be negligible.
In the 90° reduced-table method, the two most significant bits are used to decide
whether to complement the output and/or scan the table backwards. The most significant
bit is the sign bit as usual. The next bit controls negation of the table address and
thus performs backward scanning. The remaining bits are the address for the ROM
table lookup. If the second most significant bit is set, the address bits to the ROM
are two's complemented. However, if the address bits are zero at the same time as this
control bit is active (1), the phase angle is exactly 90° or 270° and the complement
would give an incorrect address of zero. This special case is overcome by substituting
the maximum address instead, which is accomplished by ORing the carry bit of the two's
complementer with the address bits.
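The 90° reduced-table addressing can be sketched in software as follows. This is an illustration under stated assumptions rather than the circuit itself: the phase word is taken to be `n_bits` wide, the quarter table holds $2^{n-2}$ floating-point samples of the first quadrant, and the output is negated exactly rather than by a bitwise inversion. The function and variable names are hypothetical.

```python
import math

def make_quarter_table(n_bits):
    """First-quadrant samples of a sine whose full cycle has 2**n_bits points."""
    full = 1 << n_bits
    return [math.sin(2 * math.pi * i / full) for i in range(full >> 2)]

def quarter_wave_sine(phase, n_bits, table):
    """Look up sin(2*pi*phase / 2**n_bits) using only the quarter table."""
    q_bits = n_bits - 2
    q_mask = (1 << q_bits) - 1
    sign = (phase >> (n_bits - 1)) & 1   # most significant bit: negate the output
    mirror = (phase >> q_bits) & 1       # next bit: scan the table backwards
    address = phase & q_mask             # remaining bits: table address
    if mirror:
        carry = 1 if address == 0 else 0     # carry out of the two's complementer
        address = (-address) & q_mask        # two's complement of the address
        if carry:
            address = q_mask                 # OR the carry into every address bit
    value = table[address]
    return -value if sign else value
```

With a 10 bit phase word the table has 256 entries. Every phase value then reproduces the full-cycle sine exactly except at 90° and 270°, where the maximum table entry is substituted and the error is about 2 parts in 100000.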
2.1.2 The Digital Controlled Amplifier 
The DCA is nothing more than a multiplier. However, for the VLSI designer, the
choice becomes tricky because the larger the wordlength, the more area is used (in
parallel multiplier circuits). The observant reader might wonder whether a floating-point
multiplier would be beneficial. Unfortunately, for commercial uses, a multiplier of
this type is excessively expensive in terms of design time and floor space. It is worth
noting that floating-point multipliers actually truncate the mantissa, just like
integer-based multipliers, primarily to reduce the number of bits to wire. This can
be shown by the following analysis. Say the musician wants to multiply a
16 bit number by an 8 bit amplitude envelope; the multiplier would need to have a
16 + 8 = 24 bit output. However, if this multiplier is going to be used again, the
designer would need to provide sufficient headroom, otherwise saturation and an
incorrect answer will result.
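The word growth in the multiplier can be demonstrated with integer arithmetic. The sketch below assumes a signed 16 bit sample and an unsigned 8 bit envelope value, which is one plausible reading of the example above; the function name and the choice to truncate by discarding the eight least significant bits are illustrative assumptions.

```python
def dca_multiply(sample, gain):
    """Multiply a signed 16 bit sample by an unsigned 8 bit gain.

    Returns the full product (which fits in 24 bits) and the product
    truncated back to 16 bits by discarding the 8 least significant bits.
    """
    if not -32768 <= sample <= 32767:
        raise ValueError("sample must fit in 16 bits, signed")
    if not 0 <= gain <= 255:
        raise ValueError("gain must fit in 8 bits, unsigned")
    product = sample * gain       # needs 16 + 8 = 24 bits
    truncated = product >> 8      # arithmetic shift; rounds towards minus infinity
    return product, truncated
```

Even at full scale the product stays within the 24 bit signed range, but if the result is fed to a further multiplication without truncation the word length grows again, which is the headroom problem noted above.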
2.1.3 The Digital Controlled Filter 
The DCF provides gross modification of the harmonic structure of the input by
either boosting or attenuating user-specifiable frequencies. This module is the most
expensive to implement in silicon because filtering in the discrete-time domain
requires numerical operations and storage elements operating at the sample rate.
In analogue synthesisers, the filter is relatively simple to implement because the
arithmetic calculations are implemented directly using active and passive electronic
components [139]. In the digital domain, the designer must choose the arithmetic
format for the computations and then construct the operators accordingly.
There are many different types of digital filter, whose topologies influence
ease of implementation and accuracy. In this introductory chapter the second-order
state variable digital filter will be used as a case study.
Figure 2.3: Second Order Digital Filter
As shown in figure 2.3, the digital filter is quite complicated as it has addition,
multiplication and delay ($z^{-1}$) elements. The filter is composed of three elements:
a mixer/adder and two integrators in cascade. It has a -12 dB/octave roll-off
in each of the output modes (lowpass, highpass and bandpass). The frequency
parameter is defined as $2\sin(\pi f / f_s)$ and the Q (quality factor) or bandwidth
parameter is $bw/f$, where $bw$ is the bandwidth and $f$ is the desired cut-off frequency
for the high and lowpass modes of the filter. This filter topology has the main advantage
that the two variables, frequency and Q, are independent. Note that the Q factor in
the diagram is actually the reciprocal of the real Q factor.
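A software sketch of a second-order state variable filter is given below. The update equations are those of the widely used Chamberlin form, which is assumed here to correspond to the structure in figure 2.3; the figure itself is not reproduced, so this should be read as an illustration rather than a transcription. The frequency coefficient is $2\sin(\pi f/f_s)$ and the damping coefficient is the reciprocal of Q, as described above. The function name and argument names are hypothetical.

```python
import math

def state_variable_filter(samples, cutoff_hz, q, sample_rate):
    """Return a list of (lowpass, bandpass, highpass) outputs, one per input sample."""
    f = 2.0 * math.sin(math.pi * cutoff_hz / sample_rate)  # frequency parameter
    damping = 1.0 / q                                      # the "Q" input of the diagram
    low = 0.0    # state of the second integrator
    band = 0.0   # state of the first integrator
    outputs = []
    for x in samples:
        low = low + f * band               # second integrator
        high = x - low - damping * band    # mixer/adder
        band = band + f * high             # first integrator
        outputs.append((low, band, high))
    return outputs
```

Driving the filter with a constant input gives a quick sanity check: once the transient has died away the lowpass output equals the input, while the bandpass and highpass outputs fall to zero.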
As mentioned earlier in section 2.1.1, it is possible to implement an oscillator using
filters. To implement a decaying oscillator, the first frequency parameter is
multiplied by -1, the first adder is modified to have no input and the other inputs are
transformed into add operators. To implement a sinusoidal oscillator, the user sets
infinite Q; hence the Q parameter, as shown in the diagram, becomes zero. As a final
note, for constant ring time oscillation, set Q to zero, set zero input and replace the
feedback connection of the first integrator by a factor of $-\exp(-1/D)$, where $D$ is
the time taken for the oscillation to decay to 37% of its value¹.
In chapter 5 the complexities of digital filter implementations will be discussed in
terms of the addition, multiplication and delay VLSI primitives. There are two
main approaches to implementing these filters in ASICs: bit-serial or bit-parallel
techniques. The latter uses more floor space and is therefore more expensive to
manufacture.
2.2 Additive Synthesis 
Additive synthesis is a technique whereby complex waveforms can be decomposed
into simpler waveforms (normally sinusoidal) and regenerated by adding the
components together. Any waveform can be decomposed into a sum of simpler
waveforms provided that the components belong to an orthogonal basis. In the simplest
case the components are $\sin(x)$ and $\cos(x)$, where $\sin(x)$ is 90° out of phase
with $\cos(x)$. Mathematically, a set of functions $\{f_n(x)\}_{n=1}^{\infty}$ is said to be
an orthogonal system with respect to the non-negative weight function $w(x)$ on the
interval $[a, b]$ if
$$\int_a^b w(x)\, f_m(x)\, f_n(x)\, dx = 0$$
whenever $m \neq n$.
¹ The amplitude falls by 8.7 dB.
The technique was pioneered by the physicist Joseph Fourier during his research on
heat flow and the results were published widely, notably in "Théorie Analytique
de la Chaleur" (Analytical Theory of Heat) in 1822. The Fourier series is defined
as follows:
Let $f$ be a piecewise continuous function on the interval $[-T, T]$. Then the Fourier
series of $f$ is the trigonometric series
$$f(x) \sim \frac{a_0}{2} + \sum_{n=1}^{\infty}\left\{a_n \cos\left(\frac{n\pi x}{T}\right) + b_n \sin\left(\frac{n\pi x}{T}\right)\right\} \qquad (2.3)$$
The coefficients $a_n$ and $b_n$ are given by the following formulas
$$a_n = \frac{1}{T}\int_{-T}^{T} f(x) \cos\left(\frac{n\pi x}{T}\right) dx \qquad n = 0, 1, 2, \ldots \qquad (2.4)$$
$$b_n = \frac{1}{T}\int_{-T}^{T} f(x) \sin\left(\frac{n\pi x}{T}\right) dx \qquad n = 1, 2, 3, \ldots \qquad (2.5)$$
In equation 2.3, the continuous-time form has been used. In the digital domain, the
Fourier series becomes periodic due to the sampling process and integrals become
summations. Also, calculation of the coefficients in equations 2.4 and 2.5 is
simplified by using complex arithmetic. All the discrete Fourier transform (DFT) does
is correlate the data with sinusoids to see how much of a particular harmonic is
contained within the signal. Once the harmonics are found, resynthesis of the signal
can be performed using equation 2.3.
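The correlation view of the analysis, and the resynthesis that follows it, can be made concrete with a short sketch. It works on one period of a sampled signal, replaces the integrals of the coefficient formulas by sums over the samples, and rebuilds the signal from the resulting cosine and sine coefficients. The function names are illustrative, and a direct summation is used instead of an FFT so that the correlation is explicit.

```python
import math

def analyse(signal, num_harmonics):
    """Return [(a_0, b_0), (a_1, b_1), ...] by correlating with cosines and sines."""
    n_samples = len(signal)
    coefficients = []
    for n in range(num_harmonics + 1):
        a = 0.0
        b = 0.0
        for i, x in enumerate(signal):
            angle = 2 * math.pi * n * i / n_samples
            a += x * math.cos(angle)   # correlation with the n-th cosine
            b += x * math.sin(angle)   # correlation with the n-th sine
        coefficients.append((2 * a / n_samples, 2 * b / n_samples))
    return coefficients

def resynthesise(coefficients, n_samples):
    """Sum the harmonics: a_0 / 2 plus the cosine and sine terms."""
    output = []
    for i in range(n_samples):
        value = coefficients[0][0] / 2
        for n in range(1, len(coefficients)):
            a, b = coefficients[n]
            angle = 2 * math.pi * n * i / n_samples
            value += a * math.cos(angle) + b * math.sin(angle)
        output.append(value)
    return output
```

As a usage example, a 64 sample period containing a constant offset, a fundamental sine and a third-harmonic cosine is recovered to within rounding error when analysed with four harmonics and resynthesised.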
A relative of sinusoidal additive synthesis uses what are known as wavelets. A
wavelet is a function limited in both the time domain and the frequency domain. In [38],
Guillemain and Kronland-Martinet discuss the relationships of acoustical signals in terms
of wavelets. This modern area of additive synthesis has advantages in signal
reconstruction (resynthesis) due to its resemblance to multirate DSP techniques.
However, formant-based synthesis techniques have many similarities with this
approach and to a musician these are much more intuitive to use.
There are three methods of implementing sinusoidal additive synthesis: time-division
multiplexed fixed sample-rate wavetable oscillators, the inverse FFT and
multirate additive synthesis.
2.2.1 Time-Multiplexed Wavetable Oscillators 
This approach is the logical extension of a fixed sample-rate wavetable oscillator,
whereby inserting registers at key locations allows the hardware to perform multiple
oscillator lookups by utilising pipelining in the adder and ROM output stages. The
works by [10, 58, 86, 127] utilise this concept. All the oscillators run at the highest
sample rate and the maximum number of oscillators that they are capable of is
about 256. The oscillator outputs are then multiplied by an amplitude factor and summed
together.
An extension to this scheme reduces the data requirements by storing the sinusoid
as a piecewise-linear function with interpolation. To overcome certain limitations
caused by the straight-line segment approximation, Gaussian noise is injected into
the audio stream to mask the audible artefacts. The storage requirements are just 65
elements compared with 65536 in a normal lookup table oscillator without symmetry
logic. A chip implementing 127 sinusoidal oscillators using 1 µm CMOS technology in
an area of 27 mm² was created by workers at Sheffield University [47].
Another method, based on the CORDIC iteration algorithm, will be discussed in
chapter 5. This technique improves on the previous memory reduction
algorithm by almost 70%.
2.2.2 Inverse FFT-based Oscillators 
This technique utilises the Fast Fourier Transform to generate a complex waveform
from the amplitudes and phases of the individual harmonics. It is only feasible
because there are algorithms which perform the DFT very quickly by exploiting symmetry
caused by the discrete-time nature of the signals and the sample length.
The technique uses the overlap-add methodology because the FFT is a periodic
function and the signal is time-varying. The actual approach uses the short-time
FFT. The number of oscillators the musician requires impacts directly on the
computational cost and the delay before hearing the signal. If the algorithm is based
on a 1024 point radix-2 FFT, any frequencies which do not fall on integer multiples of
the bin spacing (the sampling rate divided by the data length) will require interpolation.
This system can be built purely in software [34] and will eventually be
commercialised.
2.2.3 Multirate Additive Oscillators 
This approach partitions the oscillators in a tree which depends on the maximum
oscillator frequency within a particular sub-band. The oscillators are based on the
table lookup scheme but their sample clocks do not operate at a fixed sample
rate; rather, they operate at factors of the output sampling rate. This approach
optimises the resolution and efficiency of the oscillators by exploiting the
hierarchical nature of music [104].
The oscillators in each sub-bank will have different frequency resolutions and the
frequency control word in the phase accumulator will need some conversion algorithm
for the benefit of the musician. The oscillators operating at the lowest frequency
will be up-sampled in stages until they reach the audio sampling
rate. At each sub-bank the oscillators running at that frequency are added and
up-sampled up through the tree.
The system will require clever scheduling techniques and sophisticated user
interfaces to operate [107, 108]. It has the potential to generate thousands of sinusoids
using current ASIC design methodologies.
2.3 Frequency Modulation 
FM synthesis uses standard wavetable hardware as described earlier in this thesis.
Its application to musical tone synthesis can be traced back to John Chowning in
1973 [11]. He used the technique to generate audio spectra where both carrier and
modulating frequencies are in the audio band. Thus the sidebands generated by
the process appear in the audio band and these form the spectrum. This technique
was sold to Yamaha and the first digital synthesiser, known as the DX7, was born.
In FM, the instantaneous frequency of the carrier wave is varied by the modulating
wave. The rate at which the carrier varies is the frequency of the modulating wave.
The variation of the carrier, known as the peak frequency deviation, is proportional
to the amplitude of the modulating wave. The equation describing the process,
with carrier frequency $c$, modulation frequency $m$ and peak deviation $d$, is
shown in equation 2.6
$$e = A \sin\left(2\pi c t + \frac{d}{m} \sin(2\pi m t)\right) \qquad (2.6)$$
When $I = d/m$ is zero, there is no modulation. When $I$ is greater than zero, sidebands
appear above and below the carrier and are separated by $m$ Hz. The larger $I$, the
modulation index, the more sidebands appear and the carrier amplitude reduces.
The amplitudes of the carrier and sidebands are determined by Bessel functions
of the first kind. These functions arise naturally from problems having circular
symmetry.
The total bandwidth of the signal generated by FM is approximately $2(d + m)$ Hz.
The sidebands can be derived from the expansion of equation 2.6, where $J_n(I)$ is
the $n$th order Bessel function.
$$\begin{aligned}
e = A\{ & J_0(I) \sin(2\pi c t) \\
 & + J_1(I)[\sin(2\pi t(c + m)) - \sin(2\pi t(c - m))] \\
 & + J_2(I)[\sin(2\pi t(c + 2m)) + \sin(2\pi t(c - 2m))] \\
 & + J_3(I)[\sin(2\pi t(c + 3m)) - \sin(2\pi t(c - 3m))] \\
 & + \cdots \}
\end{aligned}$$
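The equivalence between the FM equation and its sideband expansion can be checked numerically. The sketch below evaluates the FM signal directly and also as a sum of sidebands weighted by Bessel functions. Since the Python standard library has no Bessel function, $J_n$ is computed here from its integral representation $J_n(x) = \frac{1}{\pi}\int_0^{\pi}\cos(n\tau - x\sin\tau)\,d\tau$ using the midpoint rule; this choice, the truncation of the series at 20 sideband pairs and all function names are assumptions made for illustration.

```python
import math

def bessel_j(n, x, steps=2000):
    """Bessel function of the first kind, integer order n, by midpoint integration."""
    h = math.pi / steps
    total = 0.0
    for k in range(steps):
        tau = (k + 0.5) * h
        total += math.cos(n * tau - x * math.sin(tau))
    return total * h / math.pi

def fm_sample(t, amplitude, c, m, d):
    """Direct evaluation of the FM equation with modulation index I = d / m."""
    index = d / m
    return amplitude * math.sin(2 * math.pi * c * t + index * math.sin(2 * math.pi * m * t))

def fm_from_sidebands(t, amplitude, c, m, d, max_order=20):
    """The same signal built from the carrier and sideband pairs."""
    index = d / m
    total = bessel_j(0, index) * math.sin(2 * math.pi * c * t)
    for n in range(1, max_order + 1):
        upper = math.sin(2 * math.pi * t * (c + n * m))
        lower = math.sin(2 * math.pi * t * (c - n * m))
        sign = 1.0 if n % 2 == 0 else -1.0   # odd-order lower sidebands are inverted
        total += bessel_j(n, index) * (upper + sign * lower)
    return amplitude * total
```

For a modest modulation index the two evaluations agree closely at any time instant, which also shows how quickly the higher-order sidebands become negligible.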
It can be seen that by varying a few parameters, a rich time-varying spectrum can
be produced. Chowning was able to make quite realistic analogues of trumpets,
violins and singing voices using this non-intuitive method.
FM synthesis can use non-sinusoidal carrier and/or modulation waves to produce a
richer spectrum. The only caveat is that the bandwidth must not exceed the audio
bandwidth, otherwise aliasing will result.
2.4 Data Reduction for Synthesis 
In this section, I will briefly discuss techniques used to reduce the number of
parameters needed to describe timbres. This is of paramount importance in all synthesis
engines because of finite computational resources, so it is important to know how much
data is necessary to produce high-quality audio synthesis of sound.
This aspect of sound synthesis is really beyond the scope of this thesis, as it is
concerned with the analysis of sampled sound into its harmonic structure. These
results are subjected to further analysis to reduce the parameters necessary so that
on resynthesis a close facsimile of the original can be generated.
There are various approaches to this transformation and all are dependent on
clustering of data into regions of similarity and the condensing of these areas into a
reduced parameter space. Two important methods are known as Principal
Component Analysis and Group Additive Synthesis. Principal Component Analysis is
a statistical technique used to recast a correlated matrix of data into a set of
orthonormal basis vectors subject to the minimisation of the sum of the mean-square
differences between the original and reconstructed data sets. In [118], Sandell and
Martens described this approach for additive synthesis of sound and reported that
nearly identical resynthesis of the original tones for cello, trombone and clarinet can
be generated with a 40-70% data reduction. In Group Additive Synthesis [101],
a more complex analysis is involved which is NP-complete and thus intractable.
However, the data can be reduced by using methods derived from Computer
Science research for amplitude envelopes. This scheme is still under active
exploration and ongoing research.
In [46], different optimisation methods were applied to piecewise-linear envelopes in
order to reduce the data needed to describe timbres. Horner and Beauchamp used
gradient search and genetic algorithm techniques and found that genetic algorithms
were the best but were computationally expensive for real-time analysis/synthesis
systems. The use of shared breakpoint times for harmonic amplitude envelopes
enabled additive synthesis to be mapped onto wavetable interpolation schemes.
However, the authors do not mention what synthesis engine should be used. In addition
to this, work has been undertaken to apply a reduced data set to various synthesis
models [45] for trumpet, tenor voice and Chinese pipa tones, but these require
a great deal of human intuition to work efficiently. In my view this is because the
data set was designed for one particular synthesis technique; therefore a model of
the parameter relationships between these two methods would be required.
In this section, I have shown that some analysis would be required to provide
synthesis techniques in order to model and use instrument timbres which already exist
in the physical world. However, for new timbres and effects, the assumptions which
can be applied to data reduction methods no longer apply. In this regime unusual
audio textures can be produced by using reduced parameter spaces.
2.5 Waveguide Modelling 
This method tries to model the physical world by solving partial differential
equations using digital techniques. Thus digital waveguides are used. These are the
discrete-time equivalents of transmission lines, having two 'arms' providing the
forward and backward propagating wave solutions in a pipe. Oscillators [124] can
also be made in this way.
The primary aim of this technique rests on the concept that by modelling the real world
the articulation of the digital simulations will provide realistic timbres and smooth
parameter updates. However, this is normally offset by the man-machine user
interface being different to the instrument being modelled, and thus an interface model
must be designed.
Currently, most waveguide-based synthesis systems are monophonic because of the
enormous memory and arithmetic operations required to perform the synthesis. A
good introduction to this subject can be found in [126].
2.6 Speech Modelling 
The most interesting of all the synthesis methods are those which are based
on human speech emulation. The human voice is acknowledged as being the most
versatile of all sound-producing instruments and is capable of producing drum and
wind instrument imitations, as well as the effects achieved by ventriloquists and
impersonators. The quest for speech synthesis has been man's goal for many centuries.
Mankind's fascination with talking machines was first encountered in Ancient Greek
mythology and more recently in Cervantes' "Don Quixote". The following quote sums up
man's interest in speech synthesis:
"The invention of a talking machine, and its operation in accordance
with a well-considered plan, would be one of the boldest schemes to
occur to the human intellect"
Wolfgang Von Kempelen (1971)
In this section time-domain (LPC) and spectrum (formant synthesis including
FOF) synthesis methods will be studied. Incidentally, waveguide modelling has been
used in speech production [76], and is normally known as articulatory synthesis.
Instead of modelling the physical processes which produce speech, these approaches
model the signal itself. This overcomes the limitations of modelling a system which
is often non-linear and complex, where the simplifications and assumptions that must
be made result in poor emulation of the signal. However, these methods are abstract
in construction, having no physically meaningful parameters and requiring complex
rules to account for every perceptible nuance of the signal.
Nevertheless, these engines are easy to construct and are found in many appliances
including GSM phones, speech synthesisers and audio coding for broadcast. It is
quite common to find that analysis of an instrument's mechanics in terms of the audio
signal is easier to investigate than its physical behaviour.
2.6.1 Time-Domain Modelling 
This approach models the sampled nature of the signal rather than its frequency
components. In the literature there is some confusion as to what time-domain
modelling refers to. I am assuming that this technique models the amplitude fluctuations
as a function of time.
The aim of this technique is to model the signal as a linear all-pole IIR filter having
the transfer function shown in equation 2.7.
$$H(z) = \frac{G}{1 + \sum_{k=1}^{p} a_p(k)\, z^{-k}} \qquad (2.7)$$
Where $p$ is the number of poles in the filter, $G$ is the filter gain and $\{a_p(k)\}$ are the
parameters that determine the poles. Taking a short segment of the audio signal of
about 20 ms and using linear prediction, the coefficients $a_p(k)$ can be determined.
We model equation 2.7 by its inverse, which is an FIR filter. The analysis process
works out the error between the FIR filter output and the signal and minimises the sum
of the squared errors; the part of the signal which cannot be modelled by the filter is
stored as the residual.
To reconstruct the signal, the parameters are inserted into an all-pole IIR filter and
the residual signal is idealised by a noise source. Vowel sounds can be modelled by
driving the filter with a periodic impulse train with a period equal to the pitch period.
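The reconstruction step can be sketched as an all-pole recursion driven by an impulse train. The sketch assumes the sign convention in which the filter denominator is $1 + \sum_k a(k) z^{-k}$, so that each output sample is the scaled excitation minus a weighted sum of previous outputs; if the opposite convention is used the coefficient signs are simply reversed. The function names are illustrative and the coefficients in the usage note are arbitrary rather than the result of any analysis.

```python
def impulse_train(num_samples, period):
    """Unit impulses every `period` samples, zeros elsewhere."""
    return [1.0 if n % period == 0 else 0.0 for n in range(num_samples)]

def all_pole_filter(excitation, coefficients, gain=1.0):
    """y[n] = gain * x[n] - sum over k of a[k] * y[n - k], for k = 1..p."""
    output = []
    for n, x in enumerate(excitation):
        acc = gain * x
        for k, a in enumerate(coefficients, start=1):
            if n - k >= 0:
                acc -= a * output[n - k]   # feedback from earlier outputs
        output.append(acc)
    return output
```

For example, `all_pole_filter(impulse_train(200, 50), [-0.5])` produces a decaying response restarted every 50 samples; replacing the impulse train with a noise sequence would correspond to the unvoiced case.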
Figure 2.4: LPC Synthesiser
A schematic diagram of an LPC synthesiser is shown in figure 2.4. By using
lattice all-pole filters, the coefficients can be quantised to a few bits without loss of
precision. Consequently, cheap and commercially viable systems can be made, e.g. the
Texas Instruments Speak and Spell. However, for music, the user requires greater
precision. This difference can be traced back to speech coding for transmission over
low-bandwidth lines.
2.6.2 Spectral Modelling
In the previous sub-section the signal was approximated by modelling the
time-varying signal in the time domain. Here, the signal is modelled via its frequency
components. Most physical instruments, including the human voice, are composed
of a cavity in which standing waves are set up. The air in these cavities thus
resonates. In the speech and music fields these resonances are collectively known
as formants. This technique finds some known filter which can resonate and uses a
parallel or cascade connection of such filters to generate a harmonically rich spectrum.
The VOSIM, parallel/cascade formant and FOF synthesisers will be discussed in
the next sub-sections.
2.6.2.1 VOSIM
The VOSIM technique was designed to create a simulation of vocal sounds and
hence was called VOcal SIMulation. It was created and developed by Werner Kaegi
in 1973.
VOSIM uses a simple waveform controlled by three parameters and periodically
repeated at the desired fundamental frequency. It consists of a series of $\sin^2(x)$
pulses having a pulse width of $T$, followed by a delay $M$. There is a decay parameter,
$b$, which is the factor by which each pulse height is reduced with respect to the
preceding pulse. The first pulse has unit amplitude. The last parameter, $N$, is the
number of pulses in each period. It can be clearly seen from figure 2.5 that the
period can be varied by altering $M$. A rigorous analysis of the waveform in both
time and frequency domains can be found in [135].
Figure 2.5: Typical VOSIM waveform (amplitude against time; period = NT + M)
To generate formant-like regions in the frequency domain, the musician will require
two or more VOSIM oscillators. Each oscillator models a formant. The outputs of
all the oscillators are summed to produce the desired effect. One major drawback
of the technique is that it is only useful in producing tones with variable pitch but
stationary spectra.
The beauty of this technique is manifest in the hardware. Each VOSIM oscillator
can be computed using a slightly modified wavetable oscillator. The first pulse is
generated by setting the amplitude factor to unity and scanning the table pointer
forwards, thus reading the pulse at a rate dependent on the pulse-width parameter.
Once the pulse has been read, the amplitude factor is multiplied by the decay
parameter and the table pointer is reset and allowed to scan through the table as
before. Once all the pulses have been generated, the wavetable oscillator will output
zeros for M seconds.
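One period of the VOSIM waveform can be generated directly from this description. The sketch below computes the $\sin^2$ pulse on the fly rather than reading it from a table, and rounds the pulse width and delay to whole numbers of samples; both choices, together with the function and parameter names, are assumptions made for illustration.

```python
import math

def vosim_period(num_pulses, pulse_width, delay, decay, sample_rate):
    """One period: N sin^2 pulses of width T seconds, each scaled by `decay`
    relative to the previous one, followed by `delay` seconds of zeros."""
    samples_per_pulse = round(pulse_width * sample_rate)
    delay_samples = round(delay * sample_rate)
    output = []
    amplitude = 1.0                      # the first pulse has unit amplitude
    for _ in range(num_pulses):
        for i in range(samples_per_pulse):
            output.append(amplitude * math.sin(math.pi * i / samples_per_pulse) ** 2)
        amplitude *= decay               # reduce the height of the next pulse
    output.extend([0.0] * delay_samples) # silent gap of M seconds
    return output
```

With three pulses of 1 ms, a 2 ms delay and a 48 kHz sample rate, the period is 240 samples, which corresponds to a fundamental of 200 Hz, in agreement with a period of NT + M = 5 ms.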
2.6.2.2 Parallel/Cascade Formant Synthesis
The formant synthesiser uses three to five second-order low pass filters, each having
variable amplitude, bandwidth and frequency controls. These filters are connected
either in serial (cascade) form or in parallel form. They are driven, like LPC, with an
idealised pulse train and/or noise source. The former configuration is suitable for
non-nasal voiced sounds and the latter is superior for nasals, fricatives and stops.
The cascade form models the vocal tract without nasal coupling and thus is a closer
model of the human voice. The parallel form is an abstract model.
In the serial form, if a small number of resonant filters is to be used, then a correcting
function is required. Ideally, this depends on the number of formants and their
frequencies, but it can be approximated by a fixed network [76]. The advantage of the
serial approach is that the formant amplitudes are predictable from knowledge
of the formant frequencies and bandwidths.
However, the parallel form has greater flexibility and control as all three parameters
are directly controllable. This creates two problems. Firstly, the user has yet
another parameter to control; in a musical context this might be desirable to aid
novel, non-physically based timbres. The second problem is potentially more serious.
The phase cancellation at frequencies between two resonances may cause zeros in the
frequency response when the parallel formant filters are summed together. This can
be explained by considering the sum of two second-order bandpass filter functions
in the continuous domain².
$$\frac{s}{s^2 + \beta_1 s + \omega_1^2} + \frac{s}{s^2 + \beta_2 s + \omega_2^2} = \frac{s\left(2s^2 + s(\beta_1 + \beta_2) + \omega_1^2 + \omega_2^2\right)}{(s^2 + \beta_1 s + \omega_1^2)(s^2 + \beta_2 s + \omega_2^2)} \qquad (2.8)$$
² The equation in the discrete-time form is similar, except that the parameters are
trigonometric and exponential functions.
It can be seen that the numerator term of equation 2.8 will have complex conjugate
roots between the two pole frequencies $\omega_1$ and $\omega_2$. This will produce a deep minimum,
which can be removed by alternating the signs of successive formant filters.
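The cancellation, and its removal by alternating the signs, can be checked by evaluating the two bandpass responses at the frequency of the numerator roots, $\sqrt{(\omega_1^2 + \omega_2^2)/2}$. The sketch below does this with complex arithmetic. The formant frequencies and bandwidths in the usage note are arbitrary illustrative values and the function names are hypothetical.

```python
import math

def bandpass(s, beta, omega):
    """Second-order bandpass response s / (s^2 + beta*s + omega^2)."""
    return s / (s * s + beta * s + omega * omega)

def notch_comparison(f1_hz, f2_hz, bandwidth_hz):
    """Magnitudes of the summed and the sign-alternated responses, evaluated
    at the frequency of the numerator roots of the summed response."""
    w1 = 2 * math.pi * f1_hz
    w2 = 2 * math.pi * f2_hz
    beta = 2 * math.pi * bandwidth_hz
    w_notch = math.sqrt((w1 * w1 + w2 * w2) / 2)
    s = complex(0.0, w_notch)
    same_sign = abs(bandpass(s, beta, w1) + bandpass(s, beta, w2))
    alternating = abs(bandpass(s, beta, w1) - bandpass(s, beta, w2))
    return same_sign, alternating
```

For formants at 500 Hz and 1500 Hz with 50 Hz bandwidths, the same-sign sum is roughly 25 dB below the sign-alternated sum at that frequency, which illustrates the depth of the minimum between the two resonances.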
A general parallel/series hybrid formant synthesiser [69] is shown in figure 2.6. F0
is the pitch and AV, AVS, AH, AF, AB and A1 to A6 are amplitude factors. RGS,
RNP and R1 to R6 are resonators and RGZ and RNZ are anti-resonators having
variable bandwidth and frequency.
The synthesiser as shown in figure 2.6 can be simplified to a standard parallel
structure and the formant filters can be factorised into common terms; this would
reduce the arithmetic and storage costs.
2.6.2.3 Forme d'Onde Formantique
FOF was developed by Xavier Rodet in 1978 and a control program known as
CHANT allowed musicians to generate excellent renditions of singing voices, providing
the musicians composed at IRCAM in Paris, France. Later in the development
of the system, cymbals, drums and string-based instruments were emulated.
The Forme d'Onde Formantique (FOF) algorithm is a source-filter synthesis model
capable of emulating the singing voice [115]. The source is simply an impulse train
and the filter is a modified decaying sinusoid (a formant). The technique is a
special case of a parallel formant synthesiser because the amplitude of the signal is
calculated in the time domain using exponential functions and sinusoidal wavetables.
This method has been shown to require less numerical precision than an equivalent
filter and is also computationally efficient [116] in integer arithmetic. Floating-point
arithmetic was chosen for flexibility. There are two versions of FOF, the first being
wavetable based as in CHANT and CSOUND and the other being second-order
filter based as in the Samson's Box version belonging to CCRMA, Stanford. The
standard CHANT configuration for FOF is a parallel FOF bank connected to a
common excitation signal as shown in figure 2.7.
Figure 2.7: FOF synthesiser topology. (After G. Bennett & X. Rodet [6])
The FOF algorithm tries to model the formant structure in the time domain with
an analytical expression. It is based on half a Hanning window for the attack portion
of the waveform, as shown in equation 2.9.
$$s(t)=\begin{cases}0, & t<0\\ \tfrac{1}{2}\left(1-\cos\beta t\right)e^{-\alpha t}\sin(\omega_0 t+\phi), & 0\le t\le \pi/\beta\\ e^{-\alpha t}\sin(\omega_0 t+\phi), & t>\pi/\beta\end{cases}\qquad (2.9)$$
$\alpha$ is related to the bandwidth of the filter and $\omega_0$ is the angular formant frequency.
By varying $\beta$, the attack portion of the waveform can be reduced or expanded, with
a corresponding effect in the frequency domain. This is shown diagrammatically in
figure 2.8.






Figure 2.8: Properties of the FOF wavefunction: (a) time-domain representation (amplitude against time); (b) power spectrum representation. (After G. Bennett & X. Rodet [6])
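As an illustration of the time-domain formulation, the sketch below generates a single FOF grain from a half-Hanning attack followed by a decaying sinusoid. It is a floating-point illustration only, not the CHANT or CSOUND implementation; the function names and the parameter values (600 Hz formant, 3 ms attack, decay rate of 250 per second) are assumptions chosen for the example.

```python
import math

def fof_sample(t, alpha, beta, omega0, phi=0.0):
    """One sample of a FOF wavefunction at time t (seconds)."""
    if t < 0.0:
        return 0.0
    tone = math.exp(-alpha * t) * math.sin(omega0 * t + phi)
    if t <= math.pi / beta:
        # Attack: half a Hanning window rising from 0 to 1 over pi/beta seconds.
        return 0.5 * (1.0 - math.cos(beta * t)) * tone
    return tone

def fof_grain(duration, sample_rate, alpha, beta, omega0, phi=0.0):
    """List of samples for one grain of the given duration."""
    count = round(duration * sample_rate)
    return [fof_sample(k / sample_rate, alpha, beta, omega0, phi)
            for k in range(count)]

# Assumed illustrative settings (hypothetical values).
ATTACK = 0.003
grain = fof_grain(0.02, 44100, alpha=250.0, beta=math.pi / ATTACK,
                  omega0=2.0 * math.pi * 600.0)
print(len(grain), max(grain))
```

A parallel FOF bank would sum several such grains, one per formant, each retriggered at the pitch period of the excitation.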
Currently, this particular synthesis technique has escaped hardware implementation
due to its immense computational cost.
2.7 Summary 
In this chapter, various synthesis techniques have been described and hardware
implementations were discussed where appropriate. In this thesis a spectral based
parametric additive synthesis algorithm will be investigated. This algorithm is
essentially a formant based technique belonging to the FOF family. The fundamental
reasoning behind using this technique is the reduced number of parameters needed to describe
the spectrum of the instrument. In additive synthesis, each sinusoid is placed in the audio
field with its own amplitude and frequency. In FM, though the number of parameters required is
small, the parameters behave in a non-linear, counter-intuitive manner. This makes
it difficult for musicians weaned on analogue subtractive synthesisers to create useful
timbres. The formant based systems have intuitive parameters, used in recording
studios globally. The parameters are more numerous than in FM and, on a per-oscillator basis, more
numerous than in additive synthesis, but each formant oscillator is really a spread of frequency
components. With amplitude, formant frequency and bandwidth parameters, the
musician can place the oscillator in the audio field and build up complex evolving
timbres.
Another reason for choosing FOF-like formant synthesisers for VLSI implementation is
the reduced parameter updates required from a host machine [6]. Internal
bandwidth inside ASIC chips is not normally a problem because the wiring is fixed
during development. Interfacing a chip to the outside world, however, is a major headache,
because the number of pins on a chip determines the cost of the device and each
pin carries, at most, a two-state signal and consumes a finite area.
CHAPTER 3
Number Systems for VLSI 
Processing Elements 
In this chapter, different number systems will be investigated and a review of arithmetic
units for VLSI will be presented. The majority of the work presented here
will be based on integer arithmetic, because floating-point arithmetic can be fabricated
from integer arithmetic parts. It is uncommon for ASICs to be designed
with floating-point capability because floating-point arithmetic elements use huge
amounts of floor space, thus increasing the chip's size and cost. Integer arithmetic
is preferred due to its lower cost and higher speed. However, most off-the-shelf DSP
and microprocessor chips have floating-point units because the designers know these
devices are to be used in general, non-optimised applications. They also provide a
simple mechanism to upgrade functionality via reprogramming of software.
3.1 Introduction to Number Systems 
Humans use the signed decimal number system to perform arithmetic operations.
Most computers use a binary number system to perform useful arithmetic computation.
But can computers use a different number system which will result in faster
and more efficient computation? It seems possible that they can, as humans' use of
the decimal system arose from their use of fingers and thumbs, and digital computers'
use of binary logic arose from the on-off switch. Arithmetic computation can be performed
in any number system the chip designer desires. Some number systems, like
residue arithmetic, minimise the number of carries in addition and multiplication
operations and consequently these operations can be performed quickly [75, 90].
There are six important number systems known; they are the conventional radix,
the signed-digit, the residue, the rational, the logarithmic and the floating point. In
this chapter, the first three will be discussed in greater detail. A summary of these
number systems follows:
Conventional Radix System: A fixed radix arithmetic system with a radix $r \ge 2$
and a digit set of
$$\{0, 1, \ldots, r-1\}$$
All the digits are positively weighted and each number is uniquely represented;
e.g. decimal, which is base-10 and has a digit set of $\{0, 1, \ldots, 9\}$.
Signed-Digit System: In this system, both positive and negative weighted digits
are allowed for each digit, and the digit set is
$$\{-a, \ldots, -1, 0, 1, \ldots, a\}$$
where $a$ is a bounded positive integer, and will be discussed in section 3.3.
This is a redundant number system in that the signed-digit representation of
a number may not be unique.
Residue System: This system has no weighting factor assigned to each digit of
a residue number. The order of the digits is immaterial in determining the
value of the number. It is possible for mixed radices to be assigned to different
digits. In a residue number system $X$ is an integer represented by an n-tuple
equal to
$$X = \{r_1, r_2, \ldots, r_n\}_m$$
with respect to another n-tuple being
$$m = \{m_1, m_2, \ldots, m_n\}$$
Each $r_i$ is called the residue of $X$ modulo $m_i$, where all the $n$ moduli
$\{m_i \mid i = 1, 2, \ldots, n\}$ are pairwise relatively prime. All the $n$ residue digits $r_i$ for
$i = 1, 2, \ldots, n$ can be independently processed. The system is therefore
carry-free when performing addition and multiplication.
This system was invented to combat arithmetic errors in the design of reliable
computing systems. It can consequently be made fault tolerant, since
operations can be performed in parallel with no carries.
Rational System: This system represents numeric quantities as fractions in terms
of numerator and denominator integer pairs. The system always yields rational
numbers and, under the arithmetic operations $(+, -, \times, \div)$, may be closed
without resorting to infinite precision, e.g. $\tfrac{1}{3} = 0.3333\cdots$. The problem with
this system is that the numerators and denominators can become large at
the beginning of a moderately sized computation, and it is still a theoretical
concept.
Logarithmic System: This system employs a real number $\mu > 1$ as the base.
The set of real numbers is defined by the following logarithmic space $L_\mu$.
$$L_\mu = \{x \mid |x| = \mu^i;\ i \text{ an integer}\} \cup \{0\}$$
This allows geometric rounding rather than arithmetic rounding, which enhances
the number accuracy. The system is built using integer arithmetic
units and log-antilog functions.
Floating Point System: In this system a real number is expressed as two numbers
$$f = (m, e) = m \times r^e$$
where $m$ and $e$ are each a signed, fixed-point number and $r$ is a given radix.
The label $m$ is known as the mantissa of the number $f$ and the label $e$ is
known as the exponent of the number $f$.
The mantissa is normally fractional in one of the conventional number systems
and the exponent is a biased or unbiased r's complement integer. The radix $r$
is self-implied and consequently is not stored. The radix point of the mantissa
can float around by adjusting the exponent, but in the normalised number
representation, the point lies near the most significant mantissa digit.
Before the discussions on conventional, signed-digit and residue number formats, it
should be noted that the choice of a number system for arithmetic operations is
governed by three points.
• Efficiency of representation
• Facility for arithmetic design
• Reliability of operation
A designer could decide to use binary computers and implement decimal arithmetic
using Binary-Coded-Decimal number representation.
3.2 Conventional Radix Number System 
As mentioned in section 3.1, a radix-r number, $X$, can be represented in a digital
computer by a digital vector of $(n+k)$-tuples
$$X = (x_{n-1}, \ldots, x_1, x_0, x_{-1}, \ldots, x_{-k})_r$$
where each component $x_i$ for $-k \le i \le n-1$ is called the i'th digit of the vector $X$.
The first $n$ digits $(x_{n-1}, \ldots, x_1, x_0)$ form the integer portion of the number $X$ and
the remaining $k$ negatively indexed digits $(x_{-1}, x_{-2}, \ldots, x_{-k})$ form the fractional
portion of the number $X$. A radix point is used to divide these two portions and
is not stored in the computer, but is implied to exist at a certain point.
Mixed radix numbers are those assuming different radix values in different digit
positions. An example would be the time format hours, minutes, seconds, having
the mixed radices of (24, 60, 60).
3.2.1 Weighted Number Systems
There are various number systems within our familiar radix number system. These
subclasses are based on the weighted radix number format. In this number format,
each digital vector $X$ is associated with a unique value denoted by
$$X_v = \sum_{i=-k}^{n-1} x_i \cdot w_i$$
where each $w_i$ is called a weighting factor for the i'th digit. The $n + k$ weighting
factors form a weighting vector denoted by
$$W = (w_{n-1}, \ldots, w_0, w_{-1}, \ldots, w_{-k})$$
The value of the number $X$ can be obtained by $X \cdot W$, the dot product of the
two vectors $X$ and $W$. The illustrative example below will be beneficial. The weight
vector
$$W = (r^{n-1}, \ldots, r^0, r^{-1}, \ldots, r^{-k})$$
will lead to the conventional positional representation of a radix-r number $X$, with
a value
$$X_v = X \cdot W = \sum_{i=-k}^{n-1} x_i\, r^i$$
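The dot-product view of a weighted number can be sketched in a few lines. The helper names below are assumptions made for this illustration; exact fractions are used so that the negatively indexed weights carry no rounding error.

```python
from fractions import Fraction

def weighted_value(digits, weights):
    """Dot product X . W of a digit vector and a weighting vector."""
    if len(digits) != len(weights):
        raise ValueError("digit and weight vectors must have equal length")
    return sum(d * w for d, w in zip(digits, weights))

def positional_weights(radix, n, k):
    """Weights (r^(n-1), ..., r^0, r^-1, ..., r^-k), most significant first."""
    return [Fraction(radix) ** i for i in range(n - 1, -k - 1, -1)]

# Binary 1011.01 with n = 4 integer digits and k = 2 fractional digits.
binary_value = weighted_value([1, 0, 1, 1, 0, 1], positional_weights(2, 4, 2))

# Mixed-radix time format: 1 hour, 2 minutes, 3 seconds expressed in seconds.
time_value = weighted_value([1, 2, 3], [3600, 60, 1])
print(binary_value, time_value)
```

The same `weighted_value` function serves both the fixed-radix and the mixed-radix case; only the weighting vector changes.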





The higher the radix number, $r$, the more radix-$y$ digits are required to encode each
radix-r digit (where $y \ne r$ and $y < r$).¹ Therefore, if the ASIC designer wishes to
encode decimal digits in binary, then $r = 10$ and $k = \lceil \log_2 10 \rceil = 4$ bits.
The weighted number system can be further subdivided into three common signed
formats, known as Sign-Magnitude, Diminished-Radix Complement and Radix
Complement representations.
3.2.2 Sign-Magnitude Representation
This format denotes the digits of $A$ as $(a_{n-1} a_{n-2} \ldots a_1 a_0)_r$, where $a_{n-1}$ denotes the
sign digit and is zero for positive numbers and $(r - 1)$ for negative numbers. The
representation for negative sign-magnitude numbers is
$$\bar{A} = ((r-1)\, m_{n-2} \ldots m_1 m_0)_r$$
where $m_i$ for $n-2 \ge i \ge 0$ are the true magnitude digits and the magnitude equals
$$|A| = \sum_{i=0}^{n-2} m_i\, r^i$$
This number system has two number representations for zero and this zero vector
is called a dirty zero.
3.2.3 Diminished-Radix Complement Representation
This format represents positive numbers by
$$A = (0\, m_{n-2} \ldots m_1 m_0)_r$$
and negative numbers by
$$\bar{A} = ((r-1)\, \bar{m}_{n-2} \ldots \bar{m}_1 \bar{m}_0)_r$$
where $\bar{m}_i = (r - 1) - m_i$ for $n-2 \ge i \ge 0$. Therefore, $\bar{A} = r^n - 1 - A$.
The diminished radix complement representation is also known as the (r-1)'s
complement. This representation has a non-unique zero vector, like the
sign-magnitude representation, and is thus redundant.
¹ In general for binary computers, at least $k$ bits are required to encode a radix-r digit, where
$k = \lceil \log_2 r \rceil$ and $\lceil x \rceil$ means the least integer that is not less than the real number $x$.
3.2.4 Radix Complement Representation
This format represents positive numbers by
$$A = (0\, m_{n-2} \ldots m_1 m_0)_r$$
and negative numbers by
$$\bar{A} = (((r-1)\, \bar{m}_{n-2} \ldots \bar{m}_1 \bar{m}_0) + 1)_r$$
In this notation, $\bar{A} = r^n - A$. This representation is known as the r's complement
representation.
This representation has a unique zero value and the zero vector is known as a clean
zero. The representation is thus non-redundant.
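The three signed formats can be compared side by side with a short sketch. The function names are assumptions for illustration, digit lists are most significant digit first, and no range checking is done for the two complement formats.

```python
def to_digits(number, radix, n):
    """n conventional radix-r digits of a non-negative number, MSD first."""
    digits = []
    for _ in range(n):
        digits.append(number % radix)
        number //= radix
    return digits[::-1]

def sign_magnitude(value, radix, n):
    """Sign digit (0 or r-1) followed by the n-1 true magnitude digits."""
    if abs(value) >= radix ** (n - 1):
        raise OverflowError("magnitude does not fit in n-1 digits")
    sign = radix - 1 if value < 0 else 0
    return [sign] + to_digits(abs(value), radix, n - 1)

def diminished_radix_complement(value, radix, n):
    """(r-1)'s complement: a negative value -A is stored as r^n - 1 - A."""
    encoded = value if value >= 0 else radix ** n - 1 - abs(value)
    return to_digits(encoded, radix, n)

def radix_complement(value, radix, n):
    """r's complement: a negative value -A is stored as r^n - A."""
    encoded = value if value >= 0 else radix ** n - abs(value)
    return to_digits(encoded, radix, n)

for encode in (sign_magnitude, diminished_radix_complement, radix_complement):
    print(encode.__name__, encode(-5, 2, 4), encode(-123, 10, 4))
```

For binary numbers these reduce to the familiar sign-magnitude, one's complement and two's complement encodings.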
3.2.5 Properties of Sign-Magnitude, (r-1)'s and r's Complement
Numbers
In all these formats, all the digits are required to be filled according to the following
rules:
• Sign-Magnitude numbers are filled with leading zeros for positive and negative
numbers.
• (r-1)'s complement numbers are filled with the sign's value. This is known as
sign extension.
• r's complement numbers are filled with the sign's value and this process is
known as sign extension.
In all these formats, overflow can occur when a positive number exceeds the upper
bound and underflow can occur when a negative number exceeds the lower bound.
The beauty of the fixed-radix point number formats discussed is that every n-digit
integer can be considered as a fraction multiplied by a constant factor $r^n$; conversely,
every k-digit fraction can be considered as an integer multiplied by a constant factor
$r^{-k}$. This is very useful when the designer of an arithmetic unit wants to implement
the unit in fractional notation, without the extra complexity of floating-point
number representations.
3.3 The Signed-Digit Number System 
The Signed Digit (SD) number representations allow redundancy to exist and are
useful in designing high-speed arithmetic machines [4]. Each signed digit may need
more than one bit to represent it.² Wider data buses and an increase in data
storage are required, but the speed improvement the technique provides is
worth this cost.
3.3.1 Definition of Signed-Digit Numbers
Given a radix $r$, each digit of an SD number can assume the following $2a + 1$ values.
$$\Sigma_r = \{-a, \ldots, -1, 0, 1, \ldots, a\}$$
where the maximum digit magnitude $a$ must be within the following region.
$$\left\lceil \frac{r-1}{2} \right\rceil \le a \le r - 1 \qquad (3.1)$$
Notice that the integers are assumed to satisfy the inequalities $a \ge 1$ and
$r \ge 2$.
To yield the minimum redundancy in the balanced digit set $\Sigma_r$, one can choose
the following value for the maximum magnitude.³
$$a = \left\lfloor \frac{r}{2} \right\rfloor$$
Sometimes it may be useful to allow $a = r_e/2$, where $r_e$ is even, and $a = (r_o - 1)/2$
when $r_o$ is odd. Thus these adjacent odd and even radix values may give the same
digit set. An example of the digit set corresponding to this choice of $a$ is shown
below in two different forms; they are the same set when $r_o = r_e + 1$.
$$\Sigma_{r_o} = \left\{-\tfrac{r_o - 1}{2}, \ldots, -1, 0, 1, \ldots, \tfrac{r_o - 1}{2}\right\}$$
$$\Sigma_{r_e} = \left\{-\tfrac{r_e}{2}, \ldots, -1, 0, 1, \ldots, \tfrac{r_e}{2}\right\}$$
The SD number digit set for a radix-2 number is $\Sigma_2 = \{-1, 0, 1\}$.
The algebraic value, $Y_v$, of an SD number
$$Y = (y_{n-1} \ldots y_0 \,.\, y_{-1} \ldots y_{-k})_r \qquad (3.2)$$
can be evaluated by
$$Y_v = \sum_{i=-k}^{n-1} y_i\, r^i$$
Notice that there is no explicit sign, as $Y$ can be positive or negative; also, the
value zero has a unique representation: $Y_v = 0$ if, and only if, $y_i = 0$ for all $i$ in equation 3.2.
The negation $-Y$ of an SD number $Y$ is achieved by changing the sign of all non-zero
digits in $Y$. Since zero is unique, $-0 = 0$. The negative non-zero digits are
represented by $\bar{x}$, where $x$ is the positive non-zero digit before negation.
² The sign bit is included in each digit.
³ $\lfloor x \rfloor$ is the largest integer that is less than or equal to the real number $x$.
The SD number system is used because it eliminates the carry propagation chains
in addition and subtraction operations. In order to break the carry chain, enabling
fast addition and subtraction operations, the lower bound on $a$ should be made
tighter, as expressed in equation 3.3.
$$\left\lceil \frac{r+1}{2} \right\rceil \le a \le r - 1 \qquad (3.3)$$
It is known that division can use the less strict bound found in equation 3.1.
As mentioned previously, this number system allows non-unique representations of a
particular value. The weight of an n-digit SD vector $Y$ with value $Y_v$ is the number
of non-zero digits in the representation and is denoted by $\omega(n, Y_v)$. In general, the
weight of an n-digit SD vector is defined as
$$\omega(n, Y_v) = \sum_{i=0}^{n-1} |y_i|$$
where $|y_i| = 1$ if $y_i \ne 0$.
The SD vector with the minimal weight is called a minimal SD representation
with respect to given values of $n$ and $Y_v$. Later on in this thesis, the reader will see
the importance of the minimal set applied to multiplier design.
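To make the idea of weight concrete, the sketch below builds a minimal-weight radix-2 SD vector using the non-adjacent form. This is a well-known construction that is assumed here for illustration; it is not taken from the thesis, and the function names are hypothetical. Digits are listed least significant first.

```python
def minimal_sd_binary(value):
    """Radix-2 signed digits in non-adjacent form, least significant first."""
    digits = []
    n = value
    while n != 0:
        if n % 2:
            # Odd: pick +1 or -1 so that the remaining value is divisible by 4,
            # which forces the next digit to be zero.
            d = 2 - (n % 4)
            n -= d
        else:
            d = 0
        digits.append(d)
        n //= 2
    return digits or [0]

def weight(digits):
    """Number of non-zero digits in an SD vector."""
    return sum(1 for d in digits if d != 0)

def value_of(digits, radix=2):
    """Algebraic value of an SD vector given least significant digit first."""
    return sum(d * radix ** i for i, d in enumerate(digits))

seven = minimal_sd_binary(7)
print(seven, weight(seven), weight([1, 1, 1]))
```

The value 7 needs three non-zero digits in conventional binary (111) but only two as the SD vector 8 - 1, which is why minimal representations matter for multipliers that add one partial product per non-zero digit.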
3.3.2 Conversion Between The Conventional Radix-r Number and
its SD Form
Let $X = (x_{n-1}, \ldots, x_1, x_0)_r$ be a conventional radix-r number and $Y = (y_{n-1}, \ldots, y_1, y_0)$
be the equivalent SD number.
3.3.2.1 Conversion from Conventional to SD Systems
For every conventional digit $x_i$, the interim difference digit $d_i$ is generated by
$$d_i = x_i - r \cdot b_{i+1}$$
and the borrow digit $b_{i+1}$ is defined to be
$$b_{i+1} = \begin{cases} 0 & \text{if } x_i < a \\ 1 & \text{if } x_i \ge a \end{cases}$$
The i'th SD digit, $y_i$, is then obtained by
$$y_i = d_i + b_i$$
This conversion process has no borrow propagation and each digit is independently
generated. This means that the number conversion can work in parallel or from any
digit position.
3.3.2.2 Conversion from SD to Conventional Systems 
This reverse process is achieved by adding the two SD numbers $Y^+$ and $Y^-$ together.
The $Y^+$ number is formed from the positive (non-zero) digits of the SD number $Y$
and the $Y^-$ number is formed from the negative (non-zero) digits. The example
below should make this process clear.
$$Y = Y^+ + Y^- = (1\,\bar{4}\,5\,\bar{2})_{10}$$
$$X = +(1050)_{10} - (0402)_{10} = (0648)_{10}$$
This conversion performs subtraction in the conventional number system and consequently
is not carry-free.
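Both conversion directions can be sketched as follows. The sketch assumes a borrow is generated when a conventional digit is at least $a$, with $b_0 = 0$ and the final borrow appended as an extra most significant digit; the choice $a = 6$ for radix 10 is an assumption that happens to reproduce the worked example above. Digit lists are least significant first.

```python
def conventional_to_sd(digits, radix, a):
    """Convert conventional digits to SD digits; each position is independent."""
    borrows = [0]                           # b_0 = 0
    interim = []
    for x in digits:
        b_next = 1 if x >= a else 0         # borrow digit b_{i+1}
        interim.append(x - radix * b_next)  # interim difference d_i
        borrows.append(b_next)
    sd = [d + b for d, b in zip(interim, borrows)]  # y_i = d_i + b_i
    sd.append(borrows[-1])                  # final borrow becomes the top digit
    return sd

def sd_to_conventional(sd_digits, radix):
    """Subtract the magnitude of the negative digits from the positive digits."""
    positive = sum(d * radix ** i for i, d in enumerate(sd_digits) if d > 0)
    negative = sum(-d * radix ** i for i, d in enumerate(sd_digits) if d < 0)
    return positive - negative              # ordinary, carry-propagating subtract

sd_648 = conventional_to_sd([8, 4, 6], radix=10, a=6)
print(sd_648, sd_to_conventional(sd_648, 10))
```

The forward direction looks only at one digit and its right-hand neighbour's borrow, so it can run in parallel, whereas the reverse direction relies on a conventional subtraction.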
3.4 The Residue Number System 
The Residue Number System (RNS) is defined in terms of a set of moduli. If $P$
denotes the moduli set, then
$$P = \{p_1, p_2, \ldots, p_L\}$$
The $p_i$'s are all integers and are pairwise relatively prime. Any integer in the residue
class $\mathbb{Z}_M$,⁴ where
$$M = \prod_{i=1}^{L} p_i$$
has a unique L-tuple representation given by
$$X \xrightarrow{\text{RNS}} (x_1, x_2, \ldots, x_L)$$
where $x_i = X \bmod p_i$ is called the i'th residue of $X$.
For a signed number system, any integer in $(-\tfrac{M}{2}, \tfrac{M}{2})$ has an RNS L-tuple
representation given by
$$x_i = \begin{cases} X \bmod p_i & \text{if } X \ge 0 \\ (M - |X|) \bmod p_i & \text{if } X < 0 \end{cases}$$
and this is a symmetric system.
The residue number system assigns all digits equal importance, unlike
conventional binary systems, which are weighted. If two RNS numbers $X$ and $Y$ are
combined using one of the arithmetic operators $(+, -, \times)$, written $\circ$, and the result is stored
in $Z$, then
$$Z = X \circ Y = ((x_1 \circ y_1) \bmod p_1, \ldots, (x_L \circ y_L) \bmod p_L)$$
if and only if $Z$ belongs to $\mathbb{Z}_M$.
Notice that the i'th RNS digit, $z_i$, is defined in terms of $(x_i \circ y_i) \bmod p_i$, and this
implies that the operators do not require any information (carry signals) from any
other RNS digit. The multiplication operator results in an RNS number belonging
to $\mathbb{Z}_{M^2}$ if the two RNS numbers operated on belong to $\mathbb{Z}_M$. This number system
makes possible high speed concurrent arithmetic operations and consequently is
very attractive to implement in VLSI [36].
The division operator is not closed in the RNS system and consequently can only
be approximated via iteration.
⁴ A ring of integers modulo $M$, i.e. $\{0, 1, \ldots, M - 1\}$.
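The digit-wise nature of RNS arithmetic can be sketched as below. The moduli (3, 5, 7), giving $M = 105$, and the function names are assumptions chosen for illustration; results are only meaningful while the true answer stays inside the range of the system.

```python
import operator

MODULI = (3, 5, 7)   # assumed example moduli, pairwise relatively prime; M = 105

def to_rns(x, moduli=MODULI):
    """Residue L-tuple of an integer; Python's % also maps negative X to M - |X|."""
    return tuple(x % p for p in moduli)

def rns_op(xs, ys, op, moduli=MODULI):
    """Apply +, - or * independently to every residue digit (no carries)."""
    return tuple(op(x, y) % p for x, y, p in zip(xs, ys, moduli))

total = rns_op(to_rns(17), to_rns(23), operator.add)
product = rns_op(to_rns(17), to_rns(5), operator.mul)
difference = rns_op(to_rns(23), to_rns(17), operator.sub)
print(total, product, difference)
```

Each residue channel could be computed by a separate small unit in hardware, since no channel consumes information from any other.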
3.4.1 The Multiplicative Inverse
The multiplicative inverse is crucial in number conversions involving the RNS system
and therefore an understanding of its formulation is essential [132].
For $0 < a < P$, if there exists an integer $a^{-1}$ such that $a \cdot a^{-1} \bmod P = 1$, then $a^{-1}$ is
the multiplicative inverse of $a \bmod P$. The multiplicative inverse $a^{-1} \bmod P$
exists and is unique if and only if the Greatest Common Divisor of $a$ and $P$ is 1.
Multiplicative inverses are derived by
1. Direct Search Method.
2. Trial and Error Method.
3. Using Euler's Formula.
The next section will discuss Euler's Formula.
3.4.1.1 Euler's Formula 
The 'totient function', denoted by $\phi(p)$, when $p$ is larger than 1, is the number of
non-zero elements in $\mathbb{Z}_p$ that are relatively prime to $p$. When $p$ equals one,
the totient function $\phi(p) = 1$.
If $p$ is a prime $q$, then all non-zero elements of $\mathbb{Z}_q$ are relatively prime to $q$, and
so $\phi(q) = q - 1$ whenever $q$ is a prime. If $p$ is a power of a prime, $q^m$, then the only
elements of $\mathbb{Z}_p$ not relatively prime to $q^m$ are the $q^{m-1}$ multiples of $q$. Therefore
$$\phi(q^m) = q^m - q^{m-1} = q^{m-1}(q - 1)$$
Euler's theorem states that if the Greatest Common Divisor of $a$ and $p$ equals one
then $a^{\phi(p)} \bmod p = 1$, and for any positive integer $p$ it also follows that
$a^{-1} \bmod p$ can be written as
$$a^{-1} = a^{\phi(p) - 1} \bmod p$$
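Euler's formula translates directly into a short routine. The sketch below uses a deliberately naive totient count, which is adequate for the small moduli typical of RNS designs; the function names are assumptions for illustration.

```python
from math import gcd

def totient(p):
    """Euler's totient: count of 1 <= k < p relatively prime to p (1 when p = 1)."""
    if p == 1:
        return 1
    return sum(1 for k in range(1, p) if gcd(k, p) == 1)

def inverse_by_euler(a, p):
    """Multiplicative inverse of a mod p as a^(phi(p) - 1) mod p."""
    if gcd(a, p) != 1:
        raise ValueError("inverse exists only when gcd(a, p) = 1")
    return pow(a, totient(p) - 1, p)   # built-in modular exponentiation

print(totient(7), totient(9), inverse_by_euler(3, 7), inverse_by_euler(2, 9))
```

The three-argument `pow` keeps every intermediate product reduced modulo `p`, so the exponentiation never produces large numbers.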
3.4.2 Residue to Decimal Conversion
There are two methods to carry out this task, the Mixed-Radix Method and 
The Chinese Remainder Theorem. This thesis will discuss the mixed-radix 
method first. 
3.4.2.1 Mixed-Radix Method 
Before delving into the method, some new notation will be introduced as follows:
$$\langle a \cdot b \rangle_p = a \cdot b \bmod p$$
$c^{-1}[p]$ is the multiplicative inverse of $c \bmod p$.
The Mixed-Radix Coefficients are generated by
$$x = \sum_{i=0}^{L-1} a_i \prod_{k=0}^{i} p_k$$
provided $0 \le a_i < p_{i+1}$ for all $i \ge 0$ and $p_0 = 1$. The coefficient $a_i$ is computed
using a recursive algorithm employing intermediate variables $S_{i,j}$ with $x_1 = S_{0,0} = a_0$;
$x_l = S_{0,(l-1)}$ for $l = 2, 3, \ldots, L$ and $S_{i,i} = a_i$ for $i = 0, 1, \ldots, (L - 1)$. The recursive
formula is
$$S_{i,j} = \left\langle \left(S_{(i-1),j} - a_{i-1}\right) \left\langle \frac{1}{p_i} \right\rangle_{p_{(j+1)}} \right\rangle_{p_{(j+1)}}$$
where $j = i, (i + 1), \ldots, (L - 1)$ for each subscript $i$.
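The recursion can be sketched as follows, under the assumption that each pass subtracts the coefficient just found and multiplies by the inverse of the corresponding modulus. Lists are 0-based here, so `moduli[0]` plays the role of $p_1$; the moduli (3, 5, 7) and the helper names are illustrative assumptions, and the inverse is found by the direct search method.

```python
def mod_inverse(c, p):
    """Multiplicative inverse of c mod p found by direct search."""
    for candidate in range(1, p):
        if (c * candidate) % p == 1:
            return candidate
    raise ValueError("no multiplicative inverse exists")

def mixed_radix_coefficients(residues, moduli):
    """Mixed-radix coefficients a_0 .. a_(L-1) of the number with these residues."""
    s = list(residues)                  # first row of intermediate values
    coefficients = []
    for i in range(len(moduli)):
        a_i = s[i]                      # the diagonal entry is the coefficient
        coefficients.append(a_i)
        for j in range(i + 1, len(moduli)):
            inv = mod_inverse(moduli[i], moduli[j])
            s[j] = ((s[j] - a_i) * inv) % moduli[j]
    return coefficients

def mixed_radix_value(coefficients, moduli):
    """Evaluate a_0 + a_1*p_1 + a_2*p_1*p_2 + ..."""
    value, weight = 0, 1
    for a_i, p in zip(coefficients, moduli):
        value += a_i * weight
        weight *= p
    return value

coeffs = mixed_radix_coefficients((1, 0, 5), (3, 5, 7))
print(coeffs, mixed_radix_value(coeffs, (3, 5, 7)))
```

Every operation in the recursion is a small modular operation, so no arithmetic modulo the full range $M$ is needed until the final weighted sum.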
3.4.2.2 Chinese Remainder Theorem 
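The Chinese Remainder Theorem is
$$X = \left\langle \sum_{i=1}^{L} s_i \left\langle s_i^{-1}\, x_i \right\rangle_{p_i} \right\rangle_M$$
where $s_i = \dfrac{M}{p_i}$ and $s_i^{-1} = s_i^{-1}[p_i]$.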
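A direct transcription of the Chinese Remainder Theorem is sketched below, again with the assumed example moduli (3, 5, 7) and a direct-search inverse; the function names are hypothetical.

```python
def mod_inverse(c, p):
    """Multiplicative inverse of c mod p found by direct search."""
    for candidate in range(1, p):
        if (c * candidate) % p == 1:
            return candidate
    raise ValueError("no multiplicative inverse exists")

def crt(residues, moduli):
    """Recover X in [0, M) from its residues, with M the product of the moduli."""
    big_m = 1
    for p in moduli:
        big_m *= p
    total = 0
    for x_i, p_i in zip(residues, moduli):
        s_i = big_m // p_i
        total += s_i * ((mod_inverse(s_i, p_i) * x_i) % p_i)
    return total % big_m                # final reduction modulo M

print(crt((1, 0, 5), (3, 5, 7)), crt((2, 2, 3), (3, 5, 7)))
```

Unlike the mixed-radix method, the final step here needs a reduction modulo the full range $M$, which is the usual hardware cost of this route.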
3.4.3 The Quadratic Residue Number System
The Quadratic Residue Number System [22, 133] is an encoding system which will
make complex multiplication equivalent to two real multiplies, and complex addition
equivalent to two real adds.⁵ This system relies on the mathematics of Gaussian
primes and the rules are as follows:
1. All integer primes of the form $p = 4k + 3$ are among the Gaussian primes.
2. Since all integer primes of the form $p = 4k + 1$ can be expressed as a sum of
two squares, they may be factored into two distinct Gaussian primes as given
by $p = a^2 + b^2 = (a + jb)(a - jb)$, where $j = \sqrt{-1}$.
3. The prime number 2 can be factored as $(1 + j)(1 - j)$, but only one of these
factors is distinct.
3.4.3.1 Conversion To Complex Quadratic Residue Number System 
To convert a complex number into the equivalent residue format, the following
procedure is adopted.
Given a complex integer input of the form $(m + nj)$, then
$$x^+ = \langle m + nj \rangle_{(a+bj)}$$
So
$$x^+(a - bj) = x^+a - x^+bj = \langle (m + nj)(a - bj) \rangle_{(a+bj)(a-bj)} = \langle (am + bn) + (an - bm)j \rangle_p$$
$p$ is prime, so the residues mod $p$ form a field and consequently the multiplicative
inverse of $a$ can be found. Therefore, multiplying by $a^{-1}[p]$ implies
$$x^+ - x^+ b a^{-1}[p]\, j = \langle (m + b a^{-1}[p]\, n) + (n - b a^{-1}[p]\, m) j \rangle_p$$
and is satisfied by
$$x^+ = \langle m + a^{-1}[p]\, b n \rangle_p$$
and is a real integer. For a particular choice of $p$, the term $\langle a^{-1}[p]\, b \rangle_p$ is unique
and can be labelled $K_p$.
The residue for the other Gaussian prime factor of $p$ can be shown to be
$$x^- = \langle m - a^{-1}[p]\, b n \rangle_p = \langle m - K_p n \rangle_p$$
Therefore $x^+$ and $x^-$ are real integers which define the complex number $m + nj$
as long as
$$c_m \le m \le c_m + p - 1$$
and
$$c_n \le n \le c_n + p - 1$$
where $c_m$ and $c_n$ are fixed constants, set to $-(p - 1)/2$ so that the complex
numbers are centred about the origin in both ordinates.
⁵ Normal complex addition requires two real adds and normal complex multiplication requires
four real multiplies and two real adds.
3.4.3.2 Conversion Back To Conventional Number System 
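Let us assume that we have a $2r$ 'digit' residue representation
$$(x_1^+, x_1^-, \ldots, x_k^+, x_k^-, \ldots, x_r^+, x_r^-)$$
Now forming
$$y'_k = \langle x_k^+ + x_k^- \rangle_{p_k}$$
and substituting for $x_k^+$ and $x_k^-$ derived previously, results in
$$y'_k = \langle 2m \rangle_{p_k}$$
Finally
$$y_k = \langle \langle 2^{-1} \rangle_{p_k} \langle 2m \rangle_{p_k} \rangle_{p_k} = \langle m \rangle_{p_k}$$
and is the residue of the real part mod $p_k$.
The other term $z'_k = \langle x_k^+ - x_k^- \rangle_{p_k} = \langle 2 K_{p_k} n \rangle_{p_k}$ becomes, after multiplication by $\langle (2 K_{p_k})^{-1} \rangle_{p_k}$,
$$z_k = \langle n \rangle_{p_k}$$
and is the residue of the imaginary part mod $p_k$.
The final stage of the conversion is to pass $y_k$ and $z_k$ through the Mixed-Radix
Conversion method or the Chinese Remainder Theorem, as discussed earlier.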
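The encoding, the two-multiply complex product and the decoding can be sketched for a single modulus. The choice $p = 13 = 2^2 + 3^2$ (so $a = 2$, $b = 3$) and all function names are assumptions for illustration; with only one modulus the result is correct only while both parts of the answer lie in the centred range $[-6, 6]$.

```python
P, A, B = 13, 2, 3                   # assumed example prime with 13 = 2^2 + 3^2
K = (pow(A, P - 2, P) * B) % P       # K_p = a^-1[p] * b, inverse via a^(p-2) mod p

def encode(m, n):
    """Map the complex integer m + nj to the real pair (x+, x-)."""
    return ((m + K * n) % P, (m - K * n) % P)

def multiply(u, v):
    """Complex multiplication becomes two independent real multiplies."""
    return ((u[0] * v[0]) % P, (u[1] * v[1]) % P)

def centre(x):
    """Move a residue in [0, p) to the range centred on the origin."""
    return x - P if x > (P - 1) // 2 else x

def decode(x_plus, x_minus):
    """Recover (m, n) from (x+, x-) using the inverses of 2 and 2*K_p."""
    inv2 = pow(2, P - 2, P)
    inv2k = pow(2 * K, P - 2, P)
    m = (inv2 * (x_plus + x_minus)) % P
    n = (inv2k * (x_plus - x_minus)) % P
    return centre(m), centre(n)

# (2 + j)(1 - 2j) = 4 - 3j
result = decode(*multiply(encode(2, 1), encode(1, -2)))
print(K, result)
```

The check `K * K % P == P - 1` confirms that $K_p$ behaves as a square root of $-1$ modulo $p$, which is what lets $j$ be replaced by a real residue.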
3.5 Addition Elements 
In this section, adders and subtractors for conventional binary and signed-digit 
number representations will be discussed. 
3.5.1 Conventional Binary Addition and Subtraction
The simplest method of addition/subtraction in signed-complement arithmetic is to
follow the approach which is taught at school. So, addition is performed from the
least significant digit, with zero initial carry, through to the most significant digit in
a wave-like propagation. These adders are called ripple carry adders because the
carry flows through from the least significant digit. Overflow can be detected by
checking the carry outputs of the final two adder stages; if they differ, the
result of the arithmetic operation is wrong. Subtraction is performed by adding
the two numbers together after radix-complementing the subtrahend. Addition is
performed using a full adder, where the sum output is the digit addition with
carry in binary and the carry output is the overflow from the sum. In digital form
this is represented by
$$S_i = A_i \oplus B_i \oplus C_i$$
$$C_{i+1} = A_i \cdot B_i + B_i \cdot C_i + A_i \cdot C_i$$
An add/subtract cell using an additional control line $M$, with $M$ equal to one to
subtract and zero for addition, can be represented by
$$S_i = A_i \oplus (B_i \oplus M) \oplus C_i$$
$$C_{i+1} = (A_i + C_i) \cdot (B_i \oplus M) + A_i \cdot C_i$$
Figure 3.1 shows block diagrams of ones-complement and twos-complement
ripple-carry adder/subtractor designs. The implementation of a
sign-magnitude adder/subtractor requires a large number of XOR gates and a more
complex sign and overflow circuit. All these circuits based on the ripple-carry approach,
except the sign-magnitude coded adders, take $2n + 6$ time slots for an n-bit adder.
This total time (or delay) is the time taken for the carry in the least-significant digit
to travel through to the most-significant digit.
As the adder's data word length increases, the ripple-carry method becomes inefficient.
Therefore alternative adders have been developed which reduce the carry propagation
and thus increase the speed of computation. The carry-completion sensing adder
is an asynchronous self-timed unit that aims to reduce the carry propagation length
to approximately $\log_2 n$. Alternatives are conditional sum and carry-select adders. High speed
addition can be accomplished with a carry lookahead technique. This approach
computes the carry bit for each digit in parallel and sends this information to the
sum part of a full adder. However, for large bit lengths, this approach requires an
exponential increase in logic gates, making chip routing and power dissipation more
difficult.
Figure 3.1: Block diagrams of ripple carry adders for conventional binary arithmetic: (a) One's Complement Adder; (b) Two's Complement Adder.
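A bit-level sketch of a twos-complement ripple-carry adder/subtractor built from the add/subtract cell is given below. It assumes that the control line $M$ is also fed in as the initial carry when subtracting, which is the usual way of completing the twos complement but is an assumption of this sketch. Bit lists are least significant bit first, and the function names are hypothetical.

```python
def add_sub_cell(a, b, c, m):
    """One add/subtract cell: returns (sum bit, carry out)."""
    bx = b ^ m                           # invert B when subtracting
    s = a ^ bx ^ c
    carry = ((a | c) & bx) | (a & c)
    return s, carry

def ripple_add_sub(a_bits, b_bits, subtract):
    """Ripple through all cells; returns (result bits, overflow flag)."""
    m = 1 if subtract else 0
    carry = m                            # assumed: M doubles as the initial carry
    result, carries = [], []
    for a, b in zip(a_bits, b_bits):
        s, carry = add_sub_cell(a, b, carry, m)
        result.append(s)
        carries.append(carry)
    # Overflow when the carry outputs of the final two stages differ.
    overflow = len(carries) > 1 and carries[-1] != carries[-2]
    return result, overflow

def to_bits(value, n):
    """n-bit twos-complement bits of an integer, least significant first."""
    return [(value >> i) & 1 for i in range(n)]

def from_bits(bits):
    """Signed integer value of twos-complement bits, least significant first."""
    unsigned = sum(b << i for i, b in enumerate(bits))
    return unsigned - (1 << len(bits)) if bits[-1] else unsigned

diff_bits, diff_overflow = ripple_add_sub(to_bits(5, 4), to_bits(3, 4), True)
sum_bits, sum_overflow = ripple_add_sub(to_bits(7, 4), to_bits(1, 4), False)
print(from_bits(diff_bits), diff_overflow, from_bits(sum_bits), sum_overflow)
```

The loop makes the serial dependency explicit: each cell must wait for the carry of the cell before it, which is the delay the faster adder structures try to avoid.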
3.5.2 Signed Digit Addition and Subtraction 
Using the signed-digit number system, the designer can build an SD adder/subtractor
in which carry propagation is limited to one position to the left during the operation. The time
required to perform the parallel addition is equal to the time required to add two
adjacent digits. This is achieved by the following:
• The radix $r > 2$.
• The algebraic value of zero must have a unique SD representation.
• There must be transformations between conventional sign-magnitude m-digit
representation and SD m-digit representation for all algebraic values within
the machine range.
3.5.2.1 The Algorithm
For parallel operation, all digital positions $n - 1 \ge i \ge -k$ of $(n+k)$-digit SD numbers
must satisfy:
1. Let $s_i$ be the i'th sum digit of the resulting sum
$S = (s_{n-1} \ldots s_1 s_0 \,.\, s_{-1} \ldots s_{-k})_r = Z + Y$ and $t_i$ be the transfer digit from
the (i-1)'th digital position. Then $s_i = f(z_i, y_i, t_i)$.⁶
2. The transfer digit $t_{i+1}$ to the (i+1)'th digital position on the left is a function
of the augend digit $z_i$ and the addend digit $y_i$, i.e. $t_{i+1} = g(z_i, y_i)$.
To achieve parallel subtraction of the subtrahend digit $y_i$ from the minuend digit
$z_i$, the operation
$$z_i - y_i = z_i + \bar{y}_i$$
is used. This operation is the SD parallel addition of the minuend digit and the additive
inverse of the subtrahend digit.
The transfer digit, $t_i$, can assume positive and negative values in SD addition and
subtraction and is never propagated past the first adder position on the left.
The SD parallel adder/subtractor is a two step process on the digits $z_i$, $y_i$ and $t_i$
and is the following:
1. The outgoing transfer digit $t_{i+1}$ and the interim sum digit $w_i$ are generated
by the addition of $z_i$ to $y_i$ and is
$$r \cdot t_{i+1} + w_i = z_i + y_i$$
2. The sum digit $s_i$ is obtained by adding $w_i$ to $t_i$, the transfer digit from the
digital position $i - 1$, and is calculated by
$$s_i = w_i + t_i$$
This algorithm is shown graphically in figure 3.2 and for parallel operation
$$|s_i| \le |z_i|_{\max} \text{ and } |y_i|_{\max}$$
and the unique zero representation criterion is satisfied by
$$|z_i| \le r - 1$$
SD subtraction requires the condition that for every $y_i = a$, there exists $\bar{y}_i = -a$
such that
$$y_i + \bar{y}_i = a + (-a) = 0$$
⁶ $z_i$ and $y_i$ are the i-th digits of the augend $Z$ and the addend $Y$ respectively.
Figure 3.2: Block Diagram of Part of the SD adder/subtractor
The convertibility requirement between conventional and SD format is in the same
form as in section 3.3.2.1, but the transfer digit (the borrow difference) can take
negative values. The condition
$$|w_i| \le r - 2$$
is obtained as the upper magnitude limit for the interim sum if the transfer digit
$t_i$ is restricted to -1, 0, 1. Therefore the minimum base to achieve the operation
is three (by substitution for the conventional digit in section 3.3 and $x_i = 1$). A
consequence of these conditions is that the set of allowed values for each interim
digit is
$$\{-(r-2), \ldots, -1, 0, 1, \ldots, r-2\}$$
and this leads to two enlarged digit sets, which are considered minimal when each
digit assumes the smallest value $a = \frac{r+1}{2}$ for odd radix and $a = \frac{r}{2} + 1$ for even
radix, and maximal when $a = r - 1$.
When $w_i$ belongs to the sequence $\{w_{\min}, \ldots, -1, 0, 1, \ldots, w_{\max}\}$, the transfer digit
$t_{i+1}$ is computed by
$$t_{i+1} = \begin{cases} 0, & \text{if } w_{\min} \le z_i + y_i \le w_{\max} \\ 1, & \text{if } z_i + y_i > w_{\max} \\ -1, & \text{if } z_i + y_i < w_{\min} \end{cases}$$
It is easy to perform addition and subtraction from least significant digit to most
significant digit or vice versa.
The designer selects the required digit set for $w_i$, the interim sum digit, to be the
minimal set corresponding to the radix, and the inputs to the adder/subtractor have
a digit set which is minimal according to $a$ above.
When arithmetic right shifting an SD number, the transfer digit is generated when
$$|z_{-k}| < |z_i|_{\max} - 1$$
and this is useful in multiple precision arithmetic operations. Overflow due to shifting
left can be predicted by inspection of the most significant two digits before the
shift operation. The total time delay of a fully-parallel SD adder/subtractor is
determined by the add time of only one stage of the digit adder (i.e. stage I and
stage II) and is independent of word-length ($m = n + k$).
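The two-step addition described in this section can be sketched in a few lines of Python. This is only an illustrative model, not the hardware design: the function names, the choice of radix 10 with digit set {-6, ..., 6} and the interim sum limit `w_max = 5` are my own assumptions, chosen to match the minimal even-radix digit set.

```python
def sd_add(z, y, r=10, w_max=5):
    """Add two signed-digit numbers given as digit lists, least significant digit first.

    Assumed parameters (not from the text): radix r = 10, digits in -6..6,
    interim sum digits limited to -w_max..w_max.
    """
    assert len(z) == len(y)
    n = len(z)
    w = [0] * n          # interim sum digits
    t = [0] * (n + 1)    # transfer digits; t[0] is the zero transfer into the lowest position
    # Step 1: every digit position is processed independently of the others.
    for i in range(n):
        total = z[i] + y[i]
        if total > w_max:
            t[i + 1] = 1
        elif total < -w_max:
            t[i + 1] = -1
        w[i] = total - r * t[i + 1]      # r * t[i+1] + w[i] = z[i] + y[i]
    # Step 2: s[i] = w[i] + t[i]; a transfer never travels further than one position.
    return [w[i] + t[i] for i in range(n)] + [t[n]]


def sd_subtract(z, y, r=10, w_max=5):
    """Subtract by adding the additive inverse of every subtrahend digit."""
    return sd_add(z, [-d for d in y], r, w_max)


def sd_value(digits, r=10):
    """Conventional integer value of a signed-digit list."""
    return sum(d * r ** i for i, d in enumerate(digits))


if __name__ == "__main__":
    z = [6, -3, 5]       # 5*100 - 3*10 + 6 = 476
    y = [5, 6, -2]       # -2*100 + 6*10 + 5 = -135
    s = sd_add(z, y)
    print(s, sd_value(s))    # [1, 4, 3, 0] 341
```

Because step 1 never looks at a neighbouring position and step 2 only looks one position to the right, the loop bodies could all run at once, which is the word-length independence noted above.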
3.6 Array Multiplication Techniques 
Array multiplication algorithms are modelled on the familiar long multiplication
technique taught to children. This method uses addition, multiplication and radix
left shifting, starting from the units column. This is demonstrated
in equation 3.6.
There are two basic types of modular array multipliers, which are known as the
Non-additive Multiply Module (NMM) and the Additive Multiply Module (AMM).
The NMM type computes the expression
$$Y = A \times B \qquad (3.4)$$
and the AMM type computes the expression
$$Y = A \times B + C + D \qquad (3.5)$$
The four by four NMM structure is displayed in figure 3.3 and implements
equation 3.4 by the following worked equation.
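$$
\begin{array}{rcccccccc}
 & & & & & a_3 & a_2 & a_1 & a_0 \\
\times) & & & & & b_3 & b_2 & b_1 & b_0 \\ \hline
 & & & & & a_3 b_0 & a_2 b_0 & a_1 b_0 & a_0 b_0 \\
 & & & & a_3 b_1 & a_2 b_1 & a_1 b_1 & a_0 b_1 & \\
 & & & a_3 b_2 & a_2 b_2 & a_1 b_2 & a_0 b_2 & & \\
+) & & a_3 b_3 & a_2 b_3 & a_1 b_3 & a_0 b_3 & & & \\ \hline
 & P_7 & P_6 & P_5 & P_4 & P_3 & P_2 & P_1 & P_0
\end{array}
\qquad (3.6)
$$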
Figure 3.3: The Structure of a 4x4 NMM
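The four by two AMM structure is displayed in figure 3.4 and implements the
arithmetic of equation 3.5, as in the following worked equation.
$$
\begin{array}{rcccccc}
 & & & a_3 & a_2 & a_1 & a_0 \\
\times) & & & & & b_1 & b_0 \\ \hline
 & & & a_3 b_0 & a_2 b_0 & a_1 b_0 & a_0 b_0 \\
 & & a_3 b_1 & a_2 b_1 & a_1 b_1 & a_0 b_1 & \\
+) & & & c_3 & c_2 & c_1 & c_0 \\
 & & & & & d_1 & d_0 \\ \hline
 & P_5 & P_4 & P_3 & P_2 & P_1 & P_0
\end{array}
$$
Figure 3.4: The Structure of a 4x2 AMM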
The AMM array multiplier (figure 3.4) is particularly attractive for DSP because it
performs a MAC operation. It uses fewer full adders than the 4x2 NMM and requires
fewer AND gates to calculate the partial products. But compared to a square matrix
(e.g. 4x4, 8x8, ...) the total delay time through the full adder network is two
units slower than the NMM case [52]. The AMM will use more external wires than
the NMM because it computes a more complicated function. The NMM has the
disadvantage of requiring additional summing devices like multiple-operand Wallace
tree adders; this, however, gives the structure a doubling in speed compared to the
AMM case.
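The multiply-accumulate behaviour of the AMM can be illustrated with a small behavioural model. This is a sketch under my own assumptions: the function name is hypothetical, and the operand widths (A and C four bits, B and D two bits) are assumed so that the largest result, 15 x 3 + 15 + 3 = 63, exactly fills six product bits.

```python
def amm4x2(a, b, c, d):
    """Behavioural model of a 4x2 additive multiply module: A*B + C + D.

    Assumed widths: a and c are 4-bit, b and d are 2-bit unsigned values.
    """
    assert 0 <= a < 16 and 0 <= c < 16
    assert 0 <= b < 4 and 0 <= d < 4
    total = c + d
    for j in range(2):                 # one row of AND-gate partial products per multiplier bit
        if (b >> j) & 1:
            total += a << j            # row j is the multiplicand shifted left by j places
    return total


if __name__ == "__main__":
    print(amm4x2(15, 3, 15, 3))        # largest possible result: 63
```

The point of the sketch is that the two extra addends come for free in the product width, which is why the module suits accumulation.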
The k-input Wallace tree is a k-to-2 carry-save full adder and is used to create
the sub-products simultaneously. A further requirement is to separate the operands
into blocks usable in the 4x4 NMM. This is written as
$$P = A \times B = (A_H \cdot A_L) \times (B_H \cdot B_L) = A_H \times B_H + A_H \times B_L + A_L \times B_H + A_L \times B_L$$
for construction of an 8-by-8 array multiplier using four 4-by-4 NMM structures.
The last term $A_L \times B_L$ needs no further modifications, but the other parts require
eight 3-input Wallace trees and a 12-bit Carry Propagate Adder.
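The decomposition above can be checked with a short behavioural sketch. The function names are hypothetical, and ordinary integer addition stands in for the Wallace trees and the carry propagate adder. The sketch also makes explicit the positional weights ($2^8$ and $2^4$) that the block notation leaves implied.

```python
def nmm4x4(a, b):
    """Behavioural model of a 4x4 non-additive multiply module (unsigned)."""
    assert 0 <= a < 16 and 0 <= b < 16
    product = 0
    for j in range(4):                 # one row of partial products per multiplier bit
        if (b >> j) & 1:
            product += a << j
    return product


def multiply8x8(a, b):
    """8-by-8 unsigned multiply assembled from four 4x4 NMM blocks."""
    assert 0 <= a < 256 and 0 <= b < 256
    a_h, a_l = a >> 4, a & 0xF         # high and low 4-bit blocks of A
    b_h, b_l = b >> 4, b & 0xF         # high and low 4-bit blocks of B
    # Plain '+' stands in for the Wallace trees and the carry propagate adder.
    return ((nmm4x4(a_h, b_h) << 8)
            + (nmm4x4(a_h, b_l) << 4)
            + (nmm4x4(a_l, b_h) << 4)
            + nmm4x4(a_l, b_l))


if __name__ == "__main__":
    print(multiply8x8(200, 123), 200 * 123)
```

Only the low four bits of the $A_L \times B_L$ block pass straight to the output, which is consistent with the remaining twelve bits needing the carry propagate adder.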
Implementing either type of array multiplier requires a large floor plan in VLSI:
the large number of logic gates creates a fast multiplier but also a large
power dissipation problem. Apart from the speed, the logic elements are repetitive,
which is useful in VLSI design.
3.6.1 Cellular Array Multipliers Conclusions 
There are many variants of cellular array multipliers based on the AMM and NMM
type devices. Some also use novel adders like Wallace trees and carry propagate
adders. There is also a non direct Universal Multiplication Array (UMA) device
which provides multiplication in the standard binary formats (unsigned,
sign-magnitude, one's complement and two's complement numbers) using support logic.
The UMA is normally created using programmable AMM modules.
Designers have built specialised two's complement array multipliers using four types
of adder. Each adder type has a certain number of negatively weighted inputs, and
the arrays are built by connecting together a certain mixture of adder types. The
fastest arrays are the Pezaris and Tri-Section multipliers, using all four types of
adder and three types of adder respectively. The Baugh-Wooley multiplier uses one
type of adder and is slightly slower than the others; it uses more combinational
logic to provide the various product terms for the array.
As mentioned before, all these devices perform multiplication very quickly at the
expense of a large network of logic gates with many connecting wires. This creates
a large amount of heat which has to be dissipated. Also, valuable silicon is used up
to perform one function. It would be better to use the minimal amount of silicon to
provide many functions at a reasonable speed and at a low power dissipation! The
signed-digit number system, residue number systems and the CORDIC algorithm
all show promise in this area.
3.7 Canonical Multiplier Recoding 
3.7.1 Introduction to Multiplication String Recoding Algorithms
The standard binary multiplication algorithm is performed by adding the partial
sums and arithmetic left shifting the operands. This is efficient only when the
multiplier has more zeros than ones. If the multiplier contains a string of ones then
the standard algorithm will operate more slowly, because each one requires an
addition. This problem can be solved by searching the multiplier for strings of
consecutive ones and recoding them.
This recoding process can be shown as follows, assuming there is a
string of k consecutive ones (non zero elements) in the multiplier.
Column Position   ...   i+k   i+k-1   i+k-2   ...    i   i-1
Bit Content       ...    0      1       1     ...    1    0
Now the string property is
$$2^{i+k} - 2^{i} = 2^{i+k-1} + 2^{i+k-2} + \cdots + 2^{i+1} + 2^{i}$$
and the k consecutive ones can be replaced by the left hand side of the string
property equation. Hence the recoded multiplier is the following :-
Column Position   ...   i+k   i+k-1   i+k-2   ...    i   i-1
Bit Content       ...    1      0       0     ...   -1    0
Therefore execution time will be reduced, because the algorithm will shift across k-1
consecutive zeros and will perform one addition and one subtraction. The next stage
of the algorithm is to work out how to search for strings of ones using either
non-overlapped binary bit pairwise scanning or overlapped triplet bit pairwise scanning
techniques. Table 3.1 shows the properties of triplet scanning.
This recoding method uses carry-save adder trees and a carry propagate adder. 
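The string property can be demonstrated with a small sketch that replaces each run of two or more ones by a +1 above the run and a -1 at its lowest position. The function name and the least-significant-first list layout are my own choices for illustration; isolated ones are left alone here, since recoding them would gain nothing.

```python
def recode_strings(bits):
    """Recode runs of consecutive ones using 2^(i+k) - 2^i.

    bits: list of 0/1 values, least significant bit first.
    Returns a list of digits in {-1, 0, 1}, one position longer than the input.
    """
    n = len(bits)
    digits = list(bits) + [0]          # extra position for a +1 above a top run
    i = 0
    while i < n:
        if bits[i] == 1:
            j = i
            while j < n and bits[j] == 1:
                j += 1                 # j is the first position above the run
            if j - i >= 2:             # a string of k = j - i ones at positions i..j-1
                for m in range(i, j):
                    digits[m] = 0
                digits[i] = -1         # subtract 2^i
                digits[j] = 1          # add 2^(i+k)
            i = j
        else:
            i += 1
    return digits


if __name__ == "__main__":
    # 0111100 in binary is 60; recoded it becomes 64 - 4.
    print(recode_strings([0, 0, 1, 1, 1, 1, 0]))
```

A run of four ones costs four additions in the plain algorithm but only one addition and one subtraction after recoding.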
Multiplier Bits
Low bit of the next       The present   Multiplicand multiple   Reasoning behind
higher pair ($x_{i+1}$)   pair          to be added             string property
0                         0 0            0                      No string of ones
0                         0 1           +2A                     End of string
0                         1 0           +2A                     Isolated one
0                         1 1           +4A                     End of string
1                         0 0           -4A                     Beginning of string
1                         0 1           -2A                     Beginning and ending
1                         1 0           -2A                     Beginning of string
1                         1 1            0                      Centre of string
Table 3.1: Multiples of the multiplicand to be added after scanning a triplet of
multiplier bits in an overlapped pairwise scanning system
3.7.2 Canonical Signed-Digit Multiplier Recoding 
If the designer now replaces conventional multiplier digits with the high redundancy
SD code, the addition operations are reduced and there is a greater average shift
length across the zeros in the multiplier, leading to faster operation.
The requirements are a minimal SD vector which has the minimum weight for a
prespecified $a$ as in section 3.3; this vector must be canonical. A minimal canonical
SD vector means the SD vector $V = d_{n-1} \cdots d_1 d_0$ contains no adjacent non-zero
digits, therefore
$$d_i \times d_{i-1} = 0 \quad \text{for } 1 \le i \le n-1$$
Research in this field has shown that a unique canonical SD vector $V$ for a number
with a fixed value $a$ and a fixed length $n$ exists provided the product of the two left
most digits in $V$ does not equal one. This is expressed as
$$d_{n-1} \times d_{n-2} \ne 1$$
and is satisfied if an additional digit, which equals zero, is inserted at the left most
end of the vector $V$. Now the number is an n+1 digit SD vector with a leading digit
zero, and the following algorithm is used to recode the conventional number system
into the canonical SD format.
3.7.3 Canonical Recoding Algorithm 
Given an (n+1)-digit binary vector $B = b_n b_{n-1} \cdots b_1 b_0$ with $b_n = 0$ and $b_i \in \{0, 1\}$
for $0 \le i \le n-1$, the (n+1)-digit canonical SD vector $V$ with $d_n = 0$ and
$d_i \in \{\bar{1}, 0, 1\}$ is manipulated so that both vectors represent the same numerical
value.
$$\sum_{i=0}^{n} b_i \times 2^i = \sum_{i=0}^{n} d_i \times 2^i$$
• Start with the low order end of $B$ by setting the index $i = 0$ and initial carry
$c_0 = 0$.
• Generate the carry out using the standard rule in conventional binary arithmetic,
which is
$$c_{i+1} = b_{i+1} \cdot b_i + b_i \cdot c_i + b_{i+1} \cdot c_i$$
• Generate the i'th digit $d_i$ of the vector $V$ using
$$d_i = b_i + c_i - 2 \cdot c_{i+1}$$
• Increment index $i$ and repeat if $i \ne n$; otherwise the algorithm has been completed.
The canonical SD vector has the minimum weight and all non-zero digits are
separated by zeros. All that is required for multiplication is addition by A or -A when
a non-zero digit appears; the zeros provide more shifting operations.
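The carry and digit rules of the recoding algorithm can be sketched as follows. The function name and the integer-based interface are my own assumptions. Note that this sketch runs the loop through index n inclusive, so that a carry out of a string of ones ending at position n-1 can land in the extra leading digit.

```python
def canonical_recode(value, n):
    """Canonical signed-digit recoding of an n-bit unsigned value.

    Returns digits d_0..d_n (least significant first), each in {-1, 0, 1}.
    """
    assert 0 <= value < 2 ** n
    b = [(value >> i) & 1 for i in range(n + 2)]   # b_n and b_(n+1) are zero
    c = 0                                          # initial carry c_0 = 0
    digits = []
    for i in range(n + 1):
        # c_(i+1) = b_(i+1).b_i + b_i.c_i + b_(i+1).c_i
        c_next = (b[i + 1] & b[i]) | (b[i] & c) | (b[i + 1] & c)
        # d_i = b_i + c_i - 2.c_(i+1)
        digits.append(b[i] + c - 2 * c_next)
        c = c_next
    return digits


if __name__ == "__main__":
    print(canonical_recode(7, 3))    # [-1, 0, 0, 1], i.e. 8 - 1
    print(canonical_recode(6, 3))    # [0, -1, 0, 1], i.e. 8 - 2
```

Each digit depends only on two adjacent input bits and the incoming carry, so the recoder is a simple ripple structure alongside the multiplier.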
The algorithm can be easily extended to generating two or more signed-digits at a
time. By using a higher radix, the multiples will increase and there will be a larger
shift per cycle, e.g. r = 4 implies multiples of 0, ±A, ±2A and 2-digit shifting per
cycle.
3.7.4 The Booth's Multiplier Algorithm
The Booth multiplier is based on string recoding, and for conventional binary
numbers in radix-2 the algorithm assigns the following inputs and outputs as in table 3.2.
Inputs              Output      Remark on string property
$b_i$   $b_{i-1}$   $d_i$
0       0           0           No string
0       1           1           End of string
1       0           $\bar{1}$   Beginning of string
1       1           0           Centre of string
Table 3.2: The Booth Multiplier Radix-2 String Recoding Algorithm
The output digit $d_i$ for $0 \le i < n$ is obtained by the following discrimination rule :-
$$d_i = \begin{cases} 0, & \text{if } b_i = b_{i-1} \\ 1, & \text{if } b_i < b_{i-1} \\ \bar{1}, & \text{if } b_i > b_{i-1} \end{cases}$$
The procedure above requires the attachment of two zero valued dummy digits to
each end of the n-digit binary vector. The algorithm generates no carries because
each output digit is only a function of the two adjacent input bits. This algorithm
can generate the output vector $V$ by examining adjacent input pairs of vector $B$ in
parallel.
The recoding scheme can be expanded to higher radices by partitioning the recoded
string into pairs or triplets. It is also important to remember that the recoded string
can have adjacent non-zero digits having opposite signs.
After the string recoded vector is formed the following operations are performed.
if $b_i = b_{i-1}$, shift right partial product
if $b_i < b_{i-1}$, add A to partial product and then shift right
if $b_i > b_{i-1}$, subtract A from partial product and then shift right
If the arithmetic left shifting operation exists, the multiply algorithm can proceed
from the most significant bit. The subtraction operation is performed by two's
complement addition. This algorithm is a boon to two's complement multiplication
because the signs of the multiplier and multiplicand are automatically taken into
account during the procedure.
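The radix-2 rule can be sketched as follows. The function names are hypothetical, and for brevity the sketch adds a left-shifted multiplicand to a running product instead of right shifting the partial product as the hardware procedure does; the arithmetic result is the same.

```python
def booth_digits(pattern, n):
    """Recode an n-bit pattern into Booth digits d_0..d_(n-1), least significant first."""
    digits = []
    previous = 0                          # dummy bit b_(-1) = 0
    for i in range(n):
        bit = (pattern >> i) & 1
        digits.append(previous - bit)     # 0 if equal, +1 if b_i < b_(i-1), -1 if b_i > b_(i-1)
        previous = bit
    return digits


def booth_multiply(a, b, n):
    """Multiply integer a by the n-bit two's complement multiplier b."""
    assert -(2 ** (n - 1)) <= b < 2 ** (n - 1)
    pattern = b & ((1 << n) - 1)          # two's complement bit pattern of b
    product = 0
    for i, d in enumerate(booth_digits(pattern, n)):
        if d == 1:
            product += a << i             # add A at weight 2^i
        elif d == -1:
            product -= a << i             # subtract A at weight 2^i
    return product


if __name__ == "__main__":
    print(booth_digits(0b0110, 4))        # [0, -1, 0, 1], i.e. 8 - 2
    print(booth_multiply(-3, -5, 4))      # 15
```

No sign correction step appears anywhere in the sketch, which illustrates why the scheme suits two's complement operands.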
3.7.5 Efficiency of Multiplier Algorithms and Design Alternatives
The measures of multiplier efficiency for these add-shift multiplier algorithms are
based on the average number of add-type operations and the average shift length
between successive additions in the algorithm. The average shift length should
increase and the average number of additions should decrease with respect to a
simple add-shift multiplication algorithm.
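The first of these measures can be estimated numerically. The sketch below counts the non-zero digits, and hence the add-type operations, for every 8-bit multiplier, first in plain binary and then after canonical signed-digit recoding. The function names are my own, and the recoder is a compact restatement of the canonical recoding rules given earlier.

```python
def canonical_digits(value, n):
    """Canonical signed-digit form of an n-bit unsigned value, least significant first."""
    b = [(value >> i) & 1 for i in range(n + 2)]
    c, digits = 0, []
    for i in range(n + 1):
        c_next = (b[i + 1] & b[i]) | (b[i] & c) | (b[i + 1] & c)
        digits.append(b[i] + c - 2 * c_next)
        c = c_next
    return digits


def average_add_operations(n):
    """Average number of add-type operations over all n-bit multipliers."""
    count = 2 ** n
    plain = sum(bin(v).count("1") for v in range(count)) / count
    recoded = sum(sum(1 for d in canonical_digits(v, n) if d != 0)
                  for v in range(count)) / count
    return plain, recoded


if __name__ == "__main__":
    plain, recoded = average_add_operations(8)
    print(f"plain binary: {plain:.3f} additions, canonical SD: {recoded:.3f} additions")
```

Fewer non-zero digits over a similar word-length also means longer runs of zeros, so the average shift length rises as the add count falls.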
The various multiplier design alternatives are summarised below :-
Adder Bypassing Shifting across strings of zeros in the multiplier, by using non-
uniform shifts. This is ideal for asynchronous computers. The disadvantage 
is the increased control complexity. 
Reduced Addition Time Parallel carry generation using Carry Lookahead Array 
( CLA ) techniques or Carry-Save methods will reduce the addition time. SD 
arithmetic allows parallel addition to be performed very conveniently. 
Multiple Shifts Provision must be made to shift more than one digit per iteration. 
This increases hardware, so the optimal choice is a tradeoff between speed and 
cost. 
Multiplier Recoding The introduction of redundancy into multiplier designs 
through recoding, may not necessarily increase the circuit complexity, due to 
the simplicity in generating the multiplicand multiples. Recoding increases 
the average shift length. 
High-Radix Multiplication The number of iterations required in a multiplication
can be reduced by a factor of $k = \log_2 r$, where $r$ is the number radix.
3.8 Summary 
The simplest architectures are the array multipliers. These are very common in 
high speed applications because when pipelined they can become systolic arrays. 
Due to their inherent regularity, their great advantages are ease of design and the 
generation of area efficient layouts. 
The multiplier recoding techniques can be applied to parallel and serial execution
units, but come into their own when in serial form. The ground-breaking paper
of Lyon on serial Booth encoded multipliers [79] was very useful to the VLSI DSP
community, as it enabled complex circuits to be designed. An example of these
was a programmable filter bank [80] for cochlea simulation. A modified form with a
routing matrix was built by Wawrzynek for musical synthesis applications [143, 144].
There are alternative forms of Booth encoding which only have positive shifting, for
example the radix 4 serial multiplier [110].
There are many types of signed digit arithmetic units, which fall into regular array
and serial architectures based on Avizienis' work [4]. Work at Queen's University of
Belfast on systolic arrays has produced IIR filters [150] which have lower latency,
as the digits can be extracted with the most significant digit first. A hybrid Booth
encoded multiply accumulator unit with an internal redundant binary format [51]
can also be designed. Serial based multipliers [27] require a means of converting the
digit stream into two's complement form. This is carried out by the algorithm of
Ercegovac and Lang in their paper [26].
The residue number system, which can be thought of as a highly redundant number
system, has been widely researched [36, 132]. It has been implemented in IIR [29]
and FIR [59] DSP structures. It has also appeared in FFT processors [22] using
Gaussian residue arithmetic. There has even been a paper by Taylor and Huang on
a floating point ALU [134]. However, its use with current digital number systems
requires a large overhead, either by using the Chinese Remainder Theorem or by
using a mixed radix form changing between the two formats. Examples are [30]
and, more recently, [25, 42]. A mixed radix case [152] has been reported. The
residue number system has been used successfully in a multi-valued logic based image
processor [44] using current mode logic. Multipliers [62, 66] using this novel VLSI
circuit design flow have also been constructed. In chapter 4 I will discuss this
circuit technique, in which logic can be held in more than two states. This number
system has even found itself in optical computing [90] by using the Fredkin logic
element [33].
This chapter has reviewed various ways of performing arithmetic computation using
novel number systems. Some of these, having no analogue to the human approach,
have advantages in the design and implementation of VLSI architectures for DSP.
Since digital signal processing is concerned with describing the world in terms of
integer representations, their transformation requires the use of arithmetic operations.
There are many ways of representing numbers electronically and this chapter
discussed the arithmetic operations these representations allow. Each representation
and style helps designers to build efficient computers. The most common number
systems in use today are the sign-magnitude and two's complement forms. However,
by using redundant number systems inside the arithmetic blocks and by efficient
conversion between the two formats, fast computation can be performed.
If we are willing to relinquish our beloved binary number system and discard digital
CMOS design, an important new avenue in computation can be explored. This is
discussed further in the next chapter. It remains to be seen whether alternative
computation becomes predominant in the future. I think that these new forms will
have a wonderful future!
The rest of this thesis concentrates on two's complement arithmetic applied to
a musical synthesis algorithm. The design infrastructure enabling simulation to
realisation necessitates this particular choice of method. Using two's complement
arithmetic enables certain transformations to be achieved using a relatively small
silicon area. This will be explored further in chapter 5.
CHAPTER 4
VLSI Technologies and Applications
This thesis has so far concentrated on arithmetic algorithms and possible structures. 
The signed digit number system has become more important in MAC designs due 
to the increased number redundancy. The residue arithmetic systems have been 
studied by a few researchers, but implementation is somewhat contrived because 
the designers still work in binary and its higher radices [132]. 
High radix arithmetic based processors have remained mainly an academic interest
because of the increased hardware complexity. The cost of an arithmetic processor
is proportional to the square of the word-length, and the speed is proportional to the
logarithm of the word-length [52]. The multiply unit in processors contains ≈ 10%
of the total semiconductor components and the cost of this element is 10% of
the total processor cost.
Novel VLSI technologies would allow more powerful arithmetic algorithms to be 
constructed in various number formats. However the designer would need to provide 
a backward compatibility path to standardised binary operated processors as they 
enable easy interfacing between the novel processor and current peripherals. Such 
compatibility is even more important when designers consider floating point IEEE 
compliant arithmetic processors. 
4.1 Novel VLSI Technologies 
The best architecture for an arithmetic processor is one which uses the minimum 
area of silicon and provides maximum speed. The complexity of the circuitry should 
be minimal. The complexity is defined by the number of components and their 
interconnections. 
The current trend in digital electronics is to reduce the feature sizes of transistors
to achieve faster operation. This is easily understandable and is usable up to the
limits of lithographic techniques and chemical etching. Once the size of the devices
becomes very small, quantum size effects become predominant and the transistor will
not function. In this regime, new devices need to be invented. As transistor size
is reduced, the supply voltage must also be reduced, and because of the quadratic
dependency of power on supply voltage, power consumption is decreased commensurately.
However, there is a speed penalty which can be overcome by architectural
transformations¹ and scaling the threshold voltage [71] of the transistor. As an interim
measure these techniques are effective, but other problems like poor signal to noise
ratios on chip could halt their progress.
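The quadratic dependency mentioned above can be made concrete with the usual first-order model of CMOS dynamic power. The model and its symbols (activity factor $\alpha$, switched capacitance $C_L$, clock frequency $f$) are the standard textbook form and are an assumption here, not something taken from this thesis.

```latex
P_{\text{dyn}} = \alpha \, C_L \, V_{DD}^{2} \, f
\qquad\Rightarrow\qquad
\frac{P_{\text{dyn}}(V_{DD}/2)}{P_{\text{dyn}}(V_{DD})}
  = \frac{(V_{DD}/2)^{2}}{V_{DD}^{2}}
  = \frac{1}{4}
```

Under this model, halving the supply voltage at a fixed clock frequency quarters the dynamic power, which is why supply scaling is attractive despite the speed penalty.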
I would like to highlight four novel technologies which show great promise. These 
are :-
Novel Materials This normally uses Gallium-Arsenide technology, allowing a
corresponding increase in speed. Other fabrication technologies are
Silicon-Germanium substrates, which the companies IBM and Analog Devices
are now developing and marketing. Another technology is Molecular Electronics,
but this is a very new technology and will take ten years or so before
becoming commercially useful. Optical based computers do exist, but they take
up a large amount of space (the optical bench) and cannot interface simply
with silicon based systems. A related technology is known as plasma wave
electronics [83]. Other examples include the field of spintronics [85], which
utilises magnetic material to confine electrons in one of two spin states. The
statistical average of all these confined electrons results in a memory device
which requires no refreshing of the cells' contents. Their other advantage is
that they can be scaled to smaller dimensions because they utilise quantum
mechanical effects, and thus increase package density.
Different Digital Modules I mention these because the Japanese have built a
Josephson Junction SQUID based computer [39] working at 1GHz. Negative
Differential Resistance (NDR) based silicon circuits are also beginning to be
used, as they can combine two or more operations in one transistor.
Low Power Designs This will take on more significance in the coming years as
devices are squeezed onto smaller pieces of silicon. Smaller transistors make
the systems operate faster but also increase power dissipation. In [145]
low-power reversible computational elements were discussed.
MVL Logic This acronym stands for multiple-valued logic, whereby instead of
binary encoding on wires, the designer chooses a logic which has more than
two values.
¹ Which can also be applied to improve system stability [41].
The majority of digital circuits are based on voltage thresholds and positive feedback 
to the supply rails and thus provide noise immunity in digital design. An interesting 
low power, high speed technique for latch design can be achieved using current mode 
logic [151]. 
The rest of this chapter will concentrate on various multiple valued logic structures
suitable for VLSI implementation. They are current-mode logic, negative differential
resistance, voltage-mode logic and a new concept which I have created.
4.2 Current-Mode Logic 
Current Mode Logic is based on the subset of Multiple-Valued Logic (MVL)
which is concerned with architecture implementation in silicon by use of currents
to carry information. The field of MVL is so wide that it is necessary to relabel a small
subset which has the most benefit to VLSI designers. The IEEE society has a yearly
international symposium on multiple-valued logic, covering topics from algebra to
implementations of ternary computers.
Current Mode Logic is composed of four elements as follows :-
• Current Sources. 
• Current Mirrors. There are two types, nMOS and pMOS. 
• Threshold Detectors. 
• Bidirectional Current Input Devices. 
Figure 4.1: Current Source Circuit Symbol and Schematic 
4.2.1 Current Source
The current source as shown in figure 4.1 implements the function
$$y = \begin{cases} 0, & \text{if } x = \text{'1'} \\ m, & \text{if } x = \text{'0'} \end{cases}$$
where x is the input, y is the output and m is an integer.
The current source is implemented using a p-channel depletion-mode MOSFET for
fast and stable operation. The saturation value of the drain current is used as a
constant current and is given by
$$I_d = K_d \cdot \frac{W}{L} \cdot V_T^2$$
where $K_d$, $V_T$, $W$ and $L$ are the transconductance, threshold voltage, channel width
and channel length respectively. The unit current value is set by the dose control.
The current source is insensitive to fluctuations in $V_{DD}$.
A voltage-switched current source can be implemented using a p-channel enhancement-
mode MOSFET ( current pass transistor ) connected to a p-channel depletion-mode 
MOSFET ( see right hand circuit schematic of the threshold detector on page 58 ). 








Figure 4.2: Current Mirror Circuit Symbols and Schematics 
4.2.2 Current Mirror
There are two types of current mirror circuit, nMOS and pMOS, which
are shown in figure 4.2 and implement the function
$$y_i = -a_i \cdot x \quad \text{for } i = 1 \cdots n$$
where x is the input, $y_i$ is the output and $a_i$ is a scale factor.
The current mirror is used to invert current direction, duplicate input current or
scale the input current. The nMOS current mirror's input is operated in the
saturation region and the input current $I_{in}$ is given by
$$I_{in} = K \cdot (V_{in} - V_T)^2$$
where $K$, $V_T$ and $V_{in}$ are the proportionality constant, the threshold voltage and the
input voltage respectively. The output transistor operates in the saturation region
because the gate is connected to the input of the current mirror, hence
$$I_{in} = I_{out}$$
A similar equation can be used for the pMOS current mirror.
For short channel length devices, $I_{out}$ deviates from $I_{in}$ due to the channel length
modulation effect. This could be cured by using a diffusion self-aligned MOSFET device.
Figure 4.3: Threshold Detector Circuit Symbol and Schematic 
4.2.3 Threshold Detector
The threshold detector is shown in figure 4.3 and implements the equation
$$y = \begin{cases} 0, & \text{if } x < T \\ m, & \text{if } x > T \end{cases}$$
where x is the input, y the output, T the threshold and m is an integer.
4.2.4 Bidirectional Current Input Circuit
The Bidirectional Current Input device detects the polarity of a bidirectional current
and is shown in figure 4.4. It uses current mirrors in addition to MOSFETs and a
NOT gate. This device implements the function
$$\begin{cases} x^+ = x,\; x^- = 0 & \text{if } x > 0 \\ x^+ = 0,\; x^- = x & \text{if } x < 0 \end{cases}$$
where x is the input and $x^+$ and $x^-$ are the outputs. The I-V transfer function of
this device is shown in figure 4.5.
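The four current-mode elements can be summarised by a purely behavioural sketch in which currents are plain integers in units of the unit current. The function names are hypothetical, the sketch says nothing about the transistor-level circuits, and the treatment of an input exactly equal to the threshold (taken here as below threshold) is my own assumption, since the equation above leaves that case open.

```python
def current_source(x, m):
    """Output current y: 0 when the control input x is logic 1, m when it is logic 0."""
    return 0 if x == 1 else m


def current_mirror(x, scales):
    """Outputs y_i = -a_i * x: inverted, duplicated or scaled copies of the input current."""
    return [-a * x for a in scales]


def threshold_detector(x, threshold, m):
    """Output m when the input current exceeds the threshold, otherwise 0.

    Assumption: x == threshold is treated as below threshold.
    """
    return m if x > threshold else 0


def bidirectional_current_input(x):
    """Split a bidirectional current into the pair (x_plus, x_minus) by polarity."""
    return (x, 0) if x > 0 else (0, x)


if __name__ == "__main__":
    total = 2 + 1                                # fan-in: currents on a shared wire simply add
    print(current_mirror(total, [1, 1, 2]))      # three mirrored copies, the last one doubled
    print(threshold_detector(total, 2, 1))       # 1, since 3 exceeds the threshold 2
    print(bidirectional_current_input(-2))       # (0, -2)
```

The fan-in line shows why summation is almost free in this logic style, while every extra fan-out needs another mirror output.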
Figure 4.4: Bidirectional Current Input Circuit Symbol and Schematic 
Figure 4.5: Bidirectional Current Input I-V characteristics
4.2.5 Module VLSI Scaling
The modules can be scaled easily and do not require more voltage space because
they operate on currents. They require wider devices due to the number of current
values used. The advantage with these devices is that the scaling is localised to
parts of the chip area, whilst normal voltage driven devices (not current based
logic systems) change the system wide voltage supply.
These modules have simple fan-in connections, using a wire, and complicated
fan-out connections. This is the opposite of TTL NAND based logic. Generally they
are also radix independent, but because of imperfect current mirrors, the radix is
currently limited to four.
A good review of current mode CMOS multiple-valued logic circuits and theory
can be found in [17, 55, 56]. An informative critique on CMOS current mode
adder design can be found in [1]. Multipliers based on signed digit arithmetic can
be implemented effectively with this paradigm [62, 66]. Residue arithmetic based
devices have been built from signed-digit blocks and barrel shifters [44].
4.3 Negative Differential Resistance 
Negative differential resistance occurs when part of the current versus voltage curve
for a device has a negative slope, as shown in figure 4.6(a). A device with a true
negative resistance would generate electrical power, which would violate energy
conservation. However, devices and circuits exhibiting negative differential resistance
can be designed which don't need external energy sources and are thus passive. These
devices can be built systematically from two bipolar transistors and normal resistors
as described in [12, 13, 14, 100], but are not suitable for VLSI implementation
because at present they need large voltage ranges.
Figure 4.6: Block Diagrams of Negative Differential Resistance. (a) Definition of
Negative Differential Resistance. (b) Stable States created from the cascade of
resistor and NDR device.
Alternate circuits supporting negative impedance are gyrators, which are built from 
operational amplifiers, resistors and capacitors. Gyrators are used in analog filter 
design to simulate inductors. 
Devices which implement negative differential resistance can be manufactured us-
ing GaAs substrates and utilising the quantum mechanical phenomenon known as 
resonant tunnelling. 
By connecting a resistor in cascade with a circuit or device which implements
negative differential resistance, a circuit with multiple states can be established. This
is shown diagrammatically in figure 4.6(b) and appears as the stable voltage points
α, β and γ. If an external signal were applied across the nonlinear circuit element,
three distinct values could be stored, and either a ternary memory device or an
xor function could thus be realised. Hence more functionality arises from fewer
devices.
By design, physicists and engineers can create resonant tunnelling devices having 
multiple regions of negative differential resistance, which can be used for digital 
storage [123, 147] and analog frequency multiplication [8, 119]. They can also be 
used as building blocks for very fast 32 bit multiplication [91] using a logic design 
approach [92]. They are smaller than an optical based multiplier [75], but are 
currently quite power hungry and require GaAs Molecular Beam Epitaxy methods 
which are still a specialised niche market. 
4.4 Voltage-Mode Multiple-Valued Logic 
In this chapter, we have seen methodologies which generate many logic values by
simulation of exotic non-linear behaviour, by current scaling, or by actual device
physics. In this section, multithreshold logic circuits which are operated in voltages
are discussed. There is a close correspondence between this approach and normal
familiar binary logic systems.
This section is split into two. The first subsection will describe a voltage threshold
mechanism built from operational amplifiers and the second will describe a
functional transistor called νMOS.
4.4.1 Operational Amplifier Approach
In binary logic, designers allocate certain voltage ranges for high and low states and 
assume that the switch between these two stable states occurs over a narrow voltage 
range and at high speed. This should be familiar to designers of both CMOS and 
T T L logic families. These families have nice S-curved current-voltage characteristics 
for their inverters. The ranges provide a mechanism to protect current drain from 
previous and other logic elements in a design. The steepness of the I-V curve 
provides the logic level restoration necessary for noise immunity in digital circuits. 
To provide multilevels we must guarantee adequate voltage ranges for each stable 
state, as voltage levels for each state can vary. A good rule of thumb is to separate 
each voltage state with ± ^ its mid-value, so that the actual encoded voltage level 
is spread equidistantly over the supply voltage. This can be achieved by using 
operational amplifiers set up in parallel form, implementing an idealised inverter 
curve for output vs input voltages plus biasing. A comparator arrangement could 
be used if the number of circuits connected was very small, but this provides no 
method of renormalisation of the voltage levels. 
The op-amp approach is useful in getting some practical experience in multi-valued 
logic but it is slow [109]. An approach using CMOS and Bipolar transistors in 
a multi-level neural network implementing Analogue to Digital conversion can be 
found in [153].
4.4.2 Neuron MOSFET Approach
A new functional device, called the Neuron MOS transistor (νMOS) because of its
capability of simulating neurons, has been found to be very useful in implementing
multi-level Boolean logic [121].
[Figure 4.7 diagram; legible labels: input gates V2 ... Vn, field oxide, N, N, P substrate]
Figure 4.7: Schematic representation of a neuron MOS transistor
In figure 4.7 the transistor is a normal MOS transistor with a floating gate to store
charge, its main difference being its number of input gates. This structure is very
similar to the charge coupled device, but instead of shunting charge from one region
(N+) to the other using two phase clocks, it is used like a normal transistor. The
transistor operates internally in a multi-voltage regime but its input gates operate
in normal binary. It is a voltage operated device and the voltage thresholds are
controlled by the capacitance of the input gate to the floating gate (C_n) and the
capacitance from the floating gate to the source-drain channel. Thus each input
gate can have a different threshold voltage.
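The dependence of the switching behaviour on the coupling capacitances can be illustrated with a small numerical sketch. It assumes the commonly used weighted-sum model, in which the floating-gate potential is the capacitance-weighted average of the input voltages. That model, the function names and all numerical values below are illustrative assumptions and are not taken from the measurements cited in this section.

```python
def floating_gate_potential(input_voltages, input_caps, channel_cap):
    """Capacitance-weighted sum of the input gate voltages (assumed model)."""
    total_cap = channel_cap + sum(input_caps)
    weighted = sum(c * v for c, v in zip(input_caps, input_voltages))
    return weighted / total_cap


def conducts(input_voltages, input_caps, channel_cap, threshold):
    """True when the floating-gate potential exceeds the transistor threshold."""
    potential = floating_gate_potential(input_voltages, input_caps, channel_cap)
    return potential > threshold


# Hypothetical three-input device: the first gate couples twice as strongly.
CAPS = [2.0, 1.0, 1.0]   # input-gate to floating-gate capacitances (arbitrary units)
C0 = 0.0                 # floating-gate to channel capacitance, neglected here
VT = 2.0                 # threshold seen from the floating gate (volts)

strong_only = conducts([5.0, 0.0, 0.0], CAPS, C0, VT)   # potential is 10/4 = 2.5 V
weak_only = conducts([0.0, 5.0, 0.0], CAPS, C0, VT)     # potential is 5/4 = 1.25 V
```

In this sketch the same 5 V input switches the device when applied to the strongly coupled gate but not when applied to a weakly coupled one, which is one way of reading the statement that each input gate can have a different threshold.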
The transistor has been found to operate at low power over wide frequencies but is
10-20% slower than CMOS. The transistor has larger gate capacitance and provides
better optimisation of drive current [122]. In [120] important device characteristics
are measured and a new multiplier design manufactured [140] which has been simulated
up to 500 MHz with nearly constant power consumption of 100 μW over 0-200 MHz.
The device has a large number of gate inputs and this could cause problems in large
chips because of routing them over large distances. The other problem is getting
the oxide layers produced smoothly with the standard CMOS process.
This transistor is like the floating gate transistors used in electrically erasable read
only memory chips (FLASH memory). Using Fowler-Nordheim tunnelling [73] and
clever incremental charge injection, a 256 level per cell analogue storage device has
been manufactured for low cost audio applications [7].
4.5 DAD Logic 
In this section, I will introduce a new concept in digital design which should be 
suitable for implementation in the standard CMOS logic technology. 
The major driving force of this concept is its use of components to the full; unlike
CMOS, which shunts current between two thresholds, my concept uses the behaviour
of the transistor to reduce component count in the same way as analogue electronics
perform their functions. In CMOS digital circuits, the transistor is operating in the
saturation regime to perform the switching function. In analogue electronics the
designer can use the sub-threshold region to perform functions which can consume
low power [89] by utilising the transistor's nonlinearity which arises from its physical
structure. These analogue components often outperform the equivalent function
implemented using the digital technique in terms of power, speed and area. This is
despite the individual transistors being larger. What is more surprising is that the
analogue multiplier based on Gilbert's translinear loop was invented in 1968 and,
using bipolar technology, could operate at 1 GHz with four transistors [37]. It
has taken thirty years for digital technology to approach this speed, whilst analogue
design has improved commensurately with the improvements in fabrication and device
scaling. Another reason for using analogue CMOS technology is that digital techniques
at sub-micron technologies resemble those of analogue because of the need
to amplify useful signals in a torrent of noise. It looks like our approximation of a
switch is becoming more difficult to work with, because of the crosstalk generated
in the wiring and the reduced voltage thresholds required.
However, if designers worked entirely in analogue we would be plagued by noise
and signal recovery would not be possible. DAD Logic is my approach to solve
this conundrum. Digital-Analogue Duality Logic uses analogue building blocks
operating over a continuum of voltages and currents to generate two or more discrete
logic levels. Notice that there are similarities between analogue and digital
methodologies and the correspondence principle in Quantum Mechanics. In actual
fact, my work is directly inspired by quantum mechanical theory.
My idea is to use DAD Logic to create discrete equidistant levels and to confine 
analogue values to these levels. This acts like the hysteresis in digital circuits acting 
on small drifts in the input voltages to pull the output to one of two values. Noise 
immunity is achieved providing the logic can "lock" to the relevant level in the 
presence of noise. Another name for this process is a quantiser, whose function is 
shown diagrammatically in figure 4.8. 





Figure 4.8: Functional Diagram of DAD Logic Quantiser Operation 
Figure 4.8 shows a linear ramp waveform being applied to the quantiser function 
and the output waveform is discontinuous with equidistant values. Notice that 
the quantiser function resembles a staircase and input values which vary over the 
vertical parts of the step function become horizontal at the output. The approach 
shown above resembles work done on a multilevel neural network for Analogue to 
Digital conversion [153] and also on an implementation using resonant tunneling 
diodes [74]. The first reference implements each step by additional circuitry with 
variable offsets, whilst the latter implements it with a single device but in a different 
technology. My approach is to sense the input value and home in on the nearest 
discrete value; consequently this device is scalable to many levels with no increase in 
circuitry. The real beauty and power of this approach is that it allows designers to 
use analogue circuits to perform the same function as do complex digital algorithms. 
Some examples of complex digital algorithms are those required by exponential, 
division, multiplication and natural logarithm operators. 
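The staircase behaviour described for figure 4.8 can be mimicked in software. The following is a minimal sketch of the input-output relationship only; it says nothing about how an analogue circuit would home in on a level. The function names and the step size are assumptions made for illustration.

```python
import math


def quantise(value, step=1.0):
    """Return the multiple of `step` nearest to `value` (exact halves round upwards)."""
    return step * math.floor(value / step + 0.5)


def staircase(samples, step=1.0):
    """Apply the quantiser to every sample of an input waveform."""
    return [quantise(v, step) for v in samples]


ramp = [i / 10 for i in range(31)]   # linear ramp from 0.0 to 3.0
steps = staircase(ramp)              # discontinuous output with equidistant values
noisy = quantise(2.0 + 0.3)          # a small disturbance is pulled back to level 2.0
```

Any input that stays within half a step of a level is returned to that level, which is the "locking" property that the noise immunity argument relies on.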
In the forthcoming sections, I will delve into the internal structure of the DAD Logic 
functional block. 
4.5.1 Mathematical Theory behind DAD Logic
DAD Logic requires a mathematical model to converge a range of values to dis-
crete ones. Luckily mathematicians have already discovered the tools to do this: 
differential equations. The solutions of these equations are known as characteristic 
values (eigenvalues) and characteristic functions (eigenfunctions). In most electronic 
engineering applications, we normally use initial value differential equations which
have many solutions. The differential equations suitable for DAD Logic are the 
ones which have conditions imposed at the ends of a medium. A classic example is 
a vibrating string tied at its ends having frequencies which are related by integer 
multiples of the lowest frequency (the fundamental). These equations are known as 
boundary value differential equations. 
The special class of functions needed for DAD Logic are the ones which have two
boundary values (or limits); the equations governing this behaviour are described by
Sturm-Liouville Theory².
² Named after Jacques Charles François Sturm (1803-1855), a Swiss mathematician who worked
in France with his friend Joseph Liouville (1809-1882), a French mathematician.
A second order linear differential operator L defined on the interval a < x < b is
said to be in self-adjoint form if
$$L = \frac{d}{dx}\left[p(x)\,\frac{d}{dx}\right] + q(x)$$
where p(x) is any function with continuous first order derivatives such that p(x) > 0
or p(x) < 0 for all x in the interval a < x < b, and q(x) is an arbitrary continuous
function on the interval a < x < b.
An equation of the form
$$Ly = \lambda y$$
is called an eigenvalue equation of the differential operator L. This equation has a
trivial solution when y = 0 and "useful" solutions when y is not equal to zero. The
latter solutions only exist for certain values of λ, which are called eigenvalues of L.
To each such eigenvalue there may be one or more solutions y_λ, which are called
eigenfunctions corresponding to the eigenvalue λ.
The generalised Sturm-Liouville problem, or system, is a second order homogeneous
linear differential equation of the form
[equation not recoverable from the source]
defined on the interval a < x < b, along with a pair of homogeneous boundary
conditions such that the eigenfunctions corresponding to distinct eigenvalues for the
adjoint operator are orthogonal. The parameter w(x) is called the weight function
which is assumed positive for all x in the interval.
The most important property which will be utilised is that the eigenvalues are real
and can be arranged in an ascending order of magnitude, labelled by an index of
the eigenvalue. This index is the number of zeros on the open interval a < x < b for
the particular eigenfunction. The other important construct, which will be of use to
find an eigenvalue given an estimated one, is the Differential Green's Identity.
Suppose that in a regular Sturm-Liouville problem on an interval a < x < b the
coefficient functions p(x), q(x) and w(x) depend smoothly on some parameter ν.
Here we take this to mean that the corresponding partial derivatives with respect
to ν are continuous in both x and ν. Let
u be, for each ν, a solution of the parametrised Sturm-Liouville differential equation
and let u depend smoothly on x and ν; and likewise for the other solution to the
equation, pu'. Then
[Equation (4.1), the differential Green's identity, is only partly legible in the source. The surviving fragments are an integral $\int_c^d u^2 w\,dx$ on the left hand side and the term $pu'\,\frac{\partial u}{\partial\nu} - u\,\frac{\partial (pu')}{\partial\nu}$ on the right hand side.]
The limits c and d are valid across the interval a < x < b and are most likely to be
the interval limits themselves.
After this brief mathematical tour, it is time to look at some useful differential
equations which have unique eigenvalues which are suitable for DAD Logic.
4.5.2 Harmonic Oscillator
My first attempt at finding a suitable differential equation having equi-spaced
states (levels) was the one dimensional quantum harmonic oscillator. This is the
Schrödinger equation of a particle trapped in a parabolic well with the constant
factors set to unity.
$$\frac{d^2\phi(x)}{dx^2} + \left(E - x^2\right)\phi(x) = 0 \qquad (4.2)$$
This is subject to the boundary conditions -∞ < x < +∞, and at these limits the
wavefunction should be zero. Equation 4.2 can be transformed to the following
equation, which has power series solutions.
$$H''(\xi) - 2\xi\,H'(\xi) + (\epsilon - 1)\,H(\xi) = 0 \qquad (4.3)$$
The solutions for equation 4.3 are known as Hermite polynomials and have eigenvalues
of ε = E = 2n + 1 with n = 0, 1, 2, .... Consequently, from a simple equation,
equally spaced levels result and are shown graphically in figure 4.9. This figure
shows the solutions to equation 4.2 with φ(x) and |φ(x)|² plotted against x and the
polynomials plotted relative to their respective energy levels or eigenvalues. Note
that the smallest eigenvalue (n = 0) is not zero.
Figure 4.9: Stationary state solutions for the Harmonic Oscillator Potential: (a) Eigenfunctions of Oscillator; (b) Probability of finding particle
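The equal spacing of the levels can be checked numerically. The sketch below integrates equation 4.2 outwards from x = 0 with a fourth order Runge-Kutta scheme and bisects on E until the solution no longer diverges at a finite cut-off. The cut-off x_max = 5, the step count, the bracketing intervals and the function names are all assumptions chosen for illustration; this is ordinary digital computation and not a proposal for how DAD Logic would find a level.

```python
def shoot(energy, parity, x_max=5.0, steps=2000):
    """Integrate phi'' = (x^2 - E) phi from 0 to x_max and return phi(x_max)."""
    h = x_max / steps
    # Even states start with phi(0)=1, phi'(0)=0; odd states with phi(0)=0, phi'(0)=1.
    phi, dphi = (1.0, 0.0) if parity == 0 else (0.0, 1.0)
    x = 0.0

    def rhs(x, phi, dphi):
        return dphi, (x * x - energy) * phi

    for _ in range(steps):
        k1 = rhs(x, phi, dphi)
        k2 = rhs(x + h / 2, phi + h / 2 * k1[0], dphi + h / 2 * k1[1])
        k3 = rhs(x + h / 2, phi + h / 2 * k2[0], dphi + h / 2 * k2[1])
        k4 = rhs(x + h, phi + h * k3[0], dphi + h * k3[1])
        phi += h / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
        dphi += h / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
        x += h
    return phi


def eigenvalue(n, tol=1e-9):
    """Bisect on E inside an assumed bracket around the n-th level."""
    lo, hi = 2 * n + 0.5, 2 * n + 1.5
    f_lo = shoot(lo, n % 2)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = shoot(mid, n % 2)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


levels = [eigenvalue(n) for n in range(4)]
spacings = [b - a for a, b in zip(levels, levels[1:])]
```

With these settings the computed levels should lie close to 2n + 1 and the spacings close to 2, in line with the eigenvalues quoted above.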
In the following section, I will introduce another differential equation system which
has the ability of generating equidistant levels with sign. This type of system would
be very useful to implement in signed number arithmetic transforms using DAD
Logic.
4.5.3 Bipolar Level System
Bipolar levels require a differential equation whose solutions are periodic. Thus
trigonometric functions must be involved somewhere. Expressing the three dimensional
Schrödinger equation in spherical coordinates, with a central force potential, gives
$$\frac{1}{r^2}\frac{\partial}{\partial r}\left(r^2\frac{\partial\psi}{\partial r}\right) + \frac{1}{r^2\sin\theta}\frac{\partial}{\partial\theta}\left(\sin\theta\,\frac{\partial\psi}{\partial\theta}\right) + \frac{1}{r^2\sin^2\theta}\frac{\partial^2\psi}{\partial\phi^2} + k^2 f(r)\,\psi = 0$$
This equation can be separated into three variables R(r)Θ(θ)Φ(φ). The radial part,
which includes the central potential and the energy, is independent from the polar
dependent variables and is unimportant for DAD Logic. Manipulating the equation gives
$$\frac{d^2\Phi}{d\phi^2} = -\alpha^2\,\Phi$$
$$\sin\theta\,\frac{d}{d\theta}\left(\sin\theta\,\frac{d\Theta}{d\theta}\right) + \beta\sin^2\theta\;\Theta - \alpha^2\,\Theta = 0$$
Note the two separation constants (α² and β). It turns out that the last two
equations appear in quantum mechanics as angular momentum operators. These
equations can be expressed as eigenvalue equations as follows :-





Here β is defined to be l(l + 1) and the L_z operator is independent of θ. Also l is
an integer belonging to l = 0, 1, 2, ... and m is an integer too with these ranges:
m = 0, ±1, ±2, .... One last point is that Φ must be periodic with a period of 2π
so that Y is single valued. What we have stumbled across is the Legendre and
Associated Legendre equations and their polynomial solutions.
$$(1 - x^2)\frac{d^2 P_n^m(x)}{dx^2} - 2x\,\frac{dP_n^m(x)}{dx} + \left[n(n+1) - \frac{m^2}{1 - x^2}\right]P_n^m(x) = 0 \qquad (4.4)$$
The Associated Legendre differential equation is defined as shown in equation 4.4
subject to the boundary condition -1 ≤ x ≤ 1. The polynomials are represented
by P_n^m(x) and the equation becomes the Legendre equation if m = 0. The value of
m is confined to the range -n ≤ m ≤ n. In polar coordinates, spherical harmonics
arise which contain associated Legendre polynomials as a function of cos θ.
On occasion, DAD Logic may require bipolar integers without a zero value, but does
such a condition exist? It so happens that a four dimensional Laplace's equation
will generate the required equation [98], or, by reformulating angular momentum in
[the remainder of this sentence and equation 4.5 are not recoverable from the source; the surviving fragments show a $1/\sin^2\beta$ factor and second derivatives with respect to α and γ]
Equation 4.5 is subject to 0 < α < 2π, 0 < β < π and 0 < γ < 2π and instead of
associated Legendre polynomials, hypergeometric polynomials now arise subject to
a differential equation [24] in β.
$$\left[\frac{d^2}{d\beta^2} + \cot\beta\,\frac{d}{d\beta} - \frac{m^2 + k^2 - 2mk\cos\beta}{\sin^2\beta} + l(l+1)\right] d^{\,l}_{mk}(\beta) = 0 \qquad (4.6)$$
The result of equation 4.6 is that l now can take integer and half integer values and
m (and k) has 2l + 1 values ranging from -l to l. The eigenfunctions of the equation
now describe a symmetric top with m being the component of angular momentum
along the body fixed symmetry axis and k is the component along an axis fixed
in space. For DAD Logic it is useful to allow m = k and thus reduce the number
of parameters to l and m. Here l selects the required range and m is the actual
quantised values.
Thus bipolar equidistant values are obtained with the option of including zero,
dependent on a parameter which can take integer or half integral values.
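The effect of choosing an integer or a half integer value of l can be listed explicitly. The short sketch below enumerates the 2l + 1 values of m using exact fractions; the function name is an assumption made for illustration.

```python
from fractions import Fraction


def quantised_levels(l):
    """Return the 2l + 1 values m = -l, -l + 1, ..., l for integer or half integer l."""
    l = Fraction(l)
    if l < 0 or (2 * l).denominator != 1:
        raise ValueError("l must be a non-negative integer or half integer")
    count = int(2 * l) + 1
    return [-l + i for i in range(count)]


integer_case = quantised_levels(1)                # -1, 0, 1
half_case = quantised_levels(Fraction(3, 2))      # -3/2, -1/2, 1/2, 3/2
```

For integer l the list contains zero, whereas for half integer l the levels are still equidistant and symmetric about zero but zero itself is absent.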
4.5.4 Value Determination through Approximation
In this part, I will investigate some methods which I believe to show promise in
the following problem. Given an estimate of an eigenvalue, converge to the actual
(closest) eigenvalue as quickly and accurately as possible. This is the most crucial
aspect of DAD Logic and unfortunately it is the part which still needs the most
research.
Here, we will look at analogue methods using finite element techniques and then
move onto two approaches which allow convergence to an eigenvalue. These last
two approaches are based on programmed digital computation.
4.5.4.1 Analogue Eigenvalue Methods
As mentioned in the last section, differential equations are modelled using finite
difference techniques. So what are they? Imagine a horizontal line and a physical
phenomenon which varies smoothly along it. Discretise this line into equal subsections
having a width δ. Label any three neighbouring nodes 2, 0 and 1 going from left to
right. Now take the transient or steady-state value of the dependent variable (the
physical phenomenon) and call it y. At each of the adjacent nodes, the value is y_0, y_1
and y_2. The first derivative (gradient) of this dependent variable with respect to
x is approximated by taking the difference between the value of y at two adjacent
nodes and dividing by δ (the node spacing). The approximations at the mid-points are
$$\left(\frac{dy}{dx}\right)_{x=1-0} = \frac{y_1 - y_0}{\delta}$$
$$\left(\frac{dy}{dx}\right)_{x=0-2} = \frac{y_0 - y_2}{\delta}$$
The first of these equations is known as the forward difference approximation
and the latter is known as the backward difference approximation. Now, the
first derivative at node 0 is obtained by averaging the first derivatives at the mid-points
between 1 and 0 and then between 0 and 2 and is as follows
$$\left(\frac{dy}{dx}\right)_{x=0} = \frac{1}{2}\left[\frac{y_1 - y_0}{\delta} + \frac{y_0 - y_2}{\delta}\right] = \frac{y_1 - y_2}{2\delta}$$
The second derivative of y with respect to x is defined as the rate of change of the
first derivative and is the difference between the first derivatives at the midpoints
divided by the grid spacing. So
$$\left(\frac{d^2y}{dx^2}\right)_{x=0} = \frac{1}{\delta}\left[\left(\frac{dy}{dx}\right)_{x=1-0} - \left(\frac{dy}{dx}\right)_{x=0-2}\right] = \frac{y_1 + y_2 - 2y_0}{\delta^2}$$
This approach can be extended to many dimensions, which is required for partial
differential equations.
Now, analogue methods use differential operator discretisation and set the values by
using electrical circuit elements. These elements are the components which are
between the nodes in the approximation. This approach has been demonstrated
effectively by [9, 70] but requires manual adjustment of the component values, which
is incredibly slow. Also, the network is large and requires proper termination for
equations which have boundaries set to infinity. However, they can perform multiple
variable operation simultaneously, which occurs with partial differential equations.
Other useful works on analogue computation are [57, 64, 65, 136, 148].
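The difference formulas of this subsection are easy to check numerically. The sketch below uses the node labelling from the text (node 2 on the left, node 0 in the centre, node 1 on the right) and applies it to y = sin(x); the function names, the test function and the spacing are assumptions chosen for illustration.

```python
import math


def first_derivative(y2, y1, delta):
    """Average of the forward and backward differences at node 0: (y1 - y2) / (2 delta)."""
    return (y1 - y2) / (2.0 * delta)


def second_derivative(y2, y0, y1, delta):
    """Difference of the two mid-point gradients divided by the spacing."""
    return (y1 + y2 - 2.0 * y0) / delta ** 2


# Nodes 2, 0 and 1 sit at x0 - delta, x0 and x0 + delta respectively.
x0, delta = 1.0, 1e-3
y2, y0, y1 = math.sin(x0 - delta), math.sin(x0), math.sin(x0 + delta)

gradient = first_derivative(y2, y1, delta)        # compare with cos(x0)
curvature = second_derivative(y2, y0, y1, delta)  # compare with -sin(x0)
```

Both estimates carry an error proportional to δ², so halving the node spacing should reduce the error by roughly a factor of four until rounding error takes over.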
4.5.4.2 The Shooting Method
Before plunging into this method, additional theory is required. Firstly, any high
order linear differential equation can be reduced to many lower order versions. This
is very useful because Sturm-Liouville theory predicts that eigenvalue determination
requires only a first order differential equation; its eigenfunctions are obtained using
quadrature and the other first order differential equation.
The Prüfer transformation transfers cartesian based second order ordinary differential
equations into polar form. The resulting equations in r and θ are as follows
$$\frac{d\theta}{dx} = \frac{1}{p}\cos^2(\theta) + (\lambda w - q)\sin^2(\theta) \qquad (4.7)$$
$$\frac{1}{r}\frac{dr}{dx} = \left[\frac{1}{p} - (\lambda w - q)\right]\sin(\theta)\cos(\theta) \qquad (4.8)$$
The regular boundary conditions at a and b are transformed to the conditions θ(a) =
α and θ(b) = β. Here α = arctan(a_2/a_1) and β = arctan(b_2/b_1). The a_n and b_n arise
because second order equations have two solutions and thus two coefficients exist
at each of the boundary conditions.
So why transform the equation into Prüfer form? In this form, there is only one
solution for any choice of α and β. Therefore, we can choose these coefficients in such
a way as to make the solution of the differential equation the desired eigenvalue.
Another useful point is that α and β are determined up to a multiple of π. An
appropriate choice of this multiple of π determines the number of times around the
origin the eigenfunction travels, and becomes the number of zeros in the solution.
The principle of a shooting method for finding an eigenvalue of a regular Sturm-
Liouville problem is quite simple. The differential equation is solved as an initial
value problem in some form, here Prüfer, over the range a < x < b (in pre-transformed
form) for a succession of trial values of λ; these are automatically
adjusted till the boundary conditions at both ends are satisfied at once. At this
point, the eigenvalue is found. Generally, we may want to shoot from the two ends
to a point somewhere in the middle. To do this requires the definition of both a
left hand and a right hand boundary condition. The left hand condition, for the
solution U_L(x, λ) at x = a, is
$$p\,\frac{du}{dx} = a_1, \qquad u = a_2$$
For the right hand boundary, replace a with b. At the matching point c, a function D(λ)
is defined and is known as a miss-distance function; it describes the extent to
which the boundary condition fails to be satisfied at that point. The eigenvalues
are the zeros of this function. A natural choice for the miss-distance function is the
Wronskian determinant and is
$$D(\lambda) = p\,\frac{dU_L(c,\lambda)}{dx}\,U_R(c,\lambda) - p\,\frac{dU_R(c,\lambda)}{dx}\,U_L(c,\lambda) \qquad (4.9)$$
Unfortunately, the miss-distance equation 4.9 is always an oscillating function and
causes problems with root finding algorithms. However, these problems are solved
elegantly with Prüfer equations 4.7 and 4.8. In this case the boundary conditions
become θ(a, λ) = α and θ(b, λ) = β + kπ.
The following algorithms summarise the shooting method for eigenvalue determination.
First I give the miss-distance computational algorithm.
Algorithm 1 Miss-distance computation
Input: Coefficient functions
       Initial value of λ
       An accuracy specification
Output: Miss-distance D(λ)
        Information on its accuracy
Method:
(1) Choose an appropriate point, c, to shoot to. This may
    be done on the basis of a preliminary analysis and can be
    subsequently adjusted.
(2) Set up the boundary values of θ(a) and θ(b)
(3) Integrate the θ equation to compute the left hand solution
    (θ_L) and the right hand solution (θ_R)
(4) Set up the miss-distance to be the left hand solution minus the right
    hand solution at point c (D(λ) = θ_L(c) - θ_R(c))
Next is the eigenvalue determination process.
Algorithm 2 Root finding for eigenvalue determination
Input: Index k of the required eigenvalue λ
       Initial guess(es) for λ
Output: λ
        Error estimation information
Method:
(1) Solve D(λ) = kπ with an appropriate root finder
Note that the k'th eigenvalue λ_k is the unique value such that the miss-distance
equation is
$$D(\lambda_k) = k\pi$$
for k = 0, 1, 2, ...
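Algorithms 1 and 2 can be sketched in a few lines for a simple test problem. The example below takes p = 1, q = 0, w = 1 on 0 < x < π with u = 0 at both ends, for which the eigenvalues are known to be (k + 1)². It assumes the Prüfer angle is defined by tan θ = u/(pu'), so that α = 0 and β = π, integrates equation 4.7 with a fixed-step fourth order Runge-Kutta scheme and uses bisection as the root finder, relying on the miss-distance increasing with λ. All of these choices, and every function name, are assumptions made for illustration and are not the codes referred to in [111].

```python
import math


def integrate_theta(lam, x_start, x_end, theta_start, p, q, w, steps=800):
    """Integrate equation 4.7 from x_start to x_end (backwards if x_end < x_start)."""
    h = (x_end - x_start) / steps

    def rhs(x, theta):
        return math.cos(theta) ** 2 / p(x) + (lam * w(x) - q(x)) * math.sin(theta) ** 2

    x, theta = x_start, theta_start
    for _ in range(steps):
        k1 = rhs(x, theta)
        k2 = rhs(x + h / 2, theta + h / 2 * k1)
        k3 = rhs(x + h / 2, theta + h / 2 * k2)
        k4 = rhs(x + h, theta + h * k3)
        theta += h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
        x += h
    return theta


def miss_distance(lam, a, b, c, alpha, beta, p, q, w):
    """Algorithm 1: D(lambda) = theta_L(c) - theta_R(c)."""
    theta_left = integrate_theta(lam, a, c, alpha, p, q, w)
    theta_right = integrate_theta(lam, b, c, beta, p, q, w)
    return theta_left - theta_right


def find_eigenvalue(k, lam_lo, lam_hi, a, b, c, alpha, beta, p, q, w, tol=1e-10):
    """Algorithm 2: solve D(lambda) = k*pi by bisection on an assumed bracket."""
    target = k * math.pi
    while lam_hi - lam_lo > tol:
        lam_mid = 0.5 * (lam_lo + lam_hi)
        if miss_distance(lam_mid, a, b, c, alpha, beta, p, q, w) < target:
            lam_lo = lam_mid
        else:
            lam_hi = lam_mid
    return 0.5 * (lam_lo + lam_hi)


def unit(x):
    return 1.0


def zero(x):
    return 0.0


def example_eigenvalue(k):
    """Test problem -u'' = lambda u on (0, pi) with u(0) = u(pi) = 0."""
    return find_eigenvalue(k, 0.0, 100.0, 0.0, math.pi, math.pi / 2,
                           0.0, math.pi, unit, zero, unit)
```

Calling example_eigenvalue(k) for k = 0, 1, 2, 3 should return values close to 1, 4, 9 and 16. The index k selects the eigenvalue directly through the target kπ, so no separate zero counting is needed in this formulation.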
The problem with this approach is how to handle stiffness, which afflicts the θ
equation in parts of the range where λw - q is large and negative.
This gives rise to exponentially growing eigenfunctions which can easily create
numerical overflow or, in the analogue domain, saturation.
In the next subsection, the Pruess method is presented and is constructed in a
similar way to the finite element approach discussed in section 4.5.4.1.
4.5.4.3 The Pruess Method
The Pruess method attempts to solve a problem by reducing it to a simpler problem
which is an approximation of the former problem. This is achieved by using a
piecewise constant mid-point approximation on the Sturm-Liouville coefficients p,
q and w. Thus the coefficients are split up into sub intervals and their value is taken
to be the mid-point value of the sub interval. This is essentially a finite element
approximation and it reminds me of zero order hold, which appears as a sampling
process model in digital signal processing.
Let P, Q and W have values p_i, q_i and w_i respectively in the i'th interval between
x_{i-1} and x_i over i = 1, ..., n. A solution to this approximation of the Sturm-
Liouville problem is of the form
$$U(x) = c_i F_i(x) + d_i G_i(x)$$
where F_i, G_i are independent solutions of
$$\frac{d^2U}{dx^2} + k_i U = 0, \qquad k_i = \frac{\lambda w_i - q_i}{p_i}$$
The explicit form of a solution is the following, provided $\omega_i = \sqrt{|k_i|}$
$$U(x) = U(x_{i-1})\,\Phi_i'(x - x_{i-1}) + (PU')(x_{i-1})\,\Phi_i(x - x_{i-1})/p_i$$
where
$$\Phi_i'(s) = \begin{cases}\cos(\omega_i s) & k_i > 0\\ 1 & k_i = 0\\ \cosh(\omega_i s) & k_i < 0\end{cases}
\qquad
\Phi_i(s) = \begin{cases}\sin(\omega_i s)/\omega_i & k_i > 0\\ s & k_i = 0\\ \sinh(\omega_i s)/\omega_i & k_i < 0\end{cases}$$
Let U_i = U(x_i) and PU'_i = PU'(x_i), then
$$\begin{pmatrix}PU'_i\\ U_i\end{pmatrix} = T_i \begin{pmatrix}PU'_{i-1}\\ U_{i-1}\end{pmatrix}, \qquad i = 1, \ldots, n$$
The transfer matrix is defined as
$$T_i = \begin{pmatrix}\Phi_i'(h_i) & -p_i k_i\,\Phi_i(h_i)\\ \Phi_i(h_i)/p_i & \Phi_i'(h_i)\end{pmatrix}$$
The value of h_i = x_i - x_{i-1}, and T_i^{-1} is equivalent to replacing h_i by -h_i in T_i, which
is needed for the algorithm below.
Algorithm 3 Simple Pruess Algorithm
1. Choose a meshpoint x_m, where 0 < m < N, as a matching point, c
2. Set up initial values (PU'_0, U_0) satisfying the boundary condition at a, and
   initial values (PU'_N, U_N) satisfying the boundary condition at b
3. For a trial value of λ
   (a) For i = 1, ..., m do
       $$\begin{pmatrix}PU'_i\\ U_i\end{pmatrix} = T_i \begin{pmatrix}PU'_{i-1}\\ U_{i-1}\end{pmatrix}$$
   (b) Store the resulting values PU'_L(c), U_L(c)
   (c) For i = N, ..., m + 1 do
       $$\begin{pmatrix}PU'_{i-1}\\ U_{i-1}\end{pmatrix} = T_i^{-1} \begin{pmatrix}PU'_i\\ U_i\end{pmatrix}$$
   (d) Store the resulting values PU'_R(c), U_R(c)
   (e) Form a miss distance D(λ) by comparing PU'_L(c), U_L(c) with PU'_R(c),
       U_R(c)
4. Adjust λ to solve the equation D(λ) = 0
The adjustment of λ requires a zero count procedure. This is used to count the
number of oscillations in the solution and home in on the required eigenvalue. If
the algorithm was recast via the Prüfer transformation, the counting procedure is
intertwined in the θ variable.
To select a relevant mesh, the differential Green's identity as mentioned in equation
4.1 is used as an error control. It is necessary to use the standard idea of
equidistribution to choose a mesh which satisfies the absolute value of this identity.
The first part of the right hand side of equation 4.1 is equal to zero in this case.
The integral on the left hand side is assumed to equal one; this implies a normalised
eigenfunction.
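The transfer-matrix stepping at the heart of the Pruess method can be sketched as follows. The code assumes that on each sub interval k_i = (λw_i - q_i)/p_i, that the state vector is (PU', U), and that the miss distance is formed as the Wronskian-like combination PU'_L U_R - PU'_R U_L at the matching point. It uses a fixed uniform mesh, brackets supplied by hand and simple bisection in place of the zero count and mesh selection described above. These choices and all names are assumptions made for illustration.

```python
import math


def phi(k, s):
    """Phi_i(s): sin, linear or sinh branch depending on the sign of k."""
    if k > 0:
        omega = math.sqrt(k)
        return math.sin(omega * s) / omega
    if k < 0:
        omega = math.sqrt(-k)
        return math.sinh(omega * s) / omega
    return s


def dphi(k, s):
    """Derivative of Phi_i with respect to s."""
    if k > 0:
        return math.cos(math.sqrt(k) * s)
    if k < 0:
        return math.cosh(math.sqrt(-k) * s)
    return 1.0


def step(state, lam, p_i, q_i, w_i, h):
    """Apply the transfer matrix for one interval; a negative h applies its inverse."""
    pu, u = state
    k = (lam * w_i - q_i) / p_i
    return (dphi(k, h) * pu - p_i * k * phi(k, h) * u,
            phi(k, h) * pu / p_i + dphi(k, h) * u)


def miss_distance(lam, mesh, m, p, q, w, left, right):
    """Step from both ends to mesh[m] and compare the two solutions there."""
    for i in range(1, m + 1):
        mid = 0.5 * (mesh[i - 1] + mesh[i])
        left = step(left, lam, p(mid), q(mid), w(mid), mesh[i] - mesh[i - 1])
    for i in range(len(mesh) - 1, m, -1):
        mid = 0.5 * (mesh[i - 1] + mesh[i])
        right = step(right, lam, p(mid), q(mid), w(mid), mesh[i - 1] - mesh[i])
    return left[0] * right[1] - right[0] * left[1]


def bisect_root(f, lo, hi, tol=1e-12):
    """Plain bisection; assumes f changes sign exactly once in [lo, hi]."""
    f_lo = f(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Test problem -u'' = lambda u on (0, pi) with u = 0 at both ends.
MESH = [math.pi * i / 8 for i in range(9)]


def example_miss(lam):
    return miss_distance(lam, MESH, 4, lambda x: 1.0, lambda x: 0.0,
                         lambda x: 1.0, (1.0, 0.0), (1.0, 0.0))


first = bisect_root(example_miss, 0.5, 1.5)
second = bisect_root(example_miss, 3.5, 4.5)
```

Because the coefficients of this test problem are already constant, the piecewise constant approximation is exact here and the two roots should agree with 1 and 4 to rounding error; for variable coefficients the accuracy would depend on the mesh.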
4.5.4.4 Important Issues regarding Eigenvalue Determination
The shooting and Pruess methods mentioned earlier are missing important details.
This was done on purpose because they are generally designed to run on a stored
program digital computer where everything is quantised. The details of these methods,
including FORTRAN source code, can be found in the excellent book [111]
"Numerical Solution of Sturm-Liouville Problems".
I can see that using these methods would make DAD Logic a reality. However, reducing
the algorithms to allow implementation in analogue VLSI elements has not
yet been investigated. It would be beneficial if these algorithms assumed the boundary
conditions were fixed and modifications occurred inside these ranges. However,
this could only work for a restricted range; or an infinite range if a transformation
were possible. The current codes work by changing the boundary conditions in
order to find the eigenvalue.
It would be wonderful if these methods could work in finite time, and convergence
was always the same regardless of the eigenvalue. If a mesh selection is required for
analog VLSI, there is a good chance that the circuitry could perform a constraint
more efficiently than a program, by utilising the physical behaviour of the circuit
elements themselves.
4.6 Summary 
In this chapter, I have highlighted novel VLSI technologies and architectures which
are or may be of benefit to applications using digital signal processing. Gallium-
Arsenide based technology, Josephson Junction SQUIDs and low power designs were
briefly discussed. The rest of the chapter concerned different architectures having
the common goal of encoding multiple values on a single wire, unlike digital which
has only two values.
Current-mode logic primitives were mentioned and then negative differential resistance
based methods were discussed. These included both technological and simulated
approaches. Then voltage based logic was discussed in terms of technological
(Neuron MOSFET) and Operational Amplifier based approaches.
The remainder of the chapter introduced my new idea which I have called DAD
Logic. This is based on eigenvalue solutions to some second order differential equations
which have two point boundary conditions. Methods of solving these equations
were discussed using analogue and two programmed based approaches. The last two
methods have been shown to be very accurate in numerical simulation of a whole
range of problems [111] and it was my idea that this could be harnessed to the needs
of DAD Logic. However, at the time of writing this thesis, an analogue VLSI form
of eigenvalue determination remains elusive. Consequently DAD Logic, which has
enormous potential, will remain a theoretical idea at present.
CHAPTER 5
VLSI requirements for synthesis
In this chapter, the basic structures for performing DSP operations will be discussed.
A CORDIC based sinusoidal generator implemented in the ES2 0.7 μm CMOS process
will be used as a case study. In addition, a bit-serial radix-2 two's complement
recursive sinusoidal generator will be used for comparison.
5.1 DSP Systems Design 
A Digital Signal Processor is a specialised microprocessor optimised for intensive
arithmetic operations, in which all instructions take a single clock cycle. The DSP system
consists of an Arithmetic Logic Unit (ALU), a Multiply-Accumulate unit (MAC),
a program sequencer, data address generators and architectures optimised for fast
memory access for data and instructions [43].
A simplified block diagram of a DSP is shown in figure 5.1, which includes the PMA
(Program Memory Address), DMA (Data Memory Address), PMD (Program
Memory Data) and DMD (Data Memory Data) buses.
5.1.1 Multiplier Constraints
The multiplier (enclosed in the MAC) must operate at a speed comparable to
the RAM access time; this normally implies a full parallel multiplier. The common
number formats are two's complement, unsigned, fractional or integer and mixed-mode
operation. There are tradeoffs between fixed-point arithmetic and floating
point arithmetic and extended or double precision capabilities.
[Figure 5.1 block diagram; legible labels: PMA Bus, DMA Bus, Input Registers, Shifter, Output Registers, Internal Arithmetic Result Bus]
Figure 5.1: A Simplified Structure of a DSP Processor
The multiplier should complete the operation in a single cycle and its Input-Output
registers should be configured for pipelined operation (latched registers) or minimum
latency operation (known as transparent). A pipelined processor has the
capability of maximising hardware usage but with a fixed initial delay or latency.
This initial delay is the time taken for the first datum to pass through all the
processing elements. By using registers in between the functional elements, each
element will be operating in parallel. Therefore, once the pipeline is full, the output
data will appear once every clock cycle.
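The trade-off between initial latency and throughput can be put into numbers with a short sketch. It assumes an idealised pipeline in which every stage takes exactly one clock cycle and no stalls occur; the function names and the four-stage example are assumptions made for illustration.

```python
def pipelined_cycles(stages, items):
    """Cycles needed for `items` data to leave a pipeline of `stages` one-cycle stages."""
    if items == 0:
        return 0
    # The first datum needs `stages` cycles; each later datum follows one cycle apart.
    return stages + items - 1


def unpipelined_cycles(stages, items):
    """Cycles needed when each datum must finish all stages before the next starts."""
    return stages * items


latency = pipelined_cycles(4, 1)          # initial delay of a four-stage pipeline
stream = pipelined_cycles(4, 100)         # 100 data through the same pipeline
sequential = unpipelined_cycles(4, 100)   # the same work without pipelining
```

For long data streams the cost per datum approaches one cycle, while a single isolated datum still pays the full latency of the pipeline.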
5.1.2 MAC Constraints
This part of the DSP is very important and must operate in a single cycle. The
Accumulator should have adequate guard/extension bits to ease the possibility of
overflow during the arithmetic operations. The MAC should detect overflow before
it occurs and allow saturation arithmetic. It should support unbiased rounding and
have internal feedback paths to lessen the overhead on loops.
The unit can either have optimal throughput (pipelined processor) or have transparent
throughput (un-pipelined processor).
5. V L S I r e q u i r e m e n t s for synthes is 81 
5.1.3 ALU Constraints
The Arithmetic Logic Unit should allow adequate accuracy and chain-ability for high
precision computation. This normally implies a bit-slice design. When computing
in double/extended precision the ALU should operate with the minimum lost time.
It would be useful if the ALU provided division and/or multiply capability.
The arithmetic functions which an ALU should have are :-
ADD, SUBTRACT, ADD WITH CARRY, SUBTRACT WITH CARRY and ABSOLUTE VALUE.
The logic functions an ALU should have are :-
AND, OR, EXOR, BARREL SHIFTER and LOGICAL NEGATION.
The status flags the ALU would generate include the following :-
ZERO, EQUAL, LESS THAN, GREATER THAN and CARRY.
The ALU must provide efficient data movement and computation. This is achieved
by either single-cycle data moves with arithmetic operation, or two operands loadable
into the ALU in the same cycle.
The ALU should be able to access a large number of registers configured as a dual-set
for context switching (for data storage during interrupts) or as a register file
for fast access and storage of intermediate results. The ALU instruction set should
include conditional operations, e.g. add and shift left if a flag is set. The ALU
should have adequate feedback pathways to provide an output to input link for an
accumulator or provide I/O links to a shifter and ALU. This module should be
designed to provide pipeline or transparent flow onto the input registers.
5.1.4 Data Address Generator
This module provides data accessing and must have the ability to address in linear,
circular (modulo) and bit-reverse modes. The bit-reverse option is required
for most FFT algorithms. The generator should provide simple logic functions and
masking; this is useful for interpolation and look-up tables. An optional extra would
be a shifter to aid the FFT twiddle factor generation.
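The circular and bit-reverse addressing modes described above can be sketched in a few lines of Python. This is only an illustration of the address arithmetic; the function names and the `base`/`length` buffer description are assumptions and do not correspond to any particular DSP device.

```python
def bit_reverse(index: int, bits: int) -> int:
    """Reverse the lowest `bits` bits of `index` (FFT data reordering)."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (index & 1)  # move the current LSB to the other end
        index >>= 1
    return result


def circular_next(address: int, step: int, base: int, length: int) -> int:
    """Advance `address` by `step`, wrapping inside the buffer [base, base + length)."""
    return base + (address - base + step) % length


# Bit-reversed order for an 8-point FFT (3 address bits)
order = [bit_reverse(i, 3) for i in range(8)]
```

For an 8-point FFT the reordered indices come out as 0, 4, 2, 6, 1, 5, 3, 7, and a circular buffer of length 4 starting at address 4 wraps from address 7 back to address 4.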
5.1.5 Program Sequencer
This module is not pipelined because a pipelined program sequencer will cause
program failure. The main elements of this module are :-
• Program Counter.
• Conditional Logic Tester.
• Stack Counter for subroutine/interrupt return addresses, counter values and
jump addresses.
• Loop Counters.
• Zero Overhead Conditional Branch Capability.
5.1.6 Memory
The memory should be dual-ported and the DSP device should have separate program
and data address and data buses (the Harvard Architecture), accessed via
the Data Address Generators. The memory mechanism would benefit from cache
memory to allow the most recent instructions to be stored on chip, providing a speed
improvement, and from support registers used as a scratch pad, thus reducing access
to external memory.
The memory control unit could have FIFO modules as a means of interfacing between
slower memory and peripherals or between faster processors.
5.1.7 Architecture Summary
A DSP processor requires a controller, interface logic for input and output streams,
storage and arithmetic processing elements. The minimum arithmetic element is a
MAC, though a division element is useful in Durbin recursion, matrix inversion and
ray-tracing. However, for ASIC design, the designer maps the particular algorithm
to the minimum hardware. So division is neglected due to finite space on a silicon
die, but can be implemented using Newton-Raphson iteration.
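As an illustration of how division can be replaced by Newton-Raphson iteration, the following Python sketch computes a reciprocal using only multiplications and subtractions. It assumes the divisor has already been normalised to the range 0.5 to 1 (for example by shifting), and the linear first guess is a commonly used choice rather than anything specified in this thesis.

```python
def reciprocal(d: float, iterations: int = 4) -> float:
    """Approximate 1/d for 0.5 <= d < 1 without a divider."""
    # Assumed linear first guess; its relative error is at most 1/17.
    x = 48.0 / 17.0 - (32.0 / 17.0) * d
    for _ in range(iterations):
        # Newton-Raphson step for f(x) = 1/x - d; the error squares each pass.
        x = x * (2.0 - d * x)
    return x


def divide(n: float, d: float) -> float:
    """n / d formed as a multiplication by the reciprocal."""
    return n * reciprocal(d)
```

Because the error squares on every pass, a handful of iterations is enough for typical fixed-point word-lengths, which is why a MAC-only datapath can do without a dedicated divider.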
Provided the algorithm computes operations regularly in time, having an almost
systolic nature, the designer can implement it using arithmetic logic elements and
a very simple controller. It is also convenient to assume that the completed device
would perform better than an equivalent implementation on general purpose DSPs.
Alternatively, a DSP core with restricted functionality could be built and the algorithm
coded in software stored on chip. This is known as microcode. However, this
thesis is concerned with full-custom ASIC designs.
Useful design criteria and analysis can be found in [81] for general purpose and
custom DSP design. This book goes into some detail on multi-processor design and
arithmetic processor design using flowgraphs and clock retiming. Unfortunately
there are errors in diagram labelling and missing references.
5.2 Datapath length criteria 
Before the comparative study between CORDIC and a serial based recursive oscillator,
analysis of the size of the data is required. Two's complement arithmetic will
be used throughout this investigation.
In chip design, the designer must ensure that arithmetic operations do not overflow
and produce the wrong answer in the arithmetic sense. In floating point arithmetic,
the designer is restricted by the available bits allocated to the exponent and
mantissa. Hence some sort of arithmetic truncation or rounding is required. Word-length
increases for addition and multiplication are as follows :-
1. For N additions, the designer allows $\log_2 N$ extra bits in the accumulator.
This will guarantee no overflow.
2. Multiplication of two N-bit numbers produces a 2N-bit result. The output
could be rounded or truncated, but computation of integrals or sums of
products would then accumulate numerical errors.
The number of extra bits needed to compute integrals of nearly full-scale numbers
without overflow can be determined from the following cases :-
• All the numbers have the same large value, so no cancellation occurs. Therefore
the integral can be up to N times as large as any number, requiring $\log_2 N$
bits extra.
• The inputs are white noise. Therefore the integral would grow by
$\sqrt{N}$ and hence requires $\log_2 \sqrt{N}$ bits extra.
• The input is assumed to be sinusoidal of 0.99 full-scale. There would be no
overflow, but there would be a slow drift caused by the build-up of truncation
or rounding errors.
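The worst-case guard-bit rule above can be checked numerically. The Python sketch below is an illustration only; the helper names are assumptions, and the fractional bit count is rounded up to the next integer because hardware registers have whole bits.

```python
def guard_bits(num_additions: int) -> int:
    """Extra accumulator bits for N worst-case additions: ceil(log2(N))."""
    return (num_additions - 1).bit_length()


def fits_signed(value: int, bits: int) -> bool:
    """True if `value` is representable as a `bits`-bit two's complement word."""
    return -(1 << (bits - 1)) <= value < (1 << (bits - 1))


# 256 additions of full-scale 16-bit samples need 8 guard bits
n, b = 256, 16
g = guard_bits(n)
worst_negative = n * -(1 << (b - 1))
worst_positive = n * ((1 << (b - 1)) - 1)
```

With 256 full-scale 16-bit samples the worst-case sums fit exactly in a 24-bit accumulator and no longer fit when one guard bit is removed.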
5.2.1 Overflow and underflow reduction
The designer can decide to allow intermediate products to be stored in an accumulator
register having more precision than the input and output registers. This
reduces the chance of overflow in additions. Double precision is the term given to
an accumulator output which has two times the number of bits as the input and/or
output. Extended precision is a compromise, having fewer bits than double precision,
but allowing accumulation of $2^g$ additions. Hence for two b-bit numbers, extended
precision has b + g bit storage. Therefore, the least significant part of the multiplication
is neglected. This is advantageous because the accumulator's word-length is
reduced, requiring fewer transistors to design it, and DSP operations are noise tolerant.
This can be seen in equation form by writing a b-bit binary number as [HL],
where H and L are $(b/2)$-bit binary numbers, and rewriting the multiplication in
double precision format:
$$[HL]_1 \times [HL]_2 = (2^{b/2} H_1 + L_1) \times (2^{b/2} H_2 + L_2)$$
$$= 2^{b} H_1 H_2 + 2^{b/2} (H_1 L_2 + H_2 L_1) + L_1 L_2$$
Note that $2^{b/2}$ is a $(b/2)$-bit left-shift and the partial product $L_1 \cdot L_2$, being the least
significant part, can be ignored. This trick can also be used with extended precision
numbers to implement FIR filters and store the output as an extended precision
number.
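The effect of dropping the least significant partial product can be illustrated with a short Python sketch. It uses unsigned operands for simplicity, which is an assumption made here to keep the example short; the function name is hypothetical.

```python
def split_multiply(a: int, c: int, b: int = 16):
    """Multiply two unsigned b-bit words, returning (exact, approximate).

    Each operand is split into high and low halves of b/2 bits; the
    approximate product omits the L1 * L2 partial product.
    """
    half = b // 2
    mask = (1 << half) - 1
    h1, l1 = a >> half, a & mask
    h2, l2 = c >> half, c & mask
    exact = a * c
    approx = ((h1 * h2) << b) + ((h1 * l2 + h2 * l1) << half)
    return exact, approx


exact, approx = split_multiply(0xBEEF, 0x1234)
error = exact - approx  # equals L1 * L2, always below 2**b
```

The discarded term is bounded by $2^b$, so it only disturbs the lower half of the double-length product.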
5.2.2 Truncation, Rounding and Unbiased Rounding
There are three methods of quantising an arithmetic operation to smaller word
lengths. They are truncation, rounding and unbiased rounding. In this analysis, we
assume that the number representation is fractional in value.
Truncation: The least significant bits are neglected. This systematically
underestimates all values.
Rounding: If the least significant bits are less than half full-scale, the output
becomes the lower value. However, if they are half-scale or more,
the output is rounded upwards.
Unbiased rounding: When a number falls at the half-way point, round the number
upwards or downwards with equal probability.
One mechanism is to round towards an even number; this relies
on the random distribution of even and odd numbers. This
results in a skew in the even-odd distribution, but an averaging
of the round-off errors.
Normal (biased) rounding can be thought of as rounding the half-way case towards $+\infty$.
Incidentally, in floating point (IEEE 754) rounding is unbiased, with the options of directed
rounding towards $+\infty$, towards $-\infty$ and towards 0. Unbiased rounding has superior
performance compared to truncation because the process chooses the nearest
number to the unrounded result. This gives a zero mean, symmetrical error distribution.
Truncation always throws away information and underestimates the result;
consequently the error distribution is skewed negatively and has a non-zero mean.
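The three quantisation methods can be compared with a small Python sketch operating on two's complement integers, where `shift` is the number of least significant bits removed. The function names are assumptions, and round-to-even is used as the unbiased mechanism.

```python
def truncate(x: int, shift: int) -> int:
    """Drop the low bits; an arithmetic shift always rounds towards -infinity."""
    return x >> shift


def round_half_up(x: int, shift: int) -> int:
    """Add half an output LSB, then truncate (half-way cases go upwards)."""
    return (x + (1 << (shift - 1))) >> shift


def round_half_even(x: int, shift: int) -> int:
    """Round to nearest; exact half-way cases go to the even neighbour."""
    q = x >> shift
    rem = x & ((1 << shift) - 1)
    half = 1 << (shift - 1)
    if rem > half or (rem == half and (q & 1)):
        q += 1
    return q


def mean_error(quantise, shift: int, values) -> float:
    """Average of (quantised - exact) in units of the output LSB."""
    values = list(values)
    return sum(quantise(v, shift) - v / (1 << shift) for v in values) / len(values)
```

Over a uniformly distributed input, truncation shows a mean error of nearly minus half an output LSB, ordinary rounding a small positive bias, and round-to-even a zero mean.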
5.2.3 Saturation Arithmetic
Saturation arithmetic is a digital equivalent to analogue based clipping and is advan-
tageous because, if not invoked, word wrap would occur, producing horrible glitches 
in the sampled data stream. Word wrap occurs naturally in binary arithmetic when 
overflow occurs. These glitches can cause alias components in the spectrum which 
would be difficult to remove. This mechanism is useful on outputs of MACs to en-
sure no wraparound. However if the designer knows the maximum lengths a priori, 
this extra element is not required. 
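The difference between word wrap and saturation can be seen in the following Python sketch, which models a two's complement register of a given width. The function names are assumptions chosen for this illustration.

```python
def wrap(value: int, bits: int) -> int:
    """Result left in a `bits`-bit two's complement register after overflow."""
    value &= (1 << bits) - 1
    if value >> (bits - 1):          # sign bit set: reinterpret as negative
        value -= 1 << bits
    return value


def saturate(value: int, bits: int) -> int:
    """Clip to the most positive or most negative representable value."""
    highest = (1 << (bits - 1)) - 1
    lowest = -(1 << (bits - 1))
    return max(lowest, min(highest, value))


overflowed = 32767 + 1               # one step beyond 16-bit full scale
wrapped = wrap(overflowed, 16)       # jumps to the most negative value
clipped = saturate(overflowed, 16)   # stays at positive full scale
```

The wrapped result jumps from positive full scale to negative full scale, which is the glitch described above, whereas the saturated result simply clips.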
5.2.4 Other noise problems 
In addition to arithmetic truncation and rounding, errors occur depending on how
the input and output are quantised. Another problem is that the error caused by IIR
filters is worse than that of an equivalent FIR filter because of feedback. Also, IIR filters
have a dead-band in the feedback loop which can cause instability by oscillating at
low amplitude. This is known as a limit cycle and is caused by rounding of near-unity
coefficients to unity. However, a sinusoidal oscillator with constant amplitude
can be constructed around a fixed-point IIR filter limit cycle.
The process of sampling a continuous time analog signal to its digital form generates
quantization noise because the signal is rounded or truncated to fit the digital word-length
and the error is half the quantization step size. The actual error range for




where b is the word-length in bits of a binary b-bit A/D converter. This variance
is used as a benchmark against which all errors due to computation and coefficient 
are compared. Therefore, as soon as the total output error grows beyond this limit, 
information is lost because of the build up of noise. 
The number of bits b allocated to the digital signal relates to the signal-to-noise ratio
by approximately 6b dB. Also, coefficient quantization is paramount in IIR filters,
especially for the poles of the filter because they determine the filter's stability and
frequency response.
Arithmetic errors caused by addition and overflow require careful scaling between
accumulation and multiplication steps. An additional error is caused by the filter's
own topology. The total noise generated by a filter having N + 1 numerator elements
and M denominator elements, where the denominator of the overall transfer function
is $D(e^{j\omega T_s})$, is shown in equation 5.1.
' - ^ ' • » ( £ ) s f B £ = > 
Using equation 5.1, the designer can estimate the number of bits required to achieve 
accurate computation of filter and numerical algorithms. Floating-point arithmetic 
has a large dynamic range so scaling is not necessary. However, normalisation before 
addition, and multiplication are susceptible to noise accumulation. 
5.3 Filter based Sine Generator 
A sinusoidal generator may be constructed from a discrete-time filter implemen-
tation, having a sinusoidal impulse response. The filter topology chosen for this 
investigation is a simple second order filter with the transfer function shown in 
equation 5.2. 
$$H(z) = \frac{z^2}{z^2 - 2\cos(\omega_0 T_s)\, z + 1} \qquad (5.2)$$
The poles of the filter are located at $e^{\pm j\omega_0 T_s}$, implemented as a complex conjugate
pair. Equation 5.2 can be implemented using the canonical direct form two filter
structure, having only feedback coefficients. The discrete time recurrence equation
for this filter has one multiplier, whose purpose is to set the frequency of the sinusoid,
as shown in figure 5.2.
This filter has the following limitations. Firstly, the impulse response
is delayed by two samples; this can be cured by placing a double zero at zero
radius in the z-plane. Secondly, the sinusoidal waveform requires a scaling constant
which is pole dependent. Adding a zero at z = -1 and another zero at z = 1,
as described in [125], automatically adjusts the scaling constant, giving unity
amplitude across all frequencies. The increase in complexity is one further
addition. Another variation would be to use the second order waveguide
oscillator [124]. The last limitation is caused by the poles of the filter, which account
for noise accumulation, as in equation 5.1, and for stability. The poles are quantised
in either fixed or floating point formats and, when they are plotted as a distribution, the
designer can see the spread of frequencies and radii [154]. For the filter design
proposed, the frequency resolution caused by the pole coefficient quantization is poor
for frequencies close to the origin, but improves as the frequency approaches half the
Nyquist rate. Therefore adequate resolution requires longer coefficients and
different topologies [87]. The filter chosen could have been replaced by the Gold-Rader
coupled form [113]. This requires four multiplications in place of two, but
places the complex-conjugate poles on a uniform grid in the z-plane when quantising
coefficients to fixed point arithmetic, thus giving better frequency precision.
[Figure 5.2 block diagram; recoverable labels: x(n), y(n), y(n-1), y(n-2), coefficients
$2\cos(\omega_0 T_s)$ and -1]
Figure 5.2: Block Diagram of the IIR Sinusoidal Generator
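The recurrence implied by equation 5.2 can be sketched in Python as follows. This is a floating-point illustration only, not the bit-serial fixed-point design discussed in this chapter, and the function and variable names are assumptions. The pole-dependent scaling mentioned above appears as the factor $1/\sin(\omega_0 T_s)$ in the impulse response.

```python
import math


def recursive_sine(omega0_ts: float, num_samples: int) -> list:
    """Impulse response of H(z) = z^2 / (z^2 - 2 cos(w0 Ts) z + 1)."""
    coeff = 2.0 * math.cos(omega0_ts)   # the single multiplier coefficient
    y1 = y2 = 0.0                       # y(n-1) and y(n-2)
    out = []
    for n in range(num_samples):
        x = 1.0 if n == 0 else 0.0      # unit impulse excitation
        y = coeff * y1 - y2 + x         # y(n) = 2cos(w0 Ts) y(n-1) - y(n-2) + x(n)
        out.append(y)
        y2, y1 = y1, y
    return out


w = 2.0 * math.pi * 1000.0 / 44100.0    # 1 kHz tone at a 44.1 kHz sample rate
samples = recursive_sine(w, 200)
```

Under these assumptions the output follows $\sin((n+1)\omega_0 T_s)/\sin(\omega_0 T_s)$, so the amplitude grows as the frequency approaches zero unless the scaling is corrected.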
5.3.1 Filter Structure
The filter structure requires multiplication, addition and delay elements and is im-
plemented in two's complement integer arithmetic. Due to prohibitive cost in silicon 
area for array multiplication, see section 3.6 on page 44, a bit-serial approach was 
chosen. 
The bit serial approach uses single wires to convey information as opposed to the 
simultaneous transmission of words on parallel wires (buses). This approach re-
duces routing requirements in chip design and communication between chips. By 
using pipelining, as most bit-serial systems naturally aspire to, a regular silicon 
structure results. Another advantage is that more functionality can be designed 
onto the chip for less space than an equivalent parallel architecture. The drawback 
is that the computation proceeds serially with a fixed latency. Most bit-serial systems
[20] use least significant bit first transmission, which is unfortunate in rapid
processing as required by Digital Signal Processing. Using most significant bit first
techniques [26, 27] coupled with redundant number systems allows computational
speed to increase [150].
5.3.2 Multiplier Design
The multiplier can be implemented using N modules cascaded together to generate
the partial products. However a subtraction stage and a complementer are required
in addition to N - 1 two's complement modules [79]. It is possible to reduce the
latency of the pipeline by using recoding of the multiplicand as in Booth's recoding
on page 49. The latency reduction arises from the reduction of addition operations
in the partial product formulation.
A two's complement number represented as
$$y_k y_{k-1} \cdots y_2 y_1$$
has a value given by equation 5.3, where n = k results in Y being in the range
$-1 \le Y < 1$, or, if n = k - 1, Y is bounded by $-2 \le Y < 2$.
$$Y = \sum_{i=1}^{k-1} y_i 2^{i-n} - y_k 2^{k-n} \qquad (5.3)$$
Multiplication would be P = X · Y, where X and Y are the same length and range
and are represented by equation 5.3. The result would be equal to equation 5.4.
$$P = \left( \sum_{i=1}^{k-1} x_i 2^{i-n} - x_k 2^{k-n} \right) \left( \sum_{j=1}^{k-1} y_j 2^{j-n} - y_k 2^{k-n} \right)$$
$$= \sum_{i=1}^{k-1} \sum_{j=1}^{k-1} x_i y_j 2^{i+j-2n} + x_k y_k 2^{2k-2n}$$
$$- x_k \sum_{i=1}^{k-1} y_i 2^{i+k-2n} - y_k \sum_{j=1}^{k-1} x_j 2^{j+k-2n} \qquad (5.4)$$
The partial products in equation 5.4 can also be expressed as in equation 3.6 on
page 44, where each column represents multiplication by two. Thus for the multiplication
of two N-bit numbers, there are N columns by N rows representing the 2N-bit
product. If this was implemented, the output would be delayed by N clock cycles.
Using a radix-2 Booth recoder (see section 3.7.4) creates N regular elements, unlike
the non-Booth recoded bit-serial approach. A radix-2, signed-digit, 5-level Booth
encoder recodes triplets of the multiplicand in an overlapped manner, representing
signed digit numbers of -2, -1, 0, 1 and 2. However, the multiplicand has an even
number of bits, so the first triplet has an extra bit $y_0$ inserted, having a value equal
to zero. Thus the multiplicand is recoded using the following expression :-
$$Y = \sum_{i=0}^{k/2-1} (y_{2i} + y_{2i+1} - 2 y_{2i+2})\, 2^{2i-k+2}$$
Now, the number of partial products decreases by a factor of two, because each
module adds or subtracts 0, X or 2X. As with the 3-level Booth recoder approach,
each module is regular and supports two's complement arithmetic with no extra
logic. It is possible to extend the recoding approach to higher levels, but then the
hardware must be able to generate ±3X. However in [110] this problem is solved for
a nonredundant radix-4 case. The bit-serial approach computes the multiplication
P by operating on each bit serially within the module. So,
$$P = XY = \sum_{i=0}^{k/2-1} (y_{2i} + y_{2i+1} - 2 y_{2i+2})\, X\, 2^{2i-k+2}$$
This can be shown to equal equation 5.4 by performing multiplication as in equation 3.6
and as remembered in school mathematics classes. A simplified schematic
diagram of the 5 level Booth multiplier module is shown in figure 5.3.
Figure 5.3: Schematic Diagram of the 5 level Booth Multiplier Module 
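The overlapped triplet recoding can be checked with a short Python sketch. It works on plain integers (so the weight of digit i is $4^i$ rather than the fractional weight used in the text), which is an assumption made to keep the example simple; the function names are hypothetical.

```python
def booth_digits(value: int, k: int) -> list:
    """Recode a k-bit two's complement integer (k even) into k/2 digits in -2..2."""
    # bits[0] is the inserted y_0 = 0; bits[j] is y_j with y_1 the LSB.
    bits = [0] + [(value >> j) & 1 for j in range(k)]
    return [bits[2 * i] + bits[2 * i + 1] - 2 * bits[2 * i + 2]
            for i in range(k // 2)]


def booth_multiply(x: int, y: int, k: int) -> int:
    """Form x * y from k/2 partial products of 0, +/-x or +/-2x."""
    return sum(d * x * 4 ** i for i, d in enumerate(booth_digits(y, k)))


digits = booth_digits(-3, 8)        # least significant digit first
product = booth_multiply(25, -3, 8)
```

Only k/2 partial products are formed, each being 0, ±X or ±2X, which is the halving of the partial product count described above.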
To implement multiplication in a serial manner, the first module has $y_0$ hardwired to
equal zero. This will change both the ONE and TWO outputs of the recoder as shown
in figure 5.3. The inputs to these gated recoded signals will need minor
modification. If the designer connected a cascade of these modules together, this
design would implement multiplication with truncation at the final PPOUT output
pin. The structure can be made to implement rounding to the nearest integer by
inserting a one signal on the PPIN input of the first module at the correct time;
otherwise PPIN should be set to zero [20]. The least significant N bits of the product
appear at the output of the adders every two clock ticks and can be retrieved using
an additional multiplexor fed from the serial adder.
5.3.3 Accumulator Design
The adder is simply a two's complement, least significant bit first, bit-serial adder 
module connected to the output of the multiplier. 
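The behaviour of a least significant bit first, bit-serial adder can be modelled in Python as a single full adder with a carry flip-flop. This is a behavioural sketch under that assumption, not the cell-level design used in this thesis, and the helper names are hypothetical.

```python
def to_bits(value: int, width: int) -> list:
    """Two's complement bits of `value`, least significant bit first."""
    return [(value >> i) & 1 for i in range(width)]


def from_bits(bits: list) -> int:
    """Inverse of to_bits: interpret an LSB-first list as two's complement."""
    value = sum(b << i for i, b in enumerate(bits))
    if bits[-1]:
        value -= 1 << len(bits)
    return value


def bit_serial_add(a_bits: list, b_bits: list) -> list:
    """One full adder reused every clock; the carry is held between bits."""
    carry = 0                            # carry flip-flop, cleared at word start
    out = []
    for a, b in zip(a_bits, b_bits):
        out.append(a ^ b ^ carry)        # sum bit
        carry = (a & b) | (carry & (a ^ b))
    return out


total = from_bits(bit_serial_add(to_bits(5, 8), to_bits(-3, 8)))
```

Because the carry ripples in the same direction as the bits arrive, least significant bit first transmission lets one adder cell serve any word-length.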
5.3.4 Filter Design
The multiplier coefficients have a word-length of 16 bits, whilst the data path is 24
bits wide. The number of elements for the Booth multiplier would then be eight
modules wide. Each multiplier has a latency of 3N/2 bit periods to compute the
result. In this design, the delay elements are assumed to be contained within the
Booth multiplier core and thus no external storage is necessary. As each multiplier
in this design corresponds to a delay element, the -1 factor in figure 5.2 becomes
an extra multiplier. The Booth multiplier can be configured for coefficients to
be updated synchronously with the data input, or they can be latched and made
constant. The latter form will be used in this design. All numbers are assumed to
be two's complement in the range of +2 to -2. The y(n) input to the first delay
element is truncated to 16 bits with sign extension for the multipliers. The least
significant 16 bits of the output are sent serially to a bit-serial DAC. The x(n) input
is a positive impulse of unity triggered by a bit line going high for almost one clock
word (22 bits); whilst high it outputs ones into the bit stream, otherwise it outputs
zeros.
Assuming the core is predominantly built of adders and multipliers gives a total floor 
space used by the core of 8.25 mm x 2.2 mm using 8906 transistors. The standard 
cells used in this estimation were from the ES2 design kit for Cadence [28]. 
5.4 The CORDIC Sine Generator
The COordinate Rotation DIgital Computer (CORDIC) is a unified algorithm
for computing trigonometric, square root and multiply-accumulate functions. This
algorithm was developed by Jack E. Volder in 1959 [140]. Further refinements to the
basic algorithm were researched by [40, 141], allowing computation of hyperbolic
functions and obtaining analytical rules to determine algorithm convergence. This
is necessary for computation of periodic functions like the trigonometric functions.
The CORDIC system has two modes, known as Rotation and Vectoring. In
the rotation mode (circular), the user inputs the Cartesian coordinate components
plus an angle of rotation, and the system rotates the Cartesian coordinates through
the angle given. In the vectoring mode (circular), the coordinate components are
given and the corresponding magnitude and angle are computed. This structure is
geometric by nature and performs rectangular ⇔ polar conversions. The algorithm
achieves the conversion by using iteration in a divide-and-conquer strategy.
5.4.1 Derivation for Circular Mode
The rotation matrix to transform a point in x-y Cartesian coordinates through an
angle $\theta$ in an anti-clockwise direction is described by equation 5.5.
$$\begin{bmatrix} x' \\ y' \end{bmatrix} = \begin{bmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} \qquad (5.5)$$
Now, rectangular to polar conversion can be performed by the inverse, which is
expressed by equations 5.6 and 5.7.
$$\theta = \tan^{-1} \frac{y}{x} \qquad (5.6)$$
$$|(x, y)| = \sqrt{x^2 + y^2} \qquad (5.7)$$
Now substitute $\theta = \tan^{-1} r^{-k}$, using the following trigonometric identities
$$\sin\theta = \frac{\tan\theta}{\sqrt{1 + \tan^2\theta}}$$
and
$$\cos\theta = \frac{1}{\sqrt{1 + \tan^2\theta}}$$
which give
$$\sin[\tan^{-1} r^{-k}] = \frac{r^{-k}}{\sqrt{1 + r^{-2k}}}$$
and
$$\cos[\tan^{-1} r^{-k}] = \frac{1}{\sqrt{1 + r^{-2k}}}$$
and substituting into equation 5.5 for transforming the vector (x, y), it can
be seen that multiplication is a simple radix shift right. It should be noted that
rotating through an angle of $-\tan^{-1} r^{-k}$ is achieved by changing the sign of the $r^{-k}$
term.
An arbitrary angle $\theta$ can be built up as follows, noting that $\epsilon_k = +1$ or $-1$. This
parameter can be shown to be related to the sign of y in the rotation mode and
related to the sign of $\theta$ in the vectoring mode.
$$\theta = \epsilon_{-1}\, 90^\circ + \sum_{k=0}^{n-1} \epsilon_k \tan^{-1} r^{-k}$$
The magnification after n iterations is equal to
$$K_n = \prod_{k=0}^{n-1} \sqrt{1 + \tan^2(\tan^{-1} r^{-k})} \qquad (5.8)$$
$$= \prod_{k=0}^{n-1} \sqrt{1 + r^{-2k}} \qquad (5.9)$$
Unfortunately, equation 5.9 results in a factor which must be normalised. Since this 
is a constant, the reciprocal can be input to the x or y input on initialisation. This 
factor can also be forced to unity by the addition of another layer of arithmetic 
operations. 
5.4.2 The CORDIC Algorithm










Zi k ^1 
(5.10) 
$\alpha_i$ is equal to $\lim_{m \to \pm 1\ \mathrm{or}\ 0} \frac{1}{\sqrt{m}} \arctan(\sqrt{m}\, r^{-i})$ and $i \ge 0$. The variable $\delta_i$ is equal
to $-\mathrm{sign}(Z_{i+1})$ for rotating mode and $\mathrm{sign}(X_{i+1}) \cdot \mathrm{sign}(Y_{i+1})$ for vectoring mode¹,
where sign(x) is equal to 1 when x > 0 and -1 otherwise. The parameter m
determines the coordinate system to be used and is equal to -1 for hyperbolic,
0 for linear and 1 for circular coordinate systems. Incidentally, in linear mode $\alpha_i$
becomes just $r^{-i}$ and can be implemented using a shifter. In this mode there is no
magnification and the system will thus perform accurate multiply and accumulate or
division and accumulate functions to machine precision. As the algorithm stands, in
hyperbolic mode convergence fails; this is solved by repeating iterations as necessary,
see [141] for more details.

                     Rotation Mode                        Vectoring Mode
m = +1 Circular      x → K_1 (x cos z - y sin z)          x → K_1 sqrt(x^2 + y^2)
                     y → K_1 (y cos z + x sin z)          y → 0
                     z → 0                                z → z + tan^-1(y/x)
m = 0 Linear         x → x                                x → x
                     y → y + x·z                          y → 0
                     z → 0                                z → z + (y/x)
m = -1 Hyperbolic    x → K_-1 (x cosh z + y sinh z)       x → K_-1 sqrt(x^2 - y^2)
                     y → K_-1 (y cosh z + x sinh z)       y → 0
                     z → 0                                z → z + tanh^-1(y/x)
Table 5.1: Input-Output Functions for CORDIC Modes

Table 5.1 gives a summary of the operations available using the CORDIC iteration
algorithm. The rotation mode requires inputs x, y and the angle $\theta$ (z) and the
vectoring mode requires the inputs x and y with the $\theta$ parameter set to zero. The
magnification factor K with subscript 1 or -1 equals $\prod_{i=0}^{n-1} \sqrt{1 + m\, r^{-2i}}$.
Thus using the rotation mode of CORDIC with the circular coordinate system, a
sinusoidal oscillator can be devised. However, the beauty of this technique is that
it can generate complex transcendental functions from nothing more than addition
and arithmetic shifts. It has also become a very useful building block in general
DSP and lattice filtering [49]. There are other techniques available for generating
transcendental functions like BKS [5] but they are not as efficient as CORDIC in
terms of memory storage and computational effort, except possibly in a redundant
number system.
The algorithm can only drive the angle to within the smallest angle $\alpha_{n-1}$ of zero in n steps and
requires an internal word-length of $L + \log_2 L$ bits for the desired L bits output. The
extra $\log_2 L$ bits are required because the arithmetic operations truncate in the
conventional weighted number system. An analytical investigation into CORDIC
convergence limits can be found in [50]. By using redundant arithmetic such as
signed-digit, the scaling factor can be made constant [131] and operations can be
implemented in a bit-serial manner. As mentioned before, the scale factor is removed
by incorporating extra shifts and additions in between some parts of the
usual CORDIC algorithm.
¹ In circular vectoring mode $\delta_i = \mathrm{sign}(Y_{i+1})$ only.
Finally, before the actual design of the sine generator, it is necessary to understand
the limits of convergence for CORDIC [48]. In linear mode convergence is bounded
by unity, in hyperbolic mode the angle is bounded to 1.1182 radians, and in circular
mode it is bounded to 1.7433 (99°). As this section is concerned with sine and
cosine generation, I will only mention how to increase the range for this particular
mode of operation. If the angle (z) is driven to zero, as in the rotation mode, then
the binary digits can represent ±180° and the designer can utilise the wrap-around
property of two's complement arithmetic and number representation as described
by Daggett [18]. The alternative route is to add a range reduction process to the
algorithm as an initialisation pass to map any angle greater than 99° into that
range [40]. This approach is most amenable to the CORDIC algorithm, requiring a
shift of 90° and then a shift of 45°. It should be noted that the derivation of the
circular mode using the vector rotation matrix equation 5.5 has these modifications
built in.
5.4.3 CORDIC Sine Algorithm
Using the CORDIC rotation algorithm in circular mode, sine and cosine functions
can be generated in quadrature. The CORDIC structure allows the trigonometric
functions to be generated without multiplication by utilising addition and shifting
primitives.
This algorithm requires a clock input. The shifter will be implemented using a
barrel shifter to allow division by powers of two. This algorithm requires a ROM to
store in binary code the $\arctan(2^{-i})$ where i is an integer from -1 to 16. To generate
each sine value requires 18 iterations and an internal clock of 18 × 44.1 kHz = 793.8 kHz.
The 44.1 kHz is the sampling rate for audio applications. This assumes that the
sine value at a certain period of time has a word-length of 16 bits. Consequently the
CORDIC X and Y variables have an internal resolution of 20 bits ($n + \log_2 n$,
where n is the output accuracy in bits). The clock speed is quite slow, but
the CORDIC algorithm can be made to compute thousands of sinusoid oscillators
by increasing the clock and by pipelining the circuit. The iteration algorithm is
illustrated in matrix form in equation 5.11, noting that $\delta_i = -\mathrm{sign}(Z_{i+1})$ and the
iteration counter i is bounded in the range $-1 \le i \le 19$, with A equal to zero
when i is equal to -1 and A equal to one² otherwise. One final point is that the shifter
$2^{-i}$ is set to one when i = -1.
$$\begin{bmatrix} X_{i+1} \\ Y_{i+1} \\ Z_{i+1} \end{bmatrix} = \begin{bmatrix} A & \delta_i 2^{-i} & 0 \\ -\delta_i 2^{-i} & A & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} X_i \\ Y_i \\ Z_i \end{bmatrix} + \begin{bmatrix} 0 \\ 0 \\ \delta_i \arctan(2^{-i}) \end{bmatrix} \qquad (5.11)$$
To generate the sine of an angle, the algorithm requires $Z_i$ to be loaded with the
two's complement value of the angle at iteration time i = -1. After 21 cycles, the
result appears on $X_{i+1}$ and $Y_{i+1}$ (i = 19). In this application, the cosine output
is more valuable, since at power up this output will be zero and the audio stream
will be glitch-less. Remembering the scale factor equation 5.9, with k starting from
zero, results in the cosine being bounded by a number bigger than unity. To rescale
the output to be bounded by unity requires the input $X_{-1}$ to be assigned the
reciprocal of the magnification factor, which is 0.6072529351, instead of unity. The
other variable $Y_{-1}$ is set to zero.
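The rotation-mode iteration can be sketched in floating-point Python as follows. Note the assumptions: this uses the common textbook sign convention (d = +1 when the residual angle is non-negative), which may differ in sign from the $\delta_i$ convention of equation 5.11; it uses 17 micro-rotations after the initial ±90° step, matching the ROM range of i from -1 to 16; and multiplications by $2^{-i}$ stand in for hardware shifts. The function names are hypothetical.

```python
import math


def cordic_gain(iterations: int = 17) -> float:
    """Magnification of `iterations` micro-rotations starting at i = 0."""
    gain = 1.0
    for i in range(iterations):
        gain *= math.sqrt(1.0 + 2.0 ** (-2 * i))
    return gain


def cordic_cos_sin(theta: float, iterations: int = 17):
    """Return (cos(theta), sin(theta)) for theta in [-pi, pi]."""
    angles = [math.atan(2.0 ** -i) for i in range(iterations)]  # ROM contents
    x, y, z = 1.0 / cordic_gain(iterations), 0.0, theta         # pre-scaled input
    # i = -1 step: exact rotation by +/-90 degrees, no magnification
    if z >= 0.0:
        x, y, z = -y, x, z - math.pi / 2
    else:
        x, y, z = y, -x, z + math.pi / 2
    for i in range(iterations):
        d = 1.0 if z >= 0.0 else -1.0      # drive the residual angle to zero
        shift = 2.0 ** -i                  # a right shift by i places in hardware
        x, y = x - d * y * shift, y + d * x * shift
        z -= d * angles[i]
    return x, y
```

Loading the reciprocal of the gain into x removes the need for a final scaling multiplication, as described above.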
So far, the design has concentrated around the CORDIC algorithm. This part of
the system would only generate a single sine of an angle. Using a phase accumulator
in two's complement arithmetic configured as a ramp generator, as in table lookup
oscillators, allows the CORDIC algorithm to implement a sinusoidal function with
respect to time. The phase accumulator is simply an adder and a feedback storage
element which generates a ramp function of the form shown in equation 5.12. Note
that $A_x$ is the output word at time slot x, $\delta$ is the increment and n is the word-length
of the adder.
$$A_{n+1} = (A_n + \delta) \bmod 2^n \qquad (5.12)$$
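The ramp of equation 5.12 can be sketched in Python as below. The relation used in `increment_for`, output frequency = increment × sample rate / $2^n$, is the standard phase accumulator relation and is assumed here rather than taken from equations 2.1 and 2.2; both function names are hypothetical.

```python
def phase_ramp(increment: int, word_length: int, num_samples: int, start: int = 0) -> list:
    """Successive accumulator outputs, wrapping modulo 2**word_length."""
    modulus = 1 << word_length
    acc = start
    out = []
    for _ in range(num_samples):
        out.append(acc)
        acc = (acc + increment) % modulus   # adder plus feedback register
    return out


def increment_for(freq_hz: float, sample_rate_hz: float, word_length: int) -> int:
    """Increment giving approximately freq_hz (assumed standard relation)."""
    return round(freq_hz * (1 << word_length) / sample_rate_hz)


ramp = phase_ramp(3, 3, 5)                       # tiny 3-bit accumulator
quarter_rate = increment_for(11025.0, 44100.0, 24)
```

With a 24-bit accumulator at 44.1 kHz the smallest frequency step under this relation is 44100 / $2^{24}$, roughly 0.0026 Hz.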
5.4.4 Des ign of the C O R D I C sine Generator 
Using equations 2.1 and 2.2 with F m a x to be 20 kHz and . F m m to be 0.002 Hz as a 
reasonable estimate, gives a phase accumulator word-length of 24 bits. But only the 
top 20 bits are used by the CORDIC algorithm. The phase accumulator internal 
word-length was designed to have 20 bits, to directly interface to the CORDIC 
iteration algorithm, which has an internal 20 bit datapath for all variables, X, Y 
and Z. The output from the cosine path of the CORDIC processor is truncated to 
the most significant 16 bits. 
2 A c t u a l l y the reciprocal of the magnification factor. 
5. V L S I requirements for synthesis 96 
The chip packaging has a direct relationship to cost, because the more pins required 
imply a larger silicon die size. In this design, a 40 pin DIL package was chosen. 
There are 16 pins for the output cosine, two diagnostic pins which are outputs from 
internal iteration signalling, and a pin indicating the sign of Zi+\. Apart from power 
pins and clock the phase accumulator is loaded in blocks of 8 bits using a load and 
two chip select pins. Finally, there is a reset to zero pin to initialise the chip. The 
input pins drive T T L non-inverting with internal pull-up resistors of 125 k f i and 
the output pins are driven by CMOS non-inverting 4 mA source/sink pads. 
A schematic diagram of the CORDIC iteration algorithm can be seen in figure 5.4. 
The X input is either zero or the reciprocal of the scaling factor under the control 
of A. The selecting of zero or non-zero constant can be achieved by using an AND 
gate with A as one input and the constant as the other. Using De Morgan's rule,
$A \cdot B = \overline{\bar{A} + \bar{B}}$, where $\bar{A}$ can come from the inverted output of a D
flip-flop and the signal $\bar{B}$ (the control signal, inverted) can be stored in a ROM.
Each AND gate can then be replaced by a NOR gate, which reduces the transistor
count per bit from 6 to 4. A ROM table is required to store the angles used by the
iteration, namely 90°, 45° and $\arctan(2^{-i})$ for $i > 0$, with 20 bits of precision
and coded in Q19 format.
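To illustrate the iteration that these ROM angles feed, the following is a minimal software sketch of a rotation-mode CORDIC. It is not the chip's datapath. The assumptions are: angles are coded with $2^{19}$ units per π radians, 18 arctan micro-rotations follow the initial ±90° step, the output is scaled by $2^{18}$, and the constants are computed at run time rather than read from a ROM. All names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

const int kIterations = 18;                  // assumed number of arctan(2^-i) steps
const double kPi = 3.14159265358979323846;
const double kAngleScale = 524288.0 / kPi;   // assumed coding: 2^19 units per pi radians
const double kAmplitude = 262144.0;          // 2^18, output scale chosen for this sketch

struct Vec { std::int32_t x, y; };

// Rotate (1/gain, 0) through 'angle' (-2^19 <= angle < 2^19, in 2^19-per-pi
// units) and return (cos, sin) scaled by kAmplitude.
Vec rotate(std::int32_t angle)
{
    assert(angle >= -524288 && angle < 524288);

    // Gain of the arctan(2^-i) micro-rotations; x starts at its reciprocal.
    double gain = 1.0;
    for (int i = 0; i < kIterations; ++i)
        gain *= std::sqrt(1.0 + std::ldexp(1.0, -2 * i));

    std::int32_t x = static_cast<std::int32_t>(std::floor(kAmplitude / gain));
    std::int32_t y = 0;
    std::int32_t z = angle;

    // Gain-free +/-90 degree step brings z into [-90, +90) degrees.
    const std::int32_t quarter = 262144;
    if (z >= 0) { const std::int32_t t = x; x = -y; y = t;  z -= quarter; }
    else        { const std::int32_t t = x; x = y;  y = -t; z += quarter; }

    for (int i = 0; i < kIterations; ++i) {
        const std::int32_t step = static_cast<std::int32_t>(
            std::floor(std::atan(std::ldexp(1.0, -i)) * kAngleScale));
        const std::int32_t xs = x >> i;   // arithmetic right shift assumed
        const std::int32_t ys = y >> i;
        if (z >= 0) { x -= ys; y += xs; z -= step; }
        else        { x += ys; y -= xs; z += step; }
    }
    return Vec{x, y};
}

// Convenience wrappers for comparison with the maths library.
double cos_of_degrees(double deg)
{
    const std::int32_t a =
        static_cast<std::int32_t>(std::lround(deg / 180.0 * 524288.0));
    return rotate(a).x / kAmplitude;
}

double sin_of_degrees(double deg)
{
    const std::int32_t a =
        static_cast<std::int32_t>(std::lround(deg / 180.0 * 524288.0));
    return rotate(a).y / kAmplitude;
}
```

Starting x at the reciprocal of the gain means no multiplier is needed at the output, which is the reason the X register is initialised with that constant.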
[Figure 5.4 block labels: input from phase accumulator; x, y and z registers; shifters; AND arrays; adder/subtracters; decision signal (sign of z); ROM register; read-only memory; sinusoidal output]
Figure 5.4: Block Diagram of Sinusoidal Generator
Figure 5.5: Block Diagram of the Barrel Shifter
The shifter, which provides the $2^{-i}$ scaling, would normally be implemented using an
n × n crosspoint switch with each switch (transistor) connected diagonally to carry
out the shift. An example CORDIC implementation using this, with redundancy
caused by the arithmetic right shifting, can be seen in [40], where it is labelled a scaler.
My approach is to split the shift up into smaller shifts by using a series of multiplexors
connected in a tree topology as shown in figure 5.5. This is known as a Barrel
Shifter and is ideal for use in automated datapath generation. The inputs A, B and
C are 2, 2 and 1 bit lines long and the signal for a particular shift is best served by
storing the values in a ROM. Now all that is required is a sequencer which drives
the ROM inputs according to i. This is implemented as a modulo counter.
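The staged shift can be sketched as below. The stage weights are an assumption, not something the text states: stage A is taken to shift by 0 to 3, stage B by 0, 4, 8 or 12, and stage C by a further 0 or 4. This reading is suggested by the ROM table in section 5.4.6, which lists select<0:4> = 00111 for a shift of 16. The type and function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Select values for the three multiplexer stages:
// a and b are 2-bit fields, c is a 1-bit field.
struct ShiftSelect { unsigned a, b, c; };

// Split a shift count (0..19) across the stages.
// Assumed weights: A -> 0..3, B -> 0,4,8,12, C -> 0 or 4.
ShiftSelect encode_shift(unsigned shift)
{
    assert(shift <= 19u);
    ShiftSelect s;
    s.c = (shift >= 16u) ? 1u : 0u;
    const unsigned rest = shift - 4u * s.c;   // now in 0..15
    s.a = rest & 3u;
    s.b = rest >> 2;
    return s;
}

// Apply the three stages in series; each stage is an arithmetic right shift.
std::int32_t barrel_shift(std::int32_t value, ShiftSelect s)
{
    std::int32_t v = value >> s.a;   // 4:1 stage
    v = v >> (4u * s.b);             // 4:1 stage
    v = v >> (4u * s.c);             // 2:1 stage
    return v;
}

// True when the staged shifter agrees with a direct shift for every count.
bool matches_direct_shift(std::int32_t value)
{
    for (unsigned shift = 0; shift <= 19u; ++shift)
        if (barrel_shift(value, encode_shift(shift)) != (value >> shift))
            return false;
    return true;
}
```

Comparing the staged result against a direct shift for every count is the same kind of check the thesis later describes using to find a wiring fault in the shifter schematic.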
The input interface is a simple 2-to-4 de-multiplexor with the load signal driving four
8-bit registers. The load signal is used as a clock for the registers. The last bit sequence
to the multiplexor (all ones) updates the input register in the phase accumulator,
so changes in setting the input angle word do not affect the sinusoidal output. The
output of the chip, ready for DAC input, is composed of edge-triggered D flip-flops
with asynchronous reset to zero.
5.4.5 Sequencer Design
Initially a mod 19 counter was designed as the sequencer unit. A state-machine
design was attempted, which required 5 D flip-flops and 10 4x4 Karnaugh maps.
Unfortunately, the state equations gave 18 NAND gates ranging from 2 to 4 inputs.
This was not very efficient for VLSI design, but by using toggle flip-flops the number
of external gates dropped. The T flip-flop is based around a D flip-flop and
has enable (EN) and synchronous reset (SYNCR) inputs as in figure 5.6. The
synchronous reset allows the flip-flop to reset only at the clock edge and is active low
(i.e. zero resets the flip-flop). The circuit can perform a toggle operation every clock
cycle by setting enable to route the inverted Q output to the D input of the flip-flop.
This acts as a divide by two counter.
To count modulo 32, the least significant flip-flop has EN tied to unity and the
remaining four flip-flops have two-input AND gates, with the enable and Q connections
of each flip-flop daisy-chained to the enable of the next flip-flop along. The most
significant flip-flop has no AND gate across it. Thus a modulo ripple counter has
been made. The first flip-flop always has enable set to one and can be implemented
as shown in figure 5.7.
Figure 5.6: Schematic Diagram of a T flip-flop
Figure 5.7: Schematic Diagram of first T flip-flop for mod counting
To convert this counter to a modulo 19 counter, state 18, which in binary is 10010,
must be detected. This is used to reset the counter for the next
cycle. Since the counter always starts from zero, it never counts beyond 18, so the
detected pattern becomes 1XX1X with X denoting don't-care states. Therefore the global
synchronous reset input is driven by a NAND gate connected to the Q1 and Q5
outputs of the flip-flops. It is possible to remove the synchronous resets on T
flip-flops three and four, as this line is not necessary there. However, flip-flop one will
change state. This extension was not implemented as it had no major advantage
in VLSI. The circuit will work, but the checks for 1, 11, 111 and 1111 implemented
with the AND gates are the limiting factor, because flip-flop five has 3 ganged gates
resulting in a ripple delay. If the clock were sufficiently fast, a false count would be
possible. This can be overcome by using two, three and four input AND gates.
The actual counting sequence is modulo 21, thus the counter must be reset after
the next cycle at count 20, or 10100 in binary. Therefore the NAND gate inputs
required are connected to the Q2 and Q4 outputs as in figure 5.8.
Figure 5.8: Schematic Diagram of a modulo 21 counter
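The reset-on-detect idea can be sketched behaviourally as below. This models only the counting sequence, not the gate-level T flip-flop circuit; the detect mask stands for the set bits of the binary state quoted in the text (10010 for modulo 19, 10100 for modulo 21), and the function names are hypothetical.

```cpp
#include <cassert>

// Next state of a 5-bit counter with a synchronous reset.
// The reset is asserted when every bit in detect_mask is set, which is
// what the NAND gate on the chosen Q outputs detects (don't-care bits
// are simply left out of the mask).
unsigned next_count(unsigned count, unsigned detect_mask)
{
    assert(count < 32u);
    const bool reset = (count & detect_mask) == detect_mask;
    return reset ? 0u : ((count + 1u) & 31u);
}

// Number of clock ticks before the counter returns to zero.
unsigned cycle_length(unsigned detect_mask)
{
    unsigned count = 0;
    unsigned ticks = 0;
    do {
        count = next_count(count, detect_mask);
        ++ticks;
    } while (count != 0u);
    return ticks;
}
```

Because the counter always starts from zero, the first state that matches the mask is the intended terminal state, so only the set bits need to be decoded.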
5.4.6 ROM Design
The ROM array was initially designed to store the binary representation of the 
angles used by the CORDIC algorithm. However, since the shifter uses 5 lines and 
there are two extra control signals, a ROM also became the most convenient way to 
sequence the control aspects of the design. This is achieved by controlling the ROM
address using the modulo 21 counter. The angles are generated by the C++ source
fragment shown after this paragraph. The output of the C++ code is a table of count
(ROM address) and ROM contents in hexadecimal. These elements are recoded
into two columns, the address in decimal and the contents in binary, and are passed
to the ES2 Megacell generator which is part of the ES2 design kit for Cadence. The
complete ROM contents in human readable form are listed after the fragment. The
signals SSL, NSTART (A) and INIT are internal control signals and select<0:4> 
controls the barrel shifter. 














double temp;          // workspace variable
int count;            // counter cycling from 0 to 21
long table[MAX];      // storage for arctan table
temp = pow(2.0, 19.0) - 1.0;   // Uses Q19 format
temp = temp / PI;              // modifies Q19 format for angles
                               // within 2 PI
table[0] = (long)floor(PI / 2 * temp);   // 90 degrees
for (count = 1; count < MAX; count++)
{
    table[count] = (long)floor(atan(pow(2.0, -(double)(count - 1)))
                               * temp);
}
fprintf(stdout, "\n\tROM Address\tROM value\n\n");
for (count = 0; count < MAX; count++)
    fprintf(stdout, "\t%d\t\t%08lx\n", count, table[count] >> SHIFT);
}
Index  SSL &    INIT  select<0:4>  romtable<19:0>        rom    shift
       NSTART                                            (hex)
0      1        1     00000        00000000000000000000  00000  I
1      0        1     00000        01000000000000000000  40000  0
2      0        0     00000        00011111111111111111  1ffff  0
3      0        0     10000        00010010111001000000  12e40  1
4      0        0     01000        00001001111110110011  09fb3  2
5      0        0     11000        00000101000100010001  05111  3
6      0        0     00100        00000010100010110000  028b0  4
7      0        0     10100        00000001010001011101  0145d  5
8      0        0     01100        00000000101000101111  00a2f  6
9      0        0     11100        00000000010100010111  00517  7
10     0        0     00010        00000000001010001011  0028b  8
11     0        0     10010        00000000000101000101  00145  9
12     0        0     01010        00000000000010100010  000a2  10
13     0        0     11010        00000000000001010001  00051  11
14     0        0     00110        00000000000000101000  00028  12
15     0        0     10110        00000000000000010100  00014  13
16     0        0     01110        00000000000000001010  0000a  14
17     0        0     11110        00000000000000000101  00005  15
18     0        0     00111        00000000000000000010  00002  16
19     0        0     11111        00000000000000000000  00000  17
20     0        0     11111        00000000000000000000  00000  18
The generator created a ROM with 21 words, having 27 bits per word, configured in
108 columns by 8 rows. The output data from the ROM is buffered. Using 0.7 µm
ES2 design rules, the dimensions of the ROM are 902.50 × 374.40 (µm × µm), with
a total area of 0.34 square millimetres. Unfortunately, the ROM data address space
is rounded up to the nearest power of two, so the array holds 32 words
(32 × 27 = 864 = 108 × 8 bit cells).
5.4.7 CORDIC Register Design
The registers X and Y are implemented using synchronous flip-flops because they
are necessary for initialisation of the CORDIC iteration. The Y register is composed
of synchronous reset flip-flops, whilst the X register has both reset and set flip-flops,
which describe the binary constant for normalisation. A diagram of the synchronous
reset flip-flop is shown in figure 5.9; to implement the synchronous reset, pins
SIN and SSL are tied together. To implement a synchronous set latch, the inverter
is removed.
Figure 5.9: Schematic Diagram of a Synchronous Reset Flip-Flop
The Z register is implemented using a normal latch/flip-flop but has a multiplexor 
on its input to access the angle generated from the ramp generator and the output 
of a 20-bit adder/subtracter. 
5.4.8 Adder Design 
The adder/subtracter is simply a two's complement full adder as shown schemati-
cally in figure 3.1 on page 41. 
5.4.9 Construction and Testing 
The chip was designed using the 0.7 µm ES2 two metal layer CMOS design rules
for Cadence and implemented in a 40 pin DIL package. The chip die size was 3.5
mm × 3.5 mm with a core size of 2.5 mm × 2.5 mm. The entire design was entered
using hierarchical schematic data entry. The initial intention was to use ES2's
datapath compiler, but unfortunately it would not work with the macro generator
and had to be abandoned. Schematic entry proved to be troublesome, due to the
placement of the datapath bus. An example of this occurred with the barrel shifter
design. The fault was detected by comparing a C source program with a simulation
of the shifter. The comparison indicated incorrect behaviour for shifts greater
than three bits. I looked closely at the schematic to make sure all the wires were
labelled properly, and found some missed interconnections. A high level hardware
description language such as Verilog or VHDL should have been used to save
valuable design time. The chip pad layout was drawn using the notes from the
guide [28] for simulation and routing.
Test vectors for a C++ simulation and for the design were developed. I compared the
C++ simulation with the Verilog simulator contained within Cadence. I implemented
the excitation file for the design using Cadence's proprietary language called STL
(Simulation and Test Language) and the output was displayed in both tabular and
graphical form.
To test the chip, I used test vectors based on a phase increment value of 30°. Over 
time, the phase accumulator added this increment to its output. This created a 
periodic ramp function which drives the CORDIC angle input. The phase increment 
was coded as the 24-bit number 0x155550 in hexadecimal. This number was applied 
to the chip in three 8-bit portions and then a control command was issued to pass 
the full 24 bits to the phase accumulator. The comparison between the expected
value of the sine of the angle and the corresponding output of the chip simulation
is shown in figure 5.10. The sine function has been interpolated as a
smooth line between all x ordinates for visual clarity. The chip simulation appears 
at integral sample values which are normalised to the output sample timestep. This 
rate is 20 times slower than the input clock rate used to drive the whole chip. 
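One way to arrive at the quoted increment word, offered as a hedged reading rather than the documented procedure, is to scale 30° to a 24-bit full circle and then clear the four least significant bits, since only the top 20 bits reach the CORDIC datapath. The function names below are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Phase increment for a 24-bit accumulator, where 2^24 represents 360 degrees.
std::uint32_t phase_increment_24(double degrees_per_sample)
{
    assert(degrees_per_sample >= 0.0 && degrees_per_sample < 360.0);
    const double full_scale = 16777216.0;   // 2^24
    return static_cast<std::uint32_t>(
        std::floor(degrees_per_sample / 360.0 * full_scale));
}

// Keep only the top 20 bits of the 24-bit word (assumed alignment).
std::uint32_t top_20_bits_aligned(std::uint32_t word24)
{
    return word24 & 0xFFFFF0u;
}

// 8-bit portion 'index' of the word (0 = least significant byte),
// as would be presented to the chip's 8-bit input port.
std::uint32_t byte_of(std::uint32_t word24, unsigned index)
{
    assert(index < 3u);
    return (word24 >> (8u * index)) & 0xFFu;
}
```

Under these assumptions, 30° maps to 0x155555, which becomes 0x155550 once the low four bits are cleared, and the three 8-bit portions are 0x15, 0x55 and 0x50.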
[Figure 5.10 axes: sine amplitude (-40000 to 40000) against sample intervals (0 to 45)]
Figure 5.10: Comparison between Sine Function and CORDIC Simulation
The error difference between the sine function and the CORDIC simulation appears 
as the least significant bit of a 16 bit two's complement number. This error varies 
with the phase value and can be seen in figure 5.11. The error points appear as 
dots in the mid-points of the boxes, which are plotted here for clarity. The error 
difference was as expected, because the CORDIC algorithm is iterative in nature. A 
constant error across all inputs would not occur due to two's complement rounding 
errors. The error is small enough to be neglected in this application as the noise 
floor of the audio output circuitry would be much larger. 
[Figure 5.11 axes: error (-1.5 to 1.5) against sample intervals (0 to 40)]
Figure 5.11: Error Difference between Sine Function and CORDIC Simulation
The chip was then routed via the floorplanning environment and initially failed on
power line routing. This was overcome by manual/interactive routing [28, 149].
After final testing the design was sent electronically to Rutherford Appleton
Laboratories in Oxfordshire to be checked. The design was then sent to IMEC in Belgium
for manufacture under the EUROCHIP project. The chip contained approximately
7000 transistors. It was then tested on Durham University's chip testing rig using
the HP 8180A data generator and HP 8182A data analyser. The chip has the
inputs and outputs shown in table 5.2.
Name        Function             I/O
CLK         Clock                input
INP<7:0>    Angle Increment      input
RAMS<1:0>   Mux Select           input
LOAD        Angle Transfer       input
RESETZ      Initialisation       input
DC<15:0>    Sine Data            output
INIT        Latch Load           output
NSTART      ROM NSTART Signal    output
SIGN        CORDIC Sign Signal   output
Table 5.2: CORDIC Sine Chip Input-Output Connections





Table 5.3: Current Consumption of CORDIC Sine Chip 
The RESETZ line of the chip was driven by the 8180A and was initially set as a
return-to-zero signal. This caused the chip to reinitialise continually, so
non-return-to-zero coding was used instead. The output levels of the generator were
set to 0 V and 5 V (true TTL levels). The first test was to check the total current
consumption of the chip at various frequencies by grounding the inputs and setting
the 8180A data generator into cycle mode. Table 5.3 shows the current consumption
of the chip for various clock rates; the current figure was obtained from the analogue
meter on the test rig. This table clearly shows the linear relationship between clock
frequency and current consumption.
I then set the 8182A data analyser to be clocked via an external input, the INIT
pin of the chip. Thus the analyser only saved the output signals when there was a
transition on INIT. I transferred the Verilog simulation test vectors to the 8180A
data generator and configured its outputs (the inputs to the chip). Initially the output
cycled as expected from the Verilog simulation. However, looking at the analyser log of
activity, I saw that the cycle locked up into a wrong value. To investigate what was
happening, I connected an oscilloscope to an output pin and noticed an asymmetric
waveform which was confusing the scope's trigger. The oscilloscope confirmed that
the output pads of the chip were capable of driving up to 10 MHz. Beyond this
frequency the voltage drop was too severe for CMOS or TTL to operate.
The device was found to work intermittently. Initially I thought this problem was
internal to the chip. However, the chip worked when one of the oscilloscope's leads
was connected to ground and the other to one of the input pins on the test rig.
After further exploration between the oscilloscope and test rig, I found a poor
ground connection.
A scanned photomicrograph of the completed die can be seen in figure 5.12: the 
block in the top left hand corner represents the Megacell ROM core. 












Figure 5.12: Photomicrograph of the CORDIC Cosine Chip 
5.5 Comparisons between Sinusoidal Generators 
Both the CORDIC and bit-serial IIR filter designs are of comparable size; however, it
must be stressed that the transistor count and size quoted for the filter design are
conservative estimates and that it would take up more space. The bit-serial Booth
multiplier and adder arrangement has a major disadvantage in that the poles of the
filter determine the sinusoidal frequency, whilst the CORDIC approach has no poles
and is always stable. The filter form has a latency on each multiplier of 24 clock
periods, which is comparable to the iteration time of 21 clock periods for the CORDIC
processor.
The CORDIC and filter oscillators would need serious modification to enable multiple
sinusoids to be generated. The CORDIC structure would need RAM and a sub-counter
to allow 21 sinusoids to be generated in a conveyor belt (pipelined) fashion.
It would also need a better output stage, possibly an adder to add the sinusoids
together, but the amplitudes of the sinusoids are constant. The filter structure
would need a RAM store for the delay elements and substantial rerouting of the
design. But the arithmetic modules can be made to compute many sinusoids if they
have adequate storage and each filter-generated sinusoid is computed and stored in
a multiplexed manner. This approach, originally designed for a cochlea simulation,
was implemented using 4 µm nMOS design rules [80], having a sample rate of 500 kHz
for each filter with a clock of 12 MHz. The filter approach can automatically provide
sinusoids of varying amplitudes but, like the CORDIC design, would need some
output modifications unless the samples were sent out in a multiplexed manner and
analogue circuitry provided the accumulation.
By increasing the maximum clock frequency and pipelining or multiplexing the
computational elements with a slight increase in silicon area, these designs could
generate sinusoids approaching a million a second. A software based real-time
sinusoidal oscillator implemented on one INMOS 20 MHz T800 chip generates only
8 oscillators using the IIR filter approach [54]. These VLSI hardware designs easily
outperform software based systems running on DSPs or microprocessors, and do so
with far lower clock speeds [105].
An interesting approach to sinusoidal tone generation [78] is to use oversampling
in addition to sigma-delta modulation, in order to convert an N × N bit multiplier
to an N × 1 bit multiplier to be used for IIR [60] and FIR filter structures. The
modulator is a second order type having a quantiser element which is implemented
as a sign bit extractor. This structure has 4 additions and two delay elements, but
on closer inspection two adders are N by 1 bit types and hence can be replaced by
multiplexors. This topology is ideal for on-chip self-test applications where spare
silicon floorspace is at a premium. A 16 tone oscillator implemented as two 8 tone
oscillators operating at a clock rate of 5.2 MHz with an effective sampling rate for
each oscillator of 655 kHz was proposed [77]. A four tone version using 84 % of a
Xilinx XC4010 FPGA was then built.
In [117], Roza proposed an oversampling scheme using sigma-delta modulators suit-
able for video based applications, whose aim was to reduce power dissipation and 
clock skew in the datapaths. This oversampled sigma-delta approach is ideally 
suited to DSP based systems which require a direct interface to the analogue world. 
Sigma-delta converters are used before a one-bit DAC in order to re-quantise an N-
bit signal and to reshape the noise spectra beyond the Nyquist rate. This improves 
the signal-to-noise ratio in band and is cheaper to implement than a multi-bit DAC. 
CHAPTER 6
Decaying Sinusoidal Additive Synthesis
In this chapter, a parametric additive synthesis technique for musical tone
generation will be presented. The classical approach approximates the waveform in the
time-domain by a series of sinusoids, as in equation 2.3. This method is known as
additive synthesis using Fourier analysis. However, in reality, the timbres of musical
instruments have bursty characteristics. Most instruments are difficult to model using
Fourier techniques because sharp transitions require many high frequency harmonics
(partials). Another failing of classical additive synthesis results from the assumption
that the waveform is periodic for all eternity. Clearly, this assumption is absurd and
requires non-intuitive approaches to pretend that the waveform is periodic, hence
the implementation of short-time Fourier analysis, overlap-add and overlap-save
techniques.
The alternative approach, as discussed in this thesis, is to assume that the basic
element of the instrument's waveform is a wave packet. This approach is analogous
to quantum mechanics; in both cases the waveform is periodic and has an envelope.
The envelope makes the waveform discontinuous in time, but Gabor proved that
Gaussian modulated sinusoids behave like sinusoids [35] and an additive synthesis
technique could be implemented. In my section on additive synthesis it was stated
that any waveform can be decomposed into a sum of simpler waveforms provided
these simple waveforms are orthogonal to each other.
I have chosen to use a wave packet which has a sharp attack and an exponential
decay. This elemental waveform can be modelled as the impulse response of a
damped second order filter. This waveform is very useful because it has the ability to
model resonances (formants) in physical systems. Therefore, the proposed synthesis
technique is inspired by classical instruments, e.g. violins, trumpets and the versatile
human voice. The spectra generated by real musical instruments can be modelled
by parallel and/or series connection of second order filter sections. The rest of this
chapter will be concerned with the investigation of these filters, their limitations
and alternative solutions.
6.1 Analysis of a Second Order Filter Structure 
It is well known that the z-transform of an exponentially decaying sinusoid,
expressed as $e^{-\alpha t}\sin(\omega t + \phi)$, is the following:
$$\frac{\sin(\phi)\,z^2 + \sin(\omega - \phi)\,e^{-\alpha} z}{z^2 - 2e^{-\alpha}\cos(\omega)\,z + e^{-2\alpha}}$$
By dividing numerator and denominator by $z^2$, this filter structure is similar in form
to the transfer function below.
$$\frac{1 + d z^{-1}}{1 + a z^{-1} + b z^{-2}}$$
Notice the structure has two unity and three variable coefficients. When expressed 
as a linear difference equation, there are three multiplications and four additions. 
The multiplications arise from the coefficients a, b and d. The numerator part of 
the transfer function, when d — 1, generates zeros in the spectrum at 0 and Nyquist 
frequencies. This corresponds to a bandpass filter in the digital domain. There-
fore resonances (formants) can be modelled using bandpass filters with frequency, 
amplitude, phase and bandwidth parameters. 
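The link between the second order section and the decaying sinusoid can be checked numerically. The sketch below assumes unit sample period and uses the unnormalised coefficients of the z-transform of $e^{-\alpha n}\sin(\omega n + \phi)$, that is, numerator $\sin\phi + e^{-\alpha}\sin(\omega-\phi)z^{-1}$ and denominator $1 - 2e^{-\alpha}\cos(\omega)z^{-1} + e^{-2\alpha}z^{-2}$. The function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Impulse response of H(z) = (b0 + b1 z^-1) / (1 + a1 z^-1 + a2 z^-2)
// with coefficients chosen so that y[k] = exp(-alpha k) sin(omega k + phi).
std::vector<double> decaying_sine_filter(double alpha, double omega,
                                         double phi, int n)
{
    assert(n >= 0);
    const double r  = std::exp(-alpha);
    const double b0 = std::sin(phi);
    const double b1 = r * std::sin(omega - phi);
    const double a1 = -2.0 * r * std::cos(omega);
    const double a2 = r * r;

    std::vector<double> y(static_cast<std::size_t>(n));
    double x1 = 0.0, y1 = 0.0, y2 = 0.0;   // delayed input and outputs
    for (int k = 0; k < n; ++k) {
        const double x0 = (k == 0) ? 1.0 : 0.0;   // unit impulse
        const double y0 = b0 * x0 + b1 * x1 - a1 * y1 - a2 * y2;
        y[static_cast<std::size_t>(k)] = y0;
        x1 = x0;
        y2 = y1;
        y1 = y0;
    }
    return y;
}

// Largest deviation of the recursion from the closed-form decaying sinusoid.
double max_error_vs_closed_form(double alpha, double omega, double phi, int n)
{
    const std::vector<double> y = decaying_sine_filter(alpha, omega, phi, n);
    double worst = 0.0;
    for (int k = 0; k < n; ++k) {
        const double exact = std::exp(-alpha * k) * std::sin(omega * k + phi);
        const double err = std::fabs(y[static_cast<std::size_t>(k)] - exact);
        if (err > worst) worst = err;
    }
    return worst;
}
```

The recursion needs only the pole radius $e^{-\alpha}$ and the angle $\omega$, which is why bandwidth and centre frequency map directly onto the filter coefficients.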
However, for vocal synthesis [115], the waveform has a noticeable attack rather than 
an abrupt attack as generated by a second order filter. The approach chosen by 
Xavier Rodet [115] was to modulate the impulse response of a second order filter 
by an asymmetric Hanning window. This window was chosen because of its useful 
frequency placement properties and time domain behaviour similar to the natural 
glottal pulse in human speech [76]. The human ear is sensitive to phase below 
500 Hz [93] and consequently a time-reversed glottal pulse, as approximated by 
integrating a short pulse twice would be differentiated relative to the human glottal 
pulse. The wavefunction chosen generates static zeros in the z-domain, whilst the 
real human impulse expands and shrinks with the pitch period. However, both 
methods are of equal merit provided the glottal spectrum has no fixed real-frequency 
zeros. 
6.2 Analysis of the FOF technique
The Forme d'Onde Formantique (FOF) wavefunction is composed of three sections.
The first section is the second order bandpass filter response function, the second
is half a Hanning window and the last is a unity constant. The wavefunction is as
described in equation 2.9 and is reproduced below.
$$\psi(t) = \begin{cases} \frac{1}{2}\,(1 - \cos(\beta t))\,e^{-\alpha t}\sin(\omega_0 t + \phi), & 0 \le t \le \frac{\pi}{\beta} \\ e^{-\alpha t}\sin(\omega_0 t + \phi), & t > \frac{\pi}{\beta} \end{cases}$$
This continuous time form has $\pi/\beta$ equal to the attack time¹ and $\alpha$ proportional to
the bandwidth of the filter. The initial phase ($\phi$) and angular formant frequency ($\omega_0$)
are also defined. This wavefunction is triggered by an idealised impulse train with a
period inversely proportional to the pitch. Figure 6.1 shows one wavefunction grain
initiated at zero time with a bandwidth of 40 Hz and attack duration and oscillation
frequency adjusted to be visually clear.
[Figure 6.1 axes: amplitude against time (0 to 2500)]
Figure 6.1: The FOF Wavefunction
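A single FOF grain can be sketched directly from its piecewise definition: a raised-cosine attack lasting π/β followed by a pure exponential decay. This is a minimal illustration; the function name is hypothetical, and returning zero for negative time is an assumption reflecting that the grain is triggered at t = 0.

```cpp
#include <cassert>
#include <cmath>

// One FOF grain: half-Hanning attack of length pi/beta multiplying a
// decaying sinusoid, then the decaying sinusoid alone.
double fof_grain(double t, double alpha, double beta,
                 double omega0, double phi)
{
    assert(beta > 0.0);
    const double kPi = 3.14159265358979323846;
    if (t < 0.0)
        return 0.0;   // assumed: no output before the trigger
    const double body = std::exp(-alpha * t) * std::sin(omega0 * t + phi);
    if (t <= kPi / beta)
        return 0.5 * (1.0 - std::cos(beta * t)) * body;   // attack section
    return body;                                          // decay section
}
```

At t = π/β the attack factor reaches one, so the two sections join without a step in the envelope.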
The spectral properties of the wavefunction can be illustrated by taking the Laplace
transform of equation 2.9 in a piecewise fashion. Here, it is necessary to
assume that the wavefunction's amplitude falls to zero at infinite time. Only the
envelope is operated on; the oscillatory part is neglected. The Laplace transform is
found to be
$$\frac{\beta^2}{2}\,\frac{e^{-\frac{\pi}{\beta}(s+\alpha)} + 1}{\left((s+\alpha)^2 + \beta^2\right)(s+\alpha)} \qquad (6.1)$$
The power spectrum can be obtained by substituting $s = j\omega$ and multiplying by
the complex conjugate. Then the answer is the square root of the expression and is
shown in equation 6.2, with $W = e^{-\pi\alpha/\beta}$.
$$\frac{\beta^2}{2}\,\frac{\sqrt{W^2 + 2W\cos(\pi\omega/\beta) + 1}}{\sqrt{\alpha^2+\omega^2}\,\sqrt{(\alpha^2+\omega^2)^2 + \beta^2(\beta^2 + 2\alpha^2 - 2\omega^2)}} \qquad (6.2)$$
It can be shown that around the centre frequency of the formant, the power spectrum
approximates to
$$\frac{1}{2}\,\frac{\sqrt{W^2 + 2W + 1}}{\sqrt{\alpha^2+\omega^2}} = \frac{W+1}{2}\,\frac{1}{\sqrt{\alpha^2+\omega^2}}$$
Therefore the shape of the FOF wavefunction about the formant frequency is of the
form
$$\frac{K}{\sqrt{\alpha^2 + (\omega_c - \omega)^2}}$$
This corresponds to the second order bandpass filter structure.
¹ Beta is known as the skirtwidth in the frequency domain and evaluates to the -40 dB point.
[Figure 6.2 axes: magnitude in dB (0 to -160) against frequency (0 to 5000 Hz); curves for π/β = 0.1 msec, 1 msec and 10 msec]
Figure 6.2: Power Spectrum of the FOF Wavefunction
A graphical plot of the power spectrum of the FOF wavefunction of equation 2.9 was
obtained by varying the angular frequency $\omega$ and plotting the magnitude in decibels
using equation 6.2. This is shown in figure 6.2 with $\alpha = 80\pi$ for various
attack durations ($\pi/\beta$). Note that the bandwidth is 80 Hz.
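The magnitude used for such a plot can be cross-checked against a direct complex evaluation of the envelope transform. The sketch below assumes the envelope transform $\frac{\beta^2}{2}\,\frac{1 + e^{-(\pi/\beta)(s+\alpha)}}{((s+\alpha)^2+\beta^2)(s+\alpha)}$ and $W = e^{-\pi\alpha/\beta}$, which follow from transforming the half-Hanning attack and exponential decay of the envelope piecewise. The function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <complex>

const double kPi = 3.14159265358979323846;

// Envelope transform evaluated directly at s = j*omega.
std::complex<double> envelope_transform(double omega, double alpha, double beta)
{
    const std::complex<double> sa(alpha, omega);   // s + alpha with s = j*omega
    return 0.5 * beta * beta * (1.0 + std::exp(-(kPi / beta) * sa))
           / ((sa * sa + beta * beta) * sa);
}

// Closed-form magnitude of the same expression.
double envelope_magnitude(double omega, double alpha, double beta)
{
    const double W = std::exp(-kPi * alpha / beta);
    const double a2w2 = alpha * alpha + omega * omega;
    const double num =
        std::sqrt(W * W + 2.0 * W * std::cos(kPi * omega / beta) + 1.0);
    const double den = std::sqrt(a2w2)
        * std::sqrt(a2w2 * a2w2
                    + beta * beta * (beta * beta + 2.0 * alpha * alpha
                                     - 2.0 * omega * omega));
    return 0.5 * beta * beta * num / den;
}

// Magnitude in decibels, as plotted against angular frequency.
double magnitude_db(double omega, double alpha, double beta)
{
    return 20.0 * std::log10(envelope_magnitude(omega, alpha, beta));
}
```

Sweeping `omega` with `alpha` set to 80π and `beta` set to π divided by the attack time gives one curve per attack duration.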
Further work by researchers [21] applied the oscillation to the Laplace transform
of the wavefunction by substituting $s - j\omega_0$ for $s$ in equation 6.1. To get the
power spectrum they substituted $j\omega$ for the new $s$ parameter and followed the steps
as above. The researchers then implemented the wavefunction in the time-domain
using recursion and found that the arithmetic cost was 11 multiplications and 8
additions per sample. They generalised the skirtwidth parameter $\beta$. However, I
believe this generalisation is ultimately unimportant because the purpose
of the skirtwidth parameter is to tidy the formant shape in the frequency domain
and it is normally constant for the timbre to be synthesised.
6.2.1 Implementation Approaches to the FOF wavefunction
In this section, I will investigate the two different approaches to implementing the
FOF wavefunction. They are broadly cast as time domain, involving table lookup,
or filter based, in which the problem is defined in a recursive manner. Both methods
can result in efficient construction of the waveform; however, they both have
disadvantages.
6.2.2 Wavetable Based FOF Synthesis
This approach is the most common implementation of FOF and appears in the CHANT²
and CSOUND implementations. As CHANT source code is not commonly available,
the discussion on this approach will involve the unit generator concept of CSOUND.
CSOUND is a program which allows musicians to create instruments by wiring
control and generator building blocks together, in a manner reminiscent of the old
analogue synthesisers with patch cords to route sound circuitry together. The system
uses two files: one contains the instrument definition whilst the other tells the
program when to play them. This concept is like the conductor and the orchestra.
The program has been designed to compute the different flows in music optimally.
This is achieved by separating the low frequency control from the regular and faster
audio stream. This enables the use of vector based arithmetic to speed up computation,
although implementation is on each unit generator due to the design of a
fixed scheduler.
Now, we focus our attention on the FOF unit generator. The analysis will be
simplified as CSOUND has recently been upgraded and the unit generator has been
completely re-written.
² New version of CHANT includes FOF wavetable and filter approaches.
[Figure 6.3 block labels: 'Attack' envelope table; 'Decay' envelope table; pitch (×2); octaviation mechanism; repetition frequency (pitch); formant frequency; sinusoid lookup table]
Figure 6.3: Simplified CSOUND FOF Implementation
In figure 6.3, the major implementation aspects of a wave-table based FOF
wavefunction synthesiser are presented. When a new wavefunction is triggered, the
attack envelope and exponential decay are adjusted to the nearest sample interval
and act as if operating at a much higher sampling rate. After a period of time,
related to the skirtwidth parameter, only the exponential decay is computed and
finally the decay envelope is triggered. All these are multiplied together and the
formant frequency is applied. The formant oscillator is driven by its frequency and the
pitch rate, aligning the sinusoid correctly with respect to the impulse trigger. The
oscillator's phase adjustment is paramount to the perceived audio quality because it
affects the frequency spectrum when multiple formants are summed together. This
removes spectral nulls in the spectra which have unpleasant audio artifacts [116].
Finally, octaviation [15] allows the musician to metamorphose the sound by altering
the excitation frequency and amplitude in powers of two, thus picking out even and
odd partials. This can be made gradually to fade out alternate excitations, eventually
halving the rate at which new excitations are produced and thus halving the
pitch. This effect can best be described as making a vocal imitation sound deeper
and reverberant. This is shown schematically as two impulse trains, one twice the
frequency of the other, being subtracted to create a compound impulse train.
The wave-table approach requires storage allocation for each newly excited
wavefunction and these resources are only relinquished when the wavefunction has decayed
to zero at a certain time in the future. The inspired reader will also realise that
each new wavefunction adds to the computational load, thus slowing
down the synthesis. Each wavefunction is summed internally and, for real-time
operation, all wavefunctions must be computed within the sampling period.
Clearly, for a high fundamental frequency, this approach would require many
wavefunctions to be computed and thus be a burden on the finite resources available.
The insane solution is to have a computational device having unbounded storage
and computational resources, an infinitely powerful computer!
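The growth in simultaneous wavefunctions can be estimated with a rough sketch. It assumes a grain is treated as finished once its envelope $e^{-\alpha t}$ falls below a chosen threshold. That pruning threshold is an assumption introduced here for illustration, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Time for the envelope exp(-alpha t) to fall below 'threshold'
// (threshold is a fraction of the peak, e.g. 0.001 for -60 dB).
double grain_duration(double alpha, double threshold)
{
    assert(alpha > 0.0 && threshold > 0.0 && threshold < 1.0);
    return -std::log(threshold) / alpha;
}

// Upper estimate of grains sounding at once when a new grain is
// triggered every 1/pitch_hz seconds.
int overlapping_grains(double alpha, double threshold, double pitch_hz)
{
    assert(pitch_hz > 0.0);
    return static_cast<int>(
        std::ceil(grain_duration(alpha, threshold) * pitch_hz));
}
```

For an 80 Hz bandwidth (α = 80π) and a -60 dB threshold, this estimate gives 3 overlapping grains at a pitch of 100 Hz and 13 at 440 Hz for a single formant, so the per-sample workload scales with pitch.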
In [116], Rodet et al. waxed lyrical about the importance of floating point
arithmetic in the control and synthesis of vocally inspired sounds. However, for VLSI
implementation, floating point arithmetic units consume enormous amounts of silicon
and possibly power. In my opinion it is better to optimise the
design to use cheap arithmetic processing elements using integer based arithmetic,
so that the control can be partitioned to run on a host processor.
The problem I discovered with the wavetable based approach, as applied to VLSI
implementation, is the unbounded nature of the computation and storage requirements.
A single chip silicon based engine requires all computation to be completed
within a sampling period, and storage should be on chip. The wavetable method
would require some computational fix to prune out wavefunctions, thus distorting
the synthesis quality. If this approach is not followed, the system would,
optimistically, only be capable of monophonic synthesis. The aim of this thesis is to create
real-time polyphonic voice synthesis.
6.2.3 Filter Based FOF Synthesis
The filter-based approach is more amenable to silicon because time-overlapping of
the impulse responses to successive source pulses is the result of the autoregressive
structure of the filter. Current filter approaches, most notably the second order
filter approach as implemented in Samson's Box, require complex methodologies to
resolve the zeros in the spectrum resulting from the parallel summation of formant
filters. This problem is common to many parallel formant speech synthesisers,
see equation 2.8 on page 21.
To overcome this difficulty I decided that a more complex filter having properties
similar to the FOF wavefunction is required. This section investigates a hybrid
architecture consisting of a filter generating the asymmetric envelope of the FOF
wavefunction, which is multiplied (heterodyned) by a sine wave. The sine wave would
be generated by a wave table approach optimised for silicon. The CORDIC based
sine generator is ideal for this synthesis method. This generator was discussed earlier
in this thesis and can be found in section 5.4 starting from page 91.
Using the Laplace to z transformation method of section A on equation 6.1, the
envelope can be recoded into a structure suitable for digital implementation. The
z-transform of this envelope can be shown to be equal to equation 6.3. A slight
change to the wavefunction definition in equation 2.9 is to substitute $\pi/\beta$ for $\beta$.
This makes the $\beta$ parameter equal to an integer sample value, directly corresponding
to the attack time.
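$$H(z) = \frac{\left(e^{-\alpha\beta T_s}z^{-\beta} + 1\right) z \left(\left(1-\cos\left(\frac{\pi}{\beta}\right)\right)e^{-2\alpha T_s} + \left(1-\cos\left(\frac{\pi}{\beta}\right)\right)e^{-\alpha T_s}z\right)}{2\left(z-e^{-\alpha T_s}\right)\left(z^2 - 2e^{-\alpha T_s}\cos\left(\frac{\pi}{\beta}\right)z + e^{-2\alpha T_s}\right)} \tag{6.3}$$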
It was found that implementation of equation 6.3 in a parallel structure, when
modelled using MATLAB, created visible ripples in the time domain response.
Implementing the equation as a cascade of a four non-zero tap FIR filter and two IIR
filters of orders one and two respectively, gave the correct response. Figure 6.4 shows
the FOF envelope with varying bandwidths and attack durations. These envelopes
have also been normalised.
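The cascade described above can be checked numerically. The following is a minimal
sketch, not the MATLAB model used in this work: it assumes the envelope is a
raised-cosine attack over $\beta$ samples followed by a pure exponential decay, takes
`alpha` as the decay per sample (that is, $\alpha T_s$), and uses hypothetical function
names. It requires $\beta \ge 2$ so that the four FIR taps are distinct.

```python
import math

def fof_envelope_direct(num_samples, beta, alpha):
    """Reference envelope: raised-cosine attack for n <= beta, then pure decay."""
    out = []
    for n in range(num_samples):
        decay = math.exp(-alpha * n)
        if n <= beta:
            out.append(0.5 * (1.0 - math.cos(math.pi * n / beta)) * decay)
        else:
            out.append(decay)
    return out

def fof_envelope_cascade(num_samples, beta, alpha):
    """Impulse response of FIR -> first order IIR -> second order IIR."""
    a = math.exp(-alpha)
    c = math.cos(math.pi / beta)
    g = 0.5 * (1.0 - c)
    # FIR part: four non-zero taps at delays 1, 2, beta + 1 and beta + 2.
    taps = {1: g * a, 2: g * a ** 2,
            beta + 1: g * a ** (beta + 1), beta + 2: g * a ** (beta + 2)}
    x = [taps.get(n, 0.0) for n in range(num_samples)]  # response to a unit impulse
    # First order section: y[n] = x[n] + a * y[n-1]
    y1, prev = [], 0.0
    for v in x:
        prev = v + a * prev
        y1.append(prev)
    # Second order section: y[n] = x[n] + 2ac * y[n-1] - a^2 * y[n-2]
    y2, p1, p2 = [], 0.0, 0.0
    for v in y1:
        y = v + 2.0 * a * c * p1 - a * a * p2
        y2.append(y)
        p1, p2 = y, p1
    return y2
```

In floating point the two functions agree closely for a non-zero bandwidth; an integer
implementation would need the long wordlengths discussed later in this section.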
[Figure 6.4, four panels, horizontal axis in sample intervals: (a) $\beta = 1500$ and zero bandwidth; (b) $\beta = 500$ and 5 Hz bandwidth; (c) $\beta = 1500$ and 10 Hz bandwidth; (d) $\beta = 500$ and 100 Hz bandwidth]
Figure 6.4: Time domain behaviour of FOF envelope generator
Assuming unit amplitude impulse excitation of the filter (equation 6.3), the maximum
amplitude occurs when $t = \frac{2}{\pi}\arctan\left(\frac{\pi}{\alpha\beta}\right)\beta$. The $\beta$ parameter is assumed
to be quantised to a whole number of samples. Therefore, 1 ms equals 44.1 samples when
the sampling rate $f_s$ equals 44.1 kHz. Also, $\alpha = \pi \cdot \text{bandwidth} / f_s$ and, letting the
bandwidth be represented by $\Delta$, this gives the scaling factor of
$$\frac{f_s^2\, e^{-\frac{2\Delta\beta}{f_s}\arctan\left(\frac{f_s}{\Delta\beta}\right)}}{\Delta^2\beta^2 + f_s^2} \tag{6.4}$$
Clearly as bandwidth and/or attack duration approaches zero, the factor approaches
unity.
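As a worked check of the peak position, the sketch below evaluates the time of maximum
amplitude and the corresponding peak value of the attack envelope
$\frac{1}{2}(1-\cos(\pi t/\beta))e^{-\alpha t}$. The peak value is derived here from the
envelope definition rather than quoted from the thesis, and the function name `fof_peak`
is hypothetical.

```python
import math

def fof_peak(beta, bandwidth_hz, fs=44100.0):
    """Return (time of peak in samples, peak value) of the attack envelope."""
    if bandwidth_hz == 0.0:
        # With no decay the raised cosine peaks at the end of the attack.
        return float(beta), 1.0
    alpha = math.pi * bandwidth_hz / fs            # decay per sample
    ratio = fs / (bandwidth_hz * beta)             # equals pi / (alpha * beta)
    t_peak = (2.0 / math.pi) * math.atan(ratio) * beta
    # Setting the derivative to zero gives tan(pi t / (2 beta)) = pi / (alpha beta),
    # from which 0.5 * (1 - cos) reduces to fs^2 / (bandwidth^2 beta^2 + fs^2).
    peak = fs ** 2 * math.exp(-alpha * t_peak) / (bandwidth_hz ** 2 * beta ** 2 + fs ** 2)
    return t_peak, peak
```

For a narrow bandwidth or a short attack the peak value tends towards one, in line with
the remark above.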




[Figure 6.5, two panels: (a) $\beta = 10$ and zero bandwidth; (b) $\beta = 10$ and 1000 Hz bandwidth]
Figure 6.5: Zero-Pole plot of the FOF envelope generator
Notice that there are three poles and $\beta + 1$ zeros and, ideally, pole-zero
cancellation occurs around the three poles. This has the effect of requiring long integer
wordlengths for stability, optimum performance and resolution of the bandwidth.
However, the structure seems to be compact and usable. If the musician assumes
the skirt-width/attack time to be constant at 0.03 seconds, quantised to the sampling
period, then 1323 samples are required. The total delay line size of the filter
would be $3\beta$, which equals approximately 4 kilobytes. This memory requirement for
each filter is excessive and could be reduced by using multi-rate techniques on each
filter. This approach might make the synthesis technique unrealisable.
Alternatively, if musicians use this synthesis algorithm and fix the attack time parameter,
the provision of resources to implement such a parameter becomes unnecessary.
The complete filter-oscillator hybrid FOF formant generator [128] is shown in
figure 6.6. Notice that both oscillator and filter are initiated by the trigger (impulse
generator) and the outputs of these are multiplied together. Heterodyning is normally
applied to sinusoids and creates sidebands in the frequency domain. Lowpass filters
are required to remove the sidebands generated by the sum of the two input
sinusoids. In our case this final filter is not required. In the literature [76], Linggard
mentioned a speech synthesiser built by Lawrence in 1953. This approach
multiplied the envelope by a carrier sinusoidal generator and the resulting product was
further multiplied by a sinusoidal generator operating at the carrier frequency plus
the required formant frequency. The output of each formant-envelope product was
lowpass filtered to remove the high frequency sidebands caused by heterodyning,
and the results were summed to create a voice. An alternative, similar to my
approach, is known as the ICL synthesiser implemented in 1976 by Underwood and









Figure 6.6: Heterodyned version of the FOF Algorithm
6.2.4 Full Hanning Window FOF envelope
The next approach I experimented with was to multiply the exponential decay
by a Hanning window, so $\psi(t)$ equals the following :-
$$\psi(t) = \begin{cases} \frac{1}{2}\left(1-\cos\left(\frac{\pi t}{\beta}\right)\right)e^{-\alpha t} & 0 \le t \le 2\beta \\ 0 & \text{elsewhere} \end{cases} \tag{6.5}$$
The z-transform of equation 6.5 was originally derived through empirical means
and can be solved as an extension to the z-transform of a Hanning window (without
exponential decay). The empirically derived equation was found to be as described
by equation 6.6.
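$$H(z) = \frac{e^{-\alpha T_s}\left(1-\cos\left(\frac{\pi}{\beta}\right)\right)\left(z^{-1} + e^{-\alpha T_s}z^{-2} - e^{-2\alpha\beta T_s}z^{-2\beta-1} - e^{-2\alpha\beta T_s}e^{-\alpha T_s}z^{-2\beta-2}\right)}{2\left(1-e^{-\alpha T_s}z^{-1}\right)\left(1-2e^{-\alpha T_s}\cos\left(\frac{\pi}{\beta}\right)z^{-1}+e^{-2\alpha T_s}z^{-2}\right)} \tag{6.6}$$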
Figure 6.7 shows two envelope responses and zero-pole plots. The reader should
note that again the time domain responses have been scaled using equation 6.4.
[Figure 6.7, four panels: (a) $\beta = 500$ and zero bandwidth; (b) $\beta = 10$ and zero bandwidth; (c) $\beta = 500$ and 100 Hz bandwidth; (d) $\beta = 10$ and 1000 Hz bandwidth]
Figure 6.7: Time domain and z-plane responses to Full Hanning FOF
As before with the FOF filter version, this structure will use a large amount of
storage to function. Despite this, there are only 4 non-zero elements in the numerator
(FIR filter). One major disadvantage occurs when $\beta$ is very small, which results in
a glitch. So the optimum setting of $\beta$ is one sufficiently long to allow the exponential to
decay.
6.2.5 FOF filter structure with oscillation
In this section, I will propose a filter which removes the need for an attack parameter.
This approach is used because the attack parameter is normally a constant for most
synthesis algorithms involving FOF. The wavefunction chosen was
$$t^2 e^{-\alpha t}\sin(\omega_0 t)$$
because it has an attack caused by the $t^2$ element and eventually falls to zero
because the exponential decay predominates as time progresses. Initially a
colleague [103] proposed a version which used too many delay elements. I commented
on this in his paper in 1996 [128], but subsequently managed to reduce the
number of delay elements by directly computing the z-transform using the Laplace to z
technique (section A).
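Equation 6.7 is the z-transform of $t^2 e^{-\alpha t}\sin(\omega_0 t)$. Note there is an additional scaling
factor of $\sin\left(\frac{\omega_0}{f_s}\right)$ which should be applied to the z domain transfer function.
$$H(z) = \frac{e^{-\alpha}z^{-1} + 2\cos\left(\frac{\omega_0}{f_s}\right)e^{-2\alpha}z^{-2} - 6e^{-3\alpha}z^{-3} + 2\cos\left(\frac{\omega_0}{f_s}\right)e^{-4\alpha}z^{-4} + e^{-5\alpha}z^{-5}}{\left(1 - 2e^{-\alpha}\cos\left(\frac{\omega_0}{f_s}\right)z^{-1} + e^{-2\alpha}z^{-2}\right)^3} \tag{6.7}$$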
This filter topology requires a scaling factor which is crudely equal to
$$\frac{4e^{-2}}{\alpha^2}$$
using the starting equation of $t^2 e^{-\alpha t}$. The time domain response of this filter is
shown in figure 6.8.
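The transfer function of the time squared filter can be checked against the sampled
wavefunction $n^2 e^{-\alpha n}\sin(\omega n)$. The sketch below is a floating point illustration
only, not the integer implementation developed later; `alpha` and `omega` are taken per
sample, the $\sin(\omega)$ gain is applied at the input, and the function name is
hypothetical.

```python
import math

def time_squared_impulse_response(num_samples, alpha, omega):
    """Impulse response of a five tap FIR followed by three identical resonators."""
    a = math.exp(-alpha)
    c = math.cos(omega)
    s = math.sin(omega)
    # Numerator taps at delays 1 to 5 (index 0 is unused).
    b = [0.0, a, 2.0 * c * a ** 2, -6.0 * a ** 3, 2.0 * c * a ** 4, a ** 5]
    x = [s * b[n] if n < len(b) else 0.0 for n in range(num_samples)]
    for _ in range(3):  # denominator: three identical second order sections
        y, p1, p2 = [], 0.0, 0.0
        for v in x:
            out = v + 2.0 * a * c * p1 - a * a * p2
            y.append(out)
            p1, p2 = out, p1
        x = y
    return x
```

The envelope $n^2 e^{-\alpha n}$ reaches its maximum of $4e^{-2}/\alpha^2$ at $n = 2/\alpha$, which is
where the crude scaling factor above comes from.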
[Figure 6.8: time domain plot, horizontal axis in sample intervals]
Figure 6.8: Time Domain response of the $t^2 e^{-\alpha t}\sin(\omega_0 t)$ wavefunction
The zero-pole diagram for a frequency of 480 Hz and a bandwidth of 250 Hz is
shown in figure 6.9.
[Figure 6.9: zero-pole plot, horizontal axis labelled Real part]
Figure 6.9: z-plane response to $t^2 e^{-\alpha t}\sin(\omega_0 t)$ wavefunction
In the z-plane plot of figure 6.9 I would like to point out that there are four zeros.
The zeros in each pair are almost reflections of each other about the unit circle. Notice that
the zeros outside the unit circle may cause problems when integer arithmetic is used
in the filter implementation. This may be solved by factorising the zeros.
The frequency and phase spectra of this wavefunction with narrow and wide
bandwidths are shown in figure 6.10.









[Figure 6.10, two panels of magnitude and phase spectra: (a) $f_0 = 1250$ Hz and 5 Hz bandwidth; (b) $f_0 = 1250$ Hz and 250 Hz bandwidth]
Figure 6.10: Magnitude and Phase Spectra for the $t^2 e^{-\alpha t}\sin(\omega_0 t)$ wavefunction
The responses shown above are not normalised and the magnitude responses are
very similar to those of normal second order bandpass filter sections. The filter topology
has approximately three times the computational cost of a second order filter, but
this extra cost results in a $t^2$ shaped attack.
6.2.6 Miscellaneous Wavefunction Designs 
In the penultimate section of this chapter, I will describe some structures which can
generate FOF style wavefunctions.
1. The standard FOF wavefunction as described by equation 2.9 can be expanded
into the following form.
$$\psi(t) = \frac{1}{2}e^{-\alpha t}\sin(\omega_0 t) - \frac{1}{4}e^{-\alpha t}\sin\left(\omega_0 t \pm \frac{\pi\tau}{\beta}\right)$$
where the $\pm$ term stands for the sum of both signs and $\tau$ varies from 0 to $\beta$ and
then remains constant. It is easy to prove
that when $\tau$ is equal to $\beta$, the equation above becomes the familiar decaying
sinusoid. If $\tau$ is allowed to decrease to zero from $\beta$ at some later time, then
the FOF as implemented in CSOUND results. I explored this approach but
it was not accurate enough. It was possible to hear the difference.
2. In the literature [115], Rodet mentioned an envelope of the form [expression illegible in the source]
where c is a constant. Consequently I proposed the following envelope :-
This envelope is the Max Planck thermal distribution for black body radiation
and has a peak at $\alpha = 2.821$. Those with knowledge of Quantum Mechanics
will realise that the equation is a closed form expression for an infinite series.
Consequently, neither waveform is implementable as a filter.
3. Miller Puckette has a patent [112] on a similar structure using multiplications
and sinusoidal modulation to generate the following
All these implementations are unsuitable for filter implementation due to infinite
series or other problems. They can be implemented using look-up tables, additions
and multiplications and are ideal for implementation on general purpose computers
or DSPs.
6.3 Approaches to formant synthesis 
This chapter has concentrated on the various implementations of formants in the
time domain. They can be implemented either using lookup
tables or as the impulse response of a set of filters.
In this thesis, I proposed the time squared exponential decaying sinusoid
wavefunction as a replacement for FOF because it can be implemented as a filter. This allows
an elegant and uncluttered implementation without the multiplication by, and generation
of, a separate sinusoid. However, a filter implemented in integer arithmetic has a reduced
number of allowable bandwidths and frequencies caused by the quantisation of the
coefficients. This is thought to be of minor consequence because the human auditory
system prefers rich evolving textures rather than precise static sounds.
This chapter will conclude with a table showing the relative cost of the FOF and
full Hanning FOF envelopes and the time squared decaying sinusoid filter
implementations. All filters in this analysis are implemented in a cascaded direct form
structure and any gain factors are ignored. In this analysis, the actual coefficient
values are assumed to be automatically generated by a host processor with sufficient
computational power.
Structure       Multiplications   Adds/Subtracts   Delays          Sections
FOF             5                 5                $\beta + 5$     4
Hanning FOF     7                 7                $2\beta + 5$    3
Time Squared    11                10               11              4
Table 6.1: Comparisons between Formant Filter Topologies
Both the FOF and Hanning window FOF filters are comparable in computation and
storage requirements, but they need either a second order sinusoid generator or a
lookup table to shift the envelope to the formant frequency. The disadvantage
with these envelope based filters is that the $\beta$ or window parameter is quantised
in units of the sample period. It is possible to create all-pass filters to generate
non-integer digital time delays [72], but the computational cost becomes excessive
in this application. The surprising result from table 6.1 is that the Time
Squared filter uses the most arithmetic resources but has the smallest storage requirement.
None of these filter topologies has an envelope to tidy up the exponential decay.
This is unlike the CSOUND and CHANT implementations and is deemed unnecessary
because of the increased computational and storage costs. A compromise solution
would be to truncate the response after three time constants, since the decay would
then be within 5 % of its final value. This would only be possible if the impulse generator's
pulse rate were sufficiently slow, so that no overlapping of responses could occur
within the filter bank. A counter would be required to orchestrate this procedure.
In the next chapter, the time squared sinusoidal decaying oscillator will be extended
to include the source implementation of this source-filter topology.
CHAPTER 7
Parametric Additive Synthesis Implementation
In the previous chapter the reader was guided through the different
implementations of damped sinusoidal oscillators. By providing damping, a spread of
harmonics appears in the spectrum, instead of a single harmonic (with zero bandwidth).
As most sound generating devices have built-in damping, it seems reasonable to
try to model this using wavefunctions having this property. The advantage, when
compared with Fourier-based Additive Synthesis using sinusoids, is the reduced
number of oscillators required to model the spectra. Consequently sinusoidal based
synthesis techniques require advanced algorithms, either FFT based [34] or
multirate [104, 106, 108], in order to be cost effective because of the immense
computational burden of generating many thousands of sinusoids. In comparison, the
alternative fixed sample rate approaches [10, 58, 86, 127] can only produce a few
hundred. Formant based systems compute decaying sinusoids and thus the
number of oscillators is reduced to a minimum of one per gross formant region of the
spectrum to be modelled.
This chapter extends my work on implementing a decaying sinusoid oscillator
as a filter which has a fixed attack envelope. The first section will discuss
implementation details of the $t^2 e^{-\alpha t}\sin(\omega_0 t)$ filter. The remaining sections will deal with
the excitation generator that drives the filters and also the combination of the separate
filters for audio output. The structures to be presented will be for the
implementation of a single formant. For modelling of formant based instruments, e.g. violin,
guitar and the human voice, a parallel collection of formants is needed.
7.1 Optimisation of the filter $t^2 e^{-\alpha t}\sin(\omega_0 t)$
The filter structure as shown by the z transfer function of equation 6.7 consists of a
transversal (FIR) filter and three second order IIR filters implemented in cascade
configuration, because the parallel form has poor zero sensitivity and scaling problems [53].
7.1.1 Analysis of the Transversal Element of the Filter Topology 
The numerator (FIR part) of the transfer function gives rise to four zeros which are
configured in pairs related by reflection about the unit circle and which lie on
the real axis. In tests using MATLAB, the numerator as described by equation 6.7
does not precisely give the zeros mentioned above. It seemed
reasonable to assume that these errors occurred due to the conversion between symbolic
and numerical form. This assumption was confirmed by investigating the
different numerical output formats of MATLAB (normal and rational).
For each pair of zeros, the following factor arises.
$$N(z) = z^2 + \xi z + 1 \tag{7.1}$$
If $|\xi| > 2$, two real zeros satisfy the symmetry with $z_1 = -r$ and $z_2 = -\frac{1}{r}$, and equation 7.1
can be expressed as follows :-
$$N(z) = z^2 + \left(r + \frac{1}{r}\right)z + 1$$
If $|\xi| < 2$, then a complex conjugate pair of zeros results and $N(z)$ becomes :-
$$N(z) = z^2 + 2\cos(x)z + 1$$
If $|\xi| = 2$, then there are two zeros, either at $z = 1$ or at $z = -1$.
For four zeros, the expression for $N(z)$ becomes
$$N(z) = z^4 + \xi z^3 + \gamma z^2 + \xi z + 1$$
If $\xi^2 < 4(\gamma - 2)$ and $\gamma > 2$, the expression can be written as shown below.
$$N(z) = z^4 + 2\,\frac{r^2+1}{r}\cos(x)\,z^3 + \left(r^2 + \frac{1}{r^2} + 4\cos^2(x)\right)z^2 + 2\,\frac{r^2+1}{r}\cos(x)\,z + 1$$
This equation can be factorised whenever a zero or more lie on the real axis or on
the unit circle.
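The reciprocal pairing of the zeros of a symmetric quartic can be illustrated with a
standard substitution. The sketch below is not taken from the thesis: it divides
$z^4 + \xi z^3 + \gamma z^2 + \xi z + 1$ by $z^2$ and substitutes $u = z + 1/z$, which gives the
quadratic $u^2 + \xi u + (\gamma - 2) = 0$. The function name is hypothetical.

```python
import cmath

def symmetric_quartic_zeros(xi, gamma):
    """Zeros of z^4 + xi z^3 + gamma z^2 + xi z + 1 as two reciprocal pairs."""
    # Solve u^2 + xi*u + (gamma - 2) = 0 for u = z + 1/z (u may be complex).
    disc = cmath.sqrt(xi * xi - 4.0 * (gamma - 2.0))
    pairs = []
    for u in ((-xi + disc) / 2.0, (-xi - disc) / 2.0):
        # Each u gives z^2 - u*z + 1 = 0, whose two roots multiply to one.
        root = cmath.sqrt(u * u - 4.0)
        pairs.append(((u + root) / 2.0, (u - root) / 2.0))
    return pairs
```

Each returned pair contains a zero and its reciprocal, so a zero inside the unit circle
is always accompanied by one outside it.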
7.1.2 Transversal Filter Structure 
The FIR part of the formant oscillator, as mentioned in section 7.1.1, can be
factorised according to the real coefficients in $N(z)$. As these zeros are always on
the real axis, their values are always the reciprocals of each other. This is because
the filter can never be allowed to have zero bandwidth because it would then
become unstable, growing in amplitude without bounds. In tests, the bandwidth will
always be a small fraction of the sampling rate, so the leftmost zero will appear
within $z = -10$. However, at the time of writing, the designer has not been able
to factorise and simplify the FIR structure. The hoped-for benefit would be reduced
computation of the coefficients on the host processor.
A FIR filter can be implemented in direct form as shown in figure 7.1, which
is in flow graph form. The $z^{-1}$ arms represent delays, the + below nodes represents
add/subtract elements and the h(0 . . . 5) represent the multiplications.
[Figure 7.1: flow graph with input In(n), a chain of $z^{-1}$ delays, taps h(0) to h(4) and output Out(n)]
Figure 7.1: Flowgraph of Direct Form FIR Filter Structure
The filter topology chosen was the direct form as shown in figure 7.1. However,
alternative forms are possible [102], by expressing the filter as a cascade of second
order sections or in fourth order form with zeros at $re^{\pm j\theta}$ and $\frac{1}{r}e^{\pm j\theta}$. The latter
approach is sensitive to quantisation and maintains the linear phase properties, but
with an increase in multiplication cost.
7.1.3 IIR Filter Structure
The denominator of equation 6.7 is represented by three identical IIR filters. These
filters have a direct form similar to figure 5.2, but the $-1$ gain factor on the $y(n-2)$
output becomes $-e^{-2\alpha}$ and the other gain factor is multiplied by an additional factor
equal to $e^{-\alpha}$. An undesirable problem occurs when the oscillation frequency
is very small; the three complex conjugate pole pairs become two pairs reflected about
the unit circle and two real poles centred around $z = 1$.
There has been a large amount of information published on filter topologies [16, 19,
99, 154] and their analysis. The topology of the filter imparts important benefits
to the designer, such as coefficient sensitivity and noise immunity. In [87] various
filter forms were analysed for the equalisation of digital audio. The direct canonical
filters were found to have poor resolution of poles near the origin when the
coefficients are quantised. The structure had poles densely packed at high frequency
but had the advantage of the minimum number of delays. Ladder and lattice
filter structures have good coefficient sensitivity and roundoff noise performance and
are the digital equivalent of their analog counterparts. In [84], Massie comments
on the decoupling of bandwidth and tuning coefficients. These filters are found in
LPC synthesisers and in a musical instrument built by E-Mu. They are normally
cascaded to represent formant regions and are essentially a series formant
synthesiser. In a further paper [23] a method of controlling bandwidth and frequency
was proposed so that the frequencies would be logarithmically swept. This allowed
musically useful applications of the filter to sound transformation.
Work done by [3, 31] on digital sinusoidal oscillators might be adaptable to musical
purposes, if the structures could be implemented with decay (bandwidth) properties.
This would be of benefit in producing ultra low frequency formant regions.
The approach I took was to use the Gold-Rader coupled form [113] because the
wavefunction requires a wide range of frequencies and bandwidths for musical
generation and because this filter structure quantises the poles on a uniform mesh
around the unit circle. The work done by Kingsbury [67, 68] generalised the
coupled form approach for both poles and zeros near the real axis and could be useful if
the FIR part of the filter topology were to be included with some of the IIR filters.
Figure 7.2 shows a schematic diagram of the coupled form filter with a constant
gain of $R\sin(\theta)$ and two complex conjugate poles.
Figure 7.2: Second Order Coupled IIR filter Form
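A minimal sketch of a coupled form section follows. It assumes the usual rotation
style state update with coefficients $R\cos(\theta)$ and $R\sin(\theta)$ and takes the output from
the second state variable; the exact signal routing of figure 7.2 may differ, and the
function name is hypothetical.

```python
import math

def coupled_form_impulse_response(num_samples, radius, theta):
    """Impulse response of a second order coupled form section (illustrative)."""
    rc = radius * math.cos(theta)
    rs = radius * math.sin(theta)
    x1 = x2 = 0.0
    out = []
    for n in range(num_samples):
        u = 1.0 if n == 0 else 0.0       # unit impulse input
        out.append(x2)                   # output tap on the second state
        # Rotate and scale the state pair, injecting the input into x1.
        x1, x2 = rc * x1 - rs * x2 + u, rs * x1 + rc * x2
    return out
```

Under these assumptions the response is $R^{n-1}\sin((n-1)\theta)$ for $n \ge 1$, that is a
decaying sinusoid whose first non-zero sample has the gain $R\sin(\theta)$ mentioned above.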
7.2 Source Excitation Driver 
In this section I will discuss the design and implementation of the impulse
generator. This generator then drives the filter bank, as described earlier in this chapter.
There are many ways of generating impulses (pulse trains) and in this thesis two
designs will be presented. The first one is based on discrete summation formulae
to approximate a pulse train with resolution finer than the sample rate. The latter








Figure 7.3: Simple Impulse Generator Schematic 
The obvious method of implementing a discrete-time version of an impulse train is
to approximate it by a unit-sample pulse train. This can be easily implemented by
using a phase accumulator. The most significant bit of the accumulator is designed
to flip at the repetition rate. This line then becomes a trigger to drive a
multiplexor and a latch. The multiplexor selects zero or some amplitude A in the chosen
wordlength and number format. This can be clearly seen in figure 7.3. However,
with $f$ being the repetition frequency, the period in samples would be $F_s/f$ and this
expression must be an integer. If the expression is not exactly an integer, the pulse
rate will jitter around the nearest sample interval. Human pitch perception is
so sensitive that the ear can detect this jitter, which corresponds to a tenth of a
percent of the sampling rate. How can the designer overcome this dilemma?
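The jitter can be demonstrated with a small software model of the phase accumulator.
This is a sketch under stated assumptions: it uses accumulator overflow as the trigger
rather than the most significant bit of figure 7.3, a 24 bit accumulator, and a
hypothetical function name.

```python
def impulse_positions(freq_hz, fs=44100.0, bits=24, num_samples=2000):
    """Sample indices at which a phase accumulator overflows."""
    modulus = 1 << bits
    increment = round(freq_hz * modulus / fs)   # phase step per sample
    phase = 0
    positions = []
    for n in range(num_samples):
        phase += increment
        if phase >= modulus:        # overflow marks a new excitation
            phase -= modulus
            positions.append(n)
    return positions
```

For a 1 kHz pulse rate at 44.1 kHz the ideal period is 44.1 samples, so the spacing
between successive impulses alternates between 44 and 45 samples.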
The Fourier transform of a pulse of width $T_s$ (as opposed to an ideal Dirac delta
function) is the sinc function
$$\frac{\sin(\pi f T_s)}{\pi f T_s}$$
where the sinc function is defined to be unity when $f$ is zero. The function has
many zeros at $f = \pm\frac{n}{T_s}$, with $n \ge 1$, as shown in the normalised amplitude
against normalised frequency diagram 7.4 below.
[Figure 7.4: plot of normalised amplitude against normalised frequency, from -0.015 to 0.015]
Figure 7.4: Amplitude vs Frequency for the sinc function
Figure 7.4 should really be a magnitude spectrum, and the diagram clearly
shows the zero crossings. The figure shows a great deal of energy above the first
zero, causing aliasing of all higher harmonics. When a pulse train is generated, the
impulses appear as lines within the sinc spectrum. The lines are separated by $\frac{1}{T}$,
where $T$ is the period between pulses. To allow equal excitation of any filter set to
any formant frequency, the first zero, at $\frac{1}{T_s}$, must be well above the highest formant
frequency of the system.
Therefore, to allow pulses with a resolution finer than the sampling period, the pulses
must be resampled at a higher sampling rate. One approach is to describe the effect
of a higher sampling rate pulse on a lower sampling rate. This is essentially an
impulse which is low pass filtered. An ideal low pass filter has a sinc impulse response,
and the pulse can thus be band-limited to avoid aliasing in the frequency domain. The
low pass filter acts as an interpolator and is the digital analogue of a plane wave
incident on a circular aperture in optics.
7.2.1 Discrete Summation Formulas for Impulse Generation
In [129], Stilson and Smith presented a sum of sinc functions represented by
$$y(n) = \sum_{l=-\infty}^{\infty} \frac{\sin(\pi(n + lP))}{\pi(n + lP)}$$
where $P = \frac{T}{T_s}$ is the period in samples and is not necessarily an integer. By using
discrete summation formulae, $y(n)$ becomes
$$y(n) = \frac{M}{P}\,\frac{\sin(\pi M n / P)}{M\sin(\pi n / P)} \tag{7.2}$$
where $M$ is the number of harmonics and is always odd because the impulse train
has one harmonic at DC and an even number of non-zero harmonics. Thus
$$M = 2\left\lfloor \frac{P}{2} \right\rfloor + 1$$
Hence M is the largest odd integer not exceeding the period P in samples.
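The closed form can be compared with a direct sum of harmonics. The following sketch
assumes the standard band-limited impulse train identity, in which a DC term plus
$(M-1)/2$ cosine harmonics, scaled by $1/P$, collapse into a ratio of sines; the
function names are hypothetical.

```python
import math

def blit_closed_form(n, period):
    """Closed form band-limited impulse train sample at integer index n."""
    m = 2 * int(period // 2) + 1            # odd number of harmonics, M
    den = period * math.sin(math.pi * n / period)
    if abs(den) < 1e-12:
        return m / period                   # limiting value at the pulse centre
    return math.sin(math.pi * m * n / period) / den

def blit_harmonic_sum(n, period):
    """Same sample computed by explicitly summing the harmonics."""
    m = 2 * int(period // 2) + 1
    total = 1.0                             # DC term
    for k in range(1, (m - 1) // 2 + 1):
        total += 2.0 * math.cos(2.0 * math.pi * k * n / period)
    return total / period
```

The closed form needs one division per sample, which is the cost discussed at the end
of this section.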
Formant generation using discrete summation formulas, as proposed by [95], hinges
on the closed form for summing sinusoids shown in equation 7.3 with $a < 1$.
$$\sum_{k=0}^{N} a^k\sin(\theta + k\beta) = \frac{\sin(\theta) - a\sin(\theta - \beta) - a^{N+1}\left\{\sin(\theta + (N+1)\beta) - a\sin(\theta + N\beta)\right\}}{1 - 2a\cos(\beta) + a^2} \tag{7.3}$$
Notice that the right hand side of equation 7.3 has variable amplitude controlled by
$a$. The normalisation factor for expression 7.3 is
$$\sqrt{\frac{1-a^2}{1-a^{2N+2}}}$$
The normalisation factor becomes erroneous when harmonics from other summation
formulae fall upon those of the computed formula, since the overlapping results give an
incorrect amplitude factor. I suggest that the reader consults the relevant
article [95] to obtain the solution. The summation of many sinusoids of zero phase
relative to each other will result in an approximation of an ideal unit impulse.
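The closed form of the discrete summation formula can be verified numerically against
the explicit sum. This is a minimal sketch with hypothetical function names; note that
`beta` here is the harmonic spacing angle of the summation formula, not the FOF attack
parameter of the previous chapter.

```python
import math

def dsf_direct(a, theta, beta, n_terms):
    """Explicit sum of a^k sin(theta + k beta) for k = 0 .. n_terms."""
    return sum(a ** k * math.sin(theta + k * beta) for k in range(n_terms + 1))

def dsf_closed(a, theta, beta, n_terms):
    """Closed form of the same sum, requiring a single division."""
    num = (math.sin(theta) - a * math.sin(theta - beta)
           - a ** (n_terms + 1) * (math.sin(theta + (n_terms + 1) * beta)
                                   - a * math.sin(theta + n_terms * beta)))
    return num / (1.0 - 2.0 * a * math.cos(beta) + a * a)
```

The closed form replaces $N + 1$ sine evaluations by four, at the price of the division
discussed below.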
Both the approaches mentioned in this section compute functions directly in the time
domain and thus require a division circuit, which is expensive in integer arithmetic.
Relating these algorithms to the spectral based parametric additive synthesis technique
proposed in this thesis results in a rather expensive scheme to implement octaviation,
requiring multiplication. As mentioned earlier, multiplication is currently more expensive
to implement in silicon than addition, as it is built from multiple addition circuits
in VLSI.
In the next section, a multi-rate technique will be proposed which can be made very
computationally efficient.
7.2.2 Impulse generation using Decimation
As discussed in section 7.2, the algorithm requires accurate control of the impulse rate. In
CSOUND the FOF algorithm requires accurate control of the impulse generator
such that if the impulse falls within a sample period, the wavefunction is advanced
by the time difference between the impulse firing and the next sample interval.
Failure to take this into account would result in poor quality synthesis. Therefore the
impulses are generated at a high sampling rate and the output is resampled at a
lower rate, which is the audio sampling rate.
The impulse generator uses the phase accumulator with a single-bit output. This bit 
latches a register which holds the amplitude of the impulse if the bit is one, otherwise 
it outputs zero, as shown in figure 7.3. Using this simple scheme, octaviation and 
simple filter scaling can be achieved. The output is then passed through a decimator. 
7.2.2.1 The Decimator Structure 
A decimator is conceptually a device which drops $M - 1$ out of every $M$ samples from the
high sample rate stream. Essentially it acts as a switch operating at the low sampling rate
and provides a decrease in sampling rate by a factor of $M$. However, the signal passing
through the switch requires band-limiting because information that is representable at the
high sample rate would cause aliasing when resampled at the lower rate.
By analysing the FOF unit generator in CSOUND, the effective sample rate required
to provide adequate temporal resolution was found to be the following.
$$\text{effective SR} = \frac{2^{24} \cdot \text{fundamental}}{\text{fundphs} \cdot f_s}$$
The equation arising from CSOUND has the value $2^{24}$ in the numerator. This
value is the maximum length of a 24 bit phase accumulator as implemented in
CSOUND. Consequently fundphs varies from 0 to $2^{24} - 1$ inclusive. It turns out
that the worst case sampling rate occurs when fundphs is equal to 1.
Setting fundphs = 1 and fundamental = 7 kHz results in an effective sample rate of
2.7 MHz. This rate corresponds to a time scale of 360 ns. The effective rate is
approximately $2^6$ times the original sample rate of 44.1 kHz, so the decimation factor
is approximately $2^6$. The smallest increment
generated by the impulse generator using a 24 bit phase accumulator is found to be
equal to 0.168 Hz. Approximating the interpolation factor as $2^6$ results in a
maximum excitation frequency of 7418.86 Hz.
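The arithmetic in this paragraph can be reproduced with a few lines of code. This is a
sketch only: the formula is used exactly as quoted above, and the constant and function
names are hypothetical rather than taken from the CSOUND sources.

```python
import math

ACCUMULATOR_BITS = 24
AUDIO_RATE_HZ = 44100.0

def effective_sample_rate(fundamental_hz, fundphs=1):
    """Effective rate needed to resolve the impulse position (formula above)."""
    return (2 ** ACCUMULATOR_BITS) * fundamental_hz / (fundphs * AUDIO_RATE_HZ)

def oversampling_power_of_two(fundamental_hz):
    """Smallest power of two covering the ratio of effective to audio rate."""
    ratio = effective_sample_rate(fundamental_hz) / AUDIO_RATE_HZ
    return 2 ** math.ceil(math.log2(ratio))

def frequency_resolution(factor):
    """Smallest frequency step of the accumulator clocked at factor * audio rate."""
    return factor * AUDIO_RATE_HZ / 2 ** ACCUMULATOR_BITS

def max_excitation_frequency(factor):
    """Fundamental at which the effective rate equals factor * audio rate."""
    return factor * AUDIO_RATE_HZ ** 2 / 2 ** ACCUMULATOR_BITS
```

With a 7 kHz fundamental these functions give roughly 2.66 MHz, a factor of 64, a
resolution of about 0.168 Hz and a maximum excitation frequency of about 7418.86 Hz.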
7.2.2.2 Decimator Design 
The first approach is to implement a decimator which removes every 64 samples and 
the resulting rate output drives a FIR lowpass filter. Using the Remez-exchange 
algorithm and multi-rate design equations in [53] and allowing 0.01 dB ripple in the 
passband and - 72 dB rejection in the stop band produces a finite impulse response 
filter wi th a length of 4953. This filter is not implementable with the Remez-
exchange algorithm and so a multiple stage decimator approach was investigated. 
It can be shown that one big filter and decimator can be split into a cascade of
smaller, cheaper and easier to design sections. These sub-sections have the property
of conforming to the original specification. It was discovered that the optimum
multi-stage implementation, using source code available on disk [53], was in two
stages.
The first stage decimator has a decimation factor of 16 and has 81 coefficients; the
second stage has a decimation factor of 4 and requires 329 coefficients. The cascade
of both filters will meet the required specifications. The actual filter coefficients
were generated using the signal processing toolkit of MATLAB.
By using powers of two for decimation ratios, a compact and efficient decimator can 
be designed [32]. However, the current implementation has excessive latency caused 
by the FIR filter coefficients and excess computation caused by dropping M−1 of
every M samples from the stream. It is possible to coalesce both filter and commutator
to form an efficient structure known as a polyphase filter. Essentially the polyphase
filter reschedules the data flow to only compute the M'th samples which appear at
the low sampling rate output. Therefore, the filter becomes a time varying filter
because the coefficients are changed on a sample by sample basis. Thus the original
filter becomes a series of parallel filters, each one computing once per sample. The
minor disadvantage of this approach is that the filter length must be a multiple of
the decimation factor.
Since the filter is decomposed into D sub-filters, each of length K, K can be
determined by the following formula, with D as the decimation factor and M now
denoting the length of the original FIR filter.
$$K = \frac{M}{D} \qquad (7.4)$$
The filters designed earlier have new lengths of 96 and 332 respectively, by using
equation 7.4. This gives a sample delay of 6 and 83, so a total delay of 89 samples,
instead of 428 samples. However, the structure still requires 428 coefficients to be
stored. It should be noted that the filter coefficients are symmetrical, so using
suitable address logic, the coefficient storage can be reduced by a half.
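As an illustration of the rescheduling idea, the following sketch compares a direct filter-then-discard decimator with a polyphase arrangement that only computes the retained outputs. It is a behavioural model in plain Python, not the VLSI structure of this thesis; the function names and the small test filter (M = 12, D = 4, K = 3) are assumptions for illustration.

```python
def direct_decimate(x, h, D):
    """Compute the FIR output only at every D-th input instant (reference model)."""
    y = []
    for n in range(0, len(x), D):
        acc = 0.0
        for m, hm in enumerate(h):
            if n - m >= 0:
                acc += hm * x[n - m]
        y.append(acc)
    return y


def polyphase_decimate(x, h, D):
    """Same result, organised as D sub-filters of length K = M / D."""
    M = len(h)
    if M % D != 0:
        raise ValueError("filter length must be a multiple of the decimation factor")
    K = M // D
    # Sub-filter k holds every D-th coefficient of h, starting at index k.
    p = [[h[k + n * D] for n in range(K)] for k in range(D)]
    y = []
    for j in range((len(x) + D - 1) // D):      # j indexes the low-rate output
        acc = 0.0
        for k in range(D):
            for i in range(K):
                idx = j * D - k - i * D          # input sample feeding tap (k, i)
                if idx >= 0:
                    acc += p[k][i] * x[idx]
        y.append(acc)
    return y


h = [float(c) for c in range(1, 13)]             # M = 12 illustrative coefficients
x = [float((7 * n) % 5 - 2) for n in range(40)]  # arbitrary test signal
assert polyphase_decimate(x, h, 4) == direct_decimate(x, h, 4)
```

Each low-rate output costs M multiplications in either form; the polyphase form simply makes explicit which K coefficients each of the D branches needs.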
An example of a reduced sample delay polyphase filter with time-varying coefficients
is shown diagrammatically in figure 7.5 with M = 12, D = 4 and K = 3. The small
looped arrows determine the coefficients which will be used for the multiplication.











Figure 7.5: Reduced Dynamic Memory Storage Polyphase Filter
The filter topology shown in figure 7.5 in flow graph form is the standard FIR filter 
(see figure 7.1) having undergone certain transformations as described in [32]. 
It would be possible to implement the decimator using Infinite Impulse Response
filters, but apart from the design complexity required to meet the specification,
they are not linear-phase. I believe that the polyphase approach allows the FIR
filter to compete well with an IIR filter structure in terms of storage and number
of arithmetic operations per second.
7.2.2.3 Reduced Coefficient Storage Design 
As mentioned in the previous section on decimator design, the coefficient storage 
required for each polyphase filter can be reduced due to the coefficients being sym-
metrical. Taking the coefficients of a normal FIR and mapping them with the 
polyphase filter-bank equations, it is possible to determine the coefficient symme-
tries. 
$$P_k(n) = h(k + n \cdot D), \qquad k = 0, 1, \ldots, D-1, \quad n = 0, 1, \ldots, K-1, \quad K = \frac{M}{D} \qquad (7.5)$$
Using equation 7.5 shown above, it can clearly be seen how many filter banks
are required for the decimation process and the relationship between the polyphase
filter's coefficients and the original filter's coefficients h(n). For the D = 16 and
M = 96 case we get the following impulse response structure for each sub-filter:
15 31 47 32 16 0 
0 16 32 47 31 15 
1 17 33 46 30 14 
2 18 34 45 29 13 
3 19 35 44 28 12 
13 29 45 34 18 2 
14 30 46 33 17 1 
The numbers above represent the coefficient index relative to the original FIR filter. 
There are only seven banks of six coefficients which are unique and in the table 
the coefficients and delays are read from left to right, starting at a delay of $z^{-6}$.
The symmetry of coefficients can thus be read right to left (forwards) or left to 
right (backwards) depending on the sample count modulo sixteen. This can be 
implemented using a RAM address selector, the most significant bits determining
the bank and the least significant bits controlling the coefficient storage element. 
The remaining indices which have no elements stored in them are neglected. Thus 
a table lookup implemented in radix 2 can be used and is thus efficient in binary 
based VLSI. 
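The index rows listed above can be reproduced with a few lines of code. This is a sketch of the index mapping only, assuming a symmetric filter with h(n) = h(M − 1 − n) so that only the first half of the coefficients is stored; the function name is illustrative and no claim is made about the actual RAM addressing logic.

```python
def subfilter_indices(k, D=16, M=96):
    """Stored-coefficient indices used by polyphase sub-filter k.

    Sub-filter k uses h[k + n*D] for n = 0..K-1 (equation 7.5). Because the
    filter is assumed symmetric, any index in the upper half is folded back
    onto its mirror image in the lower half.
    """
    K = M // D
    indices = []
    for n in range(K):
        idx = k + n * D
        indices.append(min(idx, M - 1 - idx))
    return indices


print(subfilter_indices(0))    # [0, 16, 32, 47, 31, 15]
print(subfilter_indices(15))   # [15, 31, 47, 32, 16, 0]

# Sub-filter D-1-k reads the same stored values as sub-filter k, in reverse order.
for k in range(16):
    assert subfilter_indices(15 - k) == list(reversed(subfilter_indices(k)))
```

The final loop shows the forwards/backwards reading relationship between mirrored sub-filters that the address selector exploits.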
This novel approach to coefficient storage reduction, which I formulated, has not 
been reported in the literature. 
7.3 Synthesiser Structure 
In this final section, the entire formant based additive synthesiser is presented. But 
first there will be an interlude section on the importance of phase in audio systems. 
7.3.1 The Importance of Phase in Audio
The ear is phase sensitive at low frequencies (< 500 Hz) as described in [93]. So the 
formant wavefunction presented in this thesis would be detectable to a minority of 
people as the shape is approximately the time reverse of the real glottal pulse shape.
However, this perceived disadvantage can be made into an advantage, especially for 
creating new timbres. 
Work done by Avery and Julius [142] created a framework for generating FIR (linear
phase) filters by truncating the response of IIR filters. They achieved
this by using a pole-zero cancellation scheme. To get linear phase they created
time reversed truncated IIR filters which allowed magnitude squared filter design.
These filters reduced the arithmetic requirements compared with an equivalent filter,
but storage remained the same. The unstable time-reversed filters also had to be
made stable. The purpose of this technique was to generate efficient
FIR filters for audio analysis and equalisation problems. In these problems, the 
system designer does not want unnecessary coloration to occur during analysis or 
equalisation and thus linear phase is of paramount importance. 
In synthesis of waveforms, as is the case in this thesis, phase is important for creating 
filters to generate particular impulse responses. However, these filters can not have 
linear phase because the wavefunctions generated are asymmetrical. Adding many 
formant regions together may require a structure which supports linear phase accu-
mulation. In relation to FOF wavetable based synthesis, the skirt width parameter 
has a very minor use despite current thinking. The addition of many wavefunctions 
together uses an initial phase fix which is very counter-intuitive and abstract in 
construction. 
7.3.2 Parallel Formant Model
It is well known that instability is caused when filter coefficients are altered whilst
the filter is still active. To remedy this problem, I implemented a minimum of 
4 filter/oscillator combinations per FOF oscillator. Each of these combinations 
represents an equivalent FOF wavefunction, as found in CSOUND. 
The full synthesiser structure is shown in figure 7.6 and uses two impulse generators
(one creates pulses twice as fast as the other) and an adder cell (subtracter) to 
implement octaviation. As in all formant based synthesisers, the filter-banks are 
added together in alternate signs in order to avoid nulls in the entire spectrum. Each 
filter has separate amplitude/gain controls, which are not shown in the diagram. 
Finally, the output needs to be stored in floating point, or it requires some sort of 
dynamic compression [88] to restrict the dynamic range. This mirrors the use of 
companders for voice and audio to 'fit' onto analogue tapes.
My novel and elegant approach to octaviation is ideally suited to VLSI and is here 
described. When implementing octaviation the two impulse generators require the 
same frequency values but the phase values must differ by a factor of two. Over 
time the amplitude 2 value in figure 7.6 is increased in a two's complement sense. In 
this way the scaled impulse alternates between full value and progressively smaller
values when the impulse streams coming out of the impulse generators coincide and 
diverge at the output of the adder. When the impulse train appears to be operating 
at half the original frequency, the process can be repeated by resetting the phase 
values to the new lower rate with the same factor of two relationship. If octaviation
is not required, amplitude 2 value is set to zero and phase 2 and frequency 2 values 
are ignored. 
The two stage decimator as shown in figure 7.6 rejects every 16 samples. The output 
is sent through the first lowpass filter. The 4th sample of this data stream then 
passes through the second lowpass filter. In reality the decimator uses a polyphase 
filter design as described earlier, see figure 7.5. Using my reduced storage scheme 
based on the symmetry of the filter's response, the number of storage elements can 
be further reduced by another factor of two. The increase in control circuitry is only 
marginal, but I have produced a great reduction in memory storage requirements. 
My method saves memory usage without changing the latency of the polyphase 
filter structure. 
Instead of updating the filter coefficients on a sample-by-sample basis, I chose to 
follow in the footsteps of Rodet [116] and freeze the coefficients on initialisation and 
disallow any attempt to alter them whilst the filter was computing the wavefunction. 
This approach requires a counter to signal the system when a wavefunction is
complete. If another impulse arrives at the filter whilst it is generating a wavefunction,
updating the coefficients must be halted. 
The system thus requires multiple filter-banks so coefficient updates can occur and 
provide variation in timbre. However, too many per formant would be a burden on 
VLSI resources and too few would reduce the flexibility of the synthesis technique. I 
decided that four filter banks per formant was the minimum resource necessary both 
to keep VLSI costs down and to maximise the flexibility of the synthesis technique. 
This number was calculated from the number of overlaps required in CSOUND for 
a fundamental frequency (pitch) of around 440 Hz. 
Each filter would be composed of three IIR filters in a Gold-Rader coupled form
structure and a FIR filter combined together in series. The updating mechanism 
for a bank of four filters to describe one formant region will be discussed in the 
next chapter. This structure, coupled with the update mechanism, has never been 
reported in the literature for a synthesis engine. 
In speech synthesis, the number of controllable formants is five, so the minimum 
number of formants for this synthesis technique should be the same. However, an 
extra two would provide the musician with extra flexibility. The total number of 
filter-banks then required would then be 28. In a polyphonic voiced instrument 
each voice would then require 28 filter-banks all connected to a common impulse 
generator and decimator. 
This approach wisely constrains the computational cost to a precise figure. If an
algorithm requires increasing computation and storage requirements and operates 
in finite time, real-time audio synthesis would eventually become impossible. My 
approach solves this problem but unfortunately requires approximately six times 
the number of filters compared with old parallel formant synthesisers. Currently, 
VLSI design is nowhere near the level of integration required for this type of model 
based synthesis technique. 
CHAPTER 8
Algorithm Scheduling and Miscellaneous Topics
In this chapter, I will discuss the "missing" sections of the proposed algorithm.
This chapter consists of four subsections, the first being algorithm scheduling of the
filterbanks. The middle section will deal with the parts of the algorithm which are
simple to implement, such as ADSR amplitude envelopes. A description of the user
interface design will follow and I will end with hardware/software segmentation.
8.1 Algorithm Scheduling 
The $R \exp(-\alpha t) \sin(\omega_0 t)$ generator as shown earlier in this thesis can be composed
of a cascade of 1st and 2nd order filter sections, whose coefficients control formant
frequency and bandwidth of the particular formant. The work by Rodet [116] on
CHANT mentioned that a filter based structure using 2nd order bandpass filters
was constructed to operate on the Samson Box at CCRMA. This implementation
requires the filter coefficients to be linearly interpolated to alter the bandwidth and
formant frequency and may cause problems in filters due to instability in the poles
of the structure. The approach I [128] took uses multiple filters driven by a common
excitation source plus structures to control each filter's execution. This approach
fails when the excitation rate is large and the number of available filters is small¹.
The chosen solution is to stop the updating of the new coefficients and to utilise the
autoregressive nature of the filters to perform overlapping waveforms automatically.
On the grounds that the signal would already be evolving, this simple mechanism
avoids the problem which plagues wavetable based FOF synthesis, by allowing
operations to be performed in finite time and with constant computational resources.
¹ This will in practice always be true.
Two problems arise with this approach. The first is the increased computational
overhead required; this is solved by implementing the algorithm in silicon. The sec-
ond problem is the control of each filter's state. This then becomes a scheduling 
problem. 
8.1.1 Controller Algorithm Overview
Each filterbank crudely represents one formant of a particular range of frequencies
and bandwidths. This approach seems very expensive in terms of resources, but,
provided the designer chooses the number of filters in each filterbank as a compromise
between physical resources and musical freedom, it will result in a useable and
implementable system. For the sake of brevity, say the number of sub-filters in each
filterbank is four. The algorithm as it stands has no information on how to control
these filters.
Clearly, some form of memory is required to keep track of each sub-filter's status. 
The status flag indicates the filter's availability, being in one of two states: Idle and
Running. 
Initially all sub-filters are in the idle state, and as soon as one filter receives an
excitation impulse from the impulse generator, the coefficients are loaded and the
filter starts running. How do we know when the filter has decayed to zero, assuming
no new excitation impulse arrives? The observant reader will remember that all
the practical filter structures investigated were ones which had no truncation in the
impulse response, and thus relied on the bandwidth parameter to decay to zero in
three time constants. It is possible to track the data stream output from the filter
and decide when it is zero even though the output is oscillatory. However, this 
approach adds more expensive arithmetical operations and a simpler and overesti-
mating approach does exist which involves a decrementing counter. When the filter 
is triggered, the counter is initialised with a time value related to three time con-
stants at the chosen sampling rate. Then every time a new sample period occurs, 
the counter is decremented by one. When the counter reaches zero, the sample is 
computed normally and the filter is zero initialised and the idle flag is set as shown 
in figure 8.1. Counters are easy to design in silicon and are extremely compact. 
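The counter mechanism described above can be sketched in software as follows. This is a behavioural model only, not the silicon design. It assumes the usual FOF relation between decay rate and bandwidth (decay rate equal to π times the bandwidth in Hz), which the text does not state explicitly, and the class and function names are hypothetical.

```python
import math


def counter_reload(bandwidth_hz, fs):
    """Number of sample periods in three time constants.

    Assumes a decay rate of pi * bandwidth, so one time constant is
    1 / (pi * bandwidth) seconds. The result is rounded up so the counter
    overestimates rather than underestimates the decay time.
    """
    time_constant = 1.0 / (math.pi * bandwidth_hz)
    return math.ceil(3.0 * time_constant * fs)


class SubFilterStatus:
    """Idle/Running flag plus decrementing counter for one sub-filter."""

    def __init__(self):
        self.running = False
        self.counter = 0

    def trigger(self, bandwidth_hz, fs):
        """Called when an excitation impulse starts this sub-filter."""
        self.counter = counter_reload(bandwidth_hz, fs)
        self.running = True

    def tick(self):
        """Called once per sample period; returns True when the filter goes idle."""
        if not self.running:
            return False
        self.counter -= 1
        if self.counter <= 0:
            self.running = False   # the filter state would be zeroed here
            return True
        return False


status = SubFilterStatus()
status.trigger(bandwidth_hz=100.0, fs=44_100.0)
ticks = 1
while not status.tick():
    ticks += 1
print(ticks)
```

The only per-sample work is one decrement and one zero test, which is the reason the text prefers a counter to monitoring the oscillatory filter output.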





[Figure 8.1 block labels: Underflow Detection, Idle/Running Flag]
Figure 8.1: Block Diagram of the controller for idle/running status
8.1.2 Filterbank Scheduler Controller
Each filter's status can thus be expressed as a single bit being either zero (idle) or
one (running). Thus for one filterbank a nibble of memory and four counters are
required. It was initially thought that the nibble's bits could be made to perform
a cyclic shift and each filter would be triggered by a new excitation, regardless of
whether new coefficients were required. Thus the nibble represented the newest
filter running. However, if the bandwidths of the filters vary widely, the linear
sequence will become wasteful of resources because some filters will remain idle for
a long time. Also this approach limits the choice of scheduling algorithm. A better
approach is to use a linked list structure and a search method to find idle filters 
(resource slots). 
Linked lists have been used in at least two musical applications: first in
a sound studio workstation for Lucasfilm [96] and more recently in a multirate
additive synthesis engine [108]. The latter approach used linked lists to group 
sinusoids together, whilst the former allowed data of a transient nature to bypass a 
queuing system and be updated between samples. 
In the present application, as the number of elements in the linked list is very small, 
the implementation requires deletion and insertion of elements at the head of the 
list. The list acts as an input queue but elements are flushed asynchronously by the 
decrementing counter. This is different from both the Lucasfilm Audio Processor 
and the Multirate Additive Synthesis Engine. This new approach allows the system 
always to know which filter element is most recently triggered by accessing the top 
element of the list. This removes the need to search the list. Normally linked lists
are used when data of unknown length needs to be stored, as occurs in random
access floppy disks. In this application, there are a finite number of resource slots
which cannot be exceeded. Hence the requirement for a decrementing counter which
is activated whenever a new filter needs to be initiated. However, if all filters are
allocated, then this counter should trigger an interrupt to stop the filter being
initiated. If this interrupt is triggered, the current filter's triggered counter is reset
and any frequency and bandwidth parameters which have changed during the course
of the musical note event are ignored.
A further refinement to the linked list algorithm would be to map the elements of 
the list to an index which redirects the DSP engine to an area of memory holding 
the filter parameters. Thus the list would appear more amenable to silicon as 
its functionality would be reduced and the elements would simply be indices. A 
simplified filterbank resource allocator is shown in figure 8.2. 
[Figure 8.2 labels: From Filter Decrementer; Find index table; Arbiter / Priority
Encoder. Decision: if flag is in running, ignore; else initialize filterbank and
flip flag from idle.]
Figure 8.2: Block Diagram of Resource Filter Allocation
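One possible software reading of this allocation scheme is sketched below. It is an interpretation rather than the design of this thesis: the class name, the lowest-index-first choice of idle slot (standing in for the priority encoder), and the decision to restart the newest filter with its previous counter value when no slot is free are all assumptions made for illustration.

```python
class FilterbankAllocator:
    """Tracks which sub-filters of one filterbank are idle or running."""

    def __init__(self, num_filters=4):
        self.running = [False] * num_filters
        self.counter = [0] * num_filters
        self.reload = [0] * num_filters
        self.params = [None] * num_filters
        self.order = []   # running slot indices, most recently triggered first

    def on_impulse(self, params, reload_value):
        """Handle an excitation impulse; returns (slot, accepted_new_params)."""
        if False not in self.running:
            # No free slot: restart the newest filter, ignoring changed parameters.
            slot = self.order[0]
            self.counter[slot] = self.reload[slot]
            return slot, False
        slot = self.running.index(False)   # lowest idle index (priority encoder)
        self.running[slot] = True
        self.params[slot] = params
        self.reload[slot] = reload_value
        self.counter[slot] = reload_value
        self.order.insert(0, slot)         # insertion at the head of the list
        return slot, True

    def tick(self):
        """Advance one sample period; expired filters leave the list."""
        for slot in range(len(self.running)):
            if self.running[slot]:
                self.counter[slot] -= 1
                if self.counter[slot] <= 0:
                    self.running[slot] = False
                    self.order.remove(slot)


bank = FilterbankAllocator(num_filters=2)
print(bank.on_impulse("formant A", 3))   # (0, True)
print(bank.on_impulse("formant B", 5))   # (1, True)
print(bank.on_impulse("formant C", 9))   # (1, False): bank full, parameters ignored
```

Because new entries are always inserted at the head, the most recently triggered filter is found without searching, while expiry is driven independently by the counters.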
8.1.3 Number of Filters Required
In the original CSOUND FOF unit generator, the subroutine prints error messages
if the number of overlaps exceeds the number input in the score and orchestra files.
The number of overlaps is equal to
$$(\beta + \gamma) \cdot f_p$$
where $\beta$ is the time at which the final decay starts, $\gamma$ is the length of this final
decay and $f_p$ is the fundamental excitation frequency. Thus the equation represents
the total grain lifetime in seconds multiplied by a rate, giving the number of overlaps.
Assuming the maximum excitation frequency of 7 kHz and $\beta$ and $\gamma$ set to 0.01 and
0.007 seconds respectively for singing voice emulation, the number of overlaps is
140. Most parameters are constant for a lot longer than the maximum excitation
frequency. Setting the excitation frequency to equal the old excitation frequency divided
by the parameter update rate provides a worst case estimate for the number of overlaps
and hence number of filters required. The value is found to be 14. A power of two for
the number of filters is preferable and 8 was chosen as adequate because it provides
parameter variation without causing excess overhead in filter construction.
8.2 Miscellaneous Topics 
In this sub-section, the overlooked aspects of the synthesis method will be addressed.
The two simplest elements to implement are impulse jitter and formant gain. Formant
gain is achieved by multiplication, whilst the jitter is overcome by operating
the phase accumulator at high speed and downsampling the result.
8.2.1 Envelope Generation
Envelopes are time varying parameters and in musical synthesisers they are
implemented as piece-wise linear line segments. The structure is composed around
an adder, a multiplexor and additional logic arranged in such a way as to allow
breakpoints to be performed [97]. This method has three inputs: current value,
increment and final value. It has one output and the logic has four inputs (including
the output) to control the mapping of the final value to the output or to the adder's
output.
The logic required to implement bipolar envelopes is shown in table 8.1, where X
is either zero or one. Also, 1 in all columns except the Action column represents a
negative number and 0 represents a positive number. When Action is one, output
is the Final value, otherwise the New output is selected. This can be implemented
in silicon or on a general purpose DSP, after the truth table is converted into
sum-of-products form. After computation, the current value input becomes the envelope
output as shown in figure 8.3. The generator is also capable of threshold, maximum
and minimum functions.

Final - New   Current   Increment   Final   New   Action
     0           0          0         0      0      0
     0           0          0         0      1      1
     1           0          0         0      X      1
     X           0          0         1      X      1
     0           0          1         0      X      1
     1           0          1         0      0      0
     1           0          1         0      1      1
     0           0          1         1      0      0
     0           0          1         1      1      1
     1           0          1         1      X      0
     0           1          0         0      X      0
     1           1          0         0      0      1
     1           1          0         0      1      0
     0           1          0         1      0      1
     0           1          0         1      1      0
     1           1          0         1      X      1
     X           1          1         0      X      1
     0           1          1         1      X      1
     1           1          1         1      0      1

Table 8.1: Decision Table for Bipolar Envelope Ramp Generator. (After [97])

In this implementation, the envelope generation is of
paramount importance, but updating the envelope in piece-wise linear fashion must
be made by setting the increment value before the final breakpoint value. Using a
Boolean truth table to logic minimisation process² the Action control line may be
controlled via the following Boolean expression :-
A = F • (N + C) + C -N + 7 • F + I • N 
where the symbols are defined as follows :-
F = Final    N = New    C = Current
I = Increment    A = Action
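The behaviour that the decision logic is intended to produce, a ramp that steps by the increment and then holds at the breakpoint, can be sketched as follows. This is a behavioural model only: it uses ordinary arithmetic comparison in place of the sign-bit decision table of [97], and the function names and the step limit are illustrative assumptions.

```python
def ramp_step(current, increment, final):
    """One envelope update: select New (current + increment) or Final.

    Final is selected once the new value reaches or passes the breakpoint
    in the direction of travel, which prevents overshoot.
    """
    new = current + increment
    if increment >= 0:
        reached = new >= final
    else:
        reached = new <= final
    return final if reached else new


def ramp(start, increment, final, max_steps=10_000):
    """Generate one line segment from start to the breakpoint value."""
    values = [start]
    while values[-1] != final and len(values) <= max_steps:
        values.append(ramp_step(values[-1], increment, final))
    return values


print(ramp(0, 3, 10))     # [0, 3, 6, 9, 10]
print(ramp(5, -2, -2))    # [5, 3, 1, -1, -2]
```

The second call crosses zero, which is the bipolar case the decision table is designed to handle using sign bits alone.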
8.2.2 Subtractive Synthesis Extensions
In this sub-section, I will briefly highlight the three most useful subtractive synthesis
techniques: ADSR envelopes, digital filtering and low frequency oscillators, which
all provide more acoustical movement to the sound.
² Using the Quine-McCluskey method.





Figure 8.3: Schematic of a Bipolar Envelope Generator. (After [97])
The ADSR envelopes are directly related to the previous subsection and are an
extension to the break-point envelope generator. This allows Attack, Decay, Sustain
and Release (ADSR) style envelopes to control either the global amplitude or the
individual formant amplitude. This envelope can be implemented as interrupts from
the bipolar envelope generator to trigger new increment and final values which will
be loaded into it.
Digital filtering of the sum of all the formant oscillators provides better emulation
of the characteristics of the singing voice [6] by using a static filter with low cut-off
frequency and a small bandwidth in order to impose frequency asymmetry to
the formant oscillators. Dynamic effects can be imposed by driving a digital filter
with an envelope or a keyboard scaling function. However, in this case, because
the filter's parameters are time-varying, the designer needs a filter which provides
independent coefficient modification, so that varying the frequency or bandwidth
parameters does not affect the filter's stability. This type of filter is known as a
state variable filter. A second order variant is shown in figure 2.3.
Low frequency oscillators provide periodic movement to timbres and can be
implemented as a table lookup oscillator. They, like envelopes, can be used as a
modulation source within any part of the synthesis technique, requiring extra
additions and multiplications. Tremolo, for example, is the output amplitude being
multiplied by an LFO, which is similar to AM radio.
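A table lookup oscillator of the kind mentioned above might be sketched as follows. The table size, the 24 bit phase accumulator width, the class name and the depth parameter are illustrative assumptions rather than values taken from this thesis.

```python
import math

TABLE_BITS = 8
TABLE_SIZE = 1 << TABLE_BITS
SINE_TABLE = [math.sin(2.0 * math.pi * i / TABLE_SIZE) for i in range(TABLE_SIZE)]


class TableLFO:
    """Low frequency oscillator: phase accumulator indexing a sine table."""

    def __init__(self, rate_hz, fs, phase_bits=24):
        self.mask = (1 << phase_bits) - 1
        self.shift = phase_bits - TABLE_BITS      # top bits select the table entry
        self.increment = round(rate_hz * (1 << phase_bits) / fs)
        self.phase = 0

    def next(self):
        value = SINE_TABLE[self.phase >> self.shift]
        self.phase = (self.phase + self.increment) & self.mask   # wraps at 2**phase_bits
        return value


def apply_amplitude_lfo(samples, lfo, depth):
    """Multiply each sample by (1 + depth * lfo), i.e. amplitude modulation."""
    return [s * (1.0 + depth * lfo.next()) for s in samples]


lfo = TableLFO(rate_hz=5.0, fs=44_100.0)
modulated = apply_amplitude_lfo([1.0] * 8, lfo, depth=0.5)
```

The modulation costs one extra multiplication and one addition per sample, in line with the remark about extra additions and multiplications.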
8.2.3 Synthesis by Rule
In parametric formant additive synthesis systems, because the user is confronted
with adding formants in parallel, certain computational rules need to be invoked.
The major rules concern the scaling factors of the filters and the methods used in
order to ensure that dynamic range is preserved. This scaling can be thought of as a
compander, that is a dynamic range compressor, linked to its inverse, the expander.
This scaling technique is used naturally when there is too much dynamic range
available for the chosen storage medium, as is used in record producing studios.
However, in a studio situation this technique can be abused for creative gain, by
making certain sounds more implosive.
Other rules concern performance criteria, especially the relation of formants relative
to the perceived pitch. This happens in singing synthesis and is a frequency tracking
technique. In natural (non synthesis) singing, the singer will use articulation to
control tonal clarity and volume and therefore the harmonic structure of the sound
will become purer as the pitch increases. These structures impart naturalness to
the timbre. The singer can also impart chaos onto the tone producing vibrato
effects. All these elements need to be either modelled by the synthesis technique or
implanted into the synthesis model as a form of rules [6, 130], thus capturing the
essence of the singing voice.
By providing a rule system, the composer can explore other musical timbres which
have no natural analogue. In the next section in this chapter, the user interface
aspects of the synthesis technique will be discussed.
8.3 User Interface Design 
Any musical instrument to be used by a human musician needs to be ergonomically
designed. The designer is faced with many conflicting parameters concerning on the
one hand, implementation of the synthesis method, and on the other hand, making
the instrument easy to use. The FOF synthesis technique used the LISP language
to provide a hierarchical knowledge base to further enhance the sonic possibilities
of CHANT which is known as FORMES [116]. However, in a realisable and
commercial setting, if musicians are forced to learn another language in order to achieve
greater flexibility, progress in using the synthesis system for generating music will be
impeded. Musicians have spent a long time getting to know the intricacies of their
chosen instrument and anything to provide rapid acceptance of a new instrument
must be paramount in the minds of musical instrument manufacturers. No wonder
the majority of electronic instruments utilise the MIDI specification, which was
based on the Piano. It is also no surprise that implementing a keyboard is easy
compared to translating vibrating strings into an electronic message as used in MIDI.
Therefore the designer must provide some tactile feedback from the instrument so
that the musician can create music in real-time.
For this synthesis system, the design requires some "templates" which act as models
for the particular instruments the musician might wish to emulate; e.g. violin,
singing voices or drums. When Yamaha brought out their VL1, the first monophonic
waveguide based synthesiser to implement a set of physical models of wind
instruments, the template approach was used. However, maybe an instrument should
have greater depths which would mean musicians could get to know the possibilities
over time, be they of performance or acoustical value. This happened in the Synergy,
an additive based 32 oscillator wavetable synthesiser which provided a floating
split feature, allowing two different tones to move across the keyboard rather than
remaining static [63]. To date, no other instrument has provided this feature. Due
to the rise of sampling, instruments have multiple split points which are statically
assigned across the keyboard.
For the type of instrument under discussion, the designer should provide a simplistic
synthesis model for quick editing of recognisable sounds and an advanced layer
beneath this. The latter layer could be thought of as a language such as FORMES
or CSOUND, providing flexibility should a musician wish to explore the synthesis
method. The instrument would have a control surface providing a keyboard and
knobs to interact using the template programming paradigm. If the musician wanted
to use the advanced programming mode, he would have to program the instrument
outside its physical entity by using the MIDI interface or other network system.
It is well known that most musicians do not program synthesisers, they play them
and use tones created by third parties. Hence the programming system should be
thought of as an expansion possibility for which the manufacturer could charge a
premium. These programmable layers would be interfaced to the synthesis engine
either by a direct link through shared memory to the engine or via a software layer
which would implement extra functionality, e.g. LFOs and rule bases.
8.4 Hardware/Software Segmentation 
This thesis has shown that the filter, impulse and envelope generators are suitable
for implementation in silicon. The impulse and envelope generators are simple in
terms of hardware and hence would use minimal resources on a die. Consequently
they could be multiplexed and could be designed automatically to perform their job
with a low data rate from the host. This information would be in the form of
piece-wise linear breakpoints (values) and rates (gradients). The filter itself is numerically
intensive and is best suited to hardware because of its regular high speed structure.
The polyphase decimator is a bit of a chimera, in that it could be implemented in
hardware or software depending on the spare resources available on the silicon die.
The rest of the design, including controller, user interface and subtractive filtering
would be implemented in software on a host computer. The filterbank controller,
because of its non arithmetic components, could easily be designed to run on the
host in software. The host would need to be reasonably fast in logic computation
because of the filterbank controller's fast response. However, this would make a
more expensive host to co-processor interface due to the fast data transfers across
the bus.
This would provide the flexibility necessary for the interaction between man and
machine.
CHAPTER 9
Conclusions and Further Work 
9.1 Conclusions 
This thesis has investigated the implementational issues of a novel formant synthesis
technique loosely based on the FOF paradigm. Previous incarnations have been
unsuitable for real-time implementation owing to the way the algorithm evolved
under a software environment. In order to understand the difficulties involved in
this research direction, a review of arithmetic algorithms and architectures was
performed. In addition, the major synthesis algorithms were reviewed, noting that
the nearest competitor is sinusoidal additive synthesis.
The current trend in the field is to implement algorithms in software, relying on
chip manufacturers and processes to increase clock speeds and minimise transistor
sizes on a die. This increases both the number of operations a silicon chip can
perform and the speed at which it performs them. This has led to the proliferation
of sinusoidal additive synthesis techniques using inverse fast Fourier transforms.
Alternatively, a multi-rate approach to additive synthesis can be used, borrowing
heavily from current telecommunication theory using quadrature mirror filters and
perfect reconstruction. However, generating rich sound still requires an enormous
amount of data to describe each individual sinusoid in phase, frequency and
amplitude at every instant.
In formant based synthesis, the source can be modelled by an idealised pulse shape
and the filter describes a decaying sinusoid. Instead of a single impulse at a specific
height and frequency, as in sinusoidal synthesis, a peak is obtained in the frequency
domain, which has a spread of frequencies caused by the decaying sinusoid. Using
this representation, it is possible to model many sounds with fewer parameters than
in sinusoidal based additive synthesis. In most primitive parametric based additive
synthesis engines using formants, the structure is either a cascade or a parallel
arrangement of filters. It is well known that the human voice can be modelled
accurately in this way.
The synthesis algorithm proposed extends the simplistic parallel formant structure
to new heights and complexities, but keeps the parameters the same as in the
simplistic case. After reviewing many filter topologies, the time squared decaying
sinusoid filter was chosen for its fixed number of operations and delays over all
bandwidths and frequencies. Because of the pole sensitivity of IIR filters, a
Gold-Rader coupled form second order filter and a small FIR filter were used. Scaling
of these filters was addressed and it was assumed that the controlling processor
would select the optimum values.
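To make the coupled form concrete, the following Python sketch runs the textbook two-state coupled (rotation) recursion and returns its impulse response, which is a decaying sinusoid. It is only the second order section in floating point: the small FIR filter, the time squared weighting, scaling and coefficient quantisation discussed in the thesis are not modelled, and the function name and parameters `r` (pole radius) and `theta` (pole angle in radians per sample) are assumptions for illustration.

```python
import math


def coupled_form_impulse_response(r, theta, n_samples):
    """Impulse response of a coupled-form second order section.

    The two multiplier values are r*cos(theta) and r*sin(theta), i.e. the
    real and imaginary parts of the pole, which is why this structure is
    less sensitive to coefficient rounding than the direct form.
    """
    a = r * math.cos(theta)
    b = r * math.sin(theta)
    x1 = x2 = 0.0
    out = []
    for n in range(n_samples):
        u = 1.0 if n == 0 else 0.0  # unit impulse input
        out.append(x2)              # output is the second state variable
        # Rotate and shrink the state vector, then inject the input.
        x1, x2 = a * x1 - b * x2 + u, b * x1 + a * x2
    return out


response = coupled_form_impulse_response(0.9, 0.3, 20)
# For n >= 1 the output equals r**(n-1) * sin((n-1)*theta).
print(response[:5])
```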
The driving function was optimised for silicon by using a phase accumulator, and
the single bit output controlled a latch having a variable amplitude or zero. This
then passed through a polyphase decimator to preserve timing information at a
lower rate. This is analogous to light passing through an aperture and spreading
outwards in space. The decimator is symmetric, so a structure to reduce the physical
coefficients by half was found. A single decimator could not be produced due to
the large number of coefficients and so a two stage version was found to be optimal.
The filters were then run all in parallel at a slow rate.
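The phase accumulator driver can be sketched in a few lines of Python. The accumulator width, the use of the overflow (carry) bit as the single bit output, and all names below are assumptions chosen for illustration; the decimator stages that follow in the thesis are not included.

```python
def impulse_train(increment, amplitude, n_samples, bits=16):
    """Emit `amplitude` whenever the phase accumulator overflows, else 0.

    increment: phase step per sample, 0 < increment < 2**bits
    The impulse rate is increment / 2**bits times the sample rate.
    """
    modulus = 1 << bits
    phase = 0
    out = []
    for _ in range(n_samples):
        phase += increment
        carry = phase >= modulus  # single-bit overflow flag
        phase &= modulus - 1      # wrap to the accumulator width
        out.append(amplitude if carry else 0)  # latch: amplitude or zero
    return out


# A quarter-rate impulse train: one impulse every four samples.
print(impulse_train(1 << 14, 5, 8, bits=16))
```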
In order to control the filters, a system to monitor their progress was required.
Counters were necessary to deallocate them after a certain length of time. This is
very similar to note priority and note stealing algorithms in synthesisers, but the
technique has been extended to a basic part of the synthesis itself. This added
burden was necessary because once the coefficients were loaded into the filter they
became frozen until the filter's response decayed to zero. This mirrors the
wavefunction mentality as found in FOF. When all filters are used up in a particular
formant, the impulse driver excites the last filter and resets the counter until an
available filter appears. The filter coefficients in this case stay the same.
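The allocation policy described above can be sketched as a small Python class. This is a behavioural model only: the class and method names are hypothetical, "the last filter" is interpreted here as the most recently allocated one (an assumption, since the text does not say), and the lifetime is a fixed count of ticks rather than a measured decay time.

```python
class FormantFilterPool:
    """Counter-based allocation of the filters belonging to one formant."""

    def __init__(self, n_filters, lifetime):
        self.lifetime = lifetime
        self.counters = [0] * n_filters  # 0 means the filter is free
        self.coeffs = [None] * n_filters
        self.last = 0                    # most recently allocated filter

    def tick(self):
        """Advance time by one step, releasing filters whose count expires."""
        self.counters = [max(c - 1, 0) for c in self.counters]

    def excite(self, coeffs):
        """Route an impulse to a filter and return the index used."""
        for i, count in enumerate(self.counters):
            if count == 0:
                self.coeffs[i] = coeffs  # frozen until the counter expires
                self.counters[i] = self.lifetime
                self.last = i
                return i
        # No free filter: re-excite the last one, keeping its coefficients.
        self.counters[self.last] = self.lifetime
        return self.last


pool = FormantFilterPool(n_filters=2, lifetime=3)
print(pool.excite("a"), pool.excite("b"), pool.excite("c"))
```

In this sketch the third impulse reuses filter 1 without overwriting its coefficients, which mirrors the statement that the coefficients stay the same until a filter becomes available.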
The design of the technique requires the partitioning of DSP intensive operations
from a controller. High level user interface requirements were addressed with the
addition of low speed functions to dynamically vary the spectra. These should
operate in the controller or in the main processor of the synthesis engine.
It had been hoped that the addition of parallel formant structures would avoid
spectral zeros. Methods to overcome this problem proved difficult. The simplest
approach is to alternate the filterbanks' signs. Alternatively the output of the
decimated impulse response could be delayed according to an index, so that each
filter would have a slightly time-skewed driver signal. One factor which seemed to
be negligible was the initial attack envelope of the waveform. This is the opposite
of the work highlighted in [115, 116].
The key conclusions obtained were the reduction of bandwidth outside the DSP
engine and the high bandwidth inside, thus allowing a reduction in parameter
updates. The design chosen traded flexibility for realizability in order to provide a
system suitable for VLSI implementation. This design models the large scale
spectral properties of a sound rather than its fundamental components. This follows
on logically from the fact that many sounds can be described as audio waves
constrained in a box, resulting in formants. The only commercial synthesiser which
uses formants to model vowel voices is the subtractive based sampler known as the
E-mu Morpheus, which has a 14 pole interpolating coefficient digital filter structure
per voice, but it is not easily controlled.
A comparison between a bit-serial sinusoidal generator and a CORDIC parallel
version found them to be of comparable size and speed. The CORDIC version
takes 21 clock cycles to generate a sample representing the sine of an angle. The
bit-serial IIR filter version takes 24 clock cycles for multiplication and an additional
clock cycle for addition to complete a multiplication and add operation. Two sets
of delay and arithmetic operators are performed in parallel, and so the IIR filter is
marginally slower than the CORDIC variant.
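For readers unfamiliar with CORDIC, the following Python sketch shows the rotation-mode iteration that produces the sine of an angle using only shifts, adds and a small arctangent table. It is a floating point model, not the hardware compared above; equating the default of 21 iterations with the 21 clock cycles quoted in the text is an assumption made purely for illustration.

```python
import math


def cordic_sin(angle, iterations=21):
    """Approximate sin(angle) by CORDIC rotation, for |angle| < about 1.74 rad."""
    # The rotations stretch the vector by a fixed gain, so start from 1/gain.
    gain = 1.0
    for i in range(iterations):
        gain *= math.sqrt(1.0 + 2.0 ** (-2 * i))
    x, y, z = 1.0 / gain, 0.0, angle
    for i in range(iterations):
        d = 1.0 if z >= 0 else -1.0  # rotate towards zero residual angle
        shift = 2.0 ** -i            # a right shift by i bits in hardware
        x, y = x - d * y * shift, y + d * x * shift
        z -= d * math.atan(shift)    # table lookup in hardware
    return y


print(cordic_sin(0.5), math.sin(0.5))
```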
My new theoretical approach to digital design, known as DAD Logic, uses an active
feedback mechanism to map a continuum of voltages to discrete voltages. This
replaces the hysteresis mechanism for binary digital circuits. The resulting scheme
appears as a multi-voltage quantiser and allows standard analogue circuit design
practice to be used, which reduces the number of transistors necessary in a design.
Instead of coding multiplication over many digits and wires, as in digital design, the
number of wires is dramatically reduced and the physical behaviour of the transistors
is utilised to perform the multiplication. Thus the reduced number of wires and
transistors allows higher speed with lower power consumption. To the outside world,
the chip appears to be entirely digital, in that it only accepts binary digital input.
9.2 Further Work 
Further work can proceed in a number of directions. As it stands, there have
been no direct implementations of the system in silicon or software. In order to
make the technology useful, ergonomic constraints would have to be applied. This
would result in an intuitive user interface which musicians could use with minimal
training.
An interesting line of attack would be to look at the discrete summation formulas
[95, 129] for formant generation and investigate the possibilities of implementation
of a FOF wavefunction as a square root filter. The summing of formant
regions and the normalisation factors necessary to obtain a perfect multi-formant
model of the sound also need investigation. Hopefully this could also be applied to
filter implementation. In theory and in practice, filters in cascade form of a
particular time domain model performed better than the equivalent parallel form, even
when floating point arithmetic was used. Does the parallel form require manual fine
tuning to overcome numerical imprecision?
Rapid prototyping from algorithm to ASIC, using high-level languages such as Silage
and CAD design tools such as HYPER and CATHEDRAL, needs to be used to a
greater degree. High-level simulation of algorithms in Matlab and/or Ptolemy
is also essential, as coding of algorithms is very time consuming. A high level
optimisation process like Javelina [41], which symbolically analyses and controls
filter topologies, would enable designers to investigate their performance and to
reduce the number of multipliers necessary. Noise and sensitivity requirements
could also be controlled. To my knowledge HYPER and IRIS do not allow the
topology represented by a signal flow diagram to be rearranged.
In order to build this synthesis system, ready built Intellectual Property components
would be required. Combining these components with VHDL would reduce
design time significantly. With such methods and using a system design paradigm,
algorithms which had been created in software could be transferred to ASIC
implementation very easily.
In chapter 4 the need for different low level technologies and architectures was
investigated. This is especially important with portable applications requiring low power
consumption. If signals on wires carried more than a binary representation, then
designers could make the arithmetic elements more versatile, and the space savings
could be utilised to provide more functional units. Low power could be achieved
by using parallel execution units whilst still operating at modest clock rates.
Asynchronous design removes the global clock and replaces it with handshakes controlling
dataflow locally. It can operate as fast as clocked logic and was erroneously thought
to have a low power benefit. DAD Logic, which is inherently analogue based, would
automatically operate in an asynchronous manner should the need arise.
CHAPTER A
The Laplace to Z Transform 
Algorithm 
This algorithm uses the property that the Laplace transform of the product of two
signals is the convolution of their transforms in the s domain.

The ideal sampling function, $\delta_{T_s}(t)$, is represented as a train of unit amplitude
impulses:

$$\delta_{T_s}(t) = \sum_{n=0}^{\infty} \delta(t - nT_s)$$

Taking the Laplace transform of $\delta_{T_s}(t)$ gives

$$\mathcal{L}[\delta_{T_s}(t)] = \Delta_{T_s}(p) = \sum_{n=0}^{\infty} e^{-npT_s}$$

The result above, when expressed in closed form and evaluated at $s - p$, becomes

$$\Delta_{T_s}(s - p) = \frac{1}{1 - e^{-(s-p)T_s}} \tag{A.1}$$

Let the Laplace transform of the input signal, $x(t)$, be $X(p)$. Computing the
Laplace transform of the product of $x(t)$ and the sampling function, $\delta_{T_s}(t)$, by
complex convolution gives

$$\mathcal{L}[x(t) \cdot \delta_{T_s}(t)] = Y(s) = \frac{1}{2\pi j} \int_{\sigma - j\infty}^{\sigma + j\infty} X(p)\, \Delta_{T_s}(s - p)\, dp \tag{A.2}$$

Substituting equation A.1 into equation A.2 results in the following expression

$$Y(s) = \frac{1}{2\pi j} \int_{\sigma - j\infty}^{\sigma + j\infty} \frac{X(p)}{1 - e^{-(s-p)T_s}}\, dp \tag{A.3}$$

Now, from the z transform, $e^{-sT_s} = z^{-1}$, so equation A.3 can be rewritten as
equation A.4 below. Note that any factor $e^{-asT_s}$ appearing in the Laplace transform
of $x(t)$ can be replaced by $z^{-a}$.

$$Y(z) = \frac{1}{2\pi j} \int_{\sigma - j\infty}^{\sigma + j\infty} \frac{X(p)}{1 - e^{pT_s} z^{-1}}\, dp \tag{A.4}$$

The integral in equation A.4 can be evaluated using Cauchy's residue theorem. The
theorem states that if $V(s)$ is analytic within and on a closed contour $C$, except
possibly at a finite number of singularities within $C$, then

$$\frac{1}{2\pi j} \oint_C V(s)\, ds = \sum_r k_r$$

where $k_r$ are the residues of $V(s)$ at the singularities.

Applying the theorem yields the following algorithm A.5.

$$Y(z) = \sum_{\text{poles of } X(p)} \text{residues of } \frac{X(p)}{1 - e^{pT_s} z^{-1}} \tag{A.5}$$

The residue for a pole of order $m$ at $p = x$ can be evaluated using the following
expression

$$\frac{1}{(m-1)!} \lim_{p \to x} \frac{d^{m-1}}{dp^{m-1}} \left[ (p - x)^m X(p)\, \frac{z}{z - e^{pT_s}} \right]$$
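As a worked check of the residue method, take the decaying exponential $x(t) = e^{-at}$, whose Laplace transform $X(p) = 1/(p + a)$ has a single first order pole at $p = -a$. The residue expression then gives $Y(z) = z / (z - e^{-aT_s})$. The Python sketch below compares this closed form with a direct summation of the sampled sequence; the example signal, the function names and the numerical values are assumptions chosen for illustration and do not appear in the thesis.

```python
import cmath
import math


def z_transform_by_sum(a, Ts, z, terms=400):
    """Directly sum x(n*Ts) * z**(-n) for x(t) = exp(-a*t)."""
    return sum(math.exp(-a * n * Ts) * z ** (-n) for n in range(terms))


def z_transform_by_residue(a, Ts, z):
    """Residue of X(p) * z / (z - exp(p*Ts)) at the pole p = -a."""
    return z / (z - math.exp(-a * Ts))


# Evaluate at a point outside the pole radius exp(-a*Ts), where the sum converges.
z = 1.2 * cmath.exp(0.7j)
print(z_transform_by_sum(2.0, 0.1, z))
print(z_transform_by_residue(2.0, 0.1, z))
```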
References 
[1] Abo, A. and Mehta, S. "CMOS Current Mode Adders." May 1993. From
URL kabuki.eecs.berkeley.edu/~abo/.
[2] Adams, R. and Kwan, T. "A Stereo Asynchronous Digital Sample-Rate
Converter for Digital Audio." IEEE Journal of Solid-State Circuits, 29(4),
pp. 481-488, April 1994.
[3] Al-Ibrahim, M. and Al-Khateeb, A. "Digital Sinusoidal Oscillator with Low
and Uniform Frequency Spacing." IEE Proceedings - Circuits, Devices and
Systems, 144(3), pp. 185-189, June 1997.
[4] Avizienis, A. "Signed-Digit Number Representations for Fast Parallel
Arithmetic." IEEE Transactions on Electronic Computers, EC-10, pp. 389-400,
September 1961.
[5] Bajard, J.-C., Kla, S. and Muller, J.-M. "BKM: A new Hardware Algorithm
for Complex Elementary Functions." IEEE Transactions on Computers,
43(8), pp. 955-963, August 1994.
[6] Bennett, G. and Rodet, X. "Synthesis of the Singing Voice." In Mathews,
M. and Pierce, J., eds., Current Directions in Computer Music Research, pp.
19-44. MIT Press, Cambridge, Massachusetts, 1989.
[7] Brennan, J. "Multilevel ASICs Boost Audio Recording Applications." IEEE
Circuits and Systems Magazine, pp. 18-21, May 1996.
[8] Capasso, F., Sen, S., Beltram, F., Lunardi, L., Vengurlekar, A., Smith, P.,
Shah, N., Malik, R. and Cho, A. "Quantum Functional Devices : Resonant
Tunneling Transistors, Circuits with Reduced Complexity, and Multiple-Valued
Logic." IEEE Transactions on Electron Devices, 36(10), pp. 2065-2082,
October 1989.
[9] Carter, G. and Kron, G. "A.C. Network Analyzer Study of the Schrodinger
Equation." Physical Review, 67(1/2), pp. 44-49, Jan 1/15 1945.
[10] Chamberlin, H. Musical Applications of Microprocessors. Hayden Books, A
division of Howard W. Sams & Company, USA, second edition, 1985. ISBN
0-8104-5768-7.
[11] Chowning, J. M. "The Synthesis of Complex Audio Spectra by Means of
Frequency Modulation." In Roads, C. and Strawn, J., eds., Foundations
of Computer Music, pp. 1-29. MIT Press, Cambridge, Massachusetts, 1985.
Reprinted from the Journal of the Audio Engineering Society 21(7), 1973.
[12] Chua, L. O. and Deng, A.-C. "Negative Resistance Devices: Part II."
International Journal of Circuit Theory and Applications, 12(4), pp. 337-373,
1984.
[13] Chua, L. O., Yu, J. and Yu, Y. "Negative Resistance Devices." International
Journal of Circuit Theory and Applications, 11(2), pp. 161-186, 1983.
[14] Chua, L. O., Yu, J. and Yu, Y. "Bipolar-JFET-MOSFET Negative Resistance
Devices." IEEE Transactions on Circuits and Systems, CAS-32(1), pp. 46-61,
January 1985.
[15] Clarke, J., Manning, P., Berry, R. and Purvis, A. "VOCEL: New
Implementations of the FOF Synthesis Method." In Proceedings of the International
Computer Music Conference, pp. 333-348. Cologne, W-Germany, 1988.
[16] Crochiere, R. E. and Oppenheim, A. V. "Analysis of Linear Digital Networks."
Proceedings of the IEEE, 63(4), pp. 581-595, April 1975.
[17] Current, K. "Current-Mode CMOS Multiple-Valued Logic Circuits." IEEE
Journal of Solid-State Circuits, 29(2), pp. 95-107, February 1994.
[18] Daggett, D. "Decimal-binary conversions in CORDIC." IRE Transactions
On Electronic Computers, EC-8(3), pp. 335-339, September 1959.
[19] Dattorro, J. "The Implementation of Recursive Digital Filters for
High-Fidelity Audio." Journal of the Audio Engineering Society, 36(11), pp. 851-878,
November 1988.
[20] Denyer, P. and Renshaw, D. VLSI Signal Processing: a Bit-Serial Approach.
Addison-Wesley Publishing Company, 1985.
[21] Depalle, P., Matignon, D. and Stroppa, M. "Source-filter formulation and
analytic control of the skirtwidth of CHANT formant-wave-functions." In
Proceedings of the International Computer Music Conference, pp. 372-373.
San Jose, California, USA, October 1992.
[22] Despain, A., Peterson, A., Rothaus, O. and Wold, E. "Fast fourier transform
processors using gaussian residue arithmetic." Journal of Parallel And
Distributed Computing, 2(3), pp. 219-237, 1985.
[23] Ding, Y. and Rossum, D. "Filter Morphing of Parametric Equalizers and
Shelving Filters for Audio Signal Processing." Journal of the Audio Engineering
Society, 43(10), pp. 821-826, October 1995.
[24] Edmonds, A. Angular Momentum in Quantum Mechanics. Princeton Landmarks
in Mathematics and Physics series. Princeton University Press, 2nd Edition,
4th printing, 1996.
[25] Elleithy, K. and Bayoumi, M. "Fast and Flexible Architectures for RNS
Arithmetic Decoding." IEEE Transactions on Circuits and Systems - II : Analog
and Digital Signal Processing, 39(4), pp. 226-235, April 1992.
[26] Ercegovac, M. D. and Lang, T. "On-the-Fly Conversion of Redundant into
Conventional Representations." IEEE Transactions on Computers, C-36(7),
pp. 895-897, July 1987.
[27] Ercegovac, M. D. and Lang, T. "Fast Multiplication Without Carry-Propagate
Addition." IEEE Transactions on Computers, 39(11), pp. 1385-1390,
November 1990.
[28] ES2. ES2 Cadence Design Kit User Guide. European Silicon Structures,
France, May 1994.
[29] Etzel, M. and Jenkins, W. "The design of specialized residue classes for
efficient recursive digital filter realization." IEEE Transactions On Acoustics,
Speech And Signal Processing, ASSP-30(3), pp. 370-380, 1982.
[30] Taylor, F. J. and Ramnarayanan, A. "An efficient residue-to-decimal converter."
IEEE Transactions On Circuits And Systems, CAS-28(12), pp. 1164-1169,
1981.
[31] Fliege, N. and Wintermantel, J. "Complex Digital Oscillators and FSK
Modulators." IEEE Transactions on Signal Processing, 40(2), pp. 333-342,
February 1992.
[32] Fliege, N. J. Multirate Digital Signal Processing : Multirate Systems, Filter 
Banks, Wavelets. John Wiley & Sons, 1994. ISBN 0-471-93976-5. 
[33] Fredkin, E. and Toffoli, T. "Conservative logic." International Journal of 
Theoretical Physics, 21(3/4), pp. 219-253, 1982. 
[34] Freed, A., Rodet, X. and Depalle, P. "Synthesis and Control of Hundreds 
of Sinusoidal Partials on a Desktop Computer without Custom Hardware." 
In Proceedings of the International Computer Music Conference, pp. 98-101. 
Tokyo, Japan, September 10-15 1993. 
[35] Gabor, D. "Theory of Communication." Journal of the Institute of Electrical
Engineers, 93(Part III), pp. 429-457, 1946.
[36] Garner, H. "The residue number system." IRE Transactions On Electronic
Computers, EC-8(6), pp. 140-147, 1959.
[37] Gilbert, B. "A Precise Four-Quadrant Multiplier with Subnanosecond Re-
sponse." IEEE Journal of Solid-State Circuits, SC-3(4), pp. 365-373, Decem-
ber 1968. 
[38] Guillemain, P. and Kronland-Martinet, R. "Characterization of acoustic sig-
nals through continuous linear time-frequency representations." Proceedings 
of the IEEE, 84(4), pp. 561-585, 1996. 
[39] Hatano, Y., Yano, S., Mori, H., Yamada, H. and Hirano, M. "A 1-GIPS 
Josephson Data Processor." IEEE Journal of Solid-State Circuits, 26(6), 
pp. 880-883, June 1991. 
[40] Haviland, G. and Tuszynski, A. "CORDIC Arithmetic Processor Chip." IEEE 
Transactions on Computers, C-29(2), pp. 68-79, February 1980. 
[41] Hebel, K. J. "Javelina: An Environment for Digital Signal Processing Software 
Development." Computer Music Journal, 13(2), pp. 39-47, Summer 1989. 
[42] Hiasat, A. "New designs for a sign detector and a residue binary convertor." 
IEE Proceedings-G Circuits Devices And Systems, 140(4), pp. 247-252, 1993. 
[43] Higgins, R. Digital Signal Processing in VLSI. Prentice-Hall Inc., 1990. 
[44] Honda, M., Kameyama, M. and Higuchi, T. "Residue arithmetic based
multiple-valued VLSI image processor." In IEEE 22nd International Symposium
On Multiple-Valued Logic, pp. 330-336. IEEE Press, 1992.
[45] Horner, A. "A comparison of Wavetable and FM Parameter Spaces." Computer
Music Journal, 21(4), pp. 55-85, Winter 1997.
[46] Horner, A. and Beauchamp, J. "Piecewise-Linear Approximation of Additive 
Synthesis Envelopes: A Comparison of Various Methods." Computer Music 
Journal, 20(2), pp. 72-95, Summer 1996. 
[47] Houghton, A. D., Fisher, A. J. and Malet, T. F. "An ASIC for Digital Additive 
Sine-wave Synthesis." Computer Music Journal, 19(3), pp. 26-31, Fall 1995. 
[48] Hu, X., Harber, R. G. and Bass, S. C. "Expanding the Range of Convergence 
of the CORDIC Algorithm." IEEE Transactions on Computers, 40(1), pp. 13-
21, January 1991. 
[49] Hu, Y. "CORDIC-Based VLSI Architectures for Digital Signal Processing." 
IEEE Signal Processing Magazine, pp. 16-35, July 1992. 
[50] Hu, Y. H. "The Quantization Effects of the CORDIC Algorithm." IEEE
Transactions on Signal Processing, 40(4), pp. 834-844, April 1992.
[51] Huang, X., Liu, W.-J. and Wei, B. W. "A High-Performance CMOS Redundant
Binary Multiplication-and-Accumulation (MAC) Unit." IEEE Transactions
On Circuits And Systems - I: Fundamental Theory And Applications,
41(1), pp. 33-39, January 1994.
[52] Hwang, K. Computer Arithmetic - Principles, Architecture And Design. John 
Wiley & Sons, Inc., 1979. 
[53] Ifeachor, E. and Jervis, B. Digital Signal Processing : A Practical Approach. 
Addison-Wesley Publishing Company, 1993. 
[54] Itagaki, T., Purvis, A. and Manning, P. "Real-time Synthesis on a
Multiprocessor Network." In Proceedings of the International Computer Music
Conference, pp. 382-385. Aarhus, Denmark, September 1994.
[55] Jain, A., Bolton, R. and Abd-El-Barr, M. "CMOS Multiple-Valued Logic
Design - part I: Circuit Implementation." IEEE Transactions on Circuits
and Systems - I: Fundamental Theory and Applications, 40(8), pp. 503-514,
August 1993.
[56] Jain, A., Bolton, R. and Abd-El-Barr, M. "CMOS Multiple-Valued Logic
Design - part II: Function Realization." IEEE Transactions on Circuits
and Systems - I: Fundamental Theory and Applications, 40(8), pp. 515-523,
August 1993.
[57] James, M., Smith, G. and Wolford, J. Analog and Digital Computer Methods
in Engineering Analysis. International Textbook Company, Scranton,
Pennsylvania, 1964.
[58] Jansen, C. "Sine Circuitu : 10,000 high quality sine waves without detours." 
In Proceedings of the International Computer Music Conference, pp. 222-225. 
Montreal, Canada, 1991. 
[59] Jenkins, W. and Leon, B. "The use of residue number systems in the design 
of finite impulse response digital filters." IEEE Transactions On Circuits And 
Systems, CAS-24(4), pp. 191-201, 1977. 
[60] Johns, D. and Lewis, D. "IIR Filtering on Sigma-Delta Modulated Signals." 
Electronics Letters, 27(4), pp. 307-308, 14th February 1991. 
[61] Kahrs, M. "Notes on Very-Large-Scale Integration and the Design of
Real-Time Digital Sound Processors." In Roads, C., ed., The Music Machine :
Selected Readings from Computer Music Journal, pp. 623-631. MIT Press,
Cambridge, Massachusetts, 1989. Reprinted from Computer Music Journal,
vol. 5, No. 2, Summer 1981.
[62] Kameyama, M., Kawahito, S. and Higuchi, T. "A multiplier chip with
multiple-valued bidirectional current-mode logic circuits." IEEE Computer
Magazine, 21, pp. 43-56, April 1988.
[63] Kaplan, S. J. "Developing a Commercial Digital Sound Synthesizer." In
Roads, C., ed., The Music Machine : Selected Readings from Computer Music
Journal, pp. 611-622. MIT Press, Cambridge, Massachusetts, 1989. Reprinted
from Computer Music Journal, vol. 5, No. 3, Fall 1981.
[64] Karplus, W. J. Analog Simulation - Solution of Field Problems.
McGraw-Hill Series in Information Processing and Computers. McGraw-Hill
Company, Inc., 1958.
[65] Karplus, W. J. and Soroka, W. W. Analog Methods - Computation
and Simulation. McGraw-Hill Series in Engineering Sciences. McGraw-Hill
Company, Inc., 2nd edition, 1959.
[66] Kawahito, S., Kameyama, M., Higuchi, T. and Yamada, H. "A high speed
compact multiplier based on multiple-valued bi-directional current-mode
circuits." In IEEE 17th International Symposium On Multiple-Valued Logic,
pp. 172-180. IEEE Press, 1987.
[67] Kingsbury, N. "Second-Order Recursive Digital Filter Element for Poles near 
the Unit Circle and the Real Z Axis." Electronics Letters, 8(6), pp. 155-156, 
23rd March 1972. 
[68] Kingsbury, N. "Digital-filter 2nd-Order Element with Low Quantising Noise 
for Poles and Zeros at Low Frequencies." Electronics Letters, 9(12), pp. 271-
273, 14th June 1973. 
[69] Klatt, D. "Software for a cascade/parallel formant synthesizer." Journal of
the Acoustical Society of America, 67(3), pp. 971-995, March 1980.
[70] Kron, G. "Electric Circuit Models of the Schrodinger Equation." Physical 
Review, 67(1/2), pp. 39-43, Jan 1/15 1945. 
[71] Kuroda, T., Suzuki, K. et al. "Variable Supply-Voltage Scheme for Low-Power 
High-Speed CMOS Digital Design." IEEE Journal of Solid-State Circuits, 
33(3), pp. 454-462, March 1998. 
[72] Laakso, T. I., Valimaki, V., Karjalainen, M. and Laine, U. K. "Splitting the
Unit Delay - Tools for fractional delay filter design." IEEE Signal Processing
Magazine, 13(1), pp. 30-60, January 1996.
[73] Lenzlinger, M. and Snow, E. "Fowler-Nordheim Tunneling into Thermally
Grown SiO2." Journal of Applied Physics, 40(1), pp. 278-283, January 1969.
[74] Levy, H. J. and McGill, T. "A Feedforward Artificial Neural Network Based 
on Quantum Effect Vector-Matrix Multipliers." IEEE Transactions on Neural 
Networks, 4(3), pp. 427-433, May 1993. 
[75] Li, Y., Eichmann, G., Dorsinville, R. and Alfano, R. "Demonstration of a
picosecond optical-phase-conjugation-based residue-arithmetic computation." 
Optics Letters, 13(2), pp. 178-180, 1988. 
[76] Linggard, R. Electronic Synthesis of Speech. Cambridge University Press, 
1985. 
[77] Lu, A. and Roberts, G. "An Analog Multi-Tone Signal Generator for
Built-in-Self-Test Applications." In The International Test Conference, pp. 650-659.
1994.
[78] Lu, A. K., Roberts, G. W. and Johns, D. A. "A High-Quality Analog Oscillator
Using Oversampling D/A Conversion Techniques." IEEE Transactions
on Circuits and Systems - II: Analog and Digital Signal Processing, 41(7),
pp. 437-444, July 1994.
[79] Lyon, R. "Two's Complement Pipeline Multipliers." IEEE Transactions on
Communications, COM-24, pp. 418-425, April 1976.
[80] Lyon, R. F. "Filters : An Integrated Digital Filter Subsystem." In Denyer, P.
and Renshaw, D., eds., VLSI Signal Processing: a Bit-Serial Approach, pp.
253-262. Addison-Wesley Publishing Company, 1985.
[81] Madisetti, V. K. Digital Signal Processors AN INTRODUCTION TO RAPID 
PROTOTYPING AND DESIGN SYNTHESIS. Butterworth-Heinemann, 
Boston, 1995. ISBN 0-7506-9406-8. 
[82] Manning, P. Electronic and Computer Music. Clarendon Press, Oxford, 1985. 
[83] Marks, P. "Making waves." New Scientist, 157(2124), pp. 34-37, 7th March 
1998. 
[84] Massie, D. C. "An Engineering Study of the Four-Multiply Normalized Ladder
Filter." Journal of the Audio Engineering Society, 41(7/8), pp. 564-582, July
1993.
[85] Matthews, R. "Take a spin ..." New Scientist Magazine, 157(2123), pp. 24-28, 
28th February 1998. 
[86] Mauchly, J. and Charpentier, A. "Practical Considerations in the Design of 
Music Systems using VLSI." In The Proceedings of the AES 5th Interna-
tional Conference on Music and Digital Technology, pp. 28-36. The Audio 
Engineering Society, May 1987. 
[87] McNally, G. "Digital Audio : Recursive Digital Filtering for High Quality 
Audio Signals." Report No. 1981/10, The BBC Research Department, 1981. 
[88] McNally, G. "Dynamic Range Control of Digital Audio Signals." Journal of
the Audio Engineering Society, 32(5), pp. 316-327, May 1984.
[89] Mead, C. Analog VLSI And Neural Systems. Addison-Wesley Publishing
Company, 1989.
[90] Mirsalehi, M., Shamir, J. and Caulfield, H. "Residue arithmetic processing
utilizing optical fredkin gate arrays." Applied Optics, 28(18), pp. 3940-3946,
1987.
[91] Mohan, S., Mazumder, P. and Haddad, G. "Subnanosecond 32 bit multiplier 
using negative differential resistance devices." Electronics Letters, 27(21), 
pp. 1929-1931, 1991. 
[92] Mohan, S., Mazumder, P., Haddad, G., Mains, R. and Sun, J. "Logic design 
based on negative differential resistance characteristics of quantum electronic 
devices." IEE Proceedings-G, 140(6), pp. 383-391, 1993. 
[93] Moore, B. C. An Introduction to the Psychology of Hearing. Academic Press
Ltd., London, second edition, 1991. ISBN 0-12-505624-9.
[94] Moore, F. "Table Lookup Noise for Sinusoidal Digital Oscillators." In Roads,
C. and Strawn, J., eds., Foundations of Computer Music, pp. 326-334. MIT
Press, Cambridge, Massachusetts, 1985. Appeared in Computer Music Journal
1(2):26-29, 1977.
[95] Moorer, J. A. "The Synthesis of Complex Audio Spectra by Means of Dis-
crete Summation Formulas." Journal of the Audio Engineering Society, 24(9), 
pp. 717-727, November 1976. 
[96] Moorer, J. A. "The Lucasfilm Audio Signal Processor." In Roads, C., ed., The
Music Machine : Selected Readings from Computer Music Journal, pp. 599-609.
MIT Press, Cambridge, Massachusetts, 1989. Reprinted from Computer
Music Journal, vol. 6, No. 3, Fall 1982.
[97] Moorer, J. A., Chauveau, A., Roads, C., Eastty, P. and Lawson, J. "The
4C Machine." In Roads, C. and Strawn, J., eds., Foundations of Computer
Music, pp. 261-280. MIT Press, Cambridge, Massachusetts, 1991. Reprinted
from Computer Music Journal, vol. 3, No. 3, 1979, pg 16-24.
[98] Moshinsky, M. and Smirnov, Y. The Harmonic Oscillator in Modern Physics. 
Harwood Academic Publishers, 1996. 
[99] Mullis, C. T. and Roberts, R. A. "Synthesis of Minimum Roundoff Noise 
Fixed Point Digital Filters." IEEE Transactions on circuits and systems, 
CAS-23(9), pp. 551-562, September 1976. 
[100] Nielsen, R. O. and Willson, JR, A. N. "A Fundamental Result Concerning 
the Topology of Transistor Circuits with Multiple Equilibria." Proceedings of 
the IEEE, 68(2), pp. 196-208, February 1980. 
[101] Oates, S. and Eaglestone, B. "Analytical Methods for Group Additive Syn-
thesis." Computer Music Journal, 21(2), pp. 21-39, Summer 1997. 
[102] Oppenheim, A. V. and Schafer, R. W. Digital Signal Processing. Prentice/Hall 
International, Inc., 1975. 
[103] Phillips, D. November 1995. Matlab FOF script and E-Mail correspondence. 
[104] Phillips, D., Purvis, A. and Johnson, S. "A Multirate Optimisation for Real-
Time Additive Synthesis." In Proceedings of the International Computer Mu-
sic Conference, pp. 364-367. Aarhus, Denmark, September 1994. 
[105] Phillips, D., Purvis, A. and Johnson, S. "An Efficient Algorithm and Archi-
tecture for Real-Time Additive Synthesis of Musical Tones." In Euromicro 
Conference, pp. 1-8. Como, Italy, September 1995. 
[106] Phillips, D., Purvis, A. and Johnson, S. "Multirate Additive Synthesis." In 
Proceedings of the International Computer Music Conference, pp. 496-499. 
Hong Kong, August 1996. 
[107] Phillips, D., Purvis, A. and Johnson, S. "On an efficient VLSI Architecture 
for the multirate additive synthesis of musical tones." Journal of Systems 
Architecture, 43(1-5), pp. 337-340, 1997. 
[108] Phillips, D. K. Algorithms and Architectures for the Multirate Additive Syn-
thesis of Musical Tones. Ph.D. thesis, School of Engineering, University of 
Durham, December 1996. 
[109] Prieto, A., Pelayo, F. and Lloris, A. "Multithreshold logic circuits imple-
mented with operational amplifiers." International Journal of Electronics, 
58(3), pp. 395-406, 1985. 
[110] Primlani, K. K. and Meador, J. L. "A Nonredundant-Radix-4 Serial 
Multiplier." IEEE Journal of Solid-State Circuits, 24(6), pp. 1729-1736, December 
1989. 
[111] Pryce, J. D. Numerical Solution of Sturm-Liouville Problems. Monographs 
on Numerical Analysis. Clarendon Press, 1993. 
[112] Puckette, M. "Process and device for musical and vocal dynamic sound 
synthesis by non-linear distortion and amplitude modulation." June 4th 1996. 
USA Patent No. 5,524,173. 
[113] Rader, C. and Gold, B. "Effects of Parametric Quantization on the Poles of 
a Digital Filter." Proceedings of the IEEE, pp. 688-689, May 1967. 
[114] Roads, C., Strawn, J., Abbott, C., Gordon, J. and Greenspun, P. The 
Computer Music Tutorial. MIT Press, Cambridge, Massachusetts, 1996. 
[115] Rodet, X. "Time Domain formant-wave-function Synthesis." Computer Music 
Journal, 8(3), pp. 9-14, 1984. 
[116] Rodet, X., Potard, Y. and Barriere, J.-B. "The CHANT Project : From 
the Synthesis of the singing voice to synthesis in general." Computer Music 
Journal, 8(3), pp. 15-31, 1984. 
[117] Roza, E. "Recursive Bitstream Conversion: the Reverse Mode." IEEE Trans-
actions on Circuits and Systems - II: Analog and Digital Signal Processing, 
41(5), pp. 329-336, May 1994. 
[118] Sandell, G. J. and Martens, W. L. "Perceptual Evaluation of Principal-
Component-Based Synthesis of Musical Timbres." Journal of the Audio En-
gineering Society, 43(12), pp. 1013-1028, December 1995. 
[119] Sen, S., Capasso, F., Cho, A. and Sivco, D. "Resonant Tunneling Device with 
multiple Negative Differential Resistance : Digital and Signal Processing Ap-
plications with Reduced Circuit Complexity." IEEE Transactions on Electron 
Devices, ED-34(10), pp. 2185-2191, October 1987. 
[120] Shibata, T. and Ohmi, T. "A functional MOS Transistor Featuring Gate-Level 
Weighted Sum and Threshold Operations." IEEE Transactions on Electron 
Devices, 39(6), pp. 1444-1455, June 1992. 
[121] Shibata, T. and Ohmi, T. "Neuron MOS Binary-Logic Integrated Circuits 
- Part I : Design Fundamentals and Soft-Hardware-Logic Circuit Implemen-
tation." IEEE Transactions on Electron Devices, 40(3), pp. 570-576, March 
1993. 
[122] Shibata, T. and Ohmi, T. "Neuron MOS Binary-Logic Integrated Circuits -
Part II : Simplifying Techniques of Circuit Configuration and their Practical 
Applications." IEEE Transactions on Electron Devices, 40(5), pp. 974-979, 
May 1993. 
[123] Shieh, M.-H. and Lin, H. C. "A Multiple-Dimensional Multiple-State SRAM 
Cell Using Resonant Tunneling Diodes." IEEE Journal of Solid-State Circuits, 
29(5), pp. 623-630, May 1994. 
[124] Smith, J. and Cook, P. "The second-order Digital Waveguide Oscillator." In 
Proceedings of the International Computer Music Conference, pp. 150-153. 
1992. 
[125] Smith, J. O. and Angell, J. B. "A Constant-Gain Digital Resonator Tuned 
by a Single Coefficient." Computer Music Journal, 6(4), pp. 36-40, Winter 
1982. 
[126] Smith, III, J. O. "Physical Modeling using Digital Waveguides." Computer 
Music Journal, 16(4), pp. 74-91, Winter 1992. 
[127] Snell, J. "Design of a Digital Oscillator That Will Generate up to 256 
Low-Distortion Sine Waves in Real Time." In Roads, C. and Strawn, J., eds., 
Foundations of Computer Music, pp. 289-325. MIT Press, Cambridge, 
Massachusetts, 1985. Appeared in Computer Music Journal 1(2): 4-25, 1977. 
[128] Spanier, J., Johnson, S. and Purvis, A. "Optimisations of the FOF Algorithm 
for VLSI Implementation." In Proceedings of the International Computer 
Music Conference, pp. 493-495. Hong Kong, August 1996. 
[129] Stilson, T. and Smith, J. "Alias-Free Digital Synthesis of Classic Analog 
Waveforms." In Proceedings of the International Computer Music Conference, 
pp. 332-335. Hong Kong, August 1996. 
[130] Sundberg, J. "Synthesis of Singing by Rule." In Mathews, M. and Pierce, J., 
eds., Current Directions in Computer Music Research, pp. 45-55. MIT Press, 
Cambridge, Massachusetts, 1989. 
[131] Takagi, N., Asada, T. and Yajima, S. "Redundant CORDIC Methods with a 
Constant Scale Factor for Sine and Cosine Computation." IEEE Transactions 
on Computers, 40(9), pp. 989-995, September 1991. 
[132] Taylor, F. "Residue arithmetic: a tutorial with examples." IEEE Computer 
Magazine, 17(5), pp. 50-62, 1984. 
[133] Taylor, F. "On the complex residue arithmetic system (CRNS)." IEEE 
Transactions on Acoustics, Speech and Signal Processing, ASSP-34(6), 
pp. 1675-1677, 1986. 
[134] Taylor, F. and Huang, C. "A floating-point residue Arithmetic Unit." Journal 
of the Franklin Institute, 311(1), pp. 33-53, January 1981. 
[135] Tempelaars, S. "The VOSIM Signal Spectrum." Interface, 6, pp. 81-96, 1977. 
[136] Tomovic, R. and Karplus, W. J. High Speed Analog Computers. 
John Wiley & Sons, Inc., 1962. 
[137] Vail, M. Vintage Synthesizers : Groundbreaking Instruments and Pioneer-
ing Designers of Electronic Music Synthesizers. Miller Freeman Books, San 
Francisco, 1993. ISBN 0-87930-275-5. (pbk). 
[138] Vercoe, B. "Extended Csound." In Proceedings of the International Computer 
Music Conference, pp. 141-142. Hong Kong, August 1996. 
[139] Vittoz, E. "Analog VLSI Signal Processing : Why, Where and How ?" Journal 
of VLSI Signal Processing, 8(1), pp. 27-44, 1994. 
[140] Volder, J. "The CORDIC Trigonometric Computing Technique." IRE 
Transactions on Electronic Computers, EC-8(3), pp. 330-334, September 1959. 
[141] Walther, J. "A unified algorithm for elementary functions." In Proceedings 
Of The Spring Joint Computer Conference - AFIPS, pp. 379-385. Atlantic 
City, N.J., U.S.A, May 1971. 
[142] Wang, A. L.-C. and Smith, III, J. O. "On fast FIR filters implemented as 
tail-canceling IIR filters." TR Stan-M-90, CCRMA, Dept. of Music, Stanford 
University, Stanford, CA 94305-8180, November 1994. Obtained from 
ftp://ccrma-ftp.stanford.edu. 
[143] Wawrzynek, J. "VLSI Models for Sound Synthesis." In Mathews, M. and 
Pierce, J., eds., Current Directions in Computer Music Research, 
pp. 113-148. MIT Press, Cambridge, Massachusetts, 1989. 
[144] Wawrzynek, J. and Mead, C. "A VLSI Architecture for Sound Synthesis." 
In Denyer, P. and Renshaw, D., eds., VLSI Signal Processing: a Bit-Serial 
Approach, pp. 277-297. Addison-Wesley Publishing Company, 1985. 
[145] Wayner, P. "Silicon in Reverse." Byte Magazine, 19(8), pp. 67-74, August 
1994. 
[146] Weber, W., Prange, S. J., Thewes, R., Wohlrab, E. and Luck, A. "On the Ap-
plication of the Neuron MOS Transistor Principle for Modern VLSI Design." 
IEEE Transactions on Electron Devices, 43(10), pp. 1700-1708, October 1996. 
[147] Wei, S.-J. and Lin, H. C. "Multivalued SRAM Cell Using Resonant Tunneling 
Diodes." IEEE Journal of Solid-State Circuits, 27(2), pp. 212-216, February 
1992. 
[148] Wilkins, B. Analogue and Iterative methods in Computation, Simulation and 
Control. Chapman and Hall Ltd., 1970. 
[149] Willoughby, M. Design Framework II : Automatic Place and Route. 
Rutherford Appleton Laboratory, May 1995. 
[150] Woods, R., Floyd, G., Wood, K., Evans, R. and McCanny, J. "Programmable 
high-performance IIR filter chip." IEE Proceedings - Circuits and Systems, 
142(3), pp. 179-185, June 1995. 
[151] Yamashina, M. and Yamada, H. "An MOS Current Mode Logic (MCML) 
Circuit for Low-Power GHz Processors." NEC Research and Development, 
36(1), pp. 54-63, January 1995. 
[152] Yassine, H. and Moore, W. "Improved Mixed-Radix Conversion for Residue 
Number System Architectures." IEE Proceedings - G, 138(1), pp. 120-124, 
February 1991. 
[153] Yuh, J.-D. and Newcomb, R. W. "A Multilevel Neural Network for A/D 
Conversion." IEEE Transactions on Neural Networks, 4(3), pp. 470-483, 
May 1993. 
[154] Zölzer, U. "Roundoff Error Analysis of Digital Filters." Journal of the Audio 
Engineering Society, 42(4), pp. 232-244, April 1994. 
