Speech Synthesis Project by Meltzer, David
Speech Synthesis Project*
David Meltzer 
*Sponsored in part by the National Science Foundation through Grant
GN-534.1 from the Office of Science Information Service to the
Computer and Information Science Research Center, The Ohio State
University.
The purpose of this project has been to attach a terminal analog
speech synthesizer of the Glace-Holmes type (JAWORD) to the PDP-10
computer to allow on-line speech synthesis in an interactive
environment.
The JAWORD synthesizer is an advanced terminal analog type
with 7 resonators and two forms of excitation (see Figure 1). The
frequency and amplitude of each resonator are controlled by an
externally supplied voltage of from 0 to +3 volts. Of the 14 parameters
thus needed, only 10 are used at any one time; which of these 10
are used is determined by an 11th parameter which also controls
which form of excitation is applied to the resonators (noise or pulse
excitation). The important characteristics of this synthesizer from
the viewpoint of interfacing it to a computer are:
1. Eleven analog voltages must be supplied simultaneously
to the synthesizer.
2. A new set of voltages must be available every 10 milliseconds,
i.e., the control voltages change at a very slow rate.
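The two requirements above imply a modest data rate. As a minimal sketch in Python (the names `Frame`, `N_PARAMETERS`, and `values_per_second` are illustrative and do not come from the original program), one 10-millisecond set of control voltages can be modeled and the required rate computed as follows:

```python
from dataclasses import dataclass
from typing import List

N_PARAMETERS = 11         # eleven control voltages per set (from the text)
FRAME_PERIOD_MS = 10      # a new set is needed every 10 milliseconds
V_MIN, V_MAX = 0.0, 3.0   # control voltage range in volts (from the text)


@dataclass
class Frame:
    """One set of control voltages (hypothetical container)."""
    volts: List[float]

    def __post_init__(self) -> None:
        if len(self.volts) != N_PARAMETERS:
            raise ValueError("a frame needs exactly 11 control voltages")
        for v in self.volts:
            if not V_MIN <= v <= V_MAX:
                raise ValueError(f"voltage {v} is outside 0 to +3 volts")


def values_per_second() -> int:
    """Number of control values the computer must deliver each second."""
    frames_per_second = 1000 // FRAME_PERIOD_MS
    return frames_per_second * N_PARAMETERS
```

Under these figures the computer supplies 100 sets, or 1100 individual values, per second.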
Previous experience had shown that the normal configuration of
JAWORD did not allow sufficient control of the time of occurrence
of the fundamental excitation pulses. The synthesizer was therefore
modified so that the pulse could be supplied from an external source,
in this case the computer.
The configuration used to control the synthesizer is shown in
Figure 2. The digital control information generated by the control
program is fed to the synthesizer controller via the PDP-10 Input-
Output Bus. This controller, designed and built at Ohio State
University, converts this information into the required analog
voltages and supplies them to the synthesizer. The controller (see
Figure 3a) uses an analog multiplexor feeding a series of capacitive
hold circuits to generate the 9 continuously variable outputs from
the output of one Digital-to-Analog Converter (DAC). The control
word (Figure 3b) includes the information to set the DAC as well
as the address of the hold circuit which will receive the value.
Although considerably slower than a one-DAC-per-channel system, its
response is adequate for this application. Additional bits are
provided in the control word for control functions, including a bit
for the fundamental excitation pulse (1=pulse, 0=no pulse).
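The packing of a control word can be sketched in Python. The PDP-10 numbers the bits of its 36-bit word from 0 (most significant) to 35 (least significant). The field positions used here are assumptions, loosely based on the bit boundaries visible in Figure 3b: a 10-bit DAC value in bits 20-29 and a 4-bit hold-circuit address in bits 32-35. The position of the excitation pulse bit (bit 19 below) is purely hypothetical, as is the linear mapping of 0 to +3 volts onto the full DAC range.

```python
DAC_BITS = 10
ADDRESS_BITS = 4

# Shift = 35 - (number of the field's last bit), PDP-10 bit numbering.
DAC_SHIFT = 35 - 29       # assumed: DAC value occupies bits 20-29
ADDRESS_SHIFT = 35 - 35   # assumed: hold-circuit address occupies bits 32-35
PULSE_SHIFT = 35 - 19     # hypothetical: excitation pulse flag in bit 19


def volts_to_dac(volts: float, full_scale: float = 3.0) -> int:
    """Map a control voltage to a 10-bit DAC code (assumed linear scale)."""
    if not 0.0 <= volts <= full_scale:
        raise ValueError("voltage outside the 0 to +3 volt range")
    return round(volts / full_scale * (2 ** DAC_BITS - 1))


def pack_control_word(dac_value: int, address: int, pulse: bool = False) -> int:
    """Combine DAC value, hold-circuit address and pulse flag into one word."""
    if not 0 <= dac_value < 2 ** DAC_BITS:
        raise ValueError("DAC value does not fit in 10 bits")
    if not 0 <= address < 2 ** ADDRESS_BITS:
        raise ValueError("address does not fit in 4 bits")
    return ((int(pulse) << PULSE_SHIFT)
            | (dac_value << DAC_SHIFT)
            | (address << ADDRESS_SHIFT))
```

One word per hold circuit is sent in turn, which is why a single DAC can serve all channels at the cost of speed.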
The logic of the controller is implemented in Digital Equipment
Corporation B-series discrete component logic, the same family as
used in the PDP-10 central processing unit (CPU). This family was
chosen for reasons of ease of interfacing to the CPU. The DAC is a
high precision 10-bit unit also made by Digital Equipment Corporation.
The analog hold and amplifier circuits as well as miscellaneous
level conversion circuits were designed and built at Ohio State
University. The hardware is completely operational on-line with
the CPU.
The next major element in the system is the control program
for the synthesizer. The program is the basic interface between the
user and the synthesizer. This control program operates as a privileged
user job within the time-sharing environment so that other jobs may
continue to use the system during speech synthesis jobs. There is a
definite random degradation in output speech quality due to the
presence of other activity in the system during synthesis, but this
is not a problem for trial synthesis runs. A future revision of the
control program is planned which will optionally turn off the time-
sharing for the duration of synthesis output and then return to time-
sharing mode. The actual output is of short enough duration so that
this will not cause appreciable response time degradation.
The control program as now in operation performs several
functions necessary for effective interaction with the experimenter.
The primary level controls the operation of the I/O bus and
appropriately sequences the outputting of control words to the
synthesizer controller. These control words are generated by the
next level of program, the conversion routine, which translates from
a standardized code (Carlson, 1969) to the appropriate channel
addresses and DAC levels, adds offset and calibration data, and packs
the words into a table in CPU core memory. This table becomes the
input to the primary-level program.
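The conversion step can be sketched in Python. The standardized code itself is not described in this report, so the sketch assumes it has already been decoded into per-channel integer DAC levels. The calibration model (a gain and an offset per channel) and all names below are hypothetical.

```python
from typing import Dict, List, Tuple

DAC_MAX = 1023  # largest code of a 10-bit DAC

# Hypothetical calibration data: channel address -> (gain, offset).
Calibration = Dict[int, Tuple[float, int]]


def convert_frame(levels: Dict[int, int],
                  calibration: Calibration) -> List[Tuple[int, int]]:
    """Return (channel address, calibrated DAC level) pairs for one frame."""
    entries = []
    for address in sorted(levels):
        gain, offset = calibration.get(address, (1.0, 0))
        level = round(levels[address] * gain) + offset
        level = max(0, min(DAC_MAX, level))  # keep within the DAC range
        entries.append((address, level))
    return entries


def build_table(frames: List[Dict[int, int]],
                calibration: Calibration) -> List[Tuple[int, int]]:
    """Concatenate the converted frames into one in-memory output table."""
    table: List[Tuple[int, int]] = []
    for frame in frames:
        table.extend(convert_frame(frame, calibration))
    return table
```

The resulting table plays the role of the core-memory table: the output level only has to step through it in order.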
The basic interface to the experimenter is one of the Teletype
consoles attached to the CPU. It is anticipated that this function
can be taken over by the CRT display at some future date but the
lack of adequate systems programming support and hardware character
generator feature on the display makes this a difficult task.
Synthesis of a speech sample using the current program involves
the following steps:
1. Typing in the sample in coded form and in-core
editing it to the experimenter's satisfaction. The program provides
for selective display and alteration of portions of the parameter
table.
2. Outputting the speech sample and then making needed
corrections.
3. Dumping the table in coded form on paper tape for
later use. The paper tape so prepared may be used as input in
place of step 1.
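The edit, dump, and reload cycle in the steps above can be sketched in Python. The actual coded paper-tape format is not described here; the sketch assumes, purely for illustration, one 36-bit table word per line written as 12 octal digits. All function names are hypothetical.

```python
from typing import List

WORD_MASK = (1 << 36) - 1  # the PDP-10 word is 36 bits wide


def display(table: List[int], start: int, count: int) -> List[int]:
    """Selectively show a portion of the parameter table."""
    return table[start:start + count]


def alter(table: List[int], index: int, value: int) -> None:
    """Replace one table entry in place."""
    if not 0 <= value <= WORD_MASK:
        raise ValueError("value does not fit in a 36-bit word")
    table[index] = value


def dump(table: List[int]) -> str:
    """Write the table in the assumed coded form (12 octal digits per word)."""
    return "\n".join(format(word, "012o") for word in table)


def load(text: str) -> List[int]:
    """Read a previously dumped table back in, replacing manual entry."""
    return [int(line, 8) for line in text.splitlines() if line.strip()]
```

Because `load(dump(table))` reproduces the table, a dumped sample can stand in for step 1 of a later session.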
The speech output may also be recorded on a built-in tape
recorder. Future program versions will allow dumping the output
on DEC tape so that many words or phrases may be kept on one tape.
At present, however, there are not enough DEC tape drives available
to allow each user of the system to have one tape and still let the
synthesis job have two, one for program storage and one for speech
sample storage.
The program as described above with paper-tape storage is
fully operational. Evolutionary changes are being made to improve
the output quality as experience with the synthesizer grows. 
Synthesis parameter tables distributed through SOUGHS (Society 
of Users, Glace-Holmes Synthesizer) are being used to gain 
experience with the system. 
Reference 
Carlson, W. A. 1969, "On the Establishment of a Standard Format
for Exchange of Data for the Synthesizer among the Users,"
(private communication, distributed through SOUGHS).
Figure 1. JAWORD Speech Synthesizer (Simplified)
[Block diagram: external control voltages set the frequency and
amplitude of each resonator (F1 and following); a switch selects
either the noise source or the pulse source as excitation; the
resonators produce the speech output.]
Figure 2. System Configuration
[Block diagram: the PDP-10 central processing unit is connected over
the DEC input-output bus to the synthesizer controller, which drives
the speech synthesizer to produce the speech output. The diagram also
shows DECtapes, the CRT display, and the Teletype control with its
TTY consoles.]
Figure 3. Synthesizer Controller
3a. Diagram
[Block diagram: the I/O bus input (bits 18-35) feeds a buffer; 10 bits
go to the 10-bit digital-to-analog converter and 4 bits to a decoder
driving the analog switches; each switch feeds an analog hold and
amplifier circuit that supplies a control voltage to the synthesizer;
a separate output carries the fundamental excitation pulse; the
control logic is driven by the I/O bus control signals and a 10 kHz
clock.]
3b. Control Word Format
[Word layout: bit positions in main memory marked at 18, 19, 20-29,
30, 31, and 32-35; fields labeled DAC value, switch control, and scan
address.]
