This paper describes a new method for synthesizing
speech by concatenating sub-word units from a
database of labelled speech. A large unit inventory is
created by automatically clustering units of the same
phone class based on their phonetic and prosodic context.
The appropriate cluster is then selected for a target
unit, offering a small set of candidate units. An optimal
path is found through the candidate units based on
their distance from the cluster center and an acoustically
based join cost. Details of the method and its justification
are presented. Results are given for experiments on
two different databases, in which various parameters
within the system are optimised. A comparison with
existing selection-based synthesis techniques is also
given, showing the advantages this method has over
them. The method is implemented within a full
text-to-speech system, offering efficient, natural-sounding
speech synthesis.
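
The path search summarised above can be illustrated with a minimal sketch. It is a generic Viterbi-style dynamic programme, not the paper's implementation: the function name `select_units`, the `join_weight` parameter, the toy unit names, and the toy join-cost table are all assumptions made for illustration. Each target position has a small list of candidate units taken from its selected cluster, each paired with its distance from the cluster center; the search minimises the sum of those distances plus a weighted join cost between consecutive units.

```python
from typing import Callable, List, Sequence, Tuple

Candidate = Tuple[str, float]  # (unit id, distance from its cluster center)


def select_units(
    candidates: Sequence[Sequence[Candidate]],
    join_cost: Callable[[str, str], float],
    join_weight: float = 1.0,  # hypothetical weighting, not from the paper
) -> Tuple[List[str], float]:
    """Return the lowest-cost unit sequence and its total cost."""
    if not candidates or any(len(c) == 0 for c in candidates):
        raise ValueError("every target position needs at least one candidate")

    # costs[j]: best cumulative cost of any path ending in candidate j
    costs = [dist for _, dist in candidates[0]]
    back: List[List[int]] = []  # back[i-1][j]: best predecessor of j at position i

    for i in range(1, len(candidates)):
        prev = candidates[i - 1]
        new_costs, pointers = [], []
        for unit, dist in candidates[i]:
            # Cheapest way to reach this unit from any previous candidate.
            best_k, best_cost = min(
                ((k, costs[k] + join_weight * join_cost(prev[k][0], unit))
                 for k in range(len(prev))),
                key=lambda kc: kc[1],
            )
            new_costs.append(best_cost + dist)
            pointers.append(best_k)
        costs = new_costs
        back.append(pointers)

    # Trace back from the cheapest final candidate.
    j = min(range(len(costs)), key=costs.__getitem__)
    total = costs[j]
    path = [j]
    for pointers in reversed(back):
        j = pointers[j]
        path.append(j)
    path.reverse()
    return [candidates[i][j][0] for i, j in enumerate(path)], total


# Toy example: two target units with two candidates each.
toy_candidates = [
    [("a1", 0.1), ("a2", 0.5)],
    [("b1", 0.2), ("b2", 0.1)],
]
# Assumed join costs: a2 and b2 join without penalty, every other pair costs 1.0.
toy_joins = {("a2", "b2"): 0.0}


def toy_join_cost(left: str, right: str) -> float:
    return toy_joins.get((left, right), 1.0)


units, total_cost = select_units(toy_candidates, toy_join_cost)
```

In this toy example the search selects `a2` followed by `b2`, even though `a1` is closer to its cluster center, because the cheap join outweighs the larger distance. The trade-off between the two cost terms is controlled here by the assumed `join_weight`.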