2,849 research outputs found
Real-time dynamic articulations in the 2-D waveguide mesh vocal tract model
Time domain articulatory vocal tract modeling in one-dimensional (1-D) is well established. Previous studies into two-dimensional (2-D) simulation of wave propagation in the vocal tract have shown it to present accurate static vowel synthesis. However, little has been done to demonstrate how such a model might accommodate the dynamic tract shape changes necessary in modeling speech. Two methods of applying the area function to the 2-D digital waveguide mesh vocal tract model are presented here. First, a method based on mapping the cross-sectional area onto the number of waveguides across the mesh, termed a widthwise mapping approach is detailed. Discontinuity problems associated with the dynamic manipulation of the model are highlighted. Second, a new method is examined that uses a static-shaped rectangular mesh with the area function translated into an impedance map which is then applied to each waveguide. Two approaches for constructing such a map are demonstrated; one using a linear impedance increase to model a constriction to the tract and another using a raised cosine function. Recommendations are made towards the use of the cosine method as it allows for a wider central propagational channel. It is also shown that this impedance mapping approach allows for stable dynamic shape changes and also permits a reduction in sampling frequency leading to real-time interaction with the model
Wavenet based low rate speech coding
Traditional parametric coding of speech facilitates low rate but provides
poor reconstruction quality because of the inadequacy of the model used. We
describe how a WaveNet generative speech model can be used to generate high
quality speech from the bit stream of a standard parametric coder operating at
2.4 kb/s. We compare this parametric coder with a waveform coder based on the
same generative model and show that approximating the signal waveform incurs a
large rate penalty. Our experiments confirm the high performance of the WaveNet
based coder and show that the speech produced by the system is able to
additionally perform implicit bandwidth extension and does not significantly
impair recognition of the original speaker for the human listener, even when
that speaker has not been used during the training of the generative model.Comment: 5 pages, 2 figure
Singing synthesis with an evolved physical model
A two-dimensional physical model of the human vocal tract is described. Such a system promises increased realism and control in the synthesis. of both speech and singing. However, the parameters describing the shape of the vocal tract while in use are not easily obtained, even using medical imaging techniques, so instead a genetic algorithm (GA) is applied to the model to find an appropriate configuration. Realistic sounds are produced by this method. Analysis of these, and the reliability of the technique (convergence properties) is provided
- …