2 research outputs found

    Flexibility of cosine impedance function in 2-D Digital Waveguide Mesh for plosive synthesis

    No full text

    Articulatory-Based English Consonant Synthesis in 2-D Digital Waveguide Mesh

    Get PDF
    In articulatory speech synthesis, the 3-D shape of a vocal tract for a particular speech sound has typically been established, for example, by magnetic resonance imaging (MRI), and this is used to model the acoustic output from the tract using numerical methods that operate in either one, two or three dimensions. The dimensionality strongly affects the overall computation complexity, which has a direct bearing on the quality of the synthesized speech output. The digital waveguide mesh (DWM) is a numerical method commonly used in room acoustic modelling. A smaller space such as a vocal tract, which is about 5 cm wide and 16.5-18 cm long in adults, can also be modelled using DWM in one, two and three dimensions. The latter requires a very dense mesh requiring massive computational resources; these requirements are lessened by using a lower dimensionality (two rather than three) and/or a less dense mesh. The computational cost of 2-D digital waveguide modelling makes it a practical technique for real-time synthesis in an average PC at full (20 kHz) audio bandwidth. This research makes use of a 2-D mesh with the advantage of the availability and flexibility of existing boundary modelling and the raised-cosine impedance control to study the possibilities of using it for English consonant synthesis. The research was organized under the phonetic ‘manner’ classification of English consonants as: semi-vowel, nasal, fricative, plosive and affricate. Their production has been studied in terms of acoustic pressure wave propagation. Meshing topology was fixed to being a 4-port scattering 2-D rectilinear waveguide mesh for ease of understanding and mapping to the tract shape. As the characteristic of consonant production requires vocal tract articulation variations that are quite unlike vowels, this research adopts the articulatory trajectories using electromagnetic (mid-sagittal) articulograph (EMA) data from mngu0 to guide the change of cross-sectional vocal tract area. Generally, articulatory trajectories have been used to improve the accuracy of speech recognition and synthesis in recent decades. This research adopts the 3 trajectories to control coarticulation in consonant synthesis to demonstrate that a 2-D digital waveguide mesh (DWM) is able to simulate the formant transition accurately. The formant transitions in the results are close acoustically to natural speech and are based on controlling articulation for four places of articulation. Positions of lip, tongue tip, tongue body and tongue dorsum are inversely mapped to their corresponding cross-sectional areas. Linear interpolation between them enabled all tract movements to be modelled. The results show that tract movements are best modelled as non-linear coarticulation
    corecore