13 research outputs found

    Ultrasound-Based Silent Speech Interface Built on a Continuous Vocoder

    It was recently shown that, within the Silent Speech Interface (SSI) field, F0 can be predicted from Ultrasound Tongue Images (UTI) as the articulatory input, using deep neural networks for articulatory-to-acoustic mapping. Moreover, text-to-speech synthesizers have been shown to produce higher-quality speech when using a continuous pitch estimate, which takes non-zero pitch values even when voicing is not present. Therefore, in this paper on UTI-based SSI, we use a simple continuous F0 tracker which does not apply a strict voiced/unvoiced decision. Continuous vocoder parameters (ContF0, Maximum Voiced Frequency, and Mel-Generalized Cepstrum) are predicted using a convolutional neural network, with UTI as input. The results demonstrate that in the articulatory-to-acoustic mapping experiments the continuous F0 is predicted with lower error, and that the continuous vocoder produces slightly more natural synthesized speech than the baseline vocoder using standard discontinuous F0.
    Comment: 5 pages, 3 figures, accepted for publication at Interspeech 201
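    To make the mapping concrete, the following is a minimal sketch, not the authors' code, of a convolutional network that regresses the three continuous-vocoder parameter streams from a single ultrasound tongue image; the image size (64x128), the MGC dimensionality (25 coefficients), and the layer sizes are illustrative assumptions.

```python
# Minimal sketch: per-frame regression from one ultrasound tongue image (UTI)
# to the continuous vocoder parameters named in the abstract. All sizes are
# illustrative assumptions, not the paper's actual configuration.
import torch
import torch.nn as nn

class UTIToVocoderCNN(nn.Module):
    def __init__(self, mgc_dim: int = 25):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, stride=2, padding=1),   # 64x128 -> 32x64
            nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1),  # 32x64 -> 16x32
            nn.ReLU(),
            nn.Flatten(),
        )
        hidden = 32 * 16 * 32
        # One regression head per continuous parameter stream.
        self.contf0 = nn.Linear(hidden, 1)     # continuous F0 (no voiced/unvoiced decision)
        self.mvf = nn.Linear(hidden, 1)        # Maximum Voiced Frequency
        self.mgc = nn.Linear(hidden, mgc_dim)  # Mel-Generalized Cepstrum

    def forward(self, x: torch.Tensor):
        h = self.features(x)
        return self.contf0(h), self.mvf(h), self.mgc(h)

# One grayscale UTI frame in, one frame of vocoder parameters out.
model = UTIToVocoderCNN()
frame = torch.randn(1, 1, 64, 128)
contf0, mvf, mgc = model(frame)
print(contf0.shape, mvf.shape, mgc.shape)  # (1, 1) (1, 1) (1, 25)
```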

    Adaptation of Tacotron2-based Text-To-Speech for Articulatory-to-Acoustic Mapping using Ultrasound Tongue Imaging

    For articulatory-to-acoustic mapping, typically only limited parallel training data is available, making it impossible to apply fully end-to-end solutions like Tacotron2. In this paper, we experimented with transfer learning and adaptation of a Tacotron2 text-to-speech model to improve the final synthesis quality of ultrasound-based articulatory-to-acoustic mapping with a limited database. We use a multi-speaker pre-trained Tacotron2 TTS model and a pre-trained WaveGlow neural vocoder. The articulatory-to-acoustic conversion consists of three steps: 1) from a sequence of ultrasound tongue image recordings, a 3D convolutional neural network predicts the inputs of the pre-trained Tacotron2 model, 2) the Tacotron2 model converts this intermediate representation to an 80-dimensional mel-spectrogram, and 3) the WaveGlow model is applied for final inference. The generated speech keeps the timing of the original articulatory data from the ultrasound recording, but the F0 contour and the spectral information are predicted by the Tacotron2 model. The F0 values are independent of the original ultrasound images and represent the target speaker, as they are inferred from the pre-trained Tacotron2 model. In our experiments, we demonstrated that the synthesized speech quality is more natural with the proposed solutions than with our earlier model.
    Comment: accepted at SSW11. arXiv admin note: text overlap with arXiv:2008.0315
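    As a rough illustration of step 1, here is a minimal sketch, under assumed shapes, of a 3D convolutional network that maps a window of ultrasound frames to a per-frame intermediate representation; the window length, image size, and the 512-dimensional output (a common Tacotron2 encoder embedding size) are assumptions, and the pre-trained Tacotron2 decoder and WaveGlow vocoder (steps 2 and 3) would consume its output downstream.

```python
# Minimal sketch of the pipeline's first step: a 3D CNN over a short window of
# ultrasound tongue images, convolving jointly over time and space. Shapes and
# layer sizes are illustrative assumptions, not the paper's configuration.
import torch
import torch.nn as nn

class UTI3DCNN(nn.Module):
    def __init__(self, embed_dim: int = 512):
        super().__init__()
        self.conv = nn.Sequential(
            # Input (batch, 1, frames, H, W); stride 2 in space only,
            # so the time resolution of the articulatory data is kept.
            nn.Conv3d(1, 8, kernel_size=3, stride=(1, 2, 2), padding=1),
            nn.ReLU(),
            nn.Conv3d(8, 16, kernel_size=3, stride=(1, 2, 2), padding=1),
            nn.ReLU(),
        )
        # Average away the spatial axes, keep the time axis.
        self.pool = nn.AdaptiveAvgPool3d((None, 1, 1))
        self.proj = nn.Linear(16, embed_dim)  # per-frame intermediate features

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.pool(self.conv(x))                    # (B, 16, T, 1, 1)
        h = h.squeeze(-1).squeeze(-1).transpose(1, 2)  # (B, T, 16)
        return self.proj(h)                            # (B, T, embed_dim)

# A 16-frame grayscale UTI window -> 16 frames of 512-dim features that the
# pre-trained Tacotron2 model (and then WaveGlow) would take from here.
model = UTI3DCNN()
clip = torch.randn(1, 1, 16, 64, 128)
print(model(clip).shape)  # torch.Size([1, 16, 512])
```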

    ρan-ρan

    "With the peristaltic gurglings of this gastēr-investigative procedural – a soooo welcomed addition to the ballooning corpus of slot-versatile bad eggs The Confraternity of Neoflagellants (CoN) – [users] and #influencers everywhere will be belly-joyed to hold hands with neomedieval mutter-matter that literally sticks and branches, available from punctum in both frictionless and grip-gettable boke-shaped formats. A game-changer in Brownian temp-controlled phoneme capture, ρan-ρan’s writhing paginations are completely oxygen-soaked, overwriting the flavour profiles of 2013’s thN Lng folk 2go with no-holds-barred argumentations on all voice-like and lung-adjacent functions. Rumoured by experts to be dead to the World™, CoN has clearly turned its ear canal arrays towards the jabbering OMFG feedback signals from their scores of naive listeners, scrapping all lenticular exegesis and content profiles to construct taped-together vernacular dwellings housing ‘shrooming atmospheric awarenesses and pan-dimensional cross-talkers, making this anticipatory sequel a serious competitor across ambient markets, and a crowded kitchen in its own right. An utterly mondegreen-infested deep end may deter would-be study buddies from taking the plunge, but feet-wetted Dog Heads eager to sniff around for temporal folds and whiff past the stank of hastily proscribed future fogs ought to ©k no further than the roll-upable-rim of ρan-ρan’s bleeeeeding premodern lagoon. Arrange yerself cannonball-wise or lead with the #gut and you’ll be kersplashing in no times.

    Play Among Books

    How does coding change the way we think about architecture? Miro Roman and his AI Alice_ch3n81 develop a playful scenario in which they propose coding as the new literacy of information. They convey knowledge in the form of a project model that links the fields of architecture and information through two interwoven narrative strands in an “infinite flow” of real books.
