Ultrasound-Based Silent Speech Interface Built on a Continuous Vocoder
Recently it was shown that within the Silent Speech Interface (SSI) field,
the prediction of F0 is possible from Ultrasound Tongue Images (UTI) as the
articulatory input, using Deep Neural Networks for articulatory-to-acoustic
mapping. Moreover, text-to-speech synthesizers were shown to produce higher
quality speech when using a continuous pitch estimate, which takes non-zero
pitch values even when voicing is not present. Therefore, in this paper on
UTI-based SSI, we use a simple continuous F0 tracker which does not apply a
strict voiced / unvoiced decision. Continuous vocoder parameters (ContF0,
Maximum Voiced Frequency and Mel-Generalized Cepstrum) are predicted using a
convolutional neural network, with UTI as input. The results demonstrate that
during the articulatory-to-acoustic mapping experiments, the continuous F0 is
predicted with lower error, and the continuous vocoder produces slightly more
natural synthesized speech than the baseline vocoder using standard
discontinuous F0.
Comment: 5 pages, 3 figures, accepted for publication at Interspeech 201
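The key idea of the continuous F0 tracker described above is that the pitch contour never drops to zero in unvoiced regions. The following is a minimal numpy sketch of that idea, not the authors' actual tracker: it fills unvoiced frames (marked by F0 = 0) via linear interpolation between neighbouring voiced frames. The function name and the `default_hz` fallback are illustrative assumptions.

```python
import numpy as np

def make_continuous_f0(f0, default_hz=100.0):
    """Fill unvoiced frames (f0 == 0) by linear interpolation between
    neighbouring voiced frames, so the contour stays non-zero everywhere,
    i.e. no strict voiced/unvoiced decision is applied."""
    f0 = np.asarray(f0, dtype=float)
    voiced = f0 > 0
    if not voiced.any():
        # Fully unvoiced input: fall back to a constant contour.
        return np.full(len(f0), default_hz)
    idx = np.arange(len(f0))
    # np.interp also extends the first/last voiced value to the edges.
    return np.interp(idx, idx[voiced], f0[voiced])

contour = make_continuous_f0([0, 120, 0, 0, 130, 0])
```

Here the unvoiced gap between 120 Hz and 130 Hz is bridged smoothly (123.3 Hz, 126.7 Hz), and the edges take the nearest voiced value, so every frame carries a usable pitch target for the vocoder.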
Adaptation of Tacotron2-based Text-To-Speech for Articulatory-to-Acoustic Mapping using Ultrasound Tongue Imaging
For articulatory-to-acoustic mapping, typically only limited parallel
training data is available, making it impossible to apply fully end-to-end
solutions like Tacotron2. In this paper, we experimented with transfer learning
and adaptation of a Tacotron2 text-to-speech model to improve the final
synthesis quality of ultrasound-based articulatory-to-acoustic mapping with a
limited database. We use a multi-speaker pre-trained Tacotron2 TTS model and a
pre-trained WaveGlow neural vocoder. The articulatory-to-acoustic conversion
contains three steps: 1) from a sequence of ultrasound tongue image recordings,
a 3D convolutional neural network predicts the inputs of the pre-trained
Tacotron2 model, 2) the Tacotron2 model converts this intermediate
representation to an 80-dimensional mel-spectrogram, and 3) the WaveGlow model
is applied for final inference. This generated speech contains the timing of
the original articulatory data from the ultrasound recording, but the F0
contour and the spectral information are predicted by the Tacotron2 model. The
F0 values are independent of the original ultrasound images, but represent the
target speaker, as they are inferred from the pre-trained Tacotron2 model. In
our experiments, we demonstrated that the synthesized speech quality is more
natural with the proposed solutions than with our earlier model.
Comment: accepted at SSW11. arXiv admin note: text overlap with arXiv:2008.0315
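The three-step conversion described in this abstract can be sketched as a simple pipeline. The code below is an illustrative shape-level mock-up, not the real models: the 3D CNN, Tacotron2 and WaveGlow are replaced by random linear projections, and all dimensions (frame count, image size, embedding width, 200-sample hop) are assumed for the example.

```python
import numpy as np

# Assumed, illustrative dimensions.
N_FRAMES, H, W = 20, 64, 128    # ultrasound video: frames x height x width
EMB_DIM, MEL_DIM, HOP = 256, 80, 200

def cnn_3d(ultrasound):
    """Stand-in for step 1, the 3D CNN: ultrasound frame sequence ->
    intermediate representation fed to Tacotron2 (a random projection here)."""
    rng = np.random.default_rng(0)
    proj = rng.standard_normal((H * W, EMB_DIM)) / np.sqrt(H * W)
    return ultrasound.reshape(len(ultrasound), -1) @ proj

def tacotron2(embedding):
    """Stand-in for step 2: intermediate representation ->
    80-dimensional mel-spectrogram."""
    rng = np.random.default_rng(1)
    proj = rng.standard_normal((EMB_DIM, MEL_DIM)) / np.sqrt(EMB_DIM)
    return embedding @ proj

def waveglow(mel):
    """Stand-in for step 3, the WaveGlow vocoder: mel-spectrogram ->
    waveform, HOP samples per spectrogram frame."""
    rng = np.random.default_rng(2)
    return rng.standard_normal(len(mel) * HOP)

ultrasound = np.zeros((N_FRAMES, H, W))     # dummy recording
mel = tacotron2(cnn_3d(ultrasound))         # steps 1 and 2
audio = waveglow(mel)                       # step 3
```

The point of the sketch is the data flow: timing (the number of frames) is carried through from the ultrasound input, while the mel-spectrogram content, and hence F0, is produced by the Tacotron2 stage, matching the abstract's description.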
ρan-ρan
"With the peristaltic gurglings of this gastēr-investigative procedural – a soooo welcomed addition to the ballooning corpus of slot-versatile bad eggs The Confraternity of Neoflagellants (CoN) – [users] and #influencers everywhere will be belly-joyed to hold hands with neomedieval mutter-matter that literally sticks and branches, available from punctum in both frictionless and grip-gettable boke-shaped formats.
A game-changer in Brownian temp-controlled phoneme capture, ρan-ρan’s writhing paginations are completely oxygen-soaked, overwriting the flavour profiles of 2013’s thN Lng folk 2go with no-holds-barred argumentations on all voice-like and lung-adjacent functions. Rumoured by experts to be dead to the World™, CoN has clearly turned its ear canal arrays towards the jabbering OMFG feedback signals from their scores of naive listeners, scrapping all lenticular exegesis and content profiles to construct taped-together vernacular dwellings housing ‘shrooming atmospheric awarenesses and pan-dimensional cross-talkers, making this anticipatory sequel a serious competitor across ambient markets, and a crowded kitchen in its own right.
An utterly mondegreen-infested deep end may deter would-be study buddies from taking the plunge, but feet-wetted Dog Heads eager to sniff around for temporal folds and whiff past the stank of hastily proscribed future fogs ought to ©k no further than the roll-upable-rim of ρan-ρan’s bleeeeeding premodern lagoon. Arrange yerself cannonball-wise or lead with the #gut and you’ll be kersplashing in no times.
Play Among Books
How does coding change the way we think about architecture? Miro Roman and his AI Alice_ch3n81 develop a playful scenario in which they propose coding as the new literacy of information. They convey knowledge in the form of a project model that links the fields of architecture and information through two interwoven narrative strands in an “infinite flow” of real books.