172 research outputs found
The operation of an Emergency Medical Department in a county hospital from a logistic point of view
The purpose of this article is to give an overview of current emergency medical care through the example of a county hospital in Hungary, highlighting possible shortcomings that could be remedied through further structural development. To be illustrative, the article follows the journey of two notional patients who presented at the hospital's emergency department. The importance of this topic is underlined by the mixed public perception of emergency care. Emergency services had long existed in the United States; only later did the single-entrance service system begin to develop in Hungary. In some places this system has been working well for decades, but at the University of Szeged, for instance, due to the uncertain perception of the system, construction is only now being finalized, right at a time when studies are being published that question the raison d'être of emergency departments, at least in their current form.
Speech Synthesis from Text and Ultrasound Tongue Image-based Articulatory Input
Articulatory information has been shown to be effective in improving the
performance of HMM-based and DNN-based text-to-speech synthesis. Speech
synthesis research focuses traditionally on text-to-speech conversion, when the
input is text or an estimated linguistic representation, and the target is
synthesized speech. However, a research field that has risen in the last decade
is articulation-to-speech synthesis (with a target application of a Silent
Speech Interface, SSI), when the goal is to synthesize speech from some
representation of the movement of the articulatory organs. In this paper, we
extend traditional (vocoder-based) DNN-TTS with articulatory input, estimated
from ultrasound tongue images. We compare text-only, ultrasound-only, and
combined inputs. Using data from eight speakers, we show that the combined
text and articulatory input can have advantages in limited-data scenarios,
namely, it may increase the naturalness of synthesized speech compared to
single text input. Besides, we analyze the ultrasound tongue recordings of
several speakers, and show that misalignments in the ultrasound transducer
positioning can have a negative effect on the final synthesis performance.
Comment: accepted at SSW11 (11th Speech Synthesis Workshop)
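The combined input described above, frame-wise concatenation of text-derived linguistic features with ultrasound-derived articulatory features feeding a DNN acoustic model, can be sketched as follows. This is a minimal illustration with invented dimensions and random weights, not the paper's actual architecture; the feature sizes and the two-layer network are assumptions for demonstration only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-frame features (dimensions are illustrative):
# - linguistic features estimated from text (phone identity, position, etc.)
# - articulatory features, e.g. a compressed encoding of an ultrasound tongue image
n_frames = 100
text_feats = rng.normal(size=(n_frames, 300))    # linguistic vector per frame
artic_feats = rng.normal(size=(n_frames, 128))   # ultrasound-derived vector per frame

# Combined input: simple frame-wise concatenation of the two streams
combined = np.concatenate([text_feats, artic_feats], axis=1)

# Toy feed-forward acoustic model mapping the combined input to
# vocoder parameters (e.g. spectral features + F0); weights are random here
W1 = rng.normal(scale=0.01, size=(combined.shape[1], 512))
W2 = rng.normal(scale=0.01, size=(512, 82))      # 82 ~ example vocoder feature dim
hidden = np.tanh(combined @ W1)
vocoder_params = hidden @ W2

print(combined.shape, vocoder_params.shape)      # (100, 428) (100, 82)
```

In a limited-data scenario, the articulatory stream supplies frame-level information the text stream cannot, which is one plausible reading of why the combination can improve naturalness over text-only input.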
Neural Speaker Embeddings for Ultrasound-based Silent Speech Interfaces
Articulatory-to-acoustic mapping seeks to reconstruct speech from a recording
of the articulatory movements, for example, an ultrasound video. Just like
speech signals, these recordings represent not only the linguistic content, but
are also highly specific to the actual speaker. Hence, due to the lack of
multi-speaker data sets, researchers have so far concentrated on
speaker-dependent modeling. Here, we present multi-speaker experiments using
the recently published TaL80 corpus. To model speaker characteristics, we
adjusted the x-vector framework popular in speech processing to operate with
ultrasound tongue videos. Next, we performed speaker recognition experiments
using 50 speakers from the corpus. Then, we created speaker embedding vectors
and evaluated them on the remaining speakers. Finally, we examined how the
embedding vector influences the accuracy of our ultrasound-to-speech conversion
network in a multi-speaker scenario. In the experiments we attained speaker
recognition error rates below 3%, and we also found that the embedding vectors
generalize nicely to unseen speakers. Our first attempt to apply them in a
multi-speaker silent speech framework brought about a marginal reduction in the
error rate of the spectral estimation step.
Comment: 5 pages, 3 figures, 3 tables
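The x-vector-style conditioning described above can be sketched in miniature: a frame-level transform followed by statistics pooling yields a fixed-size speaker embedding, which is then appended to every input frame of the ultrasound-to-speech network. All dimensions, the single-layer transform, and random weights are assumptions for illustration; the actual x-vector framework uses a deep TDNN trained with a speaker-classification objective.

```python
import numpy as np

rng = np.random.default_rng(1)

def xvector_style_embedding(frames):
    """Toy stand-in for an x-vector network: one frame-level transform
    followed by statistics pooling (mean + std over time)."""
    W = rng.normal(scale=0.01, size=(frames.shape[1], 64))
    h = np.tanh(frames @ W)                                   # frame-level layer
    stats = np.concatenate([h.mean(axis=0), h.std(axis=0)])   # statistics pooling
    return stats                                              # 128-dim embedding

# Hypothetical ultrasound tongue video of one utterance (flattened frames)
ultra = rng.normal(size=(200, 500))
spk_emb = xvector_style_embedding(ultra)

# Condition the ultrasound-to-speech network by appending the same
# speaker embedding to every frame's input vector
conditioned = np.concatenate(
    [ultra, np.tile(spk_emb, (ultra.shape[0], 1))], axis=1)
print(spk_emb.shape, conditioned.shape)  # (128,) (200, 628)
```

Because the pooled embedding is fixed-size regardless of utterance length, the same conditioning mechanism works for unseen speakers, which is consistent with the generalization result reported above.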