Effects of Lombard Reflex on the Performance of Deep-Learning-Based Audio-Visual Speech Enhancement Systems
Humans tend to change their way of speaking when they are immersed in a noisy
environment, a reflex known as the Lombard effect. Current speech enhancement
systems based on deep learning do not usually take into account this change in
the speaking style, because they are trained with neutral (non-Lombard) speech
utterances recorded under quiet conditions to which noise is artificially
added. In this paper, we investigate the effects that the Lombard reflex has on
the performance of audio-visual speech enhancement systems based on deep
learning. The results show a performance gap of up to approximately 5 dB
between the systems trained on neutral speech and those trained on Lombard
speech. This indicates the benefit of taking into
account the mismatch between neutral and Lombard speech in the design of
audio-visual speech enhancement systems.
The listening talker: A review of human and algorithmic context-induced modifications of speech
Speech output technology is finding widespread application, including in scenarios where intelligibility might be compromised, at least for some listeners, by adverse conditions. Unlike most current algorithms, talkers continually adapt their speech patterns as a response to the immediate context of spoken communication, where the type of interlocutor and the environment are the dominant situational factors influencing speech production. Observations of talker behaviour can motivate the design of more robust speech output algorithms. Starting with a listener-oriented categorisation of possible goals for speech modification, this review article summarises the extensive set of behavioural findings related to human speech modification, identifies which factors appear to be beneficial, and goes on to examine previous computational attempts to improve intelligibility in noise. The review concludes by tabulating 46 speech modifications, many of which have yet to be perceptually or algorithmically evaluated. Consequently, the review provides a roadmap for future work in improving the robustness of speech output.
Multimedia information technology and the annotation of video
The state of the art in multimedia information technology has not progressed to the point where a single solution is available to meet all reasonable needs of documentalists and users of video archives. In general, we do not have an optimistic view of the usability of new technology in this domain, but digitization and digital power can be expected to cause a small revolution in the area of video archiving. The volume of data leads to two views of the future: on the pessimistic side, overload of data will cause lack of annotation capacity, and on the optimistic side, there will be enough data from which to learn selected concepts that can be deployed to support automatic annotation. At the threshold of this interesting era, we make an attempt to describe the state of the art in technology. We sample the progress in text, sound, and image processing, as well as in machine learning.