Design and Optimization of a Speech Recognition Front-End for Distant-Talking Control of a Music Playback Device
This paper addresses the challenging scenario of distant-talking control
of a music playback device: a common portable speaker with four small
loudspeakers in close proximity to one microphone. The user controls the device
through voice, where the speech-to-music ratio can be as low as -30 dB during
music playback. We propose a speech enhancement front-end that relies on known
robust methods for echo cancellation, double-talk detection, and noise
suppression, as well as a novel adaptive quasi-binary mask that is well suited
for speech recognition. The optimization of the system is then formulated as a
large scale nonlinear programming problem where the recognition rate is
maximized and the optimal values for the system parameters are found through a
genetic algorithm. We validate our methodology by testing over the TIMIT
database for different music playback levels and noise types. Finally, we show
that the proposed front-end allows a natural interaction with the device for
limited-vocabulary voice commands.
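
To give a sense of scale for the -30 dB speech-to-music ratio quoted in the abstract above, the following minimal sketch converts a decibel level to a linear power ratio and measures the ratio between two sample sequences. The function names and the toy signals are illustrative assumptions and are not taken from the paper; the only fact relied on is the standard definition of a power ratio in decibels, 10 * log10(P_speech / P_music).

```python
import math


def power_ratio_from_db(level_db: float) -> float:
    """Convert a level in decibels to a linear power ratio."""
    return 10.0 ** (level_db / 10.0)


def speech_to_music_ratio_db(speech, music) -> float:
    """Ratio of mean speech power to mean music power, in dB.

    Both arguments are plain sequences of samples (hypothetical inputs).
    """
    speech_power = sum(x * x for x in speech) / len(speech)
    music_power = sum(x * x for x in music) / len(music)
    return 10.0 * math.log10(speech_power / music_power)


# Toy signals (assumed for illustration): the music has 1000 times the
# power of the speech, which corresponds to a ratio of about -30 dB.
speech = [1.0, -1.0, 1.0, -1.0]
music = [math.sqrt(1000.0) * s for s in speech]

linear_ratio = power_ratio_from_db(-30.0)
measured_db = speech_to_music_ratio_db(speech, music)
```

Under these assumptions, a ratio of -30 dB means the music reaching the microphone carries roughly a thousand times the power of the speech.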
Trends and Perspectives for Signal Processing in Consumer Audio
The trend in media consumption towards streaming and portability offers new
challenges and opportunities for signal processing in audio and acoustics. The
most significant embodiment of this trend is that most music consumption now
happens on the go, which has recently led to an explosion in sales of headphones
and small portable speakers. In particular, premium headphones offer a gateway for
a younger generation to experience high quality sound. Additionally, through
technologies incorporating head-related transfer functions, headphones can also
offer unique new experiences in gaming, augmented reality, and surround sound
listening. Home audio has also seen a transition to smaller sound systems in
the form of sound bars. This speaker configuration offers many exciting
challenges for surround sound reproduction, which has traditionally used five
speakers surrounding the listener. Furthermore, modern home entertainment
systems offer more than just content delivery; users now expect wireless and
connected smart devices with video conferencing, gaming, and other interactive
capabilities. With this come challenges for voice interaction at a distance
and in demanding conditions, e.g., during content playback, and opportunities
for new smart interactive experiences based on awareness of environment and
user biometrics.
Comment: IEEE Audio and Acoustic Signal Processing Technical Committee
Newsletter, May 201