Search CORE

664 research outputs found

Source and Filter Estimation for Throat-Microphone Speech Enhancement

Author: Erzin E.
Sr.
Turan M.A.T.
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2016
Field of study

In this paper, we propose a new statistical enhancement system for throat microphone recordings through source and filter separation. Throat microphones (TM) are skin-attached piezoelectric sensors that can capture speech sound signals in the form of tissue vibrations. Due to their limited bandwidth, TM recorded speech suffers from intelligibility and naturalness. In this paper, we investigate learning phone-dependent Gaussian mixture model (GMM)-based statistical mappings using parallel recordings of acoustic microphone (AM) and TM for enhancement of the spectral envelope and excitation signals of the TM speech. The proposed mappings address the phone-dependent variability of tissue conduction with TM recordings. While the spectral envelope mapping estimates the line spectral frequency (LSF) representation of AM from TM recordings, the excitation mapping is constructed based on the spectral energy difference (SED) of AM and TM excitation signals. The excitation enhancement is modeled as an estimation of the SED features from the TM signal. The proposed enhancement system is evaluated using both objective and subjective tests. Objective evaluations are performed with the log-spectral distortion (LSD), the wideband perceptual evaluation of speech quality (PESQ) and mean-squared error (MSE) metrics. Subjective evaluations are performed with an A/B comparison test. Experimental results indicate that the proposed phone-dependent mappings exhibit enhancements over phone-independent mappings. Furthermore enhancement of the TM excitation through statistical mappings of the SED features introduces significant objective and subjective performance improvements to the enhancement of TM recordings. ©2015 IEEE

Robust Voice Liveness Detection and Speaker Verification Using Throat Microphones

Author: Hautamaki Rosa Gonzalez
Kinnunen Tomi
Parts Robert
Pitkänen Martti
Sahidullah Md
Tan Zheng-Hua
Thomsen Dennis Alexander Lehmann
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2018
Field of study

VBN

Silent-speech enhancement using body-conducted vocal-tract resonance signals

Author: Hirahara Tatsuya
Nakajima Yoshitaka
Nakamura Keigo
Otani Makoto
Shikano Kiyohiro
Shimizu Shota
Toda Tomoki
Publication venue: 'Elsevier BV'
Publication date: 01/04/2010
Field of study

The physical characteristics of weak body-conducted vocal-tract resonance signals called non-audible murmur (NAM) and the acoustic characteristics of three sensors developed for detecting these signals have been investigated. NAM signals attenuate 50 dB at 1 kHz; this attenuation consists of 30-dB full-range attenuation due to air-to-body transmission loss and 10 dB/octave spectral decay due to a sound propagation loss within the body. These characteristics agree with the spectral characteristics of measured NAM signals. The sensors have a sensitivity of between 41 and 58 dB [V/Pa] at I kHz, and the mean signal-to-noise ratio of the detected signals was 15 dB. On the basis of these investigations, three types of silent-speech enhancement systems were developed: (1) simple, direct amplification of weak vocal-tract resonance signals using a wired urethane-elastomer NAM microphone, (2) simple, direct amplification using a wireless urethane-elastomer-duplex NAM microphone, and (3) transformation of the weak vocal-tract resonance signals sensed by a soft-silicone NAM microphone into whispered speech using statistical conversion. Field testing of the systems showed that they enable voice impaired people to communicate verbally using body-conducted vocal-tract resonance signals. Listening tests demonstrated that weak body-conducted vocal-tract resonance sounds can be transformed into intelligible whispered speech sounds. Using these systems, people with voice impairments can re-acquire speech communication with less effort. (C) 2009 Elsevier B.V. All rights reserved.ArticleSPEECH COMMUNICATION. 52(4):301-313 (2010)journal articl

Shinshu University Institutional Repository

Blind identification of acoustic systems and enhancement of reverberant speech

Author: Gaubitch Nikolay Dian
Gaubitch Nikolay Dian
Publication venue
Publication date: 01/01/2007
Field of study

Imperial Users onl

Spiral - Imperial College Digital Repository

A Novel Radar Sensor for the Non-Contact Detection of Speech Signals

Author: Achermann
Akargün
Brown
Denby
Ding
Droitcour
Erzin
Goldstein
Gu
Guohua Lu
Hanamitsu
Hemmerling
Holzrichter
Jianqi Wang
Kempka
Kubel
Kwon
Lawrence
Li
Lin
Lu
Mingke Jiao
Plant
Quatieri
Ramm
Salza
Shen
Sheng Li
Stagray
van der Donk
Viswanathan
Wang
Weik
Wu
Xijing Jing
Yanfeng Li
Yu
Zha
Publication venue: Molecular Diversity Preservation International (MDPI)
Publication date: 01/05/2010
Field of study

Different speech detection sensors have been developed over the years but they are limited by the loss of high frequency speech energy, and have restricted non-contact detection due to the lack of penetrability. This paper proposes a novel millimeter microwave radar sensor to detect speech signals. The utilization of a high operating frequency and a superheterodyne receiver contributes to the high sensitivity of the radar sensor for small sound vibrations. In addition, the penetrability of microwaves allows the novel sensor to detect speech signals through nonmetal barriers. Results show that the novel sensor can detect high frequency speech energies and that the speech quality is comparable to traditional microphone speech. Moreover, the novel sensor can detect speech signals through a nonmetal material of a certain thickness between the sensor and the subject. Thus, the novel speech sensor expands traditional speech detection techniques and provides an exciting alternative for broader application prospects

Directory of Open Access Journals

Configurable EBEN: Extreme Bandwidth Extension Network to enhance body-conducted speech capture

Author: Bavu Éric
Hauret Julien
Joubaud Thomas
Zimpfer Véronique
Publication venue
Publication date: 12/09/2023
Field of study

This paper presents a configurable version of Extreme Bandwidth Extension Network (EBEN), a Generative Adversarial Network (GAN) designed to improve audio captured with body-conduction microphones. We show that although these microphones significantly reduce environmental noise, this insensitivity to ambient noise happens at the expense of the bandwidth of the speech signal acquired by the wearer of the devices. The obtained captured signals therefore require the use of signal enhancement techniques to recover the full-bandwidth speech. EBEN leverages a configurable multiband decomposition of the raw captured signal. This decomposition allows the data time domain dimensions to be reduced and the full band signal to be better controlled. The multiband representation of the captured signal is processed through a U-Net-like model, which combines feature and adversarial losses to generate an enhanced speech signal. We also benefit from this original representation in the proposed configurable discriminators architecture. The configurable EBEN approach can achieve state-of-the-art enhancement results on synthetic data with a lightweight generator that allows real-time processing.Comment: Accepted in IEEE/ACM Transactions on Audio, Speech and Language Processing on 14/08/202

arXiv.org e-Print Archive

Improvement of Sound Quality on the Body Conducted Speech Using Differential Acceleration

Author: Masashi Nakayama
Seiji Nakagawa
Shunsuke Ishimitsu
Publication venue: 'IntechOpen'
Publication date: 13/06/2011
Field of study

Adaptation for Soft Whisper Recognition

Author: Jou Szu-Chen (Stan)
Schultz Tanja
Waibel Alex
Publication venue
Publication date: 16/06/2008
Field of study

Glottal-synchronous speech processing

Author: Thomas Mark R P
Thomas Mark R P
Publication venue: Electrical and Electronic Engineering, Imperial College London
Publication date: 01/01/2010
Field of study

Glottal-synchronous speech processing is a field of speech science where the pseudoperiodicity of voiced speech is exploited. Traditionally, speech processing involves segmenting and processing short speech frames of predefined length; this may fail to exploit the inherent periodic structure of voiced speech which glottal-synchronous speech frames have the potential to harness. Glottal-synchronous frames are often derived from the glottal closure instants (GCIs) and glottal opening instants (GOIs). The SIGMA algorithm was developed for the detection of GCIs and GOIs from the Electroglottograph signal with a measured accuracy of up to 99.59%. For GCI and GOI detection from speech signals, the YAGA algorithm provides a measured accuracy of up to 99.84%. Multichannel speech-based approaches are shown to be more robust to reverberation than single-channel algorithms. The GCIs are applied to real-world applications including speech dereverberation, where SNR is improved by up to 5 dB, and to prosodic manipulation where the importance of voicing detection in glottal-synchronous algorithms is demonstrated by subjective testing. The GCIs are further exploited in a new area of data-driven speech modelling, providing new insights into speech production and a set of tools to aid deployment into real-world applications. The technique is shown to be applicable in areas of speech coding, identification and artificial bandwidth extension of telephone speec

Spiral - Imperial College Digital Repository