471 research outputs found

    Source and Filter Estimation for Throat-Microphone Speech Enhancement

    Get PDF
    In this paper, we propose a new statistical enhancement system for throat microphone recordings through source and filter separation. Throat microphones (TM) are skin-attached piezoelectric sensors that can capture speech sound signals in the form of tissue vibrations. Due to their limited bandwidth, TM recorded speech suffers from intelligibility and naturalness. In this paper, we investigate learning phone-dependent Gaussian mixture model (GMM)-based statistical mappings using parallel recordings of acoustic microphone (AM) and TM for enhancement of the spectral envelope and excitation signals of the TM speech. The proposed mappings address the phone-dependent variability of tissue conduction with TM recordings. While the spectral envelope mapping estimates the line spectral frequency (LSF) representation of AM from TM recordings, the excitation mapping is constructed based on the spectral energy difference (SED) of AM and TM excitation signals. The excitation enhancement is modeled as an estimation of the SED features from the TM signal. The proposed enhancement system is evaluated using both objective and subjective tests. Objective evaluations are performed with the log-spectral distortion (LSD), the wideband perceptual evaluation of speech quality (PESQ) and mean-squared error (MSE) metrics. Subjective evaluations are performed with an A/B comparison test. Experimental results indicate that the proposed phone-dependent mappings exhibit enhancements over phone-independent mappings. Furthermore enhancement of the TM excitation through statistical mappings of the SED features introduces significant objective and subjective performance improvements to the enhancement of TM recordings. ©2015 IEEE

    Fir filter for makhraj recognition system

    Get PDF
    Audio and speech processing systems have steadily risen in importance in the everyday of most people in developed countries. Speech recognition is the process of converting an acoustic signal, captured by a microphone to a set of words. Recognition is generally more difficult when vocabularies are larger or have many similar-sounding words. There are some external parameters that can effects speech recognition system performance, including the characteristics of the environmental noise and the type and also the placement of the microphone. A particular objective of the invention is to recognize the correct makhraj pronounce for the recognition analysis using pre-processing data base Matlab. In this project, speech processing for makhraj recognition is built using Finite Impulse Response (FIR) filter. The speech that was collects all of data from respondent. It requires the simultaneously recording of the speech wave as many parameters as possible. Then, get the correct makhraj pronounce example (alif), (ba), (ta), (tsa), (jim), (ha) and others. After that, the project will built using Matlab softwar

    Towards a Multimodal Silent Speech Interface for European Portuguese

    Get PDF
    Automatic Speech Recognition (ASR) in the presence of environmental noise is still a hard problem to tackle in speech science (Ng et al., 2000). Another problem well described in the literature is the one concerned with elderly speech production. Studies (Helfrich, 1979) have shown evidence of a slower speech rate, more breaks, more speech errors and a humbled volume of speech, when comparing elderly with teenagers or adults speech, on an acoustic level. This fact makes elderly speech hard to recognize, using currently available stochastic based ASR technology. To tackle these two problems in the context of ASR for HumanComputer Interaction, a novel Silent Speech Interface (SSI) in European Portuguese (EP) is envisioned.info:eu-repo/semantics/acceptedVersio

    EMG-to-Speech: Direct Generation of Speech from Facial Electromyographic Signals

    Get PDF
    The general objective of this work is the design, implementation, improvement and evaluation of a system that uses surface electromyographic (EMG) signals and directly synthesizes an audible speech output: EMG-to-speech
    • …
    corecore