43 research outputs found

    Review of Noise Reduction Techniques in Speech Processing

    Recent advances in speech processing aim at providing robust and reliable interfaces for practical deployment. Achieving robust performance in adverse and noisy environments is one of the major challenges for applications such as dictation, voice-controlled devices, human-computer dialog systems, and navigation systems. The performance of speech recognition systems degrades strongly in the presence of background noise, such as the driving noise inside a car. In contrast to existing works, we address the improvement of noise robustness at all levels of speech recognition: feature extraction, feature enhancement, speech modelling, and training. We thereby provide an overview of noise modelling concepts, speech enhancement techniques, training methods, and model architectures that are employed in a speech spelling recognition task, considering noises produced by various conditions. DOI: 10.17762/ijritcc2321-8169.15075
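The speech enhancement stage this survey covers is typified by magnitude spectral subtraction; a minimal sketch, assuming per-bin magnitude spectra and a pre-computed noise estimate (the function name and floor value are illustrative, not from the paper):

```python
import numpy as np

def spectral_subtract(noisy_mag, noise_mag, floor=0.02):
    """Basic magnitude spectral subtraction: remove a noise estimate
    from each noisy spectral bin, flooring the result at a small
    fraction of the noisy magnitude to avoid negative magnitudes."""
    diff = noisy_mag - noise_mag
    return np.maximum(diff, floor * noisy_mag)

noisy = np.array([1.0, 0.5, 0.3, 2.0])   # toy noisy magnitude spectrum
noise = np.array([0.2, 0.6, 0.1, 0.5])   # toy noise estimate
enhanced = spectral_subtract(noisy, noise)
```

The spectral floor is the standard guard against "musical noise": bins where the noise estimate exceeds the observation are clamped rather than driven negative.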

    Bimodal Fusion in Audio-Visual Speech Recognition

    Extending automatic speech recognition (ASR) to the visual modality has been shown to greatly increase recognition accuracy and improve system robustness over purely acoustic systems, especially in acoustically hostile environments. An important aspect of designing such systems is how to incorporate the visual component into the acoustic speech recognizer to achieve optimal performance. In this paper, we investigate methods of integrating the audio and visual modalities within HMM-based classification models. We examine existing integration schemes and propose the use of a coupled hidden Markov model (CHMM) to exploit audio-visual interaction. Our experimental results demonstrate that the CHMM consistently outperforms other integration models over a large range of acoustic noise levels and suggest that it better captures the temporal correlations between the two streams of information.
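As a hedged point of comparison, the simplest of the integration schemes such work examines can be sketched as weighted multistream fusion of per-frame log-likelihoods; the CHMM the paper proposes instead couples the hidden-state transitions of the two streams. Names and the weight value below are illustrative:

```python
import numpy as np

def fuse_loglik(audio_ll, visual_ll, audio_weight=0.7):
    """Weighted multistream fusion: combine audio and visual per-frame,
    per-state log-likelihoods with a stream-exponent weight, typically
    tuned to the acoustic SNR (lower audio weight in noisier conditions)."""
    return audio_weight * audio_ll + (1.0 - audio_weight) * visual_ll

audio_ll = np.array([[-2.0, -5.0], [-4.0, -1.0]])    # frames x states
visual_ll = np.array([[-3.0, -2.0], [-2.0, -3.0]])
fused = fuse_loglik(audio_ll, visual_ll, audio_weight=0.5)
```

Because the weight is fixed per utterance, this baseline cannot model frame-level asynchrony between lips and audio, which is the limitation a coupled HMM addresses.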

    Learning-based auditory encoding for robust speech recognition

    This paper describes ways of speeding up the optimization process for learning physiologically-motivated components of a feature computation module directly from data. During training, word lattices generated by the speech decoder and conjugate gradient descent were used to train the parameters of logistic functions in a fashion that maximizes the a posteriori probability of the correct class in the training data. These functions represent the rate-level nonlinearities found in most mammalian auditory systems. Experiments conducted using the CMU SPHINX-III system on the DARPA Resource Management and Wall Street Journal tasks show that discriminative training of the shape of the rate-level nonlinearity provides better recognition accuracy in the presence of background noise than traditional procedures that do not employ learning. More importantly, the inclusion of conjugate gradient descent optimization and a word lattice to reduce the number of hypotheses considered greatly increases the training speed, which makes training with much more complicated models possible. Index Terms — automatic speech recognition, discriminative training, auditory models, data analysis
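The learnable component can be sketched as a logistic rate-level function, one per frequency channel; the parameter names below are illustrative, and in the paper their values are estimated discriminatively from word lattices rather than set by hand:

```python
import numpy as np

def rate_level(x, alpha=1.0, beta=1.0, theta=0.0):
    """Logistic rate-level nonlinearity: maps an input sound level x to
    a saturating 'firing rate', mimicking mammalian auditory neurons.
    alpha (maximum rate), beta (slope) and theta (threshold) are the
    trainable shape parameters."""
    return alpha / (1.0 + np.exp(-beta * (x - theta)))

levels = np.linspace(-5.0, 5.0, 11)
rates = rate_level(levels, alpha=2.0, beta=1.5, theta=0.5)
```

The sigmoid's threshold and saturation are what give the feature its noise robustness: low-level noise falls below theta and is flattened, while speech-level energy lands on the rising part of the curve.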

    Evaluation of PNCC and extended spectral subtraction methods for robust speech recognition

    This paper evaluates the robustness of different approaches for speech recognition with respect to signal-to-noise ratio (SNR), to signal level, and to the presence of non-speech data before and after the utterances to be recognized. Three types of noise-robust features are considered: Power Normalized Cepstral Coefficients (PNCC), Mel-Frequency Cepstral Coefficients (MFCC) after applying an extended spectral subtraction method, and the denoising features embedded in recent Sphinx versions. Although removing C0 from MFCC-based features leads to a slight decrease in speech recognition performance, it makes the speech recognition system independent of the speech signal level. With multi-condition training, the three sets of noise-robust features behave rather similarly with respect to SNR and to the presence of non-speech data. Overall, the best performance is achieved with the extended spectral subtraction approach. Also, the performance of the PNCC features appears to depend on the initialization of the normalization factor.
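The level-independence point about C0 can be checked with a toy cepstrum: a gain on the waveform adds a constant to the log spectrum, and the DCT maps a constant vector onto C0 alone, so discarding C0 removes the level dependence. The unnormalized DCT-II below is purely illustrative:

```python
import numpy as np

def dct2(x):
    """Unnormalized DCT-II, the transform taking log filterbank
    energies to cepstral coefficients."""
    n = len(x)
    grid = np.pi * np.arange(n)[:, None] * (2 * np.arange(n) + 1) / (2 * n)
    return np.cos(grid) @ x

log_spec = np.log(np.array([2.0, 5.0, 3.0, 7.0, 4.0, 6.0]))
gained = log_spec + np.log(10.0)      # 10x gain on the signal power
c, c_gained = dct2(log_spec), dct2(gained)
# Only C0 changes with the gain; C1..C5 are identical, so dropping C0
# yields level-independent features.
```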

    Learnable Nonlinear Compression for Robust Speaker Verification

    In this study, we focus on nonlinear compression methods applied to spectral features for deep-neural-network-based speaker verification. We consider different kinds of channel-dependent (CD) nonlinear compression methods optimized in a data-driven manner. Our methods are based on power nonlinearities and dynamic range compression (DRC). We also propose a multi-regime (MR) design of the nonlinearities, aimed at improving robustness. Results on the VoxCeleb1 and VoxMovies data demonstrate the improvements brought by the proposed compression methods over both the commonly used logarithm and their static counterparts, especially for the ones based on the power function. While the CD generalization improves performance on VoxCeleb1, MR provides more robustness on VoxMovies, with a maximum relative equal error rate reduction of 21.6%.
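A minimal sketch of the channel-dependent power nonlinearity, assuming one learnable exponent per spectral channel trained jointly with the network (names and values below are illustrative):

```python
import numpy as np

def cd_power_compress(power_spec, exponents):
    """Channel-dependent power-law compression: raise each channel of a
    (frames x channels) power spectrogram to its own learnable exponent,
    replacing the fixed logarithm. Small exponents compress strongly;
    an exponent of 1 leaves the channel uncompressed."""
    return power_spec ** exponents

spec = np.array([[1.0, 4.0, 9.0],
                 [16.0, 25.0, 36.0]])      # frames x channels
exponents = np.array([0.5, 0.3, 1.0])      # one exponent per channel
compressed = cd_power_compress(spec, exponents)
```

Unlike the logarithm, the exponent is differentiable with respect to its value, which is what allows the compression curve to be optimized from data rather than fixed in advance.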

    Diarization for the annotation of legal videos

    In this paper we analyze legal hearing recordings generated at the Spanish Civil Law Courts, and we present a tool for annotation and navigation across these records. The tool is based on recovering the legal structure of the hearings from the data. To obtain this structure automatically, we apply and compare different audio diarization algorithms that yield the temporal boundaries of the speakers and track them across the hearing. Previous work on legal data will help us to apply diarization techniques within web service platforms (Ontomedia).
