10 research outputs found

    Environmental Noise Reduction based on Deep Denoising Autoencoder

    Speech enhancement plays an important role in Automatic Speech Recognition (ASR), yet reaching human-level performance in real-world scenarios remains challenging. To cope with this challenge, this paper introduces an explicit denoising framework called the Deep Denoising Autoencoder (DDAE). The parameters of the DDAE encoder and decoder are optimized by backpropagation, and the denoising autoencoders are stacked rather than joined by recurrent connections. For better speech estimation in real, noisy environments, the DDAE is trained on both matched and mismatched pairs of noisy and clean speech. The DDAE can achieve optimal results even with a limited amount of training data. Experimental results show that the proposed DDAE outperforms three baseline algorithms on three evaluation metrics for noisy and clean pairs of speech data.

    On the application of minimum noise tracking to cancel cosine shaped residual noise

    It has recently been shown that, for coherence-based dual-microphone array speech enhancement systems, cross-spectral subtraction is an efficient technique for reducing correlated noise components. The zero-phase filtering criterion employed in these methods is derived from the standard coherence function, modified to incorporate the noise cross power spectrum between the two channels. However, coherence-based filters have had limited success when speech processing is carried out under relatively harsh acoustic conditions (SNR below -5 dB) or when the speech and noise sources are closely spaced. We propose an effective alternative method that uses a phase-based filtering criterion, substituting the cross power spectrum of the noisy signals received on the two channels with its real part. A variant of the running minimum noise tracking procedure is then applied to the estimated speech spectrum as an adaptive postfilter, reducing the cosine-shaped power spectrum of the remaining residual musical noise to a minimum spectral floor. Using that adaptive postfilter, a soft-decision scheme is implemented to control the amount of noise suppression. Preliminary results from experiments on real speech signals show improved performance of the proposed method over coherence-based approaches. These results also show that it performs well on speech while producing less spectral distortion, even in severely noisy conditions.
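
The running minimum tracking idea mentioned in this abstract can be illustrated with a minimal sketch. It is a generic sliding-window minimum per frequency bin, not the variant proposed in the paper; the function name `running_minimum_floor`, the `window` parameter and the toy power values are assumptions made for illustration.

```python
from collections import deque


def running_minimum_floor(power_frames, window=3):
    """Track a per-bin noise floor as the minimum power over the last `window` frames.

    Hypothetical helper: `power_frames` is a list of frames, each a list of
    power values with one entry per frequency bin.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    history = deque(maxlen=window)  # keeps only the most recent frames
    floors = []
    for frame in power_frames:
        history.append(frame)
        # Minimum over the retained frames, taken independently for each bin.
        floors.append([min(bin_values) for bin_values in zip(*history)])
    return floors


# Toy power spectrum: four frames, two frequency bins (made-up values).
frames = [
    [4.0, 9.0],
    [2.0, 7.0],
    [5.0, 8.0],
    [6.0, 3.0],
]
floors = running_minimum_floor(frames, window=3)
```

A postfilter could then compare each bin's power against its tracked floor to decide how much suppression to apply; that decision rule is not specified in the abstract and is left out here.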

    Non-intrusive speech quality assessment using context-aware neural networks

    To meet the human-perceived quality of experience (QoE) when communicating over Voice over Internet Protocol (VoIP) applications such as Google Meet, Microsoft Skype and Apple FaceTime, a precise speech quality assessment metric is needed. The metric should be able to detect and segregate the different types of noise degradation present in the surroundings before measuring and monitoring speech quality in real time. Our research is motivated by the lack of clear evidence of a speech quality metric that first distinguishes different types of noise degradation before providing a speech quality prediction. To that end, this paper presents a novel non-intrusive speech quality assessment metric using context-aware neural networks: the noise class (context) of the degraded or noisy speech signal is first identified using a classifier, and then deep neural network (DNN) based speech quality metrics (SQMs) are trained and optimized for each noise class to obtain noise class-specific (context-specific) speech quality predictions (MOS scores). The noisy speech signals, that is, clean speech signals degraded by different types of background noise, are taken from the NOIZEUS speech corpus. Results demonstrate that, even with the small number of speech samples available from the NOIZEUS speech corpus, the proposed metric outperforms, across different contexts, a metric in which the contexts are not classified before speech quality prediction.

    Information-based Analysis and Control of Recurrent Linear Networks and Recurrent Networks with Sigmoidal Nonlinearities

    Linear dynamical models have served as an analytically tractable approximation for a variety of natural and engineered systems. Recently, such models have been used to describe high-level diffusive interactions in the activation of complex networks, including those in the brain. In this regard, classical tools from control theory, including controllability analysis, have been used to assay the extent to which such networks might respond to their afferent inputs. However, for natural systems such as brain networks, it is not clear whether advantageous control properties necessarily correspond to useful functionality. That is, are systems that are highly controllable (according to certain metrics) also ones that are suited to computational goals such as representing, preserving and categorizing stimuli? This dissertation will introduce analysis methods that link the systems-theoretic properties of linear systems with informational measures that describe these functional characterizations. First, we assess sensitivity of a linear system to input orientation and novelty by deriving a measure of how networks translate input orientation differences into readable state trajectories. Next, we explore the implications of this novelty-sensitivity for endpoint-based input discrimination, wherein stimuli are decoded in terms of their induced representation in the state space. We develop a theoretical framework for the exploration of how networks utilize excess input energy to enhance orientation sensitivity (and thus enhanced discrimination ability). Next, we conduct a theoretical study to reveal how the background or default state of a network with linear dynamics allows it to best promote discrimination over a continuum of stimuli. Specifically, we derive a measure, based on the classical notion of a Fisher discriminant, quantifying the extent to which the state of a network encodes information about its afferent inputs. 
    This measure provides an information value quantifying the knowability of an input based on its projection onto the background state. We subsequently optimize this background state, and characterize both the optimal background and the inputs that give rise to it. Finally, we extend this information-based network analysis to include networks with nonlinear dynamics, specifically ones involving sigmoidal saturating functions. We employ a quasilinear approximation technique, novel here in terms of its multidimensionality and specific application, to approximate the nonlinear dynamics by scaling a corresponding linear system and biasing by an offset term. A Fisher information-based metric is derived for the quasilinear system, with analytical and numerical results showing that Fisher information is better for the quasilinear (hence sigmoidal) system than for an unconstrained linear system. Interestingly, this relation reverses when the noise is placed outside the sigmoid in the model, supporting conclusions extant in the literature that the relative alignment of the state and noise covariance is predictive of Fisher information. We show that there exists a clear trade-off between informational advantage, as conferred by the presence of sigmoidal nonlinearities, and speed of dynamics.
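
For readers unfamiliar with the classical Fisher discriminant that the abstract builds on, here is a minimal one-dimensional sketch. It is the textbook criterion (squared separation of class means over summed within-class variance), not the network-state measure derived in the dissertation; the function name and the sample values are assumptions made for illustration.

```python
from statistics import mean, pvariance


def fisher_discriminant_1d(class_a, class_b):
    """Textbook 1-D Fisher criterion for two sets of scalar responses.

    Hypothetical helper: squared distance between the class means divided by
    the sum of the (population) within-class variances.
    """
    between = (mean(class_a) - mean(class_b)) ** 2
    within = pvariance(class_a) + pvariance(class_b)
    if within == 0:
        raise ValueError("within-class variance is zero")
    return between / within


# Made-up scalar readouts for two stimuli.
responses_a = [1.0, 2.0, 3.0]
responses_b = [5.0, 6.0, 7.0]
score = fisher_discriminant_1d(responses_a, responses_b)
```

A larger score means the two response sets are easier to tell apart, which is the sense in which such a quantity can describe how well a state encodes its inputs.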

    Algorithm and architecture for simultaneous diagonalization of matrices applied to subspace-based speech enhancement

    This thesis presents an algorithm and architecture for the simultaneous diagonalization of matrices. As an example, a subspace-based speech enhancement problem is considered, in which the covariance matrices of the speech and noise are diagonalized simultaneously. To assess the system performance of the proposed algorithm, objective measurements of speech enhancement are reported in terms of signal-to-noise ratio and mean Bark spectral distortion at various noise levels. In addition, an innovative subband analysis technique for subspace-based time-domain constrained speech enhancement is proposed. The proposed technique analyses the signal in its subbands to build accurate estimates of the covariance matrices of speech and noise, exploiting the inherently slowly varying characteristics of speech and noise signals in narrow bands. The subband approach also decreases the computation time by reducing the order of the matrices to be simultaneously diagonalized. Simulation results indicate that the proposed technique performs well under extremely low signal-to-noise-ratio conditions. Further, an architecture is proposed to implement the simultaneous diagonalization scheme. The architecture is implemented on an FPGA, primarily to compare the performance measures on hardware and to assess the feasibility of the speech enhancement algorithm in terms of resource utilization, throughput, etc. A Xilinx FPGA is targeted for implementation. FPGA resource utilization reinforces the practicability of the design. A projection of the design's feasibility for an ASIC implementation, in terms of transistor count only, is also included.
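
As a rough illustration of what simultaneously diagonalizing a speech and a noise covariance matrix means, the sketch below uses the standard whitening approach on 2x2 matrices: factor the noise covariance by Cholesky, whiten the speech covariance, then rotate it to diagonal form. This is not the thesis's algorithm or hardware architecture; the helper names and the example matrices are assumptions made for illustration.

```python
import math


def matmul(a, b):
    """Multiply two small matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def simultaneous_diagonalize_2x2(r_speech, r_noise):
    """Return V such that V^T r_noise V = I and V^T r_speech V is diagonal.

    Hypothetical helper; assumes both inputs are symmetric 2x2 matrices and
    r_noise is positive definite.
    """
    a, b, d = r_noise[0][0], r_noise[0][1], r_noise[1][1]
    # Cholesky factor of the noise covariance: r_noise = L L^T.
    l11 = math.sqrt(a)
    l21 = b / l11
    l22 = math.sqrt(d - l21 * l21)
    l_inv = [[1.0 / l11, 0.0], [-l21 / (l11 * l22), 1.0 / l22]]
    # Whitened speech covariance C = L^-1 r_speech L^-T, still symmetric.
    c = matmul(matmul(l_inv, r_speech), transpose(l_inv))
    # A single Jacobi rotation diagonalizes a symmetric 2x2 matrix.
    theta = 0.5 * math.atan2(2.0 * c[0][1], c[0][0] - c[1][1])
    u = [[math.cos(theta), -math.sin(theta)],
         [math.sin(theta), math.cos(theta)]]
    return matmul(transpose(l_inv), u)


# Made-up covariance matrices for illustration.
r_speech = [[4.0, 1.0], [1.0, 3.0]]
r_noise = [[2.0, 0.5], [0.5, 1.0]]
v = simultaneous_diagonalize_2x2(r_speech, r_noise)
noise_diag = matmul(matmul(transpose(v), r_noise), v)    # close to identity
speech_diag = matmul(matmul(transpose(v), r_speech), v)  # close to diagonal
```

The diagonal entries of `speech_diag` are the generalized eigenvalues of the matrix pair. For larger matrices the same steps apply, with an iterative symmetric eigensolver in place of the single rotation.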

    Traitement paramétrique des signaux audio dans le contexte des prothèses auditives

    Moving-average model -- Autoregressive model -- Autoregressive moving-average model -- Remark on the link between AR, MA and ARMA -- Estimating the parameters of an AR(p) process -- Criteria for selecting the order of an AR(p) model -- Notion of spectral envelope -- Methods developed in the frequency domain -- Methods developed in the correlation domain -- Noise reduction in the frequency domain -- A two-microphone algorithm for speech enhancement -- State of the art -- Zelinski's approach in the case of two-microphone arrangement -- Two-microphone speech enhancement system -- Performance evaluation and results -- Noise reduction in the correlation domain -- Noise power estimation -- Compensating for the effects of noise -- Improving the compensation procedure -- Prospects for development -- Parametric processing in the presence of noise -- Layout of the combined processing -- Improving the accuracy of the noise variance estimator

    Enhancement of speech signals - with a focus on voiced speech models
