
    Improved Convolutive and Under-Determined Blind Audio Source Separation with MRF Smoothing

    Convolutive and under-determined blind audio source separation from noisy recordings is a challenging problem, and several computational strategies have been proposed to address it. This study is concerned with several modifications to the expectation-maximization-based algorithm, which iteratively estimates the mixing and source parameters. That strategy models each entry of a source spectrogram as superimposed Gaussian components that are mutually and individually independent across frequency and time bins, an assumption that ignores the local correlation structure of real audio spectrograms. In our approach, we resolve this issue by considering a locally smooth temporal and frequency structure in the source power spectrograms. Local smoothness is enforced by incorporating a Gibbs prior into the complete-data likelihood function, which models the interactions between neighboring spectrogram bins using a Markov random field (MRF). Simulations using audio files from the Stereo Audio Source Separation Evaluation Campaign 2008 demonstrate the high efficiency of the proposed improvement.
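    To make the role of the MRF prior concrete, the following Python sketch applies the kind of local smoothing such a Gibbs prior induces on a source power spectrogram. It is an illustration under stated assumptions, not the paper's EM algorithm: the function name, the 4-neighbour structure, and the smoothing weight beta are chosen here for exposition.

        import numpy as np

        def mrf_smooth_power(P, beta=0.5, n_iter=20):
            # Illustrative MAP-style smoothing of a power spectrogram P
            # (freq x time) under a Gaussian MRF prior coupling each bin
            # with its four temporal/spectral neighbours (assumption:
            # 4-neighbourhood; the paper's exact potentials may differ).
            S = np.log(P + 1e-12)              # work in the log-power domain
            X = S.copy()
            for _ in range(n_iter):
                up    = np.vstack([X[:1, :],  X[:-1, :]])   # edge-replicated shifts
                down  = np.vstack([X[1:, :],  X[-1:, :]])
                left  = np.hstack([X[:, :1],  X[:, :-1]])
                right = np.hstack([X[:, 1:],  X[:, -1:]])
                neigh = (up + down + left + right) / 4.0
                # fixed-point update balancing the data term against the
                # Gibbs smoothness term
                X = (S + beta * neigh) / (1.0 + beta)
            return np.exp(X)

    Inside an EM loop, an update of this kind would replace the unconstrained estimate of each source power spectrogram, pulling every bin toward its neighbours.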

    Estimating number of speakers via density-based clustering and classification decision

    It is crucial to robustly estimate the number of speakers (NoS) from recorded audio mixtures in a reverberant environment. Some popular time-frequency (TF) methods approach the NoS estimation problem by assuming that only one speech component is active at each TF slot. However, this condition is violated in many scenarios where the speech signals are convolved with long room impulse responses, which degrades NoS estimation performance. To tackle this problem, a density-based clustering strategy is proposed to estimate the NoS based on a local dominance assumption for the speech signals. Our method consists of several steps, from clustering to a classification decision on the speakers, with robustness taken into account. First, the leading eigenvectors are extracted from the local covariance matrices of the mixture's TF components and ranked by a combination of local density and minimum distance to other leading eigenvectors of higher density. Second, a gap-based method is employed to determine the cluster centers from the ranked leading eigenvectors at each frequency bin. Third, a criterion based on the averaged volume of cluster centers is proposed to select reliable clustering results at certain frequency bins for the NoS classification decision. Experimental results demonstrate that the proposed algorithm is superior to existing methods in various reverberation cases, under both noise-free and noisy conditions.
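    The ranking step in the first stage follows the familiar density-peaks idea: score each point by its local density times its distance to the nearest higher-density point. A minimal Python sketch under assumed parameters (the cutoff distance d_c and the Euclidean metric are illustrative choices, not the paper's exact settings):

        import numpy as np

        def density_peak_scores(V, d_c=0.3):
            # V: (n x d) array of leading eigenvectors at one frequency bin.
            # rho[i]   = number of other points within the cutoff d_c
            # delta[i] = minimum distance to any point of higher density
            # gamma    = rho * delta; large gamma marks a likely cluster centre.
            D = np.linalg.norm(V[:, None, :] - V[None, :, :], axis=-1)
            rho = (D < d_c).sum(axis=1) - 1
            delta = np.empty(len(V))
            for i in range(len(V)):
                higher = np.where(rho > rho[i])[0]
                delta[i] = D[i, higher].min() if len(higher) else D[i].max()
            return rho * delta

    A gap in the sorted gamma values then suggests how many cluster centres, and hence how many locally dominant speakers, are present at that bin.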

    Underdetermined convolutive blind source separation via time-frequency masking

    In this paper, we consider the problem of separating an unknown number of sources from their underdetermined convolutive mixtures via time-frequency (TF) masking. We propose two algorithms: one for estimating the masks that are applied to the mixture in the TF domain to separate the signals in the frequency domain, and the other for solving the permutation problem. The mask estimation algorithm is based on the concept of angles in complex vector space. Unlike previously reported methods, it does not require any estimate of the mixing matrix or of the source positions. The algorithm clusters the mixture samples in the TF domain based on the Hermitian angle between each sample vector and a reference vector, using the well-known k-means or fuzzy c-means clustering algorithms; the membership functions obtained from the clustering are used directly as the masks. The algorithm for solving the permutation problem applies k-means clustering to small, overlapping groups of nearby estimated masks. The effectiveness of the algorithm in separating sources, including collinear sources, from their underdetermined convolutive mixtures recorded in a real room environment is demonstrated.
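    The Hermitian angle used as the clustering feature has a simple closed form, cos(theta_H) = |u^H v| / (||u|| ||v||), which is invariant to the arbitrary phase of each TF sample vector. Below is a minimal sketch of the mask estimation step; the all-ones reference vector and the hard k-means memberships are assumptions made here for illustration (the paper also allows fuzzy c-means, whose soft memberships serve directly as masks).

        import numpy as np
        from sklearn.cluster import KMeans

        def hermitian_angle_masks(X, n_src, ref=None):
            # X: (M x F x T) complex STFT of an M-channel mixture.
            M, F, T = X.shape
            if ref is None:
                ref = np.ones(M) / np.sqrt(M)          # assumed reference vector
            V = X.reshape(M, -1)                       # one column per TF slot
            num = np.abs(ref.conj() @ V)               # |ref^H x|
            den = np.linalg.norm(V, axis=0) + 1e-12
            theta = np.arccos(np.clip(num / den, 0.0, 1.0))   # Hermitian angle
            labels = KMeans(n_clusters=n_src, n_init=10).fit_predict(theta[:, None])
            masks = np.stack([(labels == k).astype(float) for k in range(n_src)])
            return masks.reshape(n_src, F, T)

    Applying each mask to one mixture channel and inverting the STFT yields the separated source estimates, up to the frequency permutation that the second algorithm resolves.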

    Réduction du bruit ambiant dans les sons acoustiques respiratoires (Reduction of ambient noise in respiratory acoustic sounds)

    ABSTRACT: Ambient noise in respiratory sounds causes many problems for physicians when auscultating patients with respiratory diseases. Under such ambient noise conditions, it is difficult to make an effective and accurate diagnosis of respiratory diseases in hospital patients, and physicians with less experience in listening to respiratory sounds will have particular difficulty detecting respiratory disease when ambient noise interferes with listening. Our research goal is to find a robust and efficient algorithm that reduces the ambient noise in the respiratory acoustic sounds of hospitalized patients. The selected algorithms should reduce the ambient noise as much as possible while preserving the quality of the patient's breath sounds. Based on the acoustic similarities between breath sounds and speech sounds, three methods from the field of speech enhancement were used in our project: adaptive filtering (AF), spectral subtraction (SS), and blind source separation (BSS). Test data obtained from additive mixing, convolutive mixing, and real recordings were used to evaluate the performance of these techniques. The algorithms were evaluated both subjectively, through a listening test using the mean opinion score (MOS), and objectively, using the signal-to-noise ratio (SNR) before and after filtering, the cross-correlation (CC), the normalised mean square error (NMSE), and the signal-to-interference ratio (SIR). In the listening test, participants preferred the adaptive filtering (AF) method for reducing additive noise in respiratory acoustic sounds; for convolutive mixtures and real recordings, they preferred the blind source separation (BSS) method. -- Keywords: respiratory acoustic sounds, ambient noise, noise reduction, adaptive filtering, spectral subtraction, blind source separation.
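    Of the three evaluated families, spectral subtraction is the simplest to state compactly. The following Python sketch shows a textbook variant of the kind compared in this work; the frame count used for the noise estimate, the over-subtraction factor alpha, and the spectral floor are illustrative parameters, not the thesis settings.

        import numpy as np
        from scipy.signal import stft, istft

        def spectral_subtraction(x, fs, noise_frames=10, alpha=2.0, floor=0.02):
            # Estimate the noise magnitude spectrum from the first few
            # frames (assumed noise-only), over-subtract it, floor the
            # result, and resynthesise with the noisy phase.
            f, t, X = stft(x, fs=fs, nperseg=512)
            mag, phase = np.abs(X), np.angle(X)
            noise = mag[:, :noise_frames].mean(axis=1, keepdims=True)
            clean = np.maximum(mag - alpha * noise, floor * mag)
            _, y = istft(clean * np.exp(1j * phase), fs=fs, nperseg=512)
            return y

    Classical adaptive noise cancellation, by contrast, relies on a secondary reference signal correlated with the ambient noise, which is one reason the two methods behave differently across mixing conditions.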

    Single channel overlapped-speech detection and separation of spontaneous conversations

    PhD Thesis. In this thesis, spontaneous conversation containing both speech mixture and speech dialogue is considered. Speech mixture refers to speakers speaking simultaneously (i.e., overlapped speech); speech dialogue refers to segments in which only one speaker is actively speaking while the other is silent. The input conversation is first processed by overlapped-speech detection, which segregates it into dialogue and mixture segments. The dialogue is processed by speaker diarization, whose outputs are the individual speech of each speaker. The mixture is processed by speech separation, whose outputs are the separated speech signals of the individual speakers. When the separation input contains only the mixture, a blind speech separation approach is used; when the separation is assisted by the outputs of the speaker diarization, informed speech separation is used. The research presents a novel overlapped-speech detection algorithm and two novel speech separation algorithms. The proposed overlapped-speech detection algorithm estimates the switching instants of the input. An optimization loop, based on pattern-recognition principles and k-means clustering, is adapted to retain the best of the encapsulated audio features and to discard the worst. Over 300 simulated conversations, the average false-alarm error is 1.9%, the missed-speech error is 0.4%, and the overlapped-speaker error is 1%; these errors are approximately equal to those reported by the best recent speaker diarization systems on reliable corpora. The proposed blind speech separation algorithm consists of four sequential techniques: filter-bank analysis, non-negative matrix factorization (NMF), speaker clustering, and filter-bank synthesis. Instead of the usually required speaker segmentation, an effective standard framing scheme is contributed. The average objective measures (SAR, SDR, and SIR) over 51 simulated conversations are 5.06 dB, 4.87 dB, and 12.47 dB, respectively. For the proposed informed speech separation algorithm, the outputs of the speaker diarization form a generated database. The database assists the speech separation by creating virtual targeted-speech and mixture signals; these contributed virtual signals are trained to facilitate the separation by homogenising them with the NMF-matrix elements of the real mixture, and a contributed masking step optimizes the resulting speech. The average SAR, SDR, and SIR over 341 simulated conversations are 9.55 dB, 1.12 dB, and 2.97 dB, respectively. Per these objective tests, the two speech separation algorithms are in the mid-range of the well-known NMF-based audio and speech separation methods.
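    The blind separation chain (filter-bank analysis, NMF, speaker clustering, filter-bank synthesis) can be sketched at the level of its middle two stages. The Python fragment below is a generic NMF-plus-clustering illustration under assumed sizes (number of NMF components, two speakers), not the thesis implementation.

        import numpy as np
        from sklearn.decomposition import NMF
        from sklearn.cluster import KMeans

        def nmf_speaker_masks(S, n_comp=20, n_spk=2):
            # S: magnitude spectrogram (F x T) from the analysis filter bank.
            # Factorise S ~ W @ H, group the spectral bases in W into
            # speakers by k-means, and build a soft mask per speaker.
            model = NMF(n_components=n_comp, init='nndsvda', max_iter=400)
            W = model.fit_transform(S)            # spectral bases (F x K)
            H = model.components_                 # activations   (K x T)
            Wn = W / (np.linalg.norm(W, axis=0, keepdims=True) + 1e-12)
            labels = KMeans(n_clusters=n_spk, n_init=10).fit_predict(Wn.T)
            approx = [W[:, labels == k] @ H[labels == k] for k in range(n_spk)]
            total = np.sum(approx, axis=0) + 1e-12
            return [a / total for a in approx]    # soft masks per speaker

    Multiplying each mask by the mixture spectrogram and passing the result through the synthesis filter bank gives the per-speaker estimates; the informed variant additionally guides the factorization with the diarization-derived virtual signals.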