134 research outputs found

    Guided Matching Pursuit and its Application to Sound Source Separation

    Over the last couple of decades there has been increasing interest in applying source separation technologies to musical signal processing. Given a signal that consists of a mixture of musical sources, source separation aims to extract and/or isolate the signals that correspond to the original sources. A system capable of high-quality source separation could be an invaluable tool for the sound engineer as well as the end user. Applications of source separation include, but are not limited to, remixing, up-mixing, spatial re-configuration, individual source modification (such as filtering, pitch detection/correction and time stretching), music transcription, voice recognition and source-specific audio coding. Of particular interest is the problem of separating sources from a mixture comprising two channels (2.0 format), since this is still the most commonly used format in the music industry and in most domestic listening environments. When the number of sources is greater than the number of mixtures (which is usually the case with stereophonic recordings), the source separation problem becomes under-determined and traditional source separation techniques, such as “Independent Component Analysis” (ICA), cannot be successfully applied. In such cases a family of techniques known as “Sparse Component Analysis” (SCA) is better suited. In short, a mixture signal is decomposed into a new domain where the individual sources are sparsely represented, which implies that their corresponding coefficients will have disjoint (or almost disjoint) supports. Taking advantage of this property, along with the spatial information within the mixture and any other prior information that may be available, it is possible to identify the sources in the new domain and separate them by going back to the time domain. Sparser representations lead to higher-quality separation. Nevertheless, the most commonly used front-end for an SCA system is the ubiquitous short-time Fourier transform (STFT) which, although a sparsifying transform, is not the best choice for this job. A better alternative is the matching pursuit (MP) decomposition. MP is an iterative algorithm that decomposes a signal into a set of elementary waveforms called atoms, chosen from an over-complete dictionary in such a way that they represent the inherent signal structures. A crucial part of MP is the creation of the dictionary, which directly affects the results of the decomposition and subsequently the quality of source separation. Selecting an appropriate dictionary can be a difficult task, and an adaptive approach would be appropriate. This work proposes a new MP variant termed guided matching pursuit (GMP), which adds a new pre-processing step to the main sequence of the MP algorithm. The purpose of this step is to analyse the signal and extract important features, termed guide maps, that are used to create dynamic mini-dictionaries comprising atoms which are expected to correlate well with the underlying signal structures, thus leading to focused and more efficient searches around particular supports of the signal. The algorithm is accompanied by a modular and highly flexible MATLAB implementation suited to the processing of long-duration audio signals. Finally, the new algorithm is applied to the source separation of two-channel linear instantaneous mixtures, and preliminary testing demonstrates that the performance of GMP is on par with that of state-of-the-art systems.
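
The plain MP iteration described in the abstract above can be illustrated with a minimal sketch. This is the textbook greedy algorithm only, not the guided variant (GMP) and not the thesis's MATLAB implementation; the function names, the toy six-atom dictionary and the stopping tolerance are all assumptions made for illustration.

```python
import math

def normalize(atom):
    """Scale an atom to unit Euclidean norm."""
    norm = math.sqrt(sum(a * a for a in atom))
    return [a / norm for a in atom]

def matching_pursuit(signal, dictionary, n_iter=10, tol=1e-9):
    """Greedy MP over unit-norm atoms.

    Returns (decomposition, residual), where decomposition is a list of
    (atom_index, coefficient) pairs in the order they were selected.
    """
    residual = list(signal)
    decomposition = []
    for _ in range(n_iter):
        # Correlate the current residual with every atom.
        correlations = [sum(r * a for r, a in zip(residual, atom))
                        for atom in dictionary]
        # Pick the atom with the largest absolute correlation.
        best = max(range(len(dictionary)), key=lambda k: abs(correlations[k]))
        coeff = correlations[best]
        if abs(coeff) < tol:
            break  # nothing left that the dictionary can explain
        decomposition.append((best, coeff))
        # Subtract the selected atom's contribution from the residual.
        residual = [r - coeff * a for r, a in zip(residual, dictionary[best])]
    return decomposition, residual

# Hypothetical over-complete dictionary: 6 atoms in a 4-dimensional space.
dictionary = [normalize(a) for a in [
    [1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1],
    [1, 1, 0, 0], [0, 0, 1, -1],
]]

# Toy signal built from two atoms, so a sparse decomposition exists.
signal = [3.0 * a + (-2.0) * b for a, b in zip(dictionary[4], dictionary[5])]

decomposition, residual = matching_pursuit(signal, dictionary)
print(decomposition)
```

Because the dictionary has more atoms than signal dimensions, the signal could also be written with four spike atoms; the greedy search instead recovers the two-atom representation, which is the sparsity that the separation stage relies on.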

    Microphone Array Processing Using Bayesian Methods

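    Kyoto University (0048). New-system course doctorate, Doctor of Informatics. Degree no. Kō 18412; Informatics doctorate no. 527. Call number 新制||情||93 (University Library), 31270. Department of Intelligence Science and Technology, Graduate School of Informatics, Kyoto University. Examiners: Professor Hiroshi Okuno (chief examiner), Professor Tatsuya Kawahara, Associate Professor Marco Cuturi Cameto, Lecturer Kazuyoshi Yoshii. Conferred under Article 4, Paragraph 1 of the Degree Regulations. Doctor of Informatics, Kyoto University. DFA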

    Single channel blind source separation

    Single channel blind source separation (SCBSS) is an intensively researched field with numerous important applications. This research investigates the separation of monaural mixed audio recordings without relying on training knowledge. It proposes a novel method based on variable regularised sparse nonnegative matrix factorization, which decomposes an information-bearing matrix into a two-dimensional convolution of factor matrices that represent the spectral basis and temporal code of the sources. A variational Bayesian approach has been developed for computing the sparsity parameters of the matrix factorization. To improve on this work, the research then proposes a new method based on decomposing the mixture into a series of oscillatory components termed intrinsic mode functions (IMFs). It is shown that IMFs have several desirable properties unique to the SCBSS problem and how these properties can be exploited to relax the constraints posed by the problem. In addition, this research develops a novel method for feature extraction using a psycho-acoustic model. The monaural mixed signal is transformed to a cochleagram using the gammatone filterbank, whose bandwidths increase as the center frequency increases, thus resulting in non-uniform time-frequency (TF) resolution in the analysis of the audio signal. Within this domain, a family of novel two-dimensional matrix factorizations based on the Itakura-Saito (IS) divergence has been developed. The proposed matrix factorizations are scale invariant, which enables lower-energy components in the cochleagram to be treated with equal importance as the high-energy ones. Results show that all the algorithms developed in this thesis outperform conventional methods. (EThOS - Electronic Theses Online Service, GB)
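
The scale-invariance property mentioned in the abstract above can be checked numerically. The sketch below uses the standard scalar definition of the Itakura-Saito divergence, d(x|y) = x/y - log(x/y) - 1, and compares it with squared error; the function names and sample values are assumptions for illustration and are not taken from the thesis.

```python
import math

def is_divergence(x, y):
    """Itakura-Saito divergence between two positive scalars."""
    ratio = x / y
    return ratio - math.log(ratio) - 1.0

def squared_error(x, y):
    """Squared Euclidean error, shown for contrast."""
    return (x - y) ** 2

# The same 25% mismatch at a high-energy and a low-energy bin.
loud = is_divergence(100.0, 80.0)
quiet = is_divergence(0.01, 0.008)

# The IS cost depends only on the ratio x/y, so both bins cost the same.
print(loud, quiet)

# Squared error is dominated by the high-energy bin.
print(squared_error(100.0, 80.0), squared_error(0.01, 0.008))
```

Under squared error the low-energy bin contributes almost nothing to the total cost, which is why a factorization driven by the IS divergence treats weak and strong components more evenly.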

    Efficient Blind Source Separation Algorithms with Applications in Speech and Biomedical Signal Processing

    Blind source separation/extraction (BSS/BSE) is a powerful signal processing method that has been applied extensively in many fields, such as biomedical sciences and speech signal processing, to extract a set of unknown input sources from a set of observations. Various BSS algorithms have been proposed in the literature, but they need further investigation with respect to the extraction approach, computational complexity, convergence speed, type of domain (time or frequency), mixture properties, and extraction performance. This work presents three new BSS/BSE algorithms based on computing new transformation matrices that are used to extract the unknown signals. The types of signals considered in this dissertation are speech, Gaussian, and ECG signals. The first algorithm, named the BSE-parallel linear predictor filter (BSE-PLP), computes a transformation matrix from the covariance matrix of the whitened data. The matrix is then used as an input to linear predictor filters whose coefficients are the unknown sources. The algorithm converges very quickly, within two iterations. Simulation results using speech, Gaussian, and ECG signals show that the model is capable of extracting the unknown source signals and removing noise when the input signal-to-noise ratio is varied from -20 dB to 80 dB. The second algorithm, named the BSE-idempotent transformation matrix (BSE-ITM), computes its transformation matrix in iterative form, with less computational complexity. The proposed method is tested using speech, Gaussian, and ECG signals. Simulation results show that the proposed algorithm separates the source signals effectively, with better performance measures than the other approaches used in the dissertation. The third algorithm, named the null space idempotent transformation matrix (NSITM), has been designed using the principle of the null space of the ITM to separate the unknown sources. Simulation results show that the method successfully separates speech, Gaussian, and ECG signals from their mixture. The algorithm has also been used to estimate the average FECG heart rate. Results indicate considerable improvement in estimating the peaks over the other algorithms used in this work.

    Design of large polyphase filters in the Quadratic Residue Number System

    Temperature aware power optimization for multicore floating-point units

    An investigation of the utility of monaural sound source separation via nonnegative matrix factorization applied to acoustic echo and reverberation mitigation for hands-free telephony

    In this thesis we investigate the applicability and utility of Monaural Sound Source Separation (MSSS) via Nonnegative Matrix Factorization (NMF) for various problems related to audio for hands-free telephony. We first investigate MSSS via NMF as an alternative acoustic echo reduction approach to existing approaches such as Acoustic Echo Cancellation (AEC). To this end, we present the single-channel acoustic echo problem as an MSSS problem, in which the objective is to extract the user's signal from a mixture also containing acoustic echo and noise. To perform separation, NMF is used to decompose the near-end microphone signal onto the union of two nonnegative bases in the magnitude Short Time Fourier Transform (STFT) domain. One of these bases is for the spectral energy of the acoustic echo signal, and is formed from the incoming far-end user's speech, while the other basis is for the spectral energy of the near-end speaker, and is trained with speech data a priori. In comparison to AEC, the speaker extraction approach obviates Double-Talk Detection (DTD), and is demonstrated to attain its maximal echo mitigation performance immediately upon initiation and to maintain that performance during and after room changes for similar computational requirements. Speaker extraction is also shown to introduce distortion of the near-end speech signal during double-talk, which is quantified by means of a speech distortion measure and compared to that of AEC. Subsequently, we address DTD for block-based AEC algorithms. We propose a novel block-based DTD algorithm that uses the available signals and the estimate of the echo signal produced by NMF-based speaker extraction to compute a suitably normalized correlation-based decision variable, which is compared to a fixed threshold to decide on double-talk. Using a standard evaluation technique, the proposed algorithm is shown to have comparable detection performance to an existing conventional block-based DTD algorithm. It is also demonstrated to inherit the room change insensitivity of speaker extraction, with the proposed DTD algorithm generating minimal false double-talk indications upon initiation and in response to room changes in comparison to the existing conventional DTD. We also show that this property allows its paired AEC to converge at a rate close to the optimum. Another focus of this thesis is the problem of inverting a single measurement of a non-minimum phase Room Impulse Response (RIR). We describe the process by which perceptually detrimental all-pass phase distortion arises in reverberant speech filtered by the inverse of the minimum phase component of the RIR; in short, such distortion arises from inverting the magnitude response of the high-Q maximum phase zeros of the RIR. We then propose two novel partial inversion schemes that precisely mitigate this distortion. One of these schemes employs NMF-based MSSS to separate the all-pass phase distortion from the target speech in the magnitude STFT domain, while the other approach modifies the inverse minimum phase filter such that the magnitude response of the maximum phase zeros of the RIR is not fully compensated. Subjective listening tests reveal that the proposed schemes generally produce better quality output speech than a comparable inversion technique.
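
The decomposition onto a union of two fixed nonnegative bases, as described in the abstract above, can be sketched as follows. This is a toy illustration under stated assumptions, not the thesis's algorithm: it uses the standard multiplicative update for the Euclidean cost (the abstract does not say which cost function is used), tiny hand-made matrices in place of magnitude STFT data, and hypothetical names (`fit_activations`, `separate`, `W_echo`, `W_speaker`).

```python
def transpose(A):
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def fit_activations(V, W, n_iter=200, eps=1e-12):
    """Estimate H >= 0 such that V ~= W H, keeping the basis W fixed.

    Uses the multiplicative update H <- H * (W^T V) / (W^T W H),
    which keeps H nonnegative when V and W are nonnegative.
    """
    n_components, n_frames = len(W[0]), len(V[0])
    H = [[1.0] * n_frames for _ in range(n_components)]
    Wt = transpose(W)
    WtV = matmul(Wt, V)
    WtW = matmul(Wt, W)
    for _ in range(n_iter):
        WtWH = matmul(WtW, H)
        H = [[h * num / (den + eps) for h, num, den in zip(h_row, n_row, d_row)]
             for h_row, n_row, d_row in zip(H, WtV, WtWH)]
    return H

def separate(V, W_echo, W_speaker, n_iter=200):
    """Split V into echo and speaker estimates using the two fixed bases."""
    # Union of the two bases: concatenate their columns.
    W = [row_e + row_s for row_e, row_s in zip(W_echo, W_speaker)]
    H = fit_activations(V, W, n_iter)
    k = len(W_echo[0])
    echo = matmul(W_echo, H[:k])
    speaker = matmul(W_speaker, H[k:])
    return echo, speaker, H

# Toy data: 3 frequency bins, 2 time frames, one basis vector per source.
W_echo = [[1.0], [1.0], [0.0]]
W_speaker = [[0.0], [1.0], [1.0]]
V = [[2.0, 1.0], [3.0, 4.0], [1.0, 3.0]]  # mixture "spectrogram"

echo, speaker, H = separate(V, W_echo, W_speaker)
print(H)
print(speaker)
```

In this toy case the mixture lies exactly in the span of the two bases, so the activations converge to the values that generated it; with real spectrograms the fit is approximate and the speaker estimate would normally be converted back to the time domain using the mixture phase.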

    Localization of brain signal sources using blind source separation

    Reliable localization of brain signal sources using convenient, easy, and hazard-free data acquisition techniques can potentially play a key role in the understanding, analysis, and tracking of brain activities for the determination of physiological, pathological, and functional abnormalities. The sources can be due to normal brain activities, mental disorders, stimulation of the brain, or movement-related tasks. The focus of this thesis is therefore the development of novel source localization techniques based upon EEG measurements. Independent component analysis is used for blind source separation (BSS) of the EEG sources to yield three different approaches to source localization. In the first method, the sources are localized over the scalp pattern using BSS in various subbands, and by investigating the number of components which are likely to be the true sources. In the second method, the sources are separated and their corresponding topographical information is used within a least-squares algorithm to localize the sources within the brain region. The locations of known sources, such as some normal brain rhythms, are also utilized to help in determining the unknown sources. The final approach is an effective BSS algorithm partially constrained by information related to the known sources. In addition, some investigations have been undertaken to incorporate into the processing methods the non-homogeneity of the head layers, in terms of the changes in electrical and magnetic characteristics, and the noise level. Experimental studies with real and synthetic data sets are undertaken using MATLAB, and the efficacy of each method is discussed.