10 research outputs found

    Time-frequency processing - Spectral properties

    Many audio signal processing algorithms do not operate on raw time-domain audio signals, but rather on time-frequency representations. A raw audio signal encodes the amplitude of a sound as a function of time. Its Fourier spectrum represents it as a function of frequency, but does not represent variations over time. A time-frequency representation presents the amplitude of a sound as a function of both time and frequency, and is able to jointly account for its temporal and spectral characteristics (Gröchenig, 2001). Time-frequency representations are appropriate for three reasons in our context. First, separation and enhancement often require modeling the structure of sound sources. Natural sound sources have a prominent structure both in time and frequency, which can be easily modeled in the time-frequency domain. Second, the sound sources are often mixed convolutively, and this convolutive mixing process can be approximated with simpler operations in the time-frequency domain. Third, natural sounds are more sparsely distributed and overlap less with each other in the time-frequency domain than in the time or frequency domain, which facilitates their separation. In this chapter we introduce the most common time-frequency representations used for source separation and speech enhancement. Section 2.1 describes the procedure for calculating a time-frequency representation and converting it back to the time domain, using the short-time Fourier transform (STFT) as an example. It also presents other common time-frequency representations and their relevance for separation and enhancement. Section 2.2 discusses the properties of sound sources in the time-frequency domain, including sparsity, disjointness, and more complex structures such as harmonicity. Section 2.3 explains how to achieve separation by time-varying filtering in the time-frequency domain.
We summarize the main concepts and provide links to other chapters and more advanced topics in Section 2.4.
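
The analysis and resynthesis procedure mentioned in this abstract (windowed frames, a Fourier transform per frame, overlap-add back to the time domain) can be illustrated with a minimal sketch. It is not taken from the chapter: the sine window, the frame length of 8, the hop of 4, and all function names are assumptions chosen to keep the example small, and the naive DFT stands in for an FFT.

```python
import cmath
import math

def sine_window(n):
    # Assumed window: its squares sum to 1 at 50% overlap.
    return [math.sin(math.pi * (i + 0.5) / n) for i in range(n)]

def dft(frame):
    n = len(frame)
    return [sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(spec):
    n = len(spec)
    return [sum(spec[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n
            for t in range(n)]

def stft(x, frame_len=8, hop=4):
    # One spectrum per windowed, overlapping frame.
    w = sine_window(frame_len)
    return [dft([w[i] * x[start + i] for i in range(frame_len)])
            for start in range(0, len(x) - frame_len + 1, hop)]

def istft(frames, frame_len=8, hop=4):
    # Weighted overlap-add, normalised by the accumulated squared window.
    w = sine_window(frame_len)
    length = hop * (len(frames) - 1) + frame_len
    out = [0.0] * length
    norm = [0.0] * length
    for m, spec in enumerate(frames):
        seg = idft(spec)
        for i in range(frame_len):
            out[m * hop + i] += w[i] * seg[i]
            norm[m * hop + i] += w[i] ** 2
    return [o / g if g > 1e-12 else 0.0 for o, g in zip(out, norm)]

# Toy signal: two sinusoids, 32 samples.
signal = [math.sin(2 * math.pi * 0.1 * t) + 0.5 * math.sin(2 * math.pi * 0.3 * t)
          for t in range(32)]
spec = stft(signal)
rebuilt = istft(spec)
max_err = max(abs(a - b) for a, b in zip(signal, rebuilt))
```

In this sketch `spec` is a list of frames, each holding one complex value per frequency bin. Modifying those values before calling `istft` would be the simplest form of the time-varying filtering the abstract refers to.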

    Online source separation in reverberant environments exploiting known speaker locations

    This thesis concerns blind source separation techniques using second order statistics and higher order statistics for reverberant environments. A focus of the thesis is algorithmic simplicity with a view to the algorithms being implemented in their online forms. The main challenge of blind source separation applications is to handle reverberant acoustic environments; a further complication is changes in the acoustic environment such as when human speakers physically move. A novel time-domain method which utilises a pair of finite impulse response filters is proposed. The method of principal angles, which exploits a singular value decomposition, is defined for their design. The pair of filters is implemented within a generalised sidelobe canceller structure, thus the method can be considered as a beamforming method which cancels one source. An adaptive filtering stage is then employed to recover the remaining source, by exploiting the output of the beamforming stage as a noise reference. A common approach to blind source separation is to use methods based on higher order statistics, such as independent component analysis. When dealing with realistic convolutive audio and speech mixtures, processing in the frequency domain at each frequency bin is required. This introduces the permutation problem, inherent in independent component analysis, across the frequency bins. Independent vector analysis directly addresses this issue by modeling the dependencies between frequency bins, namely making use of a source vector prior. An alternative source prior for real-time (online) natural gradient independent vector analysis is proposed. A Student's t probability density function is known to be more suited for speech sources, due to its heavier tails, and is incorporated into a real-time version of natural gradient independent vector analysis.
The final algorithm is realised as a real-time embedded application on a floating point Texas Instruments digital signal processor platform. Moving sources, along with reverberant environments, cause significant problems in realistic source separation systems as mixing filters become time variant. A method which employs the pair of cancellation filters is proposed to cancel one source, coupled with an online natural gradient independent vector analysis technique to improve average separation performance in the context of step-wise moving sources. This addresses 'dips' in performance when sources move. Results show the average convergence time of the performance parameters is improved. Online methods introduced in the thesis are tested using impulse responses measured in reverberant environments, demonstrating their robustness, and are shown to perform better than established methods in a variety of situations.
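
The idea of recovering one source by adaptive filtering against a noise reference, as described in this abstract, can be sketched with a plain least-mean-squares (LMS) canceller. The abstract does not say which adaptive algorithm the thesis uses, so LMS is an assumption here, as are the two-tap leakage path, the step size, the synthetic signals, and every name in the code.

```python
import math
import random

def lms_noise_canceller(primary, reference, num_taps=2, step=0.01):
    # Adapt an FIR filter so that the filtered reference matches the
    # interference in the primary channel; the error is the recovered source.
    weights = [0.0] * num_taps
    history = [0.0] * num_taps
    output = []
    for d, r in zip(primary, reference):
        history = [r] + history[:-1]          # most recent sample first
        noise_estimate = sum(w * x for w, x in zip(weights, history))
        error = d - noise_estimate
        weights = [w + step * error * x for w, x in zip(weights, history)]
        output.append(error)
    return output, weights

# Synthetic example (all values hypothetical).
rng = random.Random(0)
n_samples = 2000
reference = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
target = [0.5 * math.sin(2 * math.pi * 0.01 * n) for n in range(n_samples)]
leak = [0.8 * reference[n] - 0.3 * (reference[n - 1] if n > 0 else 0.0)
        for n in range(n_samples)]
primary = [t + l for t, l in zip(target, leak)]

recovered, weights = lms_noise_canceller(primary, reference)
```

After adaptation, `weights` should sit close to the assumed leakage path `[0.8, -0.3]`, and `recovered` should approximate `target`. In the thesis the reference would come from the beamforming stage rather than from a clean copy of the interferer.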

    Blind identification of possibly under-determined convolutive MIMO systems

    Blind identification of a Linear Time Invariant (LTI) Multiple-Input Multiple-Output (MIMO) system is of great importance in many applications, such as speech processing, multi-access communication, multi-sensor sonar/radar systems, and biomedical applications. The objective of blind identification for a MIMO system is to identify an unknown system, driven by Ni unobservable inputs, based on the No system outputs. We first present a novel blind approach for the identification of an over-determined (No ≥ Ni) MIMO system driven by white, mutually independent unobservable inputs. Samples of the system frequency response are obtained based on Parallel Factorization (PARAFAC) of three- or four-way tensors constructed respectively based on third- or fourth-order cross-spectra of the system outputs. We show that the information available in the higher-order spectra allows for the system response to be identified up to a constant scaling and permutation ambiguities and a linear phase ambiguity. Important features of the proposed approaches are that they do not require channel length information, need no phase unwrapping, and unlike the majority of existing methods, need no pre-whitening of the system outputs. While several methods have been proposed to blindly identify over-determined convolutive MIMO systems, very few results exist for the under-determined (No < Ni) case, all of which refer to systems that either have some special structure or special No, Ni values. We propose a novel approach for blind identification of under-determined convolutive MIMO systems of general dimensions. As long as min(No, Ni) ≥ 2, we can always find the appropriate order of statistics that guarantees identifiability of the system response within trivial ambiguities.
We provide the description of the class of identifiable MIMO systems for a certain order of statistics K, and an algorithm to reach the solution. Finally, we propose a novel approach for blind identification and symbol recovery of a distributed antenna system with multiple carrier-frequency offsets (CFO), arising due to mismatch between the oscillators of transmitters and receivers. The received base-band signal is over-sampled, and its polyphase components are used to formulate a virtual MIMO problem. By applying blind MIMO system estimation techniques, the system response is estimated and used to subsequently decouple the users and transform the multiple CFOs estimation problem into a set of independent single CFO estimation problems. Ph.D., Electrical Engineering -- Drexel University, 200

    Enhanced IVA for audio separation in highly reverberant environments

    Blind Audio Source Separation (BASS), inspired by the "cocktail-party problem", has been a leading research application for blind source separation (BSS). This thesis concerns the enhancement of frequency domain convolutive blind source separation (FDCBSS) techniques for audio separation in highly reverberant room environments. Independent component analysis (ICA) is a higher order statistics (HOS) approach commonly used in the BSS framework. When applied to audio FDCBSS, ICA based methods suffer from the permutation problem across the frequency bins of each source. Independent vector analysis (IVA) is an FD-BSS algorithm that theoretically solves the permutation problem by using a multivariate source prior, where the sources are considered to be random vectors. The algorithm allows independence between multivariate source signals, and retains dependency between the source signals within each source vector. The source prior adopted to model the nonlinear dependency structure within the source vectors is crucial to the separation performance of the IVA algorithm. The focus of this thesis is on improving the separation performance of the IVA algorithm in the application of BASS. An alternative multivariate Student's t distribution is proposed as the source prior for the batch IVA algorithm. A Student's t probability density function can better model certain frequency domain speech signals due to its tail dependency property. Then, the nonlinear score function, for the IVA, is derived from the proposed source prior. A novel energy driven mixed super Gaussian and Student's t source prior is proposed for the IVA and FastIVA algorithms. The Student's t distribution, in the mixed source prior, can model the high amplitude data points whereas the super Gaussian distribution can model the lower amplitude information in the speech signals. 
The ratio of both distributions can be adjusted according to the energy of the observed mixtures to adapt for different types of speech signals. A particular multivariate generalized Gaussian distribution is adopted as the source prior for the online IVA algorithm. The nonlinear score function derived from this proposed source prior contains fourth order relationships between different frequency bins, which provides a more informative and stronger dependency structure and thereby improves the separation performance. An adaptive learning scheme is developed to improve the performance of the online IVA algorithm. The scheme adjusts the learning rate as a function of proximity to the target solutions. The scheme is also accompanied by a novel switched source prior technique that takes the best performance properties of the super Gaussian source prior and the generalized Gaussian source prior as the algorithm converges. The methods and techniques proposed in this thesis are evaluated with real speech source signals in different simulated and real reverberant acoustic environments. A variety of measures are used within the evaluation criteria of the various algorithms. The experimental results demonstrate improved performance of the proposed methods and their robustness in a wide range of situations.
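
To make the role of the source prior more concrete, the sketch below compares two multivariate score functions of the kind this abstract discusses: one from a spherical super Gaussian prior and one from a multivariate Student's t prior. The formulas are the commonly quoted textbook forms, stated up to a constant scaling that depends on the derivative convention; they are not copied from the thesis. The degrees-of-freedom value `nu`, the regulariser `eps`, and the function names are assumptions.

```python
import math

def score_super_gaussian(s, eps=1e-12):
    # Prior p(s) proportional to exp(-sqrt(sum_k |s_k|^2)):
    # every bin is divided by the norm of the whole source vector.
    norm = math.sqrt(sum(abs(v) ** 2 for v in s) + eps)
    return [v / norm for v in s]

def score_student_t(s, nu=4.0):
    # Prior p(s) proportional to (1 + sum_k |s_k|^2 / nu)^(-(nu + K) / 2),
    # where K is the number of frequency bins in the source vector.
    k = len(s)
    energy = sum(abs(v) ** 2 for v in s)
    scale = (nu + k) / (nu + energy)
    return [scale * v for v in s]

# One source vector with two frequency bins (hypothetical values).
source_vector = [3 + 4j, 0j]
phi_sg = score_super_gaussian(source_vector)
phi_t = score_student_t(source_vector, nu=4.0)
```

Both functions scale each frequency bin by a factor that depends on the energy of the whole source vector, which is how the dependency across bins enters the update. In the Student's t form the factor falls off faster for high-energy vectors, consistent with the heavier tails the abstract mentions.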

    Enhanced independent vector analysis for speech separation in room environments

    The human brain has the ability to focus on a desired sound source in the presence of several active sound sources. Machine-based methods lag behind in mimicking this particular skill of human beings. In the domain of digital signal processing this problem is termed the cocktail party problem. This thesis thus aims to further the field of acoustic source separation in the frequency domain based on exploiting source independence. The main challenge in such frequency domain algorithms is the permutation problem. Independent vector analysis (IVA) is a frequency domain blind source separation algorithm which can theoretically obviate the permutation problem by preserving the dependency structure within each source vector whilst eliminating the dependency between the frequency bins of different source vectors. This thesis in particular focuses on improving the separation performance of IVA algorithms which are used for frequency domain acoustic source separation in real room environments. The source prior is crucial to the separation performance of the IVA algorithm as it is used to model the nonlinear dependency structure within the source vectors. An alternative multivariate Student's t distribution source prior is proposed for the IVA algorithm as it is known to be well suited for modelling certain speech signals due to its heavy tail nature. Therefore the nonlinear score function that is derived from the proposed Student's t source prior can better model the dependency structure within the frequency bins and thereby enhance the separation performance and the convergence speed of the IVA and the Fast version of the IVA (FastIVA) algorithms. A novel energy driven mixed Student's t and the original super Gaussian source prior is also proposed for the IVA algorithms.
As speech signals can be composed of many high and low amplitude data points, the Student's t distribution in the mixed source prior can account for the high amplitude data points whereas the original super Gaussian distribution can cater for the other information in the speech signals. Furthermore, the weight of both distributions in the mixed source prior can be adjusted according to the energy of the observed mixtures. Therefore the mixed source prior adapts to the measured signals and further enhances the performance of the IVA algorithm. A common approach within the IVA algorithm is to model different speech sources with an identical source prior; however, this does not account for the unique characteristics of each speech signal. Therefore dependency modelling for different speech sources can be improved by modelling different speech sources with different source priors. Hence, the Student's t mixture model (SMM) is introduced as a source prior for the IVA algorithm. This new source prior can adapt according to the nature of different speech signals and the parameters for the proposed SMM source prior are estimated by deriving an efficient expectation maximization (EM) algorithm. As a result of this study, a novel EM framework for the IVA algorithm with the SMM as a source prior is proposed which is capable of separating the sources in an efficient manner. The proposed algorithms are tested in various realistic reverberant room environments with real speech signals. All the experiments and evaluations demonstrate the robustness and enhanced separation performance of the proposed algorithms.

    Blind image deconvolution: nonstationary Bayesian approaches to restoring blurred photos

    High quality digital images have become pervasive in modern scientific and everyday life, in areas from photography to astronomy, CCTV, microscopy, and medical imaging. However, there are always limits to the quality of these images due to uncertainty and imprecision in the measurement systems. Modern signal processing methods offer the promise of overcoming some of these problems by postprocessing these blurred and noisy images. In this thesis, novel methods using nonstationary statistical models are developed for the removal of blurs from out of focus and other types of degraded photographic images. The work tackles the fundamental problem of blind image deconvolution (BID); its goal is to restore a sharp image from a blurred observation when the blur itself is completely unknown. This is a “doubly ill-posed” problem: extreme lack of information must be countered by strong prior constraints about sensible types of solution. In this work, the hierarchical Bayesian methodology is used as a robust and versatile framework to impart the required prior knowledge. The thesis is arranged in two parts. In the first part, the BID problem is reviewed, along with techniques and models for its solution. Observation models are developed, with an emphasis on photographic restoration, concluding with a discussion of how these are reduced to the common linear spatially-invariant (LSI) convolutional model. Classical methods for the solution of ill-posed problems are summarised to provide a foundation for the main theoretical ideas that will be used under the Bayesian framework. This is followed by an in-depth review and discussion of the various prior image and blur models appearing in the literature, and then their applications to solving the problem with both Bayesian and non-Bayesian techniques. The second part covers novel restoration methods, making use of the theory presented in Part I. Firstly, two new nonstationary image models are presented.
The first models local variance in the image, and the second extends this with locally adaptive noncausal autoregressive (AR) texture estimation and local mean components. These models allow for recovery of image details including edges and texture, whilst preserving smooth regions. Most existing methods do not model the boundary conditions correctly for deblurring of natural photographs, and a chapter is devoted to exploring Bayesian solutions to this topic. Due to the complexity of the models used and the problem itself, there are many challenges which must be overcome for tractable inference. Using the new models, three different inference strategies are investigated: firstly using the Bayesian maximum marginalised a posteriori (MMAP) method with deterministic optimisation; proceeding with the stochastic methods of variational Bayesian (VB) distribution approximation, and simulation of the posterior distribution using the Gibbs sampler. Of these, we find the Gibbs sampler to be the most effective way to deal with a variety of different types of unknown blurs. Along the way, details are given of the numerical strategies developed to give accurate results and to accelerate performance. Finally, the thesis demonstrates state of the art results in blind restoration of synthetic and real degraded images, such as recovering details in out of focus photographs.

    Glottal-synchronous speech processing

    Glottal-synchronous speech processing is a field of speech science where the pseudoperiodicity of voiced speech is exploited. Traditionally, speech processing involves segmenting and processing short speech frames of predefined length; this may fail to exploit the inherent periodic structure of voiced speech which glottal-synchronous speech frames have the potential to harness. Glottal-synchronous frames are often derived from the glottal closure instants (GCIs) and glottal opening instants (GOIs). The SIGMA algorithm was developed for the detection of GCIs and GOIs from the Electroglottograph signal with a measured accuracy of up to 99.59%. For GCI and GOI detection from speech signals, the YAGA algorithm provides a measured accuracy of up to 99.84%. Multichannel speech-based approaches are shown to be more robust to reverberation than single-channel algorithms. The GCIs are applied to real-world applications including speech dereverberation, where SNR is improved by up to 5 dB, and to prosodic manipulation where the importance of voicing detection in glottal-synchronous algorithms is demonstrated by subjective testing. The GCIs are further exploited in a new area of data-driven speech modelling, providing new insights into speech production and a set of tools to aid deployment into real-world applications. The technique is shown to be applicable in areas of speech coding, identification and artificial bandwidth extension of telephone speech.

    Blind channel identification/equalization with applications in wireless communications

    Ph.DDOCTOR OF PHILOSOPH

    Extraction et débruitage de signaux ECG du foetus.

    Congenital heart defects are the leading cause of birth defect-related deaths. The fetal electrocardiogram (fECG), which is believed to contain much more information than conventional sonographic methods, can be measured by placing electrodes on the mother's abdomen. However, it has very low power and is mixed with several sources of noise and interference, including the strong maternal ECG (mECG). In previous studies, several methods have been proposed for the extraction of fECG signals recorded from the maternal body surface. However, these methods require a large number of sensors, and are ineffective with only one or two sensors. In this study, state modeling, statistical and deterministic approaches are proposed for capturing weak traces of fetal cardiac signals. These three methods implement different models of the quasi-periodicity of the cardiac signal. In the first approach, the heart rate and its variability are modeled by a Kalman filter. In the second approach, the signal is divided into windows according to the beats. Stacking the windows constructs a tensor that is then decomposed. In the third approach, the signal is not directly modeled, but is considered as a Gaussian process characterized by its second order statistics. In all the proposed methods, unlike previous studies, mECG and fECG(s) are explicitly modeled. The performance of the proposed methods, which use a minimal number of electrodes, is assessed on synthetic data and actual recordings including twin fetal cardiac signals.

    Speech enhancement in binaural hearing protection devices

    The capability of people to operate safely and effectively under extreme noise conditions is dependent on their access to adequate voice communication while using hearing protection. This thesis develops speech enhancement algorithms that can be implemented in binaural hearing protection devices to improve communication and situation awareness in the workplace. The developed algorithms, which emphasize low computational complexity, can suppress noise while enhancing speech.