149 research outputs found

    Application of sound source separation methods to advanced spatial audio systems

    This thesis belongs to the field of Sound Source Separation (SSS). It addresses the development and evaluation of SSS techniques for resynthesizing highly realistic sound scenes by means of Wave Field Synthesis (WFS). Because the vast majority of audio recordings are preserved in two-channel stereo format, special up-converters are required to use advanced spatial audio reproduction formats such as WFS. WFS needs the original source signals in order to accurately synthesize the acoustic field inside an extended listening area, so an object-based mix is required. Source separation problems in digital signal processing are those in which several signals have been mixed together and the objective is to recover the original signals. SSS algorithms can therefore be applied to existing two-channel mixtures to extract the different objects that compose the stereo scene. Unfortunately, most stereo mixtures are underdetermined, i.e., there are more sound sources than audio channels. This condition makes the SSS problem especially difficult, and stronger assumptions have to be made, often related to the sparsity of the sources under some signal transformation. This thesis focuses on the application of SSS techniques to spatial sound reproduction, and its contributions fall within these two areas. First, two underdetermined SSS methods are proposed to deal efficiently with the separation of stereo sound mixtures. These techniques are based on a multi-level thresholding segmentation approach, which enables fast and unsupervised separation of sound sources in the time-frequency domain. Although both techniques rely on the same type of clustering, the features each one considers are related to different localization cues, which enable the separation of either instantaneous or real mixtures. Additionally, two post-processing techniques aimed at improving the isolation of the separated sources are proposed. The performance achieved by several SSS methods in the resynthesis of WFS sound scenes is then evaluated by means of listening tests, paying special attention to the change observed in the perceived spatial attributes. Although the estimated sources are distorted versions of the original ones, the masking effects involved in their spatial remixing make artifacts less perceptible, which improves the overall assessed quality. Finally, some novel developments related to the application of time-frequency processing to source localization and enhanced sound reproduction are presented.
    Cobos Serrano, M. (2009). Application of sound source separation methods to advanced spatial audio systems [Unpublished doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/8969
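    The idea of separating an instantaneous stereo mixture by thresholding a localization cue in the time-frequency domain can be illustrated with a minimal sketch. It is not the thesis's method: the thresholds here are fixed by hand rather than derived by multi-level thresholding, and the names `pan_index` and `separate_by_thresholds`, as well as the flat-list representation of the time-frequency bins, are assumptions made for illustration.

```python
# Hypothetical sketch: binary time-frequency masking of a stereo mixture.
# Each channel is a flat list of complex time-frequency coefficients (one per bin).
# Thresholds are hand-picked here; the thesis derives them by multi-level thresholding.

def pan_index(left, right, eps=1e-12):
    """Level-based localization cue in [0, 1]: 0 = fully left, 1 = fully right."""
    l, r = abs(left), abs(right)
    return r / (l + r + eps)

def separate_by_thresholds(left_bins, right_bins, thresholds):
    """Assign every bin to one of len(thresholds) + 1 sources using binary masks.

    Returns a list of (left, right) coefficient lists, one pair per estimated source.
    """
    cuts = sorted(thresholds)
    n_sources = len(cuts) + 1
    estimates = [([0j] * len(left_bins), [0j] * len(right_bins))
                 for _ in range(n_sources)]
    for i, (l, r) in enumerate(zip(left_bins, right_bins)):
        p = pan_index(l, r)
        k = sum(p > t for t in cuts)  # index of the pan interval containing p
        estimates[k][0][i] = l
        estimates[k][1][i] = r
    return estimates
```

    Because each bin is assigned to exactly one source, the estimates sum back to the original mixture. This relies on the sparsity assumption mentioned in the abstract: at most one source dominates each time-frequency bin.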

    Blind Source Separation for the Processing of Contact-Less Biosignals

    (Spatio-temporal) Blind Source Separation (BSS) offers great potential for processing distorted multichannel biosignal measurements acquired with novel contact-less recording techniques, by separating distortions from the cardiac signal of interest. This potential can only be realized in practice (1) if the BSS model applied matches the complexity of the measurement, i.e., the signal mixture, and (2) if the permutation indeterminacy among the BSS output components is resolved, i.e., the component of interest can be selected automatically. The present work first designs a framework to assess the efficacy of BSS algorithms in the context of the camera-based photoplethysmogram (cbPPG) and characterizes multiple BSS algorithms accordingly. Algorithm selection recommendations for certain mixture characteristics are derived. Second, the present work develops and evaluates concepts to solve permutation indeterminacy for BSS outputs of contact-less electrocardiogram (ECG) recordings. The novel approach based on sparse coding is shown to outperform the existing concepts of higher order moments and frequency-domain features.

    Dictionary Learning for Sparse Representations With Applications to Blind Source Separation.

    During the past decade, sparse representation has attracted much attention in the signal processing community. It aims to represent a signal as a linear combination of a small number of elementary signals called atoms. These atoms constitute a dictionary, so that a signal can be expressed as the product of the dictionary and a sparse coefficient vector. This leads to two main challenges studied in the literature: sparse coding (finding the coding coefficients for a given dictionary) and dictionary design (finding an appropriate dictionary to fit the data). Dictionary design is the focus of this thesis. Traditionally, signals are decomposed by a predefined mathematical transform, such as the discrete cosine transform (DCT), which forms the so-called analytical approach. In recent years, learning-based methods have been introduced to adapt the dictionary to a set of training data, leading to the technique of dictionary learning. Although this may involve higher computational complexity, learned dictionaries have the potential to offer improved performance compared with predefined dictionaries. Dictionary learning is often achieved by iteratively executing two operations: sparse approximation and dictionary update. We focus on the dictionary update step, where the dictionary is optimized with a given sparsity pattern. A novel framework is proposed to generalize benchmark mechanisms such as the method of optimal directions (MOD) and K-SVD, in which an arbitrary set of codewords and the corresponding sparse coefficients are updated simultaneously, hence the term simultaneous codeword optimization (SimCO). Moreover, its extended formulation, 'regularized SimCO', mitigates the major bottleneck of dictionary update caused by singular points. First and second order optimization procedures are designed to solve the primitive and regularized SimCO. In addition, a tree-structured multi-level representation of the dictionary based on clustering is used to speed up the optimization process in the sparse coding stage. This novel dictionary learning algorithm is also applied to the underdetermined blind speech separation problem, leading to a multi-stage method in which the separation problem is reformulated as a sparse coding problem, with the dictionary learned by an adaptive algorithm. Using mutual coherence and the sparsity index, the performance of a variety of dictionaries for underdetermined speech separation is compared and analyzed, including dictionaries learned from speech mixtures and from ground truth speech sources, as well as those predefined by mathematical transforms. Finally, we propose a new method for joint dictionary learning and source separation. Unlike the multi-stage method, the proposed method can simultaneously estimate the mixing matrix, the dictionary and the sources in an alternating and blind manner. The advantages of all the proposed methods over state-of-the-art methods are demonstrated using extensive numerical tests.
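    For orientation, the dictionary learning problem and the MOD update named in the abstract above can be written in a standard form. The notation is an assumption introduced here and is not taken from the thesis: X holds the training signals as columns, D is the dictionary, A is the matrix of sparse coefficients with columns a_i, and K is the sparsity level.

```latex
% Dictionary learning: fit the training signals X with a dictionary D
% and coefficient columns a_i that each use at most K atoms.
\min_{D,\,A} \; \| X - D A \|_F^2
\quad \text{subject to} \quad \| a_i \|_0 \le K \;\; \text{for all } i

% MOD dictionary update: with A held fixed, the least-squares minimizer is
D = X A^{\mathsf{T}} \left( A A^{\mathsf{T}} \right)^{-1}
% (assuming A A^T is invertible), usually followed by
% renormalizing each column of D to unit norm.
```

    MOD updates the whole dictionary at once with the coefficients held fixed, whereas K-SVD updates one atom at a time together with its coefficients. The abstract describes SimCO as a generalization in which an arbitrary subset of atoms and their coefficients are updated together.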

    Shift & 2D Rotation Invariant Sparse Coding for Multivariate Signals

    Classical dictionary learning algorithms (DLA) allow unicomponent signals to be processed. Because we are interested in two-dimensional (2D) motion signals, we wanted to mix the two components to provide rotation invariance, so multicomponent frameworks are examined here. In contrast to the well-known multichannel framework, a multivariate framework is first introduced as a tool to solve our problem easily and to preserve the data structure. Within this multivariate framework, we then present sparse coding methods: multivariate orthogonal matching pursuit (M-OMP), which provides sparse approximation for multivariate signals, and multivariate DLA (M-DLA), which empirically learns the characteristic patterns (or features) associated with a set of multivariate signals and combines shift invariance with online learning. Once the multivariate dictionary is learned, any signal of the considered set can be approximated sparsely. This multivariate framework is introduced so that the 2D rotation invariant (2DRI) case can be presented simply. When studying 2D motions acquired as bivariate real signals, we want the decompositions to be independent of the orientation of the movement execution in 2D space. The methods are therefore specialized for the 2DRI case to be robust to any rotation: 2DRI-OMP and 2DRI-DLA. The shift and rotation invariant cases induce a compact learned dictionary and provide robust decomposition. As validation, our methods are applied to 2D handwritten data to extract the elementary features of this signal set and to provide a rotation invariant decomposition.

    Source Separation for Hearing Aid Applications
