Algorithms for nonnegative matrix factorization with the beta-divergence
This paper describes algorithms for nonnegative matrix factorization (NMF)
with the beta-divergence (beta-NMF). The beta-divergence is a family of cost
functions parametrized by a single shape parameter beta that takes the
Euclidean distance, the Kullback-Leibler divergence and the Itakura-Saito
divergence as special cases (beta = 2, 1, 0, respectively). The proposed
algorithms are based on a surrogate auxiliary function (a local majorization of
the criterion function). We first describe a majorization-minimization (MM)
algorithm that leads to multiplicative updates, which differ from standard
heuristic multiplicative updates by a beta-dependent power exponent. The
monotonicity of the heuristic algorithm can however be proven for beta in (0,1)
using the proposed auxiliary function. Then we introduce the concept of
majorization-equalization (ME) algorithm which produces updates that move along
constant level sets of the auxiliary function and lead to larger steps than MM.
Simulations on synthetic and real data illustrate the faster convergence of the
ME approach. The paper also describes how the proposed algorithms can be
adapted to two common variants of NMF: penalized NMF (i.e., when a penalty
function of the factors is added to the criterion function) and convex-NMF
(when the dictionary is assumed to belong to a known subspace).
Comment: to appear in Neural Computation
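The MM multiplicative updates described above can be sketched in NumPy. This is a minimal sketch under stated assumptions: the function names `beta_divergence`, `mm_exponent` and `beta_nmf_mm`, the random positive initialization and the fixed iteration count are illustrative choices, not the paper's code. The piecewise exponent gamma(beta) (1/(2 - beta) for beta < 1, 1 for 1 <= beta <= 2, 1/(beta - 1) for beta > 2) is our reading of the paper's MM result; setting it to 1 everywhere gives the standard heuristic updates. The ME variant and the penalized and convex-NMF extensions are not shown.

```python
import numpy as np


def beta_divergence(V, Vhat, beta):
    """Sum of elementwise beta-divergences d_beta(V | Vhat); entries must be > 0."""
    if beta == 0:  # Itakura-Saito divergence
        ratio = V / Vhat
        return float(np.sum(ratio - np.log(ratio) - 1.0))
    if beta == 1:  # generalized Kullback-Leibler divergence
        return float(np.sum(V * np.log(V / Vhat) - V + Vhat))
    # General case; beta = 2 gives half the squared Euclidean distance
    return float(np.sum(V**beta + (beta - 1) * Vhat**beta
                        - beta * V * Vhat**(beta - 1)) / (beta * (beta - 1)))


def mm_exponent(beta):
    """Assumed exponent gamma(beta) turning the heuristic update into an MM update."""
    if beta < 1:
        return 1.0 / (2.0 - beta)
    if beta <= 2:
        return 1.0
    return 1.0 / (beta - 1.0)


def beta_nmf_mm(V, K, beta, n_iter=200, seed=0):
    """Factorize a positive matrix V (F x N) as W @ H, W (F x K), H (K x N)."""
    rng = np.random.default_rng(seed)
    F, N = V.shape
    W = rng.random((F, K)) + 0.1  # strictly positive initialization (assumption)
    H = rng.random((K, N)) + 0.1
    gamma = mm_exponent(beta)
    costs = [beta_divergence(V, W @ H, beta)]
    for _ in range(n_iter):
        Vhat = W @ H
        # Heuristic multiplicative ratio for H, raised to gamma(beta)
        H *= ((W.T @ (V * Vhat**(beta - 2))) / (W.T @ Vhat**(beta - 1))) ** gamma
        Vhat = W @ H
        # Same update for W, using the freshly updated H
        W *= (((V * Vhat**(beta - 2)) @ H.T) / (Vhat**(beta - 1) @ H.T)) ** gamma
        costs.append(beta_divergence(V, W @ H, beta))
    return W, H, costs


if __name__ == "__main__":
    V = np.random.default_rng(1).random((20, 30)) + 0.1
    W, H, costs = beta_nmf_mm(V, K=4, beta=0.5)
    print(costs[0], costs[-1])
```

For 1 <= beta <= 2 the exponent is 1 and the MM and heuristic updates coincide; outside that range the smaller exponent damps each step, which is what makes the recorded cost nonincreasing.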
Improving independent vector analysis in speech and noise separation tasks
Independent vector analysis (IVA) is an efficient multichannel blind source separation method. However, the source models conventionally assumed in IVA have limitations in speech and noise separation tasks. Source models that overcome these limitations are therefore expected to improve the separation performance of IVA. In this work, an extension of IVA is proposed, with a new source model more suitable for speech and noise separation tasks. The proposed extended IVA was evaluated in a speech and noise separation task, where it was shown to improve separation performance over baseline IVA. Furthermore, extended IVA was evaluated with several post-filters, aiming to realize a setup analogous to a multichannel Wiener filter (MWF) system. This setup further increased the separation performance of IVA.
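For reference, the baseline that this work extends can be sketched as frequency-domain IVA with the conventional spherical Laplacian source model, optimized here with auxiliary-function updates (AuxIVA). This is a technique swapped in for illustration, not the extended source model proposed in the work; the function names `iva_cost` and `auxiva`, the identity initialization and the fixed iteration count are assumptions. Each source k gets a weighted covariance V_k(f) = mean_n x x^H / r_k(n), where r_k(n) is the norm of source k across all frequency bins, which is what ties the bins of one source together.

```python
import numpy as np


def iva_cost(W, X):
    """IVA objective with a spherical Laplacian prior: sum_k E[r_k] - sum_f log|det W(f)|."""
    Y = W @ X                                     # (F, M, N) separated STFT
    r = np.sqrt(np.sum(np.abs(Y) ** 2, axis=0))  # (M, N) norm of each source over bins
    _, logabsdet = np.linalg.slogdet(W)           # (F,) log|det W(f)|
    return float(np.sum(np.mean(r, axis=1)) - np.sum(logabsdet))


def auxiva(X, n_iter=30, eps=1e-10):
    """X: mixture STFT of shape (F bins, M channels, N frames); returns W, Y, costs."""
    F, M, N = X.shape
    W = np.tile(np.eye(M, dtype=complex), (F, 1, 1))  # row k of W(f) is w_k(f)^H
    costs = [iva_cost(W, X)]
    for _ in range(n_iter):
        for k in range(M):
            yk = np.einsum('fm,fmn->fn', W[:, k, :], X)        # current estimate of source k
            r = np.sqrt(np.sum(np.abs(yk) ** 2, axis=0)) + eps  # (N,) norm over all bins
            # Weighted covariance V_k(f) = mean_n x x^H / r_k(n)
            Vk = np.einsum('fmn,fpn->fmp', X / r, X.conj()) / N
            e = np.zeros((F, M, 1), dtype=complex)
            e[:, k, 0] = 1.0
            w = np.linalg.solve(W @ Vk, e)[..., 0]             # w_k = (W V_k)^-1 e_k
            norm = np.einsum('fm,fmp,fp->f', w.conj(), Vk, w).real
            w /= np.sqrt(norm)[:, None]                         # enforce w^H V_k w = 1
            W[:, k, :] = w.conj()
        costs.append(iva_cost(W, X))
    return W, W @ X, costs
```

Each source update exactly minimizes a quadratic upper bound of the cost, so the recorded objective should not increase from one iteration to the next; an improved source model would change how the weights 1/r_k(n) are computed.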
Recent Advances in Signal Processing
Signal processing is a critical component of most new technological developments and of a wide variety of applications in science and engineering. Classical signal processing techniques have largely worked with mathematical models that are linear, local, stationary, and Gaussian, and have traditionally favored closed-form tractability over real-world accuracy. These constraints were imposed by the lack of powerful computing tools. During the last few decades, signal processing theories, developments, and applications have matured rapidly and now draw on tools from many areas of mathematics, computer science, physics, and engineering. This book is aimed at students and researchers who want to be exposed to a wide variety of signal processing techniques and algorithms. Its 27 chapters are grouped into five application areas, presented in this order: image processing, speech processing, communication systems, time-series analysis, and educational packages. Because the chapters are completely independent and self-contained, the interested reader can start with any chapter and skip to another without losing continuity.
Application of sound source separation methods to advanced spatial audio systems
This thesis is related to the field of Sound Source Separation (SSS). It addresses the development
and evaluation of these techniques for their application in the resynthesis of high-realism sound scenes by
means of Wave Field Synthesis (WFS). Because the vast majority of audio recordings are preserved in two-channel
stereo format, special up-converters are required to use advanced spatial audio reproduction formats,
such as WFS. This is because WFS needs the original source signals to be available in order to
accurately synthesize the acoustic field inside an extended listening area. Thus, object-based mixing is
required.
Source separation problems in digital signal processing are those in which several signals have been mixed
together and the objective is to recover the original signals. Therefore, SSS algorithms can be applied
to existing two-channel mixtures to extract the different objects that compose the stereo scene. Unfortunately,
most stereo mixtures are underdetermined, i.e., there are more sound sources than audio channels. This
condition makes the SSS problem especially difficult, and stronger assumptions have to be made, often related to
the sparsity of the sources under some signal transformation.
This thesis focuses on the application of SSS techniques to the spatial sound reproduction field. As a result,
its contributions can be categorized within these two areas. First, two underdetermined SSS methods are
proposed to deal efficiently with the separation of stereo sound mixtures. These techniques are based on a
multi-level thresholding segmentation approach, which enables fast, unsupervised separation of
sound sources in the time-frequency domain. Although both techniques rely on the same type of clustering, the
features each one considers are related to different localization cues, which allow separation
of either instantaneous or real mixtures. Additionally, two post-processing techniques aimed at
improving the isolation of the separated sources are proposed. The performance achieved by
several SSS methods in the resynthesis of WFS sound scenes is then evaluated by means of
listening tests, paying special attention to the change observed in the perceived spatial attributes.
Although the estimated sources are distorted versions of the original ones, the masking effects
involved in their spatial remixing make artifacts less perceptible, which improves the overall
assessed quality. Finally, some novel developments related to the application of time-frequency
processing to source localization and enhanced sound reproduction are presented.
Cobos Serrano, M. (2009). Application of sound source separation methods to advanced spatial audio systems [Unpublished doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/8969
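The idea of separating an underdetermined stereo mixture by multi-level thresholding in the time-frequency domain can be sketched as binary masking. This is a minimal sketch under loudly stated assumptions: the per-bin feature is a hypothetical panning angle atan(|R|/|L|), the thresholds come from an exhaustive multi-level Otsu criterion on an energy-weighted histogram, and the names `pan_angle`, `multilevel_otsu` and `separate_stereo` are illustrative. The thesis's actual localization features and clustering details may differ. Sparsity is what makes this work: each time-frequency bin is assumed to be dominated by a single source.

```python
import itertools

import numpy as np


def pan_angle(XL, XR):
    """Hypothetical panning feature in [0, pi/2] for each time-frequency bin."""
    return np.arctan2(np.abs(XR), np.abs(XL))


def multilevel_otsu(values, weights, n_classes, n_bins=64, lo=0.0, hi=np.pi / 2):
    """Exhaustive multi-level Otsu on a weighted histogram; returns n_classes-1 thresholds."""
    hist, edges = np.histogram(values, bins=n_bins, range=(lo, hi), weights=weights)
    p = hist / hist.sum()
    centers = 0.5 * (edges[:-1] + edges[1:])
    best_score, best_cut = -np.inf, None
    for cut in itertools.combinations(range(1, n_bins), n_classes - 1):
        bounds = (0,) + cut + (n_bins,)
        # Maximizing sum_j w_j * mu_j^2 maximizes the between-class variance
        score = 0.0
        for a, b in zip(bounds[:-1], bounds[1:]):
            w = p[a:b].sum()
            if w > 0:
                mu = np.dot(p[a:b], centers[a:b]) / w
                score += w * mu**2
        if score > best_score:
            best_score, best_cut = score, cut
    return edges[list(best_cut)]


def separate_stereo(XL, XR, n_sources, n_bins=64):
    """Binary masking of a stereo STFT (XL, XR) into n_sources stereo images."""
    theta = pan_angle(XL, XR)
    energy = np.abs(XL) ** 2 + np.abs(XR) ** 2
    thresholds = multilevel_otsu(theta.ravel(), energy.ravel(), n_sources, n_bins)
    labels = np.digitize(theta, thresholds)  # class index 0..n_sources-1 per bin
    images = [(XL * (labels == j), XR * (labels == j)) for j in range(n_sources)]
    return labels, images
```

The masked images sum back to the mixture, and the separated time signals would be obtained with an inverse STFT; the post-processing stages mentioned in the abstract are not represented.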
Unsupervised video indexing on audiovisual characterization of persons
This thesis proposes a method for the unsupervised characterization of persons appearing in audiovisual documents, exploiting data related to their physical appearance and their voice. In general, automatic identification methods, whether for video or audio, require a large amount of a priori knowledge about the content. In this work, the goal is to study the two modalities jointly and to exploit their respective properties collaboratively and robustly, in order to produce a reliable result that is as independent as possible of any a priori knowledge.
More specifically, we studied the characteristics of the audio stream and proposed several methods for speaker segmentation and clustering, which we evaluated in a French evaluation campaign. We then carried out an in-depth study of visual descriptors (face, clothing), which we used to propose new approaches for detecting, tracking, and clustering the people in the document. Finally, the work focused on audiovisual fusion, proposing a method based on computing a co-occurrence matrix that establishes an association between the audio and video indexes and allows them to be corrected. This makes it possible to produce a dynamic audiovisual model of each speaker.
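The audio-video fusion step can be illustrated with a frame-level co-occurrence matrix between speaker clusters (audio index) and person clusters (video index). This is a minimal sketch under assumptions that are not from the thesis: both indexes are given as aligned per-frame integer labels, and clusters are paired one-to-one by a greedy largest-count rule; the names `cooccurrence` and `greedy_association` are hypothetical, and the correction of the indexes is not shown.

```python
import numpy as np


def cooccurrence(audio_labels, video_labels, n_audio, n_video):
    """Count the frames in which audio cluster a and video cluster v are both active."""
    C = np.zeros((n_audio, n_video), dtype=int)
    np.add.at(C, (audio_labels, video_labels), 1)  # unbuffered add handles repeated pairs
    return C


def greedy_association(C):
    """One-to-one audio->video mapping, taking the largest remaining count first."""
    C = C.astype(float)  # astype returns a copy, so the caller's matrix is untouched
    pairs = {}
    for _ in range(min(C.shape)):
        a, v = np.unravel_index(np.argmax(C), C.shape)
        if C[a, v] <= 0:
            break  # no remaining co-occurrences to exploit
        pairs[int(a)] = int(v)
        C[a, :] = -1.0  # remove the matched audio cluster
        C[:, v] = -1.0  # and the matched video cluster
    return pairs


if __name__ == "__main__":
    audio = np.array([0, 0, 0, 1, 1, 2, 2, 2, 2])
    video = np.array([1, 1, 0, 0, 0, 2, 2, 2, 1])
    C = cooccurrence(audio, video, 3, 3)
    print(C.tolist())
    print(greedy_association(C))
```

A globally optimal pairing (e.g. the Hungarian algorithm) could replace the greedy rule; the greedy version is used only because it keeps the sketch dependency-free.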
Advanced robust non-invasive foetal heart detection techniques during active labour using one pair of transabdominal electrodes
The thesis proposes and evaluates three state-of-the-art signal processing techniques to detect fetal heartbeats within each maternal cardiac cycle, during labour contractions, using only a pair of transabdominal electrodes. The first and second techniques are, respectively, structured third-order cumulant-slice template matching and bispectral-contour template matching for fetal QRS identification. The third technique is based on a modified, appropriately weighted spectral multiple signal classification (MUSIC) method that incorporates the covariance matrix of the noise-like interference from uterine contractions, which is itself contaminated with noise. Two modifications to the standard MUSIC have been developed to enhance the performance of the spectral estimator in this application. The first modification introduces an optimised weighting function to the segmented ECG covariance matrix and is chiefly aimed at enhancing the fetal QRS major spectral peak, which occurs at around 30 Hz, against the maternal QRS major spectral peak, usually around 17 Hz, and all other noise contributions. An optional pseudo-bispectral enhancement has also been achieved to sharpen the maternal and fetal spectral peaks, in particular when the maternal and fetal R-waves are temporally coincident. The second modification removes the unjustified assumption that only white Gaussian noise is present and incorporates the actual measured covariance matrix of labour uterine contractions in a reconfigured subspace analysis. This leads to a generalised eigenvalue-eigenvector decomposition, and the resulting method is termed the modified, interference-incorporated pseudo-spectral MUSIC. The first and second techniques are higher-order-statistics-based (HOS) hybrids involving both signal processing and neural network (NN) classifiers.
The third technique is second-order-statistics-based (SOS). In all techniques, the removal of signal non-linearity with the aid of non-linear Volterra synthesisers plays a crucial part in the integrity of fetal detection.
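The second MUSIC modification, replacing the white-noise assumption with a measured interference covariance, can be sketched as a generalised eigendecomposition R v = lambda Q v, solved here by Cholesky pre-whitening with Q. This is a minimal sketch under stated assumptions: the window length m, the signal-subspace dimension p (two per real sinusoid), the sliding-window covariance estimate and the names `covariance` and `music_interference` are illustrative; the optimised weighting function, the pseudo-bispectral enhancement and the Volterra preprocessing described in the thesis are not included.

```python
import numpy as np


def covariance(x, m):
    """Sample covariance of the length-m sliding windows of a 1-D real signal."""
    segs = np.lib.stride_tricks.sliding_window_view(x, m)  # (len(x) - m + 1, m)
    return segs.T @ segs / segs.shape[0]


def music_interference(x, q, fs, freqs, m=40, p=4):
    """MUSIC pseudo-spectrum of x with the noise covariance measured from q.

    x: observed signal; q: interference-only recording at the same rate fs;
    p: assumed signal-subspace dimension (2 per real sinusoid).
    """
    R = covariance(x, m)
    Q = covariance(q, m)
    # Generalised problem R v = lam Q v via whitening with Q = L L^T
    L = np.linalg.cholesky(Q)
    Linv = np.linalg.inv(L)
    _, E = np.linalg.eigh(Linv @ R @ Linv.T)  # eigenvalues in ascending order
    Vn = Linv.T @ E[:, : m - p]               # generalised noise-subspace vectors
    k = np.arange(m)[:, None]
    A = np.exp(2j * np.pi * k * freqs[None, :] / fs)  # steering vectors, shape (m, nf)
    # Peaks where the steering vector is (nearly) orthogonal to the noise subspace
    return 1.0 / np.sum(np.abs(Vn.T @ A) ** 2, axis=0)
```

With white noise (Q proportional to the identity) this reduces to standard MUSIC; with a measured Q, strong coloured interference is whitened before the signal and noise subspaces are split.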
Accurately assessed fetal heart classification rates as high as 95% have been achieved during labour, helping to provide non-invasive transparency into fetal intrapartum welfare. Performance analysis and evaluation involved more than 30 critical cases classified as “fetal under stress in labour”, recorded in a London hospital database using both transabdominal ECG electrodes and fetal scalp electrodes. The scalp electrode enables detection of the instantaneous fetal heart rate, which is then used as the Reference Fetal Heart Rate in assessing the classification rate of each of the above-mentioned techniques. It will be shown that the fetal heartbeats are completely masked by uterine activity and noise artefacts in all the recorded transabdominal maternal ECG signals. The fetal scalp electrode was, therefore, deemed necessary to provide the most accurate measure of fetal heart function (from the hospital viewpoint) and to assess the three non-invasive techniques presented in this thesis. The techniques may also be used during gestation, as early as 10 weeks.
- …