5 research outputs found

Spectrogram inversion and potential applications for hearing research

Author: Decorsière Remi Julien Blaise
Publication venue: Technical University of Denmark, Department of Electrical Engineering
Publication date: 01/01/2013

Single channel signal separation using pseudo-stereo model and time-frequency masking

Author: Tengtrairat Naruephorn
Publication venue: Newcastle University
Publication date: 01/01/2013

PhD Thesis

In many practical applications, only one sensor is available to record a mixture of several signals. Single-channel blind signal separation (SCBSS) is the research topic that addresses the problem of recovering the original signals from the observed mixture with little or no prior knowledge of the signals. Given a single mixture, a new pseudo-stereo mixing model is developed. A “pseudo-stereo” mixture is formulated by weighting and time-shifting the original single-channel mixture. This creates an artificial resemblance of a stereo signal given by one location, which results in the same time delay but different attenuation of the source signals. The pseudo-stereo mixing model relaxes the underdetermined ill-conditions associated with monaural source separation and exploits the relationship between the readily observed mixture and the pseudo-stereo mixture. This research proposes three novel algorithms based on the pseudo-stereo mixing model and the binary time-frequency (TF) mask. Firstly, the proposed SCBSS algorithm estimates the signals’ weighted coefficients from a ratio of the pseudo-stereo mixing model and then constructs a binary maximum-likelihood TF mask for separating the observed mixture. Secondly, a mixture in a noisy background environment is considered. Thus, a mixture enhancement algorithm has been developed and the proposed SCBSS algorithm is reformulated using an adaptive coefficients estimator. The adaptive coefficients estimator computes the signal characteristics for each time frame. This property is desirable for both speech and audio signals as they are aptly characterized as non-stationary AR processes. Finally, a multiple-time delay (MTD) pseudo-stereo mixture is developed. The MTD mixture enhances the flexibility as well as the separability over the originally proposed pseudo-stereo mixing model. The separation algorithm of the MTD mixture has also been derived. Additionally, a comparison analysis between the MTD mixture and the pseudo-stereo mixture has been carried out. All algorithms have been demonstrated on synthesized and real audio signals. The performance of source separation has been assessed by measuring the distortion between the original source and the estimated one according to the signal-to-distortion ratio (SDR). Results show that all proposed SCBSS algorithms yield a significantly better separation performance, with an average SDR improvement that ranges from 2.4 dB to 5 dB per source, and that they are computationally faster than the benchmarked algorithms.

Payap University
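The abstract above describes the pseudo-stereo mixture as a weighted, time-shifted copy of the single-channel mixture. A minimal sketch of that idea follows. The function name, the parameters `gamma` (weight) and `delay` (shift in samples), and the division by 1 + |gamma| are assumptions made for illustration; the abstract does not give the exact formula.

```python
def pseudo_stereo(mixture, gamma, delay):
    """Build an artificial second channel from a single-channel mixture.

    Hypothetical helper: gamma, delay, and the 1 + |gamma| normalization
    are illustrative assumptions, not taken from the thesis abstract.
    """
    if delay < 0:
        raise ValueError("delay must be non-negative")
    n = len(mixture)
    # Time-shift: prepend `delay` zeros, then trim back to the original length.
    shifted = ([0.0] * delay + list(mixture))[:n]
    # Weighting: add the weighted delayed copy and rescale.
    scale = 1.0 + abs(gamma)
    return [(x + gamma * s) / scale for x, s in zip(mixture, shifted)]


x1 = [1.0, 2.0, 3.0, 4.0]                    # observed single-channel mixture
x2 = pseudo_stereo(x1, gamma=0.5, delay=1)   # artificial second channel
```

The pair (`x1`, `x2`) then plays the role of a two-channel recording, which is what allows time-frequency masking methods designed for stereo input to be applied to one sensor.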

Newcastle University eTheses

Probabilistic inference of speech signals from phaseless spectrograms

Authors: Brendan J. Frey, Kannan Achan, Sam T. Roweis
Publication venue: MIT Press

Many techniques for complex speech processing such as denoising and deconvolution, time/frequency warping, multiple speaker separation, and multiple microphone analysis operate on sequences of short-time power spectra (spectrograms), a representation which is often well-suited to these tasks. However, a significant problem with algorithms that manipulate spectrograms is that the output spectrogram does not include a phase component, which is needed to create a time-domain signal that has good perceptual quality. Here we describe a generative model of time-domain speech signals and their spectrograms, and show how an efficient optimizer can be used to find the maximum a posteriori speech signal, given the spectrogram. In contrast to techniques that alternate between estimating the phase and a spectrally-consistent signal, our technique directly infers the speech signal, thus jointly optimizing the phase and a spectrally-consistent signal. We compare our technique with a standard method using signal-to-noise ratios, but we also provide audio files on the web for the purpose of demonstrating the improvement in perceptual quality that our technique offers.
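The phase problem mentioned in the abstract above can be illustrated with a small sketch: two different waveforms can share exactly the same magnitude spectrum, so magnitudes alone do not determine the time-domain signal. This example uses a single frame and a plain DFT rather than a full short-time analysis, and the sample values are arbitrary; it is a simplification for illustration, not the authors' generative model.

```python
import cmath
import math


def dft_magnitudes(frame):
    """Return the magnitude of each DFT bin of one frame (phase is discarded)."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                for t, x in enumerate(frame)))
        for k in range(n)
    ]


# Arbitrary illustrative frame and a circularly shifted copy of it.
frame = [0.0, 1.0, 0.5, -0.3, -1.0, 0.2, 0.7, -0.4]
shifted = frame[3:] + frame[:3]

mags_a = dft_magnitudes(frame)
mags_b = dft_magnitudes(shifted)

# A circular shift only changes the phase of each bin, not its magnitude.
same_magnitudes = all(
    math.isclose(a, b, abs_tol=1e-9) for a, b in zip(mags_a, mags_b)
)
same_waveform = frame == shifted
```

Here `same_magnitudes` is true while `same_waveform` is false, which is why recovering a signal from a phaseless spectrogram requires extra constraints or a prior on the signal.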

CiteSeerX

Hierarchical learning : theory with applications in speech and vision

Author: Bouvrie Jacob V
Publication venue: Massachusetts Institute of Technology
Publication date: 01/01/2009

Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Brain and Cognitive Sciences, 2009. This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections. Cataloged from student submitted PDF version of thesis. Includes bibliographical references (p. 123-132).

Over the past two decades several hierarchical learning models have been developed and applied to a diverse range of practical tasks with much success. Little is known, however, as to why such models work as well as they do. Indeed, most are difficult to analyze, and cannot be easily characterized using the established tools from statistical learning theory. In this thesis, we study hierarchical learning architectures from two complementary perspectives: one theoretical and the other empirical. The theoretical component of the thesis centers on a mathematical framework describing a general family of hierarchical learning architectures. The primary object of interest is a recursively defined feature map, and its associated kernel. The class of models we consider exploits the fact that data in a wide variety of problems satisfy a decomposability property. Paralleling the primate visual cortex, hierarchies are assembled from alternating filtering and pooling stages that build progressively invariant representations which are simultaneously selective for increasingly complex stimuli. A goal of central importance in the study of hierarchical architectures and the cortex alike is that of understanding quantitatively the tradeoff between invariance and selectivity, and how invariance and selectivity contribute towards providing an improved representation useful for learning from data. A reasonable expectation is that an unsupervised hierarchical representation will positively impact the sample complexity of a corresponding supervised learning task. We therefore analyze invariance and discrimination properties that emerge in particular instances of layered models described within our framework. A group-theoretic analysis leads to a concise set of conditions which must be met to establish invariance, as well as a constructive prescription for meeting those conditions. An information-theoretic analysis is then undertaken and seen as a means by which to characterize a model's discrimination properties. The empirical component of the thesis experimentally evaluates key assumptions built into the mathematical framework. In the case of images, we present simulations which support the hypothesis that layered architectures can reduce the sample complexity of a non-trivial learning problem. In the domain of speech, we describe a localized analysis technique that leads to a noise-robust representation. The resulting biologically-motivated features are found to outperform traditional methods on a standard phonetic classification task in both clean and noisy conditions.

by Jacob V. Bouvrie. Ph.D.

DSpace@MIT