10 research outputs found

    Superposition frames for adaptive time-frequency analysis and fast reconstruction

    In this article we introduce a broad family of adaptive, linear time-frequency representations termed superposition frames, and show that they admit desirable fast overlap-add reconstruction properties akin to standard short-time Fourier techniques. This approach stands in contrast to many adaptive time-frequency representations in the extant literature, which, while more flexible than standard fixed-resolution approaches, typically fail to provide efficient reconstruction and often lack the regular structure necessary for precise frame-theoretic analysis. Our main technical contributions come through the development of properties which ensure that this construction provides for a numerically stable, invertible signal representation. Our primary algorithmic contributions come via the introduction and discussion of specific signal adaptation criteria in deterministic and stochastic settings, based respectively on time-frequency concentration and nonstationarity detection. We conclude with a short speech enhancement example that serves to highlight potential applications of our approach. Comment: 16 pages, 6 figures; revised version

    Denoising strategies for general finite frames

    Overcomplete representations such as wavelets and windowed Fourier expansions have become mainstays of modern statistical data analysis. In the present work, in the context of general finite frames, we derive an oracle expression for the mean quadratic risk of a linear diagonal de-noising procedure which immediately yields the optimal linear diagonal estimator. Moreover, we obtain an expression for an unbiased estimator of the risk of any smooth shrinkage rule. This last result motivates a set of practical estimation procedures for general finite frames that can be viewed as the generalization of the classical procedures for orthonormal bases. A simulation study verifies the effectiveness of the proposed procedures with respect to the classical ones and confirms that the correlations induced by frame structure should be explicitly treated to yield an improvement in estimation precision

    On the performance of superposition window

    Superposition windows are often used in digital signal processing and related fields, for example in power spectral estimation and adaptive time-frequency analysis. The choice of window and overlap in a superposition system may affect the final results. The main contribution of this paper is insight into the properties of the overlap-add technique under different windows and overlap ratios, which is helpful in selecting these parameters for a practical application.
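
    The dependence on window and overlap ratio can be illustrated with a minimal sketch. It is not taken from the paper: the function name `ola_ripple`, the window length of 512, and the tested hop sizes are all assumptions chosen for illustration. The sketch overlap-adds shifted copies of a window and measures how far the summed envelope deviates from a constant in the steady-state region.

```python
import numpy as np

def ola_ripple(window, hop, n_frames=64):
    """Relative peak-to-peak ripple of the overlap-added window sum.

    Hypothetical helper for illustration: shifted copies of `window`,
    spaced `hop` samples apart, are summed and only the interior
    (steady-state) region is inspected.
    """
    n = len(window)
    total = np.zeros(hop * (n_frames - 1) + n)
    for k in range(n_frames):
        total[k * hop : k * hop + n] += window
    steady = total[n:-n]  # drop the ramp-up and ramp-down edges
    return (steady.max() - steady.min()) / steady.mean()

n = 512
# Periodic Hann window (assumed here because it sums exactly at 50% overlap).
periodic_hann = 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(n) / n)

for hop in (256, 128, 205):  # 50%, 75%, and roughly 60% overlap
    print(f"hop={hop}: ripple={ola_ripple(periodic_hann, hop):.3e}")
```

    With these assumed settings, the 50% and 75% overlaps give a flat sum up to rounding error, while the roughly 60% overlap leaves a visible ripple, which is the kind of parameter sensitivity the abstract refers to.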

    An Entropy Based Method for Local Time-Adaptation of the Spectrogram

    We propose a method for automatic local time-adaptation of the spectrogram of audio signals: it is based on the decomposition of a signal within a Gabor multi-frame through the STFT operator. The sparsity of the analysis in every individual frame of the multi-frame is evaluated through Rényi entropy measures: the best local resolution is determined by minimizing the entropy values. The overall spectrogram of the signal thus provides locally optimal resolution that evolves adaptively over time. We give examples of the performance of our algorithm with an instrumental sound and a synthetic one, showing the improvement in spectrogram display obtained with an automatic adaptation of the resolution. The analysis operator is invertible, thus leading to a perfect reconstruction of the original signal through the analysis coefficients.
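
    The entropy criterion can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' algorithm: it scores one whole segment instead of adapting locally over time, and the names `spectrogram`, `renyi_entropy`, and `best_window`, the Hann window, the order `alpha=3`, and the common hop and FFT size are all choices made here so that the candidate spectrograms share one grid and their entropies are comparable.

```python
import numpy as np

def spectrogram(x, win_len, n_fft, hop):
    """Power spectrogram with a Hann window of length win_len, centred and
    zero-padded to n_fft so every candidate uses the same time-frequency grid."""
    w = np.zeros(n_fft)
    start = (n_fft - win_len) // 2
    w[start:start + win_len] = np.hanning(win_len)
    xp = np.pad(x, n_fft // 2)  # zero-pad so frames are centred on the signal
    frames = np.array([xp[i:i + n_fft] * w
                       for i in range(0, len(xp) - n_fft + 1, hop)])
    return np.abs(np.fft.rfft(frames, axis=1)) ** 2

def renyi_entropy(power, alpha=3.0):
    """Renyi entropy (in bits) of a spectrogram normalised to unit sum."""
    p = power / power.sum()
    return np.log2((p ** alpha).sum()) / (1.0 - alpha)

def best_window(x, win_lens, hop=64, alpha=3.0):
    """Return the window length with the lowest entropy, plus all scores."""
    n_fft = max(win_lens)
    scores = {n: renyi_entropy(spectrogram(x, n, n_fft, hop), alpha)
              for n in win_lens}
    return min(scores, key=scores.get), scores

t = np.arange(4096)
tone = np.sin(2 * np.pi * 0.1 * t)   # stationary: favours a long window
click = np.zeros(4096)
click[2048] = 1.0                    # transient: favours a short window
print(best_window(tone, [64, 1024])[0], best_window(click, [64, 1024])[0])
```

    A lower entropy indicates a more concentrated (sparser) representation, so a steady tone selects the long window and an isolated click selects the short one. Applying the same selection per time segment would approximate the local adaptation described above.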

    Objective assessment and reduction of noise in musical signal

    This dissertation focuses on the objective assessment and reduction of disturbing background noise in a musical signal. A new algorithm for assessing the audibility of background noise is proposed. Listening tests show that this new algorithm predicts background noise audibility better than existing algorithms do. An advantage of the new algorithm is that it can also be applied to a general audio signal, not only a musical one, that is, to the case where the audibility of one sound against the background of another is assessed. Existing algorithms often fail in this case. The next part of the dissertation introduces a new adaptive segmentation scheme that divides long-term musical signals into short segments of varying length. When this scheme is used in noise reduction systems, the output musical signal has a significantly higher subjectively perceived quality than with the other segmentation schemes tested.

    Phase vocoder and beyond

    For a broad range of sound transformations, quality is measured against a common expectation about the result: if a male voice has to be changed into a female one, there exists a common reference for the perceptual evaluation of the result; the same holds if an instrumental sound has to be made longer or shorter. Following the argument in Röbel, “Between Physics and Perception: Signal Models for High Level Audio Processing”, a fundamental requirement for these transformation algorithms is that they rely on signal models that are strongly linked to perceptually relevant physical properties of the sound source. This paper is a short survey of the phase vocoder technique, together with its extensions and improvements relying on appropriate sound models, which have led to high-level audio processing algorithms.

    Seeing sound: a new way to illustrate auditory objects and their neural correlates

    This thesis develops a new method for time-frequency signal processing and examines the relevance of the new representation in studies of neural coding in songbirds. The method groups together associated regions of the time-frequency plane into objects defined by time-frequency contours. By combining information about structurally stable contour shapes over multiple time-scales and angles, a signal decomposition is produced that distributes resolution adaptively. As a result, distinct signal components are represented in their own most parsimonious forms. Next, through neural recordings in singing birds, it was found that activity in song premotor cortex is significantly correlated with the objects defined by this new representation of sound. In this process, an automated way of finding sub-syllable acoustic transitions in birdsongs was first developed, and then increased spiking probability was found at the boundaries of these acoustic transitions. Finally, a new approach to study auditory cortical sequence processing more generally is proposed. In this approach, songbirds were trained to discriminate Morse-code-like sequences of clicks, and the neural correlates of this behavior were examined in primary and secondary auditory cortex. It was found that a distinct transformation of auditory responses to the sequences of clicks exists as information is transferred from primary to secondary auditory areas. Neurons in secondary auditory areas respond asynchronously and selectively, in a manner that depends on the temporal context of the click. This transformation from a temporal to a spatial representation of sound provides a possible basis for the songbird's natural ability to discriminate complex temporal sequences.

    Zolotarev polynomials utilization in spectral analysis

    This thesis focuses on selected problems of symmetrical Zolotarev polynomials and their use in spectral analysis. Basic properties of symmetrical Zolotarev polynomials, including orthogonality, are described. The numerical properties of algorithms generating even Zolotarev polynomials are also explored. Regarding the application of Zolotarev polynomials to spectral analysis, the Approximated Discrete Zolotarev Transform is implemented so that it enables computation of the spectrogram (zologram) in real time. Moreover, the Approximated Discrete Zolotarev Transform is modified to perform better in the analysis of damped exponential signals. Finally, a novel Discrete Zolotarev Transform implemented fully in the time domain is proposed. This transform also shows that some features observed using the Approximated Discrete Zolotarev Transform are a consequence of using Zolotarev polynomials.

    Toward an interpretive framework of two-dimensional speech-signal processing

    Thesis (Ph. D.)--Harvard-MIT Division of Health Sciences and Technology, 2011. Cataloged from PDF version of thesis. Includes bibliographical references (p. 177-179). Traditional representations of speech are derived from short-time segments of the signal and result in time-frequency distributions of energy such as the short-time Fourier transform and spectrogram. Speech-signal models of such representations have had utility in a variety of applications such as speech analysis, recognition, and synthesis. Nonetheless, they do not capture spectral, temporal, and joint spectrotemporal energy fluctuations (or "modulations") present in local time-frequency regions of the time-frequency distribution. Inspired by principles from image processing and evidence from auditory neurophysiological models, a variety of two-dimensional (2-D) processing techniques have been explored in the literature as alternative representations of speech; however, speech-based models are lacking in this framework. This thesis develops speech-signal models for a particular 2-D processing approach in which 2-D Fourier transforms are computed on local time-frequency regions of the canonical narrowband or wideband spectrogram; we refer to the resulting transformed space as the Grating Compression Transform (GCT). We argue for a 2-D sinusoidal-series amplitude modulation model of speech content in the spectrogram domain that relates to speech production characteristics such as pitch/noise of the source, pitch dynamics, formant structure and dynamics, and offset/onset content. Narrowband- and wideband-based models are shown to exhibit important distinctions in interpretation and oftentimes "dual" behavior. In the transformed GCT space, the modeling results in a novel taxonomy of signal behavior based on the distribution of formant and onset/offset content in the transformed space via source characteristics. Our formulation provides a speech-specific interpretation of the concept of "modulation" in 2-D processing in contrast to existing approaches that have done so either phenomenologically through qualitative analyses and/or implicitly through data-driven machine learning approaches. One implication of the proposed taxonomy is its potential for interpreting transformations of other time-frequency distributions such as the auditory spectrogram which is generally viewed as being "narrowband"/"wideband" in its low/high-frequency regions. The proposed signal model is evaluated in several ways. First, we perform analysis of synthetic speech signals to characterize its properties and limitations. Next, we develop an algorithm for analysis/synthesis of spectrograms using the model and demonstrate its ability to accurately represent real speech content. As an example application, we further apply the models in cochannel speaker separation, exploiting the GCT's ability to distribute speaker-specific content and often recover overlapping information through demodulation and interpolation in the 2-D GCT space. Specifically, in multi-pitch estimation, we demonstrate the GCT's ability to accurately estimate separate and crossing pitch tracks under certain conditions. Finally, we demonstrate the model's ability to separate mixtures of speech signals using both prior and estimated pitch information. Generalization to other speech-signal processing applications is proposed. By Tianyu Tom Wang. Ph.D.