38 research outputs found

    A Phase Vocoder based on Nonstationary Gabor Frames

    We propose a new algorithm for time stretching music signals based on the theory of nonstationary Gabor frames (NSGFs). The algorithm extends the techniques of the classical phase vocoder (PV) by incorporating adaptive time-frequency (TF) representations and adaptive phase locking. The adaptive TF representations give good time resolution at the onsets of attack transients and good frequency resolution for the sinusoidal components. We estimate the phase values only at peak channels; the remaining phases are then locked to the values of the peaks in an adaptive manner. During attack transients we keep the stretch factor equal to one, and we propose a new strategy for determining which channels are relevant for reinitializing the corresponding phase values. In contrast to previously published algorithms, we use a non-uniform NSGF to obtain a low redundancy of the corresponding TF representation. We show that with just three times as many TF coefficients as signal samples, artifacts such as phasiness and transient smearing can be greatly reduced compared to the classical PV. The proposed algorithm is tested on both synthetic and real-world signals and compared with state-of-the-art algorithms in a reproducible manner. Comment: 10 pages, 6 figures
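
For orientation, the phase step of the classical PV that this abstract builds on can be sketched as follows. This is a minimal illustration under stated assumptions, not the NSGF algorithm of the paper: it uses a fixed-length Hann window and a single DFT channel, and the names `dft_bin`, `princarg`, `instantaneous_freq` and `propagate_phase` are hypothetical helpers introduced here.

```python
import cmath
import math


def dft_bin(frame, k):
    """DFT coefficient k of a frame weighted by a periodic Hann window (assumed window)."""
    n = len(frame)
    acc = 0j
    for i, x in enumerate(frame):
        w = 0.5 - 0.5 * math.cos(2 * math.pi * i / n)
        acc += w * x * cmath.exp(-2j * math.pi * k * i / n)
    return acc


def princarg(phi):
    """Wrap a phase value to the interval [-pi, pi)."""
    return (phi + math.pi) % (2 * math.pi) - math.pi


def instantaneous_freq(frame_a, frame_b, k, hop):
    """Estimate the frequency (rad/sample) in channel k from two frames `hop` samples apart."""
    n = len(frame_a)
    omega_k = 2 * math.pi * k / n  # centre frequency of channel k
    dphi = cmath.phase(dft_bin(frame_b, k)) - cmath.phase(dft_bin(frame_a, k))
    # Deviation of the measured phase advance from the advance expected at omega_k.
    deviation = princarg(dphi - omega_k * hop)
    return omega_k + deviation / hop


def propagate_phase(prev_synth_phase, inst_freq, synth_hop):
    """Advance the synthesis phase by the estimated frequency over the synthesis hop."""
    return princarg(prev_synth_phase + inst_freq * synth_hop)
```

The stretch factor in this sketch is the ratio `synth_hop / hop`; setting the two hops equal corresponds to the stretch factor of one that the abstract applies during attack transients.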

    Time-scale modification of audio and speech signals (Audio- ja puhesignaalien aika-asteikon muuttaminen)

    In audio time-scale modification (TSM), the duration of an audio recording is changed while retaining its local frequency content. In this thesis, a novel phase-vocoder-based technique for TSM was developed, built on the new concept of fuzzy classification of points in the time-frequency representation of an input signal. The points in the time-frequency representation are classified into three signal classes (tonalness, noisiness, and transientness) by assigning each point a continuous truth value for its membership in each class. The information from the classification is used to preserve the distinct nature of these components during modification. The quality of the proposed method was evaluated by means of a listening test. The proposed method scored slightly higher than a state-of-the-art academic TSM technique and on par with a commercial TSM software product. The proposed method is suitable for high-quality TSM of a wide variety of audio and speech signals.

    Phase vocoder and beyond

    For a broad range of sound transformations, quality is measured against a common expectation about the result: if a male voice is to be changed into a female one, there exists a common reference for the perceptual evaluation of the result; the same holds if an instrumental sound is to be made longer or shorter. Following the argument in Röbel, “Between Physics and Perception: Signal Models for High Level Audio Processing”, a fundamental requirement for these transformation algorithms is a signal model that is strongly linked to perceptually relevant physical properties of the sound source. This paper is a short survey of the phase vocoder technique, together with its extensions and improvements relying on appropriate sound models, which have led to high-level audio processing algorithms.

    A tutorial on onset detection in music signals

    Designing Gabor windows using convex optimization

    Redundant Gabor frames admit an infinite number of dual frames, yet only the canonical dual Gabor system, constructed from the minimal l2-norm dual window, is widely used. This window function, however, may lack desirable properties, e.g., good time-frequency concentration, small support, or smoothness. We employ convex optimization methods to design dual windows that satisfy the Wexler-Raz equations while optimizing various constraints. Numerical experiments suggest that alternate dual windows with considerably improved features can be found.
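
As a point of reference for the canonical dual window mentioned above, the sketch below computes it in the so-called painless case, where the window is no longer than the number of frequency channels and the frame operator is therefore diagonal. This is only the baseline that the abstract argues against, not its convex-optimization design. The function name `canonical_dual_painless` and the parameter names `a` (time hop), `M` (number of channels) and `L` (cyclic signal length) are assumptions made for illustration.

```python
import math


def canonical_dual_painless(g, a, M, L):
    """Canonical dual of a real window g on a cyclic grid of length L (painless case only)."""
    # Painless case: window no longer than the number of channels M.
    assert len(g) <= M and L % a == 0 and L % M == 0
    # Diagonal of the frame operator: M * sum_n |g[k - n*a]|^2, indexed cyclically.
    diag = [0.0] * L
    for n in range(L // a):
        for j, gj in enumerate(g):
            diag[(j + n * a) % L] += M * gj * gj
    # Inverting a diagonal operator is a pointwise division.
    return [gj / diag[j] for j, gj in enumerate(g)]
```

With a 16-sample Hann window, `a = 4`, `M = 16` and `L = 32`, the products of the window and its dual, summed over all time shifts and scaled by `M`, equal one at every sample, which is the perfect-reconstruction condition in this special case.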

    Towards Real-Time Non-Stationary Sinusoidal Modelling of Kick and Bass Sounds for Audio Analysis and Modification

    Sinusoidal Modelling is a powerful and flexible parametric method for analysing and processing audio signals. These signals have an underlying structure that modern spectral models aim to exploit by separating the signal into sinusoidal, transient, and noise components. Each of these can then be modelled in a manner most appropriate to that component's inherent structure. The accuracy of the estimated parameters is directly related to the quality of the model's representation of the signal, and the assumptions made about its underlying structure. For sinusoidal models, these assumptions generally affect the non-stationary estimates related to amplitude and frequency modulations, and the type of amplitude change curve. This is especially true when using a single analysis frame in a non-overlapping framework, where biased estimates can result in discontinuities at frame boundaries. It is therefore desirable for such a model to distinguish between the shapes of different amplitude changes and adapt its estimation accordingly. Intra-frame amplitude change can be interpreted as a change in the windowing function applied to a stationary sinusoid, which can be estimated from the derivative of the phase with respect to frequency at magnitude peaks in the DFT spectrum. A method for measuring monotonic linear amplitude change from single-frame estimates using the first-order derivative of the phase with respect to frequency (approximated by the first-order difference) is presented, along with a method of distinguishing between linear and exponential amplitude change. An adaptation of the popular matching pursuit algorithm for refining model parameters in a segmented framework has been investigated using a dictionary composed of sinusoids with parameters varying slightly from model estimates, based on Modelled Pursuit (MoP). Modelling of the residual signal using a segmented undecimated Wavelet Transform (segUWT) is presented. A generalisation of both the forward and inverse transforms is also presented, covering delay compensations and overlap extensions for different lengths of Wavelets and numbers of decomposition levels in an Overlap Save (OLS) implementation that deals with convolution block-based artefacts. This shift-invariant implementation of the DWT is a popular tool for de-noising and shows promising results for the separation of transients from noise.

    Interactive Manipulation of Musical Melody in Audio Recordings

    The objective of this project is to develop an interactive technique to manipulate melody in musical recordings. The proposed methodology is based on the use of melody detection methods combined with the invertible constant Q transform (CQT), which allows a high-quality modification of musical content. This work will consist of several stages, the first of which will focus on monophonic recordings; subsequently we will explore methods to manipulate polyphonic recordings. The long-term objective is to alter the melody of a piece of music in such a way that it may sound similar to another. We have set, as an end goal, to allow users to perform melody manipulation and experiment with their music collection. To achieve this goal, we will devise approaches for high-quality polyphonic melody manipulation, using a dataset of melodic content and mixed audio recordings. To ensure the system's usability, a listening test or user-study evaluation of the algorithm will be performed.

    A Parametric Sound Object Model for Sound Texture Synthesis

    This thesis deals with the analysis and synthesis of sound textures based on parametric sound objects. An overview is provided about the acoustic and perceptual principles of textural acoustic scenes, and technical challenges for analysis and synthesis are considered. Four essential processing steps for sound texture analysis are identified, and existing sound texture systems are reviewed, using the four-step model as a guideline. A theoretical framework for analysis and synthesis is proposed. A parametric sound object synthesis (PSOS) model is introduced, which is able to describe individual recorded sounds through a fixed set of parameters. The model, which applies to harmonic and noisy sounds, is an extension of spectral modeling and uses spline curves to approximate spectral envelopes, as well as the evolution of parameters over time. In contrast to standard spectral modeling techniques, this representation uses the concept of objects instead of concatenated frames, and it provides a direct mapping between sounds of different length. Methods for automatic and manual conversion are shown. An evaluation is presented in which the ability of the model to encode a wide range of different sounds has been examined. Although there are aspects of sounds that the model cannot accurately capture, such as polyphony and certain types of fast modulation, the results indicate that high quality synthesis can be achieved for many different acoustic phenomena, including instruments and animal vocalizations. In contrast to many other forms of sound encoding, the parametric model facilitates various techniques of machine learning and intelligent processing, including sound clustering and principal component analysis. Strengths and weaknesses of the proposed method are reviewed, and possibilities for future development are discussed.