A Phase Vocoder based on Nonstationary Gabor Frames

Author: Dörfler Monika
Ottosen Emil Solsbæk
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2016

We propose a new algorithm for time stretching music signals based on the theory of nonstationary Gabor frames (NSGFs). The algorithm extends the techniques of the classical phase vocoder (PV) by incorporating adaptive time-frequency (TF) representations and adaptive phase locking. The adaptive TF representations imply good time resolution for the onsets of attack transients and good frequency resolution for the sinusoidal components. We estimate the phase values only at peak channels, and the remaining phases are then locked to the values of the peaks in an adaptive manner. During attack transients we keep the stretch factor equal to one, and we propose a new strategy for determining which channels are relevant for reinitializing the corresponding phase values. In contrast to previously published algorithms, we use a non-uniform NSGF to obtain a low redundancy of the corresponding TF representation. We show that with just three times as many TF coefficients as signal samples, artifacts such as phasiness and transient smearing can be greatly reduced compared to the classical PV. The proposed algorithm is tested on both synthetic and real-world signals and compared with state-of-the-art algorithms in a reproducible manner. Comment: 10 pages, 6 figures

arXiv.org e-Print Archive

VBN
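
For orientation, the classical phase vocoder that the abstract above takes as its baseline can be sketched roughly as follows. This is a minimal illustration only, not the NSGF-based algorithm of the paper: it uses a uniform STFT, applies no phase locking and no transient handling, and the function name `pv_time_stretch` and the parameters `n_fft` and `hop_a` are assumptions chosen for this example.

```python
import numpy as np

def pv_time_stretch(x, stretch, n_fft=1024, hop_a=256):
    """Classical phase-vocoder time stretch (illustrative sketch only)."""
    hop_s = int(round(hop_a * stretch))            # synthesis hop size
    win = np.hanning(n_fft)                        # analysis and synthesis window
    n_frames = 1 + (len(x) - n_fft) // hop_a
    # Expected phase advance over one analysis hop at each bin centre frequency
    omega = 2 * np.pi * np.arange(n_fft // 2 + 1) * hop_a / n_fft
    y = np.zeros(n_fft + hop_s * (n_frames - 1))
    norm = np.zeros_like(y)                        # accumulated squared window
    prev_phase = None
    acc_phase = None
    for m in range(n_frames):
        frame = x[m * hop_a : m * hop_a + n_fft] * win
        spec = np.fft.rfft(frame)
        mag, phase = np.abs(spec), np.angle(spec)
        if m == 0:
            acc_phase = phase.copy()               # start from the analysis phase
        else:
            # Deviation from the expected advance, wrapped to [-pi, pi)
            dphi = phase - prev_phase - omega
            dphi = (dphi + np.pi) % (2 * np.pi) - np.pi
            inst_freq = (omega + dphi) / hop_a     # radians per sample
            acc_phase = acc_phase + hop_s * inst_freq
        prev_phase = phase
        out = np.fft.irfft(mag * np.exp(1j * acc_phase), n=n_fft) * win
        start = m * hop_s
        y[start:start + n_fft] += out              # weighted overlap-add
        norm[start:start + n_fft] += win ** 2
    return y / np.maximum(norm, 1e-8)
```

Calling `pv_time_stretch(x, 1.5)` on a mono signal `x` returns a signal roughly 1.5 times as long with the same local frequency content. Because every channel's phase is propagated independently and transients are stretched like everything else, a sketch of this kind is prone to the phasiness and transient smearing that the abstract sets out to reduce.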

Audio- ja puhesignaalien aika-asteikon muuttaminen (Time-Scale Modification of Audio and Speech Signals)

Author: Damskägg Eero-Pekka
Publication date: 12/02/2018

In audio time-scale modification (TSM), the duration of an audio recording is changed while retaining its local frequency content. In this thesis, a novel phase-vocoder-based technique for TSM was developed, built on the new concept of fuzzy classification of points in the time-frequency representation of an input signal. The points are classified into three signal classes (tonalness, noisiness, and transientness) by assigning each point a continuous truth value for its membership in each class. The information from the classification is used to preserve the distinct nature of these components during modification. The quality of the proposed method was evaluated by means of a listening test. The proposed method scored slightly higher than a state-of-the-art academic TSM technique and on par with commercial TSM software. The proposed method is suitable for high-quality TSM of a wide variety of audio and speech signals.

Aaltodoc Publication Archive

Phase vocoder and beyond

Author: Liuni Marco
Röbel Axel
Publication venue: Musica/Tecnologia
Publication date: 05/08/2013

For a broad range of sound transformations, quality is measured according to the common expectation about the result: if a male voice has to be changed into a female one, there exists a common reference for the perceptual evaluation of the result; the same holds if an instrumental sound has to be made longer or shorter. Following the argument in Röbel, “Between Physics and Perception: Signal Models for High Level Audio Processing”, a fundamental requirement for these transformation algorithms is their need for signal models that are strongly linked to perceptually relevant physical properties of the sound source. This paper is a short survey of the phase vocoder technique, together with its extensions and improvements relying on appropriate sound models, which have led to high-level audio processing algorithms.

Firenze University Press: E-Journals

Modulation vocoder for analysis, processing and synthesis of audio signals with application to frequency selective pitch transposition

Author: Disch Sascha
Publication venue: Hannover : Gottfried Wilhelm Leibniz Universität Hannover
Publication date: 01/01/2011

[no abstract]

Fraunhofer-ePrints

Institutionelles Repositorium der Leibniz Universität Hannover

A tutorial on onset detection in music signals

Author: C. Duxbury
J.P. Bello
L. Daudet
M. Davies
M.B. Sandler
S. Abdallah
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'

Crossref

Designing Gabor windows using convex optimization

Author: Balazs Peter
Holighaus Nicki
Perraudin Nathanaël
Søndergaard Peter L.
Publication date: 11/04/2018

Redundant Gabor frames admit an infinite number of dual frames, yet only the canonical dual Gabor system, constructed from the minimal l2-norm dual window, is widely used. This window function, however, might lack desirable properties such as good time-frequency concentration, small support, or smoothness. We employ convex optimization methods to design dual windows satisfying the Wexler-Raz equations and optimizing various constraints. Numerical experiments suggest that alternate dual windows with considerably improved features can be found.

arXiv.org e-Print Archive

Infoscience - École polytechnique fédérale de Lausanne

Towards Real-Time Non-Stationary Sinusoidal Modelling of Kick and Bass Sounds for Audio Analysis and Modification

Author: Murray John Stuart
Publication date: 01/09/2022

Sinusoidal modelling is a powerful and flexible parametric method for analysing and processing audio signals. These signals have an underlying structure that modern spectral models aim to exploit by separating the signal into sinusoidal, transient, and noise components. Each of these can then be modelled in the manner most appropriate to that component's inherent structure. The accuracy of the estimated parameters is directly related to the quality of the model's representation of the signal and to the assumptions made about its underlying structure. For sinusoidal models, these assumptions generally affect the non-stationary estimates related to amplitude and frequency modulations, and the type of amplitude change curve. This is especially true when using a single analysis frame in a non-overlapping framework, where biased estimates can result in discontinuities at frame boundaries. It is therefore desirable for such a model to distinguish between the shapes of different amplitude changes and adapt its estimation accordingly. Intra-frame amplitude change can be interpreted as a change in the windowing function applied to a stationary sinusoid, which can be estimated from the derivative of the phase with respect to frequency at magnitude peaks in the DFT spectrum. A method for measuring monotonic linear amplitude change from single-frame estimates using the first-order derivative of the phase with respect to frequency (approximated by the first-order difference) is presented, along with a method of distinguishing between linear and exponential amplitude change. An adaptation of the popular matching pursuit algorithm for refining model parameters in a segmented framework has been investigated, using a dictionary composed of sinusoids with parameters varying slightly from the model estimates, based on Modelled Pursuit (MoP). Modelling of the residual signal using a segmented undecimated Wavelet Transform (segUWT) is presented.
A generalisation of both the forward and inverse transforms is presented, covering delay compensation and overlap extension for different wavelet lengths and numbers of decomposition levels in an Overlap Save (OLS) implementation that deals with block-based convolution artefacts. This shift-invariant implementation of the DWT is a popular tool for de-noising and shows promising results for the separation of transients from noise.

White Rose E-theses Online

Interactive Manipulation of Musical Melody in Audio Recordings

Author: Miguel Miranda Guedes da Rocha e Silva
Publication date: 12/07/2017

The objective of this project is to develop an interactive technique to manipulate melody in musical recordings. The proposed methodology is based on melody detection methods combined with the invertible constant Q transform (CQT), which allows high-quality modification of musical content. The work will consist of several stages: the first will focus on monophonic recordings, and subsequent stages will explore methods to manipulate polyphonic recordings. The long-term objective is to alter the melody of a piece of music in such a way that it may sound similar to another. We have set, as an end goal, to allow users to perform melody manipulation and experiment with their music collection. To achieve this goal, we will devise approaches for high-quality polyphonic melody manipulation, using a dataset of melodic content and mixed audio recordings. To ensure the system's usability, a listening test or user-study evaluation of the algorithm will be performed.

Repositório Aberto da Universidade do Porto

A Parametric Sound Object Model for Sound Texture Synthesis

Author: Möhlmann Daniel
Publication date: 01/01/2011

This thesis deals with the analysis and synthesis of sound textures based on parametric sound objects. An overview is provided about the acoustic and perceptual principles of textural acoustic scenes, and technical challenges for analysis and synthesis are considered. Four essential processing steps for sound texture analysis are identified, and existing sound texture systems are reviewed, using the four-step model as a guideline. A theoretical framework for analysis and synthesis is proposed. A parametric sound object synthesis (PSOS) model is introduced, which is able to describe individual recorded sounds through a fixed set of parameters. The model, which applies to harmonic and noisy sounds, is an extension of spectral modeling and uses spline curves to approximate spectral envelopes, as well as the evolution of parameters over time. In contrast to standard spectral modeling techniques, this representation uses the concept of objects instead of concatenated frames, and it provides a direct mapping between sounds of different length. Methods for automatic and manual conversion are shown. An evaluation is presented in which the ability of the model to encode a wide range of different sounds has been examined. Although there are aspects of sounds that the model cannot accurately capture, such as polyphony and certain types of fast modulation, the results indicate that high quality synthesis can be achieved for many different acoustic phenomena, including instruments and animal vocalizations. In contrast to many other forms of sound encoding, the parametric model facilitates various techniques of machine learning and intelligent processing, including sound clustering and principal component analysis. Strengths and weaknesses of the proposed method are reviewed, and possibilities for future development are discussed.

E-LIB Dokumentserver - Staats und Universitätsbibliothek Bremen