Randomized Signal Processing with Continuous Frames
This paper focuses on signal processing tasks in which the signal is
transformed from the signal space to a higher dimensional phase space using a
continuous frame, processed in this space, and synthesized to an output signal.
For example, in a phase vocoder method, an audio signal is transformed to the
time-frequency plane via the short time Fourier transform, manipulated there,
and synthesized to an output audio signal. We show how to approximate such
methods, termed phase space signal processing methods, using a Monte Carlo
method. The Monte Carlo method speeds up computations, since the number of
samples required for a certain accuracy is proportional to the dimension of the
signal space, and not to the dimension of phase space, which is typically
higher. We utilize this property for a new phase vocoder method, based on an
enhanced time-frequency space, with more dimensions than the classical method.
The higher dimension of phase space improves the quality of the method, while
retaining the computational complexity of a standard phase vocoder based on
regular samples.
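The Monte Carlo idea in the abstract above can be illustrated with a finite-dimensional toy case. This is only a sketch under stated assumptions, not the paper's method: the continuous frame is replaced by a discrete harmonic tight frame for R^2, the phase-space processing step is the identity, and the names `harmonic_frame`, `analyze`, `synthesize_full` and `synthesize_monte_carlo` are hypothetical.

```python
import math
import random

DIM = 2  # dimension of the toy signal space (assumption: R^2)


def harmonic_frame(n):
    """n unit vectors evenly spaced on the circle: a tight frame for R^2 when n >= 3."""
    return [(math.cos(2.0 * math.pi * k / n), math.sin(2.0 * math.pi * k / n))
            for k in range(n)]


def analyze(signal, frame):
    """Map the signal to 'phase space': one inner product per frame vector."""
    return [signal[0] * f[0] + signal[1] * f[1] for f in frame]


def synthesize_full(coeffs, frame):
    """Exact synthesis: sum over every frame vector, scaled by DIM / n."""
    n = len(frame)
    x = sum(c * f[0] for c, f in zip(coeffs, frame))
    y = sum(c * f[1] for c, f in zip(coeffs, frame))
    return (DIM * x / n, DIM * y / n)


def synthesize_monte_carlo(coeffs, frame, num_samples, rng):
    """Approximate synthesis: average over uniformly sampled frame vectors."""
    x = y = 0.0
    for _ in range(num_samples):
        k = rng.randrange(len(frame))  # random phase-space sample
        x += coeffs[k] * frame[k][0]
        y += coeffs[k] * frame[k][1]
    return (DIM * x / num_samples, DIM * y / num_samples)
```

In this toy setting the Monte Carlo estimate is unbiased, and its error depends on the number of random samples rather than on the number of frame vectors, which is the property the abstract exploits.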
Time-scale modification of audio and speech signals (Audio- ja puhesignaalien aika-asteikon muuttaminen)
In audio time-scale modification (TSM), the duration of an audio recording is changed while retaining its local frequency content. In this thesis, a novel phase-vocoder-based technique for TSM was developed, built on the new concept of fuzzy classification of points in the time-frequency representation of an input signal. Each point is assigned a continuous degree of membership in three signal classes: tonalness, noisiness, and transientness. The information from the classification is used to preserve the distinct nature of these components during modification. The quality of the proposed method was evaluated by means of a listening test. The proposed method scored slightly higher than a state-of-the-art academic TSM technique and similarly to a commercial TSM software product. The proposed method is suitable for high-quality TSM of a wide variety of audio and speech signals.
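One plausible way to obtain fuzzy class memberships for time-frequency points is sketched below. The abstract does not say how the classification is computed, so everything here is an assumption: median filtering along time and along frequency as the measure, the window width, the formula for noisiness, and the names `median_filter` and `fuzzy_classify`.

```python
from statistics import median


def median_filter(values, width):
    """Running median with a window that is truncated at the edges."""
    half = width // 2
    return [median(values[max(0, i - half):min(len(values), i + half + 1)])
            for i in range(len(values))]


def fuzzy_classify(mag, width=3):
    """mag[m][k] is the magnitude at frame m, bin k.

    Returns three grids (tonalness, noisiness, transientness) with values in [0, 1].
    """
    frames, bins = len(mag), len(mag[0])
    # Smoothing along time keeps horizontal ridges (steady partials).
    along_time = [median_filter([mag[m][k] for m in range(frames)], width)
                  for k in range(bins)]          # indexed [k][m]
    # Smoothing along frequency keeps vertical ridges (broadband onsets).
    along_freq = [median_filter(mag[m], width) for m in range(frames)]  # [m][k]

    tonal = [[0.0] * bins for _ in range(frames)]
    noise = [[0.0] * bins for _ in range(frames)]
    trans = [[0.0] * bins for _ in range(frames)]
    for m in range(frames):
        for k in range(bins):
            s, t = along_time[k][m], along_freq[m][k]
            total = s + t
            r_tonal = s / total if total > 0 else 0.5  # undecided when empty
            r_trans = 1.0 - r_tonal
            tonal[m][k] = r_tonal
            trans[m][k] = r_trans
            # Noisiness is highest when neither direction dominates.
            noise[m][k] = 1.0 - abs(r_tonal - r_trans)
    return tonal, noise, trans
```

A steady partial (a horizontal ridge) gets a tonalness of 1 at the ridge, an isolated broadband click gets a transientness of 1, and a flat spectrum is classified as fully noisy.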
A transient-preserving audio time-stretching algorithm and a real-time realization for a commercial music product
The core of this work is a sub-band transient detection and preservation scheme based on complex-domain transient detection and inspired by Robel's work. The proposed technique can be integrated into a real-time phase vocoder analysis/synthesis scheme without introducing latency and at relatively low computational cost.
Computationally efficient music synthesis : methods and sound design
In this thesis, the design of a music synthesizer for systems with limited computing power and memory capacity is presented. First, the possible synthesis techniques are reviewed and their applicability to computationally efficient music synthesis is discussed. In practice, the applicable techniques are limited to additive and source-filter synthesis and, in special cases, to frequency modulation, wavetable and sampling synthesis.
Next, the design of the structures of the applicable techniques is presented in detail, and the properties and design issues of these structures are discussed. A major implementation problem arises in digital source-filter synthesis, where the use of classic waveforms, such as the sawtooth wave, as the source signal is challenging due to aliasing caused by waveform discontinuities. Existing methods for bandlimited waveform synthesis are reviewed, and a new approach using a polynomial bandlimited step function is presented in detail, with design rules for the applicable polynomials. The approach is also tested with two different third-order polynomials. These reduce aliasing more than the first-order polynomial at high frequencies, but at low frequencies they perform worse than the first-order polynomial. In addition, some commonly used sound effect algorithms are reviewed with respect to their applicability to computationally efficient music synthesis.
In many cases, the sound synthesis system must be capable of producing music consisting of many different sounds, ranging from real acoustic instruments to electronic instruments and sounds from nature. Therefore, the music synthesis system requires careful sound design. In this thesis, sound design rules for imitating various sounds using the computationally efficient synthesis techniques are presented. In addition, the effects of parameter variation on the design of sound variants are presented.
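The polynomial bandlimited step approach mentioned above can be illustrated with the widely used two-sample polynomial correction for a sawtooth oscillator. This is a minimal sketch, not necessarily any of the polynomials examined in the thesis: the quadratic correction, the function names `poly_blep` and `sawtooth`, and the parameter names are assumptions.

```python
def poly_blep(phase, step):
    """Two-sample polynomial correction around a unit-phase wrap.

    phase: oscillator phase in [0, 1); step: phase increment per sample.
    """
    if phase < step:                 # sample just after the discontinuity
        x = phase / step
        return x + x - x * x - 1.0
    if phase > 1.0 - step:           # sample just before the discontinuity
        x = (phase - 1.0) / step
        return x * x + x + x + 1.0
    return 0.0                       # far from the discontinuity: no change


def sawtooth(freq, sample_rate, num_samples):
    """Sawtooth in [-1, 1] with the step at each phase wrap smoothed."""
    step = freq / sample_rate
    phase = 0.0
    out = []
    for _ in range(num_samples):
        naive = 2.0 * phase - 1.0    # trivial (aliasing) sawtooth
        out.append(naive - poly_blep(phase, step))
        phase += step
        if phase >= 1.0:
            phase -= 1.0
    return out
```

Only the two samples adjacent to each discontinuity are modified, so the cost per sample stays close to that of the trivial sawtooth, which suits the limited-resource setting the thesis targets.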
Independent formant and pitch control applied to singing voice
Thesis (MScIng)--University of Stellenbosch, 2004.
A singing voice can be manipulated artificially by means of a digital computer in order
to create new melodies or to correct existing ones. When the fundamental frequency
of an audio signal that represents a human voice is changed by simple algorithms,
the formants of the voice tend to move to new frequency locations, making the voice sound unnatural.
The main purpose of this thesis is to design a technique by which the pitch and formants of a
singing voice can be controlled independently.
Statistical models for natural sounds
It is important to understand the rich structure of natural sounds in order to solve important
tasks, like automatic speech recognition, and to understand auditory processing
in the brain. This thesis takes a step in this direction by characterising the statistics of
simple natural sounds. We focus on the statistics because perception often appears to
depend on them, rather than on the raw waveform. For example, the perception of auditory
textures, like running water, wind, fire and rain, depends on summary statistics,
like the rate of falling rain droplets, rather than on the exact details of the physical
source.
In order to analyse the statistics of sounds accurately it is necessary to improve a
number of traditional signal processing methods, including those for amplitude demodulation,
time-frequency analysis, and sub-band demodulation. These estimation tasks
are ill-posed and therefore it is natural to treat them as Bayesian inference problems.
The new probabilistic versions of these methods have several advantages. For example,
they perform more accurately on natural signals, are more robust to noise,
can fill in missing sections of data, and provide error bars. Furthermore,
free parameters can be learned from the signal. Using these new algorithms, we demonstrate
that the energy, sparsity, modulation depth and modulation time-scale in each
sub-band of a signal are critical statistics, together with the dependencies between the
sub-band modulators. To validate this claim, a model containing co-modulated
coloured noise carriers is shown to be capable of generating a range of realistic-sounding
auditory textures.
Finally, we explore the connection between the statistics of natural sounds and perception.
We demonstrate that inference in the model for auditory textures qualitatively
replicates the primitive grouping rules that listeners use to understand simple acoustic
scenes. This suggests that the auditory system is optimised for the statistics of natural
sounds.
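The idea of co-modulated coloured noise carriers from the abstract above can be sketched generatively as follows. This is a toy illustration, not the thesis's model: the two-pole resonator carriers, the AR(1) modulators, the exponential envelope, all parameter values, and the names `bandpass_noise`, `slow_process` and `texture` are assumptions.

```python
import math
import random


def bandpass_noise(n, centre, radius, rng):
    """Coloured noise: white Gaussian noise through a two-pole resonator.

    centre is a normalised frequency in cycles per sample (0 < centre < 0.5);
    radius < 1 keeps the filter stable.
    """
    a1 = 2.0 * radius * math.cos(2.0 * math.pi * centre)
    a2 = -radius * radius
    y1 = y2 = 0.0
    out = []
    for _ in range(n):
        y = a1 * y1 + a2 * y2 + rng.gauss(0.0, 1.0)
        out.append(y)
        y2, y1 = y1, y
    return out


def slow_process(n, smoothness, rng):
    """Slowly varying AR(1) Gaussian process with unit stationary variance."""
    scale = math.sqrt(1.0 - smoothness * smoothness)
    x = rng.gauss(0.0, 1.0)
    out = []
    for _ in range(n):
        out.append(x)
        x = smoothness * x + scale * rng.gauss(0.0, 1.0)
    return out


def texture(n, centres, shared_weight=0.8, seed=0):
    """Sum of sub-band noise carriers whose envelopes share a common component."""
    rng = random.Random(seed)
    shared = slow_process(n, 0.999, rng)       # modulation common to all bands
    mix = [0.0] * n
    for centre in centres:
        own = slow_process(n, 0.999, rng)      # band-specific modulation
        carrier = bandpass_noise(n, centre, 0.98, rng)
        for i in range(n):
            log_env = shared_weight * shared[i] + (1.0 - shared_weight) * own[i]
            mix[i] += math.exp(log_env) * carrier[i]  # positive envelope
    return mix
```

Raising `shared_weight` makes the sub-band envelopes rise and fall together, while lowering it makes them more independent. That is one simple way to vary the dependency between modulators that the abstract identifies as a critical statistic.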