81 research outputs found
Time-Varying Quasi-Closed-Phase Analysis for Accurate Formant Tracking in Speech Signals
In this paper, we propose a new method for the accurate estimation and
tracking of formants in speech signals using time-varying quasi-closed-phase
(TVQCP) analysis. Conventional formant tracking methods typically adopt a
two-stage estimate-and-track strategy wherein an initial set of formant
candidates are estimated using short-time analysis (e.g., 10--50 ms), followed
by a tracking stage based on dynamic programming or a linear state-space model.
One of the main disadvantages of these approaches is that the tracking stage,
however good it may be, cannot improve upon the formant estimation accuracy of
the first stage. The proposed TVQCP method provides a single-stage formant
tracking that combines the estimation and tracking stages into one. TVQCP
analysis combines three approaches to improve formant estimation and tracking:
(1) it uses temporally weighted quasi-closed-phase analysis to derive
closed-phase estimates of the vocal tract with reduced interference from the
excitation source, (2) it increases the residual sparsity by using the
optimization and (3) it uses time-varying linear prediction analysis over long
time windows (e.g., 100--200 ms) to impose a continuity constraint on the vocal
tract model and hence on the formant trajectories. Formant tracking experiments
with a wide variety of synthetic and natural speech signals show that the
proposed TVQCP method performs better than conventional and popular formant
tracking tools, such as Wavesurfer and Praat (based on dynamic programming),
the KARMA algorithm (based on Kalman filtering), and DeepFormants (based on
deep neural networks trained in a supervised manner). Matlab scripts for the
proposed method can be found at: https://github.com/njaygowda/ftrac
Computationally efficient music synthesis : methods and sound design
Tässä diplomityössä esitetään musiikkisyntetisaattorin suunnittelua systeemille, jonka laskentateho ja muistikapasiteetti ovat rajoitettuja. Ensiksi kerrataan mahdollisia synteesitekniikoita sekä arvioidaan niiden käyttökelpoisuutta laskennallisesti tehokkaassa musiikkisynteesissä. Käytännössä käyttökelpoiset tekniikat ovat lisäävä ja lähde-suodinsynteesit, ja erikoistapauksissa taajuusmodulaatio-, aaltotaulukko- ja samplaussynteesit.
Tämän jälkeen käyttökelpoisten tekniikoiden rakenteiden suunnittelua esitetään tarkemmin, sekä esitetään näiden rakenteiden ominaisuuksia ja suunnitteluongelmia. Suurin ongelma kohdataan digitaalisessa lähde-suodinsynteesissä, jossa klassisten aaltomuotojen, kuten saha-aallon käyttö lähdesignaalina on ongelmallista laskostumisen takia, joka johtuu aaltomuodossa olevista epäjatkuvuuksista. Olemassa olevia kaistarajoitettuja aaltomuotosynteesimenetelmiä kerrataan, ja polynomimuotoiseen kaistarajoitetuun askelfunktioon perustuvaa menetelmää esitellään tarkemmin antamalla suunnittelusääntöjä käyttökelpoisille polynomeille. Menetelmää testataan lisäksi kahdella kolmannen asteen polynomilla. Nämä polynomit vähentävät laskostumista korkeilla taajuuksilla enemmän verrattuna ensimmäisen asteen polynomiin, mutta pienillä taajuksilla ensimmäisen asteen polynomi tuottaa parempia tuloksia. Lisäksi kerrataan muita mahdollisia ääniefektialgoritmeja ja arvioidaan niiden käyttökelpoisuutta laskennallisesti tehokkaassa musiikkisynteesissä.
Useasti äänisynteesisysteemin täytyy pystyä generoimaan musiikkia, jossa käytetään monia erilaisia ääniä, jotka ulottuvat oikeista akustisista soittimista elektronisiin soittimiin ja luonnon ääniin. Siksi tällainen systeemi tarvitsee huolellista äänten suunnittelua. Tässä diplomityössä esitetään suunnittelusääntöjä erilaisten äänien imitoimiseksi. Lisäksi esitellään synteesimenetelmien parametrien vaikutus äänivarianttien suunnitteluun.In this thesis, the design of a music synthesizer for systems suffering from limitations in computing power and memory capacity is presented. First, different possible synthesis techniques are reviewed and their applicability in computationally efficient music synthesis is discussed. In practice, the applicable techniques are limited to additive and source-filter synthesis, and, in special cases, to frequency modulation, wavetable and sampling synthesis.
Next, the design of the structures of the applicable techniques are presented in detail, and properties and design issues of these structures are discussed. A major implementation problem is raised in digital source-filter synthesis, where the use of classic waveforms, such as sawtooth wave, as the source signal is challenging due to aliasing caused by waveform discontinuities. Methods for existing bandlimited waveform synthesis are reviewed, and a new approach using polynomial bandlimited step function is presented in detail with design rules for the applicable polynomials. The approach is also tested with two different third-order polynomials. They reduce aliasing more at high frequencies, but at low frequencies their performance is worse than with the first-order polynomial. In addition, some commonly used sound effect algorithms are reviewed with respect to their applicability in computationally efficient music synthesis.
In many cases the sound synthesis system must be capable of producing music consisting of various different sounds ranging from real acoustic instruments to electronic instruments and sounds from nature. Therefore, the music synthesis system requires careful sound design. In this thesis, sound design rules for imitation of various sounds using the computationally efficient synthesis techniques are presented. In addition, the effects of the parameter variation for the design of sound variants are presented
Developing a flexible and expressive realtime polyphonic wave terrain synthesis instrument based on a visual and multidimensional methodology
The Jitter extended library for Max/MSP is distributed with a gamut of tools for the generation, processing, storage, and visual display of multidimensional data structures. With additional support for a wide range of media types, and the interaction between these mediums, the environment presents a perfect working ground for Wave Terrain Synthesis. This research details the practical development of a realtime Wave Terrain Synthesis instrument within the Max/MSP programming environment utilizing the Jitter extended library. Various graphical processing routines are explored in relation to their potential use for Wave Terrain Synthesis
A Parametric Sound Object Model for Sound Texture Synthesis
This thesis deals with the analysis and synthesis of sound textures based on parametric sound objects. An overview is provided about the acoustic and perceptual principles of textural acoustic scenes, and technical challenges for analysis and synthesis are considered. Four essential processing steps for sound texture analysis are identifi ed, and existing sound texture systems are reviewed, using the four-step model as a guideline. A theoretical framework for analysis and synthesis is proposed. A parametric sound object synthesis (PSOS) model is introduced, which is able to describe individual recorded sounds through a fi xed set of parameters. The model, which applies to harmonic and noisy sounds, is an extension of spectral modeling and uses spline curves to approximate spectral envelopes, as well as the evolution of parameters over time. In contrast to standard spectral modeling techniques, this representation uses the concept of objects instead of concatenated frames, and it provides a direct mapping between sounds of diff erent length. Methods for automatic and manual conversion are shown. An evaluation is presented in which the ability of the model to encode a wide range of di fferent sounds has been examined. Although there are aspects of sounds that the model cannot accurately capture, such as polyphony and certain types of fast modulation, the results indicate that high quality synthesis can be achieved for many different acoustic phenomena, including instruments and animal vocalizations. In contrast to many other forms of sound encoding, the parametric model facilitates various techniques of machine learning and intelligent processing, including sound clustering and principal component analysis. Strengths and weaknesses of the proposed method are reviewed, and possibilities for future development are discussed
Astronomical component estimation (ACE v.1) by time-variant sinusoidal modeling
Accurately deciphering periodic variations in paleoclimate proxy signals is essential for cyclostratigraphy. Classical spectral analysis often relies on methods based on (fast) Fourier transformation. This technique has no unique solution separating variations in amplitude and frequency. This characteristic can make it difficult to correctly interpret a proxy's power spectrum or to accurately evaluate simultaneous changes in amplitude and frequency in evolutionary analyses. This drawback is circumvented by using a polynomial approach to estimate instantaneous amplitude and frequency in orbital components. This approach was proven useful to characterize audio signals (music and speech), which are non-stationary in nature. Paleoclimate proxy signals and audio signals share similar dynamics; the only difference is the frequency relationship between the different components. A harmonic-frequency relationship exists in audio signals, whereas this relation is non-harmonic in paleoclimate signals. However, this difference is irrelevant for the problem of separating simultaneous changes in amplitude and frequency. Using an approach with overlapping analysis frames, the model (Astronomical Component Estimation, version 1: ACE v.1) captures time variations of an orbital component by modulating a stationary sinusoid centered at its mean frequency, with a single polynomial. Hence, the parameters that determine the model are the mean frequency of the orbital component and the polynomial coefficients. The first parameter depends on geologic interpretations, whereas the latter are estimated by means of linear least-squares. As output, the model provides the orbital component waveform, either in the depth or time domain. Uncertainty analyses of the model estimates are performed using Monte Carlo simulations. Furthermore, it allows for a unique decomposition of the signal into its instantaneous amplitude and frequency. Frequency modulation patterns reconstruct changes in accumulation rate, whereas amplitude modulation identifies eccentricity-modulated precession. The functioning of the time-variant sinusoidal model is illustrated and validated using a synthetic insolation signal. The new modeling approach is tested on two case studies: (1) a Pliocene-Pleistocene benthic delta O-18 record from Ocean Drilling Program (ODP) Site 846 and (2) a Danian magnetic susceptibility record from the Contessa Highway section, Gubbio, Italy
Hidden Markov model based Finnish text-to-speech system utilizing glottal inverse filtering
Tässä työssä esitetään uusi Markovin piilomalleihin (hidden Markov model, HMM) perustuva äänilähteen käänteissuodatusta hyödyntävä suomenkielinen puhesynteesijärjestelmä. Uuden puhesynteesimenetelmän päätavoite on tuottaa luonnolliselta kuulostavaa synteettistä puhetta, jonka ominaisuuksia voidaan muuttaa eri puhujien, puhetyylien tai jopa äänen emootiosisällön mukaan. Näiden tavoitteiden mahdollistamiseksi uudessa puhesynteesimenetelmässä mallinnetaan ihmisen äänentuottojärjestelmää äänilähteen käänteissuodatuksen ja HMM-mallinnuksen avulla.
Uusi puhesynteesijärjestelmä hyödyntää äänilähteen käänteissuodatusmenetelmää, joka mahdollistaa äänilähteen ominaisuuksien parametrisoinnin erillään muista puheen parametreista, ja siten näiden parametrien mallintamisen erikseen HMM-järjestelmässä. Synteesivaiheessa luonnollisesta puheesta laskettuja glottispulsseja käytetään äänilähteen luomiseen, ja äänilähteen ominaisuuksia muokataan edelleen tilastollisen HMM-järjestelmän tuottaman parametrisen kuvauksen avulla, mikä imitoi oikeassa puheessa esiintyvää luonnollista äänilähteen ominaisuuksien vaihtelua.
Subjektiivisten kuuntelukokeiden tulokset osoittavat, että uuden puhesynteesimenetelmän laatu on huomattavasti parempi verrattuna perinteiseen HMM-pohjaiseen puhesynteesijärjestelmään. Lisäksi tulokset osoittavat, että uusi puhesynteesimenetelmä pystyy tuottamaan luonnolliselta kuulostavaa puhetta eri puhujien ominaisuuksilla.In this work, a new hidden Markov model (HMM) based text-to-speech (TTS) system utilizing glottal inverse filtering is described. The primary goal of the new TTS system is to enable producing natural sounding synthetic speech in different speaking styles with different speaker characteristics and emotions. In order to achieve these goals, the function of the real human voice production mechanism is modeled with the help of glottal inverse filtering embedded in a statistical framework of HMM.
The new TTS system uses a glottal inverse filtering based parametrization method that enables the extraction of voice source characteristics separate from other speech parameters, and thus the individual modeling of these characteristics in the HMM system. In the synthesis stage, natural glottal flow pulses are used for creating the voice source, and the voice source characteristics are further modified according to the adaptive all-pole model generated by the HMM system in order to imitate the natural variation in the real voice source.
Subjective listening tests show that the quality of the new TTS system is considerably better compared to a traditional HMM-based speech synthesizer. Moreover, the new system is clearly able to produce natural sounding synthetic speech with specific speaker characteristics
A computational framework for sound segregation in music signals
Tese de doutoramento. Engenharia Electrotécnica e de Computadores. Faculdade de Engenharia. Universidade do Porto. 200
Designing sound : procedural audio research based on the book by Andy Farnell
In
procedural
media,
data
normally
acquired
by
measuring
something,
commonly
described
as
sampling,
is
replaced
by
a
set
of
computational
rules
(procedure)
that
defines
the
typical
structure
and/or
behaviour
of
that
thing.
Here,
a
general
approach
to
sound
as
a
definable
process,
rather
than
a
recording,
is
developed.
By
analysis
of
their
physical
and
perceptual
qualities,
natural
objects
or
processes
that
produce
sound
are
modelled
by
digital
Sounding
Objects
for
use
in
arts
and
entertainments.
This
Thesis
discusses
different
aspects
of
Procedural
Audio
introducing
several
new
approaches
and
solutions
to
this
emerging
field
of
Sound
Design.Em
Media
Procedimental,
os
dados
os
dados
normalmente
adquiridos
através
da
medição
de
algo
habitualmente
designado
como
amostragem,
são
substituídos
por
um
conjunto
de
regras
computacionais
(procedimento)
que
definem
a
estrutura
típica,
ou
comportamento,
desse
elemento.
Neste
caso
é
desenvolvida
uma
abordagem
ao
som
definível
como
um
procedimento
em
vez
de
uma
gravação.
Através
da
análise
das
suas
características
físicas
e
perceptuais
,
objetos
naturais
ou
processos
que
produzem
som,
são
modelados
como
objetos
sonoros
digitais
para
utilização
nas
Artes
e
Entretenimento.
Nesta
Tese
são
discutidos
diferentes
aspectos
de
Áudio
Procedimental,
sendo
introduzidas
várias
novas
abordagens
e
soluções
para
o
campo
emergente
do
Design
Sonoro
- …