Audio Analysis/synthesis System
A method and apparatus for the automatic analysis, synthesis and modification of audio signals, based on an overlap-add sinusoidal model, is disclosed. Automatic analysis of amplitude, frequency and phase parameters of the model is achieved using an analysis-by-synthesis procedure which incorporates successive approximation, yielding synthetic waveforms which are very good approximations to the original waveforms and are perceptually identical to the original sounds. A generalized overlap-add sinusoidal model is introduced which can modify audio signals without objectionable artifacts. In addition, a new approach to pitch-scale modification allows for the use of arbitrary spectral envelope estimates and addresses the problems of high-frequency loss and noise amplification encountered with prior art methods. The overlap-add synthesis method provides the ability to synthesize sounds with computational efficiency rivaling that of synthesis using the discrete short-time Fourier transform (DSTFT) while eliminating the modification artifacts associated with that method. Georgia Tech Research Corporation
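As a rough illustration of the overlap-add idea described above, the following sketch renders each frame as a stationary sum of sinusoids, tapers it with a triangular window, and adds the frames together at a fixed hop. It is a minimal sketch under stated assumptions, not the disclosed method: the function name `ola_sinusoidal_synth`, the triangular window, and the `(amplitude, frequency, phase)` frame layout are all hypothetical choices made for the example, and the analysis-by-synthesis stage is not shown.

```python
import numpy as np

def ola_sinusoidal_synth(frames, hop, sr):
    """Overlap-add synthesis from per-frame sinusoidal parameters.

    frames: list of frames; each frame is a list of (amp, freq_hz, phase)
            tuples, with the phase referenced to the frame centre.
    hop:    frame advance in samples (each frame spans 2 * hop samples).
    sr:     sample rate in Hz.
    """
    n = 2 * hop
    idx = np.arange(n)
    t = (idx - hop) / sr                    # time axis centred on the frame
    window = 1.0 - np.abs(idx - hop) / hop  # triangular; overlapped copies sum to 1
    out = np.zeros(hop * (len(frames) + 1))
    for k, partials in enumerate(frames):
        frame = np.zeros(n)
        for amp, freq, phase in partials:
            frame += amp * np.cos(2 * np.pi * freq * t + phase)
        out[k * hop:k * hop + n] += window * frame  # overlap-add
    return out
```

Because the overlapped windows sum to one, a partial whose parameters are consistent from frame to frame is reproduced without amplitude ripple in the fully overlapped region; modifications would be applied by changing the per-frame parameters before synthesis.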
Radial Basis Function Networks for Conversion of Sound Spectra
In many advanced signal processing tasks, such as pitch shifting, voice conversion or sound synthesis, accurate spectral processing is required. Here, the use of Radial Basis Function Networks (RBFN) is proposed for the modeling of the spectral changes (or conversions) related to the control of important sound parameters, such as pitch or intensity. The identification of such conversion functions is based on a procedure which learns the shape of the conversion from a few pairs of target spectra from a data set. The generalization properties of RBFNs provide for interpolation with respect to the pitch range. In the construction of the training set, mel-cepstral encoding of the spectrum is used to capture the perceptually most relevant spectral changes. Moreover, a singular value decomposition (SVD) approach is used to reduce the dimension of conversion functions. The RBFN conversion functions introduced are characterized by a perceptually-based fast training procedure, desirable interpolation properties and computational efficiency.
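The interpolation property mentioned above can be illustrated with a small Gaussian RBF network whose output weights are fitted by linear least squares. This is only a sketch under assumptions: the function names, the Gaussian basis, the shared `width`, and the idea of mapping a pitch value to a short coefficient vector are illustrative choices, and the mel-cepstral encoding and SVD reduction steps of the abstract are not shown.

```python
import numpy as np

def _design_matrix(x, centers, width):
    """Gaussian basis activations, shape (len(x), len(centers))."""
    diff = x[:, None] - centers[None, :]
    return np.exp(-(diff ** 2) / (2.0 * width ** 2))

def fit_rbfn(x, y, centers, width):
    """Fit the output weights of the network by linear least squares.

    x: (n,) control values (for example pitch), y: (n, d) target vectors.
    """
    phi = _design_matrix(x, centers, width)
    weights, *_ = np.linalg.lstsq(phi, y, rcond=None)
    return weights

def predict_rbfn(x, centers, width, weights):
    """Evaluate the fitted network at new control values."""
    return _design_matrix(x, centers, width) @ weights
```

With one basis function centred on each training point, the fitted network reproduces the training targets and returns smoothly interpolated vectors for control values in between; the choice of `width` governs how smooth that interpolation is.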
High-resolution sinusoidal analysis for resolving harmonic collisions in music audio signal processing
Many music signals can largely be considered an additive combination of
multiple sources, such as musical instruments or voice. If the musical sources
are pitched instruments, the spectra they produce are predominantly harmonic,
and are thus well suited to an additive sinusoidal model. However,
due to resolution limits inherent in time-frequency analyses, when the harmonics
of multiple sources occupy equivalent time-frequency regions, their
individual properties are additively combined in the time-frequency representation
of the mixed signal. Any such time-frequency point in a mixture
where multiple harmonics overlap produces a single observation from which
the contributions owed to each of the individual harmonics cannot be trivially
deduced. These overlaps are referred to as overlapping partials or harmonic
collisions. If one wishes to infer some information about individual sources in
music mixtures, the information carried in regions where collided harmonics
exist becomes unreliable due to interference from other sources. This interference
has ramifications in a variety of music signal processing applications
such as multiple fundamental frequency estimation, source separation, and
instrumentation identification.
This thesis addresses harmonic collisions in music signal processing applications.
As a solution to the harmonic collision problem, a class of signal
subspace-based high-resolution sinusoidal parameter estimators is explored.
Specifically, the direct matrix pencil method, or equivalently, the Estimation
of Signal Parameters via Rotational Invariance Techniques (ESPRIT)
method, is used with the goal of producing estimates of the salient parameters
of individual harmonics that occupy equivalent time-frequency regions. This
estimation method is adapted here to be applicable to time-varying signals
such as musical audio. While high-resolution methods have been previously
explored in the context of music signal processing, previous work has not
addressed whether or not such methods truly produce high-resolution sinusoidal parameter estimates in real-world music audio signals. Therefore, this
thesis answers the question of whether high-resolution sinusoidal parameter
estimators are really high-resolution for real music signals.
This work directly explores the capabilities of this form of sinusoidal parameter
estimation to resolve collided harmonics. The capabilities of this
analysis method are also explored in the context of music signal processing
applications. Potential benefits of high-resolution sinusoidal analysis are
examined in experiments involving multiple fundamental frequency estimation
and audio source separation. This work shows that there are indeed
benefits to high-resolution sinusoidal analysis in music signal processing applications,
especially when compared to methods that produce sinusoidal
parameter estimates based on more traditional time-frequency representations.
The benefits of this form of sinusoidal analysis are made most evident
in multiple fundamental frequency estimation applications, where substantial
performance gains are seen. High-resolution analysis in the context of
computational auditory scene analysis-based source separation shows similar
performance to existing comparable methods.
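For readers unfamiliar with ESPRIT, the sketch below shows the textbook form of the estimator for a short, stationary, noise-free frame: build a Hankel data matrix, take its dominant left singular vectors as the signal subspace, and read the frequencies from the eigenvalues of the matrix that maps the subspace onto its one-sample shift. The function name, the choice of half the frame length for the Hankel rows, and the restriction to real sinusoids are assumptions for the example; the thesis's adaptation to time-varying signals is not reproduced here.

```python
import numpy as np

def esprit_frequencies(x, n_sinusoids, sr):
    """Estimate the frequencies (Hz) of real sinusoids in x with ESPRIT."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    rows = n // 2
    # Hankel data matrix: each column is a length-`rows` snapshot of x
    hankel = np.array([x[i:i + rows] for i in range(n - rows + 1)]).T
    order = 2 * n_sinusoids  # a real sinusoid is two complex exponentials
    u, _, _ = np.linalg.svd(hankel, full_matrices=False)
    signal = u[:, :order]    # basis of the signal subspace
    # Rotational invariance: signal[1:] is approximately signal[:-1] @ psi
    psi = np.linalg.lstsq(signal[:-1], signal[1:], rcond=None)[0]
    poles = np.linalg.eigvals(psi)
    freqs = np.angle(poles) * sr / (2 * np.pi)
    return np.sort(freqs[freqs > 0])  # keep one of each conjugate pair
```

In a toy case such as two noise-free tones at 440 Hz and 460 Hz sampled at 8 kHz over 256 samples, the tones are closer together than the DFT bin spacing (31.25 Hz), yet the subspace estimate separates them; behaviour on real, noisy, time-varying music audio is the question the thesis investigates.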
Real-time segmentation of the temporal evolution of musical sounds
Since the studies of Helmholtz, it has been known that the temporal evolution of musical sounds plays an important role
in our perception of timbre. The accurate temporal segmentation of musical sounds into regions with distinct characteristics
is therefore of interest to researchers in the field of timbre perception as well as to those working with different forms
of sound modelling and manipulation. Following recent work by Hajda (1996), Peeters (2004) and Caetano et al. (2010),
this paper presents a new method for the automatic segmentation of the temporal evolution of isolated musical sounds in real time. We define attack, sustain and release segments using cues from a combination of the amplitude envelope, the spectro-temporal evolution and a measurement of the stability of the sound that is derived from the onset detection function. We conclude with an evaluation of the method.
Physically Informed Subtraction of a String's Resonances from Monophonic, Discretely Attacked Tones: a Phase Vocoder Approach
A method for the subtraction of a string's oscillations from monophonic,
plucked- or hit-string tones is presented. The remainder of the subtraction
is the response of the instrument's body to the excitation, and potentially
other sources, such as faint vibrations of other strings, background
noises or recording artifacts. In some respects, this method is similar to a
stochastic-deterministic decomposition based on Sinusoidal Modeling Synthesis
[MQ86, IS87]. However, our method targets string partials expressly,
according to a physical model of the string's vibrations described in this thesis.
Also, the method sits on a Phase Vocoder scheme. This approach has
the essential advantage that the subtraction of the partials can take place
"instantly", on a frame-by-frame basis, avoiding the necessity of tracking the
partials and therefore availing of the possibility of a real-time implementation.
The subtraction takes place in the frequency domain, and a method
is presented whereby the computational cost of this process can be reduced
through the reduction of a partial's frequency-domain data to its main lobe.
In each frame of the Phase Vocoder, the string is encoded as a set of partials,
completely described by four constants of frequency, phase, magnitude
and exponential decay. These parameters are obtained with a novel method,
the Complex Exponential Phase Magnitude Evolution (CSPME), which is
a generalisation of the CSPE [SG06] to signals with exponential envelopes
and which surpasses the finite resolution of the Discrete Fourier Transform.
The encoding obtained is an intuitive representation of the string, suitable
to musical processing.
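The principle behind this kind of sub-bin estimation can be sketched as follows. For a single complex exponential, advancing the analysis frame by one sample multiplies its spectrum by a fixed complex factor, so the ratio of the two spectra at the peak bin carries the frequency in its angle and the per-sample decay in its magnitude. The code below is a minimal sketch of that idea and not the CSPME as derived in the thesis: the function name `cspe_peak`, the Hann window, the frame length, and the decay read-out from the log-magnitude of the ratio are all assumptions made for illustration.

```python
import numpy as np

def cspe_peak(x, sr, n_fft=1024):
    """Estimate frequency (Hz) and decay rate (1/s) of the strongest partial.

    x must contain at least n_fft + 1 samples.
    """
    x = np.asarray(x, dtype=float)
    window = np.hanning(n_fft)
    spec0 = np.fft.rfft(window * x[:n_fft])       # spectrum of the frame
    spec1 = np.fft.rfft(window * x[1:n_fft + 1])  # frame advanced by one sample
    k = np.argmax(np.abs(spec0))                  # peak bin
    ratio = spec1[k] / spec0[k]  # approx. exp(-decay / sr + 2j * pi * freq / sr)
    freq = np.angle(ratio) * sr / (2 * np.pi)
    decay = -np.log(np.abs(ratio)) * sr
    return freq, decay
```

For a real-valued partial the result is approximate rather than exact, because leakage from the negative-frequency image perturbs the ratio slightly; the window keeps that perturbation small when the partial is well away from 0 Hz and from the Nyquist frequency.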
Mandarin Singing Voice Synthesis Based on Harmonic Plus Noise Model and Singing Expression Analysis
The purpose of this study is to investigate how humans interpret musical
scores expressively, and then design machines that sing like humans. We
consider six factors that have a strong influence on the expression of human
singing. The factors are related to the acoustic, phonetic, and musical
features of a real singing signal. Given real singing voices recorded following
the MIDI scores and lyrics, our analysis module can extract the expression
parameters from the real singing signals semi-automatically. The expression
parameters are used to control the singing voice synthesis (SVS) system for
Mandarin Chinese, which is based on the harmonic plus noise model (HNM). The
results of perceptual experiments show that integrating the expression factors
into the SVS system yields a notable improvement in perceptual naturalness,
clearness, and expressiveness. By one-to-one mapping of the real singing signal
and expression controls to the synthesizer, our SVS system can simulate the
interpretation of a real singer with the timbre of a speaker. Comment: 8 pages, technical report
Sound Synthesis with Cellular Automata
This thesis reports on new music technology research which investigates the use of cellular automata (CA) for the digital synthesis of dynamic sounds. The research addresses the problem of the sound design limitations of synthesis techniques based on CA. These limitations fundamentally stem from the unpredictable and autonomous nature of these computational models.
Therefore, the aim of this thesis is to develop a CA-based sound synthesis technique that supports a sound design process. A critical analysis of previous research in this area is presented to show that this problem has not previously been solved, and the reasons why it is worth solving are discussed.
To achieve this aim, a novel approach is proposed which treats the output of CA as digital signals and uses DSP procedures to analyse them. This approach opens a large variety of possibilities for better understanding the self-organization process of CA, with a view to identifying not only mapping possibilities that make the synthesis of sounds possible, but also control possibilities that enable a sound design process.
As a result of this approach, this thesis presents a technique called Histogram Mapping Synthesis (HMS), which is based on the statistical analysis of CA evolutions by histogram measurements. HMS is studied with four different automata, and a considerable number of control mechanisms are presented. These show that HMS enables a reasonable sound design process.
With these control mechanisms it is possible to design and produce a variety of timbres in a predictable and controllable manner. Some of these timbres are imitations of sounds produced by acoustic means and others are novel. All the sounds obtained present dynamic features and many of them, including some of those that are novel, retain important characteristics of sounds produced by acoustic means.
Computer Models for Musical Instrument Identification
PhD
A particular aspect in the perception of sound is concerned with what is commonly
termed texture or timbre. From a perceptual perspective, timbre is what allows us
to distinguish sounds that have similar pitch and loudness. Indeed most people are
able to discern a piano tone from a violin tone or able to distinguish different voices
or singers.
This thesis deals with timbre modelling. Specifically, the formant theory of timbre
is the main theme throughout. This theory states that acoustic musical instrument
sounds can be characterised by their formant structures. Following this principle, the
central point of our approach is to propose a computer implementation for building
musical instrument identification and classification systems.
Although the main thrust of this thesis is to propose a coherent and unified
approach to the musical instrument identification problem, it is oriented towards the
development of algorithms that can be used in Music Information Retrieval (MIR)
frameworks. Drawing on research in speech processing, a complete supervised system
taking into account both physical and perceptual aspects of timbre is described.
The approach is composed of three distinct processing layers. Parametric models
that allow us to represent signals through mid-level physical and perceptual representations
are considered. Next, the use of the Line Spectrum Frequencies as spectral
envelope and formant descriptors is emphasised. Finally, the use of generative and
discriminative techniques for building instrument and database models is investigated.
Our system is evaluated under realistic recording conditions using databases of isolated
notes and melodic phrases.