1,220 research outputs found

    Audio Analysis/synthesis System

    Get PDF
    A method and apparatus for the automatic analysis, synthesis and modification of audio signals, based on an overlap-add sinusoidal model, is disclosed. Automatic analysis of amplitude, frequency and phase parameters of the model is achieved using an analysis-by-synthesis procedure which incorporates successive approximation, yielding synthetic waveforms which are very good approximations to the original waveforms and are perceptually identical to the original sounds. A generalized overlap-add sinusoidal model is introduced which can modify audio signals without objectionable artifacts. In addition, a new approach to pitch-scale modification allows for the use of arbitrary spectral envelope estimates and addresses the problems of high-frequency loss and noise amplification encountered with prior art methods. The overlap-add synthesis method provides the ability to synthesize sounds with computational efficiency rivaling that of synthesis using the discrete short-time Fourier transform (DSTFT) while eliminating the modification artifacts associated with that method.Georgia Tech Research Corporatio

    Radial Basis Function Networks for Conversion of Sound Spectra

    Get PDF
    In many advanced signal processing tasks, such as pitch shifting, voice conversion or sound synthesis, accurate spectral processing is required. Here, the use of Radial Basis Function Networks (RBFN) is proposed for the modeling of the spectral changes (or conversions) related to the control of important sound parameters, such as pitch or intensity. The identification of such conversion functions is based on a procedure which learns the shape of the conversion from few couples of target spectra from a data set. The generalization properties of RBFNs provides for interpolation with respect to the pitch range. In the construction of the training set, mel-cepstral encoding of the spectrum is used to catch the perceptually most relevant spectral changes. Moreover, a singular value decomposition (SVD) approach is used to reduce the dimension of conversion functions. The RBFN conversion functions introduced are characterized by a perceptually-based fast training procedure, desirable interpolation properties and computational efficiency

    High-resolution sinusoidal analysis for resolving harmonic collisions in music audio signal processing

    Get PDF
    Many music signals can largely be considered an additive combination of multiple sources, such as musical instruments or voice. If the musical sources are pitched instruments, the spectra they produce are predominantly harmonic, and are thus well suited to an additive sinusoidal model. However, due to resolution limits inherent in time-frequency analyses, when the harmonics of multiple sources occupy equivalent time-frequency regions, their individual properties are additively combined in the time-frequency representation of the mixed signal. Any such time-frequency point in a mixture where multiple harmonics overlap produces a single observation from which the contributions owed to each of the individual harmonics cannot be trivially deduced. These overlaps are referred to as overlapping partials or harmonic collisions. If one wishes to infer some information about individual sources in music mixtures, the information carried in regions where collided harmonics exist becomes unreliable due to interference from other sources. This interference has ramifications in a variety of music signal processing applications such as multiple fundamental frequency estimation, source separation, and instrumentation identification. This thesis addresses harmonic collisions in music signal processing applications. As a solution to the harmonic collision problem, a class of signal subspace-based high-resolution sinusoidal parameter estimators is explored. Specifically, the direct matrix pencil method, or equivalently, the Estimation of Signal Parameters via Rotational Invariance Techniques (ESPRIT) method, is used with the goal of producing estimates of the salient parameters of individual harmonics that occupy equivalent time-frequency regions. This estimation method is adapted here to be applicable to time-varying signals such as musical audio. While high-resolution methods have been previously explored in the context of music signal processing, previous work has not addressed whether or not such methods truly produce high-resolution sinusoidal parameter estimates in real-world music audio signals. Therefore, this thesis answers the question of whether high-resolution sinusoidal parameter estimators are really high-resolution for real music signals. This work directly explores the capabilities of this form of sinusoidal parameter estimation to resolve collided harmonics. The capabilities of this analysis method are also explored in the context of music signal processing applications. Potential benefits of high-resolution sinusoidal analysis are examined in experiments involving multiple fundamental frequency estimation and audio source separation. This work shows that there are indeed benefits to high-resolution sinusoidal analysis in music signal processing applications, especially when compared to methods that produce sinusoidal parameter estimates based on more traditional time-frequency representations. The benefits of this form of sinusoidal analysis are made most evident in multiple fundamental frequency estimation applications, where substantial performance gains are seen. High-resolution analysis in the context of computational auditory scene analysis-based source separation shows similar performance to existing comparable methods

    Real-time segmentation of the temporal evolution of musical sounds

    Get PDF
    Since the studies of Helmholtz, it has been known that the temporal evolution of musical sounds plays an important role in our perception of timbre. The accurate temporal segmentation of musical sounds into regions with distinct characteristics is therefore of interest to researchers in the field of timbre perception as well as to those working with different forms of sound modelling and manipulation. Following recent work by Hajda (1996), Peeters (2004) and Caetano et al (2010), this paper presents a new method for the automatic segmentation of the temporal evolution of isolated musical sounds in real-time. We define attack, sustain and release segments using cues from a combination of the amplitude envelope, the spectro- temporal evolution and a measurement of the stability of the sound that is derived from the onset detection function. We conclude with an evaluation of the method

    Physically Informed Subtraction of a String's Resonances from Monophonic, Discretely Attacked Tones : a Phase Vocoder Approach

    Get PDF
    A method for the subtraction of a string's oscillations from monophonic, plucked- or hit-string tones is presented. The remainder of the subtraction is the response of the instrument's body to the excitation, and potentially other sources, such as faint vibrations of other strings, background noises or recording artifacts. In some respects, this method is similar to a stochastic-deterministic decomposition based on Sinusoidal Modeling Synthesis [MQ86, IS87]. However, our method targets string partials expressly, according to a physical model of the string's vibrations described in this thesis. Also, the method sits on a Phase Vocoder scheme. This approach has the essential advantage that the subtraction of the partials can take place \instantly", on a frame-by-frame basis, avoiding the necessity of tracking the partials and therefore availing of the possibility of a real-time implementation. The subtraction takes place in the frequency domain, and a method is presented whereby the computational cost of this process can be reduced through the reduction of a partial's frequency-domain data to its main lobe. In each frame of the Phase Vocoder, the string is encoded as a set of partials, completely described by four constants of frequency, phase, magnitude and exponential decay. These parameters are obtained with a novel method, the Complex Exponential Phase Magnitude Evolution (CSPME), which is a generalisation of the CSPE [SG06] to signals with exponential envelopes and which surpasses the nite resolution of the Discrete Fourier Transform. The encoding obtained is an intuitive representation of the string, suitable to musical processing

    Mandarin Singing Voice Synthesis Based on Harmonic Plus Noise Model and Singing Expression Analysis

    Full text link
    The purpose of this study is to investigate how humans interpret musical scores expressively, and then design machines that sing like humans. We consider six factors that have a strong influence on the expression of human singing. The factors are related to the acoustic, phonetic, and musical features of a real singing signal. Given real singing voices recorded following the MIDI scores and lyrics, our analysis module can extract the expression parameters from the real singing signals semi-automatically. The expression parameters are used to control the singing voice synthesis (SVS) system for Mandarin Chinese, which is based on the harmonic plus noise model (HNM). The results of perceptual experiments show that integrating the expression factors into the SVS system yields a notable improvement in perceptual naturalness, clearness, and expressiveness. By one-to-one mapping of the real singing signal and expression controls to the synthesizer, our SVS system can simulate the interpretation of a real singer with the timbre of a speaker.Comment: 8 pages, technical repor

    SOUND SYNTHESIS WITH CELLULAR AUTOMATA

    Get PDF
    This thesis reports on new music technology research which investigates the use of cellular automata (CA) for the digital synthesis of dynamic sounds. The research addresses the problem of the sound design limitations of synthesis techniques based on CA. These limitations fundamentally stem from the unpredictable and autonomous nature of these computational models. Therefore, the aim of this thesis is to develop a sound synthesis technique based on CA capable of allowing a sound design process. A critical analysis of previous research in this area will be presented in order to justify that this problem has not been previously solved. Also, it will be discussed why this problem is worthwhile to solve. In order to achieve such aim, a novel approach is proposed which considers the output of CA as digital signals and uses DSP procedures to analyse them. This approach opens a large variety of possibilities for better understanding the self-organization process of CA with a view to identifying not only mapping possibilities for making the synthesis of sounds possible, but also control possibilities which enable a sound design process. As a result of this approach, this thesis presents a technique called Histogram Mapping Synthesis (HMS), which is based on the statistical analysis of CA evolutions by histogram measurements. HMS will be studied with four different automatons, and a considerable number of control mechanisms will be presented. These will show that HMS enables a reasonable sound design process. With these control mechanisms it is possible to design and produce in a predictable and controllable manner a variety of timbres. Some of these timbres are imitations of sounds produced by acoustic means and others are novel. All the sounds obtained present dynamic features and many of them, including some of those that are novel, retain important characteristics of sounds produced by acoustic means

    Computer Models for Musical Instrument Identification

    Get PDF
    PhDA particular aspect in the perception of sound is concerned with what is commonly termed as texture or timbre. From a perceptual perspective, timbre is what allows us to distinguish sounds that have similar pitch and loudness. Indeed most people are able to discern a piano tone from a violin tone or able to distinguish different voices or singers. This thesis deals with timbre modelling. Specifically, the formant theory of timbre is the main theme throughout. This theory states that acoustic musical instrument sounds can be characterised by their formant structures. Following this principle, the central point of our approach is to propose a computer implementation for building musical instrument identification and classification systems. Although the main thrust of this thesis is to propose a coherent and unified approach to the musical instrument identification problem, it is oriented towards the development of algorithms that can be used in Music Information Retrieval (MIR) frameworks. Drawing on research in speech processing, a complete supervised system taking into account both physical and perceptual aspects of timbre is described. The approach is composed of three distinct processing layers. Parametric models that allow us to represent signals through mid-level physical and perceptual representations are considered. Next, the use of the Line Spectrum Frequencies as spectral envelope and formant descriptors is emphasised. Finally, the use of generative and discriminative techniques for building instrument and database models is investigated. Our system is evaluated under realistic recording conditions using databases of isolated notes and melodic phrases
    • …
    corecore