
    A Parametric Sound Object Model for Sound Texture Synthesis

    This thesis deals with the analysis and synthesis of sound textures based on parametric sound objects. An overview is provided of the acoustic and perceptual principles of textural acoustic scenes, and the technical challenges for analysis and synthesis are considered. Four essential processing steps for sound texture analysis are identified, and existing sound texture systems are reviewed using the four-step model as a guideline. A theoretical framework for analysis and synthesis is proposed. A parametric sound object synthesis (PSOS) model is introduced, which is able to describe individual recorded sounds through a fixed set of parameters. The model, which applies to harmonic and noisy sounds, is an extension of spectral modeling and uses spline curves to approximate spectral envelopes, as well as the evolution of parameters over time. In contrast to standard spectral modeling techniques, this representation uses the concept of objects instead of concatenated frames, and it provides a direct mapping between sounds of different length. Methods for automatic and manual conversion are shown. An evaluation is presented in which the ability of the model to encode a wide range of different sounds was examined. Although there are aspects of sounds that the model cannot accurately capture, such as polyphony and certain types of fast modulation, the results indicate that high-quality synthesis can be achieved for many different acoustic phenomena, including instruments and animal vocalizations. In contrast to many other forms of sound encoding, the parametric model facilitates various techniques of machine learning and intelligent processing, including sound clustering and principal component analysis. Strengths and weaknesses of the proposed method are reviewed, and possibilities for future development are discussed.
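
    To make the spline-envelope idea concrete, here is a minimal sketch (not the PSOS implementation from the thesis): it fits a cubic spline with a small, fixed number of knots to one frame's log-magnitude spectrum, so the knot coefficients act as a compact, fixed-size parameter set. The function name, knot count, and windowing choice are illustrative assumptions.

```python
import numpy as np
from scipy.interpolate import splrep, splev

def spline_envelope(frame, sr, n_knots=12):
    """Fit a cubic spline with a fixed number of knots to the
    log-magnitude spectrum of one windowed frame.

    Illustrative sketch only: the spline coefficients stand in for
    the spectral envelope as a small, fixed-size parameter set.
    """
    spectrum = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))
    log_mag = 20.0 * np.log10(spectrum + 1e-12)
    freqs = np.fft.rfftfreq(len(frame), 1.0 / sr)
    # Evenly spaced interior knots; their coefficients are the "fixed
    # set of parameters" describing this frame's envelope.
    knots = np.linspace(freqs[1], freqs[-2], n_knots)
    tck = splrep(freqs, log_mag, t=knots, k=3)
    return tck, splev(freqs, tck)
```

    Because every sound reduces to the same number of spline coefficients, sounds of different length can be compared or interpolated directly, which is what enables the clustering and principal component analysis mentioned above.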

    Auditory group theory with applications to statistical basis methods for structured audio

    Thesis (Ph.D.), Massachusetts Institute of Technology, Program in Media Arts & Sciences, 1998. Includes bibliographical references (p. 161-172). Author: Michael Anthony Casey.

    Harmonic Sinusoid Modeling of Tonal Music Events

    This thesis presents the theory, implementation, and applications of harmonic sinusoid modeling of pitched audio events. Harmonic sinusoid modeling is a parametric model that expresses an audio signal, or part of an audio signal, as a linear combination of concurrent slowly varying sinusoids, grouped together under harmonic frequency constraints. It extends sinusoid modeling with additional frequency constraints so that it can directly model tonal sounds. This enables applications such as object-oriented audio manipulation, polyphonic transcription, and instrument/singer recognition with background music. The modeling system consists of an analyzer and a synthesizer. The analyzer extracts harmonic sinusoidal parameters from an audio waveform, while the synthesizer rebuilds an audio waveform from these parameters. Parameter estimation is based on a detecting-grouping-tracking framework: the detecting stage finds and estimates sinusoid atoms; the grouping stage collects concurrent atoms into harmonic groups; the tracking stage links the atom groups at different times to form continuous harmonic sinusoid tracks. Compared to the standard sinusoid model, the harmonic model focuses on harmonic groups of atoms rather than on isolated atoms, and therefore naturally represents tonal sounds. The synthesizer rebuilds the audio signal by interpolating the measured parameters along the found tracks. We propose the first application of the harmonic sinusoid model in digital audio editors: with tonal events directly represented by a parametric model, standard audio editing functionality can be applied to tonal events embedded in an audio signal, and new sound effects can be built on the model parameters themselves. Possibilities for other applications are suggested at the end of this thesis. Financial support: European Commission, the Higher Education Funding Council for England, and Queen Mary, University of London.
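
    The grouping stage lends itself to a short illustration. The sketch below is an assumption-laden reduction, not the thesis implementation: it collects detected sinusoid atoms whose frequencies lie near integer multiples of a candidate fundamental, which is the harmonic frequency constraint the abstract refers to; the tolerance value is a made-up example.

```python
import numpy as np

def group_harmonics(peak_freqs, peak_amps, f0, tol=0.03):
    """Grouping stage of a detecting-grouping-tracking chain: keep the
    detected atoms (peaks) whose frequency lies within a relative
    tolerance of an integer multiple of the candidate fundamental f0.
    """
    group = []
    for f, a in zip(peak_freqs, peak_amps):
        h = max(1, int(round(f / f0)))          # nearest harmonic index
        if abs(f - h * f0) <= tol * h * f0:     # harmonic constraint
            group.append((h, f, a))
    return sorted(group)                        # ordered by harmonic number

# Example: atoms near 1x, 2x, and 3x of a 220 Hz candidate are kept;
# the 700 Hz atom violates the constraint and is left out.
print(group_harmonics([221.0, 439.5, 662.0, 700.0],
                      [1.0, 0.5, 0.3, 0.2], 220.0))
```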

    Prediction-driven computational auditory scene analysis

    The sound of a busy environment, such as a city street, gives rise to a perception of numerous distinct events in a human listener: the 'auditory scene analysis' of the acoustic information. Recent advances in the understanding of this process from experimental psychoacoustics have led to several efforts to build a computer model capable of the same function. This work is known as 'computational auditory scene analysis'. The dominant approach to this problem has been as a sequence of modules, the output of one forming the input to the next. Sound is converted to its spectrum, cues are picked out, and representations of the cues are grouped into an abstract description of the initial input. This 'data-driven' approach has some specific weaknesses in comparison to the auditory system: it will interpret a given sound in the same way regardless of its context, and it cannot 'infer' the presence of a sound for which direct evidence is hidden by other components. The 'prediction-driven' approach is presented as an alternative, in which analysis is a process of reconciliation between the observed acoustic features and the predictions of an internal model of the sound-producing entities in the environment. In this way, predicted sound events will form part of the scene interpretation as long as they are consistent with the input sound, regardless of whether direct evidence is found. A blackboard-based implementation of this approach is described which analyzes dense, ambient sound examples into a vocabulary of noise clouds, transient clicks, and a correlogram-based representation of wide-band periodic energy called the weft. The system is assessed through experiments that firstly investigate subjects' perception of distinct events in ambient sound examples, and secondly collect quality judgments for sound events resynthesized by the system. Although rated as far from perfect, there was good agreement between the events detected by the model and by the listeners. In addition, the experimental procedure does not depend on special aspects of the algorithm (other than the generation of resyntheses), and is applicable to the assessment and comparison of other models of human auditory organization.
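
    The reconciliation idea can be caricatured in a few lines. The following hypothetical sketch is not the blackboard system itself: it keeps each predicted component wherever it is consistent with the observed spectral energy, even where other components mask it, and treats leftover energy as evidence for new events.

```python
import numpy as np

def reconcile(observed, predictions):
    """One toy reconciliation step: predicted components survive
    wherever they are consistent with the observed energy (even if
    masked by other components); surplus observed energy is returned
    as a residual that would seed new event hypotheses.

    observed: non-negative spectral energy for one frame.
    predictions: list of per-component energy predictions (same shape).
    """
    explained = np.sum(predictions, axis=0)
    # Scale predictions down only where they jointly overshoot the input.
    scale = np.minimum(1.0, observed / np.maximum(explained, 1e-12))
    consistent = [p * scale for p in predictions]
    residual = np.maximum(observed - np.sum(consistent, axis=0), 0.0)
    return consistent, residual
```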

    Towards Real-Time Non-Stationary Sinusoidal Modelling of Kick and Bass Sounds for Audio Analysis and Modification

    Sinusoidal modelling is a powerful and flexible parametric method for analysing and processing audio signals. These signals have an underlying structure that modern spectral models aim to exploit by separating the signal into sinusoidal, transient, and noise components, each of which can then be modelled in a manner appropriate to its inherent structure. The accuracy of the estimated parameters is directly related to the quality of the model's representation of the signal and the assumptions made about its underlying structure. For sinusoidal models, these assumptions generally affect the non-stationary estimates of amplitude and frequency modulation, and the type of amplitude change curve. This is especially true when using a single analysis frame in a non-overlapping framework, where biased estimates can produce discontinuities at frame boundaries. It is therefore desirable for such a model to distinguish between the shapes of different amplitude changes and to adapt its estimation accordingly. Intra-frame amplitude change can be interpreted as a change in the windowing function applied to a stationary sinusoid, which can be estimated from the derivative of the phase with respect to frequency at magnitude peaks in the DFT spectrum. A method for measuring monotonic linear amplitude change from single-frame estimates using the first-order derivative of the phase with respect to frequency (approximated by the first-order difference) is presented, along with a method for distinguishing between linear and exponential amplitude change. An adaptation of the popular matching pursuit algorithm for refining model parameters in a segmented framework is investigated, using a dictionary of sinusoids whose parameters vary slightly from the model estimates, based on Modelled Pursuit (MoP). Modelling of the residual signal using a segmented undecimated wavelet transform (segUWT) is presented, with a generalisation of both the forward and inverse transforms covering delay compensation and overlap extension for different wavelet lengths and numbers of decomposition levels in an overlap-save (OLS) implementation that deals with block-convolution artefacts. This shift-invariant implementation of the DWT is a popular tool for de-noising and shows promising results for separating transients from noise.
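
    The phase-derivative measurement described above is easy to sketch. The following minimal example is an illustration under assumed conventions, not the thesis code: it windows a frame, locates the largest DFT magnitude peak, and approximates the derivative of phase with respect to frequency by a first-order difference of the unwrapped phase at that peak. Mapping this slope to a linear or exponential amplitude-change estimate is the part the thesis develops and is omitted here.

```python
import numpy as np

def phase_slope_at_peak(frame):
    """Approximate d(phase)/d(frequency), in radians per bin, at the
    largest DFT magnitude peak of one windowed frame, using the
    first-order difference of the unwrapped phase.
    """
    spectrum = np.fft.rfft(frame * np.hanning(len(frame)))
    phase = np.unwrap(np.angle(spectrum))
    k = int(np.argmax(np.abs(spectrum)))     # magnitude peak bin
    k = min(max(k, 1), len(phase) - 2)       # keep both neighbours valid
    # Central first-order difference across the two adjacent bins.
    return 0.5 * (phase[k + 1] - phase[k - 1])
```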

    Pitch-Informed Solo and Accompaniment Separation

    This thesis addresses the development of a system for pitch-informed solo and accompaniment separation, capable of separating main instruments from the music accompaniment regardless of the musical genre of the track or the type of accompaniment. For the solo instrument, only pitched monophonic instruments were considered, in a single-channel scenario where no panning or spatial location information is available. In the proposed method, pitch information is used as the initial stage of a sinusoidal modeling approach that estimates the spectral information of the solo instrument from a given audio mixture.
Instead of estimating the solo instrument on a frame-by-frame basis, the proposed method gathers information about tone objects to perform separation. Tone-based processing allows the inclusion of novel processing stages for attack refinement, transient interference reduction, common amplitude modulation (CAM) of tone objects, and better estimation of the non-harmonic elements that can occur in musical instrument tones. The proposed solo and accompaniment algorithm is an efficient method suitable for real-world applications. A study was conducted to better model the magnitude, frequency, and phase of isolated musical instrument tones. As a result of this study, temporal envelope smoothness, the inharmonicity of musical instruments, and phase expectation were exploited in the proposed separation method. Additionally, an algorithm for harmonic/percussive separation based on phase expectation was proposed; it shows improved perceptual quality with respect to state-of-the-art methods for harmonic/percussive separation. The proposed solo and accompaniment method obtained perceptual quality scores comparable to other state-of-the-art algorithms in the SiSEC 2011 and SiSEC 2013 campaigns, and outperformed the comparison algorithm on the instrumental dataset described in this thesis. As a use case of solo and accompaniment separation, a listening test was conducted to assess separation quality requirements in the context of music education. Results showed that the solo and accompaniment tracks should be optimized differently to suit the quality requirements of music education. The Songs2See application was presented as commercial music learning software that includes the proposed solo and accompaniment separation method.
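
    A pitch-informed spectral mask is the simplest way to picture the first stage of such a system. The sketch below is a frame-wise simplification; the thesis works on tone objects with attack refinement and CAM, which this omits. It selects spectrogram bins near integer multiples of a supplied per-frame pitch track; the function name and width parameter are illustrative assumptions.

```python
import numpy as np

def harmonic_mask(stft_mag, freqs, f0_track, width_hz=40.0):
    """Build a binary mask over a (bins, frames) magnitude spectrogram
    that selects bins near integer multiples of the per-frame pitch.

    Solo estimate ~ mask * mixture; accompaniment ~ (1 - mask) * mixture.
    f0_track holds one fundamental per frame (<= 0 means unvoiced).
    """
    mask = np.zeros_like(stft_mag)
    for t, f0 in enumerate(f0_track):
        if f0 <= 0:                  # unvoiced frame: all to accompaniment
            continue
        for h in range(1, int(freqs[-1] // f0) + 1):
            mask[np.abs(freqs - h * f0) <= width_hz, t] = 1.0
    return mask
```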

    Temporal integration of loudness as a function of level
