Re-Sonification of Objects, Events, and Environments
abstract: Digital sound synthesis allows the creation of a great variety of sounds. Focusing on interesting or ecologically valid sounds for music, simulation, aesthetics, or other purposes limits the otherwise vast digital audio palette. Tools for creating such sounds vary from arbitrary methods of altering recordings to precise simulations of vibrating objects. In this work, methods of sound synthesis by re-sonification are considered. Re-sonification, herein, refers to the general process of analyzing, possibly transforming, and resynthesizing or reusing recorded sounds in meaningful ways, to convey information. Applied to soundscapes, re-sonification is presented as a means of conveying activity within an environment. Applied to the sounds of objects, this work examines modeling the perception of objects as well as their physical properties and the ability to simulate interactive events with such objects. To create soundscapes to re-sonify geographic environments, a method of automated soundscape design is presented. Using recorded sounds that are classified based on acoustic, social, semantic, and geographic information, this method produces stochastically generated soundscapes to re-sonify selected geographic areas. Drawing on prior knowledge, local sounds and those deemed similar comprise a locale's soundscape. In the context of re-sonifying events, this work examines processes for modeling and estimating the excitations of sounding objects. These include plucking, striking, rubbing, and any interaction that imparts energy into a system, affecting the resultant sound. A method of estimating a linear system's input, constrained to a signal-subspace, is presented and applied toward improving the estimation of percussive excitations for re-sonification. To work toward robust recording-based modeling and re-sonification of objects, new implementations of banded waveguide (BWG) models are proposed for object modeling and sound synthesis. 
Previous implementations of BWGs use arbitrary model parameters and may produce a range of simulations that do not match digital waveguide or modal models of the same design. Subject to linear excitations, some models proposed here behave identically to other equivalently designed physical models. Under nonlinear interactions, such as bowing, many of the proposed implementations exhibit improvements in the attack characteristics of synthesized sounds.
Dissertation/Thesis. Ph.D. Electrical Engineering 201
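The abstract above describes estimating a linear system's input constrained to a signal subspace. A minimal sketch of that idea (not the author's implementation; the basis construction and helper names are illustrative assumptions) is a least-squares fit of subspace coefficients through the system's convolution matrix:

```python
# Sketch: estimate the excitation e of a linear system y = h * e, with e
# constrained to lie in the span of a basis U. Assumed formulation only.
import numpy as np

def make_conv_matrix(h, n_in):
    """Toeplitz convolution matrix H such that y = H @ e for an input of length n_in."""
    n_out = len(h) + n_in - 1
    H = np.zeros((n_out, n_in))
    for j in range(n_in):
        H[j:j + len(h), j] = h
    return H

def subspace_input_estimate(y, h, U):
    """Estimate e = U @ c, where the columns of U span the excitation subspace."""
    H = make_conv_matrix(h, U.shape[0])
    A = H @ U                                  # maps subspace coefficients to output
    c, *_ = np.linalg.lstsq(A, y, rcond=None)  # least-squares coefficients
    return U @ c

# Toy check: recover a smooth excitation from the output of a damped system.
n = 64
t = np.arange(n)
U = np.stack([np.exp(-t / 8.0), np.exp(-t / 20.0)], axis=1)  # 2-D excitation subspace
e_true = U @ np.array([1.0, -0.5])
h = 0.9 ** np.arange(32)                       # simple damped impulse response
y = np.convolve(e_true, h)
e_hat = subspace_input_estimate(y, h, U)
print(np.allclose(e_hat, e_true, atol=1e-6))   # True
```

Constraining the estimate to a low-dimensional subspace is what makes the inverse problem well-posed here; unconstrained deconvolution of a percussive excitation would be far more sensitive to noise.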
Audio source separation for music in low-latency and high-latency scenarios
This thesis proposes specific methods to address the limitations of current music source separation methods in low-latency and high-latency scenarios. First, we focus on methods with low computational cost and low latency. We propose the use of Tikhonov regularization as a method for spectrum decomposition in the low-latency context. We compare it to existing techniques in pitch estimation and tracking tasks, crucial steps in many separation methods. We then use the proposed spectrum decomposition method in low-latency separation tasks targeting singing voice, bass and drums. Second, we propose several high-latency methods that improve the separation of singing voice by modeling components that are often not accounted for, such as breathiness and consonants. Finally, we explore using temporal correlations and human annotations to enhance the separation of drums and complex polyphonic music signals.
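Tikhonov-regularized spectrum decomposition, as mentioned in the abstract, has a closed-form solution, which is what makes it attractive at low latency. A minimal sketch (the templates and the regularization weight are invented for illustration, not taken from the thesis):

```python
# Sketch: decompose one magnitude-spectrum frame x over a basis B of note
# templates by Tikhonov regularization: a = (B^T B + lam*I)^-1 B^T x.
import numpy as np

def tikhonov_decompose(x, B, lam=0.1):
    """Return activations a minimizing ||x - B a||^2 + lam * ||a||^2."""
    k = B.shape[1]
    return np.linalg.solve(B.T @ B + lam * np.eye(k), B.T @ x)

def template(f0, n_bins=100, n_harmonics=5):
    """Toy harmonic template with 1/h-decaying partial magnitudes."""
    s = np.zeros(n_bins)
    for h in range(1, n_harmonics + 1):
        if f0 * h <= n_bins:
            s[f0 * h - 1] = 1.0 / h
    return s

# Toy example: a mixture of two non-overlapping harmonic templates.
B = np.stack([template(7), template(11)], axis=1)
x = 0.8 * B[:, 0] + 0.3 * B[:, 1]
a = tikhonov_decompose(x, B, lam=1e-3)
print(np.round(a, 2))
```

Unlike iterative decompositions such as NMF, this is a single linear solve per frame, which is consistent with the abstract's emphasis on low computational cost and low latency.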
Pitch-Informed Solo and Accompaniment Separation
This thesis addresses the development of a system for pitch-informed solo
and accompaniment separation capable of separating main instruments from
music accompaniment regardless of the musical genre of the track, or type
of music accompaniment. For the solo instrument, only pitched monophonic
instruments were considered in a single-channel scenario where no panning
or spatial location information is available.
In the proposed method, pitch information is used as an initial stage of a
sinusoidal modeling approach that attempts to estimate the spectral
information of the solo instrument from a given audio mixture. Instead of
estimating the solo instrument on a frame by frame basis, the proposed
method gathers information of tone objects to perform separation.
Tone-based processing allowed the inclusion of novel processing stages for
attack refinement, transient interference reduction, common amplitude
modulation (CAM) of tone objects, and for better estimation of non-harmonic
elements that can occur in musical instrument tones. The proposed solo and
accompaniment algorithm is an efficient method suitable for real-world
applications.
A study was conducted to better model magnitude, frequency, and phase of
isolated musical instrument tones. As a result of this study, temporal
envelope smoothness, inharmonicity of musical instruments, and phase
expectation were exploited in the proposed separation method. Additionally,
an algorithm for harmonic/percussive separation based on phase expectation
was proposed. The algorithm shows improved perceptual quality with respect
to state-of-the-art methods for harmonic/percussive separation.
The proposed solo and accompaniment method obtained perceptual quality
scores comparable to other state-of-the-art algorithms under the SiSEC 2011
and SiSEC 2013 campaigns, and outperformed the comparison algorithm on the
instrumental dataset described in this thesis. As a use-case of solo and
accompaniment separation, a listening test procedure was conducted to
assess separation quality requirements in the context of music education.
Results from the listening test showed that solo and accompaniment tracks
should be optimized differently to suit quality requirements of music
education. The Songs2See application was presented as commercial music
learning software which includes the proposed solo and accompaniment
separation method.
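The pitch-informed separation the abstract describes builds a sinusoidal model of the solo from its pitch track. A heavily simplified stand-in for that idea (binary harmonic masking rather than the thesis's tone-object modeling; all parameter values here are assumptions) looks like this:

```python
# Sketch: given the solo instrument's pitch f0 in one frame, keep STFT bins
# near its harmonics for the solo and assign the rest to the accompaniment.
import numpy as np

def harmonic_mask(freqs, f0, n_harmonics=10, half_width_hz=30.0):
    """Boolean mask selecting bins within half_width_hz of each harmonic of f0."""
    mask = np.zeros(len(freqs), dtype=bool)
    for h in range(1, n_harmonics + 1):
        mask |= np.abs(freqs - h * f0) <= half_width_hz
    return mask

sr, n_fft = 16000, 1024
freqs = np.fft.rfftfreq(n_fft, 1.0 / sr)       # bin center frequencies in Hz
mask = harmonic_mask(freqs, f0=220.0)

# Per-frame separation: masked bins -> solo, remaining bins -> accompaniment.
spectrum = np.ones(len(freqs))                 # placeholder magnitude frame
solo, accomp = spectrum * mask, spectrum * ~mask
print(mask.sum(), (~mask).sum())
```

A per-frame binary mask like this is exactly what the thesis's tone-object processing improves upon: grouping frames into tone objects allows attack refinement, common amplitude modulation, and estimation of non-harmonic tone components that a frame-wise mask cannot capture.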
Ontology of music performance variation
Performance variation in rhythm determines the extent to which humans perceive and feel the effect of rhythmic pulsation and music in general. In many cases, these rhythmic variations can be linked to percussive performance. Such percussive performance variations are often absent in current percussive rhythmic models. The purpose of this thesis is to present an interactive computer model, called the PD-103, that simulates the micro-variations in human percussive performance. This thesis makes three main contributions to existing knowledge: firstly, by formalising a new method for modelling percussive performance; secondly, by developing a new compositional software tool called the PD-103 that models human percussive performance; and finally, by creating a portfolio of different musical styles to demonstrate the capabilities of the software. A large database of recorded samples is classified into zones based upon the vibrational characteristics of the instruments, to model timbral variation in human percussive performance. The degree of timbral variation is governed by principles of biomechanics and human percussive performance. A fuzzy logic algorithm is applied to analyse current and first-order sample selection in order to formulate an ontological description of music performance variation. Asynchrony values were extracted from recorded performances at three different performance skill levels to create "timing fingerprints" which characterise features unique to each percussionist. The PD-103 uses real performance timing data to determine asynchrony values for each synthesised note. The spectral content of the sample database forms a three-dimensional loudness/timbre space, intersecting instrumental behaviour with music composition.
The reparameterisation of the sample database, following the analysis of loudness, spectral flatness, and spectral centroid, provides an opportunity to creatively explore the timbral dimensions inherent in percussion instruments. The PD-103 was used to create a music portfolio exploring different rhythmic possibilities, with a focus on meso-periodic rhythms common to parts of West Africa, jazz drumming, and electroacoustic music. The portfolio also includes new timbral percussive works based on spectral features and demonstrates the central aim of this thesis: the creation of a new compositional software tool that integrates human percussive performance and subsequently extends this model to different genres of music.
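The "timing fingerprint" idea above can be caricatured in a few lines: per-performer asynchrony statistics drive small shifts applied to a quantized onset grid. This is a hypothetical simplification, not the PD-103's actual algorithm, and the fingerprint values below are invented:

```python
# Sketch: humanize a quantized drum pattern using performer-specific
# asynchrony statistics (mean and spread in milliseconds).
import random

def humanize(onsets_ms, fingerprint, rng):
    """Shift each quantized onset by an asynchrony sampled from the fingerprint."""
    mu, sigma = fingerprint          # performer-specific mean/std asynchrony (ms)
    return [t + rng.gauss(mu, sigma) for t in onsets_ms]

rng = random.Random(42)
grid = [0, 250, 500, 750, 1000]      # quantized onset grid (ms)
novice = (4.0, 18.0)                 # invented fingerprint: wide, inconsistent timing
expert = (1.0, 5.0)                  # invented fingerprint: tight timing
print([round(t, 1) for t in humanize(grid, expert, rng)])
```

The useful property is that the same grid rendered with the novice and expert fingerprints differs only in its deviation statistics, which mirrors the thesis's use of skill-level-dependent asynchrony data.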
Unsupervised Music Source Separation Using a Generalized Dirichlet Prior
Doctoral thesis -- Seoul National University, Graduate School of Convergence Science and Technology, Department of Convergence Science, February 2018. Advisor: Kyogu Lee. Music source separation aims to extract and reconstruct individual instrument sounds that constitute a mixture sound. It has received a great deal of attention recently due to its importance in audio signal processing. In addition to its stand-alone applications such as noise reduction and instrument-wise equalization, source separation can directly affect the performance of various music information retrieval algorithms when used as a pre-processing step. However, conventional source separation algorithms have failed to show satisfactory performance, especially without the aid of spatial or musical information about the target source. To deal with this problem, we have focused on the spectral and temporal characteristics of sounds that can be observed in the spectrogram. Spectrogram decomposition is a commonly used technique to exploit such characteristics; however, only a few simple characteristics, such as sparsity, have been utilizable so far because most of the characteristics were difficult to express in the form of algorithms. The main goal of this thesis is to investigate the possibility of using a generalized Dirichlet prior to constrain the spectral/temporal bases of spectrogram decomposition algorithms. As the generalized Dirichlet prior is not only simple but also flexible in its usage, it enables us to utilize more characteristics within spectrogram decomposition frameworks. From harmonic-percussive sound separation to harmonic instrument sound separation, we apply the generalized Dirichlet prior to various tasks and verify its flexible usage as well as its fine performance.
Chapter 1 Introduction
1.1 Motivation
1.2 Task of interest
1.2.1 Number of channels
1.2.2 Utilization of side-information
1.3 Approach
1.3.1 Spectrogram decomposition with constraints
1.3.2 Dirichlet prior
1.3.3 Contribution
1.4 Outline of the thesis
Chapter 2 Theoretical background
2.1 Probabilistic latent component analysis
2.2 Non-negative matrix factorization
2.3 Dirichlet prior
2.3.1 PLCA framework
2.3.2 NMF framework
2.4 Summary
Chapter 3 Harmonic-Percussive Source Separation Using Harmonicity and Sparsity Constraints
3.1 Introduction
3.2 Proposed method
3.2.1 Formulation of Harmonic-Percussive Separation
3.2.2 Relation to Dirichlet Prior
3.3 Performance evaluation
3.3.1 Sample Problem
3.3.2 Qualitative Analysis
3.3.3 Quantitative Analysis
3.4 Summary
Chapter 4 Exploiting Continuity/Discontinuity of Basis Vectors in Spectrogram Decomposition for Harmonic-Percussive Sound Separation
4.1 Introduction
4.2 Proposed Method
4.2.1 Characteristics of harmonic and percussive components
4.2.2 Derivation of the proposed method
4.2.3 Algorithm interpretation
4.3 Performance Evaluation
4.3.1 Parameter setting
4.3.2 Toy examples
4.3.3 SiSEC 2015 dataset
4.3.4 QUASI dataset
4.3.5 Subjective performance evaluation
4.3.6 Audio demo
4.4 Summary
Chapter 5 Informed Approach to Harmonic Instrument Sound Separation
5.1 Introduction
5.2 Proposed method
5.2.1 Excitation-filter model
5.2.2 Linear predictive coding
5.2.3 Spectrogram decomposition procedure
5.3 Performance evaluation
5.3.1 Experimental settings
5.3.2 Performance comparison
5.3.3 Envelope extraction
5.4 Summary
Chapter 6 Blind Approach to Harmonic Instrument Sound Separation
6.1 Introduction
6.2 Proposed method
6.3 Performance evaluation
6.3.1 Weight optimization
6.3.2 Performance comparison
6.3.3 Effect of envelope similarity
6.4 Summary
Chapter 7 Conclusion and Future Work
7.1 Contributions
7.2 Future work
7.2.1 Application to multi-channel audio environment
7.2.2 Application to vocal separation
7.2.3 Application to various audio source separation tasks
Bibliography
Abstract (in Korean)
Docto
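The abstract's central device is a Dirichlet prior on the bases of a spectrogram decomposition. A minimal PLCA sketch in that spirit (the update rule, hyperparameters, and scaling of the prior term are assumptions, not the thesis's derivation) shows where the prior enters the M-step; with alpha < 1 it pushes the spectral bases toward sparsity, and alpha = 1 recovers plain PLCA:

```python
# Sketch: PLCA-style decomposition X ~ sum_z p(f|z) p(t|z) p(z) with a
# MAP-style Dirichlet prior (alpha - 1 added to spectral counts, clipped at 0).
import numpy as np

def plca_dirichlet(X, n_z=2, alpha=0.8, n_iter=100, seed=0):
    """Decompose a nonnegative matrix X into n_z spectral/temporal components."""
    rng = np.random.default_rng(seed)
    F, T = X.shape
    Xn = X / X.sum()
    W = rng.random((F, n_z)); W /= W.sum(axis=0)      # p(f|z), spectral bases
    H = rng.random((T, n_z)); H /= H.sum(axis=0)      # p(t|z), temporal activations
    z = np.full(n_z, 1.0 / n_z)                       # p(z), component weights
    for _ in range(n_iter):
        # E-step: posterior p(z|f,t), shape (F, T, n_z)
        joint = W[:, None, :] * H[None, :, :] * z[None, None, :]
        post = joint / np.maximum(joint.sum(axis=2, keepdims=True), 1e-12)
        C = Xn[:, :, None] * post                     # expected counts
        # M-step: Dirichlet prior applied to the spectral bases only
        W = np.maximum(C.sum(axis=1) + (alpha - 1) / F, 0) + 1e-12
        W /= W.sum(axis=0)
        H = C.sum(axis=0) + 1e-12; H /= H.sum(axis=0)
        z = C.sum(axis=(0, 1)); z /= z.sum()
    return W, H, z

# Toy mixture of two spectral patterns with disjoint supports.
Wt = np.array([[1, 0], [1, 0], [0, 1], [0, 1]], float)
Ht = np.array([[1, 0, 1, 0], [0, 1, 0, 1]], float)
X = Wt @ Ht
W, H, z = plca_dirichlet(X, n_z=2)
print(np.round(W, 2))
```

The appeal the abstract claims is visible here: the prior is a one-line change to the M-step, so different constraints (sparsity, harmonicity, continuity) can be encoded by different choices of the Dirichlet parameters.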
Applications of analysis and synthesis techniques for complex sounds
Master's thesis. Master of Science.
Bayesian methods in music modelling
This thesis presents several hierarchical generative Bayesian models of musical signals designed to improve the accuracy of existing multiple pitch detection systems and other musical signal processing applications whilst remaining feasible for real-time computation. At the lowest level the signal is modelled as a set of overlapping sinusoidal basis functions. The parameters of these basis functions are built into a prior framework based on principles known from musical theory and the physics of musical instruments. The model of a musical note optionally includes phenomena such as frequency and amplitude modulations, damping, volume, timbre and inharmonicity. The occurrence of note onsets in a performance of a piece of music is controlled by an underlying tempo process and the alignment of the timings to the underlying score of the music.
A variety of applications are presented for these models under differing inference constraints. Where full Bayesian inference is possible, reversible-jump Markov Chain Monte Carlo is employed to estimate the number of notes and partial frequency components in each frame of music. We also use approximate techniques such as model selection criteria and variational Bayes methods for inference in situations where computation time is limited or the amount of data to be processed is large. For the higher level score parameters, greedy search and conditional modes algorithms are found to be sufficiently accurate.
We emphasize the links between the models and inference algorithms developed in this thesis and those in existing and parallel work, and demonstrate the effects of modifying these models both theoretically and by means of experimental results.
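Among the approximate alternatives to full reversible-jump inference that the abstract mentions are model selection criteria. A small sketch of that route (not the thesis's algorithm; the frame length, frequencies, and noise level are invented) chooses the number of sinusoidal components in a frame by BIC:

```python
# Sketch: pick the number of sinusoids in a frame by fitting nested
# least-squares sinusoidal models and keeping the one with lowest BIC.
import numpy as np

def fit_sinusoids(y, t, freqs):
    """Least-squares fit of a DC term plus cos/sin pairs at the given frequencies."""
    cols = [np.ones_like(t)]
    for f in freqs:
        cols += [np.cos(2 * np.pi * f * t), np.sin(2 * np.pi * f * t)]
    A = np.stack(cols, axis=1)
    theta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return y - A @ theta, A.shape[1]

def bic_order(y, t, candidate_freqs):
    """Add candidate partials in order; return the model order with lowest BIC."""
    n = len(y)
    best = (np.inf, 0)
    for k in range(len(candidate_freqs) + 1):
        resid, p = fit_sinusoids(y, t, candidate_freqs[:k])
        rss = resid @ resid
        bic = n * np.log(rss / n + 1e-12) + p * np.log(n)
        best = min(best, (bic, k))
    return best[1]

# Toy frame: two partials (440 Hz and 880 Hz) plus a little noise.
rng = np.random.default_rng(1)
t = np.arange(256) / 8000.0
y = np.cos(2*np.pi*440*t) + 0.5*np.sin(2*np.pi*880*t) + 0.01*rng.standard_normal(256)
print(bic_order(y, t, [440.0, 880.0, 1320.0]))
```

The penalty term p*log(n) plays the role the prior over model order plays in the fully Bayesian treatment: it discourages fitting spurious partials to noise while remaining cheap enough for large amounts of data.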
Auditory group theory with applications to statistical basis methods for structured audio
Thesis (Ph.D.), Massachusetts Institute of Technology, Program in Media Arts & Sciences, 1998. Includes bibliographical references (p. 161-172). Michael Anthony Casey. Ph.D.