470 research outputs found
PnP Maxtools: Autonomous Parameter Control in MaxMSP Utilizing MIR Algorithms
This research presents a new approach to computer automation through the implementation of novel real-time music information retrieval algorithms developed for this project. It documents the development of the PnP.Maxtools package, a set of open source objects designed within the popular programming environment MaxMSP. The package is a set of pre/post processing filters, objective and subjective timbral descriptors, audio effects, and other objects that are designed to be used together to compose music or improvise without the use of external controllers or hardware. The PnP.Maxtools package objects are designed to be used quickly and easily in a 'plug and play' style with as few initial arguments as possible. The PnP.Maxtools package is designed to take incoming audio from a microphone, analyze it, and use the analysis to control an audio effect on the incoming signal in real-time. In this way, the audio content has a real musical and analogous relationship with the resulting musical transformations while the control parameters become more multifaceted and better able to serve the needs of artists. The term Reflexive Automation is introduced to describe this unsupervised relationship between the content of the sound being analyzed and the analogous and automatic control over a specific musical parameter. A set of compositions is also presented that demonstrates ideal usage of the object categories for creating reflexive systems and achieving fully autonomous control over musical parameters.
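The analyze-then-control loop described in this abstract can be sketched outside MaxMSP. The Python below is only an illustration under stated assumptions, not the PnP.Maxtools implementation: it picks the spectral centroid as the descriptor and a filter cutoff as the controlled parameter, and the function names, the linear mapping, and the 200 Hz to 8000 Hz range are all hypothetical.

```python
import cmath
import math


def magnitude_spectrum(frame):
    """Naive DFT magnitudes for bins 0..N/2 (adequate for short frames)."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]


def spectral_centroid(frame, sample_rate):
    """Magnitude-weighted mean frequency of the frame, in Hz."""
    mags = magnitude_spectrum(frame)
    total = sum(mags)
    if total == 0.0:
        return 0.0  # silent frame: no meaningful centroid
    n = len(frame)
    return sum(k * sample_rate / n * m for k, m in enumerate(mags)) / total


def centroid_to_cutoff(centroid_hz, sample_rate, lo_hz=200.0, hi_hz=8000.0):
    """Map the centroid (0..Nyquist) linearly onto a hypothetical cutoff range."""
    position = min(max(centroid_hz / (sample_rate / 2), 0.0), 1.0)
    return lo_hz + position * (hi_hz - lo_hz)


def sine_frame(freq_hz, sample_rate=8000, size=64):
    """Test signal: one frame of a pure sine wave."""
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate) for i in range(size)]


if __name__ == "__main__":
    for freq in (500, 2000):
        centroid = spectral_centroid(sine_frame(freq), 8000)
        print(freq, round(centroid), round(centroid_to_cutoff(centroid, 8000)))
```

In this sketch a brighter input raises the cutoff, so the content of the sound drives the effect without any external controller.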
Real-time Percussive Technique Recognition and Embedding Learning for the Acoustic Guitar
Real-time music information retrieval (RT-MIR) has much potential to augment
the capabilities of traditional acoustic instruments. We develop RT-MIR
techniques aimed at augmenting percussive fingerstyle, which blends acoustic
guitar playing with guitar body percussion. We formulate several design
objectives for RT-MIR systems for augmented instrument performance: (i) causal
constraint, (ii) perceptually negligible action-to-sound latency, (iii) control
intimacy support, (iv) synthesis control support. We present and evaluate
real-time guitar body percussion recognition and embedding learning techniques
based on convolutional neural networks (CNNs) and CNNs jointly trained with
variational autoencoders (VAEs). We introduce a taxonomy of guitar body
percussion based on hand part and location. We follow a cross-dataset
evaluation approach by collecting three datasets labelled according to the
taxonomy. The embedding quality of the models is assessed using KL-Divergence
across distributions corresponding to different taxonomic classes. Results
indicate that the networks are strong classifiers especially in a simplified
2-class recognition task, and the VAEs yield improved class separation compared
to CNNs as evidenced by increased KL-Divergence across distributions. We argue
that the VAE embedding quality could support control intimacy and rich
interaction when the latent space's parameters are used to control an external
synthesis engine. Further design challenges around generalisation to different
datasets have been identified. Comment: Accepted at the 24th Int. Society for Music Information Retrieval
Conf., Milan, Italy, 202
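To make the embedding-quality measure concrete, the sketch below computes the KL divergence between two class distributions in a latent space. The abstract does not say how the class distributions are modelled, so the diagonal-Gaussian fit is an assumption made for illustration, and the function names are hypothetical. For diagonal Gaussians the divergence has a closed form, summed over dimensions: 0.5 * (log(var_q / var_p) + (var_p + (mu_p - mu_q)^2) / var_q - 1).

```python
import math


def fit_diag_gaussian(points, var_floor=1e-6):
    """Per-dimension mean and (population) variance of a set of embeddings."""
    dims = len(points[0])
    count = len(points)
    means = [sum(p[d] for p in points) / count for d in range(dims)]
    variances = [
        max(sum((p[d] - means[d]) ** 2 for p in points) / count, var_floor)
        for d in range(dims)
    ]
    return means, variances


def kl_diag_gaussians(mu_p, var_p, mu_q, var_q):
    """KL(P || Q) for two Gaussians with diagonal covariance."""
    return sum(
        0.5 * (math.log(vq / vp) + (vp + (mp - mq) ** 2) / vq - 1.0)
        for mp, vp, mq, vq in zip(mu_p, var_p, mu_q, var_q)
    )


if __name__ == "__main__":
    # Toy 1-D embeddings for two classes (made-up numbers).
    class_a = fit_diag_gaussian([[0.0], [2.0]])
    class_b = fit_diag_gaussian([[4.0], [6.0]])
    print(kl_diag_gaussians(*class_a, *class_b))
```

A larger value means the two classes occupy more distinct regions of the latent space, which is the sense in which increased KL divergence indicates better class separation.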
Delayed Decision-making in Real-time Beatbox Percussion Classification
This is an electronic version of an article published in Journal of New Music Research, 39(3), 203-213, 2010. doi:10.1080/09298215.2010.512979. Journal of New Music Research is available online at: www.tandfonline.com/openurl?genre=article&issn=1744-5027&volume=39&issue=3&spage=20
The development of corpus-based computer assisted composition program and its application for instrumental music composition
In the last 20 years, the environment for developing music software that uses
a corpus of audio data has expanded significantly, driven by synthesis
techniques that produce electronic sounds and by supportive tools for creative
activities. Some software produces a sequence of sounds by
means of synthesizing a chunk of source audio data retrieved from an audio database
according to a rule. Since the matching of sources is processed according to their descriptive
features extracted by FFT analysis, the quality of the result is significantly
influenced by the outcomes of the Audio Analysis, Segmentation, and Decomposition.
Also, the synthesis process often requires a considerable amount of sample data and
this can become an obstacle to establishing easy, inexpensive, and user-friendly applications
on various kinds of devices. Therefore, it is crucial to consider how to treat the
data and construct an efficient database for the synthesis. We aim to apply corpus-based
synthesis techniques to develop a Computer Assisted Composition program, and
to investigate the actual application of the program on ensemble pieces. The goal of
this research is to apply the program to instrumental music composition, refine its
functions, and search for new avenues for innovative compositional methods.
Problems and opportunities of applying data-& audio-mining techniques to ethnic music
Ontology of music performance variation
Performance variation in rhythm determines the extent to which humans perceive and feel the effect of rhythmic pulsation and music in general. In many cases, these rhythmic variations can be linked to percussive performance. Such percussive performance variations are often absent in current percussive rhythmic models. The purpose of this thesis is to present an interactive computer model, called the PD-103, that simulates the micro-variations in human percussive performance. This thesis makes three main contributions to existing knowledge: firstly, by formalising a new method for modelling percussive performance; secondly, by developing a new compositional software tool called the PD-103 that models human percussive performance; and finally, by creating a portfolio of different musical styles to demonstrate the capabilities of the software. A large database of recorded samples is classified into zones based upon the vibrational characteristics of the instruments, to model timbral variation in human percussive performance. The degree of timbral variation is governed by principles of biomechanics and human percussive performance. A fuzzy logic algorithm is applied to analyse current and first-order sample selection in order to formulate an ontological description of music performance variation. Asynchrony values were extracted from recorded performances of three different performance skill levels to create "timing fingerprints" which characterise features unique to each percussionist. The PD-103 uses real performance timing data to determine asynchrony values for each synthesised note. The spectral content of the sample database forms a three-dimensional loudness/timbre space, intersecting instrumental behaviour with music composition.
The reparameterisation of the sample database, following the analysis of loudness, spectral flatness, and spectral centroid, provides an opportunity to explore the timbral variations inherent in percussion instruments, to creatively explore dimensions of timbre. The PD-103 was used to create a music portfolio exploring different rhythmic possibilities with a focus on meso-periodic rhythms common to parts of West Africa, jazz drumming, and electroacoustic music. The portfolio also includes new timbral percussive works based on spectral features and demonstrates the central aim of this thesis, which is the creation of a new compositional software tool that integrates human percussive performance and subsequently extends this model to different genres of music
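Of the three descriptors named above, spectral flatness is the least self-explanatory. The sketch below uses the common textbook definition, the ratio of the geometric mean to the arithmetic mean of the power spectrum. The thesis abstract does not give its formula, so this is an assumption, and the function name and the numerical floor are hypothetical.

```python
import math


def spectral_flatness(power_spectrum, floor=1e-12):
    """Geometric mean over arithmetic mean of a power spectrum.

    Values near 1 indicate a noise-like (flat) spectrum; values near 0
    indicate a tonal (peaky) spectrum. The floor avoids log(0).
    """
    values = [max(p, floor) for p in power_spectrum]
    log_mean = sum(math.log(v) for v in values) / len(values)
    arithmetic_mean = sum(values) / len(values)
    return math.exp(log_mean) / arithmetic_mean


if __name__ == "__main__":
    print(spectral_flatness([1.0, 1.0, 1.0, 1.0]))  # flat spectrum
    print(spectral_flatness([1.0, 0.0, 0.0, 0.0]))  # single peak
```

Together with loudness and spectral centroid, a value like this would give each sample a coordinate in a three-dimensional space of the kind the abstract describes.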
Real-time hit classification in a smart cajón
© 2018 Turchet, McPherson and Barthet. Smart musical instruments are a class of IoT devices for music making, which encompass embedded intelligence as well as wireless connectivity. In previous work, we established design requirements for a novel smart musical instrument, a smart cajón, following a user-centered approach. This paper describes the implementation and technical evaluation of the designed component of the smart cajón related to hit classification and repurposing. A conventional acoustic cajón was enhanced with sensors to classify the position of the hit and the gesture that produced it. The instrument was equipped with five piezo pickups attached to the internal panels and a condenser microphone located inside. The developed sound engine leveraged digital signal processing, sensor fusion, and machine learning techniques to classify the position, dynamics, and timbre of each hit. The techniques were devised and implemented to achieve low latency between action and the electronically-generated sounds, and to keep computational efficiency high. The system was tuned to classify two main cajón playing techniques at different locations and we conducted evaluations using over 2,000 hits performed by two professional players. We first assessed the classification performance when training and testing data related to recordings from the same player. In this configuration, classification accuracies of 100% were obtained for hit detection and location. Accuracies of over 90% were obtained when classifying timbres produced by the two playing techniques. We then assessed the classifier in a cross-player configuration (training and testing were performed using recordings from different players). Results indicated that while hit location scales relatively well across different players, gesture identification requires that the involved classifiers are trained specifically for each musician.
Vocal imitation for query by vocalisation
PhD Thesis. The human voice presents a rich and powerful medium for expressing sonic ideas such as musical sounds. This capability extends beyond the sounds used in speech, evidenced for example in the art form of beatboxing, and recent studies highlighting the utility of vocal imitation for communicating sonic concepts. Meanwhile, the advance of digital audio has resulted in huge libraries of sounds at the disposal of music producers and sound designers. This presents a compelling search problem: with larger search spaces, the task of navigating sound libraries has become increasingly difficult. The versatility and expressive nature of the voice provides a seemingly ideal medium for querying sound libraries, raising the question of how well humans are able to vocally imitate
musical sounds, and how we might use the voice as a tool for search. In this thesis we address these questions by investigating the ability of musicians to
vocalise synthesised and percussive sounds, and evaluate the suitability of different audio features for predicting the perceptual similarity between vocal
imitations and imitated sounds.
In the first experiment, musicians were tasked with imitating synthesised sounds with one or two time-varying feature envelopes applied. The results
show that participants were able to imitate pitch, loudness, and spectral centroid features accurately, and that imitation accuracy was generally preserved
when the imitated stimuli combined two, not necessarily congruent, features. This demonstrates the viability of using the voice as a natural means of
expressing time series of two features simultaneously. The second experiment consisted of two parts. In a vocal production task,
musicians were asked to imitate drum sounds. Listeners were then asked to rate the similarity between the imitations and sounds from the same category
(e.g. kick, snare etc.). The results show that drum sounds received the highest similarity ratings when rated against their imitations (as opposed to imitations of another sound), and overall more than half the imitated sounds were correctly identified with above chance accuracy from the imitations, although
this varied considerably between drum categories.
The findings from the vocal imitation experiments highlight the capacity of musicians to vocally imitate musical sounds, and some limitations of non-verbal
vocal expression. Finally, we investigated the performance of different audio features as predictors of perceptual similarity between the imitations and
imitated sounds from the second experiment. We show that features learned using convolutional auto-encoders outperform a number of popular heuristic
features for this task, and that preservation of temporal information is more important than spectral resolution for differentiating between the vocal imitations and same-category drum sounds.
Automatic Transcription of Bass Guitar Tracks applied for Music Genre Classification and Sound Synthesis
Music recordings most often consist of multiple instrument signals, which
overlap in time and frequency. In the field of Music Information Retrieval
(MIR), existing algorithms for the automatic transcription and analysis of
music recordings aim to extract semantic information from mixed audio
signals. In recent years, it has frequently been observed that algorithm
performance is limited by signal interference and the resulting
loss of information. One common approach to solve this problem is to first
apply source separation algorithms to isolate the present musical
instrument signals before analyzing them individually. The performance of
source separation algorithms strongly depends on the number of instruments
as well as on the amount of spectral overlap. In this thesis, isolated
instrumental tracks are analyzed in order to circumvent the challenges of
source separation. Instead, the focus is on the development of
instrument-centered signal processing algorithms for music transcription,
musical analysis, as well as sound synthesis. The electric bass guitar is
chosen as an example instrument. Its sound production principles are
closely investigated and considered in the algorithmic design. In the first
part of this thesis, an automatic music transcription algorithm for
electric bass guitar recordings will be presented. The audio signal is
interpreted as a sequence of sound events, which are described by various
parameters. In addition to the conventionally used score-level parameters
note onset, duration, loudness, and pitch, instrument-specific parameters
such as the applied instrument playing techniques and the geometric
position on the instrument fretboard will be extracted. Different
evaluation experiments confirmed that the proposed transcription algorithm
outperformed three state-of-the-art bass transcription algorithms for the
transcription of realistic bass guitar recordings. The estimation of the
instrument-level parameters works with high accuracy, in particular for
isolated note samples. In the second part of the thesis, it will be
investigated whether analysis of the bassline of a music piece alone
makes it possible to automatically classify its music genre. Different score-based
audio features will be proposed that quantify tonal, rhythmic, and
structural properties of basslines. Based on a novel data set of 520
bassline transcriptions from 13 different music genres, three approaches
for music genre classification were compared. A rule-based classification
system could achieve a mean class accuracy of 64.8 % by only taking
features into account that were extracted from the bassline of a music
piece. The re-synthesis of bass guitar recordings using the previously
extracted note parameters will be studied in the third part of this thesis.
Based on the physical modeling of string instruments, a novel sound
synthesis algorithm tailored to the electric bass guitar will be presented.
The algorithm mimics different aspects of the instrument's sound
production mechanism such as string excitation, string damping, string-fret
collision, and the influence of the electro-magnetic pickup. Furthermore, a
parametric audio coding approach will be discussed that makes it possible to encode
and transmit bass guitar tracks with a significantly smaller bit rate than
conventional audio coding algorithms do. The results of different listening
tests confirmed that a higher perceptual quality can be achieved if the
original bass guitar recordings are encoded and re-synthesized using the
proposed parametric audio codec instead of being encoded using conventional
audio codecs at very low bit rate settings
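The instrument-specific fretboard parameter mentioned in the first part of this abstract is needed because pitch alone does not fix where a note was played. The sketch below lists the string and fret positions that could produce a given pitch. It is not the thesis algorithm: it assumes a four-string bass in standard tuning (E1, A1, D2, G2, MIDI notes 28, 33, 38, 43) and a hypothetical limit of 20 frets, and the function name is invented for illustration.

```python
# Open-string MIDI pitches for an assumed four-string bass in standard tuning.
OPEN_STRING_MIDI = {"E": 28, "A": 33, "D": 38, "G": 43}


def fretboard_candidates(midi_pitch, max_fret=20):
    """Return every (string, fret) pair that sounds the given MIDI pitch."""
    candidates = []
    for name, open_pitch in OPEN_STRING_MIDI.items():
        fret = midi_pitch - open_pitch  # one fret per semitone
        if 0 <= fret <= max_fret:
            candidates.append((name, fret))
    return candidates


if __name__ == "__main__":
    # G2 can be played at four different positions.
    print(fretboard_candidates(43))
```

Because one pitch can map to several positions, a transcription system has to use further evidence from the audio to choose among them, which is why the abstract treats fretboard position as a parameter to be estimated separately from pitch.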