262 research outputs found
Automatic Transcription of Bass Guitar Tracks applied for Music Genre Classification and Sound Synthesis
Musiksignale bestehen in der Regel aus einer Überlagerung mehrerer
Einzelinstrumente. Die meisten existierenden Algorithmen zur automatischen
Transkription und Analyse von Musikaufnahmen im Forschungsfeld des Music
Information Retrieval (MIR) versuchen, semantische Information direkt aus
diesen gemischten Signalen zu extrahieren. In den letzten Jahren wurde
häufig beobachtet, dass die Leistungsfähigkeit dieser Algorithmen durch
die Signalüberlagerungen und den daraus resultierenden Informationsverlust
generell limitiert ist. Ein möglicher Lösungsansatz besteht darin,
mittels Verfahren der Quellentrennung die beteiligten Instrumente vor der
Analyse klanglich zu isolieren. Die Leistungsfähigkeit dieser Algorithmen
ist zum aktuellen Stand der Technik jedoch nicht immer ausreichend, um eine
sehr gute Trennung der Einzelquellen zu ermöglichen. In dieser Arbeit
werden daher ausschließlich isolierte Instrumentalaufnahmen untersucht,
die klanglich nicht von anderen Instrumenten überlagert sind. Exemplarisch
werden anhand der elektrischen Bassgitarre auf die Klangerzeugung dieses
Instrumentes hin spezialisierte Analyse- und Klangsynthesealgorithmen
entwickelt und evaluiert.Im ersten Teil der vorliegenden Arbeit wird ein
Algorithmus vorgestellt, der eine automatische Transkription von
Bassgitarrenaufnahmen durchführt. Dabei wird das Audiosignal durch
verschiedene Klangereignisse beschrieben, welche den gespielten Noten auf
dem Instrument entsprechen. Neben den üblichen Notenparametern Anfang,
Dauer, Lautstärke und Tonhöhe werden dabei auch instrumentenspezifische
Parameter wie die verwendeten Spieltechniken sowie die Saiten- und Bundlage
auf dem Instrument automatisch extrahiert. Evaluationsexperimente anhand
zweier neu erstellter Audiodatensätze belegen, dass der vorgestellte
Transkriptionsalgorithmus auf einem Datensatz von realistischen
Bassgitarrenaufnahmen eine höhere Erkennungsgenauigkeit erreichen kann als
drei existierende Algorithmen aus dem Stand der Technik. Die Schätzung der
instrumentenspezifischen Parameter kann insbesondere für isolierte
Einzelnoten mit einer hohen Güte durchgeführt werden.Im zweiten Teil der
Arbeit wird untersucht, wie aus einer Notendarstellung typischer sich
wieder- holender Basslinien auf das Musikgenre geschlossen werden kann.
Dabei werden Audiomerkmale extrahiert, welche verschiedene tonale,
rhythmische, und strukturelle Eigenschaften von Basslinien quantitativ
beschreiben. Mit Hilfe eines neu erstellten Datensatzes von 520 typischen
Basslinien aus 13 verschiedenen Musikgenres wurden drei verschiedene
Ansätze für die automatische Genreklassifikation verglichen. Dabei zeigte
sich, dass mit Hilfe eines regelbasierten Klassifikationsverfahrens nur
Anhand der Analyse der Basslinie eines Musikstückes bereits eine mittlere
Erkennungsrate von 64,8 % erreicht werden konnte.Die Re-synthese der
originalen Bassspuren basierend auf den extrahierten Notenparametern wird
im dritten Teil der Arbeit untersucht. Dabei wird ein neuer
Audiosynthesealgorithmus vorgestellt, der basierend auf dem Prinzip des
Physical Modeling verschiedene Aspekte der für die Bassgitarre
charakteristische Klangerzeugung wie Saitenanregung, Dämpfung, Kollision
zwischen Saite und Bund sowie dem Tonabnehmerverhalten nachbildet.
Weiterhin wird ein parametrischerAudiokodierungsansatz diskutiert, der es
erlaubt, Bassgitarrenspuren nur anhand der ermittel- ten notenweisen
Parameter zu übertragen um sie auf Dekoderseite wieder zu
resynthetisieren. Die Ergebnisse mehrerer Hötest belegen, dass der
vorgeschlagene Synthesealgorithmus eine Re- Synthese von
Bassgitarrenaufnahmen mit einer besseren Klangqualität ermöglicht als die
Übertragung der Audiodaten mit existierenden Audiokodierungsverfahren, die
auf sehr geringe Bitraten ein gestellt sind.Music recordings most often consist of multiple instrument signals, which
overlap in time and frequency. In the field of Music Information Retrieval
(MIR), existing algorithms for the automatic transcription and analysis of
music recordings aim to extract semantic information from mixed audio
signals. In the last years, it was frequently observed that the algorithm
performance is limited due to the signal interference and the resulting
loss of information. One common approach to solve this problem is to first
apply source separation algorithms to isolate the present musical
instrument signals before analyzing them individually. The performance of
source separation algorithms strongly depends on the number of instruments
as well as on the amount of spectral overlap.In this thesis, isolated
instrumental tracks are analyzed in order to circumvent the challenges of
source separation. Instead, the focus is on the development of
instrument-centered signal processing algorithms for music transcription,
musical analysis, as well as sound synthesis. The electric bass guitar is
chosen as an example instrument. Its sound production principles are
closely investigated and considered in the algorithmic design.In the first
part of this thesis, an automatic music transcription algorithm for
electric bass guitar recordings will be presented. The audio signal is
interpreted as a sequence of sound events, which are described by various
parameters. In addition to the conventionally used score-level parameters
note onset, duration, loudness, and pitch, instrument-specific parameters
such as the applied instrument playing techniques and the geometric
position on the instrument fretboard will be extracted. Different
evaluation experiments confirmed that the proposed transcription algorithm
outperformed three state-of-the-art bass transcription algorithms for the
transcription of realistic bass guitar recordings. The estimation of the
instrument-level parameters works with high accuracy, in particular for
isolated note samples.In the second part of the thesis, it will be
investigated, whether the sole analysis of the bassline of a music piece
allows to automatically classify its music genre. Different score-based
audio features will be proposed that allow to quantify tonal, rhythmic, and
structural properties of basslines. Based on a novel data set of 520
bassline transcriptions from 13 different music genres, three approaches
for music genre classification were compared. A rule-based classification
system could achieve a mean class accuracy of 64.8 % by only taking
features into account that were extracted from the bassline of a music
piece.The re-synthesis of a bass guitar recordings using the previously
extracted note parameters will be studied in the third part of this thesis.
Based on the physical modeling of string instruments, a novel sound
synthesis algorithm tailored to the electric bass guitar will be presented.
The algorithm mimics different aspects of the instrument’s sound
production mechanism such as string excitement, string damping, string-fret
collision, and the influence of the electro-magnetic pickup. Furthermore, a
parametric audio coding approach will be discussed that allows to encode
and transmit bass guitar tracks with a significantly smaller bit rate than
conventional audio coding algorithms do. The results of different listening
tests confirmed that a higher perceptual quality can be achieved if the
original bass guitar recordings are encoded and re-synthesized using the
proposed parametric audio codec instead of being encoded using conventional
audio codecs at very low bit rate settings
DMRN+18: Digital Music Research Network One-day Workshop 2023
DMRN+18: Digital Music Research Network One-day Workshop 2023 Queen Mary University of London Tuesday 19th December 2023 • Keynote speaker: Stefan Bilbao The Digital Music Research Network (DMRN) aims to promote research in the area of digital music, by bringing together researchers from UK and overseas universities, as well as industry, for its annual workshop. The workshop will include invited and contributed talks and posters. The workshop will be an ideal opportunity for networking with other people working in the area. Keynote speakers: Stefan Bilbao Tittle: Physics-based Audio: Sound Synthesis and Virtual Acoustics. Abstract: Any acoustically-produced sound produced must be the result of physical laws that describe the dynamics of a given system---always at least partly mechanical, and sometimes with an electronic element as well. One approach to the synthesis of natural acoustic timbres, thus, is through simulation, often referred to in this context as physical modelling, or physics-based audio. In this talk, the principles of physics-based audio, and the various different approaches to simulation are described, followed by a set of examples covering: various musical instrument types; the important related problem of the emulation of room acoustics or “virtual acoustics”; the embedding of instruments in a 3D virtual space; electromechanical effects; and also new modular instrument designs based on physical laws, but without a counterpart in the real world. Some more technical details follow, including the strengths, weaknesses and limitations of such methods, and pointers to some links to data-centred black-box approaches to sound generation and effects processing. The talk concludes with some musical examples and recent work on moving such algorithms to a real-time setting.. Bio: Stefan is a Professor (full) at Reid School of Music, University of Edinburgh, he is the Personal Chair of Acoustics and Audio Signal Processing, Music. He currently works on computational acoustics, for applications in sound synthesis and virtual acoustics. Special topics of interest include: Finite difference time domain methods, distributed nonlinear systems such as strings and plates, architectural acoustics, spatial audio in simulation, multichannel sound synthesis, and hardware and software realizations. More information on: https://www.acoustics.ed.ac.uk/group-members/dr-stefan-bilbao/ DMRN+18 is sponsored by The UKRI Centre for Doctoral Training in Artificial Intelligence and Music (AIM); a leading PhD research programme aimed at the Music/Audio Technology and Creative Industries, based at Queen Mary University of London
Adaptive Scattering Transforms for Playing Technique Recognition
Playing techniques contain distinctive information about musical expressivity and interpretation. Yet, current research in music signal analysis suffers from a scarcity of computational models for playing techniques, especially in the context of live performance. To address this problem, our paper develops a general framework for playing technique recognition. We propose the adaptive scattering transform, which refers to any scattering transform that includes a stage of data-driven dimensionality reduction over at least one of its wavelet variables, for representing playing techniques. Two adaptive scattering features are presented: frequency-adaptive scattering and direction-adaptive scattering. We analyse seven playing techniques: vibrato, tremolo, trill, flutter-tongue, acciaccatura, portamento, and glissando. To evaluate the proposed methodology, we create a new dataset containing full-length Chinese bamboo flute performances (CBFdataset) with expert playing technique annotations. Once trained on the proposed scattering representations, a support vector classifier achieves state-of-the-art results. We provide explanatory visualisations of scattering coefficients for each technique and verify the system over three additional datasets with various instrumental and vocal techniques: VPset, SOL, and VocalSet
Music Information Retrieval Meets Music Education
This paper addresses the use of Music Information Retrieval (MIR) techniques in music education and their integration in learning software. A general overview of systems that are either commercially available or in research stage is presented. Furthermore, three well-known MIR methods used in music learning systems and their state-of-the-art are described: music transcription, solo and accompaniment track creation, and generation of performance instructions. As a representative example of a music learning system developed within the MIR community, the Songs2See software is outlined. Finally, challenges and directions for future research are described
Automatic Guitar Chord Detection
Automatic guitar chord detection is a process that attempts to detect a guitar chord from a piece of audio. Generally, automatic chord detection is considered to be a part of a large problem termed as automatic transcription. Although there has been a lot of research in the field of automatic transcription, but having a reliable transcription system is still a distant prospect. Chord detection becomes interesting as chords have comparatively stable structure and they completely describe the occurring harmonies in a piece of music.
This thesis presents a novel approach for detecting the correctness of musical chords played by guitar. The approach is based on pattern matching technique applied to the database of chords and their typical mistakes. Mistakes are the versions of a chord where typical playing errors are made. Transient of a chord is skipped and its spectrum is whitened. A certain region of whitened spectra is chosen as a feature vector. Cosine distance is computed between the extracted features and the data present in a reference chord database. Finally, the system detects the correctness of a played chord based on k-Nearest Neighbor (k-NN) classifier.
The developed system uses two types of spectral whitening techniques: one is based on Linear Predictive Coding (LPC) and the other is based on Phase Transform-beta (PHAT-beta). The average accuracy shown by LPC based system is 72% while that of PHAT-beta is 82.5%. The system was also evaluated under different noise conditions
Polyphonic music information retrieval based on multi-label cascade classification system
Recognition and separation of sounds played by various instruments is very useful in labeling audio files with semantic information. This is a non-trivial task requiring sound analysis, but the results can aid automatic indexing and browsing music data when searching for melodies played by user specified instruments. Melody match based on pitch detection technology has drawn much attention and a lot of MIR systems have been developed to fulfill this task. However, musical instrument recognition remains an unsolved problem in the domain. Numerous approaches on acoustic feature extraction have already been proposed for timbre recognition. Unfortunately, none of those monophonic timbre estimation algorithms can be successfully applied to polyphonic sounds, which are the more usual cases in the real music world. This has stimulated the research on multi-labeled instrument classification and new features development for content-based automatic music information retrieval. The original audio signals are the large volume of unstructured sequential values, which are not suitable for traditional data mining algorithms; while the acoustical features are sometime not sufficient for instrument recognition in polyphonic sounds because they are higher-level representatives of raw signal lacking details of original information. In order to capture the patterns which evolve on the time scale, new temporal features are introduced to supply more temporal information for the timbre recognition. We will introduce the multi-labeled classification system to estimate multiple timbre information from the polyphonic sound by classification based on acoustic features and short-term power spectrum matching. In order to achieve higher estimation rate, we introduced the hierarchically structured cascade classification system under the inspiration of the human perceptual process. This cascade classification system makes a first estimate on the higher level decision attribute, which stands for the musical instrument family. Then, the further estimation is done within that specific family range. Experiments showed better performance of a hierarchical system than the traditional flat classification method which directly estimates the instrument without higher level of family information analysis.
Traditional hierarchical structures were constructed in human semantics, which are meaningful from human perspective but not appropriate for the cascade system. We introduce the new hierarchical instrument schema according to the clustering results of the acoustic features. This new schema better describes the similarity among different instruments or among different playing techniques of the same instrument. The classification results show the higher accuracy of cascade system with the new schema compared to the traditional schemas. The query answering system is built based on the cascade classifier
- …