584 research outputs found

    Music Synchronization, Audio Matching, Pattern Detection, and User Interfaces for a Digital Music Library System

    Over the last two decades, efforts to digitize our cultural heritage have grown steadily. Most of these digitization initiatives pursue one or both of the following goals: to conserve the documents (especially those threatened by decay) and to provide remote access on a grand scale. These trends are observable for music documents as well, and several digital music libraries now exist. An important characteristic of these music libraries is an inherent multimodality resulting from the large variety of available digital music representations, such as scanned scores, symbolic scores, audio recordings, and videos. In addition, for each piece of music there exists not just one document of each type, but many. Considering and exploiting this multimodality and multiplicity, the DFG-funded digital library initiative PROBADO MUSIC aimed at developing a novel user-friendly interface for content-based retrieval, document access, navigation, and browsing in large music collections. The implementation of such a front end requires the multimodal linking and indexing of the music documents during preprocessing. As the music collections under consideration can be very large, these structures should be calculated automatically or at least semi-automatically. The field of music information retrieval (MIR) is particularly concerned with the development of suitable procedures, and it was the goal of PROBADO MUSIC to include existing and newly developed MIR techniques to realize the envisioned digital music library system. In this context, the present thesis discusses three MIR tasks: music synchronization, audio matching, and pattern detection. We identify particular issues in these fields and provide algorithmic solutions as well as prototypical implementations. In music synchronization, for each position in one representation of a piece of music the corresponding position in another representation is calculated.
This thesis focuses on the task of aligning scanned score pages of orchestral music with audio recordings. Here, a previously unconsidered piece of information is the textual specification of transposing instruments provided in the score. Our evaluations show that neglecting such information can result in a measurable loss of synchronization accuracy. Therefore, we propose an OCR-based approach for detecting and interpreting the transposition information in orchestral scores. For a given audio snippet, audio matching methods automatically calculate all musically similar excerpts within a collection of audio recordings. In this context, subsequence dynamic time warping (SSDTW) is a well-established approach, as it allows for local and global tempo variations between the query and the retrieved matches. Moving to real-life digital music libraries with larger audio collections, however, the quadratic runtime of SSDTW results in untenable response times. To improve the response time, this thesis introduces a novel index-based approach to SSDTW-based audio matching. We combine the idea of inverted file lists introduced by Kurth and Müller (Efficient index-based audio matching, 2008) with the shingling techniques often used in the audio identification scenario. In pattern detection, all repeating patterns within one piece of music are determined. Usually, pattern detection operates on symbolic score documents and is often used in the context of computer-aided motivic analysis. Envisioned as a new feature of the PROBADO MUSIC system, this thesis proposes a string-based approach to pattern detection and a novel interactive front end for result visualization and analysis.
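
To make the SSDTW step above concrete, here is a minimal sketch of plain subsequence dynamic time warping. It is not the index-based method the thesis introduces. It assumes one-dimensional numeric features and an absolute-difference local cost (real audio matching would typically compare chroma-like feature vectors), and the function name `subsequence_dtw` is hypothetical. The nested loops over query and database positions are the source of the quadratic runtime mentioned in the abstract.

```python
def subsequence_dtw(query, database):
    """Return (cost, start, end) of the database span that best matches query."""
    n, m = len(query), len(database)
    # Local cost matrix; absolute difference is an assumption for this sketch.
    cost = [[abs(q - d) for d in database] for q in query]

    # Accumulated cost. The first row is NOT accumulated, so a match
    # may start at any database position at no extra cost.
    acc = [[0.0] * m for _ in range(n)]
    acc[0] = list(cost[0])
    for i in range(1, n):
        acc[i][0] = acc[i - 1][0] + cost[i][0]
        for j in range(1, m):
            acc[i][j] = cost[i][j] + min(
                acc[i - 1][j - 1],  # both sequences advance
                acc[i - 1][j],      # query advances, database holds
                acc[i][j - 1],      # database advances, query holds
            )

    # The match may also end anywhere: take the cheapest cell of the last row.
    end = min(range(m), key=lambda j: acc[n - 1][j])

    # Backtrack to the first row to recover the start position.
    i, j = n - 1, end
    while i > 0:
        if j == 0:
            i -= 1
            continue
        diag, up, left = acc[i - 1][j - 1], acc[i - 1][j], acc[i][j - 1]
        best = min(diag, up, left)
        if best == diag:
            i, j = i - 1, j - 1
        elif best == up:
            i -= 1
        else:
            j -= 1
    return acc[n - 1][end], j, end


print(subsequence_dtw([1, 2, 3], [0, 0, 1, 2, 3, 0, 0]))  # -> (0, 2, 4)
```

Because the warping path may hold one sequence while the other advances, a slower rendition such as `[5, 1, 1, 2, 2, 3, 3, 5]` still matches the query `[1, 2, 3]` at zero cost.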

    Compositional Hierarchical Model for Music Information Retrieval (Kompozicionalni hierarhični model za pridobivanje informacij iz glasbe)

    In recent years, deep architectures, most commonly based on neural networks, have advanced the state of the art in many research areas. Due to the popularity and success of deep neural networks, other deep architectures, including compositional models, have been pushed out of mainstream research. This dissertation presents the compositional hierarchical model as a novel deep architecture for music processing. Our main motivation was to develop and explore an alternative non-neural deep architecture for music processing that would be transparent (meaning that the encoded knowledge is interpretable), trained in an unsupervised manner and on small datasets, and useful both as a feature extractor for classification tasks and as a transparent model for unsupervised pattern discovery. We base our work on compositional models, as compositionality is inherent in music. The proposed compositional hierarchical model learns a multi-layer hierarchical representation of the analyzed music signals in an unsupervised manner. It provides transparent insights into the learned concepts and their structure. It can be used as a feature extractor: its output can be used for classification tasks using existing machine learning techniques. Moreover, the model's transparency enables an interpretation of the learned concepts, so the model can be used for analysis (exploration of the learned hierarchy) or discovery-oriented (inferring the hierarchy) tasks, which is difficult with most neural-network-based architectures. The proposed model uses relative coding of the learned concepts, which eliminates the need for the large annotated training datasets that are essential in deep architectures with a large number of parameters. Relative coding contributes to slim models, which are fast to execute and have low memory requirements.
The model also incorporates several biologically inspired mechanisms, modeled on mechanisms that exist at the lower levels of human perception (e.g. lateral inhibition in the human ear) and that significantly affect perception. The proposed model is evaluated on several music information retrieval tasks and its results are compared to the current state of the art. The dissertation is structured as follows. In the first chapter we present the motivation for the development of the new model. In the second chapter we elaborate on the related work in music information retrieval and review other compositional and transparent models. Chapter three gives a thorough description of the proposed model. The model structure and its learning and inference methods are explained, as well as the incorporated biologically inspired mechanisms. The model is then applied to several different music domains, which are divided according to the type of input data. In this we follow the timeline of the development and the implementation of the model. In chapter four, we present the model's application to audio recordings, specifically for two tasks: automatic chord estimation and multiple fundamental frequency estimation. In chapter five, we present the model's application to symbolic music representations. We concentrate on pattern discovery, emphasizing the model's ability to tackle such problems. We also evaluate the model as a feature generator for tune family classification. Finally, in chapter six, we show the latest progress in developing the model for representing rhythm and show that it exhibits a high degree of robustness in extracting high-level rhythmic structures from music signals.
We conclude the dissertation by summarizing our work and the results, elaborating on forthcoming work in the development of the model and its future applications. With the rise of deep architectures based on neural networks, results in many problem areas have recently improved substantially. Because of the popularity and success of these neural-network-based deep approaches, other approaches, compositional ones in particular, have been pushed out of the research spotlight. This dissertation addresses the question of whether a deep architecture can be developed that overcomes the existing problems of deep architectures. To this end, we return to compositional models and present the compositional hierarchical model as an alternative deep architecture with the following characteristics: transparency, which allows the learned concepts to be explained easily; unsupervised learning and the ability to learn from small datasets; usefulness as a feature extractor; and the ability to exploit the model's transparency for pattern discovery. Our work is based on compositional models, which are intuitive in music. The proposed compositional hierarchical model is capable of unsupervised learning of a multi-layer representation of the musical input. The model allows the learned concepts to be inspected through transparent structures. It can be used as a feature generator: its output can be used for classification with other machine learning approaches. At the same time, the transparency of the proposed model can be used for analysis (exploration of the learned hierarchy) in pattern discovery, which is difficult to achieve with approaches based on neural networks. Relative coding of concepts within the model leads to considerably smaller models and consequently reduces the need for the large datasets required for training. By introducing biologically inspired mechanisms, we aim to bring the model even closer to human perception.
Some mechanisms, such as inhibition, are known to be present at the lower levels of human perception in the ear and to significantly affect how we perceive. In the model we take first steps toward this kind of processing, with the ultimate goal of building a model that fully reflects human perception. The first chapter of the dissertation presents the motivation for developing the new model. The second chapter reviews previously published work in the field. The subsequent chapters focus on the model itself. We first describe the theoretical design of the model, its learning procedure, and the operation of the biologically inspired mechanisms. We then apply the model to several music domains, divided according to the type of input data. In doing so, we follow the timeline of the model's development and implementations over the course of the doctoral study. We first present the application of the model to time-frequency signals, on which we test it for two tasks: automatic chord estimation and automatic transcription of fundamental frequencies. The fifth chapter presents a second application of the model, this time to symbolic input data representing music notation. Here we focus on pattern discovery, highlighting the model's ability to solve problems of this kind, which remains out of reach for other approaches. We also evaluate the model as a feature generator, on the problem of melodic similarity of songs and their classification into variant types. Finally, the sixth chapter shows the latest stage of the model's development, applied to the problem of understanding rhythm in music. We analyze the adapted model and show its ability to learn different rhythmic forms and its high degree of robustness in extracting high-level rhythmic structures. The conclusions summarize the work and results and outline the next steps for the model's future development.

    Recognizing Patterns of Music Signals to Songs Classification Using Modified AIS-Based Classifier

    The human ability to recognize different types of music and group them into genres is remarkable: music experts can perform such classification using only their hearing and logical judgment. For decades, the scientific community has worked on automating the human process of recognizing the genre of songs. These efforts normally imitate the human method of recognizing music by considering every essential component of a song, from the artist's voice and the melody through to the type of instruments used. As a result, various approaches and mechanisms have been introduced and developed to automate the classification process. The results of these studies so far have been remarkable, yet they can still be improved. The aim of this research is to investigate the Artificial Immune System (AIS) domain, focusing on a modified AIS-based classifier for this problem and in particular on its censoring and monitoring modules. The stages of music recognition are emphasized, and the feature extraction, feature selection, and feature classification processes are explained. The performance of the proposed classifier is compared with that of the WEKA application.
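
The abstract does not describe the modified classifier itself. As background only, the following is a minimal sketch of the textbook negative-selection scheme in which the terms censoring and monitoring are commonly used: censoring keeps random detectors that match no "self" sample, and monitoring flags a new sample that any detector matches. The two-dimensional feature vectors in the unit square, the Euclidean matching rule, the radius, and all function names are assumptions made for illustration, not details from the paper.

```python
import random


def matches(detector, sample, radius):
    """A detector matches a sample if they lie within `radius` (Euclidean)."""
    dist = sum((a - b) ** 2 for a, b in zip(detector, sample)) ** 0.5
    return dist <= radius


def censor(self_samples, n_detectors, radius, dim, rng, max_tries=100_000):
    """Censoring: keep random candidates that match none of the self samples."""
    detectors = []
    tries = 0
    while len(detectors) < n_detectors and tries < max_tries:
        tries += 1
        candidate = tuple(rng.random() for _ in range(dim))
        if not any(matches(candidate, s, radius) for s in self_samples):
            detectors.append(candidate)
    return detectors


def monitor(sample, detectors, radius):
    """Monitoring: a sample is flagged as non-self if any detector matches it."""
    return any(matches(d, sample, radius) for d in detectors)


rng = random.Random(0)  # fixed seed so the sketch is repeatable
self_samples = [(0.2, 0.2), (0.25, 0.2), (0.2, 0.3)]  # hypothetical feature vectors
detectors = censor(self_samples, n_detectors=50, radius=0.15, dim=2, rng=rng)

# By construction, no surviving detector matches a self sample.
print(any(monitor(s, detectors, 0.15) for s in self_samples))  # -> False
```

How such detector sets would be combined into a multi-genre classifier is not stated in the abstract and is left open here.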

    Automatic Transcription of Bass Guitar Tracks applied for Music Genre Classification and Sound Synthesis

    Music signals usually consist of a superposition of several individual instruments. Most existing algorithms for the automatic transcription and analysis of music recordings in the research field of Music Information Retrieval (MIR) attempt to extract semantic information directly from these mixed signals. In recent years it has frequently been observed that the performance of these algorithms is generally limited by the signal overlap and the resulting loss of information. One possible solution is to use source separation methods to isolate the participating instruments before analysis. At the current state of the art, however, the performance of these algorithms is not always sufficient to achieve a very good separation of the individual sources. This thesis therefore examines only isolated instrument recordings that are not overlapped by other instruments. Using the electric bass guitar as an example, analysis and sound synthesis algorithms specialized for the sound production of this instrument are developed and evaluated. The first part of the thesis presents an algorithm that automatically transcribes bass guitar recordings. The audio signal is described by a set of sound events that correspond to the notes played on the instrument. In addition to the usual note parameters onset, duration, loudness, and pitch, instrument-specific parameters such as the playing techniques used and the string and fret position on the instrument are extracted automatically. Evaluation experiments on two newly created audio datasets show that, on a dataset of realistic bass guitar recordings, the presented transcription algorithm can achieve higher recognition accuracy than three existing state-of-the-art algorithms.
The instrument-specific parameters can be estimated with high quality, particularly for isolated single notes. The second part investigates how the music genre can be inferred from a score representation of typical repeating basslines. Audio features are extracted that quantitatively describe various tonal, rhythmic, and structural properties of basslines. Using a newly created dataset of 520 typical basslines from 13 music genres, three approaches to automatic genre classification were compared. A rule-based classification method achieved a mean recognition rate of 64.8 % based solely on the analysis of a piece's bassline. The third part examines the re-synthesis of the original bass tracks from the extracted note parameters. A new audio synthesis algorithm is presented that, based on the principle of physical modeling, reproduces various aspects of the bass guitar's characteristic sound production, such as string excitation, damping, collision between string and fret, and pickup behavior. Furthermore, a parametric audio coding approach is discussed that transmits bass guitar tracks using only the extracted note-wise parameters and re-synthesizes them on the decoder side. The results of several listening tests show that the proposed synthesis algorithm re-synthesizes bass guitar recordings with better sound quality than transmitting the audio data with existing audio coding methods set to very low bit rates. Music recordings most often consist of multiple instrument signals, which overlap in time and frequency.
In the field of Music Information Retrieval (MIR), existing algorithms for the automatic transcription and analysis of music recordings aim to extract semantic information from mixed audio signals. In recent years, it has frequently been observed that algorithm performance is limited by signal interference and the resulting loss of information. One common approach to this problem is to first apply source separation algorithms to isolate the individual instrument signals before analyzing them separately. The performance of source separation algorithms strongly depends on the number of instruments as well as on the amount of spectral overlap. In this thesis, isolated instrumental tracks are analyzed in order to circumvent the challenges of source separation. Instead, the focus is on the development of instrument-centered signal processing algorithms for music transcription, musical analysis, and sound synthesis. The electric bass guitar is chosen as an example instrument. Its sound production principles are closely investigated and considered in the algorithmic design. In the first part of this thesis, an automatic music transcription algorithm for electric bass guitar recordings is presented. The audio signal is interpreted as a sequence of sound events, which are described by various parameters. In addition to the conventionally used score-level parameters note onset, duration, loudness, and pitch, instrument-specific parameters such as the applied playing techniques and the geometric position on the instrument fretboard are extracted. Different evaluation experiments confirmed that the proposed transcription algorithm outperformed three state-of-the-art bass transcription algorithms for the transcription of realistic bass guitar recordings.
The estimation of the instrument-level parameters works with high accuracy, in particular for isolated note samples. The second part of the thesis investigates whether analyzing only the bassline of a music piece is sufficient to classify its genre automatically. Different score-based audio features are proposed that quantify tonal, rhythmic, and structural properties of basslines. Based on a novel data set of 520 bassline transcriptions from 13 different music genres, three approaches for music genre classification were compared. A rule-based classification system achieved a mean class accuracy of 64.8 % using only features extracted from the bassline of a music piece. The re-synthesis of bass guitar recordings using the previously extracted note parameters is studied in the third part of this thesis. Based on the physical modeling of string instruments, a novel sound synthesis algorithm tailored to the electric bass guitar is presented. The algorithm mimics different aspects of the instrument's sound production mechanism, such as string excitation, string damping, string-fret collision, and the influence of the electromagnetic pickup. Furthermore, a parametric audio coding approach is discussed that encodes and transmits bass guitar tracks at a significantly lower bit rate than conventional audio coding algorithms. The results of different listening tests confirmed that a higher perceptual quality can be achieved if the original bass guitar recordings are encoded and re-synthesized using the proposed parametric audio codec instead of being encoded using conventional audio codecs at very low bit rate settings.
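
To illustrate why the fretboard position is a separate parameter from pitch, the following sketch lists every string and fret combination that produces a given pitch. It assumes a four-string bass in standard tuning (E1, A1, D2, G2, i.e. MIDI notes 28, 33, 38, 43) and a hypothetical 20-fret neck; the function name is also hypothetical. The thesis estimates the position actually played from the audio, which this sketch does not attempt.

```python
# Open-string MIDI pitches for a four-string bass in standard tuning (assumed).
OPEN_STRINGS = {"E": 28, "A": 33, "D": 38, "G": 43}


def fretboard_positions(midi_pitch, n_frets=20):
    """Return all (string, fret) pairs that sound the given MIDI pitch."""
    positions = []
    for name, open_pitch in OPEN_STRINGS.items():
        fret = midi_pitch - open_pitch  # one fret raises the pitch by one semitone
        if 0 <= fret <= n_frets:
            positions.append((name, fret))
    return positions


# G2 (MIDI 43) can be played in four different places.
print(fretboard_positions(43))  # -> [('E', 15), ('A', 10), ('D', 5), ('G', 0)]
```

The same pitch can therefore map to several positions, so pitch alone does not determine the string and fret.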

    Machine Annotation of Traditional Irish Dance Music

    The work presented in this thesis is validated in experiments using 130 real-world field recordings of traditional music from sessions, classes, concerts, and commercial recordings. Test audio includes solo and ensemble playing on a variety of instruments recorded in real-world settings such as noisy public sessions. Results are reported using standard measures from the field of information retrieval (IR), including accuracy, error, precision, and recall, and the system is compared to alternative approaches for CBMIR common in the literature.
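
As a reminder of how two of the IR measures named above are computed, here is a minimal sketch of precision and recall for a single query. The tune names are hypothetical and the function is illustrative only; the abstract does not say how the thesis aggregates scores across its 130 recordings.

```python
def precision_recall(retrieved, relevant):
    """Precision and recall of a retrieved set against the relevant set."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    # Precision: fraction of retrieved items that are relevant.
    precision = hits / len(retrieved) if retrieved else 0.0
    # Recall: fraction of relevant items that were retrieved.
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical query: four tunes returned, two of the three relevant ones found.
retrieved = ["tune_a", "tune_b", "tune_c", "tune_d"]
relevant = ["tune_a", "tune_c", "tune_e"]
precision, recall = precision_recall(retrieved, relevant)
print(precision, round(recall, 3))  # -> 0.5 0.667
```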

    A Bio-Inspired Music Genre Classification Framework using Modified AIS-Based Classifier

    For decades, the scientific community has worked on automating the human process of recognizing different types of music from elements such as the instruments used. These efforts imitate the human method of recognizing music by considering every essential component of a song, from the artist's voice and the melody through to the type of instruments used. Various approaches and mechanisms have since been introduced and developed to automate the classification process. The results of these studies so far have been remarkable, yet they can still be improved. The aim of this research is to investigate the Artificial Immune System (AIS) domain, focusing on a modified AIS-based classifier for this problem and in particular on its censoring and monitoring modules. The stages of music recognition are emphasized, and the feature extraction, feature selection, and feature classification processes are explained. The performance of the proposed classifier is compared with that of the WEKA application. The study reports an increase in classification accuracy of almost 20 to 30 percent.