
    Pitch-Informed Solo and Accompaniment Separation

    This thesis addresses the development of a system for pitch-informed solo and accompaniment separation, capable of separating the main instrument from the music accompaniment regardless of the track's musical genre or the type of accompaniment. For the solo instrument, only pitched monophonic instruments were considered, in a single-channel scenario where no panning or spatial location information is available. In the proposed method, pitch information is used as the initial stage of a sinusoidal modeling approach that estimates the spectral information of the solo instrument from a given audio mixture. Instead of estimating the solo instrument on a frame-by-frame basis, the proposed method gathers information into tone objects to perform separation. Tone-based processing allowed the inclusion of novel processing stages for attack refinement, transient interference reduction, common amplitude modulation (CAM) of tone objects, and better estimation of the non-harmonic elements that can occur in musical instrument tones. The proposed solo and accompaniment algorithm enables real-time processing and is thus suitable for real-world applications. A study was conducted to better model the magnitude, frequency, and phase of isolated musical instrument tones. As a result of this study, temporal envelope smoothness, the inharmonicity of musical instruments, and phase expectation were exploited in the proposed separation method. Additionally, an algorithm for harmonic/percussive separation based on phase expectation was proposed; it shows improved perceptual quality with respect to state-of-the-art methods for harmonic/percussive separation. The proposed solo and accompaniment method obtained perceptual quality scores comparable to other state-of-the-art algorithms in the SiSEC 2011 and SiSEC 2013 evaluation campaigns, and outperformed the comparison algorithm on the instrumental dataset described in this thesis. As a use case of solo and accompaniment separation, a listening test was conducted to assess separation quality requirements in the context of music education. Results from the listening test showed that solo and accompaniment tracks should be optimized differently to suit the quality requirements of music education. The Songs2See application was presented as commercial music learning software that already includes the proposed solo and accompaniment separation method.
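
    A minimal sketch of the pitch-informed sinusoidal-modeling step, assuming a known fundamental frequency for the analysis frame; the function name and parameters below are illustrative, not the thesis implementation.

```python
import numpy as np

def estimate_partials(frame, sr, f0, n_partials=10, half_width_hz=30.0):
    """Pick the strongest spectral peak near each expected harmonic of f0.

    Illustrative sketch of pitch-informed sinusoidal modelling: given a
    known fundamental frequency, search a narrow band around each harmonic
    and keep the dominant bin's frequency, magnitude and phase.
    """
    n_fft = len(frame)
    spectrum = np.fft.rfft(frame * np.hanning(n_fft))
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)
    partials = []
    for h in range(1, n_partials + 1):
        target = h * f0
        if target >= sr / 2:
            break  # harmonic lies beyond the Nyquist frequency
        band = np.where(np.abs(freqs - target) <= half_width_hz)[0]
        if band.size == 0:
            continue
        peak = band[np.argmax(np.abs(spectrum[band]))]
        partials.append((freqs[peak], np.abs(spectrum[peak]),
                         np.angle(spectrum[peak])))
    return partials  # one (frequency, magnitude, phase) triple per harmonic
```

    The recovered triples would drive the resynthesis of the solo instrument; tone-object processing then operates on these per-frame estimates grouped over the duration of each note.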

    Automatic transcription of polyphonic music exploiting temporal evolution

    Automatic music transcription is the process of converting an audio recording into a symbolic representation using musical notation. It has numerous applications in music information retrieval, computational musicology, and the creation of interactive systems. Even for expert musicians, transcribing polyphonic pieces of music is not a trivial task, and while the problem of automatic pitch estimation for monophonic signals is considered solved, the creation of an automated system able to transcribe polyphonic music without restrictions on the degree of polyphony or the instrument type remains open. In this thesis, research on automatic transcription is performed by explicitly incorporating information on the temporal evolution of sounds. First efforts address the problem by focusing on signal processing techniques and by proposing audio features that utilise temporal characteristics. Techniques for note onset and offset detection are also utilised to improve transcription performance. Subsequent approaches propose transcription models based on shift-invariant probabilistic latent component analysis (SI-PLCA), modeling the temporal evolution of notes in a multiple-instrument case and supporting frequency modulations in produced notes. Datasets and annotations for transcription research have also been created during this work. The proposed systems have been evaluated both privately and publicly within the Music Information Retrieval Evaluation eXchange (MIREX) framework, and have been shown to outperform several state-of-the-art transcription approaches. The developed techniques have also been employed for other tasks related to music technology, such as key modulation detection, temperament estimation, and automatic piano tutoring. Finally, the proposed music transcription models have also been utilised in a wider context, namely for modeling acoustic scenes.
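
    As one example of the temporal techniques mentioned above, the sketch below implements a generic half-wave-rectified spectral-flux onset detector of the kind used to refine transcription output; it illustrates the idea rather than the thesis code, and the normalisation and threshold `delta` are assumptions.

```python
import numpy as np

def spectral_flux_onsets(x, sr, n_fft=2048, hop=512, delta=0.1):
    """Half-wave-rectified spectral flux with naive peak picking.

    Generic onset-detection sketch; the fixed threshold and simple
    local-maximum test are illustrative choices.
    """
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    mags = np.array([np.abs(np.fft.rfft(window * x[i * hop:i * hop + n_fft]))
                     for i in range(n_frames)])
    # sum of positive magnitude increases between consecutive frames
    flux = np.maximum(mags[1:] - mags[:-1], 0.0).sum(axis=1)
    flux /= flux.max() + 1e-12
    peaks = [i for i in range(1, len(flux) - 1)
             if flux[i] > flux[i - 1] and flux[i] >= flux[i + 1]
             and flux[i] > delta]
    return (np.array(peaks) + 1) * hop / sr  # onset times in seconds
```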

    Exploiting Piano Acoustics in Automatic Transcription

    This work was supported by a joint Queen Mary/China Scholarship Council Scholarship. In this thesis we exploit piano acoustics to automatically transcribe piano recordings into a symbolic representation: the pitch and timing of each detected note. To do so we use approaches based on non-negative matrix factorisation (NMF). To motivate the main contributions of this thesis, we provide two preparatory studies: a study of using a deterministic annealing EM algorithm in a matrix factorisation-based system, and a study of decay patterns of partials in real-world piano tones. Based on these studies, we propose two generative NMF-based models which explicitly model different piano acoustical features. The first is an attack/decay model that takes into account the time-varying timbre and decaying energy of piano sounds. The system divides a piano note into a percussive attack stage and a harmonic decay stage, and models the two parts separately using two sets of templates and amplitude envelopes; the two parts are coupled by the note activations. We simplify the decay envelope to an exponentially decaying function. The proposed method improves the performance of supervised piano transcription. The second model aims at using the spectral width of partials as an independent indicator of the duration of piano notes. Each partial is represented by a Gaussian function, with the spectral width indicated by the standard deviation. The spectral width is large in the attack part, but gradually decreases to a stable value and remains constant in the decay part. The model provides a new way to understand the time-varying timbre of piano notes, but further investigation is needed to use it effectively to improve piano transcription. We demonstrate the utility of the proposed systems in piano music transcription and analysis. Results show that explicitly modelling piano acoustical features, especially temporal features, can improve transcription performance.
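
    A minimal sketch of the attack/decay reconstruction idea described above, assuming per-note spectral templates and onset-like activations; the names and shapes are illustrative, and the exponential envelope follows the simplification mentioned in the abstract.

```python
import numpy as np

def attack_decay_reconstruction(w_att, w_dec, activations, alpha, env_len=50):
    """Reconstruct a spectrogram as a percussive attack fired at each note
    onset plus a harmonic part scaled by an exponentially decaying envelope.

    w_att, w_dec : (n_bins, n_notes) spectral templates per note
    activations  : (n_notes, n_frames) shared note activations coupling
                   the two parts, as described in the abstract
    alpha        : per-note decay rates for exp(-alpha * t)
    """
    n_notes, n_frames = activations.shape
    t = np.arange(env_len)
    recon = w_att @ activations  # transient attack part
    for k in range(n_notes):
        env = np.exp(-alpha[k] * t)
        # decay part: activations smeared forward in time by the envelope
        decayed = np.convolve(activations[k], env)[:n_frames]
        recon += np.outer(w_dec[:, k], decayed)
    return recon
```

    In the actual model the templates and activations would be estimated by NMF-style updates against the observed spectrogram; the sketch only shows the generative reconstruction.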

    Extended Nonnegative Tensor Factorisation Models for Musical Sound Source Separation

    Recently, shift-invariant tensor factorisation algorithms have been proposed for the purpose of sound source separation of pitched musical instruments. However, in practice, existing algorithms require the use of log-frequency spectrograms to allow shift invariance in frequency, which causes problems when attempting to resynthesise the separated sources. Further, it is difficult to impose harmonicity constraints on the recovered basis functions. This paper proposes a new additive-synthesis-based approach which allows the use of linear-frequency spectrograms as well as the imposition of strict harmonicity constraints, resulting in an improved model. These additional constraints also allow the addition of a source-filter model to the factorisation framework, and an extended model capable of separating mixtures of pitched and percussive instruments simultaneously.
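
    A sketch of how strictly harmonic, linear-frequency basis functions can be built by additive synthesis, as the abstract describes; the Gaussian lobe shape and uniform harmonic weights are assumptions for illustration (in the actual model the per-harmonic weights would be learned by the factorisation).

```python
import numpy as np

def harmonic_basis(f0s, freqs, n_harmonics=20, width_hz=15.0):
    """Build strictly harmonic basis functions on a linear frequency axis.

    Each column is an additive-synthesis template: a sum of narrow Gaussian
    lobes placed at integer multiples of one candidate fundamental.
    """
    basis = np.zeros((len(freqs), len(f0s)))
    for k, f0 in enumerate(f0s):
        for h in range(1, n_harmonics + 1):
            if h * f0 > freqs[-1]:
                break  # harmonic falls off the analysed frequency range
            basis[:, k] += np.exp(-0.5 * ((freqs - h * f0) / width_hz) ** 2)
        basis[:, k] /= basis[:, k].sum() + 1e-12  # normalise each template
    return basis  # (n_bins, n_pitches), usable as constrained NMF templates
```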

    Optimal spectral transportation with application to music transcription

    Many spectral unmixing methods rely on the non-negative decomposition of spectral data onto a dictionary of spectral templates. In particular, state-of-the-art music transcription systems decompose the spectrogram of the input signal onto a dictionary of representative note spectra. The typical measures of fit used to quantify the adequacy of the decomposition compare the data and template entries frequency-wise. As such, small displacements of energy from one frequency bin to another, as well as variations of timbre, can disproportionately harm the fit. We address these issues by means of optimal transportation and propose a new measure of fit that treats the frequency distributions of energy holistically, as opposed to frequency-wise. Building on the harmonic nature of sound, the new measure is invariant to shifts of energy to harmonically related frequencies, as well as to small and local displacements of energy. Equipped with this new measure of fit, the dictionary of note templates can be considerably simplified to a set of Dirac vectors located at the target fundamental frequencies (musical pitch values). This in turn gives rise to a very fast and simple decomposition algorithm that achieves state-of-the-art performance on real musical data.

    1 Context. Many of today's spectral unmixing techniques rely on non-negative matrix decompositions. This concerns, for example, hyperspectral remote sensing (with applications in Earth observation, astronomy, chemistry, etc.) and audio signal processing. The spectral sample v_n (the spectrum of light observed at a given pixel n, or the audio spectrum in a given time frame n) is decomposed onto a dictionary W of elementary spectral templates, characteristic of pure materials or sound objects, such that v_n ≈ W h_n. The composition of sample n can be inferred from the non-negative expansion coefficients h_n. This paradigm has led to state-of-the-art results for various tasks (recognition, classification, denoising, separation) in the aforementioned areas, and in particular in music transcription, the central application of this paper. In state-of-the-art music transcription systems, the spectrogram V (with columns v_n) of a musical signal is decomposed onto a dictionary of pure notes (in so-called multi-pitch estimation) or chords. V typically consists of (power-)magnitude values of a regular short-time Fourier transform (Smaragdis and Brown, 2003). It may also consist of an audio-specific spectral transform such as the Mel-frequency transform, as in (Vincent et al., 2010), or the constant-Q transform, as in (Oudre et al., 2011). The success of the transcription system depends, of course, on the adequacy of the time-frequency transform and the dictionary to represent the data V.
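
    The harmonic invariance can be made concrete as a transport cost between frequency bins that is cheap along harmonic relations. The sketch below uses an illustrative parameterisation rather than the paper's exact cost; the resulting matrix would then be handed to an optimal-transport solver to compare an observed spectrum against the Dirac note templates.

```python
import numpy as np

def harmonic_transport_cost(freqs, max_harmonic=8, eps=1.0):
    """Pairwise transport cost between frequency bins that is small when
    the source frequency sits near an integer multiple of the target.

    The (q - 1) * eps term mildly penalises higher harmonics; both the
    quadratic form and eps are illustrative assumptions.
    """
    f_i = freqs[:, None]  # source bins (rows)
    f_j = freqs[None, :]  # target bins (columns)
    costs = [(f_i - q * f_j) ** 2 + (q - 1) * eps
             for q in range(1, max_harmonic + 1)]
    return np.min(np.stack(costs, axis=0), axis=0)  # (n_bins, n_bins)
```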

    Automatic musical instrument recognition for multimedia indexing

    Work presented in the scope of the Master's degree in Informatics Engineering, as a partial requirement for obtaining the Master's degree in Informatics Engineering. The subject of automatic indexing of multimedia has been a target of numerous discussions and studies. This interest is due to the exponential growth of multimedia content and the subsequent need for methods that automatically catalogue this data. To fulfil this idea, several projects and areas of study have emerged. The most relevant of these are the MPEG-7 standard, which defines a standardised system for the representation and automatic extraction of information present in the content, and Music Information Retrieval (MIR), which gathers several paradigms and areas of study relating to music. The main approach to this indexing problem relies on analysing data to obtain and identify descriptors that help define what we intend to recognise (for instance, musical instruments, voice, facial expressions, and so on); this then provides us with information we can use to index the data. This dissertation focuses on audio indexing in music, specifically the recognition of musical instruments from recorded musical notes. Moreover, the developed system and techniques are also tested on the recognition of ambient sounds (such as the sound of running water, cars driving by, and so on). Our approach uses non-negative matrix factorisation to extract features from various types of sounds; these are then used to train a classification algorithm capable of identifying new sounds.
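
    A minimal sketch of the described pipeline, assuming magnitude spectrograms as input; the SVM classifier and mean-pooled activation features are assumptions for illustration, not necessarily the dissertation's exact setup.

```python
import numpy as np
from sklearn.decomposition import NMF
from sklearn.svm import SVC

def train_instrument_classifier(spectrograms, labels, n_components=16):
    """Fit one NMF on pooled magnitude spectrograms, summarise each
    recording by its mean activation vector, and train an SVM on top.

    spectrograms : list of (n_bins, n_frames) non-negative arrays
    labels       : one instrument label per recording
    """
    pooled = np.hstack(spectrograms)  # (n_bins, total_frames)
    model = NMF(n_components=n_components, max_iter=400)
    model.fit(pooled.T)  # learn shared spectral components across recordings
    # per-recording feature: average activation of each learned component
    feats = np.array([model.transform(s.T).mean(axis=0) for s in spectrograms])
    clf = SVC(kernel="rbf").fit(feats, labels)
    return model, clf
```

    New sounds would be classified by transforming their spectrogram with the fitted NMF and passing the pooled activations to the trained classifier.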