
    Identifying Cover Songs Using Information-Theoretic Measures of Similarity

    This work is licensed under a Creative Commons Attribution 3.0 License. For more information, see http://creativecommons.org/licenses/by/3.0/

    This paper investigates methods for quantifying similarity between audio signals, specifically for the task of cover song detection. We consider an information-theoretic approach, where we compute pairwise measures of predictability between time series. We compare discrete-valued approaches operating on quantized audio features with continuous-valued approaches. In the discrete case, we propose a method for computing the normalized compression distance that accounts for correlation between time series. In the continuous case, we propose computing information-based measures of similarity as statistics of the prediction error between time series. We evaluate our methods on two cover song identification tasks, using a data set comprising 300 Jazz standards and using the Million Song Dataset. For both datasets, we observe that continuous-valued approaches outperform discrete-valued approaches. We consider approaches to estimating the normalized compression distance (NCD) based on string compression and prediction, where we observe that our proposed normalized compression distance with alignment (NCDA) improves average performance over NCD for sequential compression algorithms. Finally, we demonstrate that continuous-valued distances may be combined to improve performance with respect to baseline approaches. Using a large-scale filter-and-refine approach, we demonstrate state-of-the-art performance for cover song identification using the Million Song Dataset. The work of P. Foster was supported by an Engineering and Physical Sciences Research Council Doctoral Training Account studentship.
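    The plain NCD that the abstract builds on can be sketched with a general-purpose compressor. The following is a minimal illustration on toy byte strings, not the paper's audio feature pipeline or its alignment-based NCDA variant:

```python
import zlib

def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance:
    NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)),
    where C(.) is the compressed length in bytes."""
    cx = len(zlib.compress(x))
    cy = len(zlib.compress(y))
    cxy = len(zlib.compress(x + y))
    return (cxy - min(cx, cy)) / max(cx, cy)

# Toy quantized feature sequences: two related strings and one unrelated one.
seq_a = b"ACEG" * 40
seq_b = b"ACEG" * 20 + b"BDFH" * 20
seq_c = bytes(range(160))

d_related = ncd(seq_a, seq_b)    # shares structure with seq_a: smaller distance
d_unrelated = ncd(seq_a, seq_c)  # incompressible, unrelated: larger distance
```

    In practice the inputs would be quantized audio feature sequences rather than raw bytes, and the choice of compressor matters; the abstract's NCDA additionally aligns the two sequences before compression.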

    Real-time Sound Source Separation For Music Applications

    Sound source separation refers to the task of extracting individual sound sources from some number of mixtures of those sound sources. In this thesis, a novel sound source separation algorithm for musical applications is presented. It leverages the fact that the vast majority of commercially recorded music since the 1950s has been mixed down for two-channel reproduction, more commonly known as stereo. The algorithm presented in Chapter 3 of this thesis requires no prior knowledge or learning and performs the task of separation based purely on azimuth discrimination within the stereo field. The algorithm exploits the use of the pan pot as a means to achieve image localisation within stereophonic recordings. As such, only an interaural intensity difference exists between the left and right channels for a single source. We use gain scaling and phase cancellation techniques to expose frequency-dependent nulls across the azimuth domain, from which source separation and resynthesis are carried out. The algorithm is demonstrated not only to be state of the art in the field of sound source separation, but also to be a useful pre-process for other tasks such as music segmentation and surround sound upmixing.
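    The gain-scaling and cancellation idea can be illustrated on a single synthetic spectral frame. The two-source toy mixture and variable names below are our illustrative assumptions, not taken from the thesis:

```python
import numpy as np

# Two synthetic sources occupying disjoint frequency bins of one spectral frame.
n_bins = 64
s1 = np.zeros(n_bins); s1[5] = 1.0;  s1[10] = 0.8   # source 1 spectrum
s2 = np.zeros(n_bins); s2[20] = 1.0; s2[40] = 0.6   # source 2 spectrum

# Pan-pot mixing: an intensity difference only, no delay between channels.
left  = 0.9 * s1 + 0.3 * s2
right = 0.3 * s1 + 0.9 * s2

# Scan gain ratios: |L - g*R| exposes a null where g matches a source's pan.
gains = np.linspace(0.1, 4.0, 400)
nulls = np.abs(left[None, :] - gains[:, None] * right[None, :])

# For each frequency bin, the gain of the deepest null localises the source,
# grouping bins by azimuth for separation and resynthesis.
est = gains[np.argmin(nulls, axis=0)]
```

    Bins belonging to source 1 (panned 0.9/0.3) null near g = 3, and bins of source 2 (0.3/0.9) near g = 1/3, so thresholding around each null depth recovers per-source spectra for resynthesis.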

    Upmixing from Mono: A Source Separation Approach

    We present a system for upmixing mono recordings to stereo through the use of sound source separation techniques. The use of sound source separation has the advantage of allowing sources to be placed at distinct points in the stereo field, resulting in more natural sounding upmixes. The system separates an input signal into a number of sources, which can then be imported into a digital audio workstation for upmixing to stereo. Considerations to be taken into account when upmixing are discussed, and a brief overview of the various sound source separation techniques used in the system is given. The effectiveness of the proposed system is then demonstrated on real-world mono recordings.
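    As one hedged sketch of the final panning step (the separation itself is assumed already done), a constant-power pan law places each separated source at a distinct azimuth; the azimuth convention and names here are illustrative, not the paper's:

```python
import numpy as np

def pan_to_stereo(source: np.ndarray, azimuth: float) -> np.ndarray:
    """Place a mono source in the stereo field with a constant-power pan law.

    azimuth: -1.0 (hard left) .. +1.0 (hard right).
    Returns an array of shape (2, n_samples): left and right channels.
    """
    theta = (azimuth + 1.0) * np.pi / 4.0   # map [-1, 1] -> [0, pi/2]
    return np.vstack([np.cos(theta) * source, np.sin(theta) * source])

# Upmix sketch: pan each separated source to its own azimuth, then sum.
n = 1000
voice = np.sin(2 * np.pi * 220 * np.arange(n) / 8000.0)
guitar = np.sin(2 * np.pi * 330 * np.arange(n) / 8000.0)
stereo = pan_to_stereo(voice, 0.0) + pan_to_stereo(guitar, -0.5)
```

    Constant-power panning keeps the summed channel power of each source equal to the mono power at every azimuth, which avoids the centre dip of a simple linear pan.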

    Extended Nonnegative Tensor Factorisation Models for Musical Sound Source Separation

    Recently, shift-invariant tensor factorisation algorithms have been proposed for the purpose of sound source separation of pitched musical instruments. However, in practice, existing algorithms require the use of log-frequency spectrograms to allow shift invariance in frequency, which causes problems when attempting to resynthesise the separated sources. Further, it is difficult to impose harmonicity constraints on the recovered basis functions. This paper proposes a new additive synthesis-based approach which allows the use of linear-frequency spectrograms as well as imposing strict harmonic constraints, resulting in an improved model. Further, these additional constraints allow the addition of a source-filter model to the factorisation framework, yielding an extended model which is capable of separating mixtures of pitched and percussive instruments simultaneously.
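    A minimal sketch of the strict-harmonic, additive-synthesis idea on a linear frequency axis, assuming Gaussian-shaped partials and a fixed dictionary with multiplicative activation updates (all parameters below are toy choices, not the paper's):

```python
import numpy as np

def harmonic_basis(f0: float, n_bins: int, bin_hz: float,
                   n_partials: int = 8, width: float = 2.0) -> np.ndarray:
    """Strictly harmonic spectral basis on a linear-frequency axis:
    Gaussian peaks at integer multiples of f0 with decaying amplitudes."""
    freqs = np.arange(n_bins) * bin_hz
    basis = np.zeros(n_bins)
    for h in range(1, n_partials + 1):
        basis += (1.0 / h) * np.exp(-0.5 * ((freqs - h * f0) / (width * bin_hz)) ** 2)
    return basis / basis.max()

# Fix a dictionary of harmonic bases (one per candidate pitch) and learn only
# the activations with multiplicative NMF updates (Euclidean cost).
pitches = [110.0, 146.8, 220.0]
W = np.stack([harmonic_basis(f, 256, 10.0) for f in pitches], axis=1)  # bins x pitches

# Toy spectrogram: pitch 1 sounds in frames 0-1, pitch 3 in frames 2-3.
V = np.outer(W[:, 0], [1, 1, 0, 0]) + np.outer(W[:, 2], [0, 0, 1, 1])

rng = np.random.default_rng(1)
H = rng.random((W.shape[1], V.shape[1]))
for _ in range(500):
    H *= (W.T @ V) / (W.T @ W @ H + 1e-12)
```

    Because the dictionary is strictly harmonic by construction, harmonicity needs no extra penalty term, and the linear-frequency axis means the separated sources can be resynthesised directly.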

    Sound Source Separation using Shifted Non-negative Tensor Factorisation

    Recently, shifted non-negative matrix factorisation was developed as a means of separating harmonic instruments from single-channel mixtures. However, in many cases two or more channels are available, in which case it would be advantageous to have a multichannel version of the algorithm. To this end, a shifted non-negative tensor factorisation algorithm is derived, which extends shifted non-negative matrix factorisation to the multichannel case. The use of this algorithm for multichannel sound source separation of harmonic instruments is demonstrated. Further, it is shown that the algorithm can be used to perform non-negative tensor deconvolution, to separate sound sources which have time-evolving spectra from multichannel signals.
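    The shift-invariance idea can be illustrated without the full tensor machinery: on a log-frequency axis (12 bins per octave assumed here), one spectral template translated by k bins models the same instrument at different pitches. The template and gains below are toy assumptions:

```python
import numpy as np

# Log-frequency template for one note: partials at 0, +12, +19 semitone-bins,
# the pattern of a fundamental plus its 2nd and 3rd harmonics.
n_bins = 48
proto = np.zeros(n_bins)
proto[[0, 12, 19]] = [1.0, 0.5, 0.3]

def shifted(template: np.ndarray, k: int) -> np.ndarray:
    """Translate the template up by k bins (k semitones at 12 bins/octave)."""
    out = np.zeros_like(template)
    out[k:] = template[:len(template) - k]
    return out

# A stereo mixture of the same instrument at two pitches, with different
# gains per channel: the structure a shifted tensor factorisation models.
note_a, note_b = shifted(proto, 5), shifted(proto, 10)
left  = 1.0 * note_a + 0.2 * note_b
right = 0.2 * note_a + 1.0 * note_b

# Correlating shifted copies of the template against a channel peaks at the
# pitches that are present (a one-template, one-channel flavour of the idea).
scores = np.array([shifted(proto, k) @ left for k in range(24)])
```

    A full shifted NTF would jointly estimate the template, the shift activations, and the per-channel gains; here the template is given, so a simple correlation already localises the dominant pitch.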

    Automatic cymbal classification

    Dissertation presented at the Faculdade de Ciências e Tecnologia, Universidade Nova de Lisboa, for the degree of Mestre em Engenharia Informática.

    Most of the research on automatic music transcription is focused on the transcription of pitched instruments, like the guitar and the piano. Little attention has been given to unpitched instruments, such as the drum kit, which is a collection of unpitched instruments. Yet, over the last few years this type of instrument has started to garner more attention, perhaps due to the increasing popularity of the drum kit in Western music. There has been work on automatic music transcription of the drum kit, especially the snare drum, bass drum, and hi-hat. Still, much work remains to be done in order to achieve automatic music transcription of all unpitched instruments. An example of a type of unpitched instrument that has very particular acoustic characteristics and that has received almost no attention from the research community is the drum kit cymbal. A drum kit contains several cymbals, and usually these are treated as a single instrument or are disregarded entirely by automatic classifiers of unpitched instruments. We propose to fill this gap; as such, the goal of this dissertation is the automatic classification of drum kit cymbal events, identifying which class of cymbals each event belongs to. As stated, the majority of work in this area has been done with very different percussive instruments, like the snare drum, bass drum, and hi-hat. Cymbals, on the other hand, are very similar to one another: their geometry, alloys, and spectral and sound traits show just that. Thus, the achievement of this work is not only correctly classifying the different cymbals, but doing so for such similar instruments, which makes the task even harder.

    A review of automatic drum transcription

    In Western popular music, drums and percussion are an important means to emphasize and shape the rhythm, often defining the musical style. If computers were able to analyze the drum part in recorded music, it would enable a variety of rhythm-related music processing tasks. Especially the detection and classification of drum sound events by computational methods is considered to be an important and challenging research problem in the broader field of Music Information Retrieval. Over the last two decades, several authors have attempted to tackle this problem under the umbrella term Automatic Drum Transcription (ADT). This paper presents a comprehensive review of ADT research, including a thorough discussion of the task-specific challenges, a categorization of existing techniques, and an evaluation of several state-of-the-art systems. To provide more insight into the practice of ADT systems, we focus on two families of ADT techniques, namely methods based on Nonnegative Matrix Factorization and Recurrent Neural Networks. We explain the methods' technical details and drum-specific variations and evaluate these approaches on publicly available datasets with a consistent experimental setup. Finally, the open issues and under-explored areas in ADT research are identified and discussed, providing future directions in this field.
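    As a hedged sketch of the NMF family of ADT methods discussed in this review: with per-drum spectral templates held fixed, multiplicative updates of the activations yield per-drum onset curves. The templates and sizes below are toy choices, not any particular system's:

```python
import numpy as np

n_bins, n_frames = 32, 40
rng = np.random.default_rng(0)

# Crude per-drum spectral templates: low-frequency decay for the kick,
# a mid-band bump for the snare (normalised to unit energy).
kick = np.exp(-np.arange(n_bins) / 3.0)
snare = np.exp(-((np.arange(n_bins) - 16) ** 2) / 40.0)
W = np.stack([kick / np.linalg.norm(kick), snare / np.linalg.norm(snare)], axis=1)

# Ground-truth hits: kick on frames 0, 10, 20, 30; snare on 5, 15, 25, 35.
H_true = np.zeros((2, n_frames))
H_true[0, ::10] = 1.0
H_true[1, 5::10] = 1.0
V = W @ H_true  # toy magnitude spectrogram of the drum part

# With W fixed, multiplicative updates recover the activations, which act
# as onset-strength curves for each drum; peak picking then gives onsets.
H = rng.random((2, n_frames))
for _ in range(300):
    H *= (W.T @ V) / (W.T @ W @ H + 1e-12)

onsets_kick = np.flatnonzero(H[0] > 0.5 * H[0].max())
onsets_snare = np.flatnonzero(H[1] > 0.5 * H[1].max())
```

    Real ADT systems differ in how the templates are obtained (fixed, adapted, or learned) and in the divergence being minimised; the RNN family replaces this factorisation with a learned frame-wise classifier.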

    Player vs Transcriber: A Game Approach to Automatic Music Transcription

    State-of-the-art automatic drum transcription (ADT) approaches utilise deep learning methods reliant on time-consuming manual annotations and require congruence between training and testing data. When these conditions are not held, they often fail to generalise. We propose a game approach to ADT, termed player vs transcriber (PvT), in which a player model aims to reduce the transcription accuracy of a transcriber model by manipulating training data in two ways. First, existing data may be augmented, allowing the transcriber to be trained using recordings with modified timbres. Second, additional individual recordings from sample libraries are included to generate rare combinations. We present three versions of the PvT model: AugExist, which augments pre-existing recordings; AugAddExist, which adds additional samples of drum hits to the AugExist system; and Generate, which generates training examples exclusively from individual drum hits from sample libraries. The three versions are evaluated alongside a state-of-the-art deep learning ADT system using two evaluation strategies. The results demonstrate that including the player network improves ADT performance and suggest that this is due to improved generalisability. The results also indicate that although the Generate model achieves relatively low results, it is a viable choice when annotations are not accessible.
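    The two player manipulations (augmenting existing recordings, and generating examples from isolated sample-library hits) might be sketched as follows; the function names and the simple gain-based augmentation are our illustrative assumptions, not the paper's learned player model:

```python
import numpy as np

rng = np.random.default_rng(0)

def augment_gain(x: np.ndarray, low: float = 0.5, high: float = 2.0) -> np.ndarray:
    """A stand-in for player-style timbre/level manipulation:
    scale an existing training example by a random gain."""
    return x * rng.uniform(low, high)

def generate_mixture(hits: list, length: int, n_events: int = 4):
    """Build a training example from isolated sample-library drum hits placed
    at random onsets; return the mixture and its onset annotations, so the
    transcriber sees combinations absent from annotated recordings."""
    mix = np.zeros(length)
    onsets = []
    for _ in range(n_events):
        hit = hits[rng.integers(len(hits))]
        start = int(rng.integers(0, length - len(hit)))
        mix[start:start + len(hit)] += augment_gain(hit)
        onsets.append(start)
    return mix, sorted(onsets)
```

    In the PvT framing, the player's manipulations would be chosen adversarially to lower the transcriber's accuracy rather than drawn at random as here.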

    Harmonic/Percussive Separation Using Median Filtering

    In this paper, we present a fast, simple and effective method to separate the harmonic and percussive parts of a monaural audio signal. The technique involves the use of median filtering on a spectrogram of the audio signal: median filtering is performed across successive frames to suppress percussive events and enhance harmonic components, while median filtering across frequency bins enhances percussive events and suppresses harmonic components. The two resulting median-filtered spectrograms are then used to generate masks which are applied to the original spectrogram to separate the harmonic and percussive parts of the signal. We illustrate the use of the algorithm in the context of remixing audio material from commercial recordings.
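    The two median-filtering passes and the resulting masks can be sketched directly; the soft Wiener-style masks and the kernel size below are one common choice, not necessarily the paper's exact settings:

```python
import numpy as np
from scipy.ndimage import median_filter

def hpss_masks(S: np.ndarray, kernel: int = 17):
    """Median-filtering harmonic/percussive separation on a magnitude
    spectrogram S (bins x frames): filter across time to enhance harmonics,
    across frequency to enhance percussion, then form soft masks."""
    harm = median_filter(S, size=(1, kernel))   # smooth along frames
    perc = median_filter(S, size=(kernel, 1))   # smooth along bins
    total = harm ** 2 + perc ** 2 + 1e-12
    return harm ** 2 / total, perc ** 2 / total

# Toy spectrogram: a steady horizontal line (sustained tone) plus a
# broadband vertical spike (percussive transient).
S = np.zeros((64, 64))
S[20, :] = 1.0    # harmonic component
S[:, 40] += 1.0   # percussive component
mask_h, mask_p = hpss_masks(S)
```

    Multiplying the original complex spectrogram by each mask and inverting the STFT yields the two separated signals; the masks sum to at most one, so the separation is energy-preserving in the Wiener sense.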