161 research outputs found
Harmonic/Percussive Separation Using Median Filtering
In this paper, we present a fast, simple and effective method to separate the harmonic and percussive parts of a monaural audio signal. The technique involves the use of median filtering on a spectrogram of the audio signal, with median filtering performed across successive frames to suppress percussive events and enhance harmonic components, while median filtering is also performed across frequency bins to enhance percussive events and suppress harmonic components. The two resulting median filtered spectrograms are then used to generate masks which are then applied to the original spectrogram to separate the harmonic and percussive parts of the signal. We illustrate the use of the algorithm in the context of remixing audio material from commercial recordings.
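The median filtering steps described in this abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the kernel length of 17, the reflect padding, and the soft Wiener-style mask with exponent `power` are all assumptions made for illustration, since the abstract does not say how the masks are formed.

```python
import numpy as np


def median_filter_1d(spec, kernel, axis):
    """Median-filter a 2-D array along one axis (kernel must be odd).

    Reflect padding keeps the output the same shape as the input.
    """
    pad = kernel // 2
    pad_width = [(0, 0), (0, 0)]
    pad_width[axis] = (pad, pad)
    padded = np.pad(spec, pad_width, mode="reflect")
    # Sliding windows along `axis`; the window samples land on a new last axis.
    windows = np.lib.stride_tricks.sliding_window_view(padded, kernel, axis=axis)
    return np.median(windows, axis=-1)


def hpss_masks(mag, kernel=17, power=2.0, eps=1e-10):
    """Return (harmonic_mask, percussive_mask) for a magnitude spectrogram.

    mag has shape (frequency_bins, frames). The soft-mask formula is an
    assumed choice, not taken from the abstract.
    """
    # Filtering across successive frames suppresses percussive events.
    harmonic = median_filter_1d(mag, kernel, axis=1)
    # Filtering across frequency bins suppresses harmonic components.
    percussive = median_filter_1d(mag, kernel, axis=0)
    h_pow = harmonic ** power
    p_pow = percussive ** power
    total = h_pow + p_pow + eps
    return h_pow / total, p_pow / total


# Toy spectrogram: one sustained tone (a row) and one drum hit (a column).
toy = np.full((32, 32), 0.01)
toy[10, :] = 1.0   # harmonic: constant frequency over time
toy[:, 20] = 1.0   # percussive: broadband energy in a single frame
harmonic_mask, percussive_mask = hpss_masks(toy)
```

In practice the masks would be multiplied with the original complex spectrogram and each result inverted to the time domain; that step depends on the STFT implementation in use and is omitted here.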
Automatic Drum Transcription and Source Separation
While research has been carried out on automated polyphonic music transcription, to date the problem of automated polyphonic percussion transcription has not received the same degree of attention. A related problem is that of sound source separation, which attempts to separate a mixture signal into its constituent sources. This thesis focuses on the task of polyphonic percussion transcription and sound source separation of a limited set of drum instruments, namely the drums found in the standard rock/pop drum kit. As there was little previous research on polyphonic percussion transcription, a broad review of music information retrieval methods, including previous polyphonic percussion systems, was also carried out to determine if there were any methods which were of potential use in the area of polyphonic drum transcription. Following on from this, a review was conducted of general source separation and redundancy reduction techniques, such as Independent Component Analysis and Independent Subspace Analysis, as these techniques have shown potential in separating mixtures of sources. Upon completion of the review it was decided that a combination of the blind separation approach, Independent Subspace Analysis (ISA), with the use of prior knowledge as used in music information retrieval methods, was the best approach to tackling the problem of polyphonic percussion transcription as well as that of sound source separation. A number of new algorithms which combine the use of prior knowledge with the source separation abilities of techniques such as ISA are presented. These include sub-band ISA, Prior Subspace Analysis (PSA), and an automatic modelling and grouping technique which is used in conjunction with PSA to perform polyphonic percussion transcription. These approaches are demonstrated to be effective in the task of polyphonic percussion transcription, and PSA is also demonstrated to be capable of transcribing drums in the presence of pitched instruments.
Upmixing from Mono : a Source Separation Approach
We present a system for upmixing mono recordings to stereo through the use of sound source separation techniques. The use of sound source separation has the advantage of allowing sources to be placed at distinct points in the stereo field, resulting in more natural sounding upmixes. The system separates an input signal into a number of sources, which can then be imported into a digital audio workstation for upmixing to stereo. Considerations to be taken into account when upmixing are discussed, and a brief overview of the various sound source separation techniques used in the system is given. The effectiveness of the proposed system is then demonstrated on real-world mono recordings.
On the Use of Masking Filters in Sound Source Separation
Many sound source separation algorithms, such as NMF and related approaches, disregard phase information and operate only on magnitude or power spectrograms. In this context, generalised Wiener filters have been widely used to generate masks which are applied to the original complex-valued spectrogram before inversion to the time domain, as these masks have been shown to give good results. However, these masks may not be optimal from a perceptual point of view. To this end, we propose new families of masks and compare their performance to generalised Wiener filter masks using three different factorisation-based separation algorithms. Further, to date no analysis has been carried out of how the performance of masking varies with the number of iterations performed when estimating the separated sources. We perform such an analysis and show that when using these masks, running to convergence may not be required in order to obtain good separation performance.
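As a rough illustration of the baseline this abstract refers to, a generalised Wiener filter mask can be sketched as each source's magnitude estimate raised to an exponent and normalised by the sum over all sources. The function names, the exponent `p`, and the array layout below are assumptions for illustration; the new mask families proposed in the paper are not reproduced here.

```python
import numpy as np


def generalised_wiener_masks(source_mags, p=2.0, eps=1e-12):
    """Soft masks from per-source magnitude estimates.

    source_mags has shape (n_sources, frequency_bins, frames), e.g. the
    per-source reconstructions from a factorisation such as NMF.
    p = 2 gives the familiar power-spectrogram Wiener filter.
    """
    powered = np.asarray(source_mags, dtype=float) ** p
    return powered / (powered.sum(axis=0, keepdims=True) + eps)


def apply_masks(masks, mixture_spec):
    """Apply real-valued masks to the complex mixture spectrogram.

    The mixture phase is reused unchanged for every separated source.
    """
    return masks * mixture_spec[np.newaxis, ...]


# Two sources with estimated magnitudes 3 and 4 in a single bin.
estimates = np.array([[[3.0]], [[4.0]]])
mixture = np.array([[1.0 + 1.0j]])
masks = generalised_wiener_masks(estimates, p=2.0)
separated = apply_masks(masks, mixture)
```

Because the masks sum to approximately one in every time-frequency bin, the separated spectrograms add back up to the mixture, which is one reason this style of masking is a common default.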
Independent Subspace Analysis using Locally Linear Embedding
While Independent Subspace Analysis provides a means of blindly separating sound sources from a single channel signal, it does have a number of problems. In particular, the amount of information required for separation of sources varies with the signal. This is a result of the variance-based nature of Principal Component Analysis, which is used for dimensional reduction in the Independent Subspace Analysis algorithm. In an attempt to overcome this problem, the use of a non-variance based dimensional reduction method, Locally Linear Embedding, is proposed. Locally Linear Embedding is a geometry based dimensional reduction technique. The use of this approach is demonstrated by its application to single channel source separation and its merits discussed.
Simulation of Textured Audio Harmonics Using Random Fractal Phaselets
We present a method of simulating audio signals using the principles of random fractal geometry which, in the context of this paper, is concerned with the analysis of statistically self-affine ‘phaselets’. The approach is used to generate audio signals that are characterised by texture and timbre through the Fractal Dimension, such as those associated with bowed stringed instruments. The paper provides a short overview on potential simulation methods using Artificial Neural Networks and Evolutionary Computing and on the problems associated with using a deterministic approach based on solutions to the acoustic wave equation. This serves to quantify the origins of the ‘noise’ associated with multiple scattering events that often characterises texture and timbre in an audio signal. We then explore a method to compute the phaselet of a phase signal which is the primary phase function from which a phase signal is, to a good approximation, a periodic replica and show that, by modelling the phaselet as a random fractal signal, it can be characterised by the Fractal Dimension. The Fractal Dimension is then used to synthesise a phaselet from which the phase function is computed through multiple concatenations of the phaselet. The paper provides details of the principal steps associated with the method considered and examines some example results, providing a URL to m-coded functions for interested readers to repeat the results obtained and develop the algorithms further.
Sub-band Independent Subspace Analysis for Drum Transcription
While Independent Subspace Analysis provides a means of separating sound sources from a single channel signal, making it an effective tool for drum transcription, it does have a number of problems. Not least of these is that the amount of information required to allow separation of sound sources varies from signal to signal. To overcome this indeterminacy and improve the robustness of transcription, an extension of Independent Subspace Analysis to include sub-band processing is proposed. The use of this approach is demonstrated by its application in a simple drum transcription algorithm.
Extended Nonnegative Tensor Factorisation Models for Musical Sound Source Separation
Recently, shift-invariant tensor factorisation algorithms have been proposed for the purposes of sound source separation of pitched musical instruments. However, in practice, existing algorithms require the use of log-frequency spectrograms to allow shift invariance in frequency, which causes problems when attempting to resynthesise the separated sources. Further, it is difficult to impose harmonicity constraints on the recovered basis functions. This paper proposes a new additive synthesis-based approach which allows the use of linear-frequency spectrograms as well as imposing strict harmonic constraints, resulting in an improved model. Further, these additional constraints allow the addition of a source filter model to the factorisation framework, and an extended model which is capable of separating mixtures of pitched and percussive instruments simultaneously.
Key Signature Estimation
The problem of automatic key signature detection has been the focus of much research in recent years. Previous methods of key estimation have focused on chromagrams and key profiling techniques. This paper presents a remarkably simple but effective method of estimating key signature from musical recordings. The algorithm introduces the keyogram, a concept resembling the chromagram, and is intended for use on traditional Irish music. The keyogram is a measure of the likelihood of each possible major key signature based on a masked scoring system.
Drum Transcription using Automatic Grouping of Events and Prior Subspace Analysis
While Prior Subspace Analysis (PSA) has proved an effective tool for transcribing mixtures of snare, kick drum and hi-hat, attempts to extend it to increased numbers of drum types have met with mixed results. To overcome this, an automatic grouping method has been developed to group drum events based on their similarity in frequency content. Combined with PSA, this creates a system able to robustly handle greater numbers of drum types. The effectiveness of this method is demonstrated in a drum transcription algorithm.