
    The influence of sampling frequency on tone recognition of musical instruments

    The sampling frequency used for musical instrument tone recognition generally follows the Shannon sampling theorem. This paper explores the influence of sampling frequencies that violate the Shannon sampling theorem in a tone recognition system that uses segment averaging for feature extraction and template matching for classification. The instruments used were the bellyra, flute, and pianica, representing instruments with one, a few, and many significant local peaks in the Discrete Fourier Transform (DFT) domain, respectively. In our experiments, lowering the sampling frequency to as little as 312 Hz reduced the recognition rates for bellyra and flute tones only slightly, by around 5%, while the recognition rate for pianica tones was unaffected. Therefore, if such a small reduction in recognition rate is acceptable, a sampling frequency as low as 312 Hz can be used for musical instrument tone recognition.
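
    The pipeline described above is straightforward to sketch. Below is a minimal, illustrative Python version of segment averaging over the DFT magnitude spectrum followed by nearest-template classification; the segment count, distance metric, and template format are assumptions, not details taken from the paper.

    ```python
    import numpy as np

    def segment_average_features(signal, n_segments=8):
        # Segment averaging: split the DFT magnitude spectrum into equal
        # segments and use each segment's mean as one feature coefficient.
        spectrum = np.abs(np.fft.rfft(signal))
        return np.array([seg.mean() for seg in np.array_split(spectrum, n_segments)])

    def template_match(features, templates):
        # Classify as the tone whose stored template (a feature vector)
        # lies closest in Euclidean distance (assumed metric).
        return min(templates, key=lambda tone: np.linalg.norm(features - templates[tone]))
    ```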

    Multipitch tracking in music signals using Echo State Networks

    Currently, convolutional neural networks (CNNs) define the state of the art for multipitch tracking in music signals. Echo State Networks (ESNs), a recently introduced recurrent neural network architecture, have achieved results similar to CNNs on various tasks, such as phoneme or digit recognition, but they have not yet received much attention in the Music Information Retrieval community. The core of an ESN is a group of unordered, randomly connected neurons, the reservoir, by which the low-dimensional input space is non-linearly transformed into a high-dimensional feature space. Because only the weights of the connections between the reservoir and the output are trained, using linear regression, ESNs are easier to train than deep neural networks. This paper presents a first exploration of ESNs for the challenging task of multipitch tracking in music signals. The best results presented in this paper were achieved with a bidirectional two-layer ESN with 20,000 neurons in each layer. Although the final F-score of 0.7198 still falls below the state of the art (0.7370), the proposed ESN-based approach serves as a baseline for further investigations of ESNs in audio signal processing.
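
    A minimal single-layer ESN, sketched below in Python, illustrates the two properties the abstract highlights: the reservoir is random and fixed, and only the linear readout is trained (here via ridge regression). The reservoir size, spectral radius, and regularization strength are illustrative choices; the paper's best model is a bidirectional two-layer ESN with 20,000 neurons per layer.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)

    class ESN:
        # Minimal single-layer echo state network (sketch).
        def __init__(self, n_in, n_res=500, spectral_radius=0.9):
            self.W_in = rng.uniform(-0.5, 0.5, (n_res, n_in))
            W = rng.uniform(-0.5, 0.5, (n_res, n_res))
            # Rescale so the echo state property (fading memory) holds.
            self.W = W * spectral_radius / max(abs(np.linalg.eigvals(W)))

        def states(self, X):
            # X: (time, n_in) -> reservoir states (time, n_res).
            x, out = np.zeros(self.W.shape[0]), []
            for u in X:
                x = np.tanh(self.W_in @ u + self.W @ x)
                out.append(x.copy())
            return np.array(out)

        def fit(self, X, Y, ridge=1e-6):
            # Only the readout weights are trained, by ridge regression.
            S = self.states(X)
            self.W_out = np.linalg.solve(S.T @ S + ridge * np.eye(S.shape[1]), S.T @ Y).T

        def predict(self, X):
            return self.states(X) @ self.W_out.T
    ```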

    Dct based feature extraction and support vector machine classification for musical instruments tone recognition

    This research proposes a feature extraction and classification combination for use in a musical instrument tone recognition system. By implementing this combination, the system is expected to require fewer feature extraction coefficients than previously investigated approaches. The proposed combination consists of feature extraction using the discrete cosine transform (DCT) and classification using a support vector machine (SVM). Bellyra, clarinet, and pianica tones were used in the experiments, each representing a tone with one, several, or many major local peaks in the transform domain. The test results indicate that the proposed combination is efficient enough for use in a musical instrument tone recognition system: it needs as few as eight feature extraction coefficients to recognize a tone.
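
    As a sketch of the proposed combination, the Python snippet below keeps the first eight DCT coefficients as features and feeds them to an SVM. The synthetic sine-wave "tones", the pitches, and the kernel choice are hypothetical stand-ins for the paper's actual recordings and configuration.

    ```python
    import numpy as np
    from scipy.fft import dct
    from sklearn.svm import SVC

    def dct_features(signal, n_coeffs=8):
        # Keep only the first few DCT coefficients; the paper reports
        # that as few as eight suffice for recognition.
        return dct(signal, norm="ortho")[:n_coeffs]

    # Hypothetical training tones: pure sines standing in for recordings.
    t = np.arange(0, 0.5, 1 / 8000)
    tones = {f0: np.sin(2 * np.pi * f0 * t) for f0 in (262, 294, 330)}
    X = np.stack([dct_features(s) for s in tones.values()])
    y = list(tones.keys())

    clf = SVC(kernel="rbf").fit(X, y)  # kernel choice is an assumption
    print(clf.predict([dct_features(np.sin(2 * np.pi * 294 * t))]))
    ```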

    Real-time detection of overlapping sound events with non-negative matrix factorization

    In this paper, we investigate the problem of real-time detection of overlapping sound events by employing non-negative matrix factorization techniques. We consider a setup where audio streams arrive at the system in real time and are decomposed onto a dictionary of event templates learned off-line, prior to the decomposition. An important drawback of existing approaches in this context is the lack of control over the decomposition. We propose and compare two provably convergent algorithms that address this issue by controlling, respectively, the sparsity of the decomposition and the decomposition's trade-off between the different frequency components. Sparsity regularization is handled in the framework of convex quadratic programming, while the frequency compromise is introduced by employing the beta-divergence as a cost function. The two algorithms are evaluated on the multi-source detection tasks of polyphonic music transcription, drum transcription, and environmental sound recognition. The results show how the proposed approaches can improve detection in such applications while keeping computational costs low enough for real-time use.
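
    For intuition, here is a per-frame activation solver against a fixed, pre-learned dictionary, using standard multiplicative updates for the beta-divergence with an added L1 sparsity penalty. This is a generic sketch only: the paper handles sparsity via convex quadratic programming, and its two provably convergent algorithms are not reproduced here.

    ```python
    import numpy as np

    def nmf_activations(v, W, beta=1.0, lam=0.1, n_iter=50):
        # v: magnitude-spectrum frame (n_bins,); W: dictionary of event
        # templates (n_bins, n_events) learned off-line. Multiplicative
        # updates for the beta-divergence, with an L1 penalty `lam`
        # encouraging sparse activations.
        h = np.full(W.shape[1], 1e-2)
        for _ in range(n_iter):
            wh = W @ h + 1e-12  # avoid division by zero
            h *= (W.T @ (wh ** (beta - 2) * v)) / (W.T @ wh ** (beta - 1) + lam)
        return h  # event i is reported active when h[i] exceeds a threshold
    ```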

    Multiple Fundamental Frequency Estimation and Polyphony Inference of Polyphonic Music Signals

    This article presents a frame-based system for estimating the multiple fundamental frequencies (F0s) of polyphonic music signals based on the short-time Fourier transform (STFT) representation. To estimate the number of sources along with their F0s, it is proposed to estimate the noise level beforehand and then jointly evaluate all possible combinations among pre-selected F0 candidates. Given a set of F0 hypotheses, their hypothetical partial sequences are derived, taking into account where partials may overlap. A score function selects the plausible sets of F0 hypotheses. To infer the best combination, hypothetical sources are progressively combined and iteratively verified. A hypothetical source is considered valid if it either explains more energy than the noise or significantly improves the envelope smoothness once the overlapping partials are treated. The proposed system was submitted to the MIREX (Music Information Retrieval Evaluation eXchange) 2007 and 2008 contests, where its accuracy was evaluated with respect to the number of sources inferred and the precision of the estimated F0s. The encouraging results demonstrate its competitive performance among state-of-the-art methods.
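
    A heavily simplified Python sketch of the candidate-scoring idea: each hypothetical F0 is scored by summing spectral magnitude at its predicted partial positions. The paper's actual system jointly evaluates combinations of candidates, models overlapping partials, and checks envelope smoothness, none of which is reproduced here.

    ```python
    import numpy as np

    def score_f0_candidates(mag, freqs, candidates, n_partials=10):
        # mag, freqs: magnitude spectrum of one STFT frame and its bin
        # frequencies; candidates: pre-selected F0 hypotheses in Hz.
        scores = {}
        for f0 in candidates:
            partials = f0 * np.arange(1, n_partials + 1)
            bins = np.searchsorted(freqs, partials)  # nearest bins (approx.)
            scores[f0] = mag[bins[bins < len(mag)]].sum()
        return scores  # higher scores mark more plausible F0 hypotheses
    ```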

    Automatic transcription of music using deep learning techniques

    Music transcription is the problem of detecting the notes being played in a musical piece. This is a difficult task that only trained people are capable of performing, and its difficulty has generated considerable interest in automating it. However, automatic music transcription encompasses several fields of research, such as digital signal processing, machine learning, music theory and cognition, pitch perception, and psychoacoustics, all of which makes it a hard problem to solve. In this work we present a novel approach to automatically transcribing piano pieces using deep learning techniques. We take advantage of deep learning to build several classifiers, each responsible for detecting a single musical note; in theory, this division of work enhances the ability of each classifier to transcribe. In addition, we apply two further stages, pre-processing and post-processing, to improve the efficiency of our system. The pre-processing stage improves the quality of the input data before the classification/transcription stage, while the post-processing stage fixes errors originating during classification. In the initial steps, preliminary experiments were performed to fine-tune our model in all three stages: pre-processing, classification, and post-processing. The experimental setup using those optimized techniques and parameters is presented, along with a comparison against two other state-of-the-art works that use the same dataset and the same deep learning technique but a different approach, namely a single neural network that detects all musical notes rather than one network per note. Our approach surpassed these works on frame-based metrics while reaching close results on onset-based metrics, demonstrating its feasibility.
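
    The one-classifier-per-note idea can be sketched as follows; the layer sizes, input dimensionality, and MIDI range are illustrative assumptions, not the paper's actual configuration.

    ```python
    from tensorflow import keras

    def make_note_detector(n_features=252):
        # One small binary classifier deciding on/off for a single note
        # in each input frame (layer sizes are illustrative).
        model = keras.Sequential([
            keras.layers.Input(shape=(n_features,)),
            keras.layers.Dense(128, activation="relu"),
            keras.layers.Dense(1, activation="sigmoid"),
        ])
        model.compile(optimizer="adam", loss="binary_crossentropy")
        return model

    # 88 independent detectors, one per piano key (MIDI notes 21-108);
    # each would be trained only on its own note's per-frame labels.
    detectors = {midi: make_note_detector() for midi in range(21, 109)}
    ```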