35 research outputs found

    Multiple-F0 estimation of piano sounds exploiting spectral structure and temporal evolution

    This paper proposes a system for multiple fundamental frequency estimation of piano sounds using pitch candidate selection rules that exploit spectral structure and temporal evolution. As a time-frequency representation, the Resonator Time-Frequency Image of the input signal is employed, a noise suppression model is applied, and a spectral whitening procedure is performed. In addition, a spectral flux-based onset detector is employed in order to select the steady-state region of the produced sound. In the multiple-F0 estimation stage, tuning and inharmonicity parameters are extracted and a pitch salience function is proposed. Pitch presence tests are performed using information from the spectral structure of pitch candidates, aiming to suppress errors occurring at multiples and sub-multiples of the true pitches. A novel feature for the estimation of harmonically related pitches is proposed, based on the common amplitude modulation assumption. Experiments are performed on the MAPS database using 8784 piano samples of classical, jazz, and random chords with polyphony levels between 1 and 6. The proposed system is computationally inexpensive and able to perform multiple-F0 estimation in real time. Experimental results indicate that the proposed system outperforms state-of-the-art approaches for the aforementioned task in a statistically significant manner. Index Terms: multiple-F0 estimation, resonator time-frequency image, common amplitude modulation.
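The spectral flux-based onset detection mentioned in the abstract can be sketched as follows. This is a minimal illustration of the general technique, not the paper's implementation; the half-wave rectification and the fixed-threshold peak picking are common choices assumed here.

```python
import numpy as np

def spectral_flux(frames: np.ndarray) -> np.ndarray:
    """Half-wave rectified spectral flux between consecutive magnitude spectra.

    frames: (n_frames, n_bins) array of magnitude spectra.
    Returns one onset-strength value per frame (the first frame gets 0).
    """
    diff = np.diff(frames, axis=0)       # bin-wise change between frames
    rectified = np.maximum(diff, 0.0)    # keep only energy increases
    flux = rectified.sum(axis=1)
    return np.concatenate(([0.0], flux))

def pick_onsets(flux: np.ndarray, threshold: float) -> np.ndarray:
    """Frames where the flux is a local maximum above a fixed threshold."""
    onsets = []
    for i in range(1, len(flux) - 1):
        if flux[i] > threshold and flux[i] >= flux[i - 1] and flux[i] > flux[i + 1]:
            onsets.append(i)
    return np.asarray(onsets, dtype=int)
```

Frames after a detected onset would then be taken as the steady-state region for the F0 analysis.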

    A parametric method for pitch estimation of piano tones

    The efficiency of most pitch estimation methods declines when the analyzed frame is shortened and/or when a wide fundamental frequency (F0) range is targeted. The technique proposed herein jointly uses a periodicity analysis and a spectral matching process to improve F0 estimation performance in such an adverse context: a 60 ms-long data frame together with the whole 7 1/4-octave piano tessitura. The enhancements are obtained thanks to a parametric approach which, among other things, models the inharmonicity of piano tones. The performance of the algorithm is assessed, compared to the results obtained from other estimators, and discussed in order to characterize their behavior and typical misestimations. Index Terms — audio processing, pitch estimation.
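The stiff-string inharmonicity model commonly used for piano tones, and presumably the kind of model the parametric approach employs, stretches the partials away from the harmonic series. A minimal sketch, assuming the standard form f_k = k·f0·sqrt(1 + B·k^2) with inharmonicity coefficient B:

```python
import numpy as np

def partial_frequencies(f0: float, n_partials: int, B: float) -> np.ndarray:
    """Frequencies of the first n_partials of an inharmonic (stiff-string) tone.

    f_k = k * f0 * sqrt(1 + B * k**2); B = 0 recovers the perfectly
    harmonic series, while B > 0 (typical for piano strings) progressively
    sharpens the upper partials.
    """
    k = np.arange(1, n_partials + 1)
    return k * f0 * np.sqrt(1.0 + B * k**2)
```

A spectral matching stage can then compare observed peaks against this stretched template instead of exact integer multiples of the candidate F0.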

    The Multi-Pitch Estimation Problem: Some New Solutions

    In this paper, we formulate the multi-pitch estimation problem and propose a number of methods to estimate the set of fundamental frequencies. The methods, which are based on nonlinear least-squares, multiple signal classification (MUSIC) and the Capon principles, have in common the fact that the multiple fundamental frequencies are estimated by means of a one-dimensional search. The statistical properties of the methods are evaluated via Monte Carlo simulations.
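The one-dimensional search shared by these estimators can be illustrated with an approximate nonlinear least-squares cost: for well-separated harmonics, projecting the signal onto a harmonic model reduces to summing the periodogram at integer multiples of each candidate F0. This sketch illustrates that general principle only, not the paper's exact estimators:

```python
import numpy as np

def nls_pitch_cost(x, fs, f0_grid, n_harm=5):
    """Approximate NLS cost over a grid of candidate fundamental frequencies.

    For each candidate f0, sum the periodogram at its first n_harm
    harmonics (harmonic summation); the grid maximum is the F0 estimate.
    """
    N = len(x)
    spectrum = np.abs(np.fft.rfft(x)) ** 2 / N
    cost = np.zeros(len(f0_grid))
    for i, f0 in enumerate(f0_grid):
        for h in range(1, n_harm + 1):
            bin_idx = int(round(h * f0 * N / fs))  # nearest FFT bin of harmonic h
            if bin_idx < len(spectrum):
                cost[i] += spectrum[bin_idx]
    return cost
```

For multi-pitch estimation the dominant candidate can be found, its harmonics removed or down-weighted, and the search repeated, so each pitch still only requires a one-dimensional scan.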

    Multipitch tracking in music signals using Echo State Networks

    Currently, convolutional neural networks (CNNs) define the state of the art for multipitch tracking in music signals. Echo State Networks (ESNs), a recently introduced recurrent neural network architecture, have achieved results similar to CNNs for various tasks, such as phoneme or digit recognition. However, they have not yet received much attention in the community of Music Information Retrieval. The core of ESNs is a group of unordered, randomly connected neurons, i.e., the reservoir, by which the low-dimensional input space is non-linearly transformed into a high-dimensional feature space. Because only the weights of the connections between the reservoir and the output are trained using linear regression, ESNs are easier to train than deep neural networks. This paper presents a first exploration of ESNs for the challenging task of multipitch tracking in music signals. The best results presented in this paper were achieved with a bidirectional two-layer ESN with 20 000 neurons in each layer. Although the final F-score of 0.7198 still falls below the state of the art (0.7370), the proposed ESN-based approach serves as a baseline for further investigations of ESNs in audio signal processing.
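The ESN recipe described above, a fixed random reservoir plus a linear readout trained by regression, can be sketched in a few lines. This is a minimal single-layer, unidirectional illustration; the spectral-radius rescaling and ridge regularization values are common defaults assumed here, not the paper's settings.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_reservoir(n_in, n_res, spectral_radius=0.9):
    """Random, untrained reservoir weights, rescaled so the largest
    eigenvalue magnitude matches the desired spectral radius."""
    W_in = rng.uniform(-0.5, 0.5, (n_res, n_in))
    W = rng.uniform(-0.5, 0.5, (n_res, n_res))
    W *= spectral_radius / np.max(np.abs(np.linalg.eigvals(W)))
    return W_in, W

def run_reservoir(W_in, W, inputs):
    """Drive the reservoir with an input sequence; collect the state trajectory."""
    h = np.zeros(W.shape[0])
    states = []
    for u in inputs:
        h = np.tanh(W_in @ u + W @ h)  # non-linear high-dimensional expansion
        states.append(h.copy())
    return np.asarray(states)

def train_readout(states, targets, ridge=1e-6):
    """Only the linear readout is trained, via ridge regression."""
    A = states.T @ states + ridge * np.eye(states.shape[1])
    return np.linalg.solve(A, states.T @ targets)
```

For multipitch tracking, `inputs` would be spectral frames and `targets` per-frame binary pitch activations; the reservoir itself is never trained, which is what makes ESNs cheap to fit.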

    Three-Parametric Cubic Interpolation for Estimating the Fundamental Frequency of the Speech Signal

    In this paper, we propose a three-parameter convolution kernel which is based on the one-parameter Keys kernel. The first part of the paper describes the structure of the three-parameter convolution kernel. Then, an analytical expression for finding the position of the maximum of the reconstructed function is given. The second part presents an algorithm for estimating the fundamental frequency of the speech signal in the frequency domain using peak-picking methods and parametric cubic convolution. Furthermore, experiments estimating the fundamental frequency of speech and sinusoidal signals are reported in order to select the optimal values of the parameters of the proposed convolution kernel. The fundamental frequency estimation results, measured by mean square error (MSE), are given in tables and graphs and serve as the basis for a comparative analysis. The analysis derived the optimal parameters of the kernel and the window function that generate the least MSE. Results showed higher efficiency in comparison to existing two- and three-parameter convolution kernels.
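The proposed three-parameter kernel is not reproduced here; as an illustration of the underlying technique, the following sketch uses the standard one-parameter Keys kernel the paper builds on, reconstructs a continuous function from its samples by cubic convolution, and locates the maximum around a discrete spectral peak with sub-bin precision (here via a simple grid search rather than an analytical expression).

```python
import numpy as np

def keys_kernel(s, a=-0.5):
    """One-parameter Keys cubic convolution kernel (support |s| < 2)."""
    s = abs(s)
    if s <= 1:
        return (a + 2) * s**3 - (a + 3) * s**2 + 1
    if s < 2:
        return a * s**3 - 5 * a * s**2 + 8 * a * s - 4 * a
    return 0.0

def cubic_resample(samples, x, a=-0.5):
    """Value of the cubic-convolution reconstruction at fractional index x."""
    i = int(np.floor(x))
    return sum(samples[i + k] * keys_kernel(x - (i + k), a)
               for k in range(-1, 3)
               if 0 <= i + k < len(samples))

def refine_peak(samples, peak_idx, a=-0.5, steps=1001):
    """Search the reconstruction around a discrete peak for the true maximum."""
    xs = np.linspace(peak_idx - 1, peak_idx + 1, steps)
    vals = [cubic_resample(samples, x, a) for x in xs]
    return xs[int(np.argmax(vals))]
```

Applied to magnitude-spectrum samples, the refined peak position converts directly to an F0 estimate finer than the FFT bin spacing.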

    Multiple Fundamental Frequency Pitch Detection for Real Time MIDI Applications

    This study aimed to develop a real time multiple fundamental frequency detection algorithm for real time pitch to MIDI conversion applications. The algorithm described here uses neural network classifiers to make classifications in order to define a chord pattern (a combination of multiple fundamental frequencies). The first classification uses a binary decision tree that determines the root note (first note) in a combination of notes; this is achieved through a neural network binary classifier. For each leaf of the binary tree, each classifier determines the frequency group of the root note (low or high frequency) until only two frequencies are left to choose from. The second classifier determines the amount of polyphony, or number of notes played. This classifier is designed in the same fashion as the first, using a binary tree made up of neural network classifiers. The third classifier classifies the chord pattern that has been played. The chord classifier is chosen based on the root note and amount of polyphony; the first two classifiers constrain the third classifier to chords containing only a specific root note and a set polyphony. This allows the classifier to be more focused and of a higher accuracy. To further increase accuracy, an error correction scheme was devised based on repetitive coding, a technique that holds out multiple frames and compares them in order to detect and correct errors. Repetitive coding significantly increases the classifier's accuracy; it was found that holding out three frames was suitable for real-time operation in terms of throughput, and though holding out more frames further increases accuracy, it was not suitable for real-time operation. The algorithm was tested on a common embedded platform, and benchmarking showed the algorithm was well suited for real-time operation.
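The repetitive-coding error correction described above amounts to comparing several consecutive frame classifications and outvoting isolated errors. A minimal sketch, assuming a simple sliding-window majority vote (the paper's exact comparison scheme is not specified here):

```python
from collections import Counter

def repetitive_decode(frame_predictions, n_hold=3):
    """Majority vote over a sliding window of n_hold consecutive frame
    classifications; an isolated misclassification is outvoted by its
    neighbours. Returns one decoded label per full window."""
    decoded = []
    for i in range(len(frame_predictions) - n_hold + 1):
        window = frame_predictions[i:i + n_hold]
        decoded.append(Counter(window).most_common(1)[0][0])
    return decoded
```

Holding out more frames (larger `n_hold`) corrects more errors but adds latency, which matches the throughput trade-off reported in the study.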

    Guitar Chords Classification Using Uncertainty Measurements of Frequency Bins

    This paper presents a method to perform chord classification from recorded audio. The signal harmonics are obtained by using the Fast Fourier Transform, and timbral information is suppressed by spectral whitening. A multiple fundamental frequency estimation of the whitened data is achieved by adding attenuated harmonics via a weighting function. This paper proposes a method that performs feature selection by thresholding the uncertainty of all frequency bins. Measurements under the threshold are removed from the signal in the frequency domain. This allows a reduction of 95.53% of the signal characteristics, and the remaining 4.47% of frequency bins are used as enhanced information for the classifier. An Artificial Neural Network was used to classify four types of chords: major, minor, major 7th, and minor 7th. Played across the twelve musical notes, these give a total of 48 different chords. Two reference methods (based on Hidden Markov Models) were compared with the method proposed in this paper using the same database for the evaluation test. In most of the performed tests, the proposed method achieved a reasonably high performance, with an accuracy of 93%.
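The uncertainty-thresholding feature selection can be sketched as ranking frequency bins by an uncertainty score and keeping only the most informative fraction. The variance across training examples used below is a stand-in for the paper's actual uncertainty measure, and the keep fraction mirrors the 4.47% figure from the abstract:

```python
import numpy as np

def select_bins(X: np.ndarray, keep_fraction: float = 0.0447) -> np.ndarray:
    """Keep only the highest-scoring fraction of frequency bins.

    X: (n_examples, n_bins) matrix of spectral features.
    Scores each bin by its variance across examples (an assumed proxy for
    the paper's uncertainty measure) and returns the sorted indices of the
    top keep_fraction bins; all other bins are discarded as features.
    """
    scores = X.var(axis=0)
    n_keep = max(1, int(round(keep_fraction * X.shape[1])))
    keep = np.argsort(scores)[-n_keep:]
    return np.sort(keep)
```

The classifier then sees only the selected bins, which both shrinks the input and removes bins that carry little chord information.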

    Automatic Music Transcription: Breaking the Glass Ceiling

    Automatic music transcription is considered by many to be the Holy Grail in the field of music signal analysis. However, the performance of transcription systems is still significantly below that of a human expert, and accuracies reported in recent years seem to have reached a limit, although the field is still very active. In this paper we analyse limitations of current methods and identify promising directions for future research. Current transcription methods use general-purpose models which are unable to capture the rich diversity found in music signals. In order to overcome the limited performance of transcription systems, algorithms have to be tailored to specific use cases. Semi-automatic approaches are another way of achieving a more reliable transcription. Also, the wealth of musical scores and corresponding audio data now available is a rich potential source of training data, via forced alignment of audio to scores, but large-scale utilisation of such data has yet to be attempted. Other promising approaches include the integration of information across different methods and musical aspects.