Joint Multi-Pitch Detection Using Harmonic Envelope Estimation for Polyphonic Music Transcription

Author: Emmanouil Benetos
Simon Dixon
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2011

In this paper, a method for automatic transcription of music signals based on joint multiple-F0 estimation is proposed. As a time-frequency representation, the constant-Q resonator time-frequency image is employed, while a novel noise suppression technique based on pink noise assumption is applied in a preprocessing step. In the multiple-F0 estimation stage, the optimal tuning and inharmonicity parameters are computed and a salience function is proposed in order to select pitch candidates. For each pitch candidate combination, an overlapping partial treatment procedure is used, which is based on a novel spectral envelope estimation procedure for the log-frequency domain, in order to compute the harmonic envelope of candidate pitches. In order to select the optimal pitch combination for each time frame, a score function is proposed which combines spectral and temporal characteristics of the candidate pitches and also aims to suppress harmonic errors. For postprocessing, hidden Markov models (HMMs) and conditional random fields (CRFs) trained on MIDI data are employed, in order to boost transcription accuracy. The system was trained on isolated piano sounds from the MAPS database and was tested on classic and jazz recordings from the RWC database, as well as on recordings from a Disklavier piano. A comparison with several state-of-the-art systems is provided using a variety of error metrics, where encouraging results are indicated.

CiteSeerX

City Research Online

Multiple-F0 estimation of piano sounds exploiting spectral structure and temporal evolution

Author: Benetos E.
Dixon S.
Publication date: 01/01/2010

This paper proposes a system for multiple fundamental frequency estimation of piano sounds using pitch candidate selection rules which employ spectral structure and temporal evolution. As a time-frequency representation, the Resonator Time-Frequency Image of the input signal is employed, a noise suppression model is used, and a spectral whitening procedure is performed. In addition, a spectral flux-based onset detector is employed in order to select the steady-state region of the produced sound. In the multiple-F0 estimation stage, tuning and inharmonicity parameters are extracted and a pitch salience function is proposed. Pitch presence tests are performed utilizing information from the spectral structure of pitch candidates, aiming to suppress errors occurring at multiples and sub-multiples of the true pitches. A novel feature for the estimation of harmonically related pitches is proposed, based on the common amplitude modulation assumption. Experiments are performed on the MAPS database using 8784 piano samples of classical, jazz, and random chords with polyphony levels between 1 and 6. The proposed system is computationally inexpensive, being able to perform multiple-F0 estimation experiments in real time. Experimental results indicate that the proposed system outperforms state-of-the-art approaches for the aforementioned task in a statistically significant manner. Index Terms: multiple-F0 estimation, resonator time-frequency image, common amplitude modulation

CiteSeerX

City Research Online
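The abstract above mentions a spectral flux-based onset detector. As a rough illustration of that general idea only (not the authors' detector), the following minimal sketch computes half-wave rectified spectral flux between consecutive magnitude frames and picks thresholded local maxima. The function names, the fixed threshold, and the toy frames are all assumptions made for this example.

```python
def spectral_flux(frames):
    """Half-wave rectified spectral flux between consecutive magnitude frames.

    `frames` is a list of equal-length lists of non-negative magnitudes.
    The first frame has no predecessor, so its flux is set to 0.
    """
    flux = [0.0]
    for prev, cur in zip(frames, frames[1:]):
        # Only energy increases count towards an onset.
        flux.append(sum(max(c - p, 0.0) for p, c in zip(prev, cur)))
    return flux


def pick_onsets(flux, threshold):
    """Return frame indices that are local maxima of `flux` above `threshold`.

    The fixed threshold is an assumption; practical detectors often use an
    adaptive one.
    """
    onsets = []
    for i in range(1, len(flux) - 1):
        if flux[i] > threshold and flux[i] >= flux[i - 1] and flux[i] > flux[i + 1]:
            onsets.append(i)
    return onsets


# Toy magnitude frames (hypothetical data): energy appears at frames 2 and 5.
frames = [
    [0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0],
    [1.0, 2.0, 0.0],
    [1.0, 2.0, 0.0],
    [0.5, 1.0, 0.0],
    [0.5, 1.0, 3.0],
    [0.5, 1.0, 3.0],
]
flux = spectral_flux(frames)
onsets = pick_onsets(flux, threshold=1.0)
```

In a system like the one described, the frames after a detected onset would then be treated as the steady-state region; how that region is delimited is not specified in the abstract.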

Polyphonic music transcription using note onset and offset detection

Author: Benetos E.
Dixon S.
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2011

In this paper, an approach for polyphonic music transcription based on joint multiple-F0 estimation and note onset/offset detection is proposed. For preprocessing, the resonator time-frequency image of the input music signal is extracted and noise suppression is performed. A pitch salience function is extracted for each frame along with tuning and inharmonicity parameters. For onset detection, late fusion is employed by combining a novel spectral flux-based feature which incorporates pitch tuning information and a novel salience function-based descriptor. For each segment defined by two onsets, an overlapping partial treatment procedure is used and a pitch set score function is proposed. A note offset detection procedure is also proposed using HMMs trained on MIDI data. The system was trained on piano chords and tested on classic and jazz recordings from the RWC database. Improved transcription results are reported compared to state-of-the-art approaches.

CiteSeerX

City Research Online

Structured Sparsity Models for Multiparty Speech Recovery from Reverberant Recordings

Author: Asaei Afsaneh
Bourlard Hervé
Cevher Volkan
Golbabaee Mohammad
Publication date: 01/01/2012

We tackle the multi-party speech recovery problem through modeling the acoustics of reverberant chambers. Our approach exploits structured sparsity models to perform room modeling and speech recovery. We propose a scheme for characterizing the room acoustics from the unknown competing speech sources, relying on localization of the early images of the speakers by sparse approximation of the spatial spectra of the virtual sources in a free-space model. The images are then clustered exploiting the low-rank structure of the spectro-temporal components belonging to each source. This enables us to identify the early support of the room impulse response function and its unique map to the room geometry. To further tackle the ambiguity of the reflection ratios, we propose a novel formulation of the reverberation model and estimate the absorption coefficients through a convex optimization exploiting a joint sparsity model formulated upon the spatio-spectral sparsity of the concurrent speech representation. The acoustic parameters are then incorporated for separating individual speech signals through either structured sparse recovery or inverse filtering of the acoustic channels. The experiments conducted on real data recordings demonstrate the effectiveness of the proposed approach for multi-party speech recovery and recognition. Comment: 31 pages

arXiv.org e-Print Archive

Adaptive Vectorial Filter for Grid Synchronization of Power Converters Under Unbalanced and/or Distorted Grid Conditions

Author: Carrasco Solís Juan Manuel
León Galván José Ignacio
Reyes Díaz Manuel Rafael
Sánchez Segura Juan Antonio
Vázquez Pérez Sergio
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2014

This paper presents a new synchronization scheme for detecting multiple positive-/negative-sequence frequency harmonics in three-phase systems for grid-connected power converters. The proposed technique is called MAVF-FLL because it is based on the use of multiple adaptive vectorial filters (AVFs) working together inside a harmonic decoupling network, resting on a frequency-locked loop (FLL) which makes the system frequency adaptive. The method uses the vectorial properties of the three-phase input signal in the αβ reference frame in order to obtain the different harmonic components. The MAVF-FLL is fully designed and analyzed, addressing the tuning procedure in order to obtain the desired and predefined performance. The proposed algorithm is evaluated by both simulation and experimental results, demonstrating its ability to perform as required for detecting different harmonic components under a highly unbalanced and distorted input grid voltage.
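
The abstract above relies on the three-phase signal expressed in the αβ reference frame. As background only, the sketch below applies the standard amplitude-invariant Clarke transform to a balanced three-phase set and shows that a positive-sequence set yields (cos θ, sin θ) while a negative-sequence set yields (cos θ, -sin θ), which is the vectorial property that lets the two sequences be told apart. It does not implement the MAVF-FLL; the function names and the unit-amplitude test signal are assumptions for this example.

```python
import math


def clarke(a, b, c):
    """Amplitude-invariant Clarke transform: phase values (a, b, c) -> (alpha, beta)."""
    alpha = (2.0 / 3.0) * (a - 0.5 * b - 0.5 * c)
    beta = (b - c) / math.sqrt(3.0)
    return alpha, beta


def three_phase(theta, sequence=+1):
    """Balanced unit-amplitude three-phase set at electrical angle `theta`.

    sequence=+1 gives a positive-sequence set (phase b lags a by 120 degrees);
    sequence=-1 gives a negative-sequence set (phase b leads a by 120 degrees).
    """
    shift = sequence * 2.0 * math.pi / 3.0
    return (math.cos(theta), math.cos(theta - shift), math.cos(theta + shift))


theta = 0.7  # arbitrary electrical angle in radians
pos_alpha, pos_beta = clarke(*three_phase(theta, +1))
neg_alpha, neg_beta = clarke(*three_phase(theta, -1))
# Positive sequence rotates counterclockwise in the alpha-beta plane,
# negative sequence rotates clockwise (beta changes sign).
```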

Multiple source direction of arrival estimation using subspace pseudointensity vectors

Author: Moore Alastair H.
Publication date: 28/11/2018

The recently proposed subspace pseudointensity method for direction of arrival estimation is applied in the context of Tasks 1 and 2 of the LOCATA Challenge using the Eigenmike recordings. Specific implementation details are described and results reported for the development dataset, for which the ground truth source directions are available. For both single and multiple source scenarios, the average absolute error angle is about 9 degrees. Comment: In Proceedings of the LOCATA Challenge Workshop - a satellite event of IWAENC 2018 (arXiv:1811.08482)

arXiv.org e-Print Archive

Spiral - Imperial College Digital Repository