1,236 research outputs found
Speaker recognition using frequency filtered spectral energies
The spectral parameters that result from filtering the
frequency sequence of log mel-scaled filter-bank energies
with a simple first or second order FIR filter have proved
to be an efficient speech representation in terms of both
speech recognition rate and computational load. Recently,
the authors have shown that this frequency filtering can
approximately equalize the cepstrum variance enhancing
the oscillations of the spectral envelope curve that are
most effective for discrimination between speakers. Even
better speaker identification results than using melcepstrum
have been obtained on the TIMIT database,
especially when white noise was added. On the other
hand, the hybridization of both linear prediction and
filter-bank spectral analysis using either cepstral
transformation or the alternative frequency filtering has
been explored for speaker verification. The combination
of hybrid spectral analysis and frequency filtering, that
had shown to be able to outperform the conventional
techniques in clean and noisy word recognition, has yield
good text-dependent speaker verification results on the
new speaker-oriented telephone-line POLYCOST
database.Peer ReviewedPostprint (published version
Joint Multi-Pitch Detection Using Harmonic Envelope Estimation for Polyphonic Music Transcription
In this paper, a method for automatic transcription of music signals based on joint multiple-F0 estimation is proposed. As a time-frequency representation, the constant-Q resonator time-frequency image is employed, while a novel noise suppression technique based on pink noise assumption is applied in a preprocessing step. In the multiple-F0 estimation stage, the optimal tuning and inharmonicity parameters are computed and a salience function is proposed in order to select pitch candidates. For each pitch candidate combination, an overlapping partial treatment procedure is used, which is based on a novel spectral envelope estimation procedure for the log-frequency domain, in order to compute the harmonic envelope of candidate pitches. In order to select the optimal pitch combination for each time frame, a score function is proposed which combines spectral and temporal characteristics of the candidate pitches and also aims to suppress harmonic errors. For postprocessing, hidden Markov models (HMMs) and conditional random fields (CRFs) trained on MIDI data are employed, in order to boost transcription accuracy. The system was trained on isolated piano sounds from the MAPS database and was tested on classic and jazz recordings from the RWC database, as well as on recordings from a Disklavier piano. A comparison with several state-of-the-art systems is provided using a variety of error metrics, where encouraging results are indicated
- …