1,828 research outputs found
Speaker recognition using frequency filtered spectral energies
The spectral parameters that result from filtering the
frequency sequence of log mel-scaled filter-bank energies
with a simple first or second order FIR filter have proved
to be an efficient speech representation in terms of both
speech recognition rate and computational load. Recently,
the authors have shown that this frequency filtering can
approximately equalize the cepstrum variance enhancing
the oscillations of the spectral envelope curve that are
most effective for discrimination between speakers. Even
better speaker identification results than using melcepstrum
have been obtained on the TIMIT database,
especially when white noise was added. On the other
hand, the hybridization of both linear prediction and
filter-bank spectral analysis using either cepstral
transformation or the alternative frequency filtering has
been explored for speaker verification. The combination
of hybrid spectral analysis and frequency filtering, that
had shown to be able to outperform the conventional
techniques in clean and noisy word recognition, has yield
good text-dependent speaker verification results on the
new speaker-oriented telephone-line POLYCOST
database.Peer ReviewedPostprint (published version
Sound Source Localization in a Multipath Environment Using Convolutional Neural Networks
The propagation of sound in a shallow water environment is characterized by
boundary reflections from the sea surface and sea floor. These reflections
result in multiple (indirect) sound propagation paths, which can degrade the
performance of passive sound source localization methods. This paper proposes
the use of convolutional neural networks (CNNs) for the localization of sources
of broadband acoustic radiated noise (such as motor vessels) in shallow water
multipath environments. It is shown that CNNs operating on cepstrogram and
generalized cross-correlogram inputs are able to more reliably estimate the
instantaneous range and bearing of transiting motor vessels when the source
localization performance of conventional passive ranging methods is degraded.
The ensuing improvement in source localization performance is demonstrated
using real data collected during an at-sea experiment.Comment: 5 pages, 5 figures, Final draft of paper submitted to 2018 IEEE
International Conference on Acoustics, Speech and Signal Processing (ICASSP)
15-20 April 2018 in Calgary, Alberta, Canada. arXiv admin note: text overlap
with arXiv:1612.0350
On the decorrelation of filter-bank energies in speech recognition
Cepstral coefficients are widely used in speech recognition. In this paper, we claim that they are not the best way of representing the spectral envelope, at least for some usual speech recognition systems. In fact, cepstrum has several disadvantages: poor physical meaning, need of transformation, and low capacity of adaptation to some recognition systems. In this paper, we propose a new representation that significantly outperforms both mel-cepstrum and LPC-cepstrum techniques in both recognition rate and computational cost. It consists of filtering the frequency sequence of filter-bank energies with an extremely simple filter that equalizes the variance of the cepstral coefficients. Excellent results of the new technique using a continuous observation density HMM recognition system and two very different recognition tasks, connected digits and phone recognition, are presented.Peer ReviewedPostprint (published version
Reducing Audible Spectral Discontinuities
In this paper, a common problem in diphone synthesis is discussed, viz., the occurrence of audible discontinuities at diphone boundaries. Informal observations show that spectral mismatch is most likely the cause of this phenomenon.We first set out to find an objective spectral measure for discontinuity. To this end, several spectral distance measures are related to the results of a listening experiment. Then, we studied the feasibility of extending the diphone database with context-sensitive diphones to reduce the occurrence of audible discontinuities. The number of additional diphones is limited by clustering consonant contexts that have a similar effect on the surrounding vowels on the basis of the best performing distance measure. A listening experiment has shown that the addition of these context-sensitive diphones significantly reduces the amount of audible discontinuities
Analysis of Speech Recognition Techniques
Speech recognition has been an intregral part of human life acting as one of the five senses of human body, because of which application developed on the basis of speech recognition has high degree of acceptance. Here in this project we tried to analyse the different steps involved in
artificial speech recognition by man-machine interface. The various steps we followed in speech recognition are feature extraction, distance calculation, dynamic time wrapping. We have tried to find out an approach which is both simple and efficient so that it can be utilised in embedded
systems. After analysing the steps above we realised the process using small programs using MATLAB which is able to do small no. of isolated word recognition
Speaker segmentation and clustering
This survey focuses on two challenging speech processing topics, namely: speaker segmentation and speaker clustering. Speaker segmentation aims at finding speaker change points in an audio stream, whereas speaker clustering aims at grouping speech segments based on speaker characteristics. Model-based, metric-based, and hybrid speaker segmentation algorithms are reviewed. Concerning speaker clustering, deterministic and probabilistic algorithms are examined. A comparative assessment of the reviewed algorithms is undertaken, the algorithm advantages and disadvantages are indicated, insight to the algorithms is offered, and deductions as well as recommendations are given. Rich transcription and movie analysis are candidate applications that benefit from combined speaker segmentation and clustering. © 2007 Elsevier B.V. All rights reserved
- …