
    Spectral Entropy Based Feature for Robust ASR

    In general, entropy measures the number of bits required to represent some information. When applied to a probability mass function (PMF), entropy can also be used to measure the "peakiness" of a distribution. In this paper, we propose using the entropy of the short-time Fourier transform spectrum, normalised as a PMF, as an additional feature for automatic speech recognition (ASR). A peaky spectrum, representing a clear formant structure in the case of voiced sounds, is expected to have low entropy, while a flatter spectrum corresponding to non-speech or noisy regions will have higher entropy. Extending this reasoning, we introduce a multi-band/multi-resolution entropy feature in which we divide the spectrum into equal-size sub-bands and compute the entropy of each sub-band. The results presented in this paper show that multi-band entropy features used in conjunction with standard cepstral features improve the performance of an ASR system.
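    The entropy-as-PMF idea above can be sketched as follows. This is a minimal illustration, not the paper's exact pipeline; the frame length, FFT size, and equal-split band layout are assumptions:

    ```python
    import numpy as np

    def spectral_entropy(frame, n_bands=1, n_fft=256):
        """Entropy of the short-time power spectrum, normalised as a PMF.

        Sketch: `frame` is a 1-D array of audio samples; the spectrum is
        split into `n_bands` equal sub-bands and the entropy of each
        sub-band PMF is returned (n_bands=1 gives full-band entropy).
        """
        spec = np.abs(np.fft.rfft(frame, n_fft)) ** 2
        bands = np.array_split(spec, n_bands)
        entropies = []
        for band in bands:
            pmf = band / band.sum()
            pmf = pmf[pmf > 0]  # drop zero bins to avoid log(0)
            entropies.append(-np.sum(pmf * np.log2(pmf)))
        return np.array(entropies)

    # A peaky (formant-like) spectrum has lower entropy than a flat one.
    rng = np.random.default_rng(0)
    t = np.arange(256) / 8000.0
    voiced = np.sin(2 * np.pi * 500 * t)   # energy concentrated in one bin
    noise = rng.standard_normal(256)       # flat, noise-like spectrum
    print(spectral_entropy(voiced)[0] < spectral_entropy(noise)[0])  # → True
    ```

    The comparison at the end mirrors the paper's intuition: the clean tonal frame yields near-zero entropy, while the noise frame approaches the maximum entropy of a uniform PMF.
    
    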

    Multi-resolution Spectral Entropy Based Feature for Robust ASR

    Recently, entropy measures at different stages of recognition have been used in automatic speech recognition (ASR). In a recent paper, we proposed that the formant positions of a spectrum can be captured by a multi-resolution spectral entropy feature. In this paper, we suggest modifications to the spectral entropy feature extraction approach and compute the entropy contribution of each sub-band to the total entropy of the normalized spectrum. Further, we explore overlapping sub-bands and the time derivatives of the spectral entropy feature. The modified feature is robust to additive wide-band noise and performs well at low SNRs. Finally, within the Tandem framework, we show that a system using combined entropy and PLP features outperforms the baseline PLP feature for additive wide-band noise at different SNRs.
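    The two modifications described above — per-band contributions to the total entropy, and time derivatives — can be sketched roughly as below. The band-sizing arithmetic for overlapping sub-bands and the delta-window length are assumptions, not the paper's exact choices:

    ```python
    import numpy as np

    def subband_entropy_contrib(spec, n_bands=4, overlap=0.5):
        """Entropy contribution of each (optionally overlapping) sub-band
        to the total entropy of the full normalised spectrum.

        Sketch: the PMF is taken over the WHOLE spectrum, then each band's
        share of the total entropy is summed. With overlap=0 the
        contributions sum exactly to the full-spectrum entropy.
        """
        pmf = spec / spec.sum()
        n = len(pmf)
        width = int(n / (n_bands - (n_bands - 1) * overlap))
        hop = int(width * (1 - overlap))
        contribs = []
        for b in range(n_bands):
            band = pmf[b * hop : b * hop + width]
            band = band[band > 0]
            contribs.append(-np.sum(band * np.log2(band)))
        return np.array(contribs)

    def delta(feat, k=2):
        """Regression-based time derivative over 2k+1 frames.

        Frames are treated as cyclic (np.roll) for simplicity, so the
        first/last k frames are only approximate.
        """
        num = sum(i * (np.roll(feat, -i, axis=0) - np.roll(feat, i, axis=0))
                  for i in range(1, k + 1))
        return num / (2 * sum(i * i for i in range(1, k + 1)))
    ```

    With non-overlapping bands the contributions partition the total entropy, which is what makes them interpretable as per-band shares rather than independent band entropies.
    
    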

    UPM-UC3M system for music and speech segmentation

    This paper describes the UPM-UC3M system for the Albayzín 2010 evaluation on audio segmentation. The evaluation task consists of segmenting a broadcast news audio document into clean speech, music, speech with noise in the background, and speech with music in the background. The UPM-UC3M system is based on hidden Markov models (HMMs), with a 3-state HMM for every acoustic class. The number of states and the number of Gaussians per state have been tuned for this evaluation. The main analysis during system development focused on feature selection. Two different architectures have also been tested: the first is a one-step system, whereas the second is a hierarchical system in which different features are used for segmenting the different audio classes. For both systems, we considered long-term statistics of MFCCs (Mel-frequency cepstral coefficients), spectral entropy, and CHROMA coefficients. For the best configuration of the one-step system, we obtained a 25.3% average error rate and an 18.7% diarization error (using the NIST tool); the hierarchical system achieved a 23.9% average error rate and a 17.9% diarization error.
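    The "long-term statistics" of frame-level features mentioned above can be sketched as a sliding-window mean and standard deviation over the feature stream. The window and hop lengths below are illustrative assumptions, not the evaluation system's tuned values:

    ```python
    import numpy as np

    def long_term_stats(frames, win=100, hop=50):
        """Long-term statistics (mean and standard deviation) of a
        frame-level feature stream, as used for audio segmentation.

        Sketch: `frames` is (n_frames, n_coeffs), e.g. MFCC, spectral
        entropy, or CHROMA vectors at a 10 ms step, so win=100 frames
        spans roughly one second of audio.
        """
        stats = []
        for start in range(0, len(frames) - win + 1, hop):
            seg = frames[start : start + win]
            stats.append(np.concatenate([seg.mean(axis=0), seg.std(axis=0)]))
        return np.array(stats)
    ```

    Stacking mean and standard deviation doubles the feature dimension but captures the slow-varying texture (speech vs. music vs. speech-plus-music) that short frames miss.
    
    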

    Spectral Non-Stationarity in Road Vehicle Vibrations

    Road-induced vibrations are within the scope of various environmental testing protocols, e.g., for packaging vibration testing (PVT). This field has matured, with well-understood methods for analyzing amplitude-type non-stationarity (NS) in road vehicle vibrations (RVV). Although frequency-type NS is well known, only suggestions exist for handling the phenomenon in PVT. Both types of NS can be jointly investigated in the time-frequency domain; thus, the current study initiates the investigation of spectral non-stationarities (SNS) in RVV. Three vibration series recorded over 118 km of traveled distance provide empirical insight.
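    A simple time-frequency indicator of spectral non-stationarity, in the spirit of the joint analysis mentioned above, could look like the following. This is only a rough sketch under assumed parameters; the study's actual SNS measure is not reproduced here:

    ```python
    import numpy as np

    def spectral_nonstationarity(x, frame=256, hop=128):
        """Rough indicator of spectral (frequency-type) non-stationarity.

        Sketch: compute a magnitude spectrogram and measure how much each
        frequency bin's energy varies across time (coefficient of
        variation). A spectrally stationary signal yields low values; a
        signal whose frequency content drifts yields high values.
        """
        n = (len(x) - frame) // hop + 1
        win = np.hanning(frame)
        spec = np.array([np.abs(np.fft.rfft(win * x[i * hop : i * hop + frame]))
                         for i in range(n)])           # (time, freq)
        return spec.std(axis=0) / (spec.mean(axis=0) + 1e-12)
    ```

    A fixed-frequency vibration scores near zero on every bin, while a sweeping (frequency-varying) one concentrates energy in different bins at different times, inflating the per-bin variation.
    
    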

    New experiments on speaker diarization for unsupervised speaking style voice building for speech synthesis

    The universal use of speech synthesis in different applications would require straightforward development of new voices with little manual intervention. Given the amount of multimedia data available on the Internet and in broadcast media, an interesting goal is to develop tools and methods for automatically building voices for several speaking styles from such data. In a previous work we outlined a methodology for building this kind of tool and presented preliminary experiments with a multi-style database. In this paper we investigate this task further and propose several improvements based on selecting the appropriate number of initial speakers, the use (or not) of noise-reduction filters, the use of F0, and the use of a music-detection algorithm. We show that the best system, using a music-detection algorithm, reduces the precision error by 22.36% relative on the development set and 39.64% relative on the test set compared with the baseline system, without degrading the figure of merit. The average precision on the test set is 90.62%, ranging from 76.18% for reports to 99.93% for weather forecasts.

    Fingerprinting Smart Devices Through Embedded Acoustic Components

    The widespread use of smart devices gives rise to both security and privacy concerns. Fingerprinting smart devices can assist in authenticating physical devices, but it can also jeopardize privacy by allowing remote identification without user awareness. We propose a novel fingerprinting approach that uses the microphones and speakers of smartphones to uniquely identify an individual device. During fabrication, subtle imperfections arise in device microphones and speakers that induce anomalies in produced and received sounds. We exploit this observation to fingerprint smart devices through playback and recording of audio samples. We use audiometric tools to explore different acoustic features and analyze their ability to fingerprint smart devices. Our experiments show that it is possible to fingerprint even devices of the same vendor and model; we were able to accurately distinguish over 93% of all recorded audio clips from 15 different units of the same model. Our study identifies the prominent acoustic features capable of fingerprinting devices with a high success rate and examines the effect of background noise and other variables on fingerprinting accuracy.
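    The core idea — a device's hardware imperfections colour its audio in a stable, identifiable way — can be illustrated with a toy sketch. The feature (average log-spectrum) and the nearest-centroid matcher below are stand-ins for the paper's audiometric features and classifier, and the two "devices" are simulated with hypothetical fixed filters:

    ```python
    import numpy as np

    def clip_features(clip, n_fft=512):
        """Toy acoustic fingerprint: average log-spectrum of a clip,
        capturing a device's static spectral colouration."""
        frames = clip[: len(clip) // n_fft * n_fft].reshape(-1, n_fft)
        spec = np.abs(np.fft.rfft(frames, axis=1))
        return np.log(spec + 1e-9).mean(axis=0)

    def identify(clip, centroids):
        """Nearest-centroid device identification against enrolled
        per-device fingerprint centroids."""
        f = clip_features(clip)
        return min(centroids, key=lambda d: np.linalg.norm(f - centroids[d]))

    # Toy demo: two simulated "devices" with different fixed spectral tilts
    # (the filters here are illustrative, not measured hardware responses).
    rng = np.random.default_rng(0)
    def record(device, n=512 * 40):
        h = {"A": [1.0, 0.9], "B": [1.0, -0.9]}[device]
        return np.convolve(rng.standard_normal(n), h)[:n]

    centroids = {d: clip_features(record(d)) for d in ("A", "B")}
    ```

    Averaging over many frames suppresses the content of the clip and leaves mostly the device's transfer-function signature, which is why even same-model units can be separated if their responses differ slightly.
    
    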

    Spectral Entropy Feature in Full-Combination Multi-stream for Robust ASR

    In a recent paper, we reported promising automatic speech recognition results obtained by appending spectral entropy features to PLP features. In the present paper, spectral entropy features are used along with PLP features in the framework of multi-stream combination. In a full-combination multi-stream hidden Markov model/artificial neural network (HMM/ANN) hybrid system, we train a separate multi-layer perceptron (MLP) for the PLP features, for the spectral entropy features, and for the two concatenated. The output posteriors from these three MLPs are combined with weights inversely proportional to the entropies of their respective posterior distributions. We show that on the Numbers95 database, this approach yields a significant improvement under both clean and noisy conditions compared to simply appending the features. Further, in the framework of a Tandem HMM/ANN system, we apply the same inverse-entropy weighting to combine the outputs of the MLPs before the softmax non-linearity. Feeding the combined and decorrelated MLP outputs to the HMM gives a 9.2% relative error reduction compared to the baseline.
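    The inverse-entropy stream combination described above can be sketched as follows (a minimal version for a single frame; the epsilon guard and renormalisation details are assumptions):

    ```python
    import numpy as np

    def inverse_entropy_combine(posterior_streams, eps=1e-12):
        """Combine per-stream posterior vectors with weights inversely
        proportional to each stream's posterior entropy.

        Sketch: a confident (low-entropy) stream gets a high weight; the
        combined vector is renormalised to a proper distribution.
        """
        streams = [np.asarray(p, dtype=float) for p in posterior_streams]
        weights = []
        for p in streams:
            h = -np.sum(p * np.log2(p + eps))   # entropy of this stream
            weights.append(1.0 / (h + eps))
        weights = np.array(weights) / np.sum(weights)
        combined = sum(w * p for w, p in zip(weights, streams))
        return combined / combined.sum()
    ```

    For example, combining a confident stream `[0.9, 0.05, 0.05]` with a flat stream `[1/3, 1/3, 1/3]` pulls the result toward the confident one, which is the intended behaviour when one stream is degraded by noise.
    
    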