137 research outputs found
Effects of audio compression in automatic detection of voice pathologies
This paper investigates the performance of an automatic system for voice pathology detection when the voice samples have been compressed in MP3 format at different bit rates (160, 96, 64, 48, 24, and 8 kb/s). The detectors employ cepstral and noise measurements, along with their derivatives, to characterize the voice signals. Classification is performed using Gaussian mixture models and support vector machines. The results of the proposed detectors are compared by means of detection error tradeoff (DET) and receiver operating characteristic (ROC) curves, concluding that there are no significant differences in detector performance when the bit rate of the compressed data is above 64 kb/s. This has useful applications in telemedicine, reducing the storage space of voice recordings or allowing them to be transmitted over narrow-band communication channels.
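The ROC and DET comparison mentioned in this abstract can be illustrated with a minimal sketch. It assumes each detector outputs one score per recording, with higher scores meaning "more likely pathological"; the function names and the toy scores are hypothetical and are not taken from the paper. A DET curve plots miss rate against false alarm rate, usually on normal-deviate axes, which this sketch does not apply.

```python
def roc_points(scores, labels):
    """Return (false_positive_rate, true_positive_rate) pairs, one per threshold.

    scores: detector outputs, higher = more likely pathological (assumption).
    labels: 1 for pathological, 0 for normal.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one sample of each class")

    # Sweep the threshold from the highest score downwards.
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Samples with tied scores are accepted together.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


def det_points(points):
    """Convert ROC points to (false_alarm_rate, miss_rate) pairs."""
    return [(fpr, 1.0 - tpr) for fpr, tpr in points]


# Toy data (hypothetical): three pathological and three normal recordings.
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4]
labels = [1, 1, 0, 1, 0, 0]
roc = roc_points(scores, labels)
print(auc(roc))
print(det_points(roc))
```

Computing these points once per bit rate and overlaying the curves is one way to check whether detector performance degrades as compression becomes more aggressive.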
A Voice Disease Detection Method Based on MFCCs and Shallow CNN
The incidence of voice diseases is increasing year by year. The use of
software for remote diagnosis is a growing technical trend and has
important practical value. Among voice diseases, common conditions that cause
hoarseness include spasmodic dysphonia, vocal cord paralysis, vocal nodules, and
vocal cord polyps. This paper presents a voice disease detection method that can
be applied in a wide range of clinical settings. We cooperated with Xiangya
Hospital of Central South University to collect voice samples from sixty-one
different patients. Mel-frequency cepstral coefficients (MFCCs) are
extracted as input features to describe the voice numerically. An
innovative model combining MFCC features with a single-convolution-layer CNN is
proposed for fast computation and classification. The highest accuracy we
achieved was 92%, which is well ahead of the original research results and at
an internationally advanced level. We also use the Advanced Voice Function
Assessment Databases (AVFAD) to evaluate the generalization ability of the
proposed method, which achieves an accuracy of 98%. Experiments on clinical and
standard datasets show that, for the pathological detection of voice diseases,
our method greatly improves accuracy and computational efficiency.
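MFCC features are built on the mel frequency scale. As a minimal sketch, the code below converts between hertz and mel using the widely used formula m = 2595 log10(1 + f/700) and spaces filter centers evenly in mel. The abstract does not say which mel variant, filter count, or frequency range the authors used, so the formula choice, the function names, and the values 26, 0 Hz, and 8000 Hz are all assumptions for illustration.

```python
import math


def hz_to_mel(f_hz):
    """Convert frequency in Hz to mel (common 2595*log10 variant, an assumption)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_center_frequencies(n_filters, f_min_hz, f_max_hz):
    """Center frequencies (Hz) of n_filters filters spaced evenly on the mel scale."""
    lo = hz_to_mel(f_min_hz)
    hi = hz_to_mel(f_max_hz)
    step = (hi - lo) / (n_filters + 1)
    return [mel_to_hz(lo + step * (k + 1)) for k in range(n_filters)]


# Hypothetical configuration: 26 filters between 0 Hz and 8 kHz.
centers = mel_center_frequencies(26, 0.0, 8000.0)
print([round(c) for c in centers[:5]])
```

The centers are dense at low frequencies and sparse at high frequencies, which roughly mirrors how pitch resolution in human hearing falls off with frequency.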
Audio Signal Processing Using Time-Frequency Approaches: Coding, Classification, Fingerprinting, and Watermarking
Audio signals are information-rich nonstationary signals that play an important role in our day-to-day communication, perception of environment, and entertainment. Due to their nonstationary nature, time-only or frequency-only approaches are inadequate for analyzing these signals. A joint time-frequency (TF) approach is a better choice for processing them efficiently. In this digital era, compression, intelligent indexing for content-based retrieval, classification, and protection of digital audio content are a few of the areas that encapsulate a majority of audio signal processing applications. In this paper, we present a comprehensive array of TF methodologies that successfully address applications in all of the above-mentioned areas. A TF-based audio coding scheme with a novel psychoacoustic model, music classification, audio classification of environmental sounds, audio fingerprinting, and audio watermarking are presented to demonstrate the advantages of using time-frequency approaches in analyzing and extracting information from audio signals.
An Automatic Digital Audio Authentication/Forensics System
With the continuous rise in ingenious forgery, a wide range of digital audio authentication applications are emerging as preventive and detective controls in real-world circumstances, such as forged evidence, breach of copyright protection, and unauthorized data access. To investigate and verify such cases, this paper presents a novel automatic authentication system that differentiates between forged and original audio. The design philosophy of the proposed system is primarily based on three psychoacoustic principles of hearing, which are implemented to simulate the human sound perception system. Moreover, the proposed system is able to distinguish between audio from different environments recorded with the same microphone. To authenticate the audio and classify the environment, the features computed from the psychoacoustic principles of hearing are fed to a Gaussian mixture model to make automatic decisions. It is worth mentioning that the proposed system authenticates an unknown speaker irrespective of the audio content, i.e., independent of narrator and text. To evaluate the performance of the proposed system, audio recordings from multiple environments are forged in such a way that a human cannot recognize the forgery. Subjective evaluation by three human evaluators is performed to verify the quality of the generated forged audio. The proposed system provides a classification accuracy of 99.2% ± 2.6. Furthermore, the accuracy obtained with the proposed system for the other scenarios, such as text-dependent and text-independent audio authentication, is 100%.