7,759 research outputs found

    DeepVoCoder: A CNN model for compression and coding of narrow band speech

    Get PDF
    This paper proposes a convolutional neural network (CNN)-based encoder model to compress and code speech signal directly from raw input speech. Although the model can synthesize wideband speech by implicit bandwidth extension, narrowband is preferred for IP telephony and telecommunications purposes. The model takes time domain speech samples as inputs and encodes them using a cascade of convolutional filters in multiple layers, where pooling is applied after some layers to downsample the encoded speech by half. The final bottleneck layer of the CNN encoder provides an abstract and compact representation of the speech signal. In this paper, it is demonstrated that this compact representation is sufficient to reconstruct the original speech signal in high quality using the CNN decoder. This paper also discusses the theoretical background of why and how CNN may be used for end-to-end speech compression and coding. The complexity, delay, memory requirements, and bit rate versus quality are discussed in the experimental results.Web of Science7750897508

    Frame Theory for Signal Processing in Psychoacoustics

    Full text link
    This review chapter aims to strengthen the link between frame theory and signal processing tasks in psychoacoustics. On the one side, the basic concepts of frame theory are presented and some proofs are provided to explain those concepts in some detail. The goal is to reveal to hearing scientists how this mathematical theory could be relevant for their research. In particular, we focus on frame theory in a filter bank approach, which is probably the most relevant view-point for audio signal processing. On the other side, basic psychoacoustic concepts are presented to stimulate mathematicians to apply their knowledge in this field

    Level discrimination of speech sounds by hearing-impaired individuals with and without hearing amplification

    Get PDF
    Objectives: The current study was designed to see how hearing-impaired individuals judge level differences between speech sounds with and without hearing amplification. It was hypothesized that hearing aid compression should adversely affect the user's ability to judge level differences. Design: Thirty-eight hearing-impaired participants performed an adaptive tracking procedure to determine their level-discrimination thresholds for different word and sentence tokens, as well as speech-spectrum noise, with and without their hearing aids. Eight normal-hearing participants performed the same task for comparison. Results: Level discrimination for different word and sentence tokens was more difficult than the discrimination of stationary noises. Word level discrimination was significantly more difficult than sentence level discrimination. There were no significant differences, however, between mean performance with and without hearing aids and no correlations between performance and various hearing aid measurements. Conclusions: There is a clear difficulty in judging the level differences between words or sentences relative to differences between broadband noises, but this difficulty was found for both hearing-impaired and normal-hearing individuals and had no relation to hearing aid compression measures. The lack of a clear adverse effect of hearing aid compression on level discrimination is suggested to be due to the low effective compression ratios of currently fit hearing aids

    Adaptive Variable Degree-k Zero-Trees for Re-Encoding of Perceptually Quantized Wavelet-Packet Transformed Audio and High Quality Speech

    Full text link
    A fast, efficient and scalable algorithm is proposed, in this paper, for re-encoding of perceptually quantized wavelet-packet transform (WPT) coefficients of audio and high quality speech and is called "adaptive variable degree-k zero-trees" (AVDZ). The quantization process is carried out by taking into account some basic perceptual considerations, and achieves good subjective quality with low complexity. The performance of the proposed AVDZ algorithm is compared with two other zero-tree-based schemes comprising: 1- Embedded Zero-tree Wavelet (EZW) and 2- The set partitioning in hierarchical trees (SPIHT). Since EZW and SPIHT are designed for image compression, some modifications are incorporated in these schemes for their better matching to audio signals. It is shown that the proposed modifications can improve their performance by about 15-25%. Furthermore, it is concluded that the proposed AVDZ algorithm outperforms these modified versions in terms of both output average bit-rates and computation times.Comment: 30 pages (Double space), 15 figures, 5 tables, ISRN Signal Processing (in Press

    Electroacoustic and Behavioural Evaluation of Hearing Aid Digital Signal Processing Features

    Get PDF
    Modern digital hearing aids provide an array of features to improve the user listening experience. As the features become more advanced and interdependent, it becomes increasingly necessary to develop accurate and cost-effective methods to evaluate their performance. Subjective experiments are an accurate method to determine hearing aid performance but they come with a high monetary and time cost. Four studies that develop and evaluate electroacoustic hearing aid feature evaluation techniques are presented. The first study applies a recent speech quality metric to two bilateral wireless hearing aids with various features enabled in a variety of environmental conditions. The study shows that accurate speech quality predictions are made with a reduced version of the original metric, and that a portion of the original metric does not perform well when applied to a novel subjective speech quality rating database. The second study presents a reference free (non-intrusive) electroacoustic speech quality metric developed specifically for hearing aid applications and compares its performance to a recent intrusive metric. The non-intrusive metric offers the advantage of eliminating the need for a shaped reference signal and can be used in real time applications but requires a sacrifice in prediction accuracy. The third study investigates the digital noise reduction performance of seven recent hearing aid models. An electroacoustic measurement system is presented that allows the noise and speech signals to be separated from hearing aid recordings. It is shown how this can be used to investigate digital noise reduction performance through the application of speech quality and speech intelligibility measures. It is also shown how the system can be used to quantify digital noise reduction attack times. The fourth study presents a turntable-based system to investigate hearing aid directionality performance. Two methods to extract the signal of interest are described. Polar plots are presented for a number of hearing aid models from recordings generated in both the free-field and from a head-and-torso simulator. It is expected that the proposed electroacoustic techniques will assist Audiologists and hearing researchers in choosing, benchmarking, and fine-tuning hearing aid features

    Band-pass filtering of the time sequences of spectral parameters for robust wireless speech recognition

    Get PDF
    In this paper we address the problem of automatic speech recognition when wireless speech communication systems are involved. In this context, three main sources of distortion should be considered: acoustic environment, speech coding and transmission errors. Whilst the first one has already received a lot of attention, the last two deserve further investigation in our opinion. We have found out that band-pass filtering of the recognition features improves ASR performance when distortions due to these particular communication systems are present. Furthermore, we have evaluated two alternative configurations at different bit error rates (BER) typical of these channels: band-pass filtering the LP-MFCC parameters or a modification of the RASTA-PLP using a sharper low-pass section perform consistently better than LP-MFCC and RASTA-PLP, respectively.Publicad
    • 

    corecore