Search CORE


DeepVoCoder: A CNN model for compression and coding of narrow band speech

Author: Ilk Hakki Gokhan
Keles Hacer Yalim
Rozhon Jan
Vozňák Miroslav
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2019

This paper proposes a convolutional neural network (CNN)-based encoder model that compresses and codes speech signals directly from raw input speech. Although the model can synthesize wideband speech through implicit bandwidth extension, narrowband speech is preferred for IP telephony and telecommunications purposes. The model takes time-domain speech samples as input and encodes them using a cascade of convolutional filters in multiple layers, with pooling applied after some layers to downsample the encoded speech by a factor of two. The final bottleneck layer of the CNN encoder provides an abstract and compact representation of the speech signal. The paper demonstrates that this compact representation is sufficient for the CNN decoder to reconstruct the original speech signal at high quality. It also discusses the theoretical background of why and how CNNs can be used for end-to-end speech compression and coding. Complexity, delay, memory requirements, and bit rate versus quality are discussed in the experimental results.

DSpace at VSB Technical University of Ostrava
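
The encoder structure described in this abstract can be illustrated with a toy NumPy forward pass. This is a minimal sketch, not the authors' model: the number of layers, the channel counts, the kernel width of 9, the ReLU activation, max pooling, the untrained random weights and the 2 bits per bottleneck value are all hypothetical choices, and the CNN decoder is omitted. It shows how each pooling step halves the time resolution and how the bottleneck size translates into a bit rate.

```python
import numpy as np


def conv1d_same(x, w):
    """Multi-channel 1-D convolution with zero padding.

    x: (C_in, T) input, w: (C_out, C_in, K) kernels -> (C_out, T) output.
    """
    c_out, c_in, _ = w.shape
    y = np.zeros((c_out, x.shape[1]))
    for o in range(c_out):
        for i in range(c_in):
            # mode="same" keeps the time length T (requires T >= K)
            y[o] += np.convolve(x[i], w[o, i], mode="same")
    return y


def max_pool2(x):
    """Downsample the time axis by 2 (max over non-overlapping pairs)."""
    t = x.shape[1] // 2 * 2  # drop a trailing odd sample, if any
    return x[:, :t].reshape(x.shape[0], -1, 2).max(axis=2)


def encode(speech, layers, pool_after):
    """Toy encoder: conv + ReLU per layer, halving time after chosen layers."""
    h = speech[np.newaxis, :]  # raw samples as a single input channel
    for idx, w in enumerate(layers):
        h = np.maximum(conv1d_same(h, w), 0.0)
        if idx in pool_after:
            h = max_pool2(h)
    return h  # bottleneck: (channels, T / 2**len(pool_after))


def bit_rate(sample_rate, num_pools, bottleneck_channels, bits_per_value):
    """Bits per second if every bottleneck value is quantized to a fixed width."""
    frames_per_second = sample_rate / 2 ** num_pools
    return frames_per_second * bottleneck_channels * bits_per_value


rng = np.random.default_rng(0)
layers = [
    rng.standard_normal((8, 1, 9)),
    rng.standard_normal((8, 8, 9)),
    rng.standard_normal((8, 8, 9)),
    rng.standard_normal((4, 8, 9)),  # narrow bottleneck layer
]
speech = rng.standard_normal(8000)  # stand-in for 1 s of 8 kHz narrowband speech
code = encode(speech, layers, pool_after={0, 1, 2})
print(code.shape)  # (4, 1000)
print(bit_rate(8000, 3, 4, 2))  # 8000.0
```

In this sketch the bit rate scales directly with the bottleneck width, the number of halving stages and the quantizer resolution, which is one way to read the bit rate versus quality trade-off mentioned in the abstract.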

Frame Theory for Signal Processing in Psychoacoustics

Author: A. Bregman
A. Janssen
A. Ron
A.V. Oppenheim
B. Laback
B.C.J. Moore
B.R. Glasberg
C. Heil
C. Wiesmeyr
C.J. Plack
D. Soderquist
D. Wang
D.D. Greenwood
D.T. Stoeva
E. Hernández
E. Ravelli
E. Zwicker
E.A. Lopez-Poveda
G. Chardon
G. Kidd Jr
G. Matz
H. Bölcskei
H. Fastl
I. Daubechies
J. Kovačević
J. Leng
J.J. Benedetto
J.J. O’Donovan
J.S. Garofolo
K. Gröchenig
L. Chai
L.N. Trefethen
M. Bownik
M. Bézat
M. Elad
M. Unoki
M. Vetterli
N. Holighaus
N. Perraudin
N.K. Bari
O. Christensen
P. Balazs
P. Casazza
P. Søndergaard
P. Vaidyanathan
P.G. Casazza
R.D. Patterson
R.J. Duffin
R.M. Young
S. Strahl
T. Irino
T. Painter
T. Werther
T.S. Gunawan
W. Jesteadt
X. Valero
X. Zhao
Z. Cvetković
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 03/11/2016

This review chapter aims to strengthen the link between frame theory and signal processing tasks in psychoacoustics. On the one hand, the basic concepts of frame theory are presented, and some proofs are provided to explain those concepts in some detail. The goal is to show hearing scientists how this mathematical theory could be relevant to their research. In particular, we focus on frame theory in a filter bank approach, which is probably the most relevant viewpoint for audio signal processing. On the other hand, basic psychoacoustic concepts are presented to encourage mathematicians to apply their knowledge in this field.

arXiv.org e-Print Archive

Crossref
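
To make the filter bank view of frames concrete, here is a minimal NumPy sketch for the simplest case: an undecimated filter bank with circular convolution on length-N signals (an assumption; the chapter also covers more general settings). For such a filter bank the frame operator is diagonalized by the DFT, so the optimal frame bounds A and B are the minimum and maximum over frequency of the summed squared filter responses, and the canonical dual frame inverts that sum bin by bin. The random FIR filters and all function names are purely illustrative.

```python
import numpy as np


def filter_responses(filters, n):
    """DFT of each FIR filter, zero-padded to the signal length n: shape (K, n)."""
    return np.fft.fft(filters, n=n, axis=1)


def frame_bounds(filters, n):
    """Optimal frame bounds A, B of an undecimated circular filter bank."""
    power = np.sum(np.abs(filter_responses(filters, n)) ** 2, axis=0)
    return power.min(), power.max()


def analysis(x, filters):
    """Frame coefficients: circular convolution of x with every filter."""
    H = filter_responses(filters, len(x))
    return np.fft.ifft(np.fft.fft(x)[np.newaxis, :] * H, axis=1)


def dual_synthesis(coeffs, filters):
    """Reconstruction with the canonical dual frame (needs A > 0)."""
    H = filter_responses(filters, coeffs.shape[1])
    power = np.sum(np.abs(H) ** 2, axis=0)
    X = np.sum(np.conj(H) * np.fft.fft(coeffs, axis=1), axis=0) / power
    # For real signals and real filters the imaginary part is only round-off.
    return np.fft.ifft(X).real


rng = np.random.default_rng(1)
filters = rng.standard_normal((4, 16))  # 4 channels, 16 taps each
x = rng.standard_normal(256)

A, B = frame_bounds(filters, len(x))
c = analysis(x, filters)
energy = np.sum(np.abs(c) ** 2)  # lies between A*||x||^2 and B*||x||^2
print(A, B, energy / (x @ x))
print(np.allclose(dual_synthesis(c, filters), x))
```

When A equals B the frame is tight and the dual is just a rescaled copy of the analysis filters; the ratio B/A is a simple measure of how far a given filter bank is from that ideal.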

Level discrimination of speech sounds by hearing-impaired individuals with and without hearing amplification

Author: Akeroyd Michael A.
Whitmer William M.
Publication venue: 'Ovid Technologies (Wolters Kluwer Health)'
Publication date: 01/01/2011

Objectives: The current study was designed to examine how hearing-impaired individuals judge level differences between speech sounds with and without hearing amplification. It was hypothesized that hearing aid compression should adversely affect the user's ability to judge level differences. Design: Thirty-eight hearing-impaired participants performed an adaptive tracking procedure to determine their level-discrimination thresholds for different word and sentence tokens, as well as speech-spectrum noise, with and without their hearing aids. Eight normal-hearing participants performed the same task for comparison. Results: Level discrimination for different word and sentence tokens was more difficult than the discrimination of stationary noises. Word-level discrimination was significantly more difficult than sentence-level discrimination. There were no significant differences, however, between mean performance with and without hearing aids, and no correlations between performance and various hearing aid measurements. Conclusions: There is a clear difficulty in judging the level differences between words or sentences relative to differences between broadband noises, but this difficulty was found for both hearing-impaired and normal-hearing individuals and had no relation to hearing aid compression measures. The lack of a clear adverse effect of hearing aid compression on level discrimination is suggested to be due to the low effective compression ratios of currently fitted hearing aids.

PubMed Central

Enlighten
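
The hypothesis that compression should impair level discrimination rests on a simple property of a compressor's static input/output curve. Above the compression threshold (knee), output level grows by only 1/CR dB per dB of input, so the level difference between two sounds shrinks by the compression ratio CR. The sketch below assumes an idealized static compressor with a hard knee and no attack/release dynamics; the knee, gain and ratios are hypothetical values, not the study's hearing aid settings.

```python
def compressor_output_db(level_db, knee_db, ratio, gain_db=0.0):
    """Static hard-knee compression curve (levels in dB)."""
    if level_db <= knee_db:
        return level_db + gain_db  # linear region: a fixed gain only
    return knee_db + (level_db - knee_db) / ratio + gain_db


def output_level_difference(level1_db, level2_db, knee_db, ratio, gain_db=0.0):
    """Level difference between two sounds after compression."""
    out1 = compressor_output_db(level1_db, knee_db, ratio, gain_db)
    out2 = compressor_output_db(level2_db, knee_db, ratio, gain_db)
    return out2 - out1


# A 3 dB input difference between two sounds above a 50 dB knee:
for ratio in (1.0, 1.5, 3.0):
    print(ratio, output_level_difference(65.0, 68.0, knee_db=50.0, ratio=ratio))
# prints 1.0 3.0, then 1.5 2.0, then 3.0 1.0
```

With a low effective ratio the level cue is only slightly reduced (3 dB becomes 2 dB at CR = 1.5), which is consistent with the explanation offered in the conclusions.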

Adaptive Variable Degree-k Zero-Trees for Re-Encoding of Perceptually Quantized Wavelet-Packet Transformed Audio and High Quality Speech

Author: Ghahabi Omid
Savoji Mohammad H.
Publication venue: 'Hindawi Limited'
Publication date: 01/01/2011

A fast, efficient, and scalable algorithm, called "adaptive variable degree-k zero-trees" (AVDZ), is proposed for re-encoding perceptually quantized wavelet-packet transform (WPT) coefficients of audio and high-quality speech. The quantization process takes some basic perceptual considerations into account and achieves good subjective quality with low complexity. The performance of the proposed AVDZ algorithm is compared with two other zero-tree-based schemes: (1) the Embedded Zero-tree Wavelet (EZW) and (2) Set Partitioning in Hierarchical Trees (SPIHT). Since EZW and SPIHT were designed for image compression, some modifications are incorporated into these schemes to better match them to audio signals. It is shown that the proposed modifications can improve their performance by about 15-25%. Furthermore, the proposed AVDZ algorithm is found to outperform these modified versions in terms of both average output bit rate and computation time. Comment: 30 pages (double-spaced), 15 figures, 5 tables; ISRN Signal Processing (in press).

arXiv.org e-Print Archive

Crossref

Directory of Open Access Journals
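
The zero-tree idea shared by EZW, SPIHT and AVDZ can be sketched as a single significance pass over a tree of coefficients. This is a generic illustration, not the AVDZ algorithm: it assumes a fixed degree k (every node has k children, stored in heap order so that node i has children k*i+1 to k*i+k), whereas AVDZ adapts the degree; the symbol names P, N, Z and I follow common EZW descriptions, and all function names are hypothetical. A node whose entire subtree is below the threshold is coded with one Z symbol and its descendants are skipped, which is where the bit savings come from.

```python
def children(i, k, n):
    """Children of node i in a k-ary tree stored in heap (breadth-first) order."""
    return [c for c in range(k * i + 1, k * i + k + 1) if c < n]


def descendants(i, k, n):
    """All nodes below node i."""
    out, stack = [], children(i, k, n)
    while stack:
        j = stack.pop()
        out.append(j)
        stack.extend(children(j, k, n))
    return out


def dominant_pass(coeffs, threshold, k):
    """Map each coded node to P/N (significant), Z (zero-tree root) or I (isolated zero)."""
    n = len(coeffs)
    symbols, implied = {}, set()
    for i in range(n):  # index order is breadth-first for the heap layout
        if i in implied:
            continue  # already covered by an ancestor's zero-tree symbol
        if abs(coeffs[i]) >= threshold:
            symbols[i] = "P" if coeffs[i] >= 0 else "N"
        elif all(abs(coeffs[j]) < threshold for j in descendants(i, k, n)):
            symbols[i] = "Z"
            implied.update(descendants(i, k, n))
        else:
            symbols[i] = "I"
    return symbols


coeffs = [20, -12, 3, 1, 15, 2, 1]  # binary tree (k = 2) of depth 3
print(dominant_pass(coeffs, threshold=10, k=2))
# {0: 'P', 1: 'N', 2: 'Z', 3: 'Z', 4: 'P'}
```

Here seven coefficients are described by five symbols because node 2's zero-tree symbol also covers nodes 5 and 6; larger, sparser trees give proportionally bigger savings.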

Electroacoustic and Behavioural Evaluation of Hearing Aid Digital Signal Processing Features

Author: Suelzle David J O
Publication venue: Scholarship@Western
Publication date: 19/04/2013

Modern digital hearing aids provide an array of features to improve the user's listening experience. As these features become more advanced and interdependent, it becomes increasingly necessary to develop accurate and cost-effective methods to evaluate their performance. Subjective experiments are an accurate way to determine hearing aid performance, but they carry a high monetary and time cost. Four studies that develop and evaluate electroacoustic techniques for hearing aid feature evaluation are presented. The first study applies a recent speech quality metric to two bilateral wireless hearing aids with various features enabled in a variety of environmental conditions. It shows that accurate speech quality predictions can be made with a reduced version of the original metric, and that a portion of the original metric does not perform well when applied to a new database of subjective speech quality ratings. The second study presents a reference-free (non-intrusive) electroacoustic speech quality metric developed specifically for hearing aid applications and compares its performance to that of a recent intrusive metric. The non-intrusive metric eliminates the need for a shaped reference signal and can be used in real-time applications, but at the cost of some prediction accuracy. The third study investigates the digital noise reduction performance of seven recent hearing aid models. An electroacoustic measurement system is presented that allows the noise and speech signals to be separated from hearing aid recordings. It is shown how this system can be used to investigate digital noise reduction performance by applying speech quality and speech intelligibility measures, and how it can be used to quantify digital noise reduction attack times. The fourth study presents a turntable-based system to investigate hearing aid directionality performance. Two methods to extract the signal of interest are described.
Polar plots are presented for a number of hearing aid models, from recordings made both in the free field and on a head-and-torso simulator. It is expected that the proposed electroacoustic techniques will assist audiologists and hearing researchers in choosing, benchmarking, and fine-tuning hearing aid features.

Scholarship@Western
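
The abstract does not say how the noise and speech signals are separated from hearing aid recordings. One widely used approach is the phase-inversion technique of Hagerman and Olofsson, sketched here as an assumption rather than as the thesis's confirmed method: the same recording is made twice, once with speech plus noise and once with the noise polarity inverted, and the half-sum and half-difference of the two recordings estimate the processed speech and the processed noise. The "hearing aid" below is a hypothetical fixed FIR filter, so the separation is exact; with adaptive, nonlinear processing such as digital noise reduction it is only approximate.

```python
import numpy as np


def hearing_aid(x, fir=(0.6, 0.3, 0.1)):
    """Hypothetical stand-in for a hearing aid: a fixed linear FIR filter."""
    return np.convolve(x, fir)[: len(x)]


def phase_inversion_separation(rec_plus, rec_minus):
    """Half-sum gives processed speech; half-difference gives processed noise."""
    speech = 0.5 * (rec_plus + rec_minus)
    noise = 0.5 * (rec_plus - rec_minus)
    return speech, noise


def snr_db(speech, noise):
    """Output signal-to-noise ratio in dB."""
    return 10.0 * np.log10(np.sum(speech ** 2) / np.sum(noise ** 2))


rng = np.random.default_rng(2)
speech_in = rng.standard_normal(16000)
noise_in = 0.5 * rng.standard_normal(16000)

rec_plus = hearing_aid(speech_in + noise_in)   # first recording
rec_minus = hearing_aid(speech_in - noise_in)  # noise polarity inverted
speech_out, noise_out = phase_inversion_separation(rec_plus, rec_minus)
print(round(snr_db(speech_out, noise_out), 1))
```

Once speech and noise are available separately at the output, quality and intelligibility measures can be computed on each, and tracking the noise level over time after noise onset is one way to estimate a noise reduction attack time.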

Band-pass filtering of the time sequences of spectral parameters for robust wireless speech recognition

Author: Díaz de María Fernando
Gallardo Antolín Ascensión
Peláez Moreno Carmen
Vicente Peña Jesús de
Publication venue: 'Elsevier BV'
Publication date: 01/01/2006

In this paper we address the problem of automatic speech recognition (ASR) when wireless speech communication systems are involved. In this context, three main sources of distortion should be considered: the acoustic environment, speech coding, and transmission errors. Whilst the first has already received a lot of attention, the last two, in our opinion, deserve further investigation. We have found that band-pass filtering of the recognition features improves ASR performance when distortions due to these particular communication systems are present. Furthermore, we have evaluated two alternative configurations at different bit error rates (BER) typical of these channels: band-pass filtering of the LP-MFCC parameters, and a modification of RASTA-PLP using a sharper low-pass section. These perform consistently better than LP-MFCC and RASTA-PLP, respectively.

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Crossref

Universidad Carlos III de Madrid e-Archivo
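
Band-pass filtering the time sequence of each spectral parameter amounts to running one filter along the frame axis of a feature matrix. The abstract does not give the paper's filter coefficients, so this sketch substitutes the classic RASTA filter of Hermansky and Morgan, H(z) = 0.1 (2 + z^-1 - z^-3 - 2z^-4) / (1 - 0.98 z^-1), implemented causally (which adds a 4-frame delay). Its numerator coefficients sum to zero, so a constant offset in a trajectory (for example, a stationary channel effect in the cepstral domain) is removed, while modulations of a few hertz pass. The 100 frames/s rate, the test signals and the function name are assumptions.

```python
import numpy as np


def rasta_filter(features, pole=0.98):
    """Filter every feature trajectory along time (axis 0) with a RASTA-style band-pass.

    features: (num_frames, num_coeffs). Causal form, so the output lags by 4 frames.
    """
    b = 0.1 * np.array([2.0, 1.0, 0.0, -1.0, -2.0])  # FIR part: sums to 0, blocks DC
    x = np.asarray(features, dtype=float)
    y = np.zeros_like(x)
    for n in range(x.shape[0]):
        acc = np.zeros(x.shape[1])
        for m, bm in enumerate(b):
            if n - m >= 0:
                acc += bm * x[n - m]
        if n > 0:
            acc += pole * y[n - 1]  # IIR part: leaky integration
        y[n] = acc
    return y


frame_rate = 100.0  # assumed 10 ms frame shift
t = np.arange(1000) / frame_rate
features = np.column_stack([
    np.full_like(t, 5.0),         # constant offset (e.g. a channel effect)
    np.sin(2 * np.pi * 4.0 * t),  # 4 Hz modulation, roughly the syllable rate
])
filtered = rasta_filter(features)
print(abs(filtered[-1, 0]))                       # offset decays towards zero
print(np.sqrt(np.mean(filtered[500:, 1] ** 2)))   # 4 Hz component largely kept
```

The "sharper low-pass section" mentioned in the abstract would correspond to changing the upper edge of such a filter; the exact design used in the paper is not reproduced here.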