7,529 research outputs found
An evaluation of intrusive instrumental intelligibility metrics
Instrumental intelligibility metrics are commonly used as an alternative to
listening tests. This paper evaluates 12 monaural intrusive intelligibility
metrics: SII, HEGP, CSII, HASPI, NCM, QSTI, STOI, ESTOI, MIKNN, SIMI, SIIB, and
. In addition, this paper investigates the ability of
intelligibility metrics to generalize to new types of distortions and analyzes
why the top performing metrics have high performance. The intelligibility data
were obtained from 11 listening tests described in the literature. The stimuli
included Dutch, Danish, and English speech that was distorted by additive
noise, reverberation, competing talkers, pre-processing enhancement, and
post-processing enhancement. SIIB and HASPI had the highest performance
achieving a correlation with listening test scores on average of
and , respectively. The high performance of SIIB may, in part, be
the result of SIIBs developers having access to all the intelligibility data
considered in the evaluation. The results show that intelligibility metrics
tend to perform poorly on data sets that were not used during their
development. By modifying the original implementations of SIIB and STOI, the
advantage of reducing statistical dependencies between input features is
demonstrated. Additionally, the paper presents a new version of SIIB called
, which has similar performance to SIIB and HASPI,
but takes less time to compute by two orders of magnitude.Comment: Published in IEEE/ACM Transactions on Audio, Speech, and Language
Processing, 201
Objective measures for predicting the intelligibility of spectrally smoothed speech with artificial excitation
A study is presented on how well objective measures of speech quality and intelligibility can predict the subjective in- telligibility of speech that has undergone spectral envelope smoothing and simplification of its excitation. Speech modi- fications are made by resynthesising speech that has been spec- trally smoothed. Objective measures are applied to the mod- ified speech and include measures of speech quality, signal- to-noise ratio and intelligibility, as well as proposing the nor- malised frequency-weighted spectral distortion (NFD) measure. The measures are compared to subjective intelligibility scores where it is found that several have high correlation (|r| ≥ 0.7), with NFD achieving the highest correlation (r = −0.81
Perceptually-Driven Video Coding with the Daala Video Codec
The Daala project is a royalty-free video codec that attempts to compete with
the best patent-encumbered codecs. Part of our strategy is to replace core
tools of traditional video codecs with alternative approaches, many of them
designed to take perceptual aspects into account, rather than optimizing for
simple metrics like PSNR. This paper documents some of our experiences with
these tools, which ones worked and which did not. We evaluate which tools are
easy to integrate into a more traditional codec design, and show results in the
context of the codec being developed by the Alliance for Open Media.Comment: 19 pages, Proceedings of SPIE Workshop on Applications of Digital
Image Processing (ADIP), 201
Perfectly Secure Steganography: Capacity, Error Exponents, and Code Constructions
An analysis of steganographic systems subject to the following perfect
undetectability condition is presented in this paper. Following embedding of
the message into the covertext, the resulting stegotext is required to have
exactly the same probability distribution as the covertext. Then no statistical
test can reliably detect the presence of the hidden message. We refer to such
steganographic schemes as perfectly secure. A few such schemes have been
proposed in recent literature, but they have vanishing rate. We prove that
communication performance can potentially be vastly improved; specifically, our
basic setup assumes independently and identically distributed (i.i.d.)
covertext, and we construct perfectly secure steganographic codes from public
watermarking codes using binning methods and randomized permutations of the
code. The permutation is a secret key shared between encoder and decoder. We
derive (positive) capacity and random-coding exponents for perfectly-secure
steganographic systems. The error exponents provide estimates of the code
length required to achieve a target low error probability. We address the
potential loss in communication performance due to the perfect-security
requirement. This loss is the same as the loss obtained under a weaker order-1
steganographic requirement that would just require matching of first-order
marginals of the covertext and stegotext distributions. Furthermore, no loss
occurs if the covertext distribution is uniform and the distortion metric is
cyclically symmetric; steganographic capacity is then achieved by randomized
linear codes. Our framework may also be useful for developing computationally
secure steganographic systems that have near-optimal communication performance.Comment: To appear in IEEE Trans. on Information Theory, June 2008; ignore
Version 2 as the file was corrupte
Perfectly Secure Steganography: Capacity, Error Exponents, and Code Constructions
An analysis of steganographic systems subject to the following perfect
undetectability condition is presented in this paper. Following embedding of
the message into the covertext, the resulting stegotext is required to have
exactly the same probability distribution as the covertext. Then no statistical
test can reliably detect the presence of the hidden message. We refer to such
steganographic schemes as perfectly secure. A few such schemes have been
proposed in recent literature, but they have vanishing rate. We prove that
communication performance can potentially be vastly improved; specifically, our
basic setup assumes independently and identically distributed (i.i.d.)
covertext, and we construct perfectly secure steganographic codes from public
watermarking codes using binning methods and randomized permutations of the
code. The permutation is a secret key shared between encoder and decoder. We
derive (positive) capacity and random-coding exponents for perfectly-secure
steganographic systems. The error exponents provide estimates of the code
length required to achieve a target low error probability. We address the
potential loss in communication performance due to the perfect-security
requirement. This loss is the same as the loss obtained under a weaker order-1
steganographic requirement that would just require matching of first-order
marginals of the covertext and stegotext distributions. Furthermore, no loss
occurs if the covertext distribution is uniform and the distortion metric is
cyclically symmetric; steganographic capacity is then achieved by randomized
linear codes. Our framework may also be useful for developing computationally
secure steganographic systems that have near-optimal communication performance.Comment: To appear in IEEE Trans. on Information Theory, June 2008; ignore
Version 2 as the file was corrupte
Deep Denoising for Hearing Aid Applications
Reduction of unwanted environmental noises is an important feature of today's
hearing aids (HA), which is why noise reduction is nowadays included in almost
every commercially available device. The majority of these algorithms, however,
is restricted to the reduction of stationary noises. In this work, we propose a
denoising approach based on a three hidden layer fully connected deep learning
network that aims to predict a Wiener filtering gain with an asymmetric input
context, enabling real-time applications with high constraints on signal delay.
The approach is employing a hearing instrument-grade filter bank and complies
with typical hearing aid demands, such as low latency and on-line processing.
It can further be well integrated with other algorithms in an existing HA
signal processing chain. We can show on a database of real world noise signals
that our algorithm is able to outperform a state of the art baseline approach,
both using objective metrics and subject tests.Comment: submitted to IWAENC 201
- …