7,529 research outputs found

    Video streaming

    Get PDF

    An evaluation of intrusive instrumental intelligibility metrics

    Full text link
    Instrumental intelligibility metrics are commonly used as an alternative to listening tests. This paper evaluates 12 monaural intrusive intelligibility metrics: SII, HEGP, CSII, HASPI, NCM, QSTI, STOI, ESTOI, MIKNN, SIMI, SIIB, and sEPSMcorr\text{sEPSM}^\text{corr}. In addition, this paper investigates the ability of intelligibility metrics to generalize to new types of distortions and analyzes why the top performing metrics have high performance. The intelligibility data were obtained from 11 listening tests described in the literature. The stimuli included Dutch, Danish, and English speech that was distorted by additive noise, reverberation, competing talkers, pre-processing enhancement, and post-processing enhancement. SIIB and HASPI had the highest performance achieving a correlation with listening test scores on average of ρ=0.92\rho=0.92 and ρ=0.89\rho=0.89, respectively. The high performance of SIIB may, in part, be the result of SIIBs developers having access to all the intelligibility data considered in the evaluation. The results show that intelligibility metrics tend to perform poorly on data sets that were not used during their development. By modifying the original implementations of SIIB and STOI, the advantage of reducing statistical dependencies between input features is demonstrated. Additionally, the paper presents a new version of SIIB called SIIBGauss\text{SIIB}^\text{Gauss}, which has similar performance to SIIB and HASPI, but takes less time to compute by two orders of magnitude.Comment: Published in IEEE/ACM Transactions on Audio, Speech, and Language Processing, 201

    Objective measures for predicting the intelligibility of spectrally smoothed speech with artificial excitation

    Get PDF
    A study is presented on how well objective measures of speech quality and intelligibility can predict the subjective in- telligibility of speech that has undergone spectral envelope smoothing and simplification of its excitation. Speech modi- fications are made by resynthesising speech that has been spec- trally smoothed. Objective measures are applied to the mod- ified speech and include measures of speech quality, signal- to-noise ratio and intelligibility, as well as proposing the nor- malised frequency-weighted spectral distortion (NFD) measure. The measures are compared to subjective intelligibility scores where it is found that several have high correlation (|r| ≥ 0.7), with NFD achieving the highest correlation (r = −0.81

    Perceptually-Driven Video Coding with the Daala Video Codec

    Full text link
    The Daala project is a royalty-free video codec that attempts to compete with the best patent-encumbered codecs. Part of our strategy is to replace core tools of traditional video codecs with alternative approaches, many of them designed to take perceptual aspects into account, rather than optimizing for simple metrics like PSNR. This paper documents some of our experiences with these tools, which ones worked and which did not. We evaluate which tools are easy to integrate into a more traditional codec design, and show results in the context of the codec being developed by the Alliance for Open Media.Comment: 19 pages, Proceedings of SPIE Workshop on Applications of Digital Image Processing (ADIP), 201

    Perfectly Secure Steganography: Capacity, Error Exponents, and Code Constructions

    Full text link
    An analysis of steganographic systems subject to the following perfect undetectability condition is presented in this paper. Following embedding of the message into the covertext, the resulting stegotext is required to have exactly the same probability distribution as the covertext. Then no statistical test can reliably detect the presence of the hidden message. We refer to such steganographic schemes as perfectly secure. A few such schemes have been proposed in recent literature, but they have vanishing rate. We prove that communication performance can potentially be vastly improved; specifically, our basic setup assumes independently and identically distributed (i.i.d.) covertext, and we construct perfectly secure steganographic codes from public watermarking codes using binning methods and randomized permutations of the code. The permutation is a secret key shared between encoder and decoder. We derive (positive) capacity and random-coding exponents for perfectly-secure steganographic systems. The error exponents provide estimates of the code length required to achieve a target low error probability. We address the potential loss in communication performance due to the perfect-security requirement. This loss is the same as the loss obtained under a weaker order-1 steganographic requirement that would just require matching of first-order marginals of the covertext and stegotext distributions. Furthermore, no loss occurs if the covertext distribution is uniform and the distortion metric is cyclically symmetric; steganographic capacity is then achieved by randomized linear codes. Our framework may also be useful for developing computationally secure steganographic systems that have near-optimal communication performance.Comment: To appear in IEEE Trans. on Information Theory, June 2008; ignore Version 2 as the file was corrupte

    Perfectly Secure Steganography: Capacity, Error Exponents, and Code Constructions

    Full text link
    An analysis of steganographic systems subject to the following perfect undetectability condition is presented in this paper. Following embedding of the message into the covertext, the resulting stegotext is required to have exactly the same probability distribution as the covertext. Then no statistical test can reliably detect the presence of the hidden message. We refer to such steganographic schemes as perfectly secure. A few such schemes have been proposed in recent literature, but they have vanishing rate. We prove that communication performance can potentially be vastly improved; specifically, our basic setup assumes independently and identically distributed (i.i.d.) covertext, and we construct perfectly secure steganographic codes from public watermarking codes using binning methods and randomized permutations of the code. The permutation is a secret key shared between encoder and decoder. We derive (positive) capacity and random-coding exponents for perfectly-secure steganographic systems. The error exponents provide estimates of the code length required to achieve a target low error probability. We address the potential loss in communication performance due to the perfect-security requirement. This loss is the same as the loss obtained under a weaker order-1 steganographic requirement that would just require matching of first-order marginals of the covertext and stegotext distributions. Furthermore, no loss occurs if the covertext distribution is uniform and the distortion metric is cyclically symmetric; steganographic capacity is then achieved by randomized linear codes. Our framework may also be useful for developing computationally secure steganographic systems that have near-optimal communication performance.Comment: To appear in IEEE Trans. on Information Theory, June 2008; ignore Version 2 as the file was corrupte

    Deep Denoising for Hearing Aid Applications

    Full text link
    Reduction of unwanted environmental noises is an important feature of today's hearing aids (HA), which is why noise reduction is nowadays included in almost every commercially available device. The majority of these algorithms, however, is restricted to the reduction of stationary noises. In this work, we propose a denoising approach based on a three hidden layer fully connected deep learning network that aims to predict a Wiener filtering gain with an asymmetric input context, enabling real-time applications with high constraints on signal delay. The approach is employing a hearing instrument-grade filter bank and complies with typical hearing aid demands, such as low latency and on-line processing. It can further be well integrated with other algorithms in an existing HA signal processing chain. We can show on a database of real world noise signals that our algorithm is able to outperform a state of the art baseline approach, both using objective metrics and subject tests.Comment: submitted to IWAENC 201
    corecore