1,544 research outputs found
An evaluation of intrusive instrumental intelligibility metrics
Instrumental intelligibility metrics are commonly used as an alternative to
listening tests. This paper evaluates 12 monaural intrusive intelligibility
metrics: SII, HEGP, CSII, HASPI, NCM, QSTI, STOI, ESTOI, MIKNN, SIMI, SIIB, and
. In addition, this paper investigates the ability of
intelligibility metrics to generalize to new types of distortions and analyzes
why the top performing metrics have high performance. The intelligibility data
were obtained from 11 listening tests described in the literature. The stimuli
included Dutch, Danish, and English speech that was distorted by additive
noise, reverberation, competing talkers, pre-processing enhancement, and
post-processing enhancement. SIIB and HASPI had the highest performance,
achieving average correlations with listening test scores of
and , respectively. The high performance of SIIB may, in part, be
the result of SIIB's developers having access to all the intelligibility data
considered in the evaluation. The results show that intelligibility metrics
tend to perform poorly on data sets that were not used during their
development. Modifying the original implementations of SIIB and STOI
demonstrates the advantage of reducing statistical dependencies between input
features. Additionally, the paper presents a new version of SIIB called ,
which has similar performance to SIIB and HASPI but is two orders of magnitude
faster to compute.
Comment: Published in IEEE/ACM Transactions on Audio, Speech, and Language
Processing, 201
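The abstract does not spell out the evaluation protocol, so the following is only a minimal sketch of the common practice in this literature: fit a logistic mapping from metric scores to measured intelligibility and report the Pearson correlation of the mapped scores. All variable names and the synthetic numbers are illustrative, not taken from the paper.

# Sketch: correlating an intelligibility metric with listening-test scores.
# Assumption: a per-condition logistic mapping followed by Pearson correlation,
# a common (but here hypothetical) stand-in for the paper's exact procedure.
import numpy as np
from scipy.optimize import curve_fit
from scipy.stats import pearsonr

def logistic(d, a, b):
    """Map raw metric scores d to the [0, 1] intelligibility range."""
    return 1.0 / (1.0 + np.exp(a * d + b))

# Illustrative data: one metric score and one listening-test score per condition.
metric_scores = np.array([0.2, 0.35, 0.5, 0.6, 0.75, 0.9])
listening_scores = np.array([0.10, 0.30, 0.55, 0.70, 0.85, 0.95])

# Fit the logistic mapping, then correlate mapped scores with listening scores.
(a, b), _ = curve_fit(logistic, metric_scores, listening_scores, p0=[-10.0, 5.0])
rho, _ = pearsonr(logistic(metric_scores, a, b), listening_scores)
print(f"Pearson correlation after logistic mapping: {rho:.3f}")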
The Bayesian Analysis of Complex, High-Dimensional Models: Can It Be CODA?
We consider the Bayesian analysis of a few complex, high-dimensional models
and show that intuitive priors, which are not tailored to the fine details of
the model and the estimated parameters, produce estimators which perform poorly
in situations in which good, simple frequentist estimators exist. The models we
consider are: stratified sampling, the partial linear model, linear and
quadratic functionals of white noise, and estimation with stopping times. We
present a strong version of Doob's consistency theorem which demonstrates that
the existence of a uniformly √n-consistent estimator ensures that the
Bayes posterior is √n-consistent for values of the parameter in subsets
of prior probability 1. We also demonstrate that it is, at least in principle,
possible to construct Bayes priors giving both global and local minimax rates,
using a suitable combination of loss functions. We argue that there is no
contradiction in these apparently conflicting findings.
Comment: Published at http://dx.doi.org/10.1214/14-STS483 in Statistical
Science (http://www.imstat.org/sts/) by the Institute of Mathematical
Statistics (http://www.imstat.org)
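For readers unfamiliar with the terminology, the textbook definitions behind the consistency statement are roughly the following; these are standard forms and do not reproduce the paper's exact regularity conditions.

% Uniform root-n consistency of an estimator \hat{\theta}_n over \Theta:
\[
  \lim_{M \to \infty} \sup_{n} \sup_{\theta \in \Theta}
  P_{\theta}\!\left( \sqrt{n}\,\lVert \hat{\theta}_n - \theta \rVert > M \right) = 0 .
\]
% Posterior root-n consistency of the Bayes posterior \Pi at \theta_0:
\[
  \Pi\!\left( \sqrt{n}\,\lVert \theta - \theta_0 \rVert > M_n \mid X_1,\dots,X_n \right)
  \xrightarrow{\;P_{\theta_0}\;} 0
  \quad \text{for every sequence } M_n \to \infty .
\]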
Event Weighted Tests for Detecting Periodicity in Photon Arrival Times
This paper treats the problem of detecting periodicity in a sequence of
photon arrival times, a problem that arises, for example, in attempting to detect
gamma-ray pulsars. A particular focus is on how auxiliary information,
typically source intensity, background intensity, and incidence angles and
energies associated with each photon arrival, should be used to maximize the
detection power. We construct a class of likelihood-based tests, score tests,
which give rise to event weighting in a principled and natural way, and derive
expressions quantifying the power of the tests. These results can be used to
compare the efficacies of different weight functions, including cuts in energy
and incidence angle. The test is targeted toward a template for the periodic
lightcurve, and we quantify how deviation from that template affects the power
of detection.
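As a simple illustration of the event-weighting idea (not the paper's specific score tests), the sketch below computes a weighted Rayleigh-type statistic, one elementary member of the family of weighted periodicity tests. The weights w are placeholders for auxiliary information such as the probability that a photon originated from the source.

# Sketch of an event-weighted periodicity statistic (weighted Rayleigh test).
# Illustrative only; the paper derives likelihood-based score tests rather
# than this particular statistic.
import numpy as np

def weighted_rayleigh(arrival_times, weights, frequency):
    """Weighted Rayleigh power at a trial frequency (larger = more periodic)."""
    phases = 2.0 * np.pi * frequency * arrival_times   # phase of each photon
    c = np.sum(weights * np.cos(phases))
    s = np.sum(weights * np.sin(phases))
    # Normalised so that, under the null of no periodicity, the statistic is
    # approximately chi-squared with 2 degrees of freedom.
    return 2.0 * (c**2 + s**2) / np.sum(weights**2)

# Illustrative use with purely unmodulated synthetic photons, so the value
# should stay near its null expectation of about 2.
rng = np.random.default_rng(0)
t = np.sort(rng.uniform(0.0, 100.0, size=2000))   # arrival times in seconds
w = rng.uniform(0.2, 1.0, size=t.size)            # hypothetical source weights
print(weighted_rayleigh(t, w, frequency=30.0))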
Wavenet based low rate speech coding
Traditional parametric coding of speech achieves low bit rates but provides
poor reconstruction quality because of the inadequacy of the model used. We
describe how a WaveNet generative speech model can be used to generate high-quality
speech from the bit stream of a standard parametric coder operating at
2.4 kb/s. We compare this parametric coder with a waveform coder based on the
same generative model and show that approximating the signal waveform incurs a
large rate penalty. Our experiments confirm the high performance of the
WaveNet-based coder and show that the system additionally performs implicit
bandwidth extension and does not significantly impair a human listener's
recognition of the original speaker, even when that speaker was not included
in the training data of the generative model.
Comment: 5 pages, 2 figures
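To make the rate comparison concrete, the sketch below works through the back-of-the-envelope arithmetic and the mu-law companding that WaveNet-style models typically apply before 8-bit quantisation. The 16 kHz sampling rate is an assumption used for illustration, not a detail taken from the abstract.

# Rate arithmetic behind the abstract's comparison, plus mu-law companding.
# Assumption: 16 kHz sampling and 8-bit mu-law samples, typical for
# WaveNet-style models but not stated in the abstract.
import numpy as np

def mu_law_encode(x, mu=255):
    """Compand a signal in [-1, 1] and quantise it to mu + 1 levels."""
    companded = np.sign(x) * np.log1p(mu * np.abs(x)) / np.log1p(mu)
    return np.round((companded + 1.0) / 2.0 * mu).astype(np.int32)

fs = 16_000                 # assumed sampling rate in Hz
raw_rate = fs * 8           # 8-bit mu-law PCM: 128,000 bit/s
parametric_rate = 2_400     # bit stream of the standard parametric coder
print(f"raw mu-law rate: {raw_rate} bit/s, parametric rate: {parametric_rate} bit/s")
print(f"ratio between the two rates: {raw_rate / parametric_rate:.1f}x")

# Example: companding a 440 Hz tone into 8-bit code indices.
t = np.arange(fs) / fs
codes = mu_law_encode(0.5 * np.sin(2 * np.pi * 440 * t))
print(codes.min(), codes.max())  # indices lie within [0, 255]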