5,255 research outputs found
On the Computation of the Kullback-Leibler Measure for Spectral Distances
Efficient algorithms for the exact and approximate computation of the symmetrical Kullback-Leibler (1998) measure for spectral distances are presented for linear predictive coding (LPC) spectra. A interpretation of this measure is given in terms of the poles of the spectra. The performances of the algorithms in terms of accuracy and computational complexity are assessed for the application of computing concatenation costs in unit-selection-based speech synthesis. With the same complexity and storage requirements, the exact method is superior in terms of accuracy
A Generative Product-of-Filters Model of Audio
We propose the product-of-filters (PoF) model, a generative model that
decomposes audio spectra as sparse linear combinations of "filters" in the
log-spectral domain. PoF makes similar assumptions to those used in the classic
homomorphic filtering approach to signal processing, but replaces hand-designed
decompositions built of basic signal processing operations with a learned
decomposition based on statistical inference. This paper formulates the PoF
model and derives a mean-field method for posterior inference and a variational
EM algorithm to estimate the model's free parameters. We demonstrate PoF's
potential for audio processing on a bandwidth expansion task, and show that PoF
can serve as an effective unsupervised feature extractor for a speaker
identification task.Comment: ICLR 2014 conference-track submission. Added link to the source cod
Reducing Audible Spectral Discontinuities
In this paper, a common problem in diphone synthesis is discussed, viz., the occurrence of audible discontinuities at diphone boundaries. Informal observations show that spectral mismatch is most likely the cause of this phenomenon.We first set out to find an objective spectral measure for discontinuity. To this end, several spectral distance measures are related to the results of a listening experiment. Then, we studied the feasibility of extending the diphone database with context-sensitive diphones to reduce the occurrence of audible discontinuities. The number of additional diphones is limited by clustering consonant contexts that have a similar effect on the surrounding vowels on the basis of the best performing distance measure. A listening experiment has shown that the addition of these context-sensitive diphones significantly reduces the amount of audible discontinuities
Algorithms for nonnegative matrix factorization with the beta-divergence
This paper describes algorithms for nonnegative matrix factorization (NMF)
with the beta-divergence (beta-NMF). The beta-divergence is a family of cost
functions parametrized by a single shape parameter beta that takes the
Euclidean distance, the Kullback-Leibler divergence and the Itakura-Saito
divergence as special cases (beta = 2,1,0, respectively). The proposed
algorithms are based on a surrogate auxiliary function (a local majorization of
the criterion function). We first describe a majorization-minimization (MM)
algorithm that leads to multiplicative updates, which differ from standard
heuristic multiplicative updates by a beta-dependent power exponent. The
monotonicity of the heuristic algorithm can however be proven for beta in (0,1)
using the proposed auxiliary function. Then we introduce the concept of
majorization-equalization (ME) algorithm which produces updates that move along
constant level sets of the auxiliary function and lead to larger steps than MM.
Simulations on synthetic and real data illustrate the faster convergence of the
ME approach. The paper also describes how the proposed algorithms can be
adapted to two common variants of NMF : penalized NMF (i.e., when a penalty
function of the factors is added to the criterion function) and convex-NMF
(when the dictionary is assumed to belong to a known subspace).Comment: \`a para\^itre dans Neural Computatio
Fingerprint Verification Using Spectral Minutiae Representations
Most fingerprint recognition systems are based on the use of a minutiae set, which is an unordered collection of minutiae locations and orientations suffering from various deformations such as translation, rotation, and scaling. The spectral minutiae representation introduced in this paper is a novel method to represent a minutiae set as a fixed-length feature vector, which is invariant to translation, and in which rotation and scaling become translations, so that they can be easily compensated for. These characteristics enable the combination of fingerprint recognition systems with template protection schemes that require a fixed-length feature vector. This paper introduces the concept of algorithms for two representation methods: the location-based spectral minutiae representation and the orientation-based spectral minutiae representation. Both algorithms are evaluated using two correlation-based spectral minutiae matching algorithms. We present the performance of our algorithms on three fingerprint databases. We also show how the performance can be improved by using a fusion scheme and singular points
Machine Learning Approaches to Historic Music Restoration
In 1889, a representative of Thomas Edison recorded Johannes Brahms playing a piano arrangement of his piece titled âHungarian Dance No. 1â. This recording acts as a window into how musical masters played in the 19th century. Yet, due to years of damage on the original recording medium of a wax cylinder, it was un-listenable by the time it was digitized into WAV format. This thesis presents machine learning approaches to an audio restoration system for historic music, which aims to convert this poor-quality Brahms piano recording into a higher quality one. Digital signal processing is paired with two machine learning approaches: non-negative matrix factorization and deep neural networks. Our results show the advantages and disadvantages of our approaches, when we compare them to a benchmark restoration of the same recording made by the Center for Computer Research in Music and Acoustics at Stanford University. They also show how this system provides the restoration potential for a wide range of historic music artifacts like this recording, requiring minimal overhead made possible by machine learning. Finally, we go into possible future improvements to these approaches
- âŠ