5,255 research outputs found

    On the Computation of the Kullback-Leibler Measure for Spectral Distances

    Efficient algorithms for the exact and approximate computation of the symmetrical Kullback-Leibler (1998) measure for spectral distances are presented for linear predictive coding (LPC) spectra. An interpretation of this measure is given in terms of the poles of the spectra. The accuracy and computational complexity of the algorithms are assessed for the application of computing concatenation costs in unit-selection-based speech synthesis. With the same complexity and storage requirements, the exact method is superior in terms of accuracy.
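To make the quantity in this abstract concrete, here is a minimal numerical sketch in Python. It is not the exact or approximate algorithm the paper presents. It assumes the symmetrical Kullback-Leibler measure is taken between power spectra normalized to unit sum, in the form sum((p - q) * log(p / q)), and that each LPC spectrum is 1/|A(e^{jw})|^2 sampled on an FFT grid. The function names and the `n_fft` parameter are illustrative.

```python
import numpy as np


def lpc_power_spectrum(a, n_fft=512):
    """Sample the all-pole power spectrum 1 / |A(e^{jw})|^2 on an FFT grid.

    `a` holds the coefficients [1, a_1, ..., a_p] of the polynomial A(z).
    The gain term is omitted because the spectra are normalized later.
    """
    A = np.fft.rfft(np.asarray(a, dtype=float), n=n_fft)
    return 1.0 / np.abs(A) ** 2


def symmetric_kl(p, q):
    """Symmetrical KL measure between two strictly positive spectra.

    Assumed form: sum((p - q) * log(p / q)) after normalizing both
    spectra to unit sum, which equals KL(p||q) + KL(q||p).
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p = p / p.sum()
    q = q / q.sum()
    return float(np.sum((p - q) * np.log(p / q)))


if __name__ == "__main__":
    # Two hypothetical first-order LPC models with poles at 0.9 and 0.5.
    spec_a = lpc_power_spectrum([1.0, -0.9])
    spec_b = lpc_power_spectrum([1.0, -0.5])
    print(symmetric_kl(spec_a, spec_b))
```

This brute-force evaluation costs one FFT per spectrum. The abstract's point is that the measure can be computed more efficiently for LPC spectra and interpreted through their poles.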

    A Generative Product-of-Filters Model of Audio

    We propose the product-of-filters (PoF) model, a generative model that decomposes audio spectra as sparse linear combinations of "filters" in the log-spectral domain. PoF makes similar assumptions to those used in the classic homomorphic filtering approach to signal processing, but replaces hand-designed decompositions built of basic signal processing operations with a learned decomposition based on statistical inference. This paper formulates the PoF model and derives a mean-field method for posterior inference and a variational EM algorithm to estimate the model's free parameters. We demonstrate PoF's potential for audio processing on a bandwidth expansion task, and show that PoF can serve as an effective unsupervised feature extractor for a speaker identification task. Comment: ICLR 2014 conference-track submission. Added link to the source code.

    Reducing Audible Spectral Discontinuities

    In this paper, a common problem in diphone synthesis is discussed, viz., the occurrence of audible discontinuities at diphone boundaries. Informal observations show that spectral mismatch is most likely the cause of this phenomenon. We first set out to find an objective spectral measure for discontinuity. To this end, several spectral distance measures are related to the results of a listening experiment. Then, we studied the feasibility of extending the diphone database with context-sensitive diphones to reduce the occurrence of audible discontinuities. The number of additional diphones is limited by clustering consonant contexts that have a similar effect on the surrounding vowels on the basis of the best performing distance measure. A listening experiment has shown that the addition of these context-sensitive diphones significantly reduces the number of audible discontinuities.

    Algorithms for nonnegative matrix factorization with the beta-divergence

    This paper describes algorithms for nonnegative matrix factorization (NMF) with the beta-divergence (beta-NMF). The beta-divergence is a family of cost functions parametrized by a single shape parameter beta that takes the Euclidean distance, the Kullback-Leibler divergence and the Itakura-Saito divergence as special cases (beta = 2, 1, 0, respectively). The proposed algorithms are based on a surrogate auxiliary function (a local majorization of the criterion function). We first describe a majorization-minimization (MM) algorithm that leads to multiplicative updates, which differ from standard heuristic multiplicative updates by a beta-dependent power exponent. The monotonicity of the heuristic algorithm can, however, be proven for beta in (0,1) using the proposed auxiliary function. Then we introduce the concept of a majorization-equalization (ME) algorithm, which produces updates that move along constant level sets of the auxiliary function and lead to larger steps than MM. Simulations on synthetic and real data illustrate the faster convergence of the ME approach. The paper also describes how the proposed algorithms can be adapted to two common variants of NMF: penalized NMF (i.e., when a penalty function of the factors is added to the criterion function) and convex-NMF (when the dictionary is assumed to belong to a known subspace). Comment: to appear in Neural Computation.
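As a rough illustration of the quantities named in this abstract, the Python sketch below evaluates the beta-divergence with its three special cases and applies one multiplicative update to the activation matrix with a beta-dependent exponent. The divergence formula is the standard definition. The update and the piecewise exponent in `mm_exponent` follow the commonly cited MM form and are stated here as an assumption, not as a transcription of the paper. All function names are illustrative, and strictly positive inputs are assumed.

```python
import numpy as np


def beta_divergence(x, y, beta):
    """Sum of element-wise beta-divergences d_beta(x | y) for positive x, y."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if beta == 1:  # Kullback-Leibler (generalized) divergence
        return float(np.sum(x * np.log(x / y) - x + y))
    if beta == 0:  # Itakura-Saito divergence
        r = x / y
        return float(np.sum(r - np.log(r) - 1.0))
    # General case; beta = 2 gives half the squared Euclidean distance.
    num = x**beta + (beta - 1.0) * y**beta - beta * x * y ** (beta - 1.0)
    return float(np.sum(num / (beta * (beta - 1.0))))


def mm_exponent(beta):
    """Assumed beta-dependent exponent for the MM multiplicative update."""
    if beta < 1:
        return 1.0 / (2.0 - beta)
    if beta <= 2:
        return 1.0
    return 1.0 / (beta - 1.0)


def mm_update_h(V, W, H, beta):
    """One multiplicative update of H for V ~ W @ H (W held fixed)."""
    WH = W @ H
    numerator = W.T @ (WH ** (beta - 2.0) * V)
    denominator = W.T @ (WH ** (beta - 1.0))
    return H * (numerator / denominator) ** mm_exponent(beta)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    V = rng.random((6, 8)) + 0.1
    W = rng.random((6, 3)) + 0.1
    H = rng.random((3, 8)) + 0.1
    for _ in range(10):
        H = mm_update_h(V, W, H, beta=0.5)
        print(beta_divergence(V, W @ H, 0.5))
```

Setting the exponent to 1 for every beta gives the heuristic update that the abstract contrasts with the MM update. The W update has the same form with the roles of the two factors transposed.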

    Fingerprint Verification Using Spectral Minutiae Representations

    Most fingerprint recognition systems are based on the use of a minutiae set, which is an unordered collection of minutiae locations and orientations suffering from various deformations such as translation, rotation, and scaling. The spectral minutiae representation introduced in this paper is a novel method to represent a minutiae set as a fixed-length feature vector, which is invariant to translation, and in which rotation and scaling become translations, so that they can be easily compensated for. These characteristics enable the combination of fingerprint recognition systems with template protection schemes that require a fixed-length feature vector. This paper introduces the concept and algorithms for two representation methods: the location-based spectral minutiae representation and the orientation-based spectral minutiae representation. Both algorithms are evaluated using two correlation-based spectral minutiae matching algorithms. We present the performance of our algorithms on three fingerprint databases. We also show how the performance can be improved by using a fusion scheme and singular points.
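The translation invariance mentioned in this abstract can be illustrated with a small Python sketch. It assumes each minutia location is treated as a unit impulse, so that the Fourier spectrum is a sum of complex exponentials whose magnitude does not change when every point is shifted by the same offset. The paper's own sampling grid, smoothing, and orientation handling are not reproduced here, and the function name and the frequency grid are illustrative.

```python
import numpy as np


def location_spectrum_magnitude(points, freqs):
    """Magnitude of the Fourier spectrum of a set of 2-D point locations.

    Each point (x_i, y_i) contributes exp(-j * (wx * x_i + wy * y_i)).
    The output has a fixed size, len(freqs) x len(freqs), regardless of
    how many points are given.
    """
    pts = np.asarray(points, dtype=float)  # shape (n_points, 2)
    wx, wy = np.meshgrid(freqs, freqs, indexing="ij")
    phase = wx[..., None] * pts[:, 0] + wy[..., None] * pts[:, 1]
    return np.abs(np.exp(-1j * phase).sum(axis=-1))


if __name__ == "__main__":
    freqs = np.linspace(0.05, 0.5, 16)  # hypothetical frequency grid
    minutiae = np.array([[10.0, 20.0], [35.0, 5.0], [50.0, 42.0]])
    shifted = minutiae + np.array([7.0, -3.0])  # same print, translated
    a = location_spectrum_magnitude(minutiae, freqs)
    b = location_spectrum_magnitude(shifted, freqs)
    print(np.allclose(a, b))
```

A common shift multiplies the whole spectrum by a single unit-magnitude phase factor, which is why the magnitudes agree. Rotation and scaling are not handled by this sketch.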

    Machine Learning Approaches to Historic Music Restoration

    In 1889, a representative of Thomas Edison recorded Johannes Brahms playing a piano arrangement of his piece titled “Hungarian Dance No. 1”. This recording acts as a window into how musical masters played in the 19th century. Yet, due to years of damage to the original recording medium, a wax cylinder, it was unlistenable by the time it was digitized into WAV format. This thesis presents machine learning approaches to an audio restoration system for historic music, which aims to convert this poor-quality Brahms piano recording into a higher-quality one. Digital signal processing is paired with two machine learning approaches: non-negative matrix factorization and deep neural networks. Our results show the advantages and disadvantages of our approaches when we compare them to a benchmark restoration of the same recording made by the Center for Computer Research in Music and Acoustics at Stanford University. They also show how this system provides restoration potential for a wide range of historic music artifacts like this recording, with the minimal overhead made possible by machine learning. Finally, we discuss possible future improvements to these approaches.