30 research outputs found

    A Perceptually Reweighted Mixed-Norm Method for Sparse Approximation of Audio Signals

    Get PDF

    Sparsity in Linear Predictive Coding of Speech

    Get PDF
    nrpages: 197status: publishe

    Group-Sparse Signal Denoising: Non-Convex Regularization, Convex Optimization

    Full text link
    Convex optimization with sparsity-promoting convex regularization is a standard approach for estimating sparse signals in noise. In order to promote sparsity more strongly than convex regularization, it is also standard practice to employ non-convex optimization. In this paper, we take a third approach. We utilize a non-convex regularization term chosen such that the total cost function (consisting of data consistency and regularization terms) is convex. Therefore, sparsity is more strongly promoted than in the standard convex formulation, but without sacrificing the attractive aspects of convex optimization (unique minimum, robust algorithms, etc.). We use this idea to improve the recently developed 'overlapping group shrinkage' (OGS) algorithm for the denoising of group-sparse signals. The algorithm is applied to the problem of speech enhancement with favorable results in terms of both SNR and perceptual quality.Comment: 14 pages, 11 figure

    Spatial Acoustic Vector Based Sound Field Reproduction

    Get PDF
    Spatial sound field reproduction aims to recreate an immersive sound field over a spatial region. The existing sound pressure based approaches to spatial sound field reproduction focus on the accurate approximation of original sound pressure over space, which ignores the perceptual accuracy of the reproduced sound field. The acoustic vectors of particle velocity and sound intensity appear to be closely linked with human perception of sound localization in literature. Therefore, in this thesis, we explore the spatial distributions of the acoustic vectors, and seek to develop algorithms to perceptually reproduce the original sound field over a continuous spatial region based on the vectors. A theory of spatial acoustic vectors is first developed, where the spatial distributions of particle velocity and sound intensity are derived from sound pressure. To extract the desired sound pressure from a mixed sound field environment, a 3D sound field separation technique is also formulated. Based on this theory, a series of reproduction techniques are proposed to improve the perceptual performance. The outcomes resulting from this theory are: (i) derivation of a particle velocity assisted 3D sound field reproduction technique which allows for non-uniform loudspeaker geometry with a limited number of loudspeakers, (ii) design of particle velocity based mixed-source sound field translation technique for binaural reproduction that can provide sound field translation with good perceptual experience over a large space, (iii) derivation of an intensity matching technique that can reproduce the desired sound field in a spherical region by controlling the sound intensity on the surface of the region, and (iv) two intensity based multizone sound field reproduction algorithms that can reproduce the desired sound field over multiple spatial zones. Finally, these techniques are evaluated by comparing to the conventional approaches through numerical simulations and real-world experiments

    Novel multiscale methods for nonlinear speech analysis

    Get PDF
    Cette thèse présente une recherche exploratoire sur l'application du Formalisme Microcanonique Multiéchelles (FMM) à l'analyse de la parole. Dérivé de principes issus en physique statistique, le FMM permet une analyse géométrique précise de la dynamique non linéaire des signaux complexes. Il est fondé sur l'estimation des paramètres géométriques locaux (les exposants de singularité) qui quantifient le degré de prédictibilité à chaque point du signal. Si correctement définis est estimés, ils fournissent des informations précieuses sur la dynamique locale de signaux complexes. Nous démontrons le potentiel du FMM dans l'analyse de la parole en développant: un algorithme performant pour la segmentation phonétique, un nouveau codeur, un algorithme robuste pour la détection précise des instants de fermeture glottale, un algorithme rapide pour l analyse par prédiction linéaire parcimonieuse et une solution efficace pour l approximation multipulse du signal source d'excitation.This thesis presents an exploratory research on the application of a nonlinear multiscale formalism, called the Microcanonical Multiscale Formalism (the MMF), to the analysis of speech signals. Derived from principles in Statistical Physics, the MMF allows accurate analysis of the nonlinear dynamics of complex signals. It relies on the estimation of local geometrical parameters, the singularity exponents (SE), which quantify the degree of predictability at each point of the signal domain. When correctly defined and estimated, these exponents can provide valuable information about the local dynamics of complex signals and has been successfully used in many applications ranging from signal representation to inference and prediction.We show the relevance of the MMF to speech analysis and develop several applications to show the strength and potential of the formalism. Using the MMF, in this thesis we introduce: a novel and accurate text-independent phonetic segmentation algorithm, a novel waveform coder, a robust accurate algorithm for detection of the Glottal Closure Instants, a closed-form solution for the problem of sparse linear prediction analysis and finally, an efficient algorithm for estimation of the excitation source signal.BORDEAUX1-Bib.electronique (335229901) / SudocSudocFranceF

    Sparse Modeling of Grouped Line Spectra

    Get PDF
    This licentiate thesis focuses on clustered parametric models for estimation of line spectra, when the spectral content of a signal source is assumed to exhibit some form of grouping. Different from previous parametric approaches, which generally require explicit knowledge of the model orders, this thesis exploits sparse modeling, where the orders are implicitly chosen. For line spectra, the non-linear parametric model is approximated by a linear system, containing an overcomplete basis of candidate frequencies, called a dictionary, and a large set of linear response variables that selects and weights the components in the dictionary. Frequency estimates are obtained by solving a convex optimization program, where the sum of squared residuals is minimized. To discourage overfitting and to infer certain structure in the solution, different convex penalty functions are introduced into the optimization. The cost trade-off between fit and penalty is set by some user parameters, as to approximate the true number of spectral lines in the signal, which implies that the response variable will be sparse, i.e., have few non-zero elements. Thus, instead of explicit model orders, the orders are implicitly set by this trade-off. For grouped variables, the dictionary is customized, and appropriate convex penalties selected, so that the solution becomes group sparse, i.e., has few groups with non-zero variables. In an array of sensors, the specific time-delays and attenuations will depend on the source and sensor positions. By modeling this, one may estimate the location of a source. In this thesis, a novel joint location and grouped frequency estimator is proposed, which exploits sparse modeling for both spectral and spatial estimates, showing robustness against sources with overlapping frequency content. For audio signals, this thesis uses two different features for clustering. Pitch is a perceptual property of sound that may be described by the harmonic model, i.e., by a group of spectral lines at integer multiples of a fundamental frequency, which we estimate by exploiting a novel adaptive total variation penalty. The other feature, chroma, is a concept in musical theory, collecting pitches at powers of 2 from each other into groups. Using a chroma dictionary, together with appropriate group sparse penalties, we propose an automatic transcription of the chroma content of a signal
    corecore