3 research outputs found

    Audio Inpainting

    (c) 2012 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works for resale or redistribution to servers or lists, or reuse of any copyrighted components of this work in other works. Published version: IEEE Transactions on Audio, Speech and Language Processing 20(3): 922-932, Mar 2012. DOI: 10.1109/TASL.2011.2168211

    Sparse Approximation and Dictionary Learning with Applications to Audio Signals

    PhD
    Over-complete transforms have recently become the focus of a wealth of research in signal processing, machine learning, statistics and related fields. Their great modelling flexibility makes it possible to find sparse representations and approximations of data, which in turn prove very efficient in a wide range of applications. Sparse models express signals as linear combinations of a few basis functions, called atoms, taken from a so-called dictionary. Finding the optimal dictionary from a set of training signals of a given class is the objective of dictionary learning and the main focus of this thesis. The experimental evidence presented here focuses on the processing of audio signals, and the role of sparse algorithms in audio applications is highlighted accordingly. The first main contribution of this thesis is the development of a pitch-synchronous transform, in which the frame-by-frame analysis of audio data is adapted so that each frame analysing a periodic signal contains an integer number of periods. This algorithm provides a technique for adapting transform parameters to the audio signal being analysed; it is shown to improve the sparsity of the representation compared with a non-pitch-synchronous approach, and is further evaluated in the context of source separation by binary masking. A second main contribution is the development of a novel model and associated algorithm for dictionary learning of convolved signals, where the observed variables are sparsely approximated by the atoms of a convolved dictionary. An algorithm is devised to learn the impulse response applied to the dictionary, and experimental results on synthetic data show the superior approximation performance of the proposed method compared with a state-of-the-art dictionary learning algorithm. Finally, a third main contribution is the development of methods for learning dictionaries that are both well adapted to a training set of data and mutually incoherent.
Two novel algorithms, namely the incoherent K-SVD and the iterative projections and rotations (IPR) algorithm, are introduced and compared with different techniques published in the literature in a sparse approximation context. The IPR algorithm in particular is shown to outperform the benchmark techniques in learning very incoherent dictionaries while maintaining a good signal-to-noise ratio of the representation.
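
For readers unfamiliar with the term, the "mutual incoherence" of a dictionary mentioned above is commonly measured by its mutual coherence: the largest absolute inner product between two distinct unit-norm atoms. The following is a minimal sketch of that standard measure only, not of the thesis's learning algorithms; the function name `mutual_coherence` and the random test dictionary are assumptions made for illustration.

```python
import numpy as np

def mutual_coherence(D):
    """Largest absolute inner product between two distinct atoms (columns of D)."""
    Dn = D / np.linalg.norm(D, axis=0, keepdims=True)  # normalise each atom
    G = np.abs(Dn.T @ Dn)                              # absolute Gram matrix
    np.fill_diagonal(G, 0.0)                           # ignore self-products
    return G.max()

rng = np.random.default_rng(0)
D_orth = np.eye(8)                       # orthonormal basis (hypothetical example)
D_rand = rng.standard_normal((8, 16))    # over-complete random dictionary (hypothetical)

print(mutual_coherence(D_orth))  # 0.0: orthonormal atoms are perfectly incoherent
print(mutual_coherence(D_rand))  # strictly between 0 and 1 for this over-complete case
```

A lower value means the atoms are less similar to one another, which is the property the incoherent dictionary learning methods described above aim for.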

    Représentations redondantes et hiérarchiques pour l'archivage et la compression de scènes sonores (Redundant and hierarchical representations for the archiving and compression of sound scenes)

    The subject of this thesis is the analysis and automatic processing of large volumes of audio data. More specifically, it addresses archiving, a task that combines at least two problems: compressing the data and indexing its content. Each of these problems defines objectives that sometimes compete, which makes it difficult to address them simultaneously. At the heart of this thesis is therefore the aim of building a coherent framework for both the compression and the indexing of sound archives. Sparse representations of signals in redundant dictionaries have recently shown that they can fulfil this role. Their properties, together with the methods and algorithms used to obtain them, are therefore studied in the first part of this thesis. The relatively demanding application setting (the volume of data) leads us to choose, among these, iterative algorithms, also called greedy algorithms. A first contribution of this thesis is the proposal of variants of the well-known Matching Pursuit algorithm based on random, dynamic sub-sampling of dictionaries. Adapting them to structured time-frequency dictionaries (unions of local cosine bases) gives reason to expect a significant improvement in compression performance for sound scenes. These new algorithms come with an original statistical model of their convergence properties, using tools borrowed from extreme value theory. The other contributions of this thesis tackle the second part of the archiving problem: indexing. The same framework is used here to bring out the different levels of structure in the data. Foremost is the detection of redundancies and repetitions. At large scale, a robust system is proposed for detecting recurring patterns in a radio broadcast stream by comparing fingerprints. Its comparative performance in an evaluation campaign of the QUAERO project confirms the relevance of this approach. The exploitation of these structures in a context other than compression is also considered. In particular, we propose an application to source separation informed by redundancy, to illustrate the variety of processing that the chosen framework allows. Bringing these elements together makes it possible to envisage an archiving system that meets the constraints by prioritising objectives and processing steps.
    The main goal of this work is the automated processing of large volumes of audio data. More specifically, we are interested in archiving, a process that encompasses at least two distinct problems: data compression and data indexing. Jointly addressing these problems is difficult, since many of their objectives may compete. Building a consistent framework for audio archiving is therefore the subject of this thesis. Sparse representations of signals in redundant dictionaries have recently proved useful for many sub-problems of the archiving task. Sparsity is a desirable property both for compression and for indexing. Methods and algorithms for building such representations are the first topic of this thesis. Given the dimensionality of the data considered, greedy algorithms are studied in particular. A first contribution of this thesis is a variant of the well-known Matching Pursuit algorithm that exploits randomness and sub-sampling of very large time-frequency dictionaries. We show that audio compression (especially at low bit rates) can be improved using this method. This new algorithm comes with an original model of asymptotic pursuit behaviour, using order statistics and tools from extreme value theory.
Other contributions deal with the second part of the archiving problem: indexing. The same framework is applied to different layers of signal structure. First, the detection of redundancies and musical repetitions is addressed. At a larger scale, we investigate audio fingerprinting schemes and apply them to the on-line segmentation of radio broadcasts. Performance was evaluated during an international campaign within the QUAERO project. Finally, the same framework is used to perform source separation informed by redundancy. All these elements validate the proposed framework for the audio archiving task. The layered structures of audio data are accessed hierarchically by greedy decomposition algorithms, which allows the different objectives of archiving to be processed at different steps and thus addressed within the same framework.
PARIS-Télécom ParisTech / Sudoc, France
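
The abstract above describes Matching Pursuit variants that search a randomly sub-sampled dictionary. The following is a rough sketch of that general idea under stated assumptions, not the thesis's actual algorithm: the function name `random_subset_mp` is hypothetical, a fresh uniform random subset of atoms is drawn at every iteration, and a random Gaussian dictionary stands in for the structured time-frequency dictionaries used in the thesis.

```python
import numpy as np

def random_subset_mp(x, D, n_iter, subset_size, rng):
    """Matching pursuit that searches a fresh random subset of atoms at each step.

    x: signal of shape (n,); D: dictionary with unit-norm columns, shape (n, K).
    Returns the coefficient vector (K,) and the final residual (n,).
    """
    residual = x.astype(float).copy()
    coeffs = np.zeros(D.shape[1])
    for _ in range(n_iter):
        # Draw a random sub-dictionary (assumption: uniform, without replacement)
        idx = rng.choice(D.shape[1], size=subset_size, replace=False)
        corr = D[:, idx].T @ residual        # correlations with the sampled atoms
        best = np.argmax(np.abs(corr))       # most correlated atom in the subset
        k = idx[best]
        coeffs[k] += corr[best]              # accumulate: an atom may be selected again
        residual -= corr[best] * D[:, k]     # subtract its contribution
    return coeffs, residual

rng = np.random.default_rng(0)
D = rng.standard_normal((64, 256))
D /= np.linalg.norm(D, axis=0)               # unit-norm atoms
x = D[:, [3, 70, 200]] @ np.array([1.5, -2.0, 0.8])   # synthetic 3-sparse signal

coeffs, residual = random_subset_mp(x, D, n_iter=50, subset_size=64, rng=rng)
print(np.linalg.norm(residual) / np.linalg.norm(x))   # relative residual energy
```

Each iteration costs `subset_size` inner products instead of one per atom of the full dictionary, and because the atoms have unit norm the residual energy never increases from one step to the next.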