Sequential modeling, generative recurrent neural networks, and their applications to audio

Mehri, Soroush

thesis

Authors: Soroush Mehri
Publication date: 1 December 2016

Abstract

Deep learning has so far proved to be the most promising route to applied artificial intelligence, which many see as the path toward an AI-powered future and, eventually, artificial general intelligence. In this work we are interested in harnessing this powerful method to make larger strides toward creating a "Talking Machine". This thesis presents a parametric statistical model for unconditional, end-to-end generation of audio sequences, including speech, onomatopoeia, and music. Unlike earlier work in signal processing, the proposed model does not rely on any of the handcrafted features developed in that field over many years; it operates directly on raw audio samples. As a general framework, it can potentially be applied in other domains that require modeling sequential data, e.g. natural language processing. Chapters 1 and 2 give a brief overview of background topics, including machine learning and the basic building blocks of deep learning algorithms. The following chapters present our endeavor toward the aforementioned goal.

Available Versions

Dépôt Institutionnel Numérique

oai:papyrus.bib.umontreal.ca:1...

Last time updated on 04/07/2017