
    Design, implementation and evaluation of an acoustic source localization system using Deep Learning techniques

    This Master's Thesis presents a novel approach to indoor acoustic source localization with microphone arrays, based on a Convolutional Neural Network (CNN) that we call ASLNet. It directly estimates the three-dimensional position of a single acoustic source, taking as input the raw audio signals from a set of microphones, and is trained end-to-end with supervised learning. However, the amount of labeled training data available for this problem is small. This Thesis presents a two-step training strategy that mitigates the problem. We first train the network on semi-synthetic data generated from close-talk speech recordings and a mathematical model of signal propagation from the source to the microphones; the amount of semi-synthetic data can be made virtually as large as needed. We then fine-tune the resulting network on a small amount of real data. Our experimental results, evaluated on a publicly available dataset recorded in a real room, show that this approach outperforms existing localization methods based on SRP-PHAT strategies as well as very recent proposals based on Convolutional Recurrent Neural Networks (CRNNs). In addition, our experiments show that the performance of ASLNet has no relevant dependency on the speaker's gender or on the size of the signal window used. This work also investigates methods to improve the generalization of the network when training only on semi-synthetic data, an important objective given the cost of labeling localization data. We proceed by injecting specific effects into the input signals to force the network to be insensitive to the multipath, high noise, and distortion likely to be present in real scenarios. We obtain promising results with this strategy, although they still lag behind strategies based on fine-tuning.
    Máster Universitario en Ingeniería de Telecomunicación (M125
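    The SRP-PHAT baselines mentioned above are built on the generalized cross-correlation with phase transform (GCC-PHAT) between microphone pairs. As a point of reference for that classical pipeline, here is a minimal NumPy sketch of GCC-PHAT for a single pair; the function and variable names are ours, not the thesis's.

```python
import numpy as np

def gcc_phat(sig, ref, fs, interp=1):
    """Time delay of `sig` relative to `ref` via generalized
    cross-correlation with phase transform (GCC-PHAT)."""
    n = sig.shape[0] + ref.shape[0]            # zero-pad against circular wrap
    SIG = np.fft.rfft(sig, n=n)
    REF = np.fft.rfft(ref, n=n)
    R = SIG * np.conj(REF)
    R /= np.abs(R) + 1e-12                     # PHAT weighting: keep only the phase
    cc = np.fft.irfft(R, n=interp * n)
    max_shift = interp * n // 2
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    return (np.argmax(np.abs(cc)) - max_shift) / float(interp * fs)

# Usage: recover a known 5-sample delay between two noise signals.
fs = 16000
rng = np.random.default_rng(0)
x = rng.standard_normal(fs)                    # stand-in for a speech frame
y = np.concatenate((np.zeros(5), x[:-5]))      # y lags x by 5 samples
print(gcc_phat(y, x, fs))                      # ~ 5 / fs = 3.125e-4 s
```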

    Efficient Continual Learning: Approaches and Measures


    Learning to Generate 3D Training Data

    Human-level visual 3D perception ability has long been pursued by researchers in computer vision, computer graphics, and robotics. Recent years have seen an emerging line of work using synthetic images to train deep networks for single-image 3D perception. Synthetic images rendered by graphics engines are a promising source of training data for deep neural networks because they come with perfect 3D ground truth for free. However, the 3D shapes and scenes to be rendered are still largely created manually. Moreover, it is challenging to ensure that synthetic images collected this way help a deep network perform well on real images, because graphics generation pipelines involve numerous design decisions, such as the selection of 3D shapes and the placement of the camera. In this dissertation, we propose automatic synthetic-data generation pipelines that aim to improve the task performance of the trained network. We explore both supervised and unsupervised directions for the automatic optimization of these 3D decisions. For supervised learning, we demonstrate how to optimize 3D parameters such that a trained network generalizes well to real images. We first show that we can construct a purely synthetic 3D shape that achieves state-of-the-art performance on a shape-from-shading benchmark. We further parameterize the decisions as a vector and propose a hybrid gradient approach to efficiently optimize the vector towards usefulness. Our hybrid gradient outperforms classic black-box approaches on a wide selection of 3D perception tasks. For unsupervised learning, we propose a novelty metric for 3D parameter evolution based on deep autoregressive models. We show that, without any extrinsic motivation, the novelty computed from autoregressive models alone is helpful: our novelty metric consistently encourages a random synthetic generator to produce more useful training data for downstream 3D perception tasks.
    PHD, Computer Science & Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/163240/1/ydawei_1.pd
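    The hybrid gradient idea pairs analytic gradients (where the pipeline is differentiable) with black-box estimates (where it is not, e.g. through the renderer and the training run). The sketch below shows only the black-box half, an evolution-strategies gradient estimate over the decision vector; the toy `validation_score` stands in for the expensive render-train-evaluate objective, and every name here is our own illustration, not the dissertation's code.

```python
import numpy as np

def validation_score(theta):
    """Placeholder: render a dataset with parameters `theta`, train a
    network on it, and return accuracy on real validation images.
    A toy quadratic stands in for that expensive pipeline here."""
    return -np.sum((theta - 0.3) ** 2)

def es_gradient(theta, sigma=0.1, pop=16, rng=np.random.default_rng(0)):
    """Evolution-strategies estimate of d(score)/d(theta): the
    black-box half of a hybrid-gradient scheme."""
    eps = rng.standard_normal((pop, theta.size))
    scores = np.array([validation_score(theta + sigma * e) for e in eps])
    scores = (scores - scores.mean()) / (scores.std() + 1e-8)
    return (scores[:, None] * eps).sum(0) / (pop * sigma)

theta = np.zeros(4)                       # e.g. camera height, light angle, ...
for step in range(200):
    theta += 0.05 * es_gradient(theta)    # ascend the estimated gradient
```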

    Deep networks training and generalization: insights from linearization

    Despite being able to represent very complex functions, deep artificial neural networks are trained using variants of the basic gradient descent algorithm, which relies on a linearization of the loss at each iteration during training. In this thesis, we argue that a promising way to tackle the challenge of elaborating a comprehensive theory explaining generalization in deep networks is to take advantage of an analogy with linear models, by studying the first-order Taylor expansion that maps parameter-space updates to function-space progress. This thesis by publication comprises three papers and a software library. The NNGeometry library (chapter 3) serves as a common thread for all projects, introducing a simple Application Programming Interface (API) to study the linearized training dynamics of deep networks using recent methods and contributed algorithmic accelerations. In the EKFAC paper (chapter 4), we propose an approximation to the Fisher Information Matrix (FIM), used in the natural gradient optimization algorithm. In the Lazy vs Hasty paper (chapter 5), we compare the function obtained by training with a linearized dynamics (e.g. in the infinite-width Neural Tangent Kernel (NTK) limit regime) to the actual training regime, on groups of examples ranked by different notions of difficulty. In the NTK alignment paper (chapter 6), we reveal an implicit regularization effect arising from the alignment of the NTK to the target kernel as training progresses.
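    The linearization referred to throughout is the first-order Taylor expansion of the network output in its parameters; in the infinite-width limit, the resulting linear-model dynamics is governed by the Neural Tangent Kernel. In compact form (our notation, not the thesis's):

```latex
% First-order Taylor expansion of the network around initialization \theta_0:
f_{\mathrm{lin}}(x;\theta) = f(x;\theta_0)
    + \nabla_\theta f(x;\theta_0)^{\top}(\theta - \theta_0),
% under which gradient descent on the parameters becomes a linear-model
% dynamics governed by the Neural Tangent Kernel
K_{\mathrm{NTK}}(x, x') = \nabla_\theta f(x;\theta_0)^{\top}\,
                          \nabla_\theta f(x';\theta_0).
```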

    Auto-Encoders, Distributed Training and Information Representation in Deep Neural Networks

    The goal of this thesis is to present a body of work that serves as my modest contribution to humanity's quest to understand intelligence and to implement intelligent systems. This is a thesis by articles, containing five articles, not all of equal impact, but all representing a very meaningful personal endeavor. The articles are presented in chronological order, and they cluster around two general topics: representation learning and optimization. Articles from chapters 3, 5, and 9 are in the former category, whereas articles from chapters 7 and 11 are in the latter.

    In the first article, we start with the idea of manifold learning through training a denoising auto-encoder to locally reconstruct data after perturbations. We establish a connection between contractive auto-encoders and denoising auto-encoders. More importantly, we mathematically prove a very interesting property of the optimal solution to denoising auto-encoders with additive Gaussian noise: they learn exactly the score of the probability density function of the training distribution. We present a collection of ways in which this allows us to turn an auto-encoder into a generative model, and we provide experiments all related to the goal of local manifold learning.

    In the second article, we continue with the idea of building a generative model by learning conditional distributions. We do this in a more general setting and focus on the properties of the Markov chain obtained by Gibbs sampling. With a small modification in the construction of the Markov chain, we obtain the more general "Generative Stochastic Networks", which can be stacked into a structure that more accurately represents the different levels of abstraction of the modeled data. We present experiments involving the generation of MNIST digits and image inpainting.

    In the third article, we present a novel idea for distributed optimization. Our proposal uses a collection of worker nodes to compute the importance weights used by one master node to perform importance sampling. This paradigm has a lot in common with curriculum learning, whereby the order of training examples is taken to have a significant impact on training performance. We compare the potential reduction in variance of the gradient estimates with the reduction observed in practice.

    In the fourth article, we return to the concept of representation learning by asking whether there is any measurable quantity in a neural network layer that corresponds intuitively to its "information contents". This is particularly interesting because of a kind of paradox in deterministic neural networks: deeper layers encode better representations of the input signal, yet in terms of entropy they carry less (or equal) information than the raw inputs. By training a linear classifier on every layer of a network whose parameters are frozen, we measure the linear separability of the representations at every layer. We call these classifiers "linear classifier probes", and we show how they can be used to better understand the dynamics of training a neural network. We present experiments with large models (Inception v3 and ResNet-50) and uncover a surprising property: linear separability increases strictly monotonically with layer depth.

    In the fifth article, we revisit optimization, studying the negative curvature of the loss function. We look at the most dominant eigenvalues and eigenvectors of the Hessian matrix, and we explore the gains to be made by moving the model parameters along those directions with an optimal step size. We are mainly interested in the potential gains in directions of negative curvature, because those are ignored by the very popular convex optimization methods used by the deep learning community. Due to the large computational cost of anything involving the Hessian matrix, we run a small model on MNIST. We find that large gains can be made in directions of negative curvature, and that the optimal step sizes involved are larger than the current literature would recommend.
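    The linear classifier probes of the fourth article admit a very small sketch: freeze the network, collect the activations of one layer, and fit a linear classifier on them. The PyTorch code below is our minimal rendition with hypothetical names; for brevity it reports the probe's accuracy on the probed data itself.

```python
import torch
import torch.nn as nn

def probe_accuracy(model, layer, loader, num_classes, device="cpu"):
    """Fit a linear probe on the frozen activations of `layer` and
    report its accuracy (a proxy for the linear separability of
    that layer's representation)."""
    feats, labels = [], []
    hook = layer.register_forward_hook(
        lambda mod, inp, out: feats.append(out.flatten(1).detach()))
    model.eval()
    with torch.no_grad():
        for x, y in loader:          # forward passes just to trigger the hook
            model(x.to(device))
            labels.append(y)
    hook.remove()
    X = torch.cat(feats)
    y = torch.cat(labels).to(X.device)

    probe = nn.Linear(X.shape[1], num_classes).to(X.device)
    opt = torch.optim.Adam(probe.parameters(), lr=1e-3)
    for _ in range(200):             # full-batch training of the probe only
        opt.zero_grad()
        nn.functional.cross_entropy(probe(X), y).backward()
        opt.step()
    return (probe(X).argmax(dim=1) == y).float().mean().item()
```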

    Data-efficient deep representation learning

    Current deep learning methods succeed in many data-intensive applications, but they still fail to produce robust performance when training samples are scarce. To investigate how to improve the performance of deep learning paradigms when training samples are limited, this study proposes data-efficient deep representation learning (DDRL). As a sub-area of representation learning, DDRL mainly addresses the following problem: how can the performance of a deep learning method be maintained when the number of training samples is significantly reduced? This is vital for many applications where collecting data is highly costly, such as medical image analysis. Incorporating some kind of prior knowledge into the learning paradigm is key to achieving data efficiency. The learning process of a deep learning method can be divided into three parts (locations), namely Data, Optimisation and Model; integrating prior knowledge into these three locations is expected to bring data efficiency to a learning paradigm, which can dramatically increase model performance under limited training data. In this thesis, we aim to develop novel deep learning methods for data-efficient training, each of which integrates a certain kind of prior knowledge into one of these three locations. We make the following contributions. First, we propose an iterative deep-learning solution for medical image segmentation tasks, where dynamical systems are integrated into the segmentation labels in order to improve both performance and data efficiency. The proposed method not only shows superior performance and better data efficiency compared to state-of-the-art methods, but also offers better interpretability and rotational invariance, both desirable for medical imaging applications. Second, we propose a novel training framework which adaptively selects more informative samples during optimization. The adaptive selection, or sampling, follows a hardness-aware strategy in the latent space constructed by a generative model. We show that the proposed framework outperforms random sampling, demonstrating its effectiveness. Third, we propose a deep neural network model which produces segmentation maps in a coarse-to-fine manner: the architecture is a sequence of computational blocks, each containing a number of convolutional layers, in which each block provides its successor with a coarser segmentation map as a reference. This mechanism enables us to train the network with limited training samples and to produce more interpretable results.
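    The hardness-aware selection of the second contribution can be approximated in spirit by biasing mini-batch sampling towards examples with high recorded loss. The sketch below is our simplification with names of our own choosing; the thesis performs the selection in the latent space of a generative model, which is not reproduced here.

```python
import numpy as np

def hardness_weighted_batch(losses, batch_size, temperature=1.0,
                            rng=np.random.default_rng(0)):
    """Sample indices with probability increasing in per-example loss,
    so 'hard' examples are revisited more often than easy ones."""
    p = np.exp(losses / temperature)
    p /= p.sum()
    return rng.choice(len(losses), size=batch_size, replace=False, p=p)

# Usage: keep a running per-example loss table, refresh it periodically,
# and draw each mini-batch from the hard-example-biased distribution.
losses = np.random.rand(1000)            # stand-in for recorded losses
batch = hardness_weighted_batch(losses, 32)
```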

    On Improving Generalization of CNN-Based Image Classification with Delineation Maps Using the CORF Push-Pull Inhibition Operator

    Deployed image classification pipelines typically depend on images captured in real-world environments, which means the images may be affected by different sources of perturbation (e.g. sensor noise in low-light environments). The main challenge arises from the fact that image quality directly impacts the reliability and consistency of classification, and it has therefore attracted wide interest within the computer vision community. We propose a transformation step that attempts to enhance the generalization ability of CNN models in the presence of noise unseen during training. Concretely, the delineation maps of given images are computed using the CORF push-pull inhibition operator. This operation transforms an input image into a space that is more robust to noise before it is processed by a CNN. We evaluated our approach on the Fashion MNIST data set with an AlexNet model. The proposed CORF-augmented pipeline achieved results on noise-free images comparable to those of a conventional AlexNet classifier without CORF delineation maps, while consistently achieving significantly superior performance on test images perturbed with different levels of Gaussian and uniform noise.
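    The push-pull inhibition underlying CORF can be caricatured as an excitatory filter response suppressed by the rectified response of an opposite-polarity filter with wider support: broadband noise tends to excite both, so the subtraction cancels it, while edges mostly excite the push. The difference-of-Gaussians sketch below is our illustration only, not the CORF operator itself.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def push_pull(image, sigma=1.0, k=2.0, s=2.0, alpha=1.0):
    """Simplified push-pull response. The 'push' is an on-center
    difference-of-Gaussians; the 'pull' is the opposite-polarity
    filter with wider support."""
    push = gaussian_filter(image, sigma) - gaussian_filter(image, k * sigma)
    pull = gaussian_filter(image, s * k * sigma) - gaussian_filter(image, s * sigma)
    return np.maximum(push, 0) - alpha * np.maximum(pull, 0)

rng = np.random.default_rng(0)
img = rng.random((64, 64)).astype(np.float32)
out = push_pull(img)        # transformed map to feed to the CNN
```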