335 research outputs found
Design, implementation and evaluation of an acoustic source localization system using Deep Learning techniques
This Master Thesis presents a novel approach for indoor acoustic source localization using microphone
arrays, based on a Convolutional Neural Network (CNN) that we call the ASLNet. It directly estimates
the three-dimensional position of a single acoustic source using as inputs the raw audio signals from a set
of microphones. We use supervised learning methods to train our network end-to-end. The amount of
labeled training data available for this problem is however small. This Thesis presents a training strategy
based on two steps that mitigates this problem. We first train our network using semi-synthetic data
generated from close talk speech recordings and a mathematical model for signal propagation from the
source to the microphones. The amount of semi-synthetic data can be virtually as large as needed. We
then fine tune the resulting network using a small amount of real data. Our experimental results, evaluated
on a publicly available dataset recorded in a real room, show that this approach is able to improve existing
localization methods based on SRP-PHAT strategies and also those presented in very recent proposals
based on Convolutional Recurrent Neural Networks (CRNN). In addition, our experiments show that the
performance of the ASLNet does not show a relevant dependency on the speaker’s gender, nor on the
size of the signal window being used. This work also investigates methods to improve the generalization
properties of our network using only semi-synthetic data for training. This is a highly important objective
due to the cost of labelling localization data. We proceed by including specific effects in the input signals
to force the network to be insensitive to multipath, high noise and distortion likely to be present in real
scenarios. We obtain promising results with this strategy although they still lack behind strategies based
on fine-tuning.Máster Universitario en IngenierĂa de TelecomunicaciĂłn (M125
Recommended from our members
Principled control of approximate programs
In conventional computing, most programs are treated as implementations of mathematical functions for which there is an exact output that must computed from a given input. However, in many problem domains, it is sufficient to produce some approximation of this output. For example, when rendering a scene in graphics, it is acceptable to take computational short-cuts if human beings cannot tell the difference in the rendered scene. In other problem domains like machine learning, programs are often implementations of heuristic approaches to solving problems and therefore already compute approximate solutions to the original problem.
This is the key insight for the new research area, approximate computing, which attempts to trade-off such approximations against the cost of computational resources such as program execution time, energy consumption, and memory usage. We believe that approximate computing is an important step towards a more fundamental and comprehensive goal that we call information-efficiency. Current applications compute more information (bits) than are needed to produce their outputs, and since producing and transporting bits of information inside a computer requires energy/computation time/memory usage, information-inefficient computing leads directly to resources inefficiency.
Although there is now a fairly large literature on approximate computing, system researchers have focused mostly on what we can call the forward problem; that is, they have explored different ways in both hardware and software to introduce approximations in a program and have demonstrated that these approximations can enable significant execution speedups and energy savings with some quality degradation of the result. However, these efforts do not provide any guarantee on the amount of the quality degradation. Since the acceptable amount of degradation usually depends on the scenario in which the application is deployed, it is very important to be able to control the degree of approximation. In this dissertation, we refer to this problem as the inverse problem. Relatively little is known about how to solve the inverse problem in a disciplined way.
This dissertation makes two contributions towards solving the inverse problem. First, we investigate a large set of approximate algorithms from a variety of domains in order to understand how approximation is used in real-world applications. From this investigation, we determine that many approximate programs are tunable approximate programs. Tunable approximate programs have one or more parameters called knobs that can be changed to vary the quality of the output of the approximate computation as well as the corresponding cost. For example, an iterative linear equation solver can vary the number of iterations to trade quality of the solution versus the execution time, a Monte Carlo path tracer can change the number of sampling light paths to trade the quality of the resulting image against execution time, etc. Tunable approximate programs provide many opportunities for trading accuracy versus cost. By carefully analyzing these algorithms, we have found a set of patterns for how approximation is applied in tunable programs. Our classification can be used to identify new approximation opportunities in programs.
A second contribution of this dissertation is an approach to solving the inverse problem for tunable approximate programs. Concretely, the problem is to determine knob settings to minimize the cost while keeping the quality degradation within a given bound. There are four challenges: i) for real-world applications, the quality and cost are usually complex non-linear functions of the knobs and these functions are usually hard to express analytically; ii) the quality and the cost for an application vary greatly for different inputs; iii) when an acceptable quality degradation bound is presented, determining the knob setting has to be very efficient so that the extra overhead incurred by the identification will not exceed the cost saved by the approximation; and iv) the approach should be general so that it can be applied to many applications.
To meet these requirements, we formulate the inverse problem as a constrained optimization problem and solve it using a machine learning based approach. We build a system which uses machine learning techniques to learn cost and quality models for the program by profiling the program with a set of representative inputs. Then, when a quality degradation bound is presented, the system searches these error and cost models to identify the knob settings which can achieve the best cost savings while simultaneously guaranteeing the quality degradation bound statistically. We evaluate the system with a set of real world applications, including a social network graph partitioner, an image search engine, a 2-D graph layout engine, a 3-D game physics engine, a SVM solver and a radar signal processing engine. The experiments showed great savings in execution time and energy savings for a variety of quality bounds.Computer Science
Learning to Generate 3D Training Data
Human-level visual 3D perception ability has long been pursued by researchers in computer vision, computer graphics, and robotics. Recent years have seen an emerging line of works using synthetic images to train deep networks for single image 3D perception. Synthetic images rendered by graphics engines are a promising source for training deep neural networks because it comes with perfect 3D ground truth for free. However, the 3D shapes and scenes to be rendered are largely made manual. Besides, it is challenging to ensure that synthetic images collected this way can help train a deep network to perform well on real images. This is because graphics generation pipelines require numerous design decisions such as the selection of 3D shapes and the placement of the camera.
In this dissertation, we propose automatic generation pipelines of synthetic data that aim to improve the task performance of a trained network. We explore both supervised and unsupervised directions for automatic optimization of 3D decisions. For supervised learning, we demonstrate how to optimize 3D parameters such that a trained network can generalize well to real images. We first show that we can construct a pure synthetic 3D shape to achieve state-of-the-art performance on a shape-from-shading benchmark. We further parameterize the decisions as a vector and propose a hybrid gradient approach to efficiently optimize the vector towards usefulness. Our hybrid gradient is able to outperform classic black-box approaches on a wide selection of 3D perception tasks. For unsupervised learning, we propose a novelty metric for 3D parameter evolution based on deep autoregressive models. We show that without any extrinsic motivation, the novelty computed from autoregressive models alone is helpful. Our novelty metric can consistently encourage a random synthetic generator to produce more useful training data for downstream 3D perception tasks.PHDComputer Science & EngineeringUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttp://deepblue.lib.umich.edu/bitstream/2027.42/163240/1/ydawei_1.pd
Deep networks training and generalization: insights from linearization
Bien qu'ils soient capables de représenter des fonctions très complexes, les réseaux de neurones profonds sont entraînés à l'aide de variations autour de la descente de gradient, un algorithme qui est basé sur une simple linéarisation de la fonction de coût à chaque itération lors de l'entrainement. Dans cette thèse, nous soutenons qu'une approche prometteuse pour élaborer une théorie générale qui expliquerait la généralisation des réseaux de neurones, est de s'inspirer d'une analogie avec les modèles linéaires, en étudiant le développement de Taylor au premier ordre qui relie des pas dans l'espace des paramètres à des modifications dans l'espace des fonctions.
Cette thèse par article comprend 3 articles ainsi qu'une bibliothèque logicielle. La bibliothèque NNGeometry (chapitre 3) sert de fil rouge à l'ensemble des projets, et introduit une Interface de Programmation Applicative (API) simple pour étudier la dynamique d'entrainement linéarisée de réseaux de neurones, en exploitant des méthodes récentes ainsi que de nouvelles accélérations algorithmiques. Dans l'article EKFAC (chapitre 4), nous proposons une approchée de la Matrice d'Information de Fisher (FIM), utilisée dans l'algorithme d'optimisation du gradient naturel. Dans l'article Lazy vs Hasty (chapitre 5), nous comparons la fonction obtenue par dynamique d'entrainement linéarisée (par exemple dans le régime limite du noyau tangent (NTK) à largeur infinie), au régime d'entrainement réel, en utilisant des groupes d'exemples classés selon différentes notions de difficulté. Dans l'article NTK alignment (chapitre 6), nous révélons un effet de régularisation implicite qui découle de l'alignement du NTK au noyau cible, au fur et à mesure que l'entrainement progresse.Despite being able to represent very complex functions, deep artificial neural networks are trained using variants of the basic gradient descent algorithm, which relies on linearization of the loss at each iteration during training. In this thesis, we argue that a promising way to tackle the challenge of elaborating a comprehensive theory explaining generalization in deep networks, is to take advantage of an analogy with linear models, by studying the first order Taylor expansion that maps parameter space updates to function space progress.
This thesis by publication is made of 3 papers and a software library. The library NNGeometry (chapter 3) serves as a common thread for all projects, and introduces a simple Application Programming Interface (API) to study the linearized training dynamics of deep networks using recent methods and contributed algorithmic accelerations. In the EKFAC paper (chapter 4), we propose an approximate to the Fisher Information Matrix (FIM), used in the natural gradient optimization algorithm. In the Lazy vs Hasty paper (chapter 5), we compare the function obtained while training using a linearized dynamics (e.g. in the infinite width Neural Tangent Kernel (NTK) limit regime), to the actual training regime, by means of examples grouped using different notions of difficulty. In the NTK alignment paper (chapter 6), we reveal an implicit regularization effect arising from the alignment of the NTK to the target kernel as training progresses
Auto-Encoders, Distributed Training and Information Representation in Deep Neural Networks
L'objectif de cette thèse est de présenter ma modeste contribution à l'effort collectif de l'humanité pour comprendre l'intelligence et construire des machines intelligentes.
Ceci est une thèse par articles (cinq au total), tous représentant une entreprise personnelle dans laquelle j'ai consacré beaucoup d'énergie.
Les articles sont présentés en ordre chronologique, et ils touchent principalement à deux sujets : l'apprentissage de représentations et l'optimisation. Les articles des chapitres 3, 5 et 9
sont dans la première catégorie, et ceux des chapitres 7 et 11 sont dans la seconde catégorie.
Dans le premier article, nous partons de l'idée de modéliser la géométrie des données en entraînant un auto-encodeur débruitant qui reconstruit les données après qu'on les ait perturbées. Nous établissons un lien entre les auto-encodeurs contractifs et les auto-encodeurs débruitants. Notre contribution majeure consiste à démontrer mathématiquement une propriété intéressante qu'ont les solutions optimales aux auto-encodeurs débruitants lorsqu'ils sont définis à partir de bruit additif gaussien. Plus spécifiquement, nous démontrons qu'ils apprennent le score de la densité de probabilité. Nous présentons un ensemble de méthodes pratiques par lesquelles ce résultat nous permet de transformer un auto-encodeur en modèle génératif. Nous menons certaines expériences dans le but d'apprendre la géométrie locale des distributions de données.
Dans le second article, nous continuons dans la même ligne d'idées en construisant un modèle génératif basé sur l'apprentissage de distributions conditionnelles. Cet exercice se fait dans un cadre plus général et nous nous concentrons sur les propriétés de la chaine de Markov obtenu par échantillonnage de Gibbs. à l'aide d'une petite modification lors de la construction de la chaine de Markov, nous obtenons un modèle que l'on nomme "Generative Stochastic Networks". Plusieurs copies de ce modèle peuvent se combiner pour créer une hiérarchie de représentations abstraites servant à mieux représenter la nature des données. Nous présentons des expériences sur l'ensemble de données MNIST et sur le remplissage d'images trouées.
Dans notre troisième article, nous présentons un nouveau paradigme pour l'optimisation parallèle. Nous proposons d'utiliser un ensemble de noeuds de calcul pour évaluer les coefficients nécessaires à faire de l'échantillonnage préférentiel sur les données d'entraînement. Cette idée ressemble beaucoup à l'apprentissage avec curriculum qui est une méthode dans laquelle l'ordre des données fournies au modèle est choisi avec beaucoup de soin dans le but de faciliter l'apprentissage.
Nous comparons les résultats expérimentaux observés à ceux anticipés en terme de réduction de variance sur les gradients.
Dans notre quatrième article, nous revenons au concept d'apprentissage de représentations et nous cherchons à savoir s'il serait possible de définir une notion utile de "contenu en information" dans le contexte de couches de réseaux neuronaux.
Ceci nous intéresse en particulier parce qu'il y a une sorte de paradoxe avec les réseaux profonds qui sont déterministes. Les couches les plus profondes ont des meilleures représentations que les premières couches, mais si l'on regarde strictement avec le point de vue de l'entropie (venant de la théorie de l'information) il est impossible qu'une couche plus profonde contienne plus d'information qu'une couche à l'entrée.
Nous développons une méthode d'entraînement de classifieur linéaire sur chaque couche du modèle étudié (dont les paramètres sont maintenant figés pendant l'étude). Nous appelons ces classifeurs des "sondes linéaires de classification", et nous nous en servons pour mieux comprendre la dynamique particulière de l'entraînement d'un réseau profond.
Nous présentons des expériences menées sur des gros modèles (Inception v3 et ResNet-50), et nous découvrons une propriété étonnante : la performance de ces sondes augmente de manière monotone lorsque l'on descend dans les couches plus profondes.
Dans le cinquième article, nous retournons à l'optimisation, et nous étudions la courbure de l'espace de la fonction de perte. Nous regardons les vecteurs propres dominants de la matrice hessienne, et nous explorons les gains potentiels dans ces directions s'il était possible de faire un pas d'une longueur optimale.
Nous sommes principalement intéressés par les gains dans les directions associées aux valeurs propres négatives car celles-ci sont généralement ignorées par les méthodes populaire d'optimisation convexes. L'étude de la matrice hessienne demande des coûts énormes en calcul, et nous devons nous limiter à des expérience sur les données MNIST. Nous découvrons que des gains très importants peuvent être réalisés dans les directions de courbure négative, et que les longueurs de pas optimales sont beaucoup plus grandes que celles suggérées par la littérature existante.The goal of this thesis is to present a body of work that serves as my modest
contribution to humanity's
quest to understand intelligence and to implement intelligent systems.
This is a thesis by articles, containing five articles, not all of equal impact,
but all representing a very meaningful personal endeavor.
The articles are presented in chronological order, and they cluster
around two general topics : representation learning and optimization. Articles from chapters 3, 5, and 9
are in the former category, whereas articles from chapters 7 and 11 are in the latter.
In the first article, we start with the idea of manifold learning through training
a denoising auto-encoder to locally reconstruct data after perturbations.
We establish a connection between contractive auto-encoders and denoising auto-encoders.
More importantly, we prove mathematically a very interesting property from the
optimal solution to denoising auto-encoders with additive gaussian noise.
Namely, the fact that they learn exactly the score of the probability density function
of the training distribution. We present a collection of ways in which this
allows us to turn an auto-encoder into a generative model.
We provide experiments all related to the goal of local manifold learning.
In the second article, we continue with that idea of building a generative model
by learning conditional distributions. We do that in a more general setting
and we focus more on the properties of the Markov chain obtained by Gibbs sampling.
With a small modification in the construction of the Markov chain,
we obtain the more general "Generative Stochastic Networks",
which we can then stack together into a structure that can represent more
accurately the different levels of abstraction of the data modeled.
We present experiments involving the generation of MNIST digits and image inpainting.
In the third article, we present a novel idea for distributed optimization.
Our proposal uses
a collection of worker nodes to compute the importance weights to be used
by one master node to perform Importance Sampling.
This paradigm has a lot in common with the idea of curriculum learning,
whereby the order of training examples is taken to have a significant impact on the
training performance.
We present results to compare the potential reduction in variance
for gradient estimates with the practical reduction in variance observed.
In the fourth article, we go back to the concept of representation learning
by asking whether there would be any measurable quantity in a neural network layer
that would correspond intuitively to its "information contents".
This is particularly interesting because there is a kind of paradox in
deterministic neural networks : deeper layers encode better representations of the
input signal, but they carry less (or equal) information than the raw inputs (in terms of entropy).
By training a linear classifier on every layer in a neural network (with frozen parameters),
we are able to measure linearly separability of the representations at every layer.
We call these "linear classifier probes", and we show how they
can be used to better understand the dynamics of training a neural network.
We present experiments with large models (Inception v3 and ResNet-50) and uncover
a surprizing property : linear separability increases in a strictly monotonic
relationship with the layer depth.
In the fifth article, we revisit optimization again, but now we study the negative
curvature of the loss function. We look at the most dominant eigenvalues and
eigenvectors of the Hessian matrix, and we explore the gains to be made
by modifying the model parameters along that direction with an optimal step size.
We are mainly interested in the potential gains for directions of negative curvature,
because those are ignored by the very popular convex optimization
methods used by the deep learning community.
Due to the large computational costs of anything dealing with
the Hessian matrix, we run a small model on MNIST. We find that large gains
can be made in directions of negative curvature, and that the optimal step sizes
involved are larger than the current literature would recommend
Data-efficient deep representation learning
Current deep learning methods succeed in many data-intensive applications, but they are still not able to produce robust performance due to the lack of training samples. To investigate how to improve the performance of deep learning paradigms when training samples are limited, data-efficient deep representation learning (DDRL) is proposed in this study. DDRL as a sub area of representation learning mainly addresses the following problem: How can the performance of a deep learning method be maintained when the number of training samples is significantly reduced? This is vital for many applications where collecting data is highly costly, such as medical image analysis. Incorporating a certain kind of prior knowledge into the learning paradigm is key to achieving data efficiency.
Deep learning as a sub-area of machine learning can be divided into three parts (locations) in its learning process, namely Data, Optimisation and Model. Integrating prior knowledge into these three locations is expected to bring data efficiency into a learning paradigm, which can dramatically increase the model performance under the condition of limited training data.
In this thesis, we aim to develop novel deep learning methods for achieving data-efficient training, each of which integrates a certain kind of prior knowledge into three different locations respectively. We make the following contributions. First, we propose an iterative solution based on deep learning for medical image segmentation tasks, where dynamical systems are integrated into the segmentation labels in order to improve both performance and data efficiency. The proposed method not only shows a superior performance and better data efficiency compared to the state-of-the-art methods, but also has better interpretability and rotational invariance which are desired for medical imagining applications. Second, we propose a novel training framework which adaptively selects more informative samples for training during the optimization process. The adaptive selection or sampling is performed based on a hardness-aware strategy in the latent space constructed by a generative model.
We show that the proposed framework outperforms a random sampling method, which demonstrates effectiveness of the proposed framework. Thirdly, we propose a deep neural network model which produces the segmentation maps in a coarse-to-fine manner. The proposed architecture is a sequence of computational blocks containing a number of convolutional layers in which each block provides its successive block with a coarser segmentation map as a reference. Such mechanisms enable us to train the network with limited training samples and produce more interpretable results.Open Acces
On Improving Generalization of CNN-Based Image Classification with Delineation Maps Using the CORF Push-Pull Inhibition Operator
Deployed image classification pipelines are typically dependent on the images captured in real-world environments. This means that images might be affected by different sources of perturbations (e.g. sensor noise in low-light environments). The main challenge arises by the fact that image quality directly impacts the reliability and consistency of classification tasks. This challenge has, hence, attracted wide interest within the computer vision communities. We propose a transformation step that attempts to enhance the generalization ability of CNN models in the presence of unseen noise in the test set. Concretely, the delineation maps of given images are determined using the CORF push-pull inhibition operator. Such an operation transforms an input image into a space that is more robust to noise before being processed by a CNN. We evaluated our approach on the Fashion MNIST data set with an AlexNet model. It turned out that the proposed CORF-augmented pipeline achieved comparable results on noise-free images to those of a conventional AlexNet classification model without CORF delineation maps, but it consistently achieved significantly superior performance on test images perturbed with different levels of Gaussian and uniform noise
- …