28 research outputs found

    Laplacian Mixture Modeling for Network Analysis and Unsupervised Learning on Graphs

    Full text link
    Laplacian mixture models identify overlapping regions of influence in unlabeled graph and network data in a scalable and computationally efficient way, yielding useful low-dimensional representations. By combining Laplacian eigenspace and finite mixture modeling methods, they provide probabilistic or fuzzy dimensionality reductions or domain decompositions for a variety of input data types, including mixture distributions, feature vectors, and graphs or networks. Provable optimal recovery using the algorithm is analytically shown for a nontrivial class of cluster graphs. Heuristic approximations for scalable high-performance implementations are described and empirically tested. Connections to PageRank and community detection in network analysis demonstrate the wide applicability of this approach. The origins of fuzzy spectral methods, beginning with generalized heat or diffusion equations in physics, are reviewed and summarized. Comparisons to other dimensionality reduction and clustering methods for challenging unsupervised machine learning problems are also discussed.Comment: 13 figures, 35 reference

    Applications of nonlinear approximation for problems in learning theory and applied mathematics

    Get PDF
    A major pillar of approximation theory in establishing the ability of one class of functions to be represented by another. Establishing such a relationship often leads to efficient numerical approximation methods. In this work, several expressibility theorems are established and several novel numerical approximation techniques are also presented. Not only are these novel methods supported by the presented theory, but also, provided numerical experiments show that these novel methods may be applied to a wide range of applications from image compression to the solutions of high-dimensional PDE

    A Study Of The Mathematics Of Deep Learning

    Get PDF
    "Deep Learning"/"Deep Neural Nets" is a technological marvel that is now increasingly deployed at the cutting-edge of artificial intelligence tasks. This ongoing revolution can be said to have been ignited by the iconic 2012 paper from the University of Toronto titled ``ImageNet Classification with Deep Convolutional Neural Networks'' by Alex Krizhevsky, Ilya Sutskever and Geoffrey E. Hinton. This paper showed that deep nets can be used to classify images into meaningful categories with almost human-like accuracies! As of 2020 this approach continues to produce unprecedented performance for an ever widening variety of novel purposes ranging from playing chess to self-driving cars to experimental astrophysics and high-energy physics. But this new found astonishing success of deep neural nets in the last few years has been hinged on an enormous amount of heuristics and it has turned out to be extremely challenging to be mathematically rigorously explainable. In this thesis we take several steps towards building strong theoretical foundations for these new paradigms of deep-learning. Our proofs here can be broadly grouped into three categories, 1. Understanding Neural Function Spaces We show new circuit complexity theorems for deep neural functions over real and Boolean inputs and prove classification theorems about these function spaces which in turn lead to exact algorithms for empirical risk minimization for depth 2 ReLU nets. We also motivate a measure of complexity of neural functions and leverage techniques from polytope geometry to constructively establish the existence of high-complexity neural functions. 2. Understanding Deep Learning Algorithms We give fast iterative stochastic algorithms which can learn near optimal approximations of the true parameters of a \relu gate in the realizable setting. (There are improved versions of this result available in our papers https://arxiv.org/abs/2005.01699 and https://arxiv.org/abs/2005.04211 which are not included in the thesis.) We also establish the first ever (a) mathematical control on the behaviour of noisy gradient descent on a ReLU gate and (b) proofs of convergence of stochastic and deterministic versions of the widely used adaptive gradient deep-learning algorithms, RMSProp and ADAM. This study also includes a first-of-its-kind detailed empirical study of the hyper-parameter values and neural net architectures when these modern algorithms have a significant advantage over classical acceleration based methods. 3. Understanding The Risk Of (Stochastic) Neural Nets We push forward the emergent technology of PAC-Bayesian bounds for the risk of stochastic neural nets to get bounds which are not only empirically smaller than contemporary theories but also demonstrate smaller rates of growth w.r.t increase in width and depth of the net in experimental tests. These critically depend on our novel theorems proving noise resilience of nets. This work also includes an experimental investigation of the geometric properties of the path in weight space that is traced out by the net during the training. This leads us to uncover certain seemingly uniform and surprising geometric properties of this process which can potentially be leveraged into better bounds in future

    Interpretable Machine Learning for Electro-encephalography

    Get PDF
    While behavioral, genetic and psychological markers can provide important information about brain health, research in that area over the last decades has much focused on imaging devices such as magnetic resonance tomography (MRI) to provide non-invasive information about cognitive processes. Unfortunately, MRI based approaches, able to capture the slow changes in blood oxygenation levels, cannot capture electrical brain activity which plays out on a time scale up to three orders of magnitude faster. Electroencephalography (EEG), which has been available in clinical settings for over 60 years, is able to measure brain activity based on rapidly changing electrical potentials measured non-invasively on the scalp. Compared to MRI based research into neurodegeneration, EEG based research has, over the last decade, received much less interest from the machine learning community. But generally, EEG in combination with sophisticated machine learning offers great potential such that neglecting this source of information, compared to MRI or genetics, is not warranted. In collaborating with clinical experts, the ability to link any results provided by machine learning to the existing body of research is especially important as it ultimately provides an intuitive or interpretable understanding. Here, interpretable means the possibility for medical experts to translate the insights provided by a statistical model into a working hypothesis relating to brain function. To this end, we propose in our first contribution a method allowing for ultra-sparse regression which is applied on EEG data in order to identify a small subset of important diagnostic markers highlighting the main differences between healthy brains and brains affected by Parkinson's disease. Our second contribution builds on the idea that in Parkinson's disease impaired functioning of the thalamus causes changes in the complexity of the EEG waveforms. The thalamus is a small region in the center of the brain affected early in the course of the disease. Furthermore, it is believed that the thalamus functions as a pacemaker - akin to a conductor of an orchestra - such that changes in complexity are expressed and quantifiable based on EEG. We use these changes in complexity to show their association with future cognitive decline. In our third contribution we propose an extension of archetypal analysis embedded into a deep neural network. This generative version of archetypal analysis allows to learn an appropriate representation where every sample of a data set can be decomposed into a weighted sum of extreme representatives, the so-called archetypes. This opens up an interesting possibility of interpreting a data set relative to its most extreme representatives. In contrast, clustering algorithms describe a data set relative to its most average representatives. For Parkinson's disease, we show based on deep archetypal analysis, that healthy brains produce archetypes which are different from those produced by brains affected by neurodegeneration

    Alternating Optimization: Constrained Problems, Adversarial Networks, and Robust Models

    Get PDF
    Data-driven machine learning methods have achieved impressive performance for many industrial applications and academic tasks. Machine learning methods usually have two stages: training a model from large-scale samples, and inference on new samples after the model is deployed. The training of modern models relies on solving difficult optimization problems that involve nonconvex, nondifferentiable objective functions and constraints, which is sometimes slow and often requires expertise to tune hyperparameters. While inference is much faster than training, it is often not fast enough for real-time applications.We focus on machine learning problems that can be formulated as a minimax problem in training, and study alternating optimization methods served as fast, scalable, stable and automated solvers. First, we focus on the alternating direction method of multipliers (ADMM) for constrained problem in classical convex and nonconvex optimization. Some popular machine learning applications including sparse and low-rank models, regularized linear models, total variation image processing, semidefinite programming, and consensus distributed computing. We propose adaptive ADMM (AADMM), which is a fully automated solver achieving fast practical convergence by adapting the only free parameter in ADMM. We further automate several variants of ADMM (relaxed ADMM, multi-block ADMM and consensus ADMM), and prove convergence rate guarantees that are widely applicable to variants of ADMM with changing parameters. We release the fast implementation for more than ten applications and validate the efficiency with several benchmark datasets for each application. Second, we focus on the minimax problem of generative adversarial networks (GAN). We apply prediction steps to stabilize stochastic alternating methods for the training of GANs, and demonstrate advantages of GAN-based losses for image processing tasks. We also propose GAN-based knowledge distillation methods to train small neural networks for inference acceleration, and empirically study the trade-off between acceleration and accuracy.Third, we present preliminary results on adversarial training for robust models. We study fast algorithms for the attack and defense for universal perturbations, and then explore network architectures to boost robustness
    corecore