287 research outputs found

    Towards the Explanation of Bayesian Neural Networks

    Get PDF

    Statistical computation with kernels

    Get PDF
    Modern statistical inference has seen a tremendous increase in the size and complexity of models and datasets. As such, it has become reliant on advanced com- putational tools for implementation. A first canonical problem in this area is the numerical approximation of integrals of complex and expensive functions. Numerical integration is required for a variety of tasks, including prediction, model comparison and model choice. A second canonical problem is that of statistical inference for models with intractable likelihoods. These include models with intractable normal- isation constants, or models which are so complex that their likelihood cannot be evaluated, but from which data can be generated. Examples include large graphical models, as well as many models in imaging or spatial statistics. This thesis proposes to tackle these two problems using tools from the kernel methods and Bayesian non-parametrics literature. First, we analyse a well-known algorithm for numerical integration called Bayesian quadrature, and provide consis- tency and contraction rates. The algorithm is then assessed on a variety of statistical inference problems, and extended in several directions in order to reduce its compu- tational requirements. We then demonstrate how the combination of reproducing kernels with Stein’s method can lead to computational tools which can be used with unnormalised densities, including numerical integration and approximation of probability measures. We conclude by studying two minimum distance estimators derived from kernel-based statistical divergences which can be used for unnormalised and generative models. In each instance, the tractability provided by reproducing kernels and their properties allows us to provide easily-implementable algorithms whose theoretical foundations can be studied in depth

    Neural function approximation on graphs: shape modelling, graph discrimination & compression

    Get PDF
    Graphs serve as a versatile mathematical abstraction of real-world phenomena in numerous scientific disciplines. This thesis is part of the Geometric Deep Learning subject area, a family of learning paradigms, that capitalise on the increasing volume of non-Euclidean data so as to solve real-world tasks in a data-driven manner. In particular, we focus on the topic of graph function approximation using neural networks, which lies at the heart of many relevant methods. In the first part of the thesis, we contribute to the understanding and design of Graph Neural Networks (GNNs). Initially, we investigate the problem of learning on signals supported on a fixed graph. We show that treating graph signals as general graph spaces is restrictive and conventional GNNs have limited expressivity. Instead, we expose a more enlightening perspective by drawing parallels between graph signals and signals on Euclidean grids, such as images and audio. Accordingly, we propose a permutation-sensitive GNN based on an operator analogous to shifts in grids and instantiate it on 3D meshes for shape modelling (Spiral Convolutions). Following, we focus on learning on general graph spaces and in particular on functions that are invariant to graph isomorphism. We identify a fundamental trade-off between invariance, expressivity and computational complexity, which we address with a symmetry-breaking mechanism based on substructure encodings (Graph Substructure Networks). Substructures are shown to be a powerful tool that provably improves expressivity while controlling computational complexity, and a useful inductive bias in network science and chemistry. In the second part of the thesis, we discuss the problem of graph compression, where we analyse the information-theoretic principles and the connections with graph generative models. We show that another inevitable trade-off surfaces, now between computational complexity and compression quality, due to graph isomorphism. We propose a substructure-based dictionary coder - Partition and Code (PnC) - with theoretical guarantees that can be adapted to different graph distributions by estimating its parameters from observations. Additionally, contrary to the majority of neural compressors, PnC is parameter and sample efficient and is therefore of wide practical relevance. Finally, within this framework, substructures are further illustrated as a decisive archetype for learning problems on graph spaces.Open Acces

    Advancing efficiency and robustness of neural networks for imaging

    Get PDF
    Enabling machines to see and analyze the world is a longstanding research objective. Advances in computer vision have the potential of influencing many aspects of our lives as they can enable machines to tackle a variety of tasks. Great progress in computer vision has been made, catalyzed by recent progress in machine learning and especially the breakthroughs achieved by deep artificial neural networks. Goal of this work is to alleviate limitations of deep neural networks that hinder their large-scale adoption for real-world applications. To this end, it investigates methodologies for constructing and training deep neural networks with low computational requirements. Moreover, it explores strategies for achieving robust performance on unseen data. Of particular interest is the application of segmenting volumetric medical scans because of the technical challenges it imposes, as well as its clinical importance. The developed methodologies are generic and of relevance to a broader computer vision and machine learning audience. More specifically, this work introduces an efficient 3D convolutional neural network architecture, which achieves high performance for segmentation of volumetric medical images, an application previously hindered by high computational requirements of 3D networks. It then investigates sensitivity of network performance on hyper-parameter configuration, which we interpret as overfitting the model configuration to the data available during development. It is shown that ensembling a set of models with diverse configurations mitigates this and improves generalization. The thesis then explores how to utilize unlabelled data for learning representations that generalize better. It investigates domain adaptation and introduces an architecture for adversarial networks tailored for adaptation of segmentation networks. Finally, a novel semi-supervised learning method is proposed that introduces a graph in the latent space of a neural network to capture relations between labelled and unlabelled samples. It then regularizes the embedding to form a compact cluster per class, which improves generalization.Open Acces

    Colliding Worlds: Modern Computational Methods for Scattering Amplitude Calculations and Responding to Crisis Situations

    Get PDF
    Precision theoretical predictions for high multiplicity scattering rely on the evaluation of increasingly complicated scattering amplitudes which come with an extremely high CPU cost. For state-of-the-art processes this can cause technical bottlenecks in the production of fully differential distributions. In this thesis we explore the possibility of using neural networks to approximate multi-jet scattering amplitudes and provide efficient inputs for Monte Carlo integration. We begin by focussing on QCD corrections to e+e−→≤5e^+e^- \to \leq 5 jets up to one-loop. We demonstrate reliable interpolation when a series of networks are trained on amplitudes that have been divided into sectors defined by their infrared singularity structure. Complete simulations for one-loop distributions show speed improvements of at least an order of magnitude over standard approaches. We extend our analysis to the case of loop-induced diphoton production through gluon fusion and develop a realistic simulation method that can be applied to hadron collider observables. Specifically, we present a detailed study for 2→32\to3 and 2→42\to4 scattering problems which are extremely relevant for future phenomenological studies and find excellent agreement with amplitudes generated using traditional methods. In order to provide a useable technology, we present an interface with the \sherpa~Monte Carlo event generator. The techniques underlying our machine learning methodology and Monte Carlo event generator simulations are widely applicable in other domains as well. In this thesis we will also discuss the use of machine learning to aid in rapid response to crises situations, and the parallels between multi-particle event generators and multi-agent simulations for modelling the spread of epidemics. In this latter case, we develop a new agent-based model with highly granular resolution and discuss its applications to modelling the spread of COVID-19 in England, and in refugee and internally displaced person settlements to aid data driven decision making
    • …
    corecore