125 research outputs found

    Bayesian subset simulation

    Full text link
    We consider the problem of estimating a probability of failure α\alpha, defined as the volume of the excursion set of a function f:X⊆Rd→Rf:\mathbb{X} \subseteq \mathbb{R}^{d} \to \mathbb{R} above a given threshold, under a given probability measure on X\mathbb{X}. In this article, we combine the popular subset simulation algorithm (Au and Beck, Probab. Eng. Mech. 2001) and our sequential Bayesian approach for the estimation of a probability of failure (Bect, Ginsbourger, Li, Picheny and Vazquez, Stat. Comput. 2012). This makes it possible to estimate α\alpha when the number of evaluations of ff is very limited and α\alpha is very small. The resulting algorithm is called Bayesian subset simulation (BSS). A key idea, as in the subset simulation algorithm, is to estimate the probabilities of a sequence of excursion sets of ff above intermediate thresholds, using a sequential Monte Carlo (SMC) approach. A Gaussian process prior on ff is used to define the sequence of densities targeted by the SMC algorithm, and drive the selection of evaluation points of ff to estimate the intermediate probabilities. Adaptive procedures are proposed to determine the intermediate thresholds and the number of evaluations to be carried out at each stage of the algorithm. Numerical experiments illustrate that BSS achieves significant savings in the number of function evaluations with respect to other Monte Carlo approaches

    Array-OL Revisited, Multidimensional Intensive Signal Processing Specification

    Get PDF
    This paper presents the Array-OL specification language. It is a high-level visual language dedicated to multidimensional intensive signal processing applications. It allows to specify both the task parallelism and the data parallelism of these applications on focusing on their complex multidimensional data access patterns. This presentation includes several extensions and tools developed around Array-OL during the last few years and discusses the mapping of an Array-OL specification onto a distributed heterogeneous hardware architecture

    Some ways to reduce the space dimension in polyhedra computations

    Full text link

    Tools for efficient Deep Learning

    Get PDF
    In the era of Deep Learning (DL), there is a fast-growing demand for building and deploying Deep Neural Networks (DNNs) on various platforms. This thesis proposes five tools to address the challenges for designing DNNs that are efficient in time, in resources and in power consumption. We first present Aegis and SPGC to address the challenges in improving the memory efficiency of DL training and inference. Aegis makes mixed precision training (MPT) stabler by layer-wise gradient scaling. Empirical experiments show that Aegis can improve MPT accuracy by at most 4\%. SPGC focuses on structured pruning: replacing standard convolution with group convolution (GConv) to avoid irregular sparsity. SPGC formulates GConv pruning as a channel permutation problem and proposes a novel heuristic polynomial-time algorithm. Common DNNs pruned by SPGC have maximally 1\% higher accuracy than prior work. This thesis also addresses the challenges lying in the gap between DNN descriptions and executables by Polygeist for software and POLSCA for hardware. Many novel techniques, e.g. statement splitting and memory partitioning, are explored and used to expand polyhedral optimisation. Polygeist can speed up software execution in sequential and parallel by 2.53 and 9.47 times on Polybench/C. POLSCA achieves 1.5 times speedup over hardware designs directly generated from high-level synthesis on Polybench/C. Moreover, this thesis presents Deacon, a framework that generates FPGA-based DNN accelerators of streaming architectures with advanced pipelining techniques to address the challenges from heterogeneous convolution and residual connections. Deacon provides fine-grained pipelining, graph-level optimisation, and heuristic exploration by graph colouring. Compared with prior designs, Deacon shows resource/power consumption efficiency improvement of 1.2x/3.5x for MobileNets and 1.0x/2.8x for SqueezeNets. All these tools are open source, some of which have already gained public engagement. We believe they can make efficient deep learning applications easier to build and deploy.Open Acces

    Automated cache optimisations of stencil computations for partial differential equations

    Get PDF
    This thesis focuses on numerical methods that solve partial differential equations. Our focal point is the finite difference method, which solves partial differential equations by approximating derivatives with explicit finite differences. These partial differential equation solvers consist of stencil computations on structured grids. Stencils for computing real-world practical applications are patterns often characterised by many memory accesses and non-trivial arithmetic expressions that lead to high computational costs compared to simple stencils used in much prior proof-of-concept work. In addition, the loop nests to express stencils on structured grids may often be complicated. This work is highly motivated by a specific domain of stencil computations where one of the challenges is non-aligned to the structured grid ("off-the-grid") operations. These operations update neighbouring grid points through scatter and gather operations via non-affine memory accesses, such as {A[B[i]]}. In addition to this challenge, these practical stencils often include many computation fields (need to store multiple grid copies), complex data dependencies and imperfect loop nests. In this work, we aim to increase the performance of stencil kernel execution. We study automated cache-memory-dependent optimisations for stencil computations. This work consists of two core parts with their respective contributions.The first part of our work tries to reduce the data movement in stencil computations of practical interest. Data movement is a dominant factor affecting the performance of high-performance computing applications. It has long been a target of optimisations due to its impact on execution time and energy consumption. This thesis tries to relieve this cost by applying temporal blocking optimisations, also known as time-tiling, to stencil computations. Temporal blocking is a well-known technique to enhance data reuse in stencil computations. However, it is rarely used in practical applications but rather in theoretical examples to prove its efficacy. Applying temporal blocking to scientific simulations is more complex. More specifically, in this work, we focus on the application context of seismic and medical imaging. In this area, we often encounter scatter and gather operations due to signal sources and receivers at arbitrary locations in the computational domain. These operations make the application of temporal blocking challenging. We present an approach to overcome this challenge and successfully apply temporal blocking.In the second part of our work, we extend the first part as an automated approach targeting a wide range of simulations modelled with partial differential equations. Since temporal blocking is error-prone, tedious to apply by hand and highly complex to assimilate theoretically and practically, we are motivated to automate its application and automatically generate code that benefits from it. We discuss algorithmic approaches and present a generalised compiler pipeline to automate the application of temporal blocking. These passes are written in the Devito compiler. They are used to accelerate the computation of stencil kernels in areas such as seismic and medical imaging, computational fluid dynamics and machine learning. \href{www.devitoproject.org}{Devito} is a Python package to implement optimised stencil computation (e.g., finite differences, image processing, machine learning) from high-level symbolic problem definitions. Devito builds on \href{www.sympy.org}{SymPy} and employs automated code generation and just-in-time compilation to execute optimised computational kernels on several computer platforms, including CPUs, GPUs, and clusters thereof. We show how we automate temporal blocking code generation without user intervention and often achieve better time-to-solution. We enable domain-specific optimisation through compiler passes and offer temporal blocking gains from a high-level symbolic abstraction. These automated optimisations benefit various computational kernels for solving real-world application problems.Open Acces

    Custom optimization algorithms for efficient hardware implementation

    No full text
    The focus is on real-time optimal decision making with application in advanced control systems. These computationally intensive schemes, which involve the repeated solution of (convex) optimization problems within a sampling interval, require more efficient computational methods than currently available for extending their application to highly dynamical systems and setups with resource-constrained embedded computing platforms. A range of techniques are proposed to exploit synergies between digital hardware, numerical analysis and algorithm design. These techniques build on top of parameterisable hardware code generation tools that generate VHDL code describing custom computing architectures for interior-point methods and a range of first-order constrained optimization methods. Since memory limitations are often important in embedded implementations we develop a custom storage scheme for KKT matrices arising in interior-point methods for control, which reduces memory requirements significantly and prevents I/O bandwidth limitations from affecting the performance in our implementations. To take advantage of the trend towards parallel computing architectures and to exploit the special characteristics of our custom architectures we propose several high-level parallel optimal control schemes that can reduce computation time. A novel optimization formulation was devised for reducing the computational effort in solving certain problems independent of the computing platform used. In order to be able to solve optimization problems in fixed-point arithmetic, which is significantly more resource-efficient than floating-point, tailored linear algebra algorithms were developed for solving the linear systems that form the computational bottleneck in many optimization methods. These methods come with guarantees for reliable operation. We also provide finite-precision error analysis for fixed-point implementations of first-order methods that can be used to minimize the use of resources while meeting accuracy specifications. The suggested techniques are demonstrated on several practical examples, including a hardware-in-the-loop setup for optimization-based control of a large airliner.Open Acces

    Optimal uncertainty quantification of a risk measurement from a computer code

    Get PDF
    La quantification des incertitudes lors d'une étude de sûreté peut être réalisée en modélisant les paramètres d'entrée du système physique par des variables aléatoires. Afin de propager les incertitudes affectant les entrées, un modèle de simulation numérique reproduisant la physique du système est exécuté avec différentes combinaisons des paramètres d'entrée, générées suivant leur loi de probabilité jointe. Il est alors possible d'étudier la variabilité de la sortie du code, ou d'estimer certaines quantités d'intérêt spécifiques. Le code étant considéré comme une boîte noire déterministe, la quantité d'intérêt dépend uniquement du choix de la loi de probabilité des entrées. Toutefois, cette distribution de probabilité est elle-même incertaine. En général, elle est choisie grâce aux avis d'experts, qui sont subjectifs et parfois contradictoires, mais aussi grâce à des données expérimentales souvent en nombre insuffisant et entachées d'erreurs. Cette variabilité dans le choix de la distribution se propage jusqu'à la quantité d'intérêt. Cette thèse traite de la prise en compte de cette incertitude dite de deuxième niveau. L'approche proposée, connue sous le nom d'Optimal Uncertainty Quantification (OUQ) consiste à évaluer des bornes sur la quantité d'intérêt. De ce fait on ne considère plus une distribution fixée, mais un ensemble de mesures de probabilité sous contraintes de moments sur lequel la quantité d'intérêt est optimisée. Après avoir exposé des résultats théoriques visant à réduire l'optimisation de la quantité d'intérêt aux point extrémaux de l'espace de mesures de probabilité, nous présentons différentes quantités d'intérêt vérifiant les hypothèses du problème. Cette thèse illustre l'ensemble de la méthodologie sur plusieurs cas d'applications, l'un d'eux étant un cas réel étudiant l'évolution de la température de gaine du combustible nucléaire en cas de perte du réfrigérant.Uncertainty quantification in a safety analysis study can be conducted by considering the uncertain inputs of a physical system as a vector of random variables. The most widespread approach consists in running a computer model reproducing the physical phenomenon with different combinations of inputs in accordance with their probability distribution. Then, one can study the related uncertainty on the output or estimate a specific quantity of interest (QoI). Because the computer model is assumed to be a deterministic black-box function, the QoI only depends on the choice of the input probability measure. It is formally represented as a scalar function defined on a measure space. We propose to gain robustness on the quantification of this QoI. Indeed, the probability distributions characterizing the uncertain input may themselves be uncertain. For instance, contradictory expert opinion may make it difficult to select a single probability distribution, and the lack of information in the input variables affects inevitably the choice of the distribution. As the uncertainty on the input distributions propagates to the QoI, an important consequence is that different choices of input distributions will lead to different values of the QoI. The purpose of this thesis is to account for this second level uncertainty. We propose to evaluate the maximum of the QoI over a space of probability measures, in an approach known as optimal uncertainty quantification (OUQ). Therefore, we do not specify a single precise input distribution, but rather a set of admissible probability measures defined through moment constraints. The QoI is then optimized over this measure space. After exposing theoretical results showing that the optimization domain of the QoI can be reduced to the extreme points of the measure space, we present several interesting quantities of interest satisfying the assumption of the problem. This thesis illustrates the methodology in several application cases, one of them being a real nuclear engineering case that study the evolution of the peak cladding temperature of fuel rods in case of an intermediate break loss of coolant accident

    Q(sqrt(-3))-Integral Points on a Mordell Curve

    Get PDF
    We use an extension of quadratic Chabauty to number fields,recently developed by the author with Balakrishnan, Besser and M ̈uller,combined with a sieving technique, to determine the integral points overQ(√−3) on the Mordell curve y2 = x3 − 4
    • …
    corecore