53 research outputs found

    Discontinuous collocation and symmetric integration methods for distributionally-sourced hyperboloidal partial differential equations

    This work outlines a time-domain numerical integration technique for linear hyperbolic partial differential equations sourced by distributions (Dirac δ-functions and their derivatives). Such problems arise when studying binary black hole systems in the extreme mass ratio limit. We demonstrate that such source terms may be converted to effective domain-wide sources when discretized, and we introduce a class of time-steppers that directly account for these discontinuities in time integration. Moreover, our time-steppers are constructed to respect time reversal symmetry, a property that has been connected to conservation of physical quantities like energy and momentum in numerical simulations. To illustrate the utility of our method, we numerically study a distributionally-sourced wave equation that shares many features with the equations governing linear perturbations to black holes sourced by a point mass. Comment: 29 pages, 4 figures
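
    To make the two ingredients above concrete, here is a minimal sketch (Python; the grid parameters and the nearest-node treatment of the δ-function are illustrative assumptions, not the paper's discontinuous-collocation scheme): a 1D wave equation with a point source, where the Dirac δ becomes an effective domain-wide grid source of weight 1/Δx and the update is the time-reversal-symmetric leapfrog rule.

```python
import numpy as np

# Illustrative sketch only: u_tt = u_xx + delta(x - xp) * sin(omega t),
# with the Dirac delta replaced by an effective grid source of weight
# 1/dx at the nearest node, and a leapfrog update that is symmetric
# under time reversal (swapping u_prev and u_next).
nx, L = 401, 1.0
dx = L / (nx - 1)
dt = 0.5 * dx                        # CFL-stable time step (assumed)
xp, omega = 0.43, 10.0               # hypothetical source location/frequency
ip = int(round(xp / dx))             # nearest grid node to the source

u_prev = np.zeros(nx)                # u at t - dt
u = np.zeros(nx)                     # u at t

for n in range(2000):
    t = n * dt
    lap = np.zeros(nx)
    lap[1:-1] = (u[2:] - 2.0 * u[1:-1] + u[:-2]) / dx**2
    src = np.zeros(nx)
    src[ip] = np.sin(omega * t) / dx   # discrete delta: weight 1/dx
    u_next = 2.0 * u - u_prev + dt**2 * (lap + src)
    u_next[0] = u_next[-1] = 0.0       # Dirichlet boundaries
    u_prev, u = u, u_next
```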

    Discontinuous collocation methods and gravitational self-force applications

    Numerical simulations of extreme mass ratio inspirals, the most important sources for the LISA detector, face several computational challenges. We present a new approach to evolving partial differential equations occurring in black hole perturbation theory and calculations of the self-force acting on point particles orbiting supermassive black holes. Such equations are distributionally sourced, and standard numerical methods, such as finite-difference or spectral methods, face difficulties associated with approximating discontinuous functions. However, in the self-force problem we typically have access to full a priori information about the local structure of the discontinuity at the particle. Using this information, we show that high-order accuracy can be recovered by adding to the Lagrange interpolation formula a linear combination of certain jump amplitudes. We construct discontinuous spatial and temporal discretizations by operating on the corrected Lagrange formula. In a method-of-lines framework, this provides a simple and efficient method of solving time-dependent partial differential equations, without loss of accuracy near moving singularities or discontinuities. This method is well-suited for the problem of time-domain reconstruction of the metric perturbation via the Teukolsky or Regge-Wheeler-Zerilli formalisms. Parallel implementations on modern CPU and GPU architectures are discussed. Comment: 29 pages, 5 figures
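
    The jump-corrected interpolation can be sketched as follows (illustrative Python; the function names are hypothetical and the paper's construction is more general): samples on the far side of a discontinuity ξ are shifted by a Taylor series in the known jump amplitudes J_m = [d^m f/dx^m](ξ), after which ordinary Lagrange interpolation regains its full order of accuracy.

```python
import numpy as np
from math import factorial

def lagrange_weights(nodes, x):
    """Lagrange cardinal functions l_i(x) evaluated at x."""
    w = np.ones(len(nodes))
    for i, xi in enumerate(nodes):
        for j, xj in enumerate(nodes):
            if i != j:
                w[i] *= (x - xj) / (xi - xj)
    return w

def corrected_interp(nodes, values, jumps, xi, x):
    """Interpolate f at x when f has known jumps J_m at xi.

    Nodes on the opposite side of xi from x are mapped onto the
    smooth extension of f from x's side using the jump amplitudes.
    """
    vals = np.array(values, dtype=float)
    for i, xn in enumerate(nodes):
        corr = sum(J * (xn - xi) ** m / factorial(m)
                   for m, J in enumerate(jumps))
        if x < xi < xn:
            vals[i] -= corr        # right-side sample -> left extension
        elif xn < xi < x:
            vals[i] += corr        # left-side sample -> right extension
    return lagrange_weights(nodes, x) @ vals

# Hypothetical test: sin(x) with an added jump of [f] = [f'] = 1 at xi.
xi = 0.3
nodes = np.linspace(-0.5, 1.0, 7)
f = lambda x: np.sin(x) + (1.0 + (x - xi)) * (x > xi)
jumps = [1.0, 1.0, 0.0, 0.0]       # [f], [f'], [f''], [f''']
print(corrected_interp(nodes, f(nodes), jumps, xi, 0.25))  # ~ sin(0.25)
```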

    Neural Architecture Search as Program Transformation Exploration

    Improving the performance of deep neural networks (DNNs) is important to both the compiler and neural architecture search (NAS) communities. Compilers apply program transformations in order to exploit hardware parallelism and memory hierarchy. However, legality concerns mean they fail to exploit the natural robustness of neural networks. In contrast, NAS techniques mutate networks by operations such as the grouping or bottlenecking of convolutions, exploiting the resilience of DNNs. In this work, we express such neural architecture operations as program transformations whose legality depends on a notion of representational capacity. This allows them to be combined with existing transformations into a unified optimization framework. This unification allows us to express existing NAS operations as combinations of simpler transformations. Crucially, it allows us to generate and explore new tensor convolutions. We prototyped the combined framework in TVM and were able to find optimizations across different DNNs that significantly reduce inference time: over 3× in the majority of cases. Furthermore, our scheme dramatically reduces NAS search time. Code is available at https://github.com/jack-willturner/nas-as-program-transformation-exploration
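
    As a toy illustration of treating NAS operations as program transformations (Python; the Conv2d description, the legality check, and the MAC-count model are invented for this sketch and are not the paper's TVM implementation), grouping and bottlenecking become rewrites on an op description whose cost model shows why they cut inference time:

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Conv2d:
    c_in: int       # input channels
    c_out: int      # output channels
    k: int          # kernel size
    groups: int = 1

def macs(op: Conv2d, h: int, w: int) -> int:
    """Multiply-accumulates for one forward pass on an h x w map."""
    return h * w * op.k * op.k * (op.c_in // op.groups) * op.c_out

def group(op: Conv2d, g: int) -> Conv2d:
    """NAS 'grouping' as a transformation; this toy divisibility check
    stands in for the paper's representational-capacity condition."""
    assert op.c_in % g == 0 and op.c_out % g == 0
    return replace(op, groups=g)

def bottleneck(op: Conv2d, r: int) -> list:
    """NAS 'bottlenecking': factor a k x k conv through r-times-fewer
    channels via 1x1 projections."""
    mid = op.c_out // r
    return [Conv2d(op.c_in, mid, 1),
            Conv2d(mid, mid, op.k),
            Conv2d(mid, op.c_out, 1)]

conv = Conv2d(c_in=256, c_out=256, k=3)
print(macs(conv, 56, 56))                             # baseline
print(macs(group(conv, 4), 56, 56))                   # 4x fewer MACs
print(sum(macs(o, 56, 56) for o in bottleneck(conv, 4)))
```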

    mlirSynth: Automatic, Retargetable Program Raising in Multi-Level IR using Program Synthesis

    MLIR is an emerging compiler infrastructure for modern hardware, but existing programs cannot take advantage of MLIR's high-performance compilation if they are described in lower-level general-purpose languages. Consequently, to avoid manual rewriting, there have been efforts to automatically raise programs from lower-level to higher-level dialects in MLIR. However, current methods rely on manually-defined raising rules, which limit their applicability and make them challenging to maintain as MLIR dialects evolve. We present mlirSynth, a novel approach which translates programs from lower-level MLIR dialects to high-level ones without manually defined rules. Instead, it uses available dialect definitions to construct a program space and searches it effectively using type constraints and equivalences. We demonstrate its effectiveness by raising C programs to two distinct high-level MLIR dialects, which enables us to use existing high-level dialect-specific compilation flows. On Polybench, we show greater coverage than previous approaches, resulting in geomean speedups of 2.5x (Intel) and 3.4x (AMD) over state-of-the-art compilation flows. mlirSynth also enables retargetability to domain-specific accelerators, resulting in a geomean speedup of 21.6x on a TPU.
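
    The search idea can be sketched with a toy bottom-up synthesizer (Python; the miniature typed "dialect", the dot-product reference, and all names are invented for illustration and are vastly simpler than MLIR): candidates are enumerated from op definitions, pruned by type constraints, and accepted on behavioural equivalence with the low-level program over test inputs.

```python
import itertools
import numpy as np

# "Low-level" reference we want to raise: a dot product as a loop.
def reference(a, b):
    acc = 0.0
    for i in range(len(a)):
        acc += a[i] * b[i]
    return acc

# Toy "high-level dialect": op -> (argument types, result type, semantics).
OPS = {
    "add": (("vec", "vec"), "vec",  lambda x, y: x + y),
    "mul": (("vec", "vec"), "vec",  lambda x, y: x * y),
    "sum": (("vec",),       "scal", lambda x: float(np.sum(x))),
    "dot": (("vec", "vec"), "scal", lambda x, y: float(np.dot(x, y))),
}

tests = [(np.random.rand(8), np.random.rand(8)) for _ in range(4)]
want = [reference(a, b) for a, b in tests]

def run(expr, a, b):
    if isinstance(expr, str):                  # a leaf input
        return {"a": a, "b": b}[expr]
    op, *args = expr
    return OPS[op][2](*(run(e, a, b) for e in args))

def typ(expr):
    if isinstance(expr, str):
        return "vec"
    op, *args = expr
    arg_t, ret_t, _ = OPS[op]
    return ret_t if tuple(typ(e) for e in args) == arg_t else None

def grow(pool):
    """One round of bottom-up enumeration, pruned by type checking."""
    out = list(pool)
    for op, (arg_t, _, _) in OPS.items():
        for args in itertools.product(pool, repeat=len(arg_t)):
            cand = (op, *args)
            if typ(cand) is not None:
                out.append(cand)
    return out

for cand in grow(grow(["a", "b"])):
    if typ(cand) == "scal" and all(
            np.isclose(run(cand, a, b), w) for (a, b), w in zip(tests, want)):
        print("raised to:", cand)              # e.g. ('dot', 'a', 'b')
        break
```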

    SLaDe: A Portable Small Language Model Decompiler for Optimized Assembly

    Decompilation is a well-studied area with numerous high-quality tools available. These are frequently used for security tasks and to port legacy code. However, they regularly generate difficult-to-read programs and require a large amount of engineering effort to support new programming languages and ISAs. Recent interest in neural approaches has produced portable tools that generate readable code. However, to date such techniques are usually restricted to synthetic programs without optimization, and their portability has not been evaluated. Furthermore, while the code generated may be more readable, it is usually incorrect. This paper presents SLaDe, a Small Language model Decompiler based on a sequence-to-sequence transformer trained over real-world code. We develop a novel tokenizer and exploit no-dropout training to produce high-quality code. We utilize type inference to generate programs that are more readable and accurate than standard analytic and recent neural approaches. Unlike standard approaches, SLaDe can infer out-of-context types, and unlike neural approaches, it generates correct code. We evaluate SLaDe on over 4,000 functions from ExeBench on two ISAs and at two optimization levels. SLaDe is up to 6 times more accurate than Ghidra, a state-of-the-art, industrial-strength decompiler, and up to 4 times more accurate than the large language model ChatGPT, and it generates significantly more readable code than both.
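
    The correctness criterion can be illustrated with a tiny I/O-equivalence harness (Python; the example functions and driver are hypothetical, this is not SLaDe's evaluation code, and it assumes gcc is on PATH): the original and the decompiled C are compiled side by side and compared on random inputs, in the spirit of the ExeBench-based evaluation above.

```python
import os, random, subprocess, tempfile

ORIGINAL   = "int f(int a, int b) { return a * a + b; }"   # hypothetical pair
DECOMPILED = "int f(int a, int b) { int t = a * a; return t + b; }"

DRIVER = r"""
#include <stdio.h>
int f(int, int);
int main(int argc, char **argv) {
    int a, b;
    sscanf(argv[1], "%d", &a);
    sscanf(argv[2], "%d", &b);
    printf("%d\n", f(a, b));
    return 0;
}
"""

def build(src, tag):
    """Compile a function plus the test driver into an executable."""
    d = tempfile.mkdtemp()
    path = os.path.join(d, tag + ".c")
    with open(path, "w") as fh:
        fh.write(src + "\n" + DRIVER)
    exe = os.path.join(d, tag)
    subprocess.run(["gcc", "-O2", path, "-o", exe], check=True)
    return exe

ref, cand = build(ORIGINAL, "ref"), build(DECOMPILED, "cand")
inputs = [(random.randint(-100, 100), random.randint(-100, 100))
          for _ in range(100)]
ok = all(
    subprocess.run([ref, str(a), str(b)], capture_output=True).stdout ==
    subprocess.run([cand, str(a), str(b)], capture_output=True).stdout
    for a, b in inputs)
print("I/O equivalent on sampled inputs:", ok)
```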