53 research outputs found
Discontinuous collocation and symmetric integration methods for distributionally-sourced hyperboloidal partial differential equations
This work outlines a time-domain numerical integration technique for linear
hyperbolic partial differential equations sourced by distributions (Dirac
δ-functions and their derivatives). Such problems arise when studying
binary black hole systems in the extreme mass ratio limit. We demonstrate that
such source terms may be converted to effective domain-wide sources when
discretized, and we introduce a class of time-steppers that directly account
for these discontinuities in time integration. Moreover, our time-steppers are
constructed to respect time reversal symmetry, a property that has been
connected to conservation of physical quantities like energy and momentum in
numerical simulations. To illustrate the utility of our method, we numerically
study a distributionally-sourced wave equation that shares many features with
the equations governing linear perturbations to black holes sourced by a point
mass.
Comment: 29 pages, 4 figures
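The time-reversal symmetry invoked above can be illustrated with a minimal sketch. The implicit midpoint rule is a classic symmetric one-step method (used here purely for illustration on a harmonic oscillator; it is not the paper's integrator): running it forward and then backward with a negated step returns the state to its initial value up to round-off, the structural property connected to conservation of physical quantities.

```python
# Illustrative sketch of a time-reversal-symmetric integrator (implicit
# midpoint rule) on a harmonic oscillator; NOT the paper's scheme.
import numpy as np

def midpoint_step(y, h, A):
    """One implicit midpoint step for the linear system y' = A @ y.

    For linear systems the implicit relation
        y_{n+1} = y_n + h * A @ (y_n + y_{n+1}) / 2
    is solved exactly with one linear solve.
    """
    I = np.eye(len(y))
    return np.linalg.solve(I - 0.5 * h * A, (I + 0.5 * h * A) @ y)

# Harmonic oscillator: q' = p, p' = -q.
A = np.array([[0.0, 1.0], [-1.0, 0.0]])
y0 = np.array([1.0, 0.0])
h, steps = 0.01, 1000

# Integrate forward, then backward: a time-symmetric method applied with
# step -h is the exact inverse of the step with +h, so the initial state
# is recovered to round-off.
y = y0.copy()
for _ in range(steps):
    y = midpoint_step(y, h, A)
for _ in range(steps):
    y = midpoint_step(y, -h, A)

print(np.allclose(y, y0, atol=1e-8))
```

The same symmetry argument is what makes such methods attractive for long-time evolutions, where non-symmetric schemes can exhibit secular drift in conserved quantities.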
Discontinuous collocation methods and gravitational self-force applications
Numerical simulations of extreme mass ratio inspirals, the most important
sources for the LISA detector, face several computational challenges. We
present a new approach to evolving partial differential equations occurring in
black hole perturbation theory and calculations of the self-force acting on
point particles orbiting supermassive black holes. Such equations are
distributionally sourced, and standard numerical methods, such as
finite-difference or spectral methods, face difficulties associated with
approximating discontinuous functions. However, in the self-force problem we
typically have access to full a priori information about the local structure of
the discontinuity at the particle. Using this information, we show that
high-order accuracy can be recovered by adding to the Lagrange interpolation
formula a linear combination of certain jump amplitudes. We construct
discontinuous spatial and temporal discretizations by operating on the
corrected Lagrange formula. In a method-of-lines framework, this provides a
simple and efficient method of solving time-dependent partial differential
equations, without loss of accuracy near moving singularities or
discontinuities. This method is well-suited for the problem of time-domain
reconstruction of the metric perturbation via the Teukolsky or
Regge-Wheeler-Zerilli formalisms. Parallel implementations on modern CPU and
GPU architectures are discussed.
Comment: 29 pages, 5 figures
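The jump-corrected interpolation idea can be sketched as follows. This is an illustrative reconstruction, not the authors' code: when the jump amplitudes [f^(k)] across the particle location x = a are known a priori, subtracting their Taylor polynomial from the nodes on one side of the discontinuity yields smooth data, and plain Lagrange interpolation then regains high-order accuracy.

```python
# Illustrative sketch of jump-corrected Lagrange interpolation for a
# function with a discontinuity of known jump amplitudes at x = a.
import math
import numpy as np

def jump_poly(x, a, jumps):
    """Taylor polynomial built from known jump amplitudes [f^(k)](a)."""
    return sum(J * (x - a) ** k / math.factorial(k) for k, J in enumerate(jumps))

def corrected_lagrange(nodes, values, a, jumps, x):
    """Evaluate the jump-corrected Lagrange interpolant at a scalar x.

    Nodes right of `a` have the jump polynomial subtracted, so the
    corrected data smoothly extends the left piece across x = a.
    """
    v = np.where(nodes >= a, values - jump_poly(nodes, a, jumps), values)
    # Plain Lagrange interpolation of the smoothed data.
    p = 0.0
    for j in range(len(nodes)):
        L = np.prod([(x - nodes[m]) / (nodes[j] - nodes[m])
                     for m in range(len(nodes)) if m != j])
        p += v[j] * L
    # Add the jump polynomial back for evaluation points right of a.
    return p + (jump_poly(x, a, jumps) if x >= a else 0.0)

# Piecewise-smooth test function: sin(x) plus a known jump at a = 0.3.
a = 0.3
J = [1.0, -2.0, 0.5]            # jumps in f, f', f'' across x = a
def f(x):
    return np.sin(x) + (jump_poly(x, a, J) if x >= a else 0.0)

nodes = np.linspace(0.0, 1.0, 7)
values = np.array([f(xj) for xj in nodes])
x = 0.31                        # just past the discontinuity
err = abs(corrected_lagrange(nodes, values, a, J, x) - f(x))
print(err)
```

With the correction, the error is that of interpolating the smooth piece alone; without it, a naive interpolant across the jump would be O(1) inaccurate near x = a.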
Neural Architecture Search as Program Transformation Exploration
Improving the performance of deep neural networks (DNNs) is important to both
the compiler and neural architecture search (NAS) communities. Compilers apply
program transformations in order to exploit hardware parallelism and memory
hierarchy. However, legality concerns mean they fail to exploit the natural
robustness of neural networks. In contrast, NAS techniques mutate networks by
operations such as the grouping or bottlenecking of convolutions, exploiting
the resilience of DNNs. In this work, we express such neural architecture
operations as program transformations whose legality depends on a notion of
representational capacity. This allows them to be combined with existing
transformations into a unified optimization framework. This unification allows
us to express existing NAS operations as combinations of simpler
transformations. Crucially, it allows us to generate and explore new tensor
convolutions. We prototyped the combined framework in TVM and were able to find
optimizations across different DNNs that significantly reduce inference time,
over 3x in the majority of cases. Furthermore, our scheme dramatically reduces
NAS search time. Code is available at
https://github.com/jack-willturner/nas-as-program-transformation-exploration
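A toy numpy sketch of the core idea: a NAS operation such as grouping can be written as a transformation on the operator itself. Shapes, a 1x1 convolution, and all names below are illustrative assumptions (the paper's implementation lives in TVM); the point is that grouping is block-diagonalization of the weight, which is only "legal" under a relaxed, capacity-based notion of equivalence, since it shrinks the function class rather than preserving semantics exactly.

```python
# Illustrative sketch (NOT the paper's TVM code): a grouped convolution
# expressed as a transformation on a dense 1x1 convolution weight.
import numpy as np

def conv1x1(x, W):
    """Pointwise convolution: x is (C_in, H, W), W is (C_out, C_in)."""
    return np.einsum('oc,chw->ohw', W, x)

def group_transform(W, g):
    """Rewrite a dense 1x1 conv weight as block-diagonal (grouped) form.

    Cross-group weights are dropped: parameters fall by a factor of g,
    which changes representational capacity, not just the schedule.
    """
    C_out, C_in = W.shape
    Wg = np.zeros_like(W)
    for i in range(g):
        ro = slice(i * C_out // g, (i + 1) * C_out // g)
        ri = slice(i * C_in // g, (i + 1) * C_in // g)
        Wg[ro, ri] = W[ro, ri]
    return Wg

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 4, 4))
W = rng.standard_normal((8, 8))
Wg = group_transform(W, g=4)

dense_params = np.count_nonzero(W)     # 64 weights in the dense operator
grouped_params = np.count_nonzero(Wg)  # 16 weights after grouping by 4
y = conv1x1(x, Wg)                     # same output shape, cheaper operator
print(dense_params, grouped_params, y.shape)
```

Because the output shape is unchanged, the transformed operator can be slotted into the same program and co-optimized with ordinary compiler transformations such as tiling or fusion.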
mlirSynth: Automatic, Retargetable Program Raising in Multi-Level IR using Program Synthesis
MLIR is an emerging compiler infrastructure for modern hardware, but existing
programs cannot take advantage of MLIR's high-performance compilation if they
are described in lower-level general purpose languages. Consequently, to avoid
programs needing to be rewritten manually, this has led to efforts to
automatically raise lower-level to higher-level dialects in MLIR. However,
current methods rely on manually-defined raising rules, which limit their
applicability and make them challenging to maintain as MLIR dialects evolve.
We present mlirSynth -- a novel approach which translates programs from
lower-level MLIR dialects to high-level ones without manually defined rules.
Instead, it uses available dialect definitions to construct a program space and
searches it effectively using type constraints and equivalences. We demonstrate
its effectiveness by raising C programs to two distinct high-level MLIR
dialects, which enables us to use existing high-level dialect specific
compilation flows. On Polybench, we show a greater coverage than previous
approaches, resulting in geomean speedups of 2.5x (Intel) and 3.4x (AMD) over
state-of-the-art compilation flows for the C programming language. mlirSynth
also enables retargetability to domain-specific accelerators, resulting in a
geomean speedup of 21.6x on a TPU.
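The two mechanisms the abstract names — type constraints that prune the candidate space, and equivalence checking against example behavior — can be sketched with a toy enumerative synthesizer. Everything below (the "dialect" of three matrix ops, the search strategy, the single input/output example) is an invented illustration, not mlirSynth's actual search.

```python
# Toy sketch of enumerative synthesis with type-constrained pruning and
# example-based equivalence checking; NOT mlirSynth's implementation.
import itertools
import numpy as np

# "Dialect definition": each op lists argument types, result type, and
# semantics, so ill-typed candidates are rejected before evaluation.
OPS = {
    'matmul':    (('mat', 'mat'), 'mat', lambda a, b: a @ b),
    'add':       (('mat', 'mat'), 'mat', lambda a, b: a + b),
    'transpose': (('mat',),       'mat', lambda a: a.T),
}

def synthesize(inputs, target, depth=2):
    """Enumerate typed expressions over `inputs`, up to `depth` rounds of
    op application, returning the first candidate whose value matches
    `target` on the example (a stand-in for equivalence checking)."""
    pool = [(name, 'mat', val) for name, val in inputs.items()]
    for _ in range(depth):
        new = []
        for op, (arg_tys, res_ty, fn) in OPS.items():
            for args in itertools.product(pool, repeat=len(arg_tys)):
                if not all(a[1] == t for a, t in zip(args, arg_tys)):
                    continue                      # type constraint: prune
                expr = f"{op}({', '.join(a[0] for a in args)})"
                try:
                    val = fn(*(a[2] for a in args))
                except ValueError:                # shape mismatch: prune
                    continue
                if val.shape == target.shape and np.allclose(val, target):
                    return expr                   # equivalent on the example
                new.append((expr, res_ty, val))
        pool += new
    return None

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 3))
B = rng.standard_normal((3, 3))
expr = synthesize({'A': A, 'B': B}, target=A @ B + B)
print(expr)
```

A real raiser must of course establish equivalence far more rigorously than matching one random example, and search a vastly larger, multi-dialect space; the sketch only shows why type information makes naive enumeration tractable at all.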
SLaDe: A Portable Small Language Model Decompiler for Optimized Assembly
Decompilation is a well-studied area with numerous high-quality tools
available. These are frequently used for security tasks and to port legacy
code. However, they regularly generate difficult-to-read programs and require a
large amount of engineering effort to support new programming languages and
ISAs. Recent interest in neural approaches has produced portable tools that
generate readable code. However, to-date such techniques are usually restricted
to synthetic programs without optimization, and no models have evaluated their
portability. Furthermore, while the code generated may be more readable, it is
usually incorrect. This paper presents SLaDe, a Small Language model Decompiler
based on a sequence-to-sequence transformer trained over real-world code. We
develop a novel tokenizer and exploit no-dropout training to produce
high-quality code. We utilize type-inference to generate programs that are more
readable and accurate than standard analytic and recent neural approaches.
Unlike standard approaches, SLaDe can infer out-of-context types and unlike
neural approaches, it generates correct code. We evaluate SLaDe on over 4,000
functions from ExeBench on two ISAs and at two optimization levels. SLaDe is
up to 6 times more accurate than Ghidra, a state-of-the-art,
industrial-strength decompiler, and up to 4 times more accurate than the large
language model ChatGPT, and it generates significantly more readable code than
both.
- …