6,721 research outputs found
Direct -body code on low-power embedded ARM GPUs
This work arises on the environment of the ExaNeSt project aiming at design
and development of an exascale ready supercomputer with low energy consumption
profile but able to support the most demanding scientific and technical
applications. The ExaNeSt compute unit consists of densely-packed low-power
64-bit ARM processors, embedded within Xilinx FPGA SoCs. SoC boards are
heterogeneous architecture where computing power is supplied both by CPUs and
GPUs, and are emerging as a possible low-power and low-cost alternative to
clusters based on traditional CPUs. A state-of-the-art direct -body code
suitable for astrophysical simulations has been re-engineered in order to
exploit SoC heterogeneous platforms based on ARM CPUs and embedded GPUs.
Performance tests show that embedded GPUs can be effectively used to accelerate
real-life scientific calculations, and that are promising also because of their
energy efficiency, which is a crucial design in future exascale platforms.Comment: 16 pages, 7 figures, 1 table, accepted for publication in the
Computing Conference 2019 proceeding
A formally verified compiler back-end
This article describes the development and formal verification (proof of
semantic preservation) of a compiler back-end from Cminor (a simple imperative
intermediate language) to PowerPC assembly code, using the Coq proof assistant
both for programming the compiler and for proving its correctness. Such a
verified compiler is useful in the context of formal methods applied to the
certification of critical software: the verification of the compiler guarantees
that the safety properties proved on the source code hold for the executable
compiled code as well
A Domain-Specific Language and Editor for Parallel Particle Methods
Domain-specific languages (DSLs) are of increasing importance in scientific
high-performance computing to reduce development costs, raise the level of
abstraction and, thus, ease scientific programming. However, designing and
implementing DSLs is not an easy task, as it requires knowledge of the
application domain and experience in language engineering and compilers.
Consequently, many DSLs follow a weak approach using macros or text generators,
which lack many of the features that make a DSL a comfortable for programmers.
Some of these features---e.g., syntax highlighting, type inference, error
reporting, and code completion---are easily provided by language workbenches,
which combine language engineering techniques and tools in a common ecosystem.
In this paper, we present the Parallel Particle-Mesh Environment (PPME), a DSL
and development environment for numerical simulations based on particle methods
and hybrid particle-mesh methods. PPME uses the meta programming system (MPS),
a projectional language workbench. PPME is the successor of the Parallel
Particle-Mesh Language (PPML), a Fortran-based DSL that used conventional
implementation strategies. We analyze and compare both languages and
demonstrate how the programmer's experience can be improved using static
analyses and projectional editing. Furthermore, we present an explicit domain
model for particle abstractions and the first formal type system for particle
methods.Comment: Submitted to ACM Transactions on Mathematical Software on Dec. 25,
201
Relay: A New IR for Machine Learning Frameworks
Machine learning powers diverse services in industry including search,
translation, recommendation systems, and security. The scale and importance of
these models require that they be efficient, expressive, and portable across an
array of heterogeneous hardware devices. These constraints are often at odds;
in order to better accommodate them we propose a new high-level intermediate
representation (IR) called Relay. Relay is being designed as a
purely-functional, statically-typed language with the goal of balancing
efficient compilation, expressiveness, and portability. We discuss the goals of
Relay and highlight its important design constraints. Our prototype is part of
the open source NNVM compiler framework, which powers Amazon's deep learning
framework MxNet
Liveness-Driven Random Program Generation
Randomly generated programs are popular for testing compilers and program
analysis tools, with hundreds of bugs in real-world C compilers found by random
testing. However, existing random program generators may generate large amounts
of dead code (computations whose result is never used). This leaves relatively
little code to exercise a target compiler's more complex optimizations.
To address this shortcoming, we introduce liveness-driven random program
generation. In this approach the random program is constructed bottom-up,
guided by a simultaneous structural data-flow analysis to ensure that the
generator never generates dead code.
The algorithm is implemented as a plugin for the Frama-C framework. We
evaluate it in comparison to Csmith, the standard random C program generator.
Our tool generates programs that compile to more machine code with a more
complex instruction mix.Comment: Pre-proceedings paper presented at the 27th International Symposium
on Logic-Based Program Synthesis and Transformation (LOPSTR 2017), Namur,
Belgium, 10-12 October 2017 (arXiv:1708.07854
Certifying cost annotations in compilers
We discuss the problem of building a compiler which can lift in a provably
correct way pieces of information on the execution cost of the object code to
cost annotations on the source code. To this end, we need a clear and flexible
picture of: (i) the meaning of cost annotations, (ii) the method to prove them
sound and precise, and (iii) the way such proofs can be composed. We propose a
so-called labelling approach to these three questions. As a first step, we
examine its application to a toy compiler. This formal study suggests that the
labelling approach has good compositionality and scalability properties. In
order to provide further evidence for this claim, we report our successful
experience in implementing and testing the labelling approach on top of a
prototype compiler written in OCAML for (a large fragment of) the C language
BEEBS: Open Benchmarks for Energy Measurements on Embedded Platforms
This paper presents and justifies an open benchmark suite named BEEBS,
targeted at evaluating the energy consumption of embedded processors.
We explore the possible sources of energy consumption, then select individual
benchmarks from contemporary suites to cover these areas. Version one of BEEBS
is presented here and contains 10 benchmarks that cover a wide range of typical
embedded applications. The benchmark suite is portable across diverse
architectures and is freely available.
The benchmark suite is extensively evaluated, and the properties of its
constituent programs are analysed. Using real hardware platforms we show case
examples which illustrate the difference in power dissipation between three
processor architectures and their related ISAs. We observe significant
differences in the average instruction dissipation between the architectures of
4.4x, specifically 170uW/MHz (ARM Cortex-M0), 65uW/MHz (Adapteva Epiphany) and
88uW/MHz (XMOS XS1-L1)
- …