107 research outputs found
A Survey of Satisfiability Modulo Theory
Satisfiability modulo theory (SMT) consists in testing the satisfiability of
first-order formulas over linear integer or real arithmetic, or other theories.
In this survey, we explain the combination of propositional satisfiability and
decision procedures for conjunctions known as DPLL(T), and the alternative
"natural domain" approaches. We also cover quantifiers, Craig interpolants,
polynomial arithmetic, and how SMT solvers are used in automated software
analysis.Comment: Computer Algebra in Scientific Computing, Sep 2016, Bucharest,
Romania. 201
MPFR: A Multiple-Precision Binary Floating-Point Library With Correct Rounding
This paper presents a multiple-precision binary floating-point library, written in the ISO C language, and based on the GNU MP library. Its particularity is to extend ideas from the IEEE-754 standard to arbitrary precision, by providing correct rounding and exceptions. We demonstrate how these strong semantics are achieved | with no signicant slowdown with respect to other tools | and discuss a few applications where such a library can be useful
Warping Cache Simulation of Polyhedral Programs
Techniques to evaluate a program’s cache performance fall
into two camps: 1. Traditional trace-based cache simulators
precisely account for sophisticated real-world cache models
and support arbitrary workloads, but their runtime is proportional to the number of memory accesses performed by
the program under analysis. 2. Relying on implicit workload
characterizations such as the polyhedral model, analytical approaches often achieve problem-size-independent runtimes,
but so far have been limited to idealized cache models.
We introduce a hybrid approach, warping cache simulation, that aims to achieve applicability to real-world cache
models and problem-size-independent runtimes. As prior
analytical approaches, we focus on programs in the polyhedral model, which allows to reason about the sequence
of memory accesses analytically. Combining this analytical
reasoning with information about the cache behavior obtained from explicit cache simulation allows us to soundly
fast-forward the simulation. By this process of warping, we
accelerate the simulation so that its cost is often independent
of the number of memory accesses
On the logical complexity of cyclic arithmetic
We study the logical complexity of proofs in cyclic arithmetic
(), as introduced in Simpson '17, in terms of quantifier
alternations of formulae occurring. Writing for (the logical
consequences of) cyclic proofs containing only formulae, our main
result is that and prove the same
theorems, for all . Furthermore, due to the 'uniformity' of our
method, we also show that and Peano Arithmetic ()
proofs of the same theorem differ only exponentially in size.
The inclusion is obtained by proof
theoretic techniques, relying on normal forms and structural manipulations of
proofs. It improves upon the natural result that is
contained in . The converse inclusion, , is obtained by calibrating the approach of Simpson '17 with
recent results on the reverse mathematics of B\"uchi's theorem in
Ko{\l}odziejczyk, Michalewski, Pradic & Skrzypczak '16 (KMPS'16), and
specialising to the case of cyclic proofs. These results improve upon the
bounds on proof complexity and logical complexity implicit in Simpson '17 and
also an alternative approach due to Berardi & Tatsuta '17.
The uniformity of our method also allows us to recover a metamathematical
account of fragments of ; in particular we show that, for , the consistency of is provable in but not
. As a result, we show that certain versions of McNaughton's
theorem (the determinisation of -word automata) are not provable in
, partially resolving an open problem from KMPS '16
Optimisations arithmétiques et synthèse de haut niveau
High-level synthesis (HLS) tools offer increased productivity regarding FPGA programming.However, due to their relatively young nature, they still lack many arithmetic optimizations.This thesis proposes safe arithmetic optimizations that should always be applied.These optimizations are simple operator specializations, following the C semantic.Other require to a lift the semantic embedded in high-level input program languages, which are inherited from software programming, for an improved accuracy/cost/performance ratio.To demonstrate this claim, the sum-of-product of floating-point numbers is used as a case study. The sum is performed on a fixed-point format, which is tailored to the application, according to the context in which the operator is instantiated.In some cases, there is not enough information about the input data to tailor the fixed-point accumulator.The fall-back strategy used in this thesis is to generate an accumulator covering the entire floating-point range.This thesis explores different strategies for implementing such a large accumulator, including new ones.The use of a 2's complement representation instead of a sign+magnitude is demonstrated to save resources and to reduce the accumulation loop delay.Based on a tapered precision scheme and an exact accumulator, the posit number systems claims to be a candidate to replace the IEEE floating-point format.A throughout analysis of posit operators is performed, using the same level of hardware optimization as state-of-the-art floating-point operators.Their cost remains much higher that their floating-point counterparts in terms of resource usage and performance. Finally, this thesis presents a compatibility layer for HLS tools that allows one code to be deployed on multiple tools.This library implements a strongly typed custom size integer type along side a set of optimized custom operators.À cause de la nature relativement jeune des outils de synthèse de haut-niveau (HLS), de nombreuses optimisations arithmétiques n'y sont pas encore implémentées. Cette thèse propose des optimisations arithmétiques se servant du contexte spécifique dans lequel les opérateurs sont instanciés.Certaines optimisations sont de simples spécialisations d'opérateurs, respectant la sémantique du C.D'autres nécéssitent de s'éloigner de cette sémantique pour améliorer le compromis précision/coût/performance.Cette proposition est démontré sur des sommes de produits de nombres flottants.La somme est réalisée dans un format en virgule-fixe défini par son contexte.Quand trop peu d’informations sont disponibles pour définir ce format en virgule-fixe, une stratégie est de générer un accumulateur couvrant l'intégralité du format flottant.Cette thèse explore plusieurs implémentations d'un tel accumulateur.L'utilisation d'une représentation en complément à deux permet de réduire le chemin critique de la boucle d'accumulation, ainsi que la quantité de ressources utilisées. Un format alternatif aux nombres flottants, appelé posit, propose d'utiliser un encodage à précision variable.De plus, ce format est augmenté par un accumulateur exact.Pour évaluer précisément le coût matériel de ce format, cette thèse présente des architectures d'opérateurs posits, implémentés avec le même degré d'optimisation que celui de l'état de l'art des opérateurs flottants.Une analyse détaillée montre que le coût des opérateurs posits est malgré tout bien plus élevé que celui de leurs équivalents flottants.Enfin, cette thèse présente une couche de compatibilité entre outils de HLS, permettant de viser plusieurs outils avec un seul code. Cette bibliothèque implémente un type d'entiers de taille variable, avec de plus une sémantique strictement typée, ainsi qu'un ensemble d'opérateurs ad-hoc optimisés
Pipelined Asynchronous High Level Synthesis for General Programs
High-level synthesis (HLS) translates algorithms from software programming language into hardware. We use the dataflow HLS methodology to translate programs into asynchronous circuits by implementing programs using asynchronous dataflow elements as hardware building blocks. We extend the prior work in dataflow synthesis in the following aspects:i) we propose Fluid to synthesize pipelined dataflow circuits for real-world programs with complex control flows, which are not supported in the previous work; ii) we propose PipeLink to permit pipelined access to shared resources in the dataflow circuit. Dataflow circuit results in distributed control and an implicitly pipelined implementation. However, resource sharing in the presence of pipelining is challenging in this context due to the absence of a global scheduler. Traditional solutions to this problem impose restrictions on pipelining to guarantee mutually exclusive access to the shared resource, but PipeLink removes such restrictions and can generate pipelined asynchronous dataflow circuits for shared function calls, pipelined memory accesses and function pointers; iii) we apply several dataflow optimizations to improve the quality of the synthesized dataflow circuits; iv) we implement our system (Fluid + PipeLink) on the LLVM compiler framework, which allows us to take advantage of the optimization efforts from the compiler community; v) we compare our system with a widely-used academic HLS tool and two commercial HLS tools. Compared to commercial (academic) HLS tools, our system results in 12X (20X) reduction in energy, 1.29X (1.64X) improvement in throughput, 1.27X (1.61X) improvement in latency at a cost of 2.4X (1.61X) increase in the area
Accuracy and performance of the lattice Boltzmann method with 64-bit, 32-bit, and customized 16-bit number formats
Fluid dynamics simulations with the lattice Boltzmann method (LBM) are very
memory-intensive. Alongside reduction in memory footprint, significant
performance benefits can be achieved by using FP32 (single) precision compared
to FP64 (double) precision, especially on GPUs. Here, we evaluate the
possibility to use even FP16 and Posit16 (half) precision for storing fluid
populations, while still carrying arithmetic operations in FP32. For this, we
first show that the commonly occurring number range in the LBM is a lot smaller
than the FP16 number range. Based on this observation, we develop novel 16-bit
formats - based on a modified IEEE-754 and on a modified Posit standard - that
are specifically tailored to the needs of the LBM. We then carry out an
in-depth characterization of LBM accuracy for six different test systems with
increasing complexity: Poiseuille flow, Taylor-Green vortices, Karman vortex
streets, lid-driven cavity, a microcapsule in shear flow (utilizing the
immersed-boundary method) and finally the impact of a raindrop (based on a
Volume-of-Fluid approach). We find that the difference in accuracy between FP64
and FP32 is negligible in almost all cases, and that for a large number of
cases even 16-bit is sufficient. Finally, we provide a detailed performance
analysis of all precision levels on a large number of hardware
microarchitectures and show that significant speedup is achieved with mixed
FP32/16-bit.Comment: 30 pages, 20 figures, 4 tables, 2 code listing
- …