12 research outputs found
Simple optimizing JIT compilation of higher-order dynamic programming languages
Efficiently implementing dynamic programming languages requires a significant development
effort. Over the years, compilers have become more complex. Today, they typically include
an interpretation phase, several compilation phases, several intermediate representations and
code analyses. These techniques make it possible to implement such languages efficiently,
but they are difficult to apply in contexts in which development resources are limited. We
propose a new approach and new techniques to build optimizing just-in-time compilers for
dynamic languages with relatively good performance and low development effort.
We present a simple just-in-time compilation approach to implement a language with
a single compilation phase, without the need to use code transformations to intermediate
representations. We explain how basic block versioning, an existing compilation technique,
can be extended, without significant development effort, to work interprocedurally with higher-order
programming languages, allowing interprocedural optimizations on these languages. We
also explain how basic block versioning allows removing operations used to implement dynamic
languages that degrade performance, such as type checks, and how compilers can use Tagging
and NaN-boxing to optimize the generated code with low development effort. We present our
experience of building a JIT compiler using these techniques for the Scheme programming
language to show that they indeed allow building compilers with less development effort
than other implementations and that they allow generating efficient code that competes with
current mature implementations of the Scheme language.
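The NaN-boxing representation the abstract mentions can be sketched as follows. This is a minimal, hypothetical 64-bit layout (the tag bits and payload width are assumptions, not the thesis's actual encoding): doubles are stored as their own bit pattern, while other values are packed into the payload of a quiet NaN, so a single machine word can hold any dynamic value and a cheap mask test recovers the type.

```python
import struct

# Hypothetical NaN-boxing layout: every value is one 64-bit word.
# Real doubles are stored verbatim; tagged values live inside a quiet-NaN
# bit pattern (exponent all ones), with a type tag above a 48-bit payload.
QNAN = 0x7FF8000000000000      # quiet-NaN prefix
TAG_INT = 0x0001000000000000   # assumed tag for small integers
PAYLOAD = 0x0000FFFFFFFFFFFF   # low 48 bits carry the boxed value

def box_double(x: float) -> int:
    # A double is its own representation.
    return struct.unpack("<Q", struct.pack("<d", x))[0]

def box_int(n: int) -> int:
    assert 0 <= n <= PAYLOAD   # only 48-bit payloads fit in this sketch
    return QNAN | TAG_INT | n

def is_double(bits: int) -> bool:
    # Any word that is not one of our tagged quiet-NaN encodings
    # is an ordinary double (including the canonical quiet NaN itself).
    return (bits & (QNAN | TAG_INT)) != (QNAN | TAG_INT)

def unbox_double(bits: int) -> float:
    return struct.unpack("<d", struct.pack("<Q", bits))[0]

def unbox_int(bits: int) -> int:
    return bits & PAYLOAD
```

The point relevant to the abstract is that type tests compile down to a mask-and-compare on the word itself, which is the kind of cheap dynamic check a simple JIT can emit directly.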
Design and optimisation of scientific programs in a categorical language
This thesis presents an investigation into the use of advanced computer languages for scientific computing, an examination of performance issues that arise from using such languages for such a task, and a step toward achieving portable performance from compilers by attacking these problems in a way that compensates for the complexity of and differences between modern computer architectures. The language employed is Aldor, a functional language from computer algebra, and the scientific computing area is a subset of the family of iterative linear equation solvers applied to sparse systems. The linear equation solvers that are considered have much common structure, and this is factored out and represented explicitly in the language as a framework, by means of categories and domains. The flexibility introduced by decomposing the algorithms and the objects they act on into separate modules has a strong performance impact due to its negative effect on temporal locality. This necessitates breaking the barriers between modules to perform cross-component optimisation. In this instance the task reduces to one of collective loop fusion and array contraction.
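The loop fusion and array contraction the abstract ends on can be illustrated with a small sketch (the operation and names are hypothetical, not taken from the thesis): a modular pipeline materialises a full temporary array between components, while the fused form runs one loop and contracts the temporary to a scalar, restoring temporal locality.

```python
# Modular (unfused) form: each component is a separate pass,
# so a full temporary array `t` is materialised between them.
def axpy_then_scale_unfused(a, x, y, c):
    t = [a * xi + yi for xi, yi in zip(x, y)]  # temporary array
    return [c * ti for ti in t]

# Fused form: the two loops are merged and the temporary array
# is contracted to a single scalar per iteration.
def axpy_then_scale_fused(a, x, y, c):
    out = []
    for xi, yi in zip(x, y):
        t = a * xi + yi        # array contracted to a scalar
        out.append(c * t)
    return out
```

Both forms compute the same result; the fused one touches each input element once and never allocates the intermediate array, which is exactly the cross-component optimisation that module boundaries would otherwise prevent.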
An architecture for interpreted dynamic object-oriented languages
e code refers to the representation of bytecodes as constant C arrays that are located in sharable text segments after compilation. Interoperability, application start-up and dynamic memory usage benefit from this representation. Indexed code threading addresses the performance problem with virtual instruction mapping (i.e. loading, decoding and invoking) by using a fast threaded instruction transfer. Unlike with standard code threading, virtual machine code remains compact and executable even with a non-threaded virtual machine emulator. A further performance boost is achieved with optimal virtual instruction ordering. This technique helps to cluster the native code implementing virtual instructions so that native instruction cache performance is increased. Finally, the efficiency problem involved with dynamic method lookup is alleviated with an inline caching scheme that is applicable with constant bytecode vectors. The scheme exploits type locality in a manner similar to polymorphic inline caches.
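The inline caching idea can be sketched as follows. This is a minimal monomorphic-cache model under assumed names (the thesis's actual scheme works on constant bytecode vectors, which this sketch does not model): each call site remembers the last receiver type and the method it resolved to, so a repeated call with the same type skips the lookup entirely.

```python
# Monomorphic inline cache sketch: one cache entry per call site.
# A hit reuses the cached method; a miss re-resolves and refills the cache.
class CallSiteCache:
    def __init__(self, name):
        self.name = name
        self.cached_type = None
        self.cached_method = None
        self.misses = 0            # for demonstration only

    def invoke(self, receiver, *args):
        t = type(receiver)
        if t is not self.cached_type:   # cache miss: full dynamic lookup
            self.misses += 1
            self.cached_type = t
            self.cached_method = getattr(t, self.name)
        return self.cached_method(receiver, *args)
```

Type locality is what makes this pay off: if a call site sees the same receiver type almost every time, nearly all invocations take the fast path.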
Analysing and Reducing Costs of Deep Learning Compiler Auto-tuning
Deep Learning (DL) is significantly impacting many industries, including automotive, retail and medicine, enabling autonomous driving, recommender systems and genomics modelling, amongst other applications. At the same time, demand for complex and fast DL models is continually growing. The most capable models tend to exhibit highest operational costs, primarily due to their large computational resource footprint and inefficient utilisation of computational resources employed by DL systems. In an attempt to tackle these problems, DL compilers and auto-tuners emerged, automating the traditionally manual task of DL model performance optimisation. While auto-tuning improves model inference speed, it is a costly process, which limits its wider adoption within DL deployment pipelines. The high operational costs associated with DL auto-tuning have multiple causes. During operation, DL auto-tuners explore large search spaces consisting of billions of tensor programs, to propose potential candidates that improve DL model inference latency. Subsequently, DL auto-tuners measure candidate performance in isolation on the target-device, which constitutes the majority of auto-tuning compute-time. Suboptimal candidate proposals, combined with their serial measurement in an isolated target-device lead to prolonged optimisation time and reduced resource availability, ultimately reducing cost-efficiency of the process. In this thesis, we investigate the reasons behind prolonged DL auto-tuning and quantify their impact on the optimisation costs, revealing directions for improved DL auto-tuner design. Based on these insights, we propose two complementary systems: Trimmer and DOPpler. Trimmer improves tensor program search efficacy by filtering out poorly performing candidates, and controls end-to-end auto-tuning using cost objectives, monitoring optimisation cost. 
Simultaneously, DOPpler breaks long-held assumptions about the serial candidate measurements by successfully parallelising them intra-device, with minimal penalty to optimisation quality. Through extensive experimental evaluation of both systems, we demonstrate that they significantly improve cost-efficiency of auto-tuning (up to 50.5%) across a plethora of tensor operators, DL models, auto-tuners and target-devices.
Flow-directed Inlining
A flow-directed inlining strategy uses information derived from control-flow analysis to specialize and inline procedures for functional and object-oriented languages. Since it uses control-flow analysis to identify candidate call sites, flow-directed inlining can inline procedures whose relationships to their call sites are not apparent. For instance, procedures defined in other modules, passed as arguments, returned as values, or extracted from data structures can all be inlined. Flow-directed inlining specializes procedures for particular call sites, and can selectively inline a particular procedure at some call sites but not at others. Finally, flow-directed inlining encourages modular implementations: control-flow analysis, inlining, and post-inlining optimizations are all orthogonal components. Results from a prototype implementation indicate that this strategy effectively reduces procedure call overhead and leads to significant reduction in execution time.
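A before/after sketch of the transformation the abstract describes, using hypothetical names: the call to `f` is indirect and syntactically opaque, but if control-flow analysis proves that the only procedure reaching `f` at this site is `square`, the call can be replaced by the procedure's body, specialised in place.

```python
def square(x):
    return x * x

# Before: `f` is a procedure passed as an argument, so the call site
# has no syntactic link to any particular procedure.
def sum_map_before(f, xs):
    return sum(f(x) for x in xs)       # indirect call

# After flow-directed inlining: the analysis showed only `square`
# flows to `f` here, so its body is inlined at the call site.
def sum_map_after_inlining(xs):
    return sum(x * x for x in xs)      # direct, specialised code
```

The sketch shows the result of the transformation, not the control-flow analysis itself; the analysis is what licenses replacing the indirect call with the inlined body.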