
    Guided rewriting and constraint satisfaction for parallel GPU code generation

    Graphics Processing Units (GPUs) are notoriously hard to optimise for manually due to their scheduling and memory hierarchies. What is needed are good automatic code generators and optimisers for such parallel hardware. Functional approaches such as Accelerate, Futhark and LIFT leverage a high-level algorithmic Intermediate Representation (IR) to expose parallelism and abstract the implementation details away from the user. However, producing efficient code for a given accelerator remains challenging. Existing code generators either depend on the user to choose from a subset of hard-coded optimisations or explore the implementation search space automatically. The former suffers from a lack of extensibility, while the latter is too costly due to the size of the search space. A hybrid approach is needed, where a space of valid implementations is built automatically and explored with the aid of human expertise. This thesis presents a solution combining user-guided rewriting and automatically generated constraints to produce high-performance code. The first contribution is an automatic tuning technique that balances performance against memory consumption. Leveraging its functional patterns, the LIFT compiler is empowered to infer tuning constraints and limit the search to valid tuning combinations only. Next, the thesis reframes parallelisation as a constraint satisfaction problem. Parallelisation constraints are extracted automatically from the input expression, and a solver is used to identify valid rewritings. The constraints truncate the search space to valid parallel mappings only by capturing the scheduling restrictions of the GPU in the context of a given program. A synchronisation barrier insertion technique is proposed to prevent data races and improve the efficiency of the generated parallel mappings. The final contribution of this thesis is the guided rewriting method, where the user encodes a design space of structural transformations using high-level IR nodes called rewrite points. These strongly typed pragmas express macro rewrites and expose design choices as explorable parameters. The thesis proposes a small set of reusable rewrite points to achieve tiling, cache locality, data reuse and memory optimisation. A comparison with the vendor-provided handwritten kernels of the ARM Compute Library and the TVM code generator demonstrates the effectiveness of this thesis' contributions. With convolution as a use case, LIFT-generated direct and GEMM-based convolution implementations are shown to perform on par with state-of-the-art solutions on a mobile GPU. Overall, this thesis demonstrates that a functional IR lends itself well to user-guided and automatic rewriting for high-performance code generation.
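
    To make the constraint-satisfaction framing concrete, here is a minimal Python sketch, not LIFT's implementation: the scheduling levels and nesting rules below are simplifying assumptions standing in for real GPU restrictions.

```python
# A minimal sketch of parallelisation as constraint satisfaction, assuming
# three illustrative scheduling levels and invented nesting rules; this is
# not LIFT's implementation. Each nested map is assigned a level, and
# assignments that violate the rules are pruned, leaving valid mappings.
from itertools import product

LEVELS = ["workgroup", "local_thread", "sequential"]  # coarse to fine

def valid(assignment):
    """Reject assignments that break the (assumed) GPU nesting rules."""
    seen_thread = False
    for level in assignment:
        if level == "workgroup" and seen_thread:
            return False  # a workgroup map may not be nested under a thread map
        if level == "local_thread":
            seen_thread = True
    return any(l != "sequential" for l in assignment)  # must exploit some parallelism

def parallel_mappings(depth):
    """Enumerate only the valid level assignments for `depth` nested maps."""
    return [a for a in product(LEVELS, repeat=depth) if valid(a)]

print(parallel_mappings(2))  # e.g. ('workgroup', 'local_thread') survives
```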

    Special Delivery: Programming with Mailbox Types (Extended Version)

    The asynchronous and unidirectional communication model supported by mailboxes is a key reason for the success of actor languages like Erlang and Elixir for implementing reliable and scalable distributed systems. While many actors may send messages to an actor's mailbox, only that actor may (selectively) receive from it. Although actors eliminate many of the issues stemming from shared-memory concurrency, they remain vulnerable to communication errors such as protocol violations and deadlocks. Mailbox types are a novel behavioural type system for mailboxes, first introduced for a process calculus by de'Liguoro and Padovani in 2018, which captures the contents of a mailbox as a commutative regular expression. Due to aliasing and nested evaluation contexts, moving from a process calculus to a programming language is challenging. This paper presents Pat, the first programming language design incorporating mailbox types, and describes an algorithmic type system. We make essential use of quasi-linear typing to tame some of the complexity introduced by aliasing. Our algorithmic type system is necessarily co-contextual, achieved through a novel use of backwards bidirectional typing, and we prove it sound and complete with respect to our declarative type system. We implement a prototype type checker and use it to demonstrate the expressiveness of Pat on a factory automation case study and a series of examples from the Savina actor benchmark suite.
    Comment: Extended version of a paper accepted to ICFP'2
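
    As a rough intuition only, the Python sketch below caricatures a mailbox type as a commutative multiplicity pattern and checks a mailbox against it at runtime; Pat's contribution is a static type system, and the tag-to-multiplicity encoding is our assumption.

```python
# A runtime-check caricature of mailbox types, not Pat's static system: a
# commutative pattern is encoded (our assumption) as tag -> (min, max)
# multiplicities, e.g. "exactly one Reply and any number of Log messages".
from collections import Counter

def conforms(mailbox, pattern):
    """Check a mailbox's contents against a commutative multiplicity pattern."""
    counts = Counter(tag for tag, _payload in mailbox)
    if any(tag not in pattern for tag in counts):
        return False  # a message tag the pattern does not mention
    return all(lo <= counts[tag] <= hi for tag, (lo, hi) in pattern.items())

INF = float("inf")
pattern = {"Reply": (1, 1), "Log": (0, INF)}
print(conforms([("Reply", 42), ("Log", "a"), ("Log", "b")], pattern))  # True
print(conforms([("Log", "a")], pattern))                               # False: Reply missing
```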

    Static Analysis of NumPy Programs

    NumPy programs can be hard to debug. Due to the dynamic nature of Python, a bug can manifest itself only after a long run time, causing the computation to crash and discarding all progress. Existing static analysis tools cannot detect NumPy-specific errors. We propose a solution that uses data-flow analysis combined with symbolic execution to detect ndarray shape mismatch errors. Using a dynamic set of symbols, our method tracks ndarray dimensions and the constraints between them throughout the program. It uses an SMT solver to solve the constraints and locate the bug. Our implementation understands core NumPy constructs and detects some shape mismatch errors for 1D and 2D ndarrays.
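
    The following hedged sketch conveys the flavour of the approach; it is not the thesis implementation, and a real tool would discharge the recorded constraints with an SMT solver rather than the direct value check used here.

```python
# A toy of the thesis idea, not its implementation: dimensions are tracked
# as symbols, matmul records an equality constraint between inner dimensions,
# and a mismatch of known values is reported. A real tool would hand the
# accumulated constraints to an SMT solver instead of this direct check.
class Sym:
    """A symbolic ndarray dimension, optionally with a known concrete value."""
    def __init__(self, name, value=None):
        self.name, self.value = name, value

def unify(a, b, constraints):
    if a.value is not None and b.value is not None and a.value != b.value:
        raise TypeError(f"shape mismatch: {a.name}={a.value} vs {b.name}={b.value}")
    constraints.append((a.name, b.name))  # record a == b for a solver

def matmul_shape(x, y, constraints):
    """(m, k) @ (k2, n) is only well-formed when k == k2."""
    (m, k), (k2, n) = x, y
    unify(k, k2, constraints)
    return (m, n)

cs = []
A = (Sym("m", 3), Sym("k", 4))
B = (Sym("k2", 5), Sym("n", 2))
try:
    matmul_shape(A, B, cs)
except TypeError as e:
    print(e)  # shape mismatch: k=4 vs k2=5
```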

    Inference of Resource Management Specifications

    A resource leak occurs when a program fails to free some finite resource after it is no longer needed. Such leaks are a significant cause of real-world crashes and performance problems. Recent work proposed an approach to prevent resource leaks based on checking resource management specifications. A resource management specification expresses how the program allocates resources, passes them around, and releases them; it also tracks the ownership relationship between objects and resources, and aliasing relationships between objects. While this specify-and-verify approach has several advantages compared to prior techniques, the need to manually write annotations presents a significant barrier to its practical adoption. This paper presents a novel technique to automatically infer a resource management specification for a program, broadening the applicability of specify-and-verify checking for resource leaks. Inference in this domain is challenging because resource management specifications differ significantly in nature from the types that most inference techniques target. Further, for practical effectiveness, we desire a technique that can infer the resource management specification intended by the developer, even in cases when the code does not fully adhere to that specification. We address these challenges through a set of inference rules carefully designed to capture real-world coding patterns, yielding an effective fixed-point-based inference algorithm. We have implemented our inference algorithm in two different systems, targeting programs written in Java and C#. In an experimental evaluation, our technique inferred 85.5% of the annotations that programmers had written manually for the benchmarks. Further, the verifier issued nearly the same rate of false alarms with the manually written and automatically inferred annotations.
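
    As a cartoon of the fixed-point character of such inference, the sketch below propagates an ownership obligation over a toy call graph until nothing changes; the single rule and the program facts are invented for illustration and are far simpler than the paper's inference rules.

```python
# A cartoon of fixed-point specification inference; the rule below ("a caller
# of an owning method is itself owning") and the program facts are invented
# examples, not the paper's algorithm.
allocates = {"open_file"}            # methods that allocate a resource
calls = {                            # a toy call graph
    "read_config": ["open_file"],
    "main": ["read_config"],
    "log": [],
}

def infer_owning(allocates, calls):
    """Propagate the ownership obligation until a fixed point is reached."""
    owning = set(allocates)
    changed = True
    while changed:
        changed = False
        for caller, callees in calls.items():
            if caller not in owning and any(c in owning for c in callees):
                owning.add(caller)   # caller receives an owned resource
                changed = True
    return owning

print(sorted(infer_owning(allocates, calls)))  # ['main', 'open_file', 'read_config']
```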

    Revisiting Language Support for Generic Programming: When Genericity Is a Core Design Goal

    Context: Generic programming, as defined by Stepanov, is a methodology for writing efficient and reusable algorithms by considering only the required properties of their underlying data types and operations. Generic programming has proven to be an effective means of constructing libraries of reusable software components in languages that support it. Generics-related language design choices play a major role in how conducive generic programming is in practice. Inquiry: Several mainstream programming languages (e.g. Java and C++) were first created without generics; features to support generic programming were added later, gradually. Much of the existing literature on supporting generic programming thus focuses on retrofitting generic programming into existing languages and identifying related implementation challenges. Is the programming experience significantly better, or different, when programming with a language designed for generic programming without limitations from prior language design choices? Approach: We examine Magnolia, a language designed to embody generic programming. Magnolia is representative of an approach to language design rooted in algebraic specifications. We repeat a well-known experiment, where we put Magnolia's generic programming facilities under scrutiny by implementing a subset of the Boost Graph Library, and reflect on our development experience. Knowledge: We discover that the idioms identified as key features for supporting Stepanov-style generic programming in previous studies and work on the topic do not tell the full story. We clarify which of them are more of a means to an end rather than fundamental features for supporting generic programming. Based on the development experience with Magnolia, we identify variadics as an additional key feature for generic programming and point out limitations and challenges of genericity by property. Grounding: Our work uses a well-known framework from the literature for evaluating the generic programming facilities of a language to evaluate the algebraic approach through Magnolia, and we draw comparisons with well-known programming languages. Importance: This work gives a fresh perspective on generic programming and clarifies the fundamental language properties and their trade-offs when considering support for Stepanov-style generic programming. Understanding how to set the ground for generic programming will inform future language design.
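
    For readers unfamiliar with genericity by property, the sketch below gives a loose Python approximation under stated assumptions; Magnolia itself expresses such requirements through algebraic specifications rather than structural typing.

```python
# A loose Python approximation of genericity by property (Magnolia uses
# algebraic specifications; typing.Protocol is only a structural stand-in,
# and all names here are invented): `fold` demands nothing of its inputs
# beyond the stated `combine` operation.
from typing import Protocol, TypeVar

class Combinable(Protocol):
    def combine(self, other: "Combinable") -> "Combinable": ...

T = TypeVar("T", bound=Combinable)

def fold(xs: list[T], unit: T) -> T:
    """Generic over any type providing `combine` -- a required property."""
    acc = unit
    for x in xs:
        acc = acc.combine(x)
    return acc

class IntSum:
    def __init__(self, v: int):
        self.v = v
    def combine(self, other: "IntSum") -> "IntSum":
        return IntSum(self.v + other.v)

print(fold([IntSum(1), IntSum(2), IntSum(3)], IntSum(0)).v)  # 6
```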

    Tools for efficient Deep Learning

    In the era of Deep Learning (DL), there is a fast-growing demand for building and deploying Deep Neural Networks (DNNs) on various platforms. This thesis proposes five tools to address the challenges of designing DNNs that are efficient in time, resources and power consumption. We first present Aegis and SPGC to address the challenges in improving the memory efficiency of DL training and inference. Aegis makes mixed precision training (MPT) more stable through layer-wise gradient scaling. Empirical experiments show that Aegis can improve MPT accuracy by up to 4%. SPGC focuses on structured pruning: replacing standard convolution with group convolution (GConv) to avoid irregular sparsity. SPGC formulates GConv pruning as a channel permutation problem and proposes a novel heuristic polynomial-time algorithm. Common DNNs pruned by SPGC achieve up to 1% higher accuracy than prior work. This thesis also addresses the gap between DNN descriptions and executables, with Polygeist for software and POLSCA for hardware. Many novel techniques, e.g. statement splitting and memory partitioning, are explored and used to extend polyhedral optimisation. Polygeist speeds up sequential and parallel software execution by 2.53x and 9.47x respectively on Polybench/C. POLSCA achieves a 1.5x speedup over hardware designs generated directly from high-level synthesis on Polybench/C. Moreover, this thesis presents Deacon, a framework that generates FPGA-based DNN accelerators with streaming architectures and advanced pipelining techniques to address the challenges posed by heterogeneous convolutions and residual connections. Deacon provides fine-grained pipelining, graph-level optimisation, and heuristic exploration by graph colouring. Compared with prior designs, Deacon improves resource/power consumption efficiency by 1.2x/3.5x for MobileNets and 1.0x/2.8x for SqueezeNets. All these tools are open source, and some have already gained public engagement. We believe they can make efficient deep learning applications easier to build and deploy.
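
    Since the abstract leaves layer-wise gradient scaling at a high level, here is one plausible, hedged reading of the idea in Python; the grow/backoff policy and constants are assumptions, not the thesis algorithm.

```python
# One plausible reading of layer-wise gradient scaling (the grow/backoff
# policy and constants are our assumptions, not the thesis algorithm):
# each layer keeps its own loss scale, so an overflow in one layer backs
# off that layer's scale without disturbing the others.
import math

class LayerScaler:
    def __init__(self, init_scale=2.0**10, growth=2.0, backoff=0.5):
        self.scale, self.growth, self.backoff = init_scale, growth, backoff

    def unscale(self, grads):
        """Undo the scaling; on overflow, shrink the scale and skip the step."""
        if any(not math.isfinite(g) for g in grads):
            self.scale *= self.backoff
            return None                      # this layer skips the update
        old_scale = self.scale
        self.scale *= self.growth            # reward a finite step
        return [g / old_scale for g in grads]

scalers = {"conv1": LayerScaler(), "fc": LayerScaler(init_scale=2.0**8)}
raw = {"conv1": [1.0e3, float("inf")], "fc": [2.0, 3.0]}  # scaled gradients
for layer, grads in raw.items():
    print(layer, scalers[layer].unscale(grads))
```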

    Myths and Legends in High-Performance Computing

    In this thought-provoking article, we discuss certain myths and legends that are folklore among members of the high-performance computing community. We gathered these myths from conversations at conferences and meetings, product advertisements, papers, and other communications such as tweets, blogs, and news articles within and beyond our community. We believe they represent the zeitgeist of the current era of massive change, driven by the end of many scaling laws, such as Dennard scaling and Moore's law. While some laws end, new directions are emerging, such as algorithmic scaling or novel architecture research. Nevertheless, these myths are rarely based on scientific facts, but rather on some evidence or argumentation. In fact, we believe that this is the very reason for the existence of many myths and why they cannot be answered clearly. While it feels like there should be a clear answer for each, some may remain endless philosophical debates, such as whether Beethoven was better than Mozart. We would like to see our collection of myths as a discussion of possible new directions for research and industry investment.

    Exploring annotations for deductive verification


    On the plasticity of socio-emotional competencies at the behavioural and brain levels: an EEG-accompanied training study with preschool children using the computer-based training program Zirkus Empathico

    Promoting functional socio-emotional competence in the preschool years (age range 3 to 6 years) is crucial to prevent the development of psychological disorders. To date, there are few studies examining the effects of digital training on the socio-emotional development of preschool children. Similarly, research provides extensive information on typical socio-emotional behaviours in preschool children, while less is known about how the brain implements these functions. Therefore, the goal of this dissertation was to examine fundamental and complex aspects of preschoolers' socio-emotional competence by assessing their maturity and trainability with behavioural and neural measures. Studies 1 and 2 used event-related potentials and the Fast Periodic Visual Stimulation method to quantify neural mechanisms of emotion recognition. Both studies revealed the presence of basic emotion recognition mechanisms in this age group. In addition, preschoolers showed a processing advantage for happy over angry or neutral faces. Study 3 investigated the trainability of socio-emotional competence using the digital training Zirkus Empathico. The Zirkus Empathico group showed an increase in both basic and complex socio-emotional competencies compared to the control group. In addition, the Zirkus Empathico group showed a processing advantage for happy faces at the neuronal level. In summary, neuronal markers show considerable utility for understanding the mechanisms underlying emotion recognition in preschool children. The promising evidence for the efficacy of digital socio-emotional skills training also allows further consideration of the sustainability of the effects as well as their societal significance.