857 research outputs found

    We, the City

    In the wake of unchecked neoliberal restructuring, Berlin and Istanbul have been exposed to various forms of political polarisation and social injustice over the last decade. As a result, the struggle for affordable housing, access to public space, fair working conditions, ecological justice and the right to different ways of life has intensified. Various forms of resistance "from below" have challenged the relationship between local governments and social movements, questioning where and how the city's political problems arise. In a mixture of dialogues, essays and critical reflections, this book explores the ways in which residents of Berlin and Istanbul experience, express and resist the physical, political and normative reorganisation of their cities. It poses the question: Who is the We in We, the City?

    Guided rewriting and constraint satisfaction for parallel GPU code generation

    Graphics Processing Units (GPUs) are notoriously hard to optimise for manually due to their scheduling and memory hierarchies. What is needed are good automatic code generators and optimisers for such parallel hardware. Functional approaches such as Accelerate, Futhark and LIFT leverage a high-level algorithmic Intermediate Representation (IR) to expose parallelism and abstract the implementation details away from the user. However, producing efficient code for a given accelerator remains challenging. Existing code generators depend either on user input to choose among a subset of hard-coded optimisations or on automated exploration of the implementation search space. The former suffers from a lack of extensibility, while the latter is too costly due to the size of the search space. A hybrid approach is needed, where a space of valid implementations is built automatically and explored with the aid of human expertise. This thesis presents a solution combining user-guided rewriting and automatically generated constraints to produce high-performance code. The first contribution is an automatic tuning technique to find a balance between performance and memory consumption. Leveraging its functional patterns, the LIFT compiler is empowered to infer tuning constraints and limit the search to valid tuning combinations only. Next, the thesis reframes parallelisation as a constraint satisfaction problem. Parallelisation constraints are extracted automatically from the input expression, and a solver is used to identify valid rewritings. The constraints truncate the search space to valid parallel mappings only by capturing the scheduling restrictions of the GPU in the context of a given program. A synchronisation barrier insertion technique is proposed to prevent data races and improve the efficiency of the generated parallel mappings. The final contribution of this thesis is the guided rewriting method, where the user encodes a design space of structural transformations using high-level IR nodes called rewrite points. These strongly typed pragmas express macro rewrites and expose design choices as explorable parameters. The thesis proposes a small set of reusable rewrite points to achieve tiling, cache locality, data reuse and memory optimisation. A comparison with the vendor-provided handwritten kernels of the ARM Compute Library and with the TVM code generator demonstrates the effectiveness of this thesis' contributions. With convolution as a use case, LIFT-generated direct and GEMM-based convolution implementations are shown to perform on par with state-of-the-art solutions on a mobile GPU. Overall, this thesis demonstrates that a functional IR lends itself well to user-guided and automatic rewriting for high-performance code generation.
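
    The idea of reframing parallelisation as constraint satisfaction can be illustrated with a small sketch. The following Python toy (a simplified assumption, not LIFT's actual representation or constraint set) enumerates assignments of GPU scheduling levels to nested maps and keeps only those that respect nesting restrictions, which is the kind of pruning the thesis automates:

    # Toy sketch: parallelisation as a constraint satisfaction problem.
    # The levels and constraints below are simplified assumptions, not LIFT's rules.
    from itertools import product

    LEVELS = ["workgroup", "local", "sequential"]        # coarse GPU scheduling levels
    NEST_ORDER = {"workgroup": 0, "local": 1, "sequential": 2}

    def valid(assignment):
        # Constraint 1: nesting must not be inverted, e.g. a workgroup-level map
        # may not appear inside a local-level map.
        for outer, inner in zip(assignment, assignment[1:]):
            if NEST_ORDER[inner] < NEST_ORDER[outer]:
                return False
        # Constraint 2: each parallel level is used by at most one map.
        return assignment.count("workgroup") <= 1 and assignment.count("local") <= 1

    def parallel_mappings(num_nested_maps):
        """Enumerate all valid assignments of scheduling levels to nested maps."""
        return [a for a in product(LEVELS, repeat=num_nested_maps) if valid(a)]

    # For map(map(map(f))) this yields only the legal mappings a solver would
    # explore, instead of all 3^3 = 27 combinations.
    print(parallel_mappings(3))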

    Design and Implementation of a Portable Framework for Application Decomposition and Deployment in Edge-Cloud Systems

    The emergence of cyber-physical systems has brought about a significant increase in the complexity and heterogeneity of the infrastructure on which these systems are deployed. One particular example of this complexity is the interplay between cloud, fog, and edge computing. This complexity can pose challenges when it comes to implementing self-organizing mechanisms, which are often designed to work on flat networks. It is therefore essential to separate the application logic from the specific deployment aspects to promote reusability and flexibility in infrastructure exploitation. To address this issue, a novel approach called "pulverization" has been proposed. This approach involves breaking down the system into smaller computational units, which can then be deployed on the available infrastructure. In this thesis, the design and implementation of a portable framework that enables the "pulverization" of cyber-physical systems are presented. The main objective of the framework is to pave the way for the deployment of cyber-physical systems in the edge-cloud continuum by reducing the complexity of the infrastructure and opportunistically exploiting the heterogeneous resources available on it. Different scenarios are presented to highlight the effectiveness of the framework across heterogeneous infrastructures and devices. Current limitations and future work are examined to identify areas for improvement of the framework.
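
    A minimal sketch can make the "pulverization" idea concrete. The Python toy below uses a hypothetical API (not the framework presented in the thesis) to separate a logical device's behaviour and state from the deployment map that decides where each sub-component runs:

    # Toy sketch of "pulverization": a hypothetical API, not the thesis' framework.
    # A logical device is split into sub-components whose placement is a separate concern.
    from dataclasses import dataclass, field

    @dataclass
    class Behaviour:
        """Application logic, written independently of where it runs."""
        def step(self, state: dict, perceptions: dict) -> dict:
            state["observations"] = state.get("observations", 0) + len(perceptions)
            return state

    @dataclass
    class LogicalDevice:
        name: str
        behaviour: Behaviour
        state: dict = field(default_factory=dict)

        def cycle(self, perceptions: dict) -> dict:
            # In a pulverized deployment, behaviour, state and sensing may live on
            # different hosts; here they are simply wired together in-process.
            self.state = self.behaviour.step(self.state, perceptions)
            return self.state

    # Deployment is described separately: the same components can be mapped onto
    # edge devices, fog nodes or the cloud without touching the application logic.
    deployment = {"behaviour": "cloud-node-1", "state": "edge-gateway-3", "sensors": "device-42"}

    device = LogicalDevice("thermostat-7", Behaviour())
    print(device.cycle({"temperature": 21.5}), deployment)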

    Erasure in dependently typed programming

    It is important to reduce the cost of correctness in programming. Dependent types and related techniques, such as type-driven programming, offer ways to do so. Some parts of dependently typed programs constitute evidence of their type-correctness and, once checked, are unnecessary for execution. These parts can easily become asymptotically larger than the remaining runtime-useful computation, which can cause linear-time algorithms to run in exponential time, or worse. It would be unacceptable, and would contradict our goal of reducing the cost of correctness, to make programs run slower merely by describing them more precisely. Current systems cannot erase such computation satisfactorily. By modelling erasure indirectly through type universes or irrelevance, they impose the limitations of those mechanisms on erasure. Some useless computation then cannot be erased, and idiomatic programs remain asymptotically sub-optimal. This dissertation explains why we need erasure, shows that it is different from other concepts like irrelevance, and proposes two ways of erasing non-computational data. One is an untyped flow-based useless variable elimination, adapted for dependently typed languages and currently implemented in the Idris 1 compiler. The other is the main contribution of the dissertation: a dependently typed core calculus with erasure annotations, full dependent pattern matching, and an algorithm that infers erasure annotations from unannotated (or partially annotated) programs. I show that erasure in well-typed programs is sound in that it commutes with single-step reduction. Assuming the Church-Rosser property of reduction, I show that properties such as Subject Reduction hold, which extends the soundness result to multi-step reduction. I also show that the presented erasure inference is sound and complete with respect to the typing rules; that this approach can be extended with various forms of erasure polymorphism; that it works well with monadic I/O and foreign functions; and that it is effective in that it not only removes the runtime overhead caused by dependent typing in the presented examples, but can also shorten compilation times.
    "This work was supported by the University of St Andrews (School of Computer Science)." -- Acknowledgement
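
    A toy version of the first technique, flow-based useless variable elimination, helps convey what erasure buys. The Python sketch below is heavily simplified and is not the actual Idris 1 analysis; it only shows the core idea that bindings which never flow into the result (such as typing evidence) can be dropped before execution:

    # Heavily simplified sketch of useless-binding elimination; not the actual
    # Idris 1 analysis. A program is a list of let-bindings plus a result variable;
    # a binding is kept only if the result transitively depends on it at runtime.

    def free_vars(expr):
        """Variables mentioned by a tiny expression: ('var', x), ('app', f, args) or ('lit', v)."""
        tag = expr[0]
        if tag == "var":
            return {expr[1]}
        if tag == "app":
            vs = free_vars(expr[1])
            for arg in expr[2]:
                vs |= free_vars(arg)
            return vs
        return set()  # literals carry no variables

    def erase_useless(bindings, result_var):
        needed = {result_var}
        kept = []
        for name, expr in reversed(bindings):    # flow backwards from the result
            if name in needed:
                needed |= free_vars(expr)
                kept.append((name, expr))
        return list(reversed(kept))

    prog = [
        ("n",     ("lit", 3)),
        ("proof", ("app", ("var", "lemma"), [("var", "n")])),   # typing evidence only
        ("out",   ("app", ("var", "double"), [("var", "n")])),
    ]
    print(erase_useless(prog, "out"))    # the 'proof' binding is erased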

    Analysing and Reducing Costs of Deep Learning Compiler Auto-tuning

    Deep Learning (DL) is significantly impacting many industries, including automotive, retail and medicine, enabling autonomous driving, recommender systems and genomics modelling, amongst other applications. At the same time, demand for complex and fast DL models is continually growing. The most capable models tend to exhibit the highest operational costs, primarily due to their large computational footprint and the inefficient utilisation of the computational resources employed by DL systems. In an attempt to tackle these problems, DL compilers and auto-tuners emerged, automating the traditionally manual task of DL model performance optimisation. While auto-tuning improves model inference speed, it is a costly process, which limits its wider adoption within DL deployment pipelines. The high operational costs associated with DL auto-tuning have multiple causes. During operation, DL auto-tuners explore large search spaces consisting of billions of tensor programs to propose candidates that may reduce DL model inference latency. Subsequently, DL auto-tuners measure candidate performance in isolation on the target device, which constitutes the majority of auto-tuning compute time. Suboptimal candidate proposals, combined with their serial measurement on an isolated target device, lead to prolonged optimisation time and reduced resource availability, ultimately reducing the cost-efficiency of the process. In this thesis, we investigate the reasons behind prolonged DL auto-tuning and quantify their impact on optimisation costs, revealing directions for improved DL auto-tuner design. Based on these insights, we propose two complementary systems: Trimmer and DOPpler. Trimmer improves tensor program search efficacy by filtering out poorly performing candidates, and controls end-to-end auto-tuning using cost objectives that monitor optimisation cost. DOPpler, in turn, breaks the long-held assumption that candidate measurements must be serial by parallelising them intra-device, with minimal penalty to optimisation quality. Through extensive experimental evaluation of both systems, we demonstrate that they significantly improve the cost-efficiency of auto-tuning (up to 50.5%) across a wide range of tensor operators, DL models, auto-tuners and target devices.
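
    The cost-aware filtering idea can be sketched in a few lines. The Python toy below is only in the spirit of Trimmer (the cost model, keep fraction and measurement budget are invented for illustration): a cheap estimate prunes candidates before the expensive on-device measurements, and a budget caps how many measurements are taken.

    # Hypothetical sketch of cost-aware candidate filtering, not the actual Trimmer
    # system: prune candidates with a cheap cost model, then measure under a budget.
    import random

    def cost_model(candidate):
        """Cheap, possibly inaccurate latency estimate for a tensor-program candidate."""
        return candidate["est_latency_ms"]

    def measure_on_device(candidate):
        """Expensive ground-truth measurement (simulated here with noise)."""
        return candidate["est_latency_ms"] * random.uniform(0.8, 1.2)

    def tune(candidates, keep_fraction=0.2, budget_measurements=8):
        # 1. Filter: keep only the most promising candidates by estimated latency.
        ranked = sorted(candidates, key=cost_model)
        shortlist = ranked[: max(1, int(len(ranked) * keep_fraction))]
        # 2. Measure under a cost budget, tracking the best observed latency.
        best = None
        for cand in shortlist[:budget_measurements]:
            latency = measure_on_device(cand)
            if best is None or latency < best[1]:
                best = (cand, latency)
        return best

    candidates = [{"id": i, "est_latency_ms": random.uniform(1, 10)} for i in range(200)]
    print(tune(candidates))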

    Supporting Custom Instructions with the LLVM Compiler for RISC-V Processor

    The rise of hardware accelerators with custom instructions necessitates custom compiler backends that support these accelerators. This study provides detailed analyses of LLVM and its RISC-V backend, supplemented with case studies that give an end-to-end overview of the relevant transformations. We argue that instruction design should consider both the hardware and the software design space: if supporting an instruction requires extensive compiler modifications, the instruction may not be well designed and may need to be reconsidered. We also note that the RISC-V standard extensions provide exemplary instructions that can guide instruction designers. In this study, the process of adding a custom instruction to the compiler is split into two parts: assembler support and pattern matching support. Without pattern matching support, conventional software requires manual inline assembly for the accelerator, which does not scale. While assembler support is trivial to add regardless of the instruction's semantics, pattern matching support is not. Pattern matching support, and choosing the right stage for the modification, requires knowledge of the compiler's internal transformations. This study delves deep into pattern matching and presents multiple ways to approach the problem of pattern matching support. Depending on the pattern's complexity, higher-level transformations, e.g. at the IR level, can be more maintainable than modifications in the Instruction Selection phase.
    Comment: Electronics and Communication Engineering B.Sc. Graduation Project. Source can be found in https://github.com/eymay/Senior-Design-Projec
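
    To illustrate what pattern matching support means at a high level, the following Python toy rewrites a multiply-then-add chain in a small made-up IR into a fused "custom.mac" instruction. It is a conceptual sketch only; actual LLVM support would be expressed through TableGen patterns or C++ passes rather than code like this.

    # Conceptual toy of IR-level pattern matching for a custom instruction; plain
    # Python over a made-up IR, not LLVM TableGen or C++. It fuses add(mul(a, b), c)
    # into a single "custom.mac" (multiply-accumulate) instruction.

    def match_mac(ir):
        """ir: list of (dest, op, operands). Assumes each fused mul has no other uses."""
        defs = {dst: (op, ops) for dst, op, ops in ir}
        out, fused = [], set()
        for dst, op, ops in ir:
            if op == "add":
                for i, src in enumerate(ops):
                    if src in defs and defs[src][0] == "mul":
                        a, b = defs[src][1]
                        out.append((dst, "custom.mac", [a, b, ops[1 - i]]))
                        fused.add(src)
                        break
                else:
                    out.append((dst, op, ops))
            else:
                out.append((dst, op, ops))
        # Drop the mul instructions that were folded into custom.mac.
        return [ins for ins in out if ins[0] not in fused]

    ir = [("t0", "mul", ["a", "b"]), ("t1", "add", ["t0", "c"])]
    print(match_mac(ir))    # [('t1', 'custom.mac', ['a', 'b', 'c'])]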