
    The Potential of Synergistic Static, Dynamic and Speculative Loop Nest Optimizations for Automatic Parallelization

    Research in automatic parallelization of loop-centric programs started with static analysis, then broadened its arsenal to include dynamic inspection-execution and speculative execution, with the best results involving hybrid static-dynamic schemes. Beyond the detection of parallelism in a sequential program, scalable parallelization on many-core processors involves hard and interesting parallelism adaptation and mapping challenges. These challenges include tailoring data locality to the memory hierarchy, structuring independent tasks hierarchically to exploit multiple levels of parallelism, tuning the synchronization grain, balancing the execution load, decoupling the execution into thread-level pipelines, and leveraging heterogeneous hardware with specialized accelerators. The polyhedral framework makes it possible to model, construct and apply very complex loop nest transformations that address most of these parallelism adaptation and mapping challenges. But apart from hardware-specific, back-end-oriented transformations (if-conversion, trace scheduling, value prediction), loop nest optimization has essentially ignored dynamic and speculative techniques. Research in polyhedral compilation recently reached a significant milestone towards the support of dynamic, data-dependent control flow. This opens a large avenue for blending dynamic analyses and speculative techniques with advanced loop nest optimizations. Selecting real-world examples from SPEC benchmarks and numerical kernels, we make a case for the design of synergistic static, dynamic and speculative loop transformation techniques. We also sketch the embedding of dynamic information, including speculative assumptions, at the heart of affine transformation search spaces.
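    As a toy illustration of the kind of transformation the polyhedral framework models, the Python sketch below (not taken from the paper) expresses loop tiling as an affine reordering of a 2D iteration domain; the names N and TILE are illustrative only.

        # Minimal sketch: tiling a 2D iteration domain. The polyhedral
        # model expresses such reorderings as affine functions on
        # iteration vectors; here the map is (i, j) -> (i//T, j//T, i, j).

        N, TILE = 8, 4

        # Original schedule: iterate (i, j) lexicographically.
        original = [(i, j) for i in range(N) for j in range(N)]

        # Tiled schedule: ordering points by (i//TILE, j//TILE, i, j)
        # visits the same domain tile by tile, improving locality.
        tiled = sorted(original,
                       key=lambda p: (p[0] // TILE, p[1] // TILE, p[0], p[1]))

        # Both orders cover the same iteration domain; only the
        # execution order (and hence locality) changes.
        assert set(original) == set(tiled)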

    GPTVQ: The Blessing of Dimensionality for LLM Quantization

    In this work we show that the size versus accuracy trade-off of neural network quantization can be significantly improved by increasing the quantization dimensionality. We propose GPTVQ, a new fast method for post-training vector quantization (VQ) that scales well to Large Language Models (LLMs). Our method interleaves quantization of one or more columns with updates to the remaining unquantized weights, using information from the Hessian of the per-layer output reconstruction MSE. Quantization codebooks are initialized using an efficient data-aware version of the EM algorithm. The codebooks are then updated, and further compressed using integer quantization and SVD-based compression. GPTVQ establishes a new state of the art in the size versus accuracy trade-off on a wide range of LLMs such as Llama-v2 and Mistral. Furthermore, our method is efficient: on a single H100 it takes between 3 and 11 hours to process a Llamav2-70B model, depending on the quantization setting. Lastly, with on-device timings for VQ decompression on a mobile CPU, we show that VQ leads to improved latency compared to using a 4-bit integer format.
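    For readers unfamiliar with vector quantization, the following minimal NumPy sketch shows the basic idea of fitting a shared codebook over d-dimensional groups of weights. GPTVQ's Hessian-aware column updates and data-aware EM initialization are deliberately not reproduced here; plain k-means stands in for codebook fitting, and all sizes are illustrative.

        import numpy as np

        # Minimal sketch of vector quantization (VQ) of a weight matrix:
        # group weights into d-dim vectors, fit a k-entry codebook, and
        # store one index per vector (here 8 bits per 2 weights).

        rng = np.random.default_rng(0)
        W = rng.standard_normal((128, 128)).astype(np.float32)

        d, k = 2, 256                       # vector dim and codebook size
        vecs = W.reshape(-1, d)             # d-dim groups of weights

        # Lloyd's algorithm: alternate assignment and centroid update.
        codebook = vecs[rng.choice(len(vecs), k, replace=False)]
        for _ in range(10):
            dist = ((vecs[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
            assign = dist.argmin(1)
            for c in range(k):
                members = vecs[assign == c]
                if len(members):
                    codebook[c] = members.mean(0)

        W_hat = codebook[assign].reshape(W.shape)   # dequantized weights
        print("reconstruction MSE:", float(((W - W_hat) ** 2).mean()))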

    Violated dependence analysis

    The polyhedral model is a powerful framework for reasoning about high-level loop transformations. Yet the lack of scalable algorithms and tools has deterred actors from both academia and industry from putting this model to practical use. Indeed, for fundamental complexity reasons, its applicability has long been limited to simple kernels. Recent developments broke some generally accepted ideas about these limitations. In particular, new algorithms made it possible to compute the target code for full SPEC benchmarks, although this code generation step had been expected not to scale. Instancewise array dependence analysis computes a finite, intensional representation of the (statically unbounded) set of all dynamic dependences. This problem has always been considered non-scalable and/or overkill with respect to less expressive and faster dependence tests. On the contrary, this article presents experimental evidence of its applicability to full SPEC CPU2000 benchmarks. To make this possible, we revisit the characterization of data dependences, considering relations between time dimensions of the transformed space. Beyond algorithmic benefits, this naturally leads to a novel way of reasoning about violated dependences across arbitrary transformation sequences. Reasoning about violated dependences relieves the compiler designer from the cumbersome task of implementing specific legality checks for each single transformation. It also makes it possible, in the case of invalid transformations, to precisely determine the violated dependences that need to be corrected. Identifying these violations can in turn enable automatic correction schemes to fix an illegal transformation sequence with minimal changes.
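    The notion of a violated dependence can be illustrated concretely: a dependence from source iteration s to target iteration t is violated when the transformed schedule no longer orders s before t. The Python sketch below is an assumed simplification, not the paper's instancewise algorithm; it checks a candidate schedule against an explicit list of dependence pairs.

        # A dependence s -> t is preserved when schedule(s) precedes
        # schedule(t) lexicographically; otherwise it is violated.

        def lex_lt(a, b):
            return a < b          # Python tuples compare lexicographically

        def violated(deps, schedule):
            """deps: iterable of (source, target) iteration vectors;
            schedule: map from iteration vector to time vector."""
            return [(s, t) for s, t in deps
                    if not lex_lt(schedule(s), schedule(t))]

        # Flow dependences of `for i: A[i] = A[i-1] + 1`: (i-1) -> (i).
        deps = [((i - 1,), (i,)) for i in range(1, 8)]

        identity = lambda it: it
        reversal = lambda it: (-it[0],)   # loop reversal: i -> -i

        print(violated(deps, identity))   # [] : original order is legal
        print(violated(deps, reversal))   # every dependence is violated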

    Solving Systems of Affine (In)Equalities: PIP's User's Guide

    This document is the user's manual of PIP, a piece of software that solves Parametric Integer Programming problems. That is, PIP finds the lexicographic minimum of the set of integer points which lie inside a convex polyhedron, when that polyhedron depends linearly on one or more integral parameters.
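    As an illustration of the problem PIP solves, the snippet below uses the isl library through its islpy Python bindings (a separate tool, not PIP itself) to compute the lexicographic minimum of the integer points in a parametric polyhedron.

        import islpy as isl

        # Integer points (i, j) with 0 <= i, 0 <= j and i + j >= n,
        # for an integral parameter n.
        s = isl.Set("[n] -> { [i, j] : 0 <= i and 0 <= j and i + j >= n }")

        # Lexicographic minimum: (0, n) when n >= 0, and (0, 0) otherwise.
        print(s.lexmin())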

    ParameTrick: Coefficient Generalization for Faster Polyhedral Scheduling

    Polyhedral schedulers have been widely used for loop optimization in general-purpose compilers and, more recently, in deep-learning compilers. State-of-the-art scheduling algorithms define a vast space of loop transformations and find an optimal solution according to pre-designed cost functions. However, this Integer Linear Programming (ILP) approach can sometimes run into complexity issues: the resolution time of an ILP grows rapidly as the size of the problem increases. The complexity of the kernel to optimize, in terms of the number of statements, loops, and dependencies, has a significant impact on solving time. Additionally, the performance of ILP solvers can be severely degraded by large coefficients in the ILP, which mostly come from the loop bounds in the original kernel. To tackle this issue, this paper introduces a technique called "ParameTrick". This technique is used during the scheduling pipeline to replace some large coefficients with parameters, which are new variables in the ILP model. The approach is analyzed to determine its impact on solving time, as well as which transformations are lost and how to limit the number of infeasible transformations. Results show that ParameTrick leads to faster solving times while the space of dropped desirable solutions remains limited.
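    The core idea can be sketched in a few lines: detect large constants in the ILP's affine constraints and substitute a fresh symbolic parameter for each distinct value, so the solver never manipulates the big coefficients directly. The Python sketch below is an assumed simplification, not the paper's implementation; the constraint encoding and the threshold are illustrative.

        # Constraints are encoded as (coeffs, const), meaning
        # sum(coeffs[v] * v for v in coeffs) + const >= 0.

        def parametrize(constraints, threshold=100):
            """Replace constants larger than `threshold` in magnitude
            with a fresh parameter standing for their absolute value."""
            params = {}
            out = []
            for coeffs, const in constraints:
                if abs(const) > threshold:
                    name = params.setdefault(abs(const), f"p{len(params)}")
                    sign = 1 if const > 0 else -1
                    out.append(({**coeffs, name: sign}, 0))
                else:
                    out.append((coeffs, const))
            return out, params

        # The loop-bound constraint i <= 4096, written as -i + 4096 >= 0,
        # becomes -i + p0 >= 0 with the fresh parameter p0 (= 4096).
        cons, params = parametrize([({"i": -1}, 4096)])
        print(cons)     # [({'i': -1, 'p0': 1}, 0)]
        print(params)   # {4096: 'p0'}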