The Potential of Synergistic Static, Dynamic and Speculative Loop Nest Optimizations for Automatic Parallelization
Research in automatic parallelization of loop-centric programs started with
static analysis, then broadened its arsenal to include dynamic
inspection-execution and speculative execution, with the best results coming
from hybrid static-dynamic schemes. Beyond the detection of parallelism in a
sequential program, scalable parallelization on many-core processors involves
hard and interesting parallelism adaptation and mapping challenges. These
challenges include tailoring data locality to the memory hierarchy, structuring
independent tasks hierarchically to exploit multiple levels of parallelism,
tuning the synchronization grain, balancing the execution load, decoupling the
execution into thread-level pipelines, and leveraging heterogeneous hardware
with specialized accelerators. The polyhedral framework makes it possible to
model, construct, and apply very complex loop nest transformations that address
most of these parallelism adaptation and mapping challenges. Yet apart from
hardware-specific, back-end oriented transformations (if-conversion, trace
scheduling, value prediction), loop nest optimization has essentially ignored
dynamic and speculative techniques. Research in polyhedral compilation recently
reached a significant milestone towards the support of dynamic, data-dependent
control flow. This opens a wide avenue for blending dynamic analyses and
speculative techniques with advanced loop nest optimizations. Selecting
real-world examples from SPEC benchmarks and numerical kernels, we make a case
for the design of synergistic static, dynamic and speculative loop
transformation techniques. We also sketch the embedding of dynamic information,
including speculative assumptions, at the heart of affine transformation search
spaces.
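
To make the notion of an affine transformation search space concrete, here is a
textbook example of our own (not drawn from the article): a two-dimensional
loop nest whose statement carries dependences along both loops, and a skewing
schedule that exposes a parallel wavefront.

    % Textbook skewing example (ours): statement S(i, j) with dependences
    % (i, j) -> (i+1, j) and (i, j) -> (i, j+1).
    \[
      \theta(i, j) = (\, i + j,\ j \,)
    \]
    % Both dependences advance the first time dimension:
    \[
      \theta(i{+}1, j) - \theta(i, j) = (1, 0), \qquad
      \theta(i, j{+}1) - \theta(i, j) = (1, 1),
    \]
    % so the second dimension carries no dependence: all iterations with
    % equal i + j form a wavefront that can execute in parallel.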
GPTVQ: The Blessing of Dimensionality for LLM Quantization
In this work we show that the size versus accuracy trade-off of neural
network quantization can be significantly improved by increasing the
quantization dimensionality. We propose GPTVQ, a fast new method for
post-training vector quantization (VQ) that scales well to Large Language
Models (LLMs). Our method interleaves quantization of one or more columns with
updates to the remaining unquantized weights, using information from the
Hessian of the per-layer output reconstruction MSE. Quantization codebooks are
initialized using an efficient data-aware version of the EM algorithm. The
codebooks are then updated, and further compressed by using integer
quantization and SVD-based compression. GPTVQ establishes a new state of the
art in size versus accuracy trade-offs on a wide range of LLMs such as Llama-v2
and Mistral. Furthermore, our method is efficient: on a single H100 it takes
between 3 and 11 hours to process a Llama-v2-70B model, depending on the
quantization setting. Lastly, with on-device timings for VQ decompression on a
mobile CPU, we show that VQ leads to improved latency compared to using a 4-bit
integer format.
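
The codebook initialization described above can be pictured as an
importance-weighted EM (k-means-style) fit over small weight vectors. The
sketch below is our reading of that idea, not the authors' implementation: the
function name, the use of a diagonal Hessian proxy as per-vector importance,
and all parameter choices are illustrative assumptions.

    import numpy as np

    def init_codebook(weights, hess_diag, dim=2, k=16, iters=10, seed=0):
        """Fit k centroids to dim-sized weight vectors, weighting the
        squared error by an importance derived from the Hessian diagonal."""
        rng = np.random.default_rng(seed)
        vecs = weights.reshape(-1, dim)                # (n, dim) weight vectors
        imp = hess_diag.reshape(-1, dim).mean(axis=1)  # per-vector importance
        centroids = vecs[rng.choice(len(vecs), size=k, replace=False)].copy()
        for _ in range(iters):
            # E-step: assign each weight vector to its nearest centroid.
            d2 = ((vecs[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
            assign = d2.argmin(axis=1)
            # M-step: importance-weighted centroid update.
            for c in range(k):
                mask = assign == c
                if mask.any():
                    w = imp[mask][:, None]
                    centroids[c] = (w * vecs[mask]).sum(axis=0) / w.sum()
        return centroids, assign

    # Toy usage: quantize a 64x64 weight matrix with 2-dimensional codewords.
    W = np.random.randn(64, 64).astype(np.float32)
    H = np.abs(np.random.randn(64, 64)).astype(np.float32)  # stand-in Hessian diagonal
    codebook, assignment = init_codebook(W, H, dim=2, k=16)
    W_q = codebook[assignment].reshape(W.shape)  # dequantized approximation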
Violated dependence analysis
The polyhedral model is a powerful framework for reasoning about high-level loop transformations. Yet the lack of scalable algorithms and tools has deterred actors from both academia and industry from putting this model to practical use. Indeed, for fundamental complexity reasons, its applicability has long been limited to simple kernels. Recent developments broke some generally accepted ideas about these limitations. In particular, new algorithms made it possible to compute the target code for full SPEC benchmarks, although this code generation step was expected not to be scalable. Instancewise array dependence analysis computes a finite, intensional representation of the (statically unbounded) set of all dynamic dependences. This problem has always been considered non-scalable and/or overkill compared with less expressive but faster dependence tests. On the contrary, this article presents experimental evidence of its applicability to full SPEC CPU2000 benchmarks. To make this possible, we revisit the characterization of data dependences, considering relations between time dimensions of the transformed space. Beyond algorithmic benefits, this naturally leads to a novel way of reasoning about violated dependences across arbitrary transformation sequences. Reasoning about violated dependences relieves the compiler designer of the cumbersome task of implementing specific legality checks for each individual transformation. In the case of an invalid transformation, it also makes it possible to determine precisely which violated dependences need to be corrected. Identifying these violations can in turn enable automatic correction schemes that fix an illegal transformation sequence with minimal changes.
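
To give a flavor of violated-dependence reasoning, here is a minimal example of
our own (not taken from the article): the loop for (i = 1; i <= N; i++)
A[i] = A[i-1]; whose flow dependences are all inverted by a loop reversal.

    % Minimal example (ours). Flow dependences of the loop above:
    \[
      \delta = \{\, (i, i') \mid i' = i + 1,\ 1 \le i < i' \le N \,\}
    \]
    % Under the loop-reversal schedule \theta(i) = N - i, the source runs at
    % time N - i and the target at time N - i - 1 < N - i, so every pair is
    % executed out of order:
    \[
      \delta_{\mathrm{viol}}
      = \{\, (i, i') \in \delta \mid \theta(i') \le \theta(i) \,\}
      = \delta .
    \]
    % The violated set tells the compiler exactly which ordering constraints
    % a corrective step (e.g., an added outer time dimension or a shift)
    % must restore.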
Solving Systems of Affine (In)Equalities: PIP's User's Guide
This document is the user's manual of PIP, a software tool that solves Parametric Integer Programming problems. That is, PIP finds the lexicographic minimum of the set of integer points lying inside a convex polyhedron, when that polyhedron depends linearly on one or more integral parameters.
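
As a small worked instance of the problem class PIP handles (our example, not
one from the manual): the least integer x satisfying 2x >= n, for a
non-negative integral parameter n, depends on the parity of n.

    \[
      \operatorname{lexmin} \{\, x \in \mathbb{Z} \mid 2x \ge n,\ x \ge 0 \,\}
      = \left\lceil n / 2 \right\rceil
      = \begin{cases}
          k     & \text{if } n = 2k, \\
          k + 1 & \text{if } n = 2k + 1.
        \end{cases}
    \]
    % The answer is not affine in n: PIP reports such solutions as
    % quasi-affine selection trees, with case splits and floor divisions
    % on the parameters.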
ParameTrick: Coefficient Generalization for Faster Polyhedral Scheduling
Polyhedral schedulers have been widely used for loop optimization in general-purpose compilers and, more recently, in deep-learning compilers. State-of-the-art scheduling algorithms define a vast space of loop transformations and find an optimal solution according to pre-designed cost functions. However, this Integer Linear Programming (ILP) approach can suffer from complexity issues: the resolution time of the ILP grows rapidly as the size of the problem increases. The complexity of the kernel to optimize, in terms of the number of statements, loops, and dependences, has a significant impact on solving time. Additionally, the performance of ILP solvers can be severely degraded by large coefficients in the ILP, which mostly come from the loop bounds of the original kernel. To tackle this issue, this paper introduces a technique called "ParameTrick", applied during the scheduling pipeline to replace some large coefficients with parameters, i.e., new variables in the ILP model. We analyze the approach's impact on solving time, as well as which transformations are lost and how to limit the number of infeasible transformations. Results show that ParameTrick leads to faster solving times while the space of dropped desirable solutions remains limited.
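
To sketch where such large coefficients come from and how parametrizing them
helps (a schematic of our own, not the paper's exact formulation): a numeric
loop bound enters the scheduling ILP through the Farkas form of the legality
condition.

    % Schematic (ours). For a dependence inside the domain 0 <= i <= 2047,
    % legality of a schedule \theta is enforced via Farkas multipliers over
    % the domain constraints:
    \[
      \theta(i') - \theta(i) - 1
      = \lambda_0 + \lambda_1\, i + \lambda_2\,(2047 - i) + \cdots,
      \qquad \lambda_j \ge 0,
    \]
    % so the literal constant 2047 lands in the ILP after coefficient
    % matching. Restating the bound as i <= N - 1 for a fresh parameter N
    % (instantiated to 2048 only at code generation) keeps the solver's
    % coefficients small, at the price of dropping schedules that relied
    % on the exact numeric bound.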