
    Extended Technique for Lattice-Based Memory Allocation

    This work extends lattice-based memory allocation, an earlier work on memory (array) reuse analysis. The main motivation is to handle in a better way the more general forms of specifications we see today, e.g., with loop tiling, pipelining, and other forms of parallelism available in explicitly parallel languages. Our extension has two complementary aspects. We show how to handle more general specifications where conflicting constraints (those that describe the array indices that cannot share the same location) are specified as a (non-convex) union of polyhedra. Unlike convex specifications, this also requires being able to choose suitable directions (or basis) of array reuse. For that, we extend two dual approaches, previously proposed for a fixed basis, into optimization schemes to select a suitable basis. Our final approach relies on a combination of the two, also revealing their links with, on one hand, the construction of multi-dimensional schedules for parallelism and tiling (but with a fundamental difference that we identify) and, on the other hand, the construction of universal occupancy vectors (UOV), which were so far only used in a specific context, that of schedule-independent mapping.
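The modular-mapping idea behind lattice-based allocation can be sketched in one dimension: two array cells conflict when their values are live at the same time, and a mapping sigma(i) = i mod m is valid exactly when no conflicting pair collides. The function name and the stencil-style conflict set below are invented for illustration; the paper works with multi-dimensional lattices and unions of polyhedra, not this brute-force search.

```python
# Minimal 1-D sketch of lattice-based (modular) memory reuse, assuming
# conflicts are given as explicit index pairs.  sigma(i) = i mod m is a
# valid storage mapping iff m does not divide (i - j) for any conflict.

def smallest_valid_modulus(conflicts):
    """Return the smallest m such that i mod m != j mod m for all conflicts."""
    m = 1
    while any((i - j) % m == 0 for i, j in conflicts):
        m += 1
    return m

# Cells at distance 1 or 2 conflict, as in a 3-point stencil sweep:
conflicts = [(i, i + d) for i in range(10) for d in (1, 2)]
print(smallest_valid_modulus(conflicts))  # 3: a circular buffer of 3 cells suffices
```

The search direction chosen here (the canonical axis) is exactly what the paper generalizes: with non-convex conflict sets, a good basis of reuse directions must itself be optimized, not fixed in advance.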

    Power-Aware Memory Allocation for Embedded Data-Intensive Signal Processing Applications

    Many signal processing systems, particularly in the multimedia and telecommunication domains, are synthesized to execute data-intensive applications: their cost-related aspects, namely power consumption and chip area, are heavily influenced, if not dominated, by the data access and storage aspects. This chapter presents a power-aware memory allocation methodology. Starting from the high-level behavioral specification of a given application, this framework performs the assignment of the multidimensional signals to the memory layers, the on-chip scratch-pad memory and the off-chip main memory, the goal being the reduction of the dynamic energy consumption in the memory subsystem. Based on the assignment results, the framework subsequently performs the mapping of signals into the memory layers such that the overall amount of data storage is reduced. This software system yields a complete allocation solution: the exact storage amount on each memory layer, the mapping functions that determine the exact locations for any array element (scalar signal) in the specification, and, in addition, an estimation of the dynamic energy consumption in the memory subsystem.

    Liveness Analysis in Explicitly-Parallel Programs

    In this paper, we revisit scalar and array element-wise liveness analysis for programs with parallel specifications. In earlier work on memory allocation/contraction (register allocation or intra- and inter-array reuse in the polyhedral model), a notion of "time" or a total order among the iteration points was used to compute the liveness of values. In general, the execution of parallel programs is not a total order, and hence the notion of time is not applicable. We first revise how conflicts are computed by using ideas from liveness analysis for register allocation, studying the structure of the corresponding conflict/interference graphs. Instead of considering the conflict between two live ranges, we only consider the conflict between a live range and a write. This simplifies the formulation from having four instances involved in the test down to three, and also improves the precision of the analysis in the general case. We then extend the liveness analysis to work with partial orders so that it can be applied to many different parallel languages/specifications with different forms of parallelism. An important result is that the complement of the conflict graph with partial orders is directly connected to memory reuse, even in the presence of races. However, programs with conditionals do not always define a partial order, and our next step will be to handle such cases with more accuracy.
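The live-range-versus-write test the abstract describes can be sketched in the simple sequential (totally ordered) case: two cells conflict when one is written at a time strictly inside the other's live range. The cell names, times, and function below are illustrative, not the paper's formulation, which works over partial orders.

```python
# Sequential sketch of the "live range vs. write" conflict test, assuming a
# total order on times.  Cell b's write landing strictly inside cell a's
# live range [def(a), last_use(a)] means a and b cannot share storage.

def conflicts(live_range, writes):
    """Return pairs (a, b) such that b is written while a is live."""
    out = set()
    for a, (start, end) in live_range.items():
        for b, t in writes.items():
            if a != b and start < t < end:
                out.add((a, b))
    return out

live = {"A[0]": (0, 4), "A[1]": (2, 6)}   # (definition time, last-use time)
wr = {"A[0]": 0, "A[1]": 2}               # write (definition) times
print(sorted(conflicts(live, wr)))        # [('A[0]', 'A[1]')]
```

Note the asymmetry: only one live range and one write are compared, which is what reduces the four-instance test between two full live ranges down to three instances.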

    Lightweight Array Contraction by Trace-Based Polyhedral Analysis

    Array contraction is a compilation optimization used to reduce memory consumption by reducing the size of temporary arrays in a program while preserving its correctness. The usual approach to this problem is to perform a static analysis of the given program, creating overhead in the compilation cycle. In this work, we exploit execution traces of programs of the polyhedral model in order to infer reduced sizes for the temporary arrays used during calculations. We designed a four-step process to reduce the storage requirements of a temporary array of a given scheduled program, in which we used an algorithm to deduce array access functions whose bounds are modulos of affine functions of the parameters of the program. Our results show memory reductions of an order of magnitude on several benchmark examples from PolyBench, a collection of programs from the polyhedral community. Execution time is compared to a baseline implementation of a static algorithm, and results show speed-up factors of up to 20.
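The trace-based flavor of contraction can be sketched as a replay: given a trace of writes and reads to a temporary array, find the smallest modulus m such that storing cell i at location i mod m never clobbers a value that is still to be read. The trace and the brute-force search below are invented for illustration; the paper infers modulo mappings with affine bounds from real PolyBench traces rather than searching like this.

```python
# Sketch of trace-based contraction, assuming the trace is a list of
# ("W", index) / ("R", index) events in execution order.

def min_modulus(trace):
    """Smallest m such that the modulo-m buffer never overwrites a live value."""
    last_read = {}
    for t, (op, i) in enumerate(trace):
        if op == "R":
            last_read[i] = t
    full = 1 + max(i for _, i in trace)       # original array size
    for m in range(1, full + 1):
        store = {}                            # location -> (cell, its last read)
        ok = True
        for t, (op, i) in enumerate(trace):
            if op == "W":
                old = store.get(i % m)
                if old and old[1] >= t:       # would overwrite a live value
                    ok = False
                    break
                store[i % m] = (i, last_read.get(i, -1))
        if ok:
            return m
    return full

# Producer/consumer where each cell dies two writes after its definition:
trace = [("W", 0), ("W", 1), ("R", 0), ("W", 2), ("R", 1), ("W", 3), ("R", 2), ("R", 3)]
print(min_modulus(trace))  # 2: the 4-cell temporary contracts to 2 cells
```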

    Automatic Storage Optimization for Arrays

    Efficient memory allocation is crucial for data-intensive applications, as a smaller memory footprint ensures better cache performance and allows one to run a larger problem size given a fixed amount of main memory. In this paper, we describe a new automatic storage optimization technique to minimize the dimensionality and storage requirements of arrays used in sequences of loop nests with a predetermined schedule. We formulate the problem of intra-array storage optimization as one of finding the right storage partitioning hyperplanes: each storage partition corresponds to a single storage location. Our heuristic is driven by a dual objective function that minimizes both the dimensionality of the mapping and the extents along those dimensions. The technique is dimension-optimal for most codes encountered in practice. The storage requirements of the mappings obtained are also asymptotically better than those obtained by any existing schedule-dependent technique. Storage reduction factors and other results we report from an implementation of our technique demonstrate its effectiveness on several real-world examples drawn from the domains of image processing, stencil computations, and high-performance computing, and on the class of tiled codes in general.
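A storage partitioning hyperplane can be sketched as a validity check: a hyperplane h with modulus m maps cell v to (h . v) mod m, and the mapping is valid when no conflict difference vector d satisfies h . d == 0 (mod m), i.e. no two conflicting cells fall into the same partition. The hyperplanes, modulus, and conflict set below are illustrative toys, not the paper's heuristic for choosing them.

```python
# Toy validity test for intra-array storage partitioning hyperplanes,
# assuming conflicts are summarized by integer difference vectors.

def valid(h, m, conflict_diffs):
    """True iff hyperplane h with modulus m separates every conflict vector."""
    return all(sum(a * b for a, b in zip(h, d)) % m != 0 for d in conflict_diffs)

# 2-D stencil-like conflicts: horizontally and vertically adjacent cells
# are live together, so their difference vectors are (1, 0) and (0, 1).
diffs = [(1, 0), (0, 1)]
print(valid((1, 1), 2, diffs))   # True: the diagonal hyperplane needs only 2 locations
print(valid((1, 0), 2, diffs))   # False: (0, 1) lands in a single partition
```

The dual objective in the abstract corresponds to preferring few such hyperplanes (low mapping dimensionality) with small moduli (small extents).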

    Array size computation under uniform overlapping and irregular accesses

    The size required to store an array is crucial for an embedded system, as it affects the memory size, the energy per memory access, and the overall system cost. Existing techniques for finding the minimum number of resources required to store an array are less efficient for codes with large loops and irregularly occurring memory accesses: they either approximate the accessed parts of the array, leading to an overestimation of the required resources, or their exploration time grows with the number of distinct accessed parts of the array. We propose a methodology to compute the minimum resources required for storing an array that keeps the exploration time low and provides a near-optimal result for both regularly and irregularly occurring memory accesses and for overlapping writes and reads.
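The quantity such methodologies bound is the maximum number of array cells that are simultaneously live, since that is a lower bound on the storage the array needs. A replay over a hypothetical access trace (invented here, and far simpler than the paper's symbolic setting) computes it directly:

```python
# Sketch of the storage lower bound, assuming an explicit trace of
# ("W", index) / ("R", index) events: a cell is live from its write
# to its last read, and peak liveness bounds the required storage.

def max_live(trace):
    """Maximum number of simultaneously live cells over the trace."""
    last_read = {}
    for t, (op, i) in enumerate(trace):
        if op == "R":
            last_read[i] = t
    live, peak = set(), 0
    for t, (op, i) in enumerate(trace):
        if op == "W":
            live.add(i)
            peak = max(peak, len(live))
        elif op == "R" and last_read.get(i) == t:
            live.discard(i)               # cell dies at its last read
    return peak

trace = [("W", 0), ("W", 1), ("R", 0), ("W", 2), ("R", 1), ("R", 2)]
print(max_live(trace))  # 2: at most two cells are live at once
```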

    Towards Trace-Based Array Contraction

    Array contraction is a compilation optimization used to reduce memory consumption by shrinking the size of temporary arrays while preserving correctness. The usual approach to this problem is to perform a static analysis of the given program, creating overhead in the compilation cycle. In this report, we exploit execution traces of programs of the polyhedral model in order to infer reduced sizes for the temporary arrays used during calculations. We designed a five-step process to reduce the storage requirements of a temporary array of a given scheduled program, in which we used an algorithm to deduce array access functions whose bounds are modulos of affine functions of the parameters and counters of the program. Our preliminary results show reductions of an order of magnitude on several benchmark examples from the polyhedral community.