
    On the Complexity of Spill Everywhere under SSA Form

    Compilation for embedded processors can be either aggressive (time-consuming cross-compilation) or just-in-time (embedded and usually dynamic). The heuristics used in dynamic compilation are highly constrained by limited resources, time and memory in particular. Recent results on the SSA form open promising directions for the design of new register allocation heuristics for embedded systems, and especially for embedded compilation. In particular, heuristics based on tree scan with two separate phases -- one for spilling, then one for coloring/coalescing -- are good candidates for designing memory-friendly, fast, and competitive register allocators. Still, also because of its side effect on power consumption, minimizing the load and store overhead (the spilling problem) remains an important issue. This paper provides an exhaustive study of the complexity of the ``spill everywhere'' problem in the context of the SSA form. Unfortunately, contrary to our initial hopes, many of the questions we raised lead to NP-completeness results. We identify some polynomial cases, but they are impractical in a JIT context. Nevertheless, they can give hints to simplify formulations for the design of aggressive allocators.
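    To make the problem concrete, here is a minimal sketch of one greedy "spill everywhere" heuristic over interval live ranges. It is our own illustration, not the paper's algorithm: the weight model and the interval simplification (live ranges as intervals over totally ordered program points, which under SSA holds only locally, e.g., per basic block) are assumptions.

```python
# Minimal sketch of a greedy "spill everywhere" decision on interval
# live ranges. Assumptions (ours, not the paper's): live ranges are
# intervals over totally ordered program points, each variable v has a
# spill weight weights[v] (estimated load/store overhead), and k
# registers are available.

def spill_everywhere(live_ranges, weights, k):
    """live_ranges: dict var -> (start, end); returns the set of
    variables spilled everywhere (load before each use, store after
    each definition). Greedy sweep: whenever more than k variables are
    live, spill the cheapest one. A heuristic only -- the paper shows
    the weighted optimization problem is NP-complete in general."""
    events = sorted(live_ranges.items(), key=lambda kv: kv[1][0])
    active, spilled = [], set()
    for var, (start, end) in events:
        # Retire intervals that ended before this point.
        active = [v for v in active if live_ranges[v][1] >= start]
        active.append(var)
        if len(active) > k:
            victim = min(active, key=lambda v: weights[v])
            spilled.add(victim)
            active.remove(victim)
    return spilled

print(spill_everywhere({"a": (0, 5), "b": (1, 3), "c": (2, 4)},
                       {"a": 3, "b": 1, "c": 2}, k=2))  # -> {'b'}
```

    The sweep runs in near-linear time, which matches the JIT constraints the paper targets; the NP-completeness results concern finding an optimal spill set, not such heuristics.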

    Analytical subcellular fractionation of cultivated mouse resident peritoneal macrophages

    Resident peritoneal macrophages of the mouse, cultivated for 3 d, have been studied by quantitative subcellular fractionation using differential centrifugation and density equilibration in linear sucrose gradients. Density equilibration experiments were carried out on untreated cytoplasmic extracts, on cytoplasmic extracts treated with digitonin or sodium pyrophosphate, and on cytoplasmic extracts derived from cells cultivated for 24 h in the presence of Triton WR-1339. The enzyme distributions obtained distinguished six typical behaviors characteristic of distinct subcellular entities. Acid α-galactosidase and other acid hydrolases displayed the highest average velocity of sedimentation and equilibrium density. Culturing in a medium containing Triton WR-1339 markedly decreased their density, most likely as a result of Triton WR-1339 accumulation within lysosomes. Cytochrome c oxidase and the sedimentable activity of malate dehydrogenase showed a narrow density distribution centered around 1.17, very similar in all experimental situations; their rate of sedimentation fell within the range expected for mitochondria. Catalase was particle-bound and exhibited structure-linked latency (80 percent); it was released in soluble and fully active form by digitonin, but this required a much higher concentration than in the case of the lysosomal enzymes. Differences relative to all the other enzymes studied suggest the existence of a particular species of organelles, distinctly smaller than mitochondria and possibly related to peroxisomes. Many enzymes were microsomal in the sense that their specific activities, but not their yields, were greater in microsomes than in the other fractions obtained by differential centrifugation. These enzymes fell into three groups according to their behavior in density equilibration experiments. NAD glycohydrolase, alkaline phosphodiesterase I, and 5'-nucleotidase had low equilibrium densities but became noticeably denser after addition of digitonin. The other microsomal enzymes were not shifted by digitonin, in particular N-acetylglucosaminyltransferase and galactosyltransferase, which otherwise equilibrated at the same position in the gradient. We assign the digitonin-sensitive enzymes to plasma membranes, and possibly to related endomembranes of the cells, and the two glycosyltransferases to elements derived from the Golgi apparatus. Finally, α-glucosidase, sulphatase C, NADH cytochrome c reductase, NADPH cytochrome c reductase, and mannosyltransferase equilibrated at a relatively high density but were shifted to lower density values after addition of sodium pyrophosphate. These properties support their association with elements derived from the endoplasmic reticulum.

    Program Analysis and Source-Level Communication Optimizations for High-Level Synthesis

    The use of hardware accelerators, e.g., with GPGPUs or customized circuits on FPGAs, is particularly interesting for accelerating data- and compute-intensive applications. However, to get high performance, it is mandatory to restructure the application code, to generate adequate communication mechanisms, and to compile the different communicating processes so that the resulting application is highly optimized, with full usage of the memory bandwidth. In the context of the high-level synthesis (HLS) of hardware accelerators, we show how to automatically generate such an optimized organization for an accelerator communicating with an external DDR memory. Our technique relies on loop tiling, the generation of pipelined processes (overlapping communications and computations), and the automatic design (synchronizations and sizes) of local buffers. Our first contribution is a program analysis that specifies the data to be read from and written to the external memory so as to reduce communications and reuse data as much as possible in the accelerator. This specification, which can be used in different contexts, handles the cases where data can be redefined in the accelerator and/or approximations are needed because of non-analyzable data accesses. Our second contribution is an optimized code generation scheme, entirely at source level, that allows us to compile all the necessary glue (the communication processes) with the same HLS tool as the computation kernel itself. Both contributions use advanced polyhedral techniques for program analysis and transformation. Experiments with Altera's C2H HLS tool show the correctness and efficiency of our technique.
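    As a rough illustration of the pipelined-process idea, here is a sketch of the classic double-buffering pattern that overlaps the transfer of one tile with the computation of another. All names (load_tile, compute, store_tile) and the threading model are hypothetical stand-ins: the paper generates such communication processes at source level for an HLS tool, not Python threads.

```python
# Sketch of the double-buffering pattern the paper automates: the load
# of tile t+1 overlaps the computation of tile t, using two ping-pong
# local buffers. load_tile/compute/store_tile are hypothetical.
import threading

def pipeline(num_tiles, load_tile, compute, store_tile):
    buf = [None, None]                  # two local buffers (ping-pong)
    buf[0] = load_tile(0)               # prologue: fetch the first tile
    results = [None] * num_tiles
    for t in range(num_tiles):
        nxt, loader = [None], None
        if t + 1 < num_tiles:           # prefetch next tile in parallel
            loader = threading.Thread(
                target=lambda: nxt.__setitem__(0, load_tile(t + 1)))
            loader.start()
        results[t] = compute(buf[t % 2])  # compute on the current buffer
        store_tile(t, results[t])         # write back (could also overlap)
        if loader:
            loader.join()
            buf[(t + 1) % 2] = nxt[0]
    return results
```

    The buffer sizes and the synchronization points (here, the join) are exactly what the paper's analysis determines automatically from the tiled iteration space.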

    Liveness Analysis in Explicitly-Parallel Programs

    In this paper, we revisit scalar and array element-wise liveness analysis for programs with parallel specifications. In earlier work on memory allocation/contraction (register allocation or intra- and inter-array reuse in the polyhedral model), a notion of ``time'', i.e., a total order among the iteration points, was used to compute the liveness of values. In general, the execution of a parallel program is not totally ordered, and hence this notion of time does not apply. We first revise how conflicts are computed, using ideas from liveness analysis for register allocation and studying the structure of the corresponding conflict/interference graphs. Instead of considering the conflict between two live ranges, we only consider the conflict between a live range and a write. This simplifies the formulation from four instances involved in the test down to three, and also improves the precision of the analysis in the general case. We then extend the liveness analysis to work with partial orders, so that it can be applied to many different parallel languages/specifications with different forms of parallelism. An important result is that the complement of the conflict graph with partial orders is directly connected to memory reuse, even in the presence of races. However, programs with conditionals do not always define a partial order, and our next step will be to handle such cases more accurately.
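    A minimal sketch of the three-instance test described above, assuming a happens-before oracle hb for the partial order (the oracle and the encoding of instances are our assumptions, not the paper's formulation):

```python
# Sketch of the live-range-versus-write conflict test under a partial
# order. hb(x, y) is an assumed happens-before oracle: True when
# operation x is ordered before operation y in every execution.

def conflicts(write_a, read_a, write_b, hb):
    """Does value a, live from its write write_a to a read read_a,
    conflict with the write write_b of another value? Only three
    instances are involved: the conflict is ruled out exactly when
    write_b provably precedes write_a or provably follows read_a."""
    return not (hb(write_b, write_a) or hb(read_a, write_b))
```

    Two values may then share storage when no write of one conflicts with a live range of the other, which echoes the complement-of-the-conflict-graph observation above.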

    Static Analysis of OpenStream Programs

    This paper studies the applicability of polyhedral techniques to the parallel language OpenStream [25]. When applicable, polyhedral techniques are invaluable for compile-time debugging and for generating efficient code well suited to a target architecture. OpenStream is a two-level language in which a control program directs the initialization of parallel task instances that communicate through streams, with possibly multiple writers and readers. It has a fairly complex semantics in its most general setting, but we restrict ourselves to the case where the control program is sequential, which is representative of the majority of OpenStream applications. This restriction offers deterministic concurrency by construction, but deadlocks are still possible. We show that, if the control program is polyhedral, one may statically compute, for each task instance, the read and write indices into each of its streams, and thus reason statically about the dependences among task instances (the only scheduling constraints in this polyhedral subset). These indices may be polynomials of arbitrary degree, which requires extending the standard polyhedral techniques for dependence analysis, scheduling, and deadlock detection to polynomials. Modern SMT solvers can handle polynomial problems, albeit with no guarantee of success; the approach of Feautrier [10] may offer an alternative solution. We also establish two important results related to deadlocks in OpenStream: 1) a characterization of deadlocks in terms of dependence paths, which implies that streams can be safely bounded as soon as a schedule exists with such sizes, and 2) a proof that deadlock detection is undecidable, even for polyhedral OpenStream.
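    To see why polynomial indices arise, consider a toy model (ours, not OpenStream syntax) in which a sequential control loop creates task instances that each append a burst of elements to a stream; the write offset of an instance is the running sum of earlier bursts:

```python
# Toy model of stream write indices in a sequential control program:
# instance t writes the half-open index range [offset_t, offset_t + b_t)
# where offset_t is the running sum of earlier bursts. With bursts
# affine in the loop counters, these offsets become polynomials.

def write_ranges(bursts):
    """bursts[t] = number of elements instance t writes; returns the
    index range each instance writes into the stream."""
    ranges, offset = [], 0
    for b in bursts:
        ranges.append((offset, offset + b))
        offset += b
    return ranges

# Control loop "for i in range(4): create task writing (i+1) elements":
# the offsets are the prefix sums 0, 1, 3, 6, i.e., i*(i+1)/2, a
# degree-2 polynomial in i -- which is why the standard (affine)
# polyhedral machinery must be extended.
print(write_ranges([i + 1 for i in range(4)]))
# -> [(0, 1), (1, 3), (3, 6), (6, 10)]
```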

    Multi-dimensional Rankings, Program Termination, and Complexity Bounds of Flowchart Programs

    Proving the termination of a flowchart program can be done by exhibiting a ranking function, i.e., a function from the program states to a well-founded set that strictly decreases at each program step. A standard method to automatically generate such a function is to compute invariants for each program point and to search for a ranking in a restricted class of functions that can be handled with linear programming techniques. Previous algorithms based on affine rankings either are applicable only to simple loops (i.e., single-node flowcharts) and rely on enumeration, or are not complete in the sense that they are not guaranteed to find a ranking in the class of functions they consider, even if one exists. Our first contribution is an efficient algorithm to compute ranking functions: it can handle flowcharts of arbitrary structure, the class of candidate rankings it explores is larger, and our method, although greedy, is provably complete. Our second contribution is to show how to use the ranking functions we generate to derive upper bounds on the computational complexity (number of transitions) of the source program. This estimate is a polynomial, which means that we can handle programs with more than linear complexity. We applied the method to a collection of test cases from the literature. We also show the links and differences with previous techniques based on the insertion of counters.
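    As a worked toy example (the program, the ranking, and the checking harness are ours), consider a two-level nest where the lexicographic ranking rho(i, j) = (i, j) over N^2 strictly decreases on every transition; summing over the iteration space gives the polynomial transition bound:

```python
# Worked toy example of a multi-dimensional ranking. For the nest
#     while i > 0: { j = m; while j > 0: j -= 1; i -= 1 }
# the 2D ranking rho(i, j) = (i, j) into N^2, ordered lexicographically,
# strictly decreases at each step, proving termination; it also bounds
# the number of transitions by the polynomial n*(m+1).

def lex_less(a, b):
    return a < b  # Python compares tuples lexicographically

def run_and_check(n, m):
    i, steps, prev = n, 0, None
    while i > 0:
        j = m
        while j > 0:
            cur = (i, j)
            assert prev is None or lex_less(cur, prev)  # ranking decreases
            prev, j, steps = cur, j - 1, steps + 1
        cur = (i, 0)
        assert prev is None or lex_less(cur, prev)      # ranking decreases
        prev, i, steps = cur, i - 1, steps + 1
    assert steps <= n * (m + 1)   # polynomial bound from the ranking
    return steps

print(run_and_check(3, 4))  # -> 15 transitions, bound 3*(4+1) = 15
```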

    Bounding the Computational Complexity of Flowchart Programs with Multi-dimensional Rankings

    Proving the termination of a flowchart program can be done by exhibiting a ranking function, i.e., a function from the program states to a well-founded set that strictly decreases at each program step. A standard method to automatically generate such a function is to compute invariants for each program point and to search for a ranking in a restricted class of functions that can be handled with linear programming techniques. Our first contribution is an efficient algorithm to compute ranking functions: it can handle flowcharts of arbitrary structure, the class of candidate rankings it explores is larger, and our method, although greedy, is provably complete. Our second contribution is to show how to use the ranking functions we generate to derive upper bounds on the computational complexity (number of transitions) of the source program, again for flowcharts of arbitrary structure. This estimate is a polynomial, which means that we can handle programs with more than linear complexity. We applied the method to a collection of test cases from the literature. We also point out important extensions, mainly concerning the scalability of the algorithm and, in particular, the integration of techniques based on cutpoints.

    SSI Revisited

    The static single information (SSI) form, proposed by Ananian, then in a more general form by Singer, is an extension of the static single assignment (SSA) form. The latter is a well-established compiler intermediate representation that has been successfully used for numerous compiler analyses and optimizations. Several interesting results have also been shown for SSI concerning liveness analysis and the representation of live ranges of variables, which could make SSI appealing for just-in-time compilation. Unfortunately, previous literature on the SSI form is sparse and appears to be partly incorrect. Our paper corrects some of the mistakes that have been made. Our main result is a complete proof that, even for the most general definition of SSI, basic blocks, and thus program points, can be totally ordered so that the live ranges of variables correspond to intervals. This corrects the erroneous proof of Brisk and Sarrafzadeh.
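    To illustrate what the interval property buys (our sketch; the construction of the total order is the paper's result and is simply assumed given here), interference and liveness queries reduce to constant-time interval tests:

```python
# Sketch of interval-based queries under SSI: once program points admit
# a total order in which each live range is an interval [def, last_use],
# interference is a constant-time overlap test and liveness at a point
# is a simple interval stab -- no iterative dataflow needed, which is
# attractive for just-in-time compilation.

def interferes(r1, r2):
    """r1, r2: (start, end) live-range intervals in the total order."""
    return r1[0] <= r2[1] and r2[0] <= r1[1]

def live_at(ranges, p):
    """Variables live at program point p."""
    return {v for v, (s, e) in ranges.items() if s <= p <= e}

ranges = {"x": (0, 4), "y": (2, 6), "z": (5, 8)}
print(interferes(ranges["x"], ranges["y"]))  # True: overlap on [2, 4]
print(live_at(ranges, 5))                    # {'y', 'z'}
```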