
    Removing and restoring control flow with the Value State Dependence Graph

    This thesis studies the practicality of compiling with only data flow information. Specifically, we focus on the challenges that arise when using the Value State Dependence Graph (VSDG) as an intermediate representation (IR). We perform a detailed survey of IRs in the literature in order to discover trends over time, and we classify them by their features in a taxonomy. We see how the VSDG fits into the IR landscape, and look at the divide between academia and the 'real world' in terms of compiler technology. Since most data flow IRs cannot be constructed for irreducible programs, we perform an empirical study of irreducibility in current versions of open source software, and then compare them with older versions of the same software. We also study machine-generated C code from a variety of different software tools. We show that irreducibility is no longer a problem, and is becoming less so with time. We then address the problem of constructing the VSDG. Since previous approaches in the literature have been poorly documented or ignored altogether, we give our approach to constructing the VSDG from a common IR: the Control Flow Graph. We show how our approach is independent of the source and target language, how it is able to handle unstructured control flow, and how it is able to transform irreducible programs on the fly. Once the VSDG is constructed, we implement Lawrence's proceduralisation algorithm in order to encode an evaluation strategy whilst translating the program into a parallel representation: the Program Dependence Graph. From here, we implement scheduling and then code generation using the LLVM compiler. We compare our compiler framework against several existing compilers, and show how removing control flow with the VSDG and then restoring it later can produce high quality code. We also examine specific situations where the VSDG can put pressure on existing code generators. Our results show that the VSDG represents a radically different, yet practical, approach to compilation.
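
    A rough illustration of the data-flow-only view the thesis argues for is sketched below; it is a minimal sketch with invented class and field names, not the thesis's data structures, showing how value edges and state edges replace an explicit control flow graph.

```python
# A minimal sketch, not the thesis's implementation: the VSDG records value
# dependencies (operands) and state dependencies (side-effect ordering), and
# no explicit control flow at all. Class and field names are illustrative.
from dataclasses import dataclass, field
from typing import List

@dataclass
class Node:
    op: str
    value_deps: List["Node"] = field(default_factory=list)  # data flow edges
    state_deps: List["Node"] = field(default_factory=list)  # state (ordering) edges

# x = load p; y = x + 1; store q, y -- the only ordering constraint between
# the load and the store is the state edge, not a basic-block sequence.
load = Node("load p")
add = Node("add 1", value_deps=[load])
store = Node("store q", value_deps=[add], state_deps=[load])
print(len(store.value_deps), len(store.state_deps))  # -> 1 1
```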

    RVSDG: An Intermediate Representation for Optimizing Compilers

    Intermediate Representations (IRs) are central to optimizing compilers as the way the program is represented may enhance or limit analyses and transformations. Suitable IRs focus on exposing the most relevant information and establish invariants that different compiler passes can rely on. While control-flow centric IRs appear to be a natural fit for imperative programming languages, analyses required by compilers have increasingly shifted to understand data dependencies and work at multiple abstraction layers at the same time. This is partially evidenced in recent developments such as MLIR, proposed by Google. However, rigorous use of data flow centric IRs in general purpose compilers has not been evaluated for feasibility and usability, as previous works provide no practical implementations. We present the Regionalized Value State Dependence Graph (RVSDG) IR for optimizing compilers. The RVSDG is a data flow centric IR where nodes represent computations, edges represent computational dependencies, and regions capture the hierarchical structure of programs. It represents programs in demand-dependence form, implicitly supports structured control flow, and models entire programs within a single IR. We provide a complete specification of the RVSDG, construction and destruction methods, as well as exemplify its utility by presenting Dead Node and Common Node Elimination optimizations. We implemented a prototype compiler and evaluated it in terms of performance, code size, compilation time, and representational overhead. Our results indicate that the RVSDG can serve as a competitive IR in optimizing compilers while reducing complexity.
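
    The sketch below is a hedged illustration of the regionalized structure described above, with invented class names and an illustrative conditional node; it is not the paper's specification or implementation.

```python
# Hedged structural sketch only; names (Region, RNode, "gamma") are
# illustrative and not the paper's API. Simple nodes carry an operation,
# structured constructs own nested regions, and control flow is implicit
# in that nesting rather than in branch edges.
from dataclasses import dataclass, field
from typing import List

@dataclass
class Region:
    arguments: List[str] = field(default_factory=list)
    nodes: List["RNode"] = field(default_factory=list)
    results: List[str] = field(default_factory=list)

@dataclass
class RNode:
    op: str
    inputs: List[str] = field(default_factory=list)
    outputs: List[str] = field(default_factory=list)
    subregions: List[Region] = field(default_factory=list)  # empty for plain ops

# An if/else becomes one node whose two subregions hold the branches.
then_r = Region(arguments=["x"], nodes=[RNode("add 1", ["x"], ["r"])], results=["r"])
else_r = Region(arguments=["x"], nodes=[RNode("sub 1", ["x"], ["r"])], results=["r"])
cond = RNode("gamma", inputs=["p", "x"], outputs=["r"], subregions=[then_r, else_r])
print(len(cond.subregions))  # -> 2
```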

    Future value based single assignment program representations and optimizations

    An optimizing compiler's internal representation fundamentally affects the clarity, efficiency and feasibility of the optimization algorithms employed by the compiler. Static Single Assignment (SSA), a state-of-the-art program representation, has great advantages, though it can still be improved. This dissertation explores the domain of single assignment beyond SSA and presents two novel program representations: Future Gated Single Assignment (FGSA) and Recursive Future Predicated Form (RFPF). Both FGSA and RFPF embed control flow and data flow information, enabling efficient traversal of program information and thus leading to better and simpler optimizations. We introduce the future value concept, the design basis of both FGSA and RFPF, which permits a consumer instruction to be encountered before the producer of its source operand(s) in a control flow setting. We show that FGSA is efficiently computable by using a series of T1/T2/TR transformations, yielding an expected linear time algorithm that combines the construction of the pruned single assignment form with live analysis for both reducible and irreducible graphs. The approach yields an average reduction of 7.7%, with a maximum of 67%, in the number of gating functions compared to the pruned SSA form on the SPEC2000 benchmark suite. We present a solid and near-optimal framework for performing the inverse transformation from single assignment programs. We demonstrate the importance of unrestricted code motion and present RFPF. We develop algorithms which enable instruction movement in acyclic as well as cyclic regions, and show the ease of performing optimizations such as Partial Redundancy Elimination on RFPF.
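
    The sketch below illustrates the future value idea in isolation, under the assumption of a simple placeholder-and-back-patch mechanism; the names and helper functions are hypothetical and do not reproduce the dissertation's construction algorithm.

```python
# Hypothetical sketch of the mechanics only, not the dissertation's algorithm:
# a use reached before its producer (e.g. along a back edge) is handed a
# placeholder "future value" that is bound once the definition is reached.
class FutureValue:
    def __init__(self, name):
        self.name = name
        self.producer = None  # unknown until the defining instruction is seen

pending = {}

def use(name):
    # consumer encountered first: return (or create) the placeholder
    return pending.setdefault(name, FutureValue(name))

def define(name, producer):
    # producer encountered later: back-patch every earlier use of the value
    use(name).producer = producer

fv = use("x")
define("x", "x1 = gate(p, x0, x2)")
print(fv.producer)  # -> x1 = gate(p, x0, x2)
```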

    An abstract interpretation for SPMD divergence on reducible control flow graphs

    Vectorizing compilers employ divergence analysis to detect at which program point a specific variable is uniform, i.e. has the same value on all SPMD threads that execute this program point. They exploit uniformity to retain branching to counter branch divergence and to defer computations to scalar processor units. Divergence is a hyper-property and is closely related to non-interference and binding time. Several divergence, binding time, and non-interference analyses already exist, but they either sacrifice precision or place significant restrictions on the syntactic structure of the program in order to achieve soundness. In this paper, we present the first abstract interpretation for uniformity that is general enough to be applicable to reducible CFGs and, at the same time, more precise than other analyses that achieve at least the same generality. Our analysis comes with a correctness proof that is to a large part mechanized in Coq. Our experimental evaluation shows that the compile time and the precision of our analysis are on par with LLVM's default divergence analysis, which is only sound on more restricted CFGs. At the same time, our analysis is faster and achieves better precision than a state-of-the-art non-interference analysis that is sound and at least as general as our analysis.
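
    The following is a minimal sketch of a uniformity domain of the kind the abstract refers to, assuming a two-element lattice; it omits the control-induced divergence handling that the paper's analysis provides.

```python
# Minimal sketch assuming a two-element lattice (Uniform below Varying); the
# analysis in the paper additionally tracks divergence induced by branches on
# varying conditions, which this sketch omits.
UNIFORM, VARYING = "uniform", "varying"

def join(a, b):
    # least upper bound: anything joined with Varying is Varying
    return UNIFORM if a == b == UNIFORM else VARYING

def transfer_binop(lhs, rhs):
    # a pure operation is uniform only if all of its operands are uniform
    return join(lhs, rhs)

env = {"c": UNIFORM, "tid": VARYING}            # tid: per-thread id, hence varying
env["x"] = transfer_binop(env["c"], env["c"])   # -> uniform
env["y"] = transfer_binop(env["tid"], env["c"]) # -> varying
print(env["x"], env["y"])
```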

    ACC Saturator: Automatic Kernel Optimization for Directive-Based GPU Code

    Automatic code optimization is a complex process that typically involves the application of multiple discrete algorithms that modify the program structure irreversibly. However, the design of these algorithms is often monolithic, and, due to the lack of cooperation between them, similar analyses have to be implemented repeatedly. To address this issue, modern optimization techniques such as equality saturation allow exhaustive term rewriting at various levels of the input, thereby simplifying compiler design. In this paper, we propose equality saturation to optimize sequential codes utilized in directive-based programming for GPUs. Our approach simultaneously realizes less computation, less memory access, and high memory throughput. Our fully automated framework constructs single-assignment forms from the input so that it can be rewritten in its entirety while preserving dependencies, and then extracts the optimal cases. Through practical benchmarks, we demonstrate a significant performance improvement across several compilers. Furthermore, we highlight the advantages of computational reordering and emphasize the significance of memory-access order for modern GPUs.
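
    The sketch below illustrates the equality-saturation workflow the abstract builds on, under the simplifying assumption that all equivalent forms are stored explicitly rather than in an e-graph; the rewrite rules and cost model are invented for the example.

```python
# Simplified sketch: every equivalent form is kept explicitly (a real
# implementation uses e-graphs to share subterms), two toy rewrite rules are
# applied to a fixed point, and the cheapest form is extracted. Rules and
# cost model are invented for illustration.
def rewrites(expr):
    out = set()
    if isinstance(expr, tuple) and expr[0] == "*":
        _, a, b = expr
        out.add(("*", b, a))          # commutativity
        if b == 2:
            out.add(("+", a, a))      # x * 2  ->  x + x
    return out

def saturate(seed):
    forms = {seed}
    while True:
        new = {r for e in forms for r in rewrites(e)} - forms
        if not new:
            return forms
        forms |= new

def cost(e):
    if not isinstance(e, tuple):
        return 0
    return {"*": 4, "+": 1}[e[0]] + sum(cost(a) for a in e[1:])

forms = saturate(("*", "x", 2))
print(min(forms, key=cost))  # -> ('+', 'x', 'x')
```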

    Kennedy Meadows Community Wildfire Protection Plan


    Positron Emission Tomography for the dose monitoring of intra-fractionally moving targets in ion beam therapy

    Ion beam therapy (IBT) is a promising treatment option in radiotherapy. The characteristic physical and biological properties of light ion beams allow for the delivery of highly tumour-conformal dose distributions. Owing to the sparing of surrounding healthy tissue and nearby organs at risk, it is feasible to escalate the dose in the tumour volume to reach higher tumour control and survival rates. Remarkable clinical outcomes have been achieved with IBT for radio-resistant, deep-seated, static and well-fixated tumour entities. Presumably, more patients could benefit from the advantages of IBT if it were available for more common tumour sites. Those located in the thorax and upper abdominal region are commonly subject to intra-fractional, respiration-related motion. Different motion-compensated dose delivery techniques have been developed for active field shaping with scanned pencil beams and are available, at least under experimental conditions, at the GSI Helmholtzzentrum für Schwerionenforschung (GSI) in Darmstadt, Germany. High standards for quality assurance are required in IBT to ensure a safe and precise dose application. Both underdosage in the tumour and overdosage in the normal tissue might endanger the treatment success. Since even minor unexpected anatomical changes, e.g. related to patient mispositioning, tumour shrinkage or tissue swelling, can lead to remarkable deviations between the planned and delivered dose distribution, a valuable dose monitoring system is desired for IBT. So far, positron emission tomography (PET) is the only in vivo, in situ and non-invasive qualitative dose monitoring method applied under clinical conditions. It makes use of the autoactivation of the tissue by nuclear fragmentation reactions occurring along the beam path. Among others, β+-emitting nuclides are generated and decay according to their half-life under the emission of a positron. The subsequent positron-electron annihilation creates two 511 keV photons which are emitted in opposite directions and can be detected as a coincidence event by a dedicated PET scanner. The induced three-dimensional (3D) β+-activity distribution in the patient can be reconstructed from the measured coincidences. Conclusions about the delivered dose distribution can be drawn indirectly from a comparison between two β+-activity distributions: the measured one and an expected one generated by a Monte Carlo simulation. This workflow proved valuable for dose monitoring in IBT when it was applied to about 440 patients, mainly suffering from deep-seated head and neck tumours, who were treated with 12C ions at GSI. In the presence of intra-fractional target motion, the conventional 3D PET data processing results in an inaccurate representation of the β+-activity distribution in the patient. Four-dimensional, time-resolved (4D) reconstruction algorithms adapted to the special geometry of in-beam PET scanners make it possible to compensate for the motion-related blurring artefacts. Within this thesis, a 4D maximum likelihood expectation maximization (MLEM) reconstruction algorithm has been implemented for the double-head scanner Bastei installed at GSI. The proper functionality of the algorithm and its superior performance in suppressing motion-related blurring artefacts, compared to an already applied co-registration approach, have been demonstrated by a comparative simulation study and by dedicated measurements with moving radioactive sources and irradiated targets. Dedicated phantoms, mainly made of polymethyl methacrylate (PMMA), and a motion table for regular one-dimensional (1D) motion patterns have been designed and manufactured for the experiments. Furthermore, the general applicability of the 4D MLEM algorithm to more complex motion patterns has been demonstrated by the successful reduction of motion artefacts in a measurement with rotating (two-dimensionally moving) radioactive sources. For 1D cos² and cos⁴ motion, systematic point source measurements clearly illustrate that the motion influence can be better compensated with the same number of motion phases if amplitude-sorted instead of time-sorted phases are utilized. In any case, with an appropriate parameter selection that yields a mean residual motion per phase of about half the size of a PET crystal, acceptable results have been achieved. Additionally, it has been validated that the 4D MLEM algorithm allows reliable access to the parameters relevant for dose verification in intra-fractionally moving targets (particle range as well as lateral field position and gradients), even from the intrinsically low counting statistics of IBT-PET data. To evaluate the measured β+-activity distribution, it should be compared to a simulated one that is expected from the moving target irradiation. Thus, a 4D version of the simulation software is required. It has to emulate the generation of β+-emitters under consideration of the intra-fractional motion, model their decay at motion-state-dependent coordinates, and create list-mode data streams from the simulated coincidences. Such a revised and extended version, compiled for the special geometry of the Bastei PET scanner, is presented within this thesis. The therapy control system provides information about the exact progress of the motion-compensated dose delivery. This information and the intra-fractional target motion need to be taken into account for simulating realistic β+-activity distributions. A dedicated preclinical phantom simulation study has been performed to demonstrate the correct functionality of the 4D simulation program and the necessity of the additional, motion-related input parameters. In contrast to the data evaluation for static targets, additional effort is required to avoid a potentially misleading interpretation of the 4D measured and simulated β+-activity distributions in the presence of deficient motion mitigation or data processing. It is shown that, in the presence of treatment errors, the simulation results might agree with the measurement even though the planned and delivered dose distributions differ. Conversely, deviations between the two distributions may occur that are not related to anatomical changes but to deficient 4D data processing. Recommendations are given in this thesis to optimize the 4D IBT-PET workflow and to prevent the observer from misinterpreting the dose monitoring data. In summary, the thesis contributes substantially to a potential future application of IBT-PET monitoring for intra-fractionally moving target volumes by providing the required reconstruction and simulation algorithms. Systematic examinations with more realistic, multi-directional and irregular motion patterns are required for further improvements. For a final rating of the expected benefit of 4D IBT-PET dose monitoring, future investigations should include real treatment plans, breathing curves and 4D patient CT images.
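
    The toy computation below illustrates the MLEM update at the core of the reconstruction described above; the system matrix and counts are invented, and the motion-phase handling of the 4D algorithm is omitted.

```python
# Numerical toy sketch of the MLEM update; the system matrix, bin counts and
# sizes are invented for the example. The thesis's 4D variant additionally
# sorts coincidences into motion phases and maps them to a reference motion
# state, which is omitted here.
import numpy as np

A = np.array([[1.0, 0.0],         # a_ij: probability that a decay in voxel j
              [0.5, 0.5],         #       is detected in coincidence bin i
              [0.0, 1.0]])
y = np.array([10.0, 20.0, 30.0])  # measured coincidences per bin
x = np.ones(2)                    # initial activity estimate per voxel

sensitivity = A.sum(axis=0)       # sum_i a_ij
for _ in range(50):
    expected = A @ x                              # forward projection
    x *= (A.T @ (y / expected)) / sensitivity     # multiplicative MLEM update
print(x)                          # converges towards the ML activity estimate
```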

    Technical Design Report for the PANDA Micro Vertex Detector

    This document illustrates the technical layout and the expected performance of the Micro Vertex Detector (MVD) of the PANDA experiment. The MVD will detect charged particles as close as possible to the interaction zone. Design criteria, the optimisation process and the technical solutions chosen are discussed, and the results of this process are subjected to extensive Monte Carlo physics studies. The route towards realisation of the detector is outlined.

    Compilation techniques for irregular problems on parallel machines

    Massively parallel computers have ushered in the era of teraflop computing. Even though large and powerful machines are being built, they are used by only a fraction of the computing community. The fundamental reason for this situation is that parallel machines are difficult to program. Development of compilers that automatically parallelize programs will greatly increase the use of these machines. A large class of scientific problems can be categorized as irregular computations. In this class of computation, the data access patterns are known only at runtime, which creates significant difficulties for a parallelizing compiler trying to generate efficient parallel code. Some compilers with very limited abilities to parallelize simple irregular computations exist, but the methods used by these compilers fail for any non-trivial application code. This research presents the development of compiler transformation techniques that can be used to effectively parallelize an important class of irregular programs. A central aim of these transformation techniques is to generate code that aggressively prefetches data. Program slicing methods are used as part of the code generation process. In this approach, a program written in a data-parallel language, such as HPF, is transformed so that it can be executed on a distributed memory machine. An efficient compiler runtime support system has been developed that performs data movement and software caching.
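
    The sketch below illustrates the kind of runtime prefetching the abstract describes, in the spirit of the widely used inspector/executor pattern; the helper names and the data stand-ins are invented and do not reflect the dissertation's runtime support system.

```python
# Illustrative sketch in the spirit of the inspector/executor pattern; names
# and the dictionary standing in for distributed data are invented, not the
# dissertation's runtime system. Because the pattern x[idx[i]] is only known
# at run time, an inspector pass collects the needed indices so the data can
# be gathered in bulk (software caching) before the compute loop executes.
def inspector(idx):
    return sorted(set(idx))                    # distinct elements the loop touches

def executor(idx, gathered):
    return [gathered[j] for j in idx]          # every operand is local by now

idx = [3, 0, 3, 7]                             # runtime-only indirection array
x = {j: float(j * j) for j in range(8)}        # stands in for distributed data
gathered = {j: x[j] for j in inspector(idx)}   # one bulk gather / prefetch
print(executor(idx, gathered))                 # -> [9.0, 0.0, 9.0, 49.0]
```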