1,451 research outputs found
Preference-Guided Register Assignment
Abstract. This paper deals with coalescing in SSA-based register allo-cation. Current coalescing techniques all require the interference graph to be built. This is generally considered to be too compile-time intensive for just-in-time compilation. In this paper, we present a biased coloring approach that gives results similar to standalone coalescers while signif-icantly reducing compile time.
Global Optimization for Future Gravitational Wave Detectors' Sites
We consider the optimal site selection of future generations of gravitational
wave detectors. Previously, Raffai et al. optimized a 2-detector network with a
combined figure of merit. This optimization was extended to networks with more
than two detectors in a limited way by first fixing the parameters of all other
component detectors. In this work we now present a more general optimization
that allows the locations of all detectors to be simultaneously chosen. We
follow the definition of Raffai et al. on the metric that defines the
suitability of a certain detector network. Given the locations of the component
detectors in the network, we compute a measure of the network's ability to
distinguish the polarization, constrain the sky localization and reconstruct
the parameters of a gravitational wave source. We further define the
`flexibility index' for a possible site location, by counting the number of
multi-detector networks with a sufficiently high Figure of Merit that include
that site location. We confirm the conclusion of Raffai et al., that in terms
of flexibility index as defined in this work, Australia hosts the best
candidate site to build a future generation gravitational wave detector. This
conclusion is valid for either a 3-detector network or a 5-detector network.
For a 3-detector network site locations in Northern Europe display a comparable
flexibility index to sites in Australia. However for a 5-detector network,
Australia is found to be a clearly better candidate than any other location.Comment: 30 pages, 23 figures, 2 table
Efficient and Reasonable Object-Oriented Concurrency
Making threaded programs safe and easy to reason about is one of the chief
difficulties in modern programming. This work provides an efficient execution
model for SCOOP, a concurrency approach that provides not only data race
freedom but also pre/postcondition reasoning guarantees between threads. The
extensions we propose influence both the underlying semantics to increase the
amount of concurrent execution that is possible, exclude certain classes of
deadlocks, and enable greater performance. These extensions are used as the
basis an efficient runtime and optimization pass that improve performance 15x
over a baseline implementation. This new implementation of SCOOP is also 2x
faster than other well-known safe concurrent languages. The measurements are
based on both coordination-intensive and data-manipulation-intensive benchmarks
designed to offer a mixture of workloads.Comment: Proceedings of the 10th Joint Meeting of the European Software
Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of
Software Engineering (ESEC/FSE '15). ACM, 201
Efficient optimization of memory accesses in parallel programs
The power, frequency, and memory wall problems have caused a major shift in mainstream computing by introducing processors that contain multiple low power cores. As multi-core processors are becoming ubiquitous, software trends in both parallel programming languages and dynamic compilation have added new challenges to program compilation for multi-core processors. This thesis proposes a combination of high-level and low-level compiler optimizations to address these challenges.
The high-level optimizations introduced in this thesis include new approaches to May-Happen-in-Parallel analysis and Side-Effect analysis for parallel programs and a novel parallelism-aware Scalar Replacement for Load Elimination transformation. A new Isolation Consistency (IC) memory model is described that permits several scalar replacement transformation opportunities compared to many existing memory models.
The low-level optimizations include a novel approach to register allocation that retains the compile time and space efficiency of Linear Scan, while delivering runtime performance superior to both Linear Scan and Graph Coloring. The allocation phase is modeled as an optimization problem on a Bipartite Liveness Graph (BLG) data structure. The assignment phase focuses on reducing the number of spill instructions by using register-to-register move and exchange instructions wherever possible.
Experimental evaluations of our scalar replacement for load elimination transformation in the Jikes RVM dynamic compiler show decreases in dynamic counts for getfield operations of up to 99.99%, and performance improvements of up to 1.76x on 1 core, and 1.39x on 16 cores, when compared with the load elimination algorithm available in Jikes RVM. A prototype implementation of our BLG register allocator in Jikes RVM demonstrates runtime performance improvements of up to 3.52x relative to Linear Scan on an x86 processor. When compared to Graph Coloring register allocator in the GCC compiler framework, our allocator resulted in an execution time improvement of up to 5.8%, with an average improvement of 2.3% on a POWER5 processor.
With the experimental evaluations combined with the foundations presented in this thesis, we believe that the proposed high-level and low-level optimizations are useful in addressing some of the new challenges emerging in the optimization of parallel programs for multi-core architectures
A Polynomial Spilling Heuristic: Layered Allocation
Register allocation is one of the most important, and one of the oldest compiler optimizations. Its purpose is to map temporary variables to either machine registers or main memory locations and explicit load/store instructions. The latter option is referred to as spilling. This paper addresses the minimization of the spill code overhead, one of the di fficult problems in register allocation. We devised a heuristic approach called layered. It is rooted in the recent advances in SSA-based register allocation. As opposed to the conventional incremental spilling approaches, our method incrementally allocates clusters of variables. We describe a new polynomial method, the layered-optimal allocator, and demonstrate its quasi-optimiality on standard benchmarks and on two architectures.L'allocation de registres est l'une des premières et des plus importantes optimisations effectuées par les compilateurs. Elle a pour but d'associer aux variables temporaires du programme des registres de la machine ou des locations mémoires et d'insérer, dans le code, des instructions de load/store explicites, appelées vidage. Dans ce papier, nous nous intéressons à la minimisation des latences mémoires dues au code de vidage, un des problèmes difficiles en allocation de registres. Nous proposons une approche heuristique d'allocation par couches. Ce travail se base sur les récentes avancées en allocation de registres sous SSA. Contrairement à l'approche conventionnelle de vidage incrémental, notre méthode alloue les variables de manière incrémentale par groupe. Nous comparons notre approche, appelée allocation-optimale par couche, aux méthodes de l'état de l'art à une approche optimale et nous montrons l'allocation-optimale par couche est quasi-optimale sur des benchmarks standard et sur deux architectures différentes
Physics, Astrophysics and Cosmology with Gravitational Waves
Gravitational wave detectors are already operating at interesting sensitivity
levels, and they have an upgrade path that should result in secure detections
by 2014. We review the physics of gravitational waves, how they interact with
detectors (bars and interferometers), and how these detectors operate. We study
the most likely sources of gravitational waves and review the data analysis
methods that are used to extract their signals from detector noise. Then we
consider the consequences of gravitational wave detections and observations for
physics, astrophysics, and cosmology.Comment: 137 pages, 16 figures, Published version
<http://www.livingreviews.org/lrr-2009-2
Optimizing Local Memory Allocation and Assignment Through a Decoupled Approach
International audienceSoftware-controlled local memories (LMs) are widely used to provide fast, scalable, power efficient and predictable access to critical data. While many studies addressed LM management, keeping hot data in the LM continues to cause major headache. This paper revisits LM management of arrays in light of recent progresses in register allocation, supporting multiple live-range splitting schemes through a generic integer linear program. These schemes differ in the grain of decision points. The model can also be extended to address fragmentation, assigning live ranges to precise offsets. We show that the links between LM management and register allocation have been underexploited, leaving much fundamental questions open and effective applications to be explored
A new approach to reversible computing with applications to speculative parallel simulation
In this thesis, we propose an innovative approach to reversible computing that shifts the focus from the operations to the memory outcome of a generic program. This choice allows us to overcome some typical challenges of "plain" reversible computing. Our methodology is to instrument a generic application with the help of an instrumentation tool, namely Hijacker, which we have redesigned and developed for the purpose. Through compile-time instrumentation, we enhance the program's code to keep track of the memory trace it produces until the end. Regardless of the complexity behind the generation of each computational step of the program, we can build inverse machine instructions just by inspecting the instruction that is attempting to write some value to memory. Therefore from this information, we craft an ad-hoc instruction that conveys this old value and the knowledge of where to replace it.
This instruction will become part of a more comprehensive structure, namely the reverse window. Through this structure, we have sufficient information to cancel all the updates done by the generic program during its execution.
In this writing, we will discuss the structure of the reverse window, as the building block for the whole reversing framework we designed and finally realized. Albeit we settle our solution in the specific context of the parallel discrete event simulation (PDES) adopting the Time Warp synchronization protocol, this framework paves the way for further general-purpose development and employment. We also present two additional innovative contributions coming from our innovative reversibility approach, both of them still embrace traditional state saving-based rollback strategy. The first contribution aims to harness the advantages of both the possible approaches. We implement the rollback operation combining state saving together with our reversible support through a mathematical model. This model enables the system to choose in autonomicity the best rollback strategy, by the mutable runtime dynamics of programs. The second contribution explores an orthogonal direction, still related to reversible computing aspects. In particular, we will address the problem of reversing shared libraries. Indeed, leading from their nature, shared objects are visible to the whole system and so does every possible external modification of their code. As a consequence, it is not possible to instrument them without affecting other unaware applications. We propose a different method to deal with the instrumentation of shared objects.
All our innovative proposals have been assessed using the last generation of the open source ROOT-Sim PDES platform, where we integrated our solutions. ROOT-Sim is a C-based package implementing a general purpose simulation environment based on the Time Warp synchronization protocol
- …