1,296 research outputs found
An efficient global resource constrained technique for exploiting instruction level parallelism
A new Global Resource-constrained Percolation (GRiP) scheduling technique is presented for exploiting instruction level parallelism. Other techniques that have been proposed either have been prohibitively expensive in terms of computation or have limited parallelism. The GRiP technique has been implemented and simulation results are presented
Modulo scheduling with reduced register pressure
Software pipelining is a scheduling technique that is used by some product compilers in order to expose more instruction level parallelism out of innermost loops. Modulo scheduling refers to a class of algorithms for software pipelining. Most previous research on modulo scheduling has focused on reducing the number of cycles between the initiation of consecutive iterations (which is termed II) but has not considered the effect of the register pressure of the produced schedules. The register pressure increases as the instruction level parallelism increases. When the register requirements of a schedule are higher than the available number of registers, the loop must be rescheduled, perhaps with a higher II. Therefore, the register pressure has an important impact on the performance of a schedule. This paper presents a novel heuristic modulo scheduling strategy that tries to generate schedules with the lowest II and, from all the possible schedules with such II, tries to select the one with the lowest register requirements. The proposed method has been implemented in an experimental compiler and has been tested on the Perfect Club benchmarks. The results show that the proposed method achieves an optimal II for at least 97.5 percent of the loops and its compilation time is comparable to a conventional top-down approach, whereas the register requirements are lower. In addition, the proposed method is compared with some other existing methods. The results indicate that the proposed method performs better than other heuristic methods and almost as well as linear programming methods, which obtain optimal solutions but are impractical for product compilers because their computing cost grows exponentially with the number of operations in the loop body. Peer Reviewed. Postprint (published version).
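The two quantities this abstract trades off, the initiation interval (II) and the register pressure, can be illustrated with a minimal Python sketch. It is not the paper's heuristic. The function names, the resource table, and the lifetime intervals are all hypothetical, and register pressure is approximated by the common "MaxLive" measure (the peak number of values live in any cycle of the steady state).

```python
from math import ceil

def res_mii(op_counts, unit_counts):
    """Resource-constrained lower bound on II (hypothetical helper).

    op_counts[r]   = operations per iteration that need resource r
    unit_counts[r] = functional units of kind r available per cycle
    """
    return max(ceil(op_counts[r] / unit_counts[r]) for r in op_counts)

def max_live(lifetimes, ii):
    """Peak number of simultaneously live values in the steady state.

    Each lifetime is a half-open interval (start, end) in the schedule
    of one iteration. Iterations overlap every `ii` cycles, so every
    cycle of a lifetime is folded onto slot `cycle % ii`.
    """
    live = [0] * ii
    for start, end in lifetimes:
        for cycle in range(start, end):
            live[cycle % ii] += 1
    return max(live)

# Made-up loop body: 4 ALU ops on 2 ALUs, 3 memory ops on 1 port.
ii = res_mii({"alu": 4, "mem": 3}, {"alu": 2, "mem": 1})

# Two made-up schedules with the same II but different value lifetimes.
schedule_a = [(0, 4), (1, 3), (2, 7)]
schedule_b = [(0, 2), (1, 3), (2, 5)]
print(ii, max_live(schedule_a, ii), max_live(schedule_b, ii))
```

Under these assumptions both schedules reach the same II, but the second keeps values alive for fewer cycles and so needs fewer registers. That is the kind of tie a register-sensitive scheduler tries to break in favour of the cheaper schedule.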
Percolation scheduling with resource constraints
This paper presents a new approach to resource-constrained compiler extraction of fine-grain parallelism, targeted towards VLIW supercomputers, and in particular, the IBM VLIW (Very Large Instruction Word) processor. The algorithms described integrate resource limitations into Percolation Scheduling—a global parallelization technique—to deal with resource constraints, without sacrificing the generality and completeness of Percolation Scheduling in the process. This is in sharp contrast with previous approaches which either applied only to conditional-free code, or drastically limited the parallelization process by imposing relatively local heuristic resource constraints early in the scheduling process
Swing modulo scheduling: a lifetime-sensitive approach
This paper presents a novel software pipelining approach, which is called Swing Modulo Scheduling (SMS). It generates schedules that are near optimal in terms of initiation interval, register requirements and stage count. Swing Modulo Scheduling is a heuristic approach that has a low computational cost. The paper describes the technique and evaluates it for the Perfect Club benchmark suite. SMS is compared with other heuristic methods, showing that it outperforms them in terms of the quality of the obtained schedules and compilation time. SMS is also compared with an integer linear programming approach that generates optimum schedules but with a huge computational cost, which makes it feasible only for very small loops. For a set of small loops, SMS obtained the optimum initiation interval in all the cases and its schedules required only 5% more registers and a 1% higher stage count than the optimum. Peer Reviewed. Postprint (published version).
Exploiting the Parallelism Exposed by Partial Evaluation
We describe an approach to parallel compilation that seeks to harness the vast amount of fine-grain parallelism that is exposed through partial evaluation of numerically-intensive scientific programs. We have constructed a compiler for the Supercomputer Toolkit parallel processor that uses partial evaluation to break down data abstractions and program structure, producing huge basic blocks that contain large amounts of fine-grain parallelism. We show that this fine-grain parallelism can be effectively utilized even on coarse-grain parallel architectures by selectively grouping operations together so as to adjust the parallelism grain-size to match the inter-processor communication capabilities of the target architecture.
Mitigating Branch-Shadowing Attacks on Intel SGX using Control Flow Randomization
Intel Software Guard Extensions (SGX) is a promising hardware-based
technology for protecting sensitive computations from potentially compromised
system software. However, recent research has shown that SGX is vulnerable to
branch-shadowing -- a side channel attack that leaks the fine-grained (branch
granularity) control flow of an enclave (SGX protected code), potentially
revealing sensitive data to the attacker. The previously-proposed defense
mechanism, called Zigzagger, attempted to hide the control flow, but has been
shown to be ineffective if the attacker can single-step through the enclave
using the recent SGX-Step framework.
Taking into account these stronger attacker capabilities, we propose a new
defense against branch-shadowing, based on control flow randomization. Our
scheme is inspired by Zigzagger, but provides quantifiable security guarantees
with respect to a tunable security parameter. Specifically, we eliminate
conditional branches and hide the targets of unconditional branches using a
combination of compile-time modifications and run-time code randomization.
We evaluated the performance of our approach by measuring the run-time
overhead of ten benchmark programs of SGX-Nbench in an SGX environment.
Indexed Labels for Loop Iteration Dependent Costs
We present an extension to the labelling approach, a technique for lifting
resource consumption information from compiled to source code. This approach,
which is at the core of the annotating compiler from a large fragment of C to
8051 assembly of the CerCo project, loses precision when differences arise
as to the cost of the same portion of code, whether due to code transformation
such as loop optimisations or advanced architecture features (e.g. cache). We
propose to address this weakness by formally indexing cost labels with the
iterations of the containing loops they occur in. These indexes can be
transformed during the compilation, and when lifted back to source code they
produce dependent costs.
The proposed changes have been implemented in CerCo's untrusted prototype
compiler from a large fragment of C to 8051 assembly. Comment: In Proceedings QAPL 2013, arXiv:1306.241
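The idea of a cost that depends on the loop iteration can be illustrated with a minimal Python sketch. It is not CerCo's implementation. The function names and cycle counts are invented, and the scenario assumed here is a loop whose first iteration costs more than the rest (for example after loop peeling, or because of a cold cache).

```python
def label_cost(i, first_cost, steady_cost):
    """Cost of one cost label, indexed by the loop iteration i (hypothetical)."""
    return first_cost if i == 0 else steady_cost

def loop_cost(n, first_cost, steady_cost):
    """Total cost of n iterations, summing the iteration-dependent label cost."""
    return sum(label_cost(i, first_cost, steady_cost) for i in range(n))

# Made-up numbers: 12 cycles for the first iteration, 4 for each later one.
print(loop_cost(5, 12, 4))
```

A single, non-indexed label would have to charge the same cost to every iteration, so it would either overestimate the steady-state iterations or underestimate the first one.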
Exploiting the Parallelism Exposed by Partial Evaluation
We describe the key role played by partial evaluation in the Supercomputer Toolkit, a parallel computing system for scientific applications that effectively exploits the vast amount of parallelism exposed by partial evaluation. The Supercomputer Toolkit parallel processor and its associated partial evaluation-based compiler have been used extensively by scientists at MIT, and have made possible recent results in astrophysics showing that the motion of the planets in our solar system is chaotically unstable.