An Exact Method for Analysis of Value-based Array Data Dependences
Standard array data dependence testing algorithms give information
about the aliasing of array references. If statement 1 writes a[5],
and statement 2 later reads a[5], standard techniques
describe this as a flow dependence, even if there is an intervening write.
We call a dependence between two references to the same memory
location a memory-based dependence. In contrast, if there are no
intervening writes, the references touch the same value and we call the
dependence a value-based dependence.
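As a minimal illustration (the loop and array names below are ours, not the paper's), consider:

    /* S1 writes a[i], S2 overwrites it, S3 reads it. */
    void example(int n, double *a, const double *b,
                 const double *c, double *d) {
        for (int i = 0; i < n; i++) {
            a[i] = b[i];   /* S1 */
            a[i] = c[i];   /* S2: intervening write kills S1's value */
            d[i] = a[i];   /* S3: reads the value produced by S2 */
        }
    }

A memory-based analysis reports flow dependences S1 -> S3 and S2 -> S3; a value-based analysis reports only S2 -> S3, since S3 never reads the value written by S1.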
There has been a surge of recent work on value-based array data dependence
analysis (also referred to as computation of array data-flow dependence
information). In this paper, we describe a technique that is exact
over programs without control flow (other than loops) and non-linear
references. We compare our proposal with the technique proposed
by Paul Feautrier, which is the other technique that is complete over the same
domain as ours. We also compare our work with that of Tu and Padua, a
representative approximate scheme for array privatization.
(Also cross-referenced as UMIACS-TR-93-137)
Static Analysis of Upper and Lower Bounds on Dependences and Parallelism
Existing compilers often fail to parallelize sequential code, even
when a program can be manually transformed into parallel form
by a sequence of well-understood transformations
(as is the case for many of the Perfect Club Benchmark
programs).
These failures can occur for several reasons: the code transformations
implemented in the compiler may not be sufficient to produce parallel
code, the compiler may not find the proper sequence of
transformations, or the compiler may not be able to prove that one
of the necessary transformations is legal.
When a compiler cannot extract sufficient parallelism from a program,
the programmer may attempt to extract additional parallelism.
Unfortunately, the programmer is typically left to search for
parallelism without significant assistance.
The compiler generally does not give feedback about which parts of the
program might contain additional parallelism, or about the types of
transformations that might be needed to realize this parallelism.
Standard program transformations and dependence abstractions cannot be
used to provide this feedback.
In this paper, we propose a two-step approach to the search for
parallelism in sequential programs:
We first construct several sets of constraints that describe, for each
statement, which iterations of that statement can be executed
concurrently.
By constructing constraints that correspond to different assumptions
about which dependences might be eliminated through additional
analysis, transformations and user assertions, we can determine
whether we can expose parallelism by eliminating dependences.
In the second step of our search for parallelism, we examine these
constraint sets to identify the kinds of transformations that are
needed to exploit scalable parallelism.
Our tests will identify conditional parallelism and parallelism that
can be exposed by combinations of transformations that reorder the
iteration space (such as loop interchange and loop peeling).
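As a small sketch (the loop is ours, not from the paper), peeling can expose such parallelism:

    /* Every iteration reads a[0], and iteration 0 also writes it, so the
       only dependence runs from iteration 0 to iterations 1..n-1. */
    void peeled(int n, double *a, const double *b) {
        a[0] = a[0] + b[0];          /* peeled iteration i = 0 */
        for (int i = 1; i < n; i++)  /* remaining iterations are mutually */
            a[i] = a[0] + b[i];      /* independent and can run in parallel */
    }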
This approach lets us distinguish inherently sequential code from code
that contains unexploited parallelism.
It also produces information about the kinds of transformations that
will be needed to parallelize the code, without worrying about the
order of application of the transformations.
Furthermore, when our dependence test is inexact,
we can identify which unresolved dependences inhibit parallelism
by comparing the effects of assuming dependence or independence.
We are currently exploring the use of this information in
programmer-assisted parallelization.
(Also cross-referenced as UMIACS-TR-94-40)
Simplifying Polynomial Constraints Over Integers to Make Dependence Analysis More Precise
Why do existing parallelizing compilers and environments fail
to parallelize many realistic FORTRAN programs?
One of the reasons is that these programs contain a number
of linearized array references, such as
{\tt A(M*N*i+N*j+k)} or {\tt A(i*(i+1)/2+j)}.
Performing exact dependence analysis for these references
requires testing polynomial constraints for integer solutions.
Most existing dependence analysis systems, however,
restrict themselves to solving
affine constraints only, so they have to make worst-case
assumptions whenever they encounter a polynomial constraint.
In this paper we introduce an algorithm which
exactly and efficiently solves a class of polynomial constraints
which arise in dependence testing.
Another important application of our algorithm is
to generate code for the loop transformation known
as symbolic blocking (tiling).
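As a concrete illustration (ours, not the paper's algorithm), the triangular reference {\tt A(i*(i+1)/2+j)} with 0 <= j <= i is injective, so an exact polynomial test can prove the absence of a dependence where an affine-only system must conservatively assume one. A brute-force check over a small domain confirms this:

    #include <stdio.h>

    /* Brute-force sketch: search for distinct (i1,j1), (i2,j2) with
       0 <= j <= i < N such that i1*(i1+1)/2 + j1 == i2*(i2+1)/2 + j2.
       N is an arbitrary small bound chosen for illustration. */
    int main(void) {
        const int N = 16;
        int found = 0;
        for (int i1 = 0; i1 < N; i1++)
            for (int j1 = 0; j1 <= i1; j1++)
                for (int i2 = 0; i2 < N; i2++)
                    for (int j2 = 0; j2 <= i2; j2++)
                        if ((i1 != i2 || j1 != j2) &&
                            i1*(i1+1)/2 + j1 == i2*(i2+1)/2 + j2) {
                            printf("collision: (%d,%d) vs (%d,%d)\n",
                                   i1, j1, i2, j2);
                            found = 1;
                        }
        if (!found)
            printf("no collisions: the reference is injective here\n");
        return 0;
    }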
(Also cross-referenced as UMIACS-TR-93-68.1)
The First-Order Theory of Sets with Cardinality Constraints is Decidable
We show that the first-order theory of the language that
combines Boolean algebras of sets of uninterpreted elements with Presburger
arithmetic operations is decidable. We thereby disprove a recent conjecture that this theory
is undecidable. Our language allows relating the cardinalities of sets to the
values of integer variables, and can distinguish finite and infinite sets. We
use quantifier elimination to show the decidability and obtain an elementary
upper bound on the complexity.
Precise program analyses can use our decidability result to verify
representation invariants of data structures that use an integer field to
represent the number of stored elements.
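For example (a formula of our own, in the spirit of the stated application), the verification condition for a size field updated on insertion into a set is directly expressible in this language:

    (e \notin C) \wedge (C' = C \cup \{e\}) \wedge (s = |C|) \wedge (s' = s + 1)
        \rightarrow s' = |C'|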
Beyond shared memory loop parallelism in the polyhedral model
With the introduction of multi-core processors, motivated by power and energy concerns, parallel processing has become mainstream. Parallel programming, however, is much more difficult than sequential programming because of its non-deterministic nature and the bugs that arise from non-determinacy. One solution is automatic parallelization, where it is entirely up to the compiler to efficiently parallelize sequential programs. However, automatic parallelization is very difficult, and only a handful of successful techniques are available, even after decades of research. Automatic parallelization for distributed memory architectures is even more problematic in that it requires explicit handling of data partitioning and communication. Since data must be partitioned among multiple nodes that do not share memory, the original memory allocation of sequential programs cannot be directly used.

One of the main contributions of this dissertation is the development of techniques for generating distributed memory parallel code with parametric tiling. Our approach builds on important contributions to the polyhedral model, a mathematical framework for reasoning about program transformations. We show that many affine control programs can be uniformized using only simple techniques. Being able to assume uniform dependences significantly simplifies distributed memory code generation, and also enables parametric tiling. Our approach is implemented in the AlphaZ system, a system for prototyping analyses, transformations, and code generators in the polyhedral model. The key features of AlphaZ are memory re-allocation and explicit representation of reductions. We evaluate our approach on a collection of polyhedral kernels from the PolyBench suite, and show that it scales as well as PLuTo, a state-of-the-art shared memory automatic parallelizer based on the polyhedral model.

Automatic parallelization is only one approach to dealing with the non-deterministic nature of parallel programming, and it leaves the difficulty entirely to the compiler. Another approach is to develop novel parallel programming languages. These languages, such as X10, aim to provide a highly productive parallel programming environment by building parallelism into the language design. However, even in these languages, parallel bugs remain an important issue that hinders programmer productivity. Another contribution of this dissertation is the extension of array dataflow analysis to handle a subset of X10 programs. We apply the results of the dataflow analysis to statically guarantee determinism. Providing static guarantees can significantly increase programmer productivity by catching questionable implementations at compile time, or even while programming.
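A minimal sketch of what makes tiling parametric (the kernel is ours; the tile size ts is an ordinary runtime parameter, where fixed-size tiling would bake in a compile-time constant):

    /* Parametrically tiled 1-D loop; assumes ts > 0. */
    void scale_tiled(int n, int ts, double *a, const double *b) {
        for (int it = 0; it < n; it += ts)              /* tile loop */
            for (int i = it; i < it + ts && i < n; i++) /* point loop */
                a[i] = 2.0 * b[i];
    }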
Maintaining security requirements of software systems using evolving crosscutting dependencies
Security requirements are concerned with protecting the assets of a system from harm. Implemented as code aspects that weave protection mechanisms into the system, security requirements need to be validated when changes are made to the programs during system evolution. However, it is not clear to developers whether existing validation procedures such as test cases are sufficient for security, nor when the implemented aspects need to adapt. In this chapter, we propose an approach for detecting any change to the satisfaction of security requirements in three steps: (1) identify the asset variables in the system that are accessed only by a join-point method; (2) trace these asset variables to identify both control and data dependencies between the non-aspect and aspect functions; and (3) update the test cases according to the implementation of these dependencies to strengthen the protection when a change happens. These steps are illustrated by a case study of a meeting scheduling system where security is a critical concern.
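A hypothetical sketch of step (1) in C (all names are ours): the asset variable is reachable only through a single accessor, which serves as the join point where the protection aspect is woven:

    #include <stdbool.h>

    static bool is_authorized(int user_id) {   /* stub for illustration */
        return user_id > 0;
    }

    static int meeting_slots[64];              /* asset variable */

    /* Join-point method: the only access path to the asset, so the
       woven authorization check protects every read. */
    int read_slot(int user_id, int i) {
        if (!is_authorized(user_id))
            return -1;
        return meeting_slots[i];
    }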
Lazy Array Data-Flow Dependence Analysis
Automatic parallelization of real FORTRAN programs
does not yet live up to users' expectations,
and dependence analysis algorithms that either
produce too many false dependences or are too slow
contribute significantly to this.
In this paper we introduce a data-flow dependence analysis algorithm
which exactly computes value-based dependence relations
for program fragments in which all subscripts, loop bounds and
IF conditions are affine.
Our algorithm also computes good affine approximations
of dependence relations for non-affine program fragments.
In fact, we know of no other algorithm
that can compute better approximations.
Our algorithm is also efficient, because it is lazy.
When searching for write statements that supply values used
by a given read statement, it starts with statements
which are lexicographically close to the read statement in iteration space.
Then if some of the read statement instances are not
``satisfied'' with these close writes, the algorithm broadens
its search scope by looking into more distant writes.
The search scope keeps broadening until all read instances are satisfied
or no write candidates are left.
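A toy model of this lazy strategy (our own simplification: read and write instances are plain iteration numbers, and each candidate write set covers a range of them, listed from lexicographically closest to most distant):

    #include <stdbool.h>
    #include <stdio.h>

    struct range { int lo, hi; };

    int main(void) {
        int reads[4] = {2, 5, 7, 9};                    /* read instances */
        struct range writes[3] = {{6,9}, {3,5}, {0,2}}; /* closest first */
        bool satisfied[4] = {false, false, false, false};
        int remaining = 4;

        /* Widen the search scope one write set at a time; stop as soon
           as every read instance has found a source. */
        for (int k = 0; k < 3 && remaining > 0; k++)
            for (int r = 0; r < 4; r++)
                if (!satisfied[r] && reads[r] >= writes[k].lo
                                  && reads[r] <= writes[k].hi) {
                    satisfied[r] = true;
                    remaining--;
                    printf("read %d sourced by write set %d\n", reads[r], k);
                }
        return 0;
    }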
We timed our algorithm on several benchmark programs
and the timing results suggest that our algorithm
is fast enough to be used in commercial compilers ---
it usually takes 5 to 15 percent of
f77 -O2 compilation time to analyze a program.
Most programs in the 100-line range take less than 1 second
to analyze on a SUN SparcStation IPX.
(Also cross-referenced as UMIACS-TR-93-69)
A Framework for Unifying Reordering Transformations
We present a framework for unifying iteration reordering transformations
such as loop interchange, loop distribution, skewing, tiling, index set
splitting and statement reordering. The framework is based on the idea
that a transformation can be represented as a schedule that maps the
original iteration space to a new iteration space. The framework is
designed to provide a uniform way to represent and reason about
transformations. As part of the framework, we provide algorithms
to assist in the building and use of schedules. In particular, we
provide algorithms to test the legality of schedules, to align schedules
and to generate optimized code for schedules.
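As a small example (the kernel is ours), the schedule T(i,j) = (j,i) represents loop interchange, and generating code for that schedule yields the interchanged nest:

    /* Original iteration space: (i, j). */
    void original(int n, int m, double c[n][m],
                  double a[n][m], double b[n][m]) {
        for (int i = 0; i < n; i++)
            for (int j = 0; j < m; j++)
                c[i][j] = a[i][j] + b[i][j];
    }

    /* Code generated for the schedule (i, j) -> (j, i): the new outer
       loop enumerates the first coordinate of the new space. */
    void interchanged(int n, int m, double c[n][m],
                      double a[n][m], double b[n][m]) {
        for (int j = 0; j < m; j++)
            for (int i = 0; i < n; i++)
                c[i][j] = a[i][j] + b[i][j];
    }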
(Also cross-referenced as UMIACS-TR-93-134)
Optimization within a Unified Transformation Framework
Programmers typically want to write scientific programs in a high level
language with semantics based on a sequential execution model. To execute
efficiently on a parallel machine, however, a program typically needs to
contain explicit parallelism and possibly explicit communication and
synchronization. So, we need compilers to convert programs from the first
of these forms to the second. There are two basic choices to be made when
parallelizing a program. First, the computations of the program need to be
distributed amongst the set of available processors. Second, the computations
on each processor need to be ordered. My contribution has been the development
of simple mathematical abstractions for representing these choices and the
development of new algorithms for making these choices. I have developed a new
framework that achieves good performance by minimizing communication between
processors, minimizing the time processors spend waiting for messages from
other processors, and ordering data accesses so as to exploit the memory
hierarchy. This framework can be used by optimizing compilers, as well as by
interactive transformation tools. The state of the art for vectorizing
compilers is already quite good, but much work remains to bring parallelizing
compilers up to the same standard. The main contribution of my work can be
summarized as improving this situation by replacing existing ad hoc
parallelization techniques with a sound underlying foundation on which future
work can be built.
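To make the two choices concrete (a hedged sketch of ours, not the dissertation's method): a space mapping such as proc(i) = i mod P distributes the computation, and a time schedule orders each processor's share:

    #define P 4   /* processor count, chosen arbitrarily for illustration */

    /* Space mapping: iteration i runs on processor i % P.
       Time schedule: each processor executes its iterations in
       increasing i. The outer loop simulates the processors. */
    void distributed(int n, double *a, const double *b) {
        for (int p = 0; p < P; p++)
            for (int i = p; i < n; i += P)
                a[i] = 2.0 * b[i];
    }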
(Also cross-referenced as UMIACS-TR-96-93)
Finding Legal Reordering Transformations using Mappings
Traditionally, optimizing compilers attempt to improve the performance of
programs by applying source to source transformations, such as loop
interchange, loop skewing and loop distribution. Each of these
transformations has its own special legality checks and transformation rules
which make it hard to analyze or predict the effects of compositions of
these transformations. To overcome these problems we have developed a
framework for unifying iteration reordering transformations. The framework
is based on the idea that all reordering transformations can be represented
as a mapping from the original iteration space to a new iteration space.
The framework is designed to provide a uniform way to represent and reason
about transformations. An optimizing compiler would use our framework by
finding a mapping that both corresponds to a legal transformation and
produces efficient code. We present the mapping selection problem as a
search problem by decomposing it into a sequence of smaller choices.
We then characterize the set of all legal mappings by defining an implicit
search tree.
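For a one-dimensional illustration (our own reduction, not the paper's algorithm): a schedule T(i) = s*i + t is legal with respect to a uniform dependence from iteration i to i + d (d > 0) exactly when T(i) < T(i+d) for all i, which simplifies to s*d > 0:

    #include <stdbool.h>

    /* Legality check for T(i) = s*i + t against a dependence of
       distance d > 0; the offset t cancels out of T(i) < T(i+d). */
    bool schedule_is_legal(int s, int d) {
        return s * d > 0;
    }
    /* schedule_is_legal(1, 1)  -> true:  identity order preserved.
       schedule_is_legal(-1, 1) -> false: loop reversal violates it. */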
(Also cross-referenced as UMIACS-TR-94-71)