Data flow analysis applied to optimize generic workflow problems
The compilation process, which transforms a program written in a high-level language into assembly or binary code, is an elaborate process that combines several powerful technologies, some of them developed specifically for this area. Nowadays, compilers are highly developed systems that can analyze and improve source code quite efficiently, exploiting the full potential of new processor architectures. This paper introduces a common type of analysis, Data Flow Analysis, which is used to compute flow-sensitive information about programs and whose results are essential for many code optimizations. It is also argued that the problem of analyzing the data flow in software programs has many similarities with problems found in industrial engineering, planning and management. As a consequence, it is possible to apply analysis and optimization techniques used by compilers in these areas.
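As an illustration of the kind of analysis the abstract refers to, the following is a minimal sketch of live-variable analysis, a classic backward data flow analysis, solved by iterating to a fixpoint. The four-node control-flow graph and the names `succ`, `use` and `defs` are hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Set, Tuple


def live_variables(
    succ: Dict[str, List[str]],
    use: Dict[str, Set[str]],
    defs: Dict[str, Set[str]],
) -> Tuple[Dict[str, Set[str]], Dict[str, Set[str]]]:
    """Backward data flow analysis: iterate until IN/OUT sets stop changing."""
    live_in: Dict[str, Set[str]] = {n: set() for n in succ}
    live_out: Dict[str, Set[str]] = {n: set() for n in succ}
    changed = True
    while changed:
        changed = False
        for n in succ:
            # OUT[n] is the union of IN[s] over all successors s of n.
            out_n: Set[str] = set()
            for s in succ[n]:
                out_n |= live_in[s]
            # IN[n] = use[n] union (OUT[n] minus defs[n]).
            in_n = use[n] | (out_n - defs[n])
            if out_n != live_out[n] or in_n != live_in[n]:
                live_out[n], live_in[n] = out_n, in_n
                changed = True
    return live_in, live_out


# Hypothetical program (an assumption made for this sketch):
#   n1: x = input()
#   n2: y = x + 1
#   n3: x = y * 2      (may jump back to n2)
#   n4: return x
succ = {"n1": ["n2"], "n2": ["n3"], "n3": ["n2", "n4"], "n4": []}
use = {"n1": set(), "n2": {"x"}, "n3": {"y"}, "n4": {"x"}}
defs = {"n1": {"x"}, "n2": {"y"}, "n3": {"x"}, "n4": set()}

live_in, live_out = live_variables(succ, use, defs)
for node in succ:
    print(node, sorted(live_in[node]), sorted(live_out[node]))
```

The same iterate-until-stable pattern would apply to a workflow graph if tasks played the role of statements and the resources they produce or consume played the role of variables; that mapping is only a reading of the abstract's claim, not a construction from the paper.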
Applying compiler technology to solve generic
Compilers are tools that transform a high-level programming language into assembly
or binary code. The essential part of the process is done by the interpretation and
code generation steps, but nowadays most compilers also have a strong code
optimization component that exploits as much as possible the potential of the
computer architectures for which the compiler must generate code. These optimizations
are based on the information provided by several analysis processes. This paper
presents some of these code analyses and optimizations, and shows how they can be used
to solve problems or improve the quality of solutions in areas such as industrial
engineering and planning.
A Domain-Specific Language for Generating Dataflow Analyzers
Dataflow analysis is a well-understood and very powerful technique for analyzing programs as part of the compilation process. Virtually all compilers use some sort of dataflow analysis as part of their optimization phase. However, despite being well-understood theoretically, such analyses are often difficult to code, making it difficult to quickly experiment with variants. To address this, we developed a domain-specific language, Analyzer Generator (AG), that synthesizes dataflow analysis phases for Microsoft's Phoenix compiler framework. AG hides the fussy details needed to make analyses modular, yet generates code that is as efficient as the hand-coded equivalent. One key construct we introduce allows IR object classes to be extended without recompiling. Experimental results on three analyses show that AG code can be one-tenth the size of the equivalent handwritten C++ code with no loss of performance. It is our hope that AG will make developing new dataflow analyses much easier
Scratchpad Sharing in GPUs
GPGPU applications exploit on-chip scratchpad memory available in the
Graphics Processing Units (GPUs) to improve performance. The amount of thread
level parallelism present in the GPU is limited by the number of resident
threads, which in turn depends on the availability of scratchpad memory in its
streaming multiprocessor (SM). Since the scratchpad memory is allocated at
thread block granularity, part of the memory may remain unutilized. In this
paper, we propose architectural and compiler optimizations to improve the
scratchpad utilization. Our approach, Scratchpad Sharing, addresses scratchpad
under-utilization by launching additional thread blocks in each SM. These
thread blocks use unutilized scratchpad and also share scratchpad with other
resident blocks. To improve the performance of scratchpad sharing, we propose
Owner Warp First (OWF) scheduling that schedules warps from the additional
thread blocks effectively. The performance of this approach, however, is
limited by the availability of the shared part of scratchpad.
We propose compiler optimizations to improve the availability of shared
scratchpad. We describe a scratchpad allocation scheme that helps in allocating
scratchpad variables such that shared scratchpad is accessed for a short
duration. We introduce a new instruction, relssp, that when executed, releases
the shared scratchpad. Finally, we describe an analysis for optimal placement
of relssp instructions such that shared scratchpad is released as early as
possible.
We implemented the hardware changes using the GPGPU-Sim simulator and
implemented the compiler optimizations in the Ocelot framework. We evaluated the
effectiveness of our approach on 19 kernels from 3 benchmark suites: CUDA-SDK,
GPGPU-Sim, and Rodinia. The kernels that underutilize scratchpad memory show an
average improvement of 19% and maximum improvement of 92.17% compared to the
baseline approach.
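The under-utilization described in this abstract follows from simple arithmetic, sketched below. The scratchpad size, per-block requirement and block limit are hypothetical numbers chosen for illustration, and the function only models allocation at thread block granularity; it does not model the sharing scheme, OWF scheduling or the relssp instruction.

```python
from typing import Tuple


def resident_blocks(
    scratchpad_bytes: int, bytes_per_block: int, max_blocks: int
) -> Tuple[int, int]:
    """Return (resident thread blocks, unused scratchpad bytes) for one SM.

    Scratchpad is assumed to be handed out in whole per-block chunks, so any
    remainder smaller than one block's requirement stays unused.
    """
    if bytes_per_block <= 0:
        # A kernel that uses no scratchpad is limited only by max_blocks.
        return max_blocks, scratchpad_bytes
    blocks = min(scratchpad_bytes // bytes_per_block, max_blocks)
    unused = scratchpad_bytes - blocks * bytes_per_block
    return blocks, unused


# Hypothetical SM: 16 KiB of scratchpad, at most 8 resident blocks,
# and a kernel whose thread blocks each need 7 KiB.
blocks, unused = resident_blocks(16 * 1024, 7 * 1024, 8)
print(blocks, unused)  # 2 blocks resident, 2048 bytes left unused
```

In this made-up configuration a third block cannot be launched because only 2 KiB remain, which is the kind of leftover capacity the abstract proposes to put to use.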
On the Convergence Rate of Linear Datalog° over Stable Semirings
Datalog° is an extension of Datalog, where instead of a program being a
collection of unions of conjunctive queries over the standard Boolean semiring,
a program may now be a collection of sum-sum-product queries over an arbitrary
commutative partially ordered pre-semiring. Datalog° is more powerful than
Datalog in that its additional algebraic structure allows for supporting
recursion with aggregation. At the same time, Datalog° retains the syntactic
and semantic simplicity of Datalog: Datalog° has declarative least fixpoint
semantics. The least fixpoint can be found via the naïve evaluation algorithm
that repeatedly applies the immediate consequence operator until no further change
is possible.
It was shown that, when the underlying semiring is -stable, then the naïve
evaluation of any Datalog° program over the semiring converges in a finite
number of steps. However, the upper bounds on the rate of convergence were
exponential in the number of ground IDB atoms.
This paper establishes polynomial upper bounds on the convergence rate of the
naïve algorithm on linear Datalog° programs, which are quite common in
practice. In particular, the main result of this paper is that the convergence
rate of linear Datalog° programs under any -stable semiring is .
Furthermore, we show a matching lower bound by constructing a -stable
semiring and a linear Datalog° program that requires iterations
for the naïve iteration algorithm to converge. Next, we study the convergence
rate in terms of the number of elements in the semiring for linear Datalog°
programs. When is the number of elements, the convergence rate is bounded
by . This significantly improves the convergence rate for small
. We show a nearly matching lower bound as well.
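The naïve evaluation algorithm mentioned in this abstract can be sketched in a few lines. The example below is an illustration only: it assumes the tropical (min, +) semiring and a hypothetical single-source shortest-path program, which is linear because each rule body contains at most one recursive atom. The graph, the function names and the step counter are assumptions of this sketch, not material from the paper.

```python
import math
from typing import Callable, Dict, Tuple

State = Dict[str, float]


def naive_evaluation(
    ico: Callable[[State], State], bottom: State
) -> Tuple[State, int]:
    """Apply the immediate consequence operator until nothing changes.

    Returns the fixpoint and the number of applications that changed the state.
    """
    state, steps = bottom, 0
    while True:
        nxt = ico(state)
        if nxt == state:
            return state, steps
        state, steps = nxt, steps + 1


def shortest_path_ico(
    edges: Dict[Tuple[str, str], float], source: str, nodes: Tuple[str, ...]
) -> Callable[[State], State]:
    """Operator for D(v) = [v = source] (+) sum over u of D(u) (x) E(u, v),
    read in (min, +): (+) is min, (x) is addition, and 'zero' is infinity."""

    def ico(dist: State) -> State:
        new: State = {}
        for v in nodes:
            best = 0 if v == source else math.inf
            for (u, w), cost in edges.items():
                if w == v:
                    best = min(best, dist[u] + cost)
            new[v] = best
        return new

    return ico


# Hypothetical weighted graph.
nodes = ("a", "b", "c", "d")
edges = {("a", "b"): 1, ("b", "c"): 2, ("a", "c"): 5, ("c", "d"): 1}
bottom = {v: math.inf for v in nodes}  # least element: every distance unknown

dist, steps = naive_evaluation(shortest_path_ico(edges, "a", nodes), bottom)
print(dist, steps)
```

On this small graph the state changes four times before it stabilizes; the convergence rate studied in the paper bounds how large that count can become in general.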
Shared Memory Concurrent System Verification using Kronecker Algebra
The verification of multithreaded software is still a challenge. This comes
mainly from the fact that the number of thread interleavings grows
exponentially in the number of threads. The idea that thread interleavings can
be studied with a matrix calculus is a novel approach in this research area.
Our sparse matrix representations of the program are manipulated using a lazy
implementation of Kronecker algebra. One goal is the generation of a data
structure called Concurrent Program Graph (CPG) which describes all possible
interleavings and incorporates synchronization while preserving completeness.
We prove that CPGs in general can be represented by sparse adjacency matrices.
Thus the number of entries in the matrices is linear in their number of lines.
Hence efficient algorithms can be applied to CPGs. In addition, due to
synchronization only very small parts of the resulting matrix are actually
needed, whereas the rest is unreachable in terms of automata. Thanks to the
lazy implementation of the matrix operations the unreachable parts are never
calculated. This speeds up processing significantly and shows that this
approach is very promising. Various applications including data flow analysis
can be performed on CPGs. Furthermore, the structure of the matrices can be
used to prove properties of the underlying program for an arbitrary number of
threads. For example, deadlock freedom is proved for a large class of programs.
Comment: 31 pages
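To make the matrix view of interleavings concrete, here is a minimal sketch that uses the Kronecker sum of two adjacency matrices, A ⊗ I + I ⊗ B, as the graph of all interleavings of two independent threads. The abstract does not spell out which operations are used, so this choice, the dense list-of-lists representation and the two three-node example threads are assumptions of the sketch; synchronization and the lazy, sparse evaluation described in the abstract are not modeled.

```python
from typing import List

Matrix = List[List[int]]


def identity(n: int) -> Matrix:
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]


def kron(a: Matrix, b: Matrix) -> Matrix:
    """Kronecker product: entry ((i, k), (j, l)) equals a[i][j] * b[k][l]."""
    return [
        [a[i][j] * b[k][l] for j in range(len(a[0])) for l in range(len(b[0]))]
        for i in range(len(a))
        for k in range(len(b))
    ]


def kron_sum(a: Matrix, b: Matrix) -> Matrix:
    """Kronecker sum: one thread takes a step while the other stays put."""
    left = kron(a, identity(len(b)))
    right = kron(identity(len(a)), b)
    return [[x + y for x, y in zip(r1, r2)] for r1, r2 in zip(left, right)]


def count_paths(m: Matrix, start: int, goal: int) -> int:
    """Count paths in a DAG whose edges all go from lower to higher index."""
    paths = [0] * len(m)
    paths[start] = 1
    for u in range(len(m)):
        for v in range(u + 1, len(m)):
            if m[u][v]:
                paths[v] += paths[u] * m[u][v]
    return paths[goal]


# Two hypothetical threads, each a straight line of two steps (three nodes).
chain: Matrix = [[0, 1, 0], [0, 0, 1], [0, 0, 0]]
product_graph = kron_sum(chain, chain)

interleavings = count_paths(product_graph, 0, len(product_graph) - 1)
edge_count = sum(1 for row in product_graph for x in row if x)
print(interleavings, edge_count)
```

The 9 x 9 result has only 12 nonzero entries, and the 6 paths from the first to the last node are exactly the ways of interleaving two steps of one thread with two steps of the other.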
A type-checking preprocessor for Cilk 2, a multithreaded C language
Thesis (M. Eng.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1995. Includes bibliographical references (p. 37-38). By Robert C. Miller. M.Eng.
Data-Dependency Formalism for Developing Peer-to-Peer Applications
Developing peer-to-peer (P2P) applications has become increasingly important in software development. Nowadays, a large number of organizations of different sizes and from many different sectors depend more and more on collaboration between actors to perform their tasks. These P2P applications usually have a recursive behavior that many modeling approaches (e.g. finite-state approaches) cannot describe and analyze. In this paper, we present an approach that combines component-based development with well-understood methods and techniques from the fields of Attribute Grammars and Data-Flow Analysis in order to construct an abstract representation (i.e. a Data-Dependency Graph) for P2P applications, and then perform data-flow analyses on it. This approach embodies a formalism called DDF (Data-Dependency Formalism) to capture the behavior of P2P applications and construct their Data-Dependency Graphs. Various properties can be inferred and computed at the proposed level of data abstraction, including some properties that model checking cannot compute if the system presents a recursive behavior. As examples, we present two algorithms: one to resolve the deadlock problem and another for dominance analysis.