29 research outputs found

    Data flow analysis applied to optimize generic workflow problems

    Get PDF
    Compilation, the process that transforms a program written in a high-level language into assembly or binary code, is an elaborate process that combines several powerful technologies, some of them developed specifically for this area. Nowadays, compilers are highly developed systems that can analyze and improve source code quite efficiently, exploiting the full potential of modern processor architectures. This paper introduces a common type of analysis, data flow analysis, which computes flow-sensitive information about programs whose results are essential for producing many code optimizations. It also argues that the problem of analyzing data flow in software programs has many similarities with problems found in industrial engineering, planning, and management. As a consequence, the analysis and optimization techniques used by compilers can be applied in these areas.
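
    As a concrete illustration of the technique, here is a minimal sketch of an iterative data flow analysis: live-variable analysis solved to a fixpoint over a toy control flow graph. The CFG, the use/def sets, and the variable names are invented for illustration; nothing here is taken from the paper.

        # Minimal iterative data flow analysis: live-variable analysis on a
        # toy control flow graph. All blocks and use/def sets are invented.

        succ = {"entry": ["loop"], "loop": ["body", "exit"],
                "body": ["loop"], "exit": []}            # block -> successors
        use_ = {"entry": set(), "loop": {"i", "n"}, "body": {"i"}, "exit": {"s"}}
        def_ = {"entry": {"i", "s"}, "loop": set(), "body": {"i", "s"}, "exit": set()}

        live_in = {b: set() for b in succ}
        live_out = {b: set() for b in succ}

        changed = True
        while changed:                                   # iterate to a fixpoint
            changed = False
            for b in succ:
                out_b = set().union(*(live_in[s] for s in succ[b]))
                in_b = use_[b] | (out_b - def_[b])       # backward transfer function
                if in_b != live_in[b] or out_b != live_out[b]:
                    live_in[b], live_out[b] = in_b, out_b
                    changed = True

        print(live_in)   # variables live on entry to each block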

    Applying compiler technology to solve generic

    Get PDF
    Compilers are tools that transform a high-level programming language into assembly or binary code. The essential part of the process is done by the interpretation and code generation steps, but nowadays most compilers also have a strong code optimization component that exploits as much as possible the potential of the computer architectures for which the compiler must generate code. These optimizations are based on information provided by several analysis passes. This paper presents some of these code analyses and optimizations, and shows how they can be used to solve problems or improve the quality of solutions in areas such as industrial engineering and planning.
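
    To make "analyses and optimizations" concrete, here is a toy example of one classic compiler optimization, constant folding on a small expression AST. It is a standard textbook illustration, not the paper's system.

        import operator

        OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

        def fold(node):
            """Recursively fold constant subexpressions: ('+', 2, 3) -> 5."""
            if not isinstance(node, tuple):      # leaf: int constant or variable name
                return node
            op, lhs, rhs = node
            lhs, rhs = fold(lhs), fold(rhs)
            if isinstance(lhs, int) and isinstance(rhs, int):
                return OPS[op](lhs, rhs)         # evaluate at compile time
            return (op, lhs, rhs)

        # x * (2 + 3)  folds to  ('*', 'x', 5)
        print(fold(("*", "x", ("+", 2, 3))))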

    A Domain-Specific Language for Generating Dataflow Analyzers

    Get PDF
    Dataflow analysis is a well-understood and very powerful technique for analyzing programs as part of the compilation process. Virtually all compilers use some sort of dataflow analysis as part of their optimization phase. However, despite being well understood theoretically, such analyses are often difficult to code, which makes it hard to experiment quickly with variants. To address this, we developed a domain-specific language, Analyzer Generator (AG), that synthesizes dataflow analysis phases for Microsoft's Phoenix compiler framework. AG hides the fussy details needed to make analyses modular, yet generates code that is as efficient as the hand-coded equivalent. One key construct we introduce allows IR object classes to be extended without recompiling. Experimental results on three analyses show that AG code can be one-tenth the size of the equivalent handwritten C++ code with no loss of performance. It is our hope that AG will make developing new dataflow analyses much easier.
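
    The abstract does not show AG's syntax, so the sketch below only guesses at the underlying idea: a dataflow analysis is fully determined by its lattice, meet operator, and transfer function, which is all a generator would need to emit a complete solver. Every name here is hypothetical.

        def solve(blocks, succ, init, meet, transfer):
            """Generic forward dataflow solver: everything analysis-specific
            is passed in, the way a generated analysis would supply it."""
            pred = {b: [p for p in blocks if b in succ[p]] for b in blocks}
            out = {b: init for b in blocks}
            changed = True
            while changed:
                changed = False
                for b in blocks:
                    in_b = init
                    for p in pred[b]:
                        in_b = meet(in_b, out[p])
                    new = transfer(b, in_b)
                    if new != out[b]:
                        out[b], changed = new, True
            return out

        # Instantiation: reaching definitions on a toy CFG (names invented).
        blocks = ["b0", "b1", "b2"]
        succ = {"b0": ["b1"], "b1": ["b2"], "b2": ["b1"]}
        gen = {"b0": {"d1"}, "b1": {"d2"}, "b2": set()}
        kill = {"b0": set(), "b1": {"d1"}, "b2": set()}

        print(solve(blocks, succ, frozenset(),
                    meet=lambda a, b: a | b,
                    transfer=lambda b, i: frozenset(gen[b] | (i - kill[b]))))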

    Scratchpad Sharing in GPUs

    Full text link
    GPGPU applications exploit on-chip scratchpad memory available in the Graphics Processing Units (GPUs) to improve performance. The amount of thread level parallelism present in the GPU is limited by the number of resident threads, which in turn depends on the availability of scratchpad memory in its streaming multiprocessor (SM). Since the scratchpad memory is allocated at thread block granularity, part of the memory may remain unutilized. In this paper, we propose architectural and compiler optimizations to improve the scratchpad utilization. Our approach, Scratchpad Sharing, addresses scratchpad under-utilization by launching additional thread blocks in each SM. These thread blocks use unutilized scratchpad and also share scratchpad with other resident blocks. To improve the performance of scratchpad sharing, we propose Owner Warp First (OWF) scheduling that schedules warps from the additional thread blocks effectively. The performance of this approach, however, is limited by the availability of the shared part of scratchpad. We propose compiler optimizations to improve the availability of shared scratchpad. We describe a scratchpad allocation scheme that helps in allocating scratchpad variables such that shared scratchpad is accessed for short duration. We introduce a new instruction, relssp, that when executed, releases the shared scratchpad. Finally, we describe an analysis for optimal placement of relssp instructions such that shared scratchpad is released as early as possible. We implemented the hardware changes using the GPGPU-Sim simulator and implemented the compiler optimizations in the Ocelot framework. We evaluated the effectiveness of our approach on 19 kernels from 3 benchmark suites: CUDA-SDK, GPGPU-Sim, and Rodinia. The kernels that underutilize scratchpad memory show an average improvement of 19% and a maximum improvement of 92.17% compared to the baseline approach.
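
    A back-of-the-envelope model of the under-utilization the paper targets, with invented numbers (real limits depend on the GPU generation): because scratchpad is allocated at thread block granularity, any leftover smaller than one block's request sits idle.

        scratchpad_per_sm = 48 * 1024   # bytes of scratchpad per SM (assumed)
        per_block = 20 * 1024           # bytes requested per thread block (assumed)
        max_blocks_per_sm = 8           # hardware resident-block limit (assumed)

        resident = min(max_blocks_per_sm, scratchpad_per_sm // per_block)  # 2 blocks
        idle = scratchpad_per_sm - resident * per_block                    # 8 KB idle

        print(f"resident blocks: {resident}, idle scratchpad: {idle // 1024} KB")
        # Scratchpad sharing would launch an extra block that uses the idle
        # 8 KB plus a region shared with a resident block, raising occupancy.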

    On the Convergence Rate of Linear Datalog$^\circ$ over Stable Semirings

    Full text link
    Datalog$^\circ$ is an extension of Datalog where, instead of a program being a collection of unions of conjunctive queries over the standard Boolean semiring, a program may now be a collection of sum-sum-product queries over an arbitrary commutative partially ordered pre-semiring. Datalog$^\circ$ is more powerful than Datalog in that its additional algebraic structure allows it to support recursion with aggregation. At the same time, Datalog$^\circ$ retains the syntactic and semantic simplicity of Datalog: it has declarative least-fixpoint semantics. The least fixpoint can be found via the naïve evaluation algorithm, which repeatedly applies the immediate consequence operator until no further change is possible. It was shown that, when the underlying semiring is $p$-stable, the naïve evaluation of any Datalog$^\circ$ program over the semiring converges in a finite number of steps. However, the known upper bounds on the rate of convergence were exponential in the number of ground IDB atoms. This paper establishes polynomial upper bounds on the convergence rate of the naïve algorithm on linear Datalog$^\circ$ programs, which are quite common in practice. In particular, the main result of this paper is that the convergence rate of linear Datalog$^\circ$ programs under any $p$-stable semiring is $O(pn^3)$. Furthermore, we show a matching lower bound by constructing a $p$-stable semiring and a linear Datalog$^\circ$ program that requires $\Omega(pn^3)$ iterations of the naïve evaluation algorithm to converge. Next, we study the convergence rate in terms of the number of elements in the semiring for linear Datalog$^\circ$ programs. When $L$ is the number of elements, the convergence rate is bounded by $O(pn \log L)$. This significantly improves the convergence rate for small $L$. We show a nearly matching lower bound as well.
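
    As a rough illustration of naïve evaluation over a semiring (not the paper's formalism), the sketch below runs a linear, single-rule program over the tropical (min, +) semiring and counts the rounds until the fixpoint; the graph and weights are made up.

        INF = float("inf")                      # the semiring zero for (min, +)
        edges = {("s", "a"): 1, ("a", "b"): 2, ("b", "t"): 1, ("s", "t"): 9}
        nodes = {"s", "a", "b", "t"}

        dist = {v: INF for v in nodes}
        dist["s"] = 0                           # seed fact: dist(s) = 0

        rounds = 0
        while True:
            rounds += 1
            new = dict(dist)
            for (x, y), w in edges.items():         # one application of the
                new[y] = min(new[y], dist[x] + w)   # immediate consequence operator
            if new == dist:                     # fixpoint reached
                break
            dist = new

        print(rounds, dist)   # a handful of rounds on this small graph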

    Shared Memory Concurrent System Verification using Kronecker Algebra

    Full text link
    The verification of multithreaded software is still a challenge. This comes mainly from the fact that the number of thread interleavings grows exponentially in the number of threads. The idea that thread interleavings can be studied with a matrix calculus is a novel approach in this research area. Our sparse matrix representations of the program are manipulated using a lazy implementation of Kronecker algebra. One goal is the generation of a data structure called the Concurrent Program Graph (CPG), which describes all possible interleavings and incorporates synchronization while preserving completeness. We prove that CPGs in general can be represented by sparse adjacency matrices, so the number of entries in the matrices is linear in their number of lines. Hence efficient algorithms can be applied to CPGs. In addition, due to synchronization, only very small parts of the resulting matrix are actually needed, whereas the rest is unreachable in terms of automata. Thanks to the lazy implementation of the matrix operations, the unreachable parts are never calculated. This speeds up processing significantly and shows that this approach is very promising. Various applications, including data flow analysis, can be performed on CPGs. Furthermore, the structure of the matrices can be used to prove properties of the underlying program for an arbitrary number of threads. For example, deadlock freedom is proved for a large class of programs.
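
    The authors' lazy implementation is not shown in the abstract; the sketch below only illustrates the underlying matrix calculus. For two independent threads given as adjacency matrices, the interleaving corresponds to the Kronecker sum kron(A, I) + kron(I, B), and the result stays sparse.

        import numpy as np

        A = np.array([[0, 1],        # thread 1: two states, one step
                      [0, 0]])
        B = np.array([[0, 1, 0],     # thread 2: three states, two steps
                      [0, 0, 1],
                      [0, 0, 0]])

        # Kronecker sum: kron(A, I) + kron(I, B) interleaves the two threads.
        interleaved = (np.kron(A, np.eye(3, dtype=int))
                       + np.kron(np.eye(2, dtype=int), B))

        print(interleaved.shape)   # (6, 6): the product state space
        print(interleaved.sum())   # 7 edges out of 36 entries: sparse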

    A type-checking preprocessor for Cilk 2, a multithreaded C language

    Get PDF
    Thesis (M.Eng.), Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1995. Includes bibliographical references (p. 37-38). By Robert C. Miller.

    Data-Dependency Formalism for Developing Peer-to-Peer Applications

    Get PDF
    Developing peer-to-peer (P2P) applications has become increasingly important in software development. Nowadays, a large number of organizations of many different sectors and sizes depend more and more on collaboration between actors to perform their tasks. These P2P applications usually have a recursive behavior that many modeling approaches (e.g. finite-state approaches) cannot describe and analyze. In this paper, we present an approach that combines component-based development with well-understood methods and techniques from the fields of attribute grammars and data-flow analysis in order to construct an abstract representation (a Data-Dependency Graph) of a P2P application and then perform data-flow analyses on it. This approach embodies a formalism called DDF (Data-Dependency Formalism) to capture the behavior of P2P applications and construct their Data-Dependency Graphs. Various properties can be inferred and computed at the proposed level of data abstraction, including some properties that model checking cannot compute if the system exhibits recursive behavior. As examples, we present two algorithms: one to resolve the deadlock problem and another for dominance analysis.
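
    DDF itself is not spelled out in the abstract, so the following is only a minimal stand-in: tasks as nodes of a data-dependency graph with an edge u -> v when u waits on data from v, where a cycle signals a potential deadlock that a plain DFS can detect. All peer and task names are hypothetical.

        deps = {"peerA.task": ["peerB.reply"],     # u -> v: u waits on v
                "peerB.reply": ["peerC.data"],
                "peerC.data": ["peerA.task"]}      # hypothetical cyclic wait

        def has_deadlock(graph):
            WHITE, GREY, BLACK = 0, 1, 2
            color = {v: WHITE for v in graph}
            def dfs(v):
                color[v] = GREY
                for w in graph.get(v, []):
                    if color.get(w, WHITE) == GREY:            # back edge: cycle
                        return True
                    if color.get(w, WHITE) == WHITE and dfs(w):
                        return True
                color[v] = BLACK
                return False
            return any(color[v] == WHITE and dfs(v) for v in graph)

        print(has_deadlock(deps))   # True: A waits on B waits on C waits on A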