57 research outputs found

    Open predicate path expressions for distributed environments: notation, implementation, and extensions

    This dissertation introduces open predicate path expressions, a non-procedural, very-high-level language notation for the synchronization of concurrent accesses to shared data in distributed computer systems. The target environment is one in which resource modules (totally encapsulated instances of abstract data types) are the basic building blocks in a network of conventional, von Neumann computers or of functional, highly parallel machines. Each resource module will contain two independent submodules: a synchronization submodule, which coordinates requests for access to the resource's data, and an access-mechanism submodule, which localizes the code for operations on that data. Open predicate path expressions are proposed as a specification language for the synchronization submodule and represent a blend of two existing path notations: open path expressions and predicate path expressions. Motivations for the adoption of this new notation are presented, and an implementation semantics for the notation is given in the form of dataflow graphs. An algorithm is presented which automatically synthesizes an open predicate path expression into a dataflow graph, which is then implemented by a network of communicating submodules written in either a sequential or an applicative language. Finally, an extended notation for the synchronization submodule is proposed, the purpose of which is to provide greater expressive power for certain synchronization problems that are difficult to specify using path expressions alone.
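    The sketch below is only an illustration of the resource-module structure the abstract describes (a synchronization submodule coordinating access to data held by an access submodule); it does not use the dissertation's path-expression notation or its dataflow-graph synthesis, and all class and method names are hypothetical. It enforces a simple readers/writers constraint of the kind a path expression might specify.

import threading

class SynchronizationSubmodule:
    """Coordinates access requests (hypothetical stand-in for a synthesized
    synchronization submodule): many readers or one writer at a time."""
    def __init__(self):
        self._lock = threading.Lock()
        self._changed = threading.Condition(self._lock)
        self._readers = 0
        self._writing = False

    def enter_read(self):
        with self._lock:
            while self._writing:
                self._changed.wait()
            self._readers += 1

    def exit_read(self):
        with self._lock:
            self._readers -= 1
            if self._readers == 0:
                self._changed.notify_all()

    def enter_write(self):
        with self._lock:
            while self._writing or self._readers:
                self._changed.wait()
            self._writing = True

    def exit_write(self):
        with self._lock:
            self._writing = False
            self._changed.notify_all()

class ResourceModule:
    """Access-mechanism submodule: localizes operations on the encapsulated data."""
    def __init__(self):
        self._sync = SynchronizationSubmodule()
        self._data = []

    def read_all(self):
        self._sync.enter_read()
        try:
            return list(self._data)
        finally:
            self._sync.exit_read()

    def append(self, item):
        self._sync.enter_write()
        try:
            self._data.append(item)
        finally:
            self._sync.exit_write()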

    A parallel process model and architecture for a Pure Logic Language

    The research presented in this thesis has been concerned with the use of parallel logic systems for the implementation of large knowledge bases. The thesis describes proposals for a parallel logic system based on a new logic programming language, the Pure Logic Language. The work has involved the definition and implementation of a new logic interpreter which incorporates the parallel execution of independent OR processes, and the specification and design of an appropriate non-shared-memory multiprocessor architecture. The Pure Logic Language, which is under development at ICL, Bracknell, differs from Prolog in its expressive powers and implementation. The resolution-based Prolog approach is replaced by a rewrite-rule technique which successively transforms expressions according to logical axioms and user-defined rules until no further rewrites are possible. A review of related work in the field of parallel logic language systems is presented. The thesis describes the different forms of parallelism within logic languages and discusses the decision to concentrate on the efficient implementation of OR parallelism. The parallel process model for the Pure Logic Language uses the same execution technique of rule rewriting but has been adapted to implement the creation of independent OR processes and the required message-passing operations. The parallelism in the system is implemented automatically and, unlike many other parallel logic systems, there are no explicit program annotations for the control of parallel execution. The spawning of processes involves computational overheads within the interpreter: these have been measured and results are presented. The functional requirements of a multiprocessor architecture are discussed: shared memory machines are not scalable for large numbers of processing elements, but, with no shared memory, data needed by offspring processors must be copied from the parent or else recomputed. The thesis describes an optimised format for the copying of data between processors. Because a one-to-many communication pattern exists between parent and offspring processors, a broadcast architecture is indicated. The development of a system based on the broadcasting of data packets represents a new approach to the parallel execution of logic languages and has led to the design of a novel bus-based multiprocessor architecture. A simulation of this multiprocessor architecture has been produced and the parallel logic interpreter mapped onto it: this provides data on the predicted performance of the system. A detailed analysis of these results is presented and the implications for future developments to the proposed system are discussed.
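    A minimal sketch of the two ideas named in the abstract, using invented data structures rather than Pure Logic Language syntax: (1) rewriting an expression by repeatedly applying rules until no rewrite applies, and (2) evaluating independent OR branches concurrently. The rule representation and the use of a thread pool are assumptions for illustration only.

from concurrent.futures import ThreadPoolExecutor

def rewrite(expr, rules):
    """Apply the first matching rule repeatedly until a fixed point is reached."""
    changed = True
    while changed:
        changed = False
        for pattern, replacement in rules:
            if pattern(expr):
                expr = replacement(expr)
                changed = True
                break
    return expr

def solve_or(branches, rules):
    """OR branches are independent, so reduce them in parallel and keep
    every branch that rewrites to a success marker."""
    with ThreadPoolExecutor() as pool:
        results = pool.map(lambda b: rewrite(b, rules), branches)
    return [r for r in results if r == "true"]

# Toy rule set: rewrite ("and", "true", x) to x.
rules = [
    (lambda e: isinstance(e, tuple) and e[:2] == ("and", "true"),
     lambda e: e[2]),
]
print(solve_or([("and", "true", "true"), ("and", "true", "false")], rules))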

    Optimizing SIMD execution in HW/SW co-designed processors

    SIMD accelerators are ubiquitous in microprocessors from different computing domains. Their high compute power and hardware simplicity improve overall performance in an energy-efficient manner. Moreover, their replicated functional units and simple control mechanism make them amenable to scaling to higher vector lengths. However, code generation for these accelerators has been a challenge since their inception. Compilers generate vector code conservatively to ensure correctness; as a result they lose significant vectorization opportunities and fail to extract the maximum benefit from SIMD accelerators. This thesis proposes to vectorize the program binary at runtime in a speculative manner, in addition to compile-time static vectorization. Several environments provide the runtime profiling and optimization support required for dynamic vectorization, the most prominent being (1) dynamic binary translators and optimizers (DBTOs) and (2) hardware/software (HW/SW) co-designed processors. A HW/SW co-designed environment provides several advantages over DBTOs, such as transparent incorporation of new hardware features and binary compatibility. Therefore, we use a HW/SW co-designed environment to assess the potential of speculative dynamic vectorization. Furthermore, we analyze vector code generation for wider vector units and find that, even though SIMD accelerators are amenable to scaling from the hardware point of view, vector code generation at higher vector lengths is even more challenging. The two major factors impeding vectorization for wider SIMD units are (1) reduced dynamic instruction stream coverage for vectorization and (2) a large number of permutation instructions. To solve the first problem we propose Variable Length Vectorization, which iteratively vectorizes for multiple vector lengths to improve dynamic instruction stream coverage. Secondly, to reduce the number of permutation instructions we propose Selective Writing, which selectively writes to different parts of a vector register and avoids permutations. Finally, we tackle the problem of leakage energy in SIMD accelerators. Since SIMD accelerators consume a significant amount of real estate on the chip, they become the principal source of leakage if not utilized judiciously. Power gating is one of the most widely used techniques to reduce the leakage energy of functional units; however, power gating has its own energy and performance overhead. We propose to selectively devectorize the vector code when higher SIMD lanes are used intermittently. This selective devectorization keeps the higher SIMD lanes idle and power gated for the maximum duration, resulting in an overall reduction in leakage energy.
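    The sketch below illustrates the Variable Length Vectorization idea as the abstract describes it: vectorizing iteratively at several vector lengths so that shorter groups of independent operations are still covered. The greedy packing heuristic, data representation, and function name are assumptions for illustration, not the thesis implementation.

def variable_length_vectorize(ops, lengths=(8, 4, 2)):
    """Greedily pack consecutive independent ops into the widest vector that fits;
    fall back to scalar ops when no supported vector length fits."""
    packed, i = [], 0
    while i < len(ops):
        for vl in lengths:
            if i + vl <= len(ops):
                packed.append(("vec_op", vl, ops[i:i + vl]))
                i += vl
                break
        else:
            packed.append(("scalar_op", 1, ops[i:i + 1]))
            i += 1
    return packed

# Example: 13 independent adds become one 8-wide op, one 4-wide op, and one scalar op.
print(variable_length_vectorize([("add", n) for n in range(13)]))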

    Using static program analysis to compile fast cache simulators

    This thesis presents a generic approach to compiling fast execution-driven simulators and applies it to cache simulation of programs. The resulting cache simulation method reduces the time needed for cache performance evaluations without losing the accuracy of the results. Fast cache simulators are needed in the performance analysis of software systems. To properly understand the cache behavior caused by a program, simulations must be performed with a sufficient number of inputs. Traditional simulation of the memory operations of a program can be orders of magnitude slower than the execution of the program itself, leading to simulation times that are often infeasible in software development. The approach of this thesis is based on using static cache analysis to guide partial evaluation and slicing of simulators. Because of redundancy in the memory access patterns of typical programs, an execution-driven cache simulator program can be partially evaluated during its compilation, and program slicing can remove the computations that have no effect on the simulation result. The static cache analysis presented in this thesis is generic and is designed especially for programs that use dynamic addressing. The thesis assumes an address analysis that gives the cache analysis static information about cache aliases and cache conflicts between accessed memory lines. To determine the memory references that always cause cache hits or cache misses, the thesis describes both must and may analyses of cache states. The cache state analysis is built using abstract interpretation, on the basis of which the soundness of the analysis is proved. The potential performance of the method was experimentally evaluated. The thesis describes both a tool set implementing the cache analysis method and experiments done with the tool set. The experiments indicate that a simple implementation is capable of significantly speeding up the simulations.
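    A minimal sketch of a "must" cache analysis of the kind the abstract mentions, for a fully associative LRU cache: the abstract state maps each memory line to an upper bound on its age, and a reference is a guaranteed hit if the line's bound is below the associativity. The transfer function and join follow the standard abstract-interpretation formulation; the names and the chosen associativity are illustrative, not the thesis tool set.

ASSOC = 4  # assumed associativity for the example

def must_update(state, line):
    """Abstract transfer function for an access to `line`."""
    old_age = state.get(line, ASSOC)
    new_state = {}
    for l, age in state.items():
        if l != line and age < old_age:
            age += 1                      # lines younger than `line` age by one
        if age < ASSOC:
            new_state[l] = age            # ages >= ASSOC may be evicted
    new_state[line] = 0                   # accessed line is now youngest
    return new_state

def must_join(s1, s2):
    """Join at control-flow merges: keep lines present on both paths, worst age."""
    return {l: max(s1[l], s2[l]) for l in s1.keys() & s2.keys()}

def is_always_hit(state, line):
    return line in state                  # bounded age implies a guaranteed hit

# Example: after accessing A then B on both incoming paths, A is a must-hit.
s = must_update(must_update({}, "A"), "B")
print(is_always_hit(must_join(s, s), "A"))   # True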