34 research outputs found

    The JCilk multithreaded language

    Get PDF
    Thesis (S.M.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2005.Includes bibliographical references (p. 103-107).JCilk is a Java-based multithreaded programming language which extends Java to provide a dynamic threading model. Specifically, JCilk imports Cilk's fork-join primitives spawn and sync into Java to provide procedure-call semantics for concurrent subcomputations. More importantly, JCilk integrates exception handling with multi-threading by defining semantics consistent with Java's existing semantics of exception handling. JCilk's strategy of integrating multithreading with Java's exception semantics yields some surprising semantic synergies. In particular, JCilk extends Java's exception semantics to allow exceptions to be passed from a spawned method to its parent in a natural way that obviates the need for Cilk's inlet and abort constructs. This extension is "faithful" in that it obeys Java's ordinary serial semantics when executed on a single processor. When executed in parallel, however, an exception thrown by a JCilk computation signals its sibling computations to abort, yielding a clean semantics in which only a single exception from the enclosing try block is handled. To minimize the complexity of reasoning about aborts, JCilk signals them "semisynchronously" so that abort signals do not interrupt ordinary serial code. Because JCilk uses Java's normal exception mechanism to propagate an abort throughout a subcomputation, the programmer can handle clean-up by simply catching a thrown CilkAbort exception. This thesis documents in detail the designed semantics, the linguistic decisions we made, and their justifications. This thesis also describes the structure of JCilk compiler and how it supports the exception semantics.(cont.) Specifically, the JCilk compiler performs a two-stage compilation process to support the continuation mechanism required by the runtime system's work-stealing algorithm. By performing static analysis, the compiler generates code to support the "catchlet" and "finallet" mechanisms for handling exceptions. The design of JCilk represents joint research with John S. Danaher and Charles E. Leiserson.by I-Ting Angelina Lee.S.M

    An environment for workflow applications on wide-area distributed systems

    Get PDF
    ©2001 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE.Workflow techniques are emerging as an important approach for the specification and management of complex processing tasks. This approach is especially powerful for utilising distributed data and processing resources in widely-distributed heterogeneous systems. We describe our DISCWorld distributed workflow environment for composing complex processing chains, which are specified as a directed acyclic graph of operators. Users of our system can formulate processing chains using either graphical or scripting tools. We have deployed our system for image processing applications and decision support systems. We describe the technologies we have developed to enable the execution of these processing chains across wide-area computing systems. In particular, we present our Distributed Job Placement Language (based on XML) and various Java interface approaches we have developed for implementing the workflow metaphor. We outline a number of key issues for implementing a high-performance, reliable, distributed workflow management system.James, H.A.; Hawick, K.A.; Coddington, P.D

    SICStus MT - A Multithreaded Execution Environment for SICStus Prolog

    Get PDF
    The development of intelligent software agents and other complex applications which continuously interact with their environments has been one of the reasons why explicit concurrency has become a necessity in a modern Prolog system today. Such applications need to perform several tasks which may be very different with respect to how they are implemented in Prolog. Performing these tasks simultaneously is very tedious without language support. This paper describes the design, implementation and evaluation of a prototype multithreaded execution environment for SICStus Prolog. The threads are dynamically managed using a small and compact set of Prolog primitives implemented in a portable way, requiring almost no support from the underlying operating system

    Using program behaviour to exploit heterogeneous multi-core processors

    Get PDF
    Multi-core CPU architectures have become prevalent in recent years. A number of multi-core CPUs consist of not only multiple processing cores, but multiple different types of processing cores, each with different capabilities and specialisations. These heterogeneous multi-core architectures (HMAs) can deliver exceptional performance; however, they are notoriously difficult to program effectively. This dissertation investigates the feasibility of ameliorating many of the difficulties encountered in application development on HMA processors, by employing a behaviour aware runtime system. This runtime system provides applications with the illusion of executing on a homogeneous architecture, by presenting a homogeneous virtual machine interface. The runtime system uses knowledge of a program's execution behaviour, gained through explicit code annotations, static analysis or runtime monitoring, to inform its resource allocation and scheduling decisions, such that the application makes best use of the HMA's heterogeneous processing cores. The goal of this runtime system is to enable non-specialist application developers to write applications that can exploit an HMA, without the developer requiring in-depth knowledge of the HMA's design. This dissertation describes the development of a Java runtime system, called Hera-JVM, aimed at investigating this premise. Hera-JVM supports the execution of unmodified Java applications on both processing core types of the heterogeneous IBM Cell processor. An application's threads of execution can be transparently migrated between the Cell's different core types by Hera-JVM, without requiring the application's involvement. A number of real-world Java benchmarks are executed across both of the Cell's core types, to evaluate the efficacy of abstracting a heterogeneous architecture behind a homogeneous virtual machine. By characterising the performance of each of the Cell processor's core types under different program behaviours, a set of influential program behaviour characteristics is uncovered. A set of code annotations are presented, which enable program code to be tagged with these behaviour characteristics, enabling a runtime system to track a program's behaviour throughout its execution. This information is fed into a cost function, which Hera-JVM uses to automatically estimate whether the executing program's threads of execution would benefit from being migrated to a different core type, given their current behaviour characteristics. The use of history, hysteresis and trend tracking, by this cost function, is explored as a means of increasing its stability and limiting detrimental thread migrations. The effectiveness of a number of different migration strategies is also investigated under real-world Java benchmarks, with the most effective found to be a strategy that can target code, such that a thread is migrated whenever it executes this code. This dissertation also investigates the use of runtime monitoring to enable a runtime system to automatically infer a program's behaviour characteristics, without the need for explicit code annotations. A lightweight runtime behaviour monitoring system is developed, and its effectiveness at choosing the most appropriate core type on which to execute a set of real-world Java benchmarks is examined. Combining explicit behaviour characteristic annotations with those characteristics which are monitored at runtime is also explored. Finally, an initial investigation is performed into the use of behaviour characteristics to improve application performance under a different type of heterogeneous architecture, specifically, a non-uniform memory access (NUMA) architecture. Thread teams are proposed as a method of automatically clustering communicating threads onto the same NUMA node, thereby reducing data access overheads. Evaluation of this approach shows that it is effective at improving application performance, if the application's threads can be partitioned across the available NUMA nodes of a system. The findings of this work demonstrate that a runtime system with a homogeneous virtual machine interface can reduce the challenge of application development for HMA processors, whilst still being able to exploit such a processor by taking program behaviour into account

    A real-time distributed analysis automation for hurricane surface wind observations

    Get PDF
    From 1993 until 1999, the Hurricane Research Division of the National Oceanic and Atmospheric Administration (NOAA) produced real-time analyses of surface wind observations to help determine a storm\u27s wind intensity and extent. Limitations of the real-time analysis system included platform and filesystem dependency, lacking data integrity and feasibility for Internet deployment. In 2000, a new system was developed, built upon a Java prototype of a quality control graphical client interface for wind observations and an object-relational database. The objective was to integrate them in a distributed object approach with the legacy code responsible for the actual real-time wind analysis and image product generation. Common Object Request Broker Architecture (CORBA) was evaluated, but Java Remote Method Invocation (AMI) offered important advantages in terms of reuse and deployment. Even more substantial, though, were the efforts towards object-oriented redesign, implementation and testing of the quality control interface and its database performance interaction. As a result, a full-featured application can now be launched from the Web, potentially accessible by tropical cyclone forecast and warning centers worldwide

    Scheduling Macro-DataFlow Programs on Task-Parallel Runtime Systems

    Get PDF
    Though multicore systems are ubiquitous, parallel programming models for these systems are generally not accessible to a wide programmer community. The macro-dataflow model is an attractive stepping stone to implicit parallelism for domain experts who are not the target audience for explicit parallel programming models. We use Intel's Concurrent Collections (CnC) programming model as a concrete exemplar of the macro-dataflow model in this work. CnC is a high level coordination language that can be implemented on top of lower-level task-parallel frameworks. In this thesis, we study an implementation of CnC, based on Habanero-Java as the underlying task-parallel runtime system. A unique feature of CnC, first-class decoupling of data and control dependences, allows us to experiment with schedulers by taking these data and control dependences into account for better scheduling decisions. Our observations led to the proposal and implementation of a new task-parallel synchronization construct for Habanero-Java, namely Data-Driven Futures. We obtained two kinds of experimental results from our implementation. First, we compare the effectiveness of task scheduling policies for CnC programs. Secondly, we show that data-driven futures not only reduce execution time but also shrink memory footprint. In summary, this thesis shows a macro-dataflow programming model can deliver productivity and performance on modern multicore processors

    Building a generics stress tool for server performance and behaviour analysis

    Get PDF
    We present the Tester, a 100% Java benchmarking tool designed to accept a custom representation of a load schedule and reproduce it using distributed client machines in order to test a service. Our implementation is strongly based on a set of interfaces and pluggable implementations allowing for custom extensions. This project was realized at the Conrad Zuse Institut in Berlin, in a CoreGrid Research Exchange Program. One of the features of our application is the addition of the ZIB's Sminer Project Preferences library, a hierarchical data structure extracted from an XML configuration file, allowing for multiple customisation possibilities

    Efficient optimization of memory accesses in parallel programs

    Get PDF
    The power, frequency, and memory wall problems have caused a major shift in mainstream computing by introducing processors that contain multiple low power cores. As multi-core processors are becoming ubiquitous, software trends in both parallel programming languages and dynamic compilation have added new challenges to program compilation for multi-core processors. This thesis proposes a combination of high-level and low-level compiler optimizations to address these challenges. The high-level optimizations introduced in this thesis include new approaches to May-Happen-in-Parallel analysis and Side-Effect analysis for parallel programs and a novel parallelism-aware Scalar Replacement for Load Elimination transformation. A new Isolation Consistency (IC) memory model is described that permits several scalar replacement transformation opportunities compared to many existing memory models. The low-level optimizations include a novel approach to register allocation that retains the compile time and space efficiency of Linear Scan, while delivering runtime performance superior to both Linear Scan and Graph Coloring. The allocation phase is modeled as an optimization problem on a Bipartite Liveness Graph (BLG) data structure. The assignment phase focuses on reducing the number of spill instructions by using register-to-register move and exchange instructions wherever possible. Experimental evaluations of our scalar replacement for load elimination transformation in the Jikes RVM dynamic compiler show decreases in dynamic counts for getfield operations of up to 99.99%, and performance improvements of up to 1.76x on 1 core, and 1.39x on 16 cores, when compared with the load elimination algorithm available in Jikes RVM. A prototype implementation of our BLG register allocator in Jikes RVM demonstrates runtime performance improvements of up to 3.52x relative to Linear Scan on an x86 processor. When compared to Graph Coloring register allocator in the GCC compiler framework, our allocator resulted in an execution time improvement of up to 5.8%, with an average improvement of 2.3% on a POWER5 processor. With the experimental evaluations combined with the foundations presented in this thesis, we believe that the proposed high-level and low-level optimizations are useful in addressing some of the new challenges emerging in the optimization of parallel programs for multi-core architectures

    Enhancements to jml and its extended static checking technology

    Get PDF
    Formal methods are useful for developing high-quality software, but to make use of them, easy-to-use tools must be available. This thesis presents our work on the Java Modeling Language (JML) and its static verification tools. A main contribution is Offline User-Assisted Extended Static Checking (OUA-ESC), which is positioned between the traditional, fully automatic ESC and interactive Full Static Program Verification (FSPV). With OUA-ESC, automated theorem provers are used to discharge as many Verification Conditions (VCs) as possible, then users are allowed to provide Isabelle/HOL proofs for the sub-VCs that cannot be discharged automatically. Thus, users are able to take advantage of the full power of Isabelle/HOL to manually prove the system correct, if they so choose. Exploring unproven sub-VCs with Isabelle's ProofGeneral has also proven very useful for debugging code and their specifications. We also present syntax and semantics for monotonic non-null references, a common category that has not been previously identified. This monotonic non-null modifier allows some fields previously declared as nullable to be treated like local variables for nullity flow analysis. To support this work, we developed JML4, an Eclipse-based Integration Verification Environment (IVE) for the Java Modeling Language. JML4 provides integration of JML into all of the phases of the Eclipse JDT's Java compiler, makes use of external API specifications, and provides native error reporting. The verification techniques initially supported include a Non-Null Type System (NNTS), Runtime Assertion Checking (RAC), and Extended Static Checking (ESC); and verification tools to be developed by other researchers can be incorporated. JML4 was adopted by the JML4 community as the platform for their combined research efforts. ESC4, JML4's ESC component, provides other novel features not found before in ESC tools. Multiple provers are used automatically, which provides a greater coverage of language constructs that can be verified. Multi-threaded generation and distributed discharging of VCs, as well as a proof-status caching strategy, greatly speed up this CPU-intensive verification technique. VC caches are known to be fragile, and we developed a simple way to remove some of that fragility. These features combine to form the first IVE for JML, which will hopefully bring the improved quality promised by formal methods to Java developer
    corecore