22 research outputs found

    Dynamic Assignment of Scoped Memory Regions in the Translation of Java to Real-Time Java

    Advances in middleware, operating systems, and popular, general-purpose languages have brought the ideal of reasonably bounded execution time closer to developers who need such assurances for real-time and embedded systems applications. Extensions to the Java libraries and virtual machine have been proposed in a real-time Java standard, which provides for specification of release times, execution costs, and deadlines for a restricted class of threads. To use such features, the programmer is required to use unwieldy code constructs to create region-like areas of storage, associate them with execution scopes, and allocate objects from them. Further, the developer must ensure that they do not violate strict inter-region reference rules. Unfortunately, it is difficult to determine manually how to map object instantiations to execution scopes. Moreover, if ordinary Java code is modified to effect instantiations in scopes, the resulting code is difficult to read, maintain, and reuse. We present a dynamic approach to determining proper placement of objects within scope-bounded regions, and we employ a procedure that utilizes aspect-oriented programming to instrument the original program, realizing the program's scoped memory concerns in a modular fashion. Using this approach, Java programs can be converted into region-aware Java programs automatically.
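
    As a rough illustration of the "unwieldy code constructs" the abstract refers to, the hand-written Java sketch below allocates objects inside an RTSJ scoped memory region. LTMemory, ScopedMemory, and RealtimeThread belong to the javax.realtime API; the surrounding scaffolding and sizes are purely illustrative of the code the described translation would generate automatically.

        import javax.realtime.LTMemory;
        import javax.realtime.RealtimeThread;
        import javax.realtime.ScopedMemory;

        public class ScopedAllocationSketch {
            public static void main(String[] args) {
                // Scoped regions may only be entered by real-time threads.
                new RealtimeThread() {
                    public void run() {
                        // A linear-time-allocation region: 16 KiB initial, 64 KiB maximum.
                        ScopedMemory scope = new LTMemory(16 * 1024, 64 * 1024);

                        // Objects allocated inside enter() live only while the scope is
                        // active; storing references to them in outer regions would
                        // violate the inter-region reference rules mentioned above.
                        scope.enter(new Runnable() {
                            public void run() {
                                byte[] buffer = new byte[1024]; // allocated in the scope
                                // ... real-time work on the scoped buffer ...
                            } // region memory is reclaimed when the last thread leaves
                        });
                    }
                }.start();
            }
        }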

    Contention elimination by replication of sequential sections in distributed shared memory programs

    In shared-memory programs, contention often occurs at the transition between a sequential and a parallel section of the code. As all threads start executing the parallel section, they often access data just modified by the thread that executed the sequential section, causing a flurry of data requests to converge on that processor. We address this problem in a software distributed shared memory system by replicating the execution of the sequential sections on all processors. Communication during this replicated sequential execution is reduced by using multicast. We have implemented replicated sequential execution with multicast support in OpenMP/NOW, a version of OpenMP that runs on networks of workstations. We do not rely on compile-time data analysis, and therefore we can handle irregular and pointer-based applications. We show significant improvement for two pointer-based applications that suffer from severe contention without replicated sequential execution.
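
    The Java sketch below illustrates the replication idea in ordinary shared-memory terms only: each worker re-runs the sequential section on private data instead of fetching its results from a single master thread. It is a conceptual analogue with invented names, not the paper's system, which operates on a page-based software DSM and delivers the inputs of the replicated section by multicast.

        import java.util.concurrent.ExecutorService;
        import java.util.concurrent.Executors;
        import java.util.concurrent.TimeUnit;

        public class ReplicatedSequentialSketch {
            static final int WORKERS = 4;

            public static void main(String[] args) throws InterruptedException {
                ExecutorService pool = Executors.newFixedThreadPool(WORKERS);
                for (int w = 0; w < WORKERS; w++) {
                    final int id = w;
                    pool.execute(() -> {
                        // Replicated "sequential" section: every worker recomputes the
                        // shared state privately instead of fetching it from the one
                        // processor that would otherwise have produced it.
                        double[] state = sequentialSection();
                        // Parallel section: each worker updates its own slice.
                        parallelWork(state, id, WORKERS);
                    });
                }
                pool.shutdown();
                pool.awaitTermination(1, TimeUnit.MINUTES);
            }

            static double[] sequentialSection() {
                double[] state = new double[1024];
                for (int i = 0; i < state.length; i++) state[i] = Math.sin(i);
                return state;
            }

            static void parallelWork(double[] state, int id, int workers) {
                for (int i = id; i < state.length; i += workers) state[i] *= 2.0;
            }
        }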

    Reference idempotency analysis: A framework for optimizing speculative execution

    Recent proposals for multithreaded architectures allow threads with unknown dependences to execute speculatively in parallel. These architectures use hardware speculative storage to buffer uncertain data, track data dependences and roll back incorrect executions. Because all memory references access the speculative storage, current proposals implement this storage using small memory structures for fast access. The limited capacity of the speculative storage causes considerable performance loss due to speculative storage overflow whenever a thread's speculative state exceeds the storage capacity. Larger threads exacerbate the overflow problem but are preferable to smaller threads, as larger threads uncover more parallelism. In this paper, we discover a new program property called memory reference idempotency. Idempotent references need not be tracked in the speculative storage, and instead can directly access non-speculative storage (i.e., the conventional memory hierarchy). Thus, we reduce the demand for speculative storage space. We define a formal framework for reference idempotency and present a novel compiler-assisted speculative execution model. We prove the necessary and sufficient conditions for reference idempotency using our model. We present a compiler algorithm to label idempotent memory references for the hardware. Experimental results show that for our benchmarks, over 60% of the references in non-parallelizable program sections are idempotent.
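
    A minimal sketch, in Java purely for illustration, of the distinction the abstract draws: reads that would return the same value if the thread were re-executed are idempotent and need not occupy speculative storage, while stores must remain buffered until commit. The classification in the comments is an informal example, not the paper's compiler algorithm or its formal conditions.

        public class IdempotencySketch {
            static final double[] LOOKUP = new double[256]; // never written below
            static double[] results = new double[1024];     // written speculatively

            static void speculativeThread(int begin, int end) {
                for (int i = begin; i < end; i++) {
                    // Idempotent reference: LOOKUP is not written anywhere in the
                    // speculative region, so re-reading it after a roll-back yields
                    // the same value; it need not be tracked in speculative storage.
                    double scale = LOOKUP[i & 0xFF];

                    // Non-idempotent reference: this store must stay buffered in
                    // speculative storage until the thread commits, so it can be undone.
                    results[i] = scale * i;
                }
            }
        }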

    A Type-Theoretic Foundation of Delimited Continuations

    There is a correspondence between classical logic and programming language calculi with first-class continuations. With the addition of control delimiters, the continuations become composable and the calculi become more expressive. We present a fine-grained analysis of control delimiters and formalise that their addition corresponds to the addition of a single dynamically-scoped variable modelling the special top-level continuation. From a type perspective, the dynamically-scoped variable requires effect annotations. In the presence of control, the dynamically-scoped variable can be interpreted in a purely functional way by applying a store-passing style. At the type level, the effect annotations are mapped within standard classical logic extended with the dual of implication, namely subtraction. A continuation-passing-style transformation of lambda-calculus with control and subtraction is defined. Combining the translations provides a decomposition of standard CPS transformations for delimited continuations. Incidentally, we also give a direct normalisation proof of the simply-typed lambda-calculus with control and subtraction.
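
    As background only, the standard reduction rules for shift and reset below show how a control delimiter makes the captured continuation composable; they are the textbook presentation rather than the finer-grained calculus with a dynamically-scoped top-level continuation and subtraction types studied in the paper.

        \[
          \langle v \rangle \;\to\; v
          \qquad\qquad
          \langle E[\mathcal{S}\,k.\,e] \rangle \;\to\;
          \langle e[\,k := \lambda x.\,\langle E[x] \rangle\,] \rangle
        \]

    Here E is an evaluation context containing no enclosing reset; because the captured continuation k is itself wrapped in a delimiter, applying k returns a value to its caller instead of aborting the surrounding computation, which is exactly what makes delimited continuations composable.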

    Cautiously Optimistic Program Analyses for Secure and Reliable Software

    Modern computer systems still have various security and reliability vulnerabilities. Well-known dynamic analyses can mitigate them using runtime monitors that serve as lifeguards, but the additional work of enforcing these security and safety properties incurs exorbitant performance costs, so such tools are rarely used in practice. Our work addresses this problem by constructing a novel technique: Cautiously Optimistic Program Analysis (COPA). COPA is optimistic: it infers likely program invariants from dynamic observations and assumes them in its static reasoning to precisely identify and elide wasteful runtime monitors. The resulting system is fast, yet remains sound by recovering to a conservatively optimized analysis on the rare occasions when a likely invariant fails at runtime. COPA is also cautious: by carefully restricting optimizations to safe elisions only, recovery is greatly simplified and unbounded rollbacks are avoided, enabling analysis of live production software. We demonstrate the effectiveness of Cautiously Optimistic Program Analyses in three areas.

    Information-Flow Tracking (IFT) can help prevent security breaches and information leaks, but it is rarely used in practice due to its high performance overhead (>500% for web/email servers). COPA dramatically reduces this cost by eliding wasteful IFT monitors, making it practical (9% overhead, 4x speedup).

    Automatic Garbage Collection (GC) in managed languages (e.g. Java) simplifies programming tasks while ensuring memory safety. However, there is no correct GC for weakly-typed languages (e.g. C/C++), and manual memory management is prone to errors that have been exploited in high-profile attacks. We develop the first sound GC for C/C++, and use COPA to optimize its performance (16% overhead).

    Sequential Consistency (SC) provides intuitive semantics to concurrent programs that simplify reasoning about their correctness. However, ensuring SC behavior on commodity hardware remains expensive. We use COPA to ensure SC for Java efficiently at the language level, significantly reducing its cost (from 24% down to 5% on x86).

    COPA provides a way to realize strong software security, reliability and semantic guarantees at practical costs.
    PhD thesis, Computer Science & Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/170027/1/subarno_1.pd
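
    The hypothetical Java sketch below mirrors the optimistic/recovery pattern the abstract describes, using information-flow tracking as the example: monitoring is elided while a profiled invariant holds, and a cheap guard switches execution to the conservatively monitored path the first time it fails. All names and structure are invented for illustration; COPA itself works inside compiler analyses and runtime systems, not hand-written wrappers.

        public class OptimisticIftSketch {
            // Likely invariant inferred from profiling: values from this source were
            // never observed to carry tainted (user-controlled) data.
            private static volatile boolean invariantHolds = true;

            static String readConfig(String key) {
                String value = lookup(key);
                if (invariantHolds && !isTainted(value)) {
                    return value; // fast path: per-access IFT monitoring elided
                }
                // Recovery: the invariant has failed, so fall back permanently to the
                // conservatively monitored path.  Only monitor elisions were optimistic,
                // never program effects, so no unbounded rollback is needed.
                invariantHolds = false;
                return trackTaint(value);
            }

            static String lookup(String key)   { return System.getProperty(key, ""); }
            static boolean isTainted(String v) { return false; } // placeholder guard
            static String trackTaint(String v) { return v; }     // placeholder monitor
        }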

    Defining interfaces between hardware and software: Quality and performance

    One of the most important interfaces in a computer system is the interface between hardware and software. This interface is the contract between the hardware designer and the programmer that defines the functional behaviour of the hardware. This thesis examines two critical aspects of defining the hardware-software interface: quality and performance. The first aspect is creating a high quality specification of the interface as conventionally defined in an instruction set architecture. The majority of this thesis is concerned with creating a specification that covers the full scope of the interface; that is applicable to all current implementations of the architecture; and that can be trusted to accurately describe the behaviour of implementations of the architecture. We describe the development of a formal specification of the two major types of Arm processors: A-class (for mobile devices such as phones and tablets) and M-class (for micro-controllers). These specifications are unparalleled in their scope, applicability and trustworthiness. This thesis identifies and illustrates what we consider the key ingredient in achieving this goal: creating a specification that is used by many different user groups. Supporting many different groups leads to improved quality as each group finds different problems in the specification; and, by providing value to each different group, it helps justify the considerable effort required to create a high quality specification of a major processor architecture. The work described in this thesis led to a step change in Arm's ability to use formal verification techniques to detect errors in their processors; enabled extensive testing of the specification against Arm's official architecture conformance suite; improved the quality of Arm's architecture conformance suite based on measuring the architectural coverage of the tests; supported earlier, faster development of architecture extensions by enabling animation of changes as they are being made; and enabled early detection of problems created from architecture extensions by performing formal validation of the specification against semi-structured natural language specifications. As far as we are aware, no other mainstream processor architecture has this capability. The formal specifications are included in Arm's publicly released architecture reference manuals and the A-class specification is also released in machine-readable form. The second aspect is creating a high performance interface by defining the hardware-software interface of a software-defined radio subsystem using a programming language. That is, an interface that allows software to exploit the potential performance of the underlying hardware. While the hardware-software interface is normally defined in terms of machine code, peripheral control registers and memory maps, we define it using a programming language instead. This higher level interface provides the opportunity for compilers to hide some of the low-level differences between different systems from the programmer: a potentially very efficient way of providing a stable, portable interface without having to add hardware to provide portability between different hardware platforms. We describe the design and implementation of a set of extensions to the C programming language to support programming high performance, energy efficient, software defined radio systems. 
The language extensions enable the programmer to exploit the pipeline parallelism typically present in digital signal processing applications and to make efficient use of the asymmetric multiprocessor systems designed to support such applications. The extensions consist primarily of annotations that can be checked for consistency and that support annotation inference in order to reduce the number of annotations required. Reducing the number of annotations does not just save programmer effort, it also improves portability by reducing the number of annotations that need to be changed when porting an application from one platform to another. This work formed part of a project that developed a high-performance, energy-efficient, software-defined radio capable of implementing the physical layers of the 4G cellphone standard (LTE), 802.11a WiFi and Digital Video Broadcast (DVB) with a power and silicon area budget that was competitive with a conventional custom ASIC solution. The Arm architecture is the largest computer architecture by volume in the world. It behooves us to ensure that the interface it describes is appropriately defined.
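
    As a language-neutral illustration of the pipeline parallelism the extensions are designed to express, the Java sketch below splits a signal-processing job into two stages connected by a bounded channel, so stage one can work on block i+1 while stage two processes block i. This is only a conceptual analogue with invented stage names; the thesis expresses the same structure through checkable annotations on C code rather than explicit threads and queues.

        import java.util.concurrent.ArrayBlockingQueue;
        import java.util.concurrent.BlockingQueue;

        public class PipelineSketch {
            public static void main(String[] args) {
                BlockingQueue<float[]> channel = new ArrayBlockingQueue<>(8);

                Thread filter = new Thread(() -> {     // stage 1: filter sample blocks
                    try {
                        for (int block = 0; block < 1000; block++) {
                            channel.put(acquire(block)); // back-pressure if stage 2 lags
                        }
                    } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
                });

                Thread demodulate = new Thread(() -> { // stage 2: demodulate blocks
                    try {
                        for (int block = 0; block < 1000; block++) {
                            consume(channel.take());
                        }
                    } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
                });

                filter.start();
                demodulate.start();
            }

            static float[] acquire(int block)    { return new float[256]; } // placeholder
            static void consume(float[] samples) { }                        // placeholder
        }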

    Compiler and Runtime Optimizations for Fine-Grained Distributed Shared Memory Systems

    Bal, H.E. [Promotor]