
    Simple and Effective Type Check Removal through Lazy Basic Block Versioning

    Dynamically typed programming languages such as JavaScript and Python defer type checking to run time. In order to maximize performance, dynamic language VM implementations must attempt to eliminate redundant dynamic type checks. However, type inference analyses are often costly and involve tradeoffs between compilation time and resulting precision. This has led to the creation of increasingly complex multi-tiered VM architectures. This paper introduces lazy basic block versioning, a simple JIT compilation technique which effectively removes redundant type checks from critical code paths. This novel approach lazily generates type-specialized versions of basic blocks on the fly while propagating context-dependent type information. It does not require the use of costly program analyses, is not restricted by the precision limitations of traditional type analyses, and avoids the implementation complexity of speculative optimization techniques. We have implemented intraprocedural lazy basic block versioning in a JavaScript JIT compiler and compared it with a classical flow-based type analysis. Lazy basic block versioning performs as well as or better on all benchmarks. On average, 71% of type tests are eliminated, yielding speedups of up to 50%. We also show that our implementation generates more efficient machine code than TraceMonkey, a tracing JIT compiler for JavaScript, on several benchmarks. The combination of implementation simplicity, low algorithmic complexity and good run-time performance makes basic block versioning attractive for baseline JIT compilers.
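
    A minimal illustrative sketch of the idea, written in Python over an invented toy IR (the Instr record, BLOCKS table, and get_version helper are assumptions made for this example; the paper's implementation targets JavaScript machine code): each block is compiled lazily, once per incoming type context, so a type test whose outcome is already recorded in that context can simply be dropped.

        from collections import namedtuple

        # Toy IR for illustration only; not the paper's representation.
        Instr = namedtuple("Instr", "op var type target", defaults=(None, None, None))

        BLOCKS = {
            "entry": [Instr("type_check", var="x", type="int"),
                      Instr("add", var="x"),
                      Instr("branch", target="loop")],
            "loop":  [Instr("type_check", var="x", type="int"),   # redundant once x is known to be int
                      Instr("add", var="x")],
        }

        versions = {}  # (block_id, type_context) -> specialized instruction list

        def get_version(block_id, type_context=frozenset()):
            """Compile block_id for type_context on first request, then reuse it."""
            key = (block_id, type_context)
            if key not in versions:
                versions[key] = _specialize(block_id, type_context)
            return versions[key]

        def _specialize(block_id, ctx):
            out = []
            for instr in BLOCKS[block_id]:
                if instr.op == "type_check":
                    if (instr.var, instr.type) in ctx:
                        continue                              # fact already known: test removed
                    out.append(instr)                         # keep the test once...
                    ctx = ctx | {(instr.var, instr.type)}     # ...and remember its outcome
                elif instr.op == "branch":
                    # successors are versioned lazily, each with the context accumulated here
                    out.append(("branch_to_version", instr.target, ctx))
                else:
                    out.append(instr)
            return out

        # Reached with the context learned in "entry", the type test in "loop" disappears.
        print(get_version("loop", frozenset({("x", "int")})))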

    Automatically Discovering Hidden Transformation Chaining Constraints

    Model transformations operate on models conforming to precisely defined metamodels. Consequently, it often seems relatively easy to chain them: the output of a transformation may be given as input to a second one if metamodels match. However, this simple rule has some obvious limitations. For instance, a transformation may only use a subset of a metamodel. Therefore, chaining transformations appropriately requires more information. We present here an approach that automatically discovers more detailed information about actual chaining constraints by statically analyzing transformations. The objective is to provide developers who decide to chain transformations with more data on which to base their choices. This approach has been successfully applied to the case of a library of endogenous transformations. They all have the same source and target metamodel but have some hidden chaining constraints. In such a case, the simple metamodel matching rule given above does not provide any useful information.
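
    The abstract does not spell out the analysis itself; a hedged Python sketch of the kind of check it implies (the element sets and transformation names below are invented purely for illustration) is: compare what a predecessor transformation actually writes or deletes with what a candidate successor actually reads, rather than relying on the shared metamodel alone.

        # Invented example data: both transformations conform to the same metamodel,
        # yet chaining t1 -> t2 is unsafe because t2 reads an element t1 deletes.
        t1_writes  = {"Class", "Attribute"}      # elements t1 creates or updates
        t1_deletes = {"Association"}             # elements t1 removes
        t2_reads   = {"Class", "Association"}    # elements t2 requires as input

        def chaining_conflicts(deletes, reads):
            """Elements a successor needs but the predecessor has already deleted."""
            return reads & deletes

        print(chaining_conflicts(t1_deletes, t2_reads))   # {'Association'}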

    Energy-efficient and high-performance lock speculation hardware for embedded multicore systems

    Embedded systems are becoming increasingly common in everyday life and, like their general-purpose counterparts, they have shifted towards shared memory multicore architectures. However, they are much more resource constrained, and as they often run on batteries, energy efficiency becomes critically important. In such systems, achieving high concurrency is a key demand for delivering satisfactory performance at low energy cost. In order to achieve this high concurrency, consistency across the shared memory hierarchy must be accomplished in a cost-effective manner in terms of performance, energy, and implementation complexity. In this article, we propose Embedded-Spec, a hardware solution for supporting transparent lock speculation, without the requirement for special supporting instructions. Using this approach, we evaluate the energy consumption and performance of a suite of benchmarks, exploring a range of contention management and retry policies. We conclude that for resource-constrained platforms, lock speculation can provide real benefits in terms of improved concurrency and energy efficiency, as long as the underlying hardware support is carefully configured. This work is supported in part by NSF under Grants CCF-0903384, CCF-0903295, CNS-1319495, and CNS-1319095, as well as by the Semiconductor Research Corporation under grant number 1983.001. (CCF-0903384 - NSF; CCF-0903295 - NSF; CNS-1319495 - NSF; CNS-1319095 - NSF; 1983.001 - Semiconductor Research Corporation)
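
    Embedded-Spec itself is a hardware mechanism, so it cannot be reproduced in code here; the Python sketch below only illustrates the policy space the abstract mentions, namely how many speculative attempts to allow before falling back to the real lock. The MAX_RETRIES constant, the random "conflict" model, and all function names are assumptions made for this example.

        import random
        import threading
        import time

        MAX_RETRIES = 3                 # assumed tunable retry policy
        lock = threading.Lock()

        def try_speculatively(body):
            # Placeholder for hardware speculation: real hardware would execute the
            # critical section transactionally and abort on a memory conflict;
            # here a conflict is simply simulated with a random draw.
            if random.random() < 0.3:
                return False            # "conflict": abort without running the body
            body()
            return True

        def run_critical_section(body):
            for attempt in range(MAX_RETRIES):
                if try_speculatively(body):       # succeeded without taking the lock
                    return
                time.sleep(random.uniform(0.0, 0.001) * (attempt + 1))  # simple backoff
            with lock:                            # give up on speculation, take the lock
                body()

        counter = [0]
        run_critical_section(lambda: counter.__setitem__(0, counter[0] + 1))
        print(counter[0])               # 1, whichever path was taken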

    The Resource constrained shortest path problem implemented in a lazy functional language

    The resource constrained shortest path problem is an NP-hard problem for which many ingenious algorithms have been developed. These algorithms are usually implemented in FORTRAN or another imperative programming language. We have implemented some of the simpler algorithms in a lazy functional language. Benefits accrue in the software engineering of the implementations. Our implementations have been applied to a standard benchmark of data files, which is available from the Operational Research Library of Imperial College, London. The performance of the lazy functional implementations, even with the comparatively simple algorithms that we have used, is competitive with a reference FORTRAN implementation.
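
    The abstract does not give the algorithms themselves, so as a hedged illustration of the problem being solved, here is a small Python label-setting sketch (the toy graph, budget, and function name are invented for this example): each edge carries both a cost and a resource consumption, and only paths whose total resource use stays within a budget are feasible.

        import heapq

        # Toy instance: node -> [(neighbour, cost, resource)]
        GRAPH = {
            "s": [("a", 1, 2), ("b", 5, 1)],
            "a": [("t", 1, 2)],
            "b": [("t", 1, 1)],
            "t": [],
        }

        def rcsp(graph, source, target, budget):
            """Minimum cost of a source->target path using at most `budget` resource."""
            heap = [(0, 0, source)]          # labels: (cost, resource_used, node)
            best = {}                        # (node, resource_used) -> cheapest cost seen
            while heap:
                cost, used, node = heapq.heappop(heap)
                if node == target:
                    return cost
                for nxt, c, r in graph[node]:
                    nc, nr = cost + c, used + r
                    if nr > budget:
                        continue                                   # infeasible extension
                    if best.get((nxt, nr), float("inf")) <= nc:
                        continue                                   # dominated label
                    best[(nxt, nr)] = nc
                    heapq.heappush(heap, (nc, nr, nxt))
            return None                                            # no feasible path

        # s->a->t is cheaper (cost 2) but uses 4 units of resource; with budget 3
        # the cheapest feasible path is s->b->t with cost 6.
        print(rcsp(GRAPH, "s", "t", budget=3))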

    The second law and beyond in microscopic quantum setups

    The Clausius inequality (CI) is one of the most versatile forms of the second law. Although it was originally conceived for macroscopic steam engines, it is also applicable to quantum single-particle machines. Moreover, the CI is the main connecting thread between classical microscopic thermodynamics and nanoscopic quantum thermodynamics. In this chapter, we study three different approaches for obtaining the CI. Each approach shows different aspects of the CI. The goals of this chapter are: (i) to show the exact assumptions made in various derivations of the CI; (ii) to elucidate the structure of the second law and its origin; (iii) to discuss the possibilities each approach offers for finding additional second-law-like inequalities; (iv) to pose challenges related to the second law in nanoscopic setups. In particular, we introduce and briefly discuss the notions of exotic heat machines (X machines) and "lazy demons". Comment: As a chapter of: F. Binder, L. A. Correa, C. Gogolin, J. Anders, and G. Adesso (eds.), "Thermodynamics in the quantum regime - Recent Progress and Outlook" (Springer International Publishing). v1 does not include references to other book chapters.
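
    For reference, the inequality the abstract builds on, in its standard textbook form (the chapter's own notation and sign conventions may differ): for a system that absorbs heat Q_k from baths at temperatures T_k,

        \Delta S_{\mathrm{sys}} \;-\; \sum_k \frac{Q_k}{T_k} \;\ge\; 0,
        \qquad\text{and, for a cyclic process,}\qquad
        \oint \frac{\delta Q}{T} \;\le\; 0 .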

    Structural Optimisation: Biomechanics of the Femur

    A preliminary iterative 3D meso-scale structural model of the femur was developed, in which bar and shell elements were used to represent trabecular and cortical bone respectively. The cross-sectional areas of the bar elements and the thickness values of the shell elements were adjusted over successive iterations of the model based on a target strain stimulus, resulting in an optimised construct. The predicted trabecular architecture and cortical thickness distribution showed good agreement with clinical observations, based on the application of a single-leg stance load case during gait. The benefit of using a meso-scale structural approach, in comparison to micro- or macro-scale continuum approaches to predictive bone modelling, was the achievement of the symbiotic goals of computational efficiency and structural description of the femur. Comment: Accepted by Engineering and Computational Mechanics (Proceedings of the ICE).
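
    The abstract does not state the exact adjustment rule, so the short Python sketch below only illustrates one common strain-adaptive scheme assumed for this example (TARGET_STIMULUS, TOLERANCE, and the stand-in compute_stimulus callback are all invented): section properties are rescaled each iteration by the ratio of the computed stimulus to the target until the construct converges.

        TARGET_STIMULUS = 1.0      # normalised target strain stimulus (assumed value)
        TOLERANCE = 0.01           # stop once every element is within 1% of the target

        def remodel(areas, compute_stimulus, max_iters=100):
            """Iteratively rescale bar cross-sectional areas towards the target stimulus.

            compute_stimulus(areas) stands in for the finite element solve that
            returns one strain stimulus value per element for the current geometry.
            """
            for _ in range(max_iters):
                stimulus = compute_stimulus(areas)
                ratios = [s / TARGET_STIMULUS for s in stimulus]
                if all(abs(r - 1.0) < TOLERANCE for r in ratios):
                    break                                  # converged construct
                # over-strained elements grow, under-strained elements shrink
                areas = [a * r for a, r in zip(areas, ratios)]
            return areas

        # Toy stand-in for the FE solve: stimulus falls as the area grows.
        print(remodel([0.5, 2.0], lambda a: [1.0 / x for x in a]))   # -> [1.0, 1.0]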

    Managing Communication Latency-Hiding at Runtime for Parallel Programming Languages and Libraries

    This work introduces a runtime model for managing communication with support for latency-hiding. The model enables non-computer science researchers to exploit communication latency-hiding techniques seamlessly. For compiled languages, it is often possible to create efficient schedules for communication, but this is not the case for interpreted languages. By maintaining data dependencies between scheduled operations, it is possible to aggressively initiate communication and lazily evaluate tasks to allow maximal time for the communication to finish before entering a wait state. We implement a heuristic of this model in DistNumPy, an auto-parallelizing version of numerical Python that allows sequential NumPy programs to run on distributed memory architectures. Furthermore, we present performance comparisons for eight benchmarks with and without automatic latency-hiding. The results show that our model reduces the time spent waiting for communication by as much as a factor of 27, from a maximum of 54% to only 2% of the total execution time, in a stencil application. Comment: PREPRINT
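
    A hedged Python analogue of that scheduling idea (DistNumPy itself overlaps MPI communication with computation; here a thread pool and the start_receive / local_compute helpers are stand-ins invented for this example): communication is initiated eagerly, and the blocking wait is deferred until a dependent task actually consumes the data.

        import time
        from concurrent.futures import ThreadPoolExecutor

        pool = ThreadPoolExecutor()
        pending = {}   # array name -> future representing an in-flight transfer

        def start_receive(name, fetch):
            """Aggressively initiate communication for `name` without blocking."""
            pending[name] = pool.submit(fetch)

        def local_compute(compute, *dep_names):
            """Lazily evaluate `compute`, waiting on transfers only when needed."""
            # By the time .result() is called, the transfer has ideally finished,
            # so the wait state is short or skipped entirely.
            deps = [pending.pop(d).result() for d in dep_names]
            return compute(*deps)

        start_receive("ghost_cells", lambda: time.sleep(0.1) or [1, 2, 3])
        local_work = sum(range(10**6))        # useful computation overlaps the transfer
        print(local_compute(lambda g: sum(g) + local_work, "ghost_cells"))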

    The significance of the reflective practitioner in blended learning

    This is a case study paper concerned with the introduction of blended learning on a part-time higher education programme for mature students. The interpretive work draws on four action research cycles conducted over two years with two student cohorts. Discussion is based on observations, staff and student focus groups, and interviews examining the students’ expectations and experiences. The initial focus of the action research was on the introduction of technology into the teaching and learning experience. The great advantage of an interpretive approach, however, is to allow the findings to determine the course of the research. During the first action research cycles, the focus of the research soon changed from the use of technology in blended learning to the role of the practitioners involved. We advocate the key role of reflective practitioners in facilitating blended learning and suggest that action research is a useful framework to develop this. Keywords: educational technology implementation, electronic learning (e-learning), teacher improvement, teacher preparation, action research, asynchronous communications, blended learning, part-time student, reflective practice.